Processes | Free Full-Text | Using a Machine Learning Regression Approach to Predict the Aroma Partitioning in Dairy Matrices
Aroma partitioning in food is a challenging area of research due to the contribution of several physical and chemical factors that affect the binding and release of aroma in food matrices. The partition coefficient measured by the Kmg value refers to the partition coefficient that describes how aroma compounds distribute themselves between matrices and a gas phase, such as between different components of a food matrix and air. This study introduces a regression approach to predict the Kmg value of aroma compounds of a wide range of physicochemical properties in dairy matrices representing products of different compositions and/or processing. The approach consists of data cleaning, grouping based on the temperature of Kmg analysis, pre-processing (log transformation and normalization), and, finally, the development and evaluation of prediction models with regression methods. We compared regression analysis with linear regression (LR) to five machine-learning-based regression algorithms: Random Forest Regressor (RFR), Gradient Boosting Regression (GBR), Extreme Gradient Boosting (XGBoost, XGB), Support Vector Regression (SVR), and Artificial Neural Network Regression (NNR). Explainable AI (XAI) was used to calculate feature importance and therefore identify the features that mainly contribute to the prediction. The top three features that were identified are log P, specific gravity, and molecular weight. For the prediction of the Kmg in dairy matrices, R2 scores of up to 0.99 were reached. For 37.0 °C, which resembles the temperature of the mouth, RFR delivered the best results, and, at lower temperatures of 7.0 °C, typical for a household fridge, XGB performed best. The results from the models work as a proof of concept and show the applicability of a data-driven approach with machine learning to predict the Kmg value of aroma compounds in different dairy matrices.