Edible oils are widely used in home cooking and industrial food manufacturing worldwide but are susceptible to adulteration. Raman spectroscopy has been introduced as a rapid method for detecting oil adulteration. However, interpreting the spectra is labor-intensive and time-consuming because the data analysis is complicated. Machine learning (ML) has shown great advantages in data analysis and has brought about breakthroughs in processing spectra and images. Hence, our central hypothesis is that integrating ML into Raman spectroscopy will significantly increase the accuracy of data analysis and therefore its power to detect the type and adulteration of edible oils.
Fifteen common edible oils from 8 plant sources were purchased from supermarkets for the classification study. Avocado oil adulterated with canola oil at different levels, and olive oil adulterated with soybean oil, were prepared for the adulteration study. Raman spectra were collected from all the oils. Half of the spectra (total 357) were then used for training, while the rest were used to test accuracy. Ten machine learning approaches, including Logistic Regression, L2 Penalty, Elastic Net Penalty, Random Forest, Boosting, and 2D-CNN, were implemented to classify these spectra.
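The half-for-training, half-for-testing evaluation described above can be illustrated with a minimal sketch. It rests on several assumptions that are not from the study: the spectra are synthetic toy data (one Gaussian peak per class plus noise), the split is stratified by class, and a simple nearest-centroid classifier stands in for the ten methods actually compared. All function names are hypothetical and chosen only for illustration.

```python
import math
import random
import time
from collections import defaultdict


def make_synthetic_spectra(n_classes=3, per_class=20, n_points=50, seed=0):
    """Generate toy 'spectra' (NOT real Raman data): one peak per class plus noise."""
    rng = random.Random(seed)
    spectra, labels = [], []
    for c in range(n_classes):
        center = (c + 1) * n_points / (n_classes + 1)
        for _ in range(per_class):
            s = [
                math.exp(-((i - center) ** 2) / 18.0) + rng.gauss(0, 0.05)
                for i in range(n_points)
            ]
            spectra.append(s)
            labels.append(c)
    return spectra, labels


def stratified_half_split(labels, seed=0):
    """Return (train_idx, test_idx) with half of each class in each set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        half = len(idx) // 2
        train.extend(idx[:half])
        test.extend(idx[half:])
    return train, test


def fit_centroids(X, y):
    """Stand-in classifier (assumption): store the mean spectrum of each class."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [a + b for a, b in zip(sums[label], x)]
        counts[label] += 1
    return {k: [v / counts[k] for v in s] for k, s in sums.items()}


def predict(centroids, x):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])),
    )


def evaluate(spectra, labels, seed=0):
    """Train on one half, test on the other; return (accuracy, elapsed seconds)."""
    train_idx, test_idx = stratified_half_split(labels, seed)
    start = time.perf_counter()
    model = fit_centroids(
        [spectra[i] for i in train_idx], [labels[i] for i in train_idx]
    )
    preds = [predict(model, spectra[i]) for i in test_idx]
    elapsed = time.perf_counter() - start
    correct = sum(p == labels[i] for p, i in zip(preds, test_idx))
    return correct / len(test_idx), elapsed


spectra, labels = make_synthetic_spectra()
accuracy, seconds = evaluate(spectra, labels)
print(f"test accuracy: {accuracy:.3f}, time: {seconds:.4f} s")
```

The stand-in classifier is deliberately simple; replacing fit_centroids and predict with any of the methods named above would leave the split, accuracy, and timing logic unchanged.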
The Random Forest method achieved the highest test accuracy in the shortest time, both in classifying the various pure edible oils (98.1% in 0.8 s) and in classifying oil adulteration (94.6% in 0.7 s), followed by Boosting and L2 Penalty. This study demonstrated that machine-learning-driven Raman spectroscopy significantly increased the accuracy and speed of detecting the type and adulteration of edible oils.