Materials alternative recommender using machine learning based on COSMO-SAC

Document Type : Research Article

Authors

1 Al-Zahrawi University College, Karbala, Iraq

2 Department of Pharmacy, Al-Noor University College, Nineveh, Iraq

3 College of Dentistry, National University of Science and Technology, Dhi Qar, 64001, Iraq

4 Medical Technical College, Al-Farahidi University, Iraq

5 Physics Department, College of Science, University of Halabja, 46018, Halabja, Iraq

Abstract
Finding alternative materials and solvents in a chemistry laboratory or during process design can be time-consuming. The activity coefficient is one of the most important thermodynamic properties that can be used for this purpose. COSMO-SAC modeling is a reliable method for determining the activity coefficients of mixtures, and it is used in the present study to find alternatives to organic materials. A dataset of the activity coefficients of 96 organic molecules in mixtures with different solvents (water, ethanol, methanol, toluene, and benzene) was obtained over the full composition range with COSMO-SAC. The resulting database was merged with the FreeSolv dataset to broaden the range of properties and enrich the dataset for machine learning training. Unsupervised machine learning (clustering), including centroid-based and density-based methods, was used to recommend the best alternatives for the 96 organic materials studied. Suitable pre-processing was used to determine the optimum parameters of the clustering methods: the elbow method for centroid-based clustering and k-nearest neighbors for density-based clustering. The centroid-based methods recommend different sets of materials depending on the number of clusters, and sort the alternatives by how close their properties are. The density-based method, in contrast, works with an optimum distance and an optimum number of k-nearest neighbors, which were 0.08 and 7, respectively, for the created dataset. Its results are exclusive and show that clustering can separate the materials by chemical family, giving 5 clusters and 12 outliers. The outliers are important because the trained dataset offers no alternatives for them, so they should be considered unique materials. With COSMO-SAC data, density-based clustering gave the more promising results for recommending alternative organic materials.
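
The density-based step described above can be illustrated with a minimal sketch. It is not the authors' code: the two-dimensional points are invented toy data standing in for scaled COSMO-SAC and FreeSolv features, the helper names (k_distances, dbscan, alternatives) are hypothetical, and mapping the reported values 0.08 and 7 onto a neighborhood radius and a minimum neighbor count is an assumption made for illustration.

```python
import math

def k_distances(points, k):
    """Sorted distance from every point to its k-th nearest neighbour.
    A sharp rise in this curve is a common heuristic for choosing eps."""
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1])
    return sorted(out)

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one label per point; -1 marks an outlier.
    min_samples counts the point itself (an assumed convention)."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional outlier; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_samples:  # j is a core point: keep expanding
                queue.extend(reach)
    return labels

def alternatives(points, labels, i):
    """Indices in the same cluster as point i, nearest first.
    Outliers get an empty list: no alternative is recommended."""
    if labels[i] == -1:
        return []
    same = [j for j in range(len(points))
            if j != i and labels[j] == labels[i]]
    return sorted(same, key=lambda j: math.dist(points[i], points[j]))

# Toy data (invented): two tight groups and one isolated point.
points = ([(0.20 + 0.01 * i, 0.20) for i in range(8)]
          + [(0.80 + 0.01 * i, 0.80) for i in range(8)]
          + [(0.50, 0.05)])

labels = dbscan(points, eps=0.08, min_samples=7)
print(labels)
print(alternatives(points, labels, 0))
print(k_distances(points, 7)[-1])
```

In this toy run the isolated point is labeled an outlier and receives no recommendation, which mirrors how the abstract treats the 12 outliers as unique materials.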


Volume 7, Issue 2 - Serial Number 2
March and April 2024
Pages 346-358

  • Receive Date 05 March 2024
  • Revise Date 29 March 2024
  • Accept Date 02 April 2024