[Image above] Finding optimal glass compositions through trial-and-error experimental approaches takes a long time. Machine learning algorithms can accelerate the process. Credit: Dick Thompson, Flickr (CC BY-SA 2.0)
For glass scientists, the periodic table is their oyster—virtually all elements turn into a glass if quenched fast enough. Yet with so many options, finding “pearls” (optimal glass compositions) among the essentially limitless possibilities is extremely difficult when relying on trial-and-error experimental approaches.
To accelerate discovery of optimal glass compositions, scientists sometimes use physics-based modeling.
Physics-based models predict a range of promising compositions by using our physical and chemical understanding of glasses. Topological constraint theory and molecular dynamics simulations—two examples of physics-based models—effectively reduce the number of trial-and-error experiments needed to reach a worthwhile composition.
But these models face limits. Topological constraint theory, which predicts glass properties based on atomic structure, cannot easily predict certain properties (e.g., fracture toughness) due to the complex, disordered nature of glass. On the other hand, molecular dynamics simulations, which predict the motion of atoms in a glass, carry high computational costs even when simulating only a small number of atoms.
Besides physics-based modeling, an alternative way to predict glass compositions is data-driven modeling.
Data-driven models are based purely on analysis of existing data, with no understanding of the underlying physics governing a given glass behavior. Machine learning, an example of data-driven modeling, “learns by example” through identifying patterns in datasets.
Numerous recent studies use machine learning to predict glass composition. What these studies collectively teach us, and what they offer to future research, is the subject of a new open-access review paper by researchers at the University of California, Los Angeles.
Main points of the review—by assistant professor of civil and environmental engineering Mathieu Bauchy and his students Xinyi Xu, Kai Yang, Zipeng Fu, and Han Liu—are highlighted in the sections below.
Machine learning techniques: Regression, classification, and clustering
Machine learning algorithms tackle two types of tasks: supervised (the dataset includes both inputs and outputs) or unsupervised (the dataset includes only inputs). Supervised machine learning predicts outputs as a function of inputs, whereas unsupervised machine learning identifies clusters of data points that share similar characteristics.
Many supervised machine learning techniques are based on regression. Regression functionally relates inputs and outputs by fitting known data points in the dataset. Regression generally falls into two categories:
- Parametric regression, which yields an analytical formula expressing the output in terms of input variables.
- Nonparametric regression, which calculates the output for a given input based on correlation between that input position and its surrounding known points.
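The distinction can be sketched in a few lines of Python, using invented data points: parametric regression produces an explicit formula (here a least-squares line), while nonparametric regression answers each query directly from nearby known points (here a k-nearest-neighbors average).

```python
# Toy contrast between parametric and nonparametric regression.
# The (x, y) data points below are invented for illustration.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]  # roughly y = 2x + 1 with noise

# Parametric: ordinary least squares yields an analytical formula y = a*x + b.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

def parametric_predict(x):
    # the fitted formula itself is the model
    return a * x + b

# Nonparametric: k-nearest neighbors averages the outputs of the k known
# points closest to the query position; no analytical formula results.
def knn_predict(x, k=2):
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k
```

Both predictors interpolate the same data; the parametric one extrapolates via its formula, while the nonparametric one always falls back on its stored neighbors.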
Regression is used for problems with numerical (continuous) outputs. For problems with discrete (categorical) outputs—such as “glass is transparent or not transparent”—classification is used.
Examples of algorithms that can be used for both regression and classification problems in supervised machine learning include:
- Artificial neural network (ANN), a complex, nonlinear functional algorithm mapping the relationship between inputs and outputs;
- Support vector machine (SVM), an algorithm that finds a function dividing the data into different classes in classification problems;
- Decision tree, an algorithm that routes each input through parallel tree paths made of sequentially splitting nodes; and
- Boosting method, an algorithm whose output is based on the weighted average of all outputs from an ensemble of sequentially added weak learners/classifiers (e.g., decision tree, SVM, or other classifiers).
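As a rough sketch of the last two items in this list (with invented features, thresholds, and weights, not a trained model), a decision tree can be pictured as nested splitting nodes, and boosting as a weighted vote over weak learners:

```python
# Made-up classification task: decide "transparent" (+1) or "not" (-1)
# from two hypothetical input features x[0] and x[1].

def stump_a(x):
    # weak learner 1: a single splitting node on feature 0
    return 1 if x[0] > 0.5 else -1

def stump_b(x):
    # weak learner 2: a single splitting node on feature 1
    return 1 if x[1] > 0.3 else -1

def tree_classify(x):
    # decision tree: sequentially splitting nodes, written as nested if/else
    if x[0] > 0.5:
        return 1 if x[1] > 0.3 else -1
    return -1

# Boosting (simplified): a weighted vote over an ensemble of weak learners.
# In a real boosting method the weights are learned from each weak
# learner's error rate as learners are added sequentially.
weights = [0.7, 0.3]

def boosted_classify(x):
    score = weights[0] * stump_a(x) + weights[1] * stump_b(x)
    return 1 if score > 0 else -1
```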
In contrast, unsupervised machine learning commonly uses a clustering method within datasets. In this case, clusters are identified based on distances between data points within the input space—no examples of previously identified clusters are needed to train the model.
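A minimal k-means sketch (with invented 2-D data) shows the idea: clusters emerge purely from distances between points, with no labeled examples.

```python
# Minimal k-means clustering on made-up 2-D data points.
# No outputs/labels are used: only distances in the input space.

points = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
          (0.90, 0.80), (0.85, 0.90), (0.95, 0.85)]

def dist2(p, q):
    # squared Euclidean distance between two points
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest cluster center
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
            groups[i].append(p)
        # update step: each center moves to the mean of its group
        centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups

centers, groups = kmeans(points, centers=[(0.0, 0.0), (1.0, 1.0)])
```

With these starting centers, the six points split into the two visually obvious groups after a single iteration.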
Current applications and physics-informed models
The first use of machine learning in the context of glass science was likely by Brauer et al. in 2007, on the solubility of glasses in the system P2O5–CaO–MgO–Na2O–TiO2. Since then, most researchers have investigated developing composition-property regression models, specifically ANN algorithms. One particular ANN study highlighted in the review paper—on predicting glass transition temperatures—was the focus of a CTT post last November.
Despite the focus on machine learning in recent literature, Bauchy and his students note it is not the answer for all situations. “Although ‘blind machine learning’ and artificial neural network can offer reliable predictions, this approach requires the existence of a large amount of data—which is not always available,” they write in the paper.
One way to address this limitation is “physics-informed machine learning.”
Physics-informed machine learning combines physics-based and data-driven modeling in one model. In an open-access paper published last month, Bauchy and his students collaborated with researchers from Aalborg University in Denmark, Pacific Northwest National Laboratory, and CEA Marcoule in France to develop a physics-informed machine learning model that relies on three things:
- A simple, analytical model formulation that offers good interpretability;
- Linearizing the relationship between inputs and output based on physical and chemical understanding of the predicted property; and
- Identifying relevant reduced-dimensionality descriptors that capture the atomic structure of glass.
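As a loose illustration of this recipe (not the paper's actual model; the descriptor name and all numbers are invented), one can compute a single physics-motivated, reduced-dimensionality descriptor, assume the property is roughly linear in it, and fit a simple, fully interpretable line instead of training a black-box model on raw compositions:

```python
# Schematic physics-informed fit, with invented numbers:
# each glass is reduced to one hypothetical descriptor n_c
# ("constraints per atom"-style), and physical understanding is assumed
# to make the predicted property roughly linear in n_c.

# hypothetical training data: (descriptor n_c, measured property)
data = [(2.4, 10.2), (2.7, 12.8), (3.0, 15.1), (3.3, 17.9)]

n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
slope = sum((x - mx) * (y - my) for x, y in data) \
        / sum((x - mx) ** 2 for x, _ in data)
intercept = my - slope * mx

def predict(n_c):
    # the analytical formula itself is the model:
    # two parameters, easy to inspect and interpret
    return slope * n_c + intercept
```

The point of the sketch is the tradeoff the review highlights: the entire model is one line of algebra, so accuracy depends on how well the descriptor and the assumed linearity capture the physics.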
In the open-access paper, the researchers compared the physics-informed model to a blind machine learning algorithm. The review summarizes the impact of the open-access paper’s results: “Overall, this work suggests that embedding some physical knowledge within machine learning offers a promising route to overcome the tradeoff between accuracy, simplicity, and interpretability … which are otherwise often mutually exclusive in traditional, blind machine learning models.”
Future directions: MD simulations and collaboration
In general, extensive experimental datasets are not always available to conduct machine learning. So high-throughput molecular dynamics simulations—the physics-based modeling example mentioned earlier—offer a way to build large glass property datasets that can serve as a training set for machine learning algorithms. Yang et al. investigated this approach in an open-access paper published in Scientific Reports this June.
The review study researchers also emphasize the importance of collaboration among research groups focusing on experiments, theory, simulations, and data analytics to further the use of machine learning in glass science.
“[S]uccessful future applications of machine learning modeling are likely to require closed-loop integrated approaches, wherein (i) experimental or simulation data are used to train machine learning models, (ii) machine learning models are used to pinpoint promising glass compositions, (iii) experiments are conducted to validate these predictions or refine the data-driven models,” they write.
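That closed loop can be caricatured in a few lines of Python, with a made-up one-dimensional "composition" variable and a stand-in `measure` function playing the role of the experiment or simulation:

```python
# Schematic closed-loop workflow. Everything here is a stand-in:
# `measure` fakes an experiment/MD simulation, and the "model" is a
# crude nearest-neighbor lookup rather than a real machine learning model.
import random

def measure(x):
    # hypothetical ground-truth property, unknown to the model;
    # the best "composition" is at x = 0.6
    return -(x - 0.6) ** 2

def surrogate(x, data):
    # (i) stand-in for a model trained on the data: nearest-neighbor lookup
    return min(data, key=lambda p: abs(p[0] - x))[1]

random.seed(0)
data = [(x, measure(x)) for x in (0.1, 0.9)]  # initial small dataset

for _ in range(5):
    # (ii) the model pinpoints a promising composition among candidates
    candidates = [random.random() for _ in range(50)]
    best = max(candidates, key=lambda x: surrogate(x, data))
    # (iii) an "experiment" validates the prediction and refines the dataset
    data.append((best, measure(best)))

best_x, best_y = max(data, key=lambda p: p[1])
```

Each pass through the loop grows the training set with a validated measurement, which is exactly the train / pinpoint / validate cycle the authors describe.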
Bauchy says in an email that his team plans to continue blurring the boundary between physics-based and data-driven models. “Our rationale is that we are living in the Information Age: a perfect model should ideally leverage all the available information, that is, it should build both on existing experimental data and on our understanding of the physics of glass,” he says. “We are presently working on applying this paradigm to predict the mechanical properties of glasses.”
The open-access paper, published in Journal of Non-Crystalline Solids: X, is “Machine learning for glass science and engineering: A review” (DOI: 10.1016/j.nocx.2019.100036).