Use of TC-Python in Workflows for Machine Learning
Automate data processing workflows and integrate predictive capabilities into broader ML frameworks
Using Python™ as a ‘binder’ language coupled with the TC-Python API, Thermo-Calc offers a fully customizable foundation for developing workflows that automate data processing (such as cleansing or generating data) and integrate predictive capabilities into broader ML frameworks.
Here are some ways TC-Python and ML can work together:
Automating High-Throughput Calculations
TC-Python enables automated, high-throughput calculations across many compositions and conditions. This automation is essential for generating diverse datasets and covering broad compositional and processing spaces. For instance, scripting with TC-Python allows researchers to simulate phase stability efficiently, saving data automatically for ML training.
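As a rough illustration of the high-throughput pattern described above, the following sketch sweeps a small grid of temperatures and compositions and collects one record per condition. It makes several assumptions: `calculate_phase_fractions` is a hypothetical placeholder that returns made-up numbers so the example runs without a Thermo-Calc installation, and the grid values and column names are chosen only for illustration. In a real workflow the placeholder body would be replaced by an actual TC-Python equilibrium calculation.

```python
import itertools


def calculate_phase_fractions(temperature_k, carbon_wt_pct, chromium_wt_pct):
    """Hypothetical stand-in for a TC-Python equilibrium calculation.

    The formula below is a toy expression, NOT thermodynamics. It only
    exists so that the loop has something to call.
    """
    fcc = (temperature_k - 900.0) / 400.0 - 0.01 * chromium_wt_pct + 0.1 * carbon_wt_pct
    fcc = max(0.0, min(1.0, fcc))  # clamp to a valid fraction
    return {"FCC_A1": fcc, "BCC_A2": 1.0 - fcc}


# Illustrative grid of conditions (assumed values).
temperatures_k = [1000.0, 1100.0, 1200.0]
carbon_wt_pct = [0.1, 0.2]
chromium_wt_pct = [0.0, 5.0]

# One record per grid point: inputs (features) plus outputs (targets).
records = []
for t, c, cr in itertools.product(temperatures_k, carbon_wt_pct, chromium_wt_pct):
    fractions = calculate_phase_fractions(t, c, cr)
    records.append({"T_K": t, "C_wt_pct": c, "Cr_wt_pct": cr, **fractions})

print(f"Collected {len(records)} records")
```

Keeping each result as a flat dictionary of inputs and outputs makes it simple to hand the list to a CSV writer or a DataFrame constructor later.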
Formatting Data
Typical formats for storing training data for ML include structured formats like CSV, Excel spreadsheets, and HDF5 (Hierarchical Data Format). For larger datasets, particularly when handling multidimensional arrays or large-scale simulations, Parquet or SQL databases may also be used. These formats are chosen for their compatibility with ML libraries such as TensorFlow or PyTorch and ease of access in data processing tools like Pandas.
Results from TC-Python can be exported directly to CSV or Excel for straightforward integration with ML algorithms. For complex datasets or larger-scale projects, the data can instead be stored in HDF5 or SQL databases, which preserves data structure and supports fast reads and writes. Additionally, TC-Python’s integration with Python libraries like Pandas allows direct manipulation of data, enabling preprocessing (e.g., feature extraction, scaling) before final storage. This versatility makes TC-Python a powerful tool for generating, structuring, and exporting high-quality datasets tailored for ML applications in materials science.
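The preprocessing and storage steps can be sketched as follows. To stay dependency-free, this example uses Python's standard `csv` and `sqlite3` modules rather than Pandas or HDF5, so it is a substitute for those tools and not a description of how TC-Python itself writes data. The records, the table name `results`, and the column names are illustrative assumptions, not calculated values.

```python
import csv
import io
import sqlite3

# Illustrative records (assumed values, not real calculation output).
records = [
    {"T_K": 1000.0, "C_wt_pct": 0.1, "fcc_fraction": 0.26},
    {"T_K": 1100.0, "C_wt_pct": 0.1, "fcc_fraction": 0.51},
    {"T_K": 1200.0, "C_wt_pct": 0.1, "fcc_fraction": 0.76},
]


def min_max_scale(values):
    """Scale values linearly to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


# Preprocessing: add a scaled temperature feature to each record.
scaled = min_max_scale([r["T_K"] for r in records])
for record, value in zip(records, scaled):
    record["T_scaled"] = value

# CSV export (to an in-memory buffer here; a file path works the same way).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

# SQL storage, using an in-memory SQLite database for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (T_K REAL, C_wt_pct REAL, fcc_fraction REAL, T_scaled REAL)"
)
conn.executemany(
    "INSERT INTO results VALUES (:T_K, :C_wt_pct, :fcc_fraction, :T_scaled)",
    records,
)
conn.commit()
rows = conn.execute("SELECT T_K, T_scaled FROM results ORDER BY T_K").fetchall()
conn.close()
```

With Pandas available, the same records could be loaded into a DataFrame and written out with its own export methods; the flat one-row-per-condition layout stays the same.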
Integrating into ICME Frameworks
In ICME frameworks, TC-Python can also link Thermo-Calc with ML and other simulation tools. This enables end-to-end simulations from processing to properties and improves understanding of material behavior.