MLflow Implementation
Overview
This document provides a comprehensive analysis and implementation guide for MLflow Tracking in the PTIIKInsight Topic Modeling project. The report compares various machine learning experiment tracking tools and demonstrates the practical implementation of MLflow for tracking parameters, metrics, and artifacts in our BERTopic-based topic modeling experiments.
1. Objectives
The primary goal of this work is to survey and compare several machine learning experiment tracking tools, and then to implement MLflow Tracking in the PTIIKInsight Topic Modeling training process to record the parameters, metrics, and artifacts of our experiments.
2. Survey and Observation Results
2.1 Neptune.ai
Neptune.ai is an experiment tracking tool focused on collaboration and scalability.
Advantages:
Capable of recording various types of experiment metadata, including source code, Jupyter notebook snapshots, and Git information
Easy-to-navigate and customizable UI, allowing users to compare over 100,000 runs with millions of data points
Forking feature allows continuing runs from saved checkpoints and creating new runs from saved steps
Seamless integration with popular ML frameworks
Disadvantages:
Focuses on experiment tracking and model management; requires managing your own infrastructure
Free tier limited to one project and up to 3 users
2.2 Weights & Biases (W&B)
W&B is an MLOps platform providing experiment tracking, dataset versioning, and model management.
Advantages:
Supports tracking various experiment metadata and enables interactive experiment comparison
Provides interactive dashboards and reports for team collaboration
Built-in support for hyperparameter search and model optimization with W&B Sweeps
Integration with various ML/DL frameworks and cloud platforms
Disadvantages:
User management and administration can become complex
Collaboration features require paid tiers
2.3 Comet ML
Comet is a cloud-based MLOps platform that helps data scientists track experiments, manage model lifecycle, and collaborate on ML projects.
Advantages:
Provides clean UI for visualization and analysis of experiment metadata
Tracking client captures comprehensive information about experiment run environments
Includes specialized components for working with LLMs
Disadvantages:
Team collaboration only available in paid packages
UI tends to become slow for large numbers of experiments
Experiment tracking functionality difficult to use independently due to deep platform integration
2.4 Aim
Aim is an open-source experiment tracking tool offering extensive dashboards and plots with multiple run comparisons.
Advantages:
Can be run directly from Jupyter notebooks
Integration with spaCy and support for most deep learning frameworks
Attractive UI that can also be used with an MLflow tracking server as the backend
Disadvantages:
Aim's future is unclear after the company behind it announced AimOS as its successor
Does not support scikit-learn
No managed offering, and self-hosting requires significant effort
2.5 MLflow
MLflow is an open-source platform that helps manage the entire machine learning lifecycle, including experiments, model storage, reproducibility, and deployment.
Advantages:
Focuses on the entire machine learning process lifecycle
Strong and large user community providing community support
Open interface that can be integrated with any ML library or language
Disadvantages:
Must be self-hosted (although managed offerings exist, such as Databricks-managed MLflow), which involves significant operational overhead
Security and compliance measures must be implemented independently
Lacks user and group management and collaborative features
2.6 DVC Experiments and DVC Studio
DVC Experiments is the experiment tracking component of the open-source Data Version Control (DVC) tool family.
Advantages:
Git-based approach well suited to teams with strong software engineering backgrounds
DVC Studio provides detailed team management and permissions
Disadvantages:
For users unfamiliar with Git or version control, navigating experiments and tracked metadata can be challenging
Compared to dedicated experiment trackers, visualization and experiment comparison features are limited
2.7 TensorBoard
TensorBoard is an open-source visualization tool integrated with TensorFlow, often the first choice for TensorFlow users.
Advantages:
Well-developed features for working with images, text, and embeddings
Tight and deep integration with the latest TensorFlow versions
Can run in any Python environment with TensorFlow installed without requiring database setup or additional libraries
Disadvantages:
Must be self-hosted
No collaboration features, access control, user management, or centralized data storage
Not suitable for frameworks other than TensorFlow
3. MLflow Tracking Implementation
3.1 Implementation Steps
Step 1: Installation
Ensure the MLflow library is installed. If not, use pip:
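```bash
pip install mlflow
```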
Step 2: Create MLflow Experiment
Create an MLflow experiment in the train.py file. One experiment can contain many runs; in our case, each run uses a different embedding model, so we create a single experiment to compare BERTopic embedding models by their coherence scores. Use mlflow.set_experiment to set the experiment name under which all runs will be recorded, as sketched below.
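A minimal sketch of this step (the experiment name shown here is illustrative, not necessarily the one used in train.py):

```python
import mlflow

# All subsequent runs will be recorded under this experiment
mlflow.set_experiment("bertopic-embedding-comparison")
```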
Step 3: Start Experiment Run
Use mlflow.start_run to start a single experiment run. Because we also perform a grid search, one run is executed for each embedding model we want to compare. mlflow.log_param records the parameters used, in this case the embedding model type and min_topic_size (left at its default value).
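A minimal sketch of this step; model_name is a placeholder for the embedding model chosen in the current grid search iteration:

```python
with mlflow.start_run(run_name=model_name):
    # Record the parameters used for this run
    mlflow.log_param("embedding_model", model_name)
    mlflow.log_param("min_topic_size", 10)  # BERTopic's default value
```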
Step 4: Log Metrics
mlflow.log_metric is used to record metric values, in this case the coherence score.
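Continuing inside the run opened in Step 3 (compute_coherence stands in for the project's own coherence evaluation helper; it is not an MLflow or BERTopic function):

```python
coherence_score = compute_coherence(topic_model, docs)  # hypothetical helper
mlflow.log_metric("coherence_score", coherence_score)
```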
Step 5: Log Artifacts
mlflow.log_artifact is used to attach files to the MLflow run; in this case we save the model in pickle format and the CSV files containing the topic modeling results.
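Still inside the same run, the saved files are then attached as artifacts (the file names here are illustrative):

```python
mlflow.log_artifact("bertopic_model.pkl")   # pickled BERTopic model
mlflow.log_artifact("topic_results.csv")    # topic modeling results table
```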
Step 6: Execute Training
Run the training. If the mlruns directory and the experiment do not exist yet, MLflow creates them automatically.
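Assuming the training script is train.py, the training is started from the project root:

```bash
python train.py
```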
Step 7: Launch MLflow UI
Once the training (experiment) process is complete, run mlflow ui and open the tracking results in the browser.
Step 8: View Experiment Results
Experiment results can be viewed in the browser.
Step 9: Compare Runs
Perform comparison of each run using the compare feature provided by MLflow.
Step 10: Model Evaluation
From the experiment results, we can evaluate the models we tested.
3.2 Code Implementation
Here's an example of how MLflow tracking is implemented in our PTIIKInsight project:
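The snippet below is a minimal sketch of that flow rather than the project's verbatim code; the document list docs, the candidate model names, and the compute_coherence helper stand in for PTIIKInsight's own data loading and evaluation logic.

```python
import pickle

import mlflow
from bertopic import BERTopic

# Candidate sentence-transformers embedding models (illustrative names)
EMBEDDING_MODELS = [
    "all-MiniLM-L6-v2",
    "paraphrase-multilingual-MiniLM-L12-v2",
]

mlflow.set_experiment("bertopic-embedding-comparison")

for model_name in EMBEDDING_MODELS:
    with mlflow.start_run(run_name=model_name):
        # Log the configuration of this run
        mlflow.log_param("embedding_model", model_name)
        mlflow.log_param("min_topic_size", 10)  # BERTopic default

        # Train BERTopic with the selected embedding model
        topic_model = BERTopic(embedding_model=model_name, min_topic_size=10)
        topics, probs = topic_model.fit_transform(docs)  # docs: preprocessed documents

        # Evaluate and log the coherence score
        # (compute_coherence is a project-specific helper, not an MLflow/BERTopic API)
        mlflow.log_metric("coherence_score", compute_coherence(topic_model, docs))

        # Persist the model and topic table, then attach them as run artifacts
        with open("bertopic_model.pkl", "wb") as f:
            pickle.dump(topic_model, f)
        topic_model.get_topic_info().to_csv("topic_info.csv", index=False)
        mlflow.log_artifact("bertopic_model.pkl")
        mlflow.log_artifact("topic_info.csv")
```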
3.3 Running MLflow UI
After training is complete, launch the MLflow UI to view and compare results:
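```bash
mlflow ui  # serves the tracking UI at http://localhost:5000 by default
```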
Then access the interface at http://localhost:5000 to:
View all experiment runs
Compare different embedding models
Analyze coherence scores and other metrics
Download saved models and artifacts
3.4 Benefits of MLflow Implementation
Experiment Reproducibility: All parameters and configurations are tracked
Model Comparison: Easy comparison between different embedding models
Artifact Management: Centralized storage of models and results
Collaboration: Team members can access and review experiment results
Model Versioning: Different model versions are automatically tracked
4. Results and Analysis
The MLflow implementation allows us to:
Track and compare the performance of different embedding models
Maintain a history of all experimental runs
Easily reproduce successful experiments
Make data-driven decisions about model selection
Share results with team members and stakeholders
5. Repository Links
Implementation commits can be found at:
Conclusion
MLflow proves to be an excellent choice for experiment tracking in the PTIIKInsight project due to its open-source nature, comprehensive tracking capabilities, and ease of integration with our existing Python-based machine learning pipeline. The implementation enables better experiment management, model comparison, and team collaboration in our topic modeling research.