MLflow Implementation

Overview

This document provides a comprehensive analysis and implementation guide for MLflow Tracking in the PTIIKInsight Topic Modeling project. The report compares various machine learning experiment tracking tools and demonstrates the practical implementation of MLflow for tracking parameters, metrics, and artifacts in our BERTopic-based topic modeling experiments.


1. Objectives

The primary goal of this implementation is to survey and compare several machine learning experiment tracking tools, and then to implement MLflow Tracking in the PTIIKInsight Topic Modeling training process to record parameters, metrics, and artifacts from our experiments.

2. Survey and Observation Results

2.1 Neptune.ai

Neptune.ai is an experiment tracking tool focused on collaboration and scalability.

Advantages:

  • Capable of recording various types of experiment metadata, including source code, Jupyter notebook snapshots, and Git information

  • Easy-to-navigate and customizable UI, allowing users to compare over 100,000 runs with millions of data points

  • Forking feature allows continuing runs from saved checkpoints and creating new runs from saved steps

  • Seamless integration with popular ML frameworks

Disadvantages:

  • Focuses on experiment tracking and model management; other parts of the ML lifecycle require your own tooling and infrastructure

  • Free tier limited to one project and up to 3 users

2.2 Weights & Biases (W&B)

W&B is an MLOps platform providing experiment tracking, dataset versioning, and model management.

Advantages:

  • Supports tracking various experiment metadata and enables interactive experiment comparison

  • Provides interactive dashboards and reports for team collaboration

  • Built-in support for hyperparameter search and model optimization with W&B Sweeps

  • Integration with various ML/DL frameworks and cloud platforms

Disadvantages:

  • User management and administration can become complex

  • Collaboration features require paid tiers

2.3 Comet ML

Comet is a cloud-based MLOps platform that helps data scientists track experiments, manage model lifecycle, and collaborate on ML projects.

Advantages:

  • Provides clean UI for visualization and analysis of experiment metadata

  • Tracking client captures comprehensive information about experiment run environments

  • Includes specialized components for working with LLMs

Disadvantages:

  • Team collaboration only available in paid packages

  • UI tends to become slow for large numbers of experiments

  • Experiment tracking functionality is difficult to use on its own because it is deeply integrated with the rest of the platform

2.4 Aim

Aim is an open-source experiment tracking tool offering extensive dashboards and plots with multiple run comparisons.

Advantages:

  • Can be run directly from Jupyter notebooks

  • Integration with spaCy and support for most deep learning frameworks

  • Attractive UI that can also be used with an MLflow tracking server as the backend

Disadvantages:

  • Aim's future is unclear after the company behind it announced AimOS as its successor

  • Does not support scikit-learn

  • No managed offering, and self-hosting requires significant effort

2.5 MLflow

MLflow is an open-source platform that helps manage the entire machine learning lifecycle, including experiments, model storage, reproducibility, and deployment.

Advantages:

  • Focuses on the entire machine learning process lifecycle

  • Strong and large user community providing community support

  • Open interface that can be integrated with any ML library or language

Disadvantages:

  • Must be self-hosted (although managed offerings exist, for example from Databricks), which involves significant operational overhead

  • Security and compliance measures must be implemented independently

  • Lacks user and group management and collaborative features

2.6 DVC Experiments and DVC Studio

DVC Experiments is the experiment tracking component of the open-source Data Version Control (DVC) tool family.

Advantages:

  • Git-based approach well suited to teams with strong software engineering backgrounds

  • DVC Studio provides detailed team management and permissions

Disadvantages:

  • For users unfamiliar with Git or version control, navigating experiments and tracked metadata can be challenging

  • Compared to dedicated experiment trackers, visualization and experiment comparison features are limited

2.7 TensorBoard

TensorBoard is an open-source visualization tool integrated with TensorFlow, often the first choice for TensorFlow users.

Advantages:

  • Well-developed features for working with images, text, and embeddings

  • Tight and deep integration with the latest TensorFlow versions

  • Can run in any Python environment with TensorFlow installed without requiring database setup or additional libraries

Disadvantages:

  • Must be self-hosted

  • No collaboration features, access control, user management, or centralized data storage

  • Not suitable for frameworks other than TensorFlow

3. MLflow Tracking Implementation

3.1 Implementation Steps

Step 1: Installation

Ensure the MLflow library is installed. If not, use pip:
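
```bash
pip install mlflow
```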

Step 2: Create MLflow Experiment

Create an MLflow experiment in the train.py file. One experiment can contain many runs; in our case, each run uses a different embedding model, so we create a single experiment to compare BERTopic embedding models using the coherence score metric. Use mlflow.set_experiment to set the experiment name under which all runs will be recorded.
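
A minimal sketch of this step is shown below; the experiment name is illustrative rather than the exact one used in the project:

```python
import mlflow

# Group all subsequent runs under one experiment so that embedding models
# can be compared side by side in the MLflow UI.
mlflow.set_experiment("bertopic-embedding-comparison")
```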

Step 3: Start Experiment Run

Use mlflow.start_run to start a single experiment run. Because we also perform a grid search over embedding models, one run is executed for each embedding model we want to compare. mlflow.log_param records the parameters used in each run, in this case the embedding model and min_topic_size (left at its default value).
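
A sketch of the run loop; the embedding model names are illustrative placeholders, and min_topic_size is logged at BERTopic's default of 10:

```python
embedding_models = ["all-MiniLM-L6-v2", "all-mpnet-base-v2"]  # illustrative grid

for model_name in embedding_models:
    # One MLflow run per embedding model in the grid search
    with mlflow.start_run(run_name=model_name):
        mlflow.log_param("embedding_model", model_name)
        mlflow.log_param("min_topic_size", 10)  # BERTopic default
        # ... training and evaluation for this run go here ...
```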

Step 4: Log Metrics

mlflow.log_metric is used to record metric values, in this case the coherence score.
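
For example, assuming coherence_score has already been computed for the current run:

```python
# Record the coherence score so runs can be ranked and compared in the UI
mlflow.log_metric("coherence_score", coherence_score)
```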

Step 5: Log Artifacts

mlflow.log_artifact is used to attach files to the MLflow run; in this case we save the model in pickle format and the CSV files containing the topic modeling results.
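
A sketch, assuming topic_model and its topic table exist in scope; mlflow.log_artifact uploads an already-written local file, so the files are saved to disk first:

```python
import pickle

# Persist the trained BERTopic model and the topic table locally ...
with open("bertopic_model.pkl", "wb") as f:
    pickle.dump(topic_model, f)
topic_model.get_topic_info().to_csv("topic_info.csv", index=False)

# ... then attach both files to the active MLflow run
mlflow.log_artifact("bertopic_model.pkl")
mlflow.log_artifact("topic_info.csv")
```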

Step 6: Execute Training

Run the training script. If the mlruns directory and the experiment have not been created yet, MLflow creates them automatically.
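
Assuming the training entry point is train.py, as described above:

```bash
python train.py
```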

Step 7: Launch MLflow UI

Once the training (experiment) process is complete, run mlflow ui and access the tracking results in the browser.

Step 8: View Experiment Results

Experiment results can be viewed in the browser.

Step 9: Compare Runs

Compare the runs using the Compare feature provided by the MLflow UI.

Step 10: Model Evaluation

From the experiment results, we can evaluate the models we tested.

3.2 Code Implementation

Here's an example of how MLflow tracking is implemented in our PTIIKInsight project:
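
The snippet below is a minimal, self-contained sketch of how the steps above fit together in a train.py-style script. The document list, embedding model names, and compute_coherence helper are illustrative placeholders, not the project's actual code:

```python
import pickle

import mlflow
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer

# Illustrative placeholders: in the real project these come from the
# PTIIKInsight data pipeline and evaluation utilities.
documents: list[str] = []  # preprocessed documents
embedding_models = ["all-MiniLM-L6-v2", "all-mpnet-base-v2"]

def compute_coherence(topic_model: BERTopic, docs: list[str]) -> float:
    """Placeholder for the project's coherence-score computation
    (e.g., a c_v coherence calculated with gensim)."""
    return 0.0

mlflow.set_experiment("bertopic-embedding-comparison")

for model_name in embedding_models:
    with mlflow.start_run(run_name=model_name):
        # Parameters
        mlflow.log_param("embedding_model", model_name)
        mlflow.log_param("min_topic_size", 10)  # BERTopic default

        # Train BERTopic with the selected embedding model
        embedding_model = SentenceTransformer(model_name)
        topic_model = BERTopic(embedding_model=embedding_model, min_topic_size=10)
        topics, probs = topic_model.fit_transform(documents)

        # Metrics
        mlflow.log_metric("coherence_score", compute_coherence(topic_model, documents))

        # Artifacts: pickled model and topic-modeling results as CSV
        with open("bertopic_model.pkl", "wb") as f:
            pickle.dump(topic_model, f)
        topic_model.get_topic_info().to_csv("topic_info.csv", index=False)
        mlflow.log_artifact("bertopic_model.pkl")
        mlflow.log_artifact("topic_info.csv")
```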

3.3 Running MLflow UI

After training is complete, launch the MLflow UI to view and compare results:
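
```bash
# Serves the tracking UI on http://localhost:5000 by default
mlflow ui
```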

Then access the interface at http://localhost:5000 to:

  • View all experiment runs

  • Compare different embedding models

  • Analyze coherence scores and other metrics

  • Download saved models and artifacts

3.4 Benefits of MLflow Implementation

  1. Experiment Reproducibility: All parameters and configurations are tracked

  2. Model Comparison: Easy comparison between different embedding models

  3. Artifact Management: Centralized storage of models and results

  4. Collaboration: Team members can access and review experiment results

  5. Model Versioning: Different model versions are automatically tracked

4. Results and Analysis

The MLflow implementation allows us to:

  • Track and compare the performance of different embedding models

  • Maintain a history of all experimental runs

  • Easily reproduce successful experiments

  • Make data-driven decisions about model selection

  • Share results with team members and stakeholders

Implementation commits can be found at:

Conclusion

MLflow proves to be an excellent choice for experiment tracking in the PTIIKInsight project due to its open-source nature, comprehensive tracking capabilities, and ease of integration with our existing Python-based machine learning pipeline. The implementation enables better experiment management, model comparison, and team collaboration in our topic modeling research.
