Docker Architecture

PTIIKInsight uses Docker containers to provide a consistent deployment environment. The architecture consists of three main services that work together to provide topic modeling capabilities and monitoring.

Overview

The Docker setup includes:

  • FastAPI Application: Core API service for topic modeling

  • Prometheus: Metrics collection and monitoring

  • Grafana: Visualization and dashboards

Architecture Diagram

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   FastAPI App   │    │   Prometheus    │    │    Grafana      │
│   Port: 8000    │    │   Port: 9090    │    │   Port: 3000    │
│                 │◄───┤                 │◄───┤                 │
│ - Topic Predict │    │ - Metrics Store │    │ - Dashboards    │
│ - Data Scraping │    │ - Alerting      │    │ - Visualization │
│ - Model APIs    │    │ - Time Series   │    │ - Monitoring    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                    ┌─────────────────┐
                    │   Host System   │
                    │                 │
                    │ - Volume Mounts │
                    │ - Data Storage  │
                    │ - Model Files   │
                    └─────────────────┘

Service Details

FastAPI Application (fastapi-app)

Container: fastapi-scraper
Port: 8000
Image: Built from local Dockerfile

Key Features:

  • Topic prediction using BERTopic model

  • Web scraping for research papers

  • Data preprocessing and cleaning

  • Prometheus metrics integration

  • RESTful API endpoints

Volume Mounts:

  • ./data:/app/data - Data storage

  • ./model:/app/model - ML model files

  • ./preprocessing:/app/preprocessing - Preprocessing scripts

  • ./api:/app/api - API source code

Health Check: Available at /health endpoint

Prometheus (prometheus)

Container: prometheus
Port: 9090
Image: prom/prometheus:latest

Key Features:

  • Scrapes metrics from FastAPI application

  • Stores time-series monitoring data

  • Provides alerting capabilities

  • Web-based query interface

Volume Mounts:

  • ./monitoring/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml - Configuration

  • ./monitoring/prometheus/rules:/etc/prometheus/rules - Alert rules

Configuration:

  • Scrapes FastAPI metrics from /metrics endpoint

  • Stores data in internal time-series database

  • Supports custom alert rules
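
The configuration described above can be sketched as a small generator for prometheus.yml. This is only an illustration, not the project's actual file: the job name "fastapi" and the target host "fastapi-app:8000" are assumptions based on the service name and port listed on this page, while the 15-second interval, the /metrics path, and the rules directory come from the text.

```python
# Sketch: render a minimal prometheus.yml for the setup described on this page.
# Assumptions: the job name and the target host are illustrative guesses.


def render_prometheus_config(target: str = "fastapi-app:8000",
                             interval: str = "15s") -> str:
    """Return a minimal Prometheus configuration as YAML text."""
    lines = [
        "global:",
        f"  scrape_interval: {interval}",
        "",
        "rule_files:",
        "  - /etc/prometheus/rules/*.yml",   # mounted alert rules directory
        "",
        "scrape_configs:",
        "  - job_name: fastapi",             # hypothetical job name
        "    metrics_path: /metrics",
        "    static_configs:",
        f"      - targets: ['{target}']",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_prometheus_config(), end="")
```

Running the script prints a file that could be compared against ./monitoring/prometheus/prometheus.yml when debugging a missing scrape target.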

Grafana (grafana)

Container: grafana
Port: 3000
Image: grafana/grafana:latest

Key Features:

  • Visual dashboards for monitoring

  • Connects to Prometheus as data source

  • Pre-configured dashboards

  • Dark theme by default

Volume Mounts:

  • ./monitoring/grafana/provisioning:/etc/grafana/provisioning - Data sources config

  • ./monitoring/grafana/dashboards:/var/lib/grafana/dashboards - Dashboard definitions

Default Credentials:

  • Username: admin

  • Password: admin

Configuration:

  • Automatic dashboard provisioning

  • Prometheus data source pre-configured

  • Pie chart plugin installed

Network Architecture

All services run on the default network that Docker Compose creates for the project, where each container can reach the others by service name:

  • FastAPI → Prometheus: FastAPI exposes metrics at /metrics endpoint

  • Prometheus → FastAPI: Prometheus scrapes metrics every 15 seconds

  • Grafana → Prometheus: Grafana queries Prometheus for dashboard data

  • Host → Services: All services accessible from host via mapped ports
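
To make the two addressing schemes concrete, the sketch below builds the URL a container would use for another service and the URL used from the host. It assumes that the names in parentheses in the Service Details headings are the Compose service names and that each host port matches the container port; neither is confirmed by the compose file itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str  # assumed Compose service name, resolvable between containers
    port: int  # port listed on this page (assumed identical inside and outside)

    def internal_url(self, path: str = "") -> str:
        """URL used by other containers on the same Docker network."""
        return f"http://{self.name}:{self.port}{path}"

    def host_url(self, path: str = "") -> str:
        """URL used from the host through the mapped port."""
        return f"http://localhost:{self.port}{path}"


SERVICES = {
    "api": Service("fastapi-app", 8000),
    "prometheus": Service("prometheus", 9090),
    "grafana": Service("grafana", 3000),
}

if __name__ == "__main__":
    print(SERVICES["api"].internal_url("/metrics"))   # what Prometheus scrapes
    print(SERVICES["prometheus"].internal_url())      # Grafana data source URL
    print(SERVICES["grafana"].host_url())             # what a browser opens
```

The distinction matters when configuring the Grafana data source: inside a container, localhost refers to that container, so the Prometheus URL would need the service name rather than localhost.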

Data Flow

  1. Data Input:

    • Web scraping via /scrape endpoint

    • Direct API calls for predictions

  2. Processing:

    • Text preprocessing and cleaning

    • Topic modeling with BERTopic

    • Result storage in CSV format

  3. Monitoring:

    • API metrics collected by Prometheus

    • Performance data visualized in Grafana

    • Real-time monitoring of predictions and errors

  4. Storage:

    • Raw data: ./data/raw/

    • Cleaned data: ./data/cleaned/

    • Models: ./model/

    • Monitoring config: ./monitoring/
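
The processing and storage steps can be sketched as a function that reads raw rows, cleans one text column, and writes the result as CSV. This is a simplified stand-in for the project's preprocessing, not its actual code: the column name "title" and the cleaning rules are assumptions, and real files would live under ./data/raw/ and ./data/cleaned/.

```python
import csv
import io
import re


def clean_text(text: str) -> str:
    """Lowercase, replace non-alphanumeric characters, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def clean_csv(raw, cleaned, column: str = "title") -> int:
    """Copy rows from raw to cleaned, cleaning one column; return row count.

    raw and cleaned are open text file objects; column is a hypothetical name.
    """
    reader = csv.DictReader(raw)
    writer = csv.DictWriter(cleaned, fieldnames=reader.fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        row[column] = clean_text(row[column])
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # In-memory demo; real use would open files in data/raw and data/cleaned.
    raw = io.StringIO("title,year\nDeep Learning: A Survey!,2021\n")
    out = io.StringIO()
    print(clean_csv(raw, out), "row(s) cleaned")
    print(out.getvalue(), end="")
```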

Deployment Configuration

Environment Variables

The containers use these environment settings:

Grafana:

  • GF_USERS_ALLOW_SIGN_UP=false - Disable user registration

  • GF_USERS_DEFAULT_THEME=dark - Set dark theme

  • GF_INSTALL_PLUGINS=grafana-piechart-panel - Install pie chart plugin

Restart Policy

All services use restart: always to ensure automatic recovery from failures.

Resource Requirements

Minimum:

  • CPU: 2 cores

  • RAM: 4GB

  • Storage: 5GB

Recommended:

  • CPU: 4+ cores

  • RAM: 8GB+

  • Storage: 20GB+ (SSD)
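
A quick pre-flight check against these figures might look like the sketch below. The thresholds are taken from the lists above; the detection logic is an assumption and reads free disk space for the current directory, with RAM detection that only works on Unix-like systems.

```python
import os
import shutil

MINIMUM = {"cpu": 2, "ram_gb": 4, "disk_gb": 5}
RECOMMENDED = {"cpu": 4, "ram_gb": 8, "disk_gb": 20}


def meets(requirement: dict, cpu: int, ram_gb: float, disk_gb: float) -> bool:
    """Return True when every measured value reaches the requirement."""
    return (cpu >= requirement["cpu"]
            and ram_gb >= requirement["ram_gb"]
            and disk_gb >= requirement["disk_gb"])


def detect(path: str = "."):
    """Best-effort detection of CPU count, total RAM, and free disk in GiB."""
    cpu = os.cpu_count() or 1
    disk_gb = shutil.disk_usage(path).free / 1024**3
    try:
        # Available on Linux and most Unix systems, not on Windows.
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        ram_gb = float("nan")  # unknown RAM compares as "not enough"
    return cpu, ram_gb, disk_gb


if __name__ == "__main__":
    cpu, ram_gb, disk_gb = detect()
    print("minimum met:", meets(MINIMUM, cpu, ram_gb, disk_gb))
    print("recommended met:", meets(RECOMMENDED, cpu, ram_gb, disk_gb))
```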

Scaling Considerations

Horizontal Scaling

Currently, the architecture supports single-instance deployment. For scaling:

  1. API Service: Can be scaled horizontally with load balancer

  2. Prometheus: Single instance (clustering requires additional setup)

  3. Grafana: Single instance (running multiple instances requires a shared external database)

Performance Optimization

  • Model Loading: BERTopic model loaded once at startup

  • Async Processing: Scraping runs in background tasks

  • Metrics Collection: Minimal overhead with Prometheus integration

  • Volume Storage: Use fast storage (SSD) for model files

Security

Network Security

  • Services communicate on isolated Docker network

  • Only necessary ports exposed to host

  • No inbound access from external networks required (scraping and Grafana plugin installation still need outbound internet access)

Data Security

  • Local volume mounts for data persistence

  • No sensitive data in environment variables

  • Grafana admin credentials should be changed in production

Monitoring and Alerts

Available Metrics

  • model_predictions_total: Total predictions made

  • model_prediction_errors_total: Prediction failures

  • model_prediction_duration_seconds: Prediction timing

  • scraping_requests_total: Scraping requests

  • scraping_errors_total: Scraping failures

  • model_accuracy: Current model accuracy
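
For orientation, the sketch below renders a few of these metrics in the Prometheus text exposition format, which is what a scrape of /metrics returns. It is a hand-rolled illustration with made-up values; the application itself most likely relies on a Prometheus client library. The timing metric is left out because histogram and summary types emit several series each.

```python
def render_metrics(counters: dict, gauges: dict) -> str:
    """Render counters and gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in counters.items():
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    for name, value in gauges.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only, not measurements from the project.
    print(render_metrics(
        counters={"model_predictions_total": 120,
                  "model_prediction_errors_total": 3},
        gauges={"model_accuracy": 0.87},
    ), end="")
```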

Dashboard Features

  • Real-time API performance metrics

  • Model prediction success/failure rates

  • System resource utilization

  • Request volume and timing

  • Error rate monitoring
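
Error-rate panels are usually derived from counters, for example with a PromQL expression along the lines of rate(model_prediction_errors_total[5m]) / rate(model_predictions_total[5m]). The sketch below reproduces that arithmetic for two consecutive samples; it is a simplification of what Prometheus does, and the sample values are invented.

```python
def rate(prev: float, curr: float, seconds: float) -> float:
    """Per-second increase of a counter between two samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Counters restart from zero when the process restarts,
    # so a drop is treated as a reset.
    increase = curr - prev if curr >= prev else curr
    return increase / seconds


def error_ratio(pred_prev: float, pred_curr: float,
                err_prev: float, err_curr: float,
                seconds: float = 15.0) -> float:
    """Share of predictions that failed during the interval."""
    total = rate(pred_prev, pred_curr, seconds)
    errors = rate(err_prev, err_curr, seconds)
    return errors / total if total else 0.0


if __name__ == "__main__":
    # Two scrapes 15 seconds apart: 30 new predictions, 3 new errors.
    print(rate(100, 130, 15))
    print(error_ratio(100, 130, 4, 7))
```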

Troubleshooting

Common Issues

Services Not Starting:

docker-compose logs -f

Port Conflicts:

docker-compose down
# Find the process holding the port (Windows; on Linux or macOS use: lsof -i :8000)
netstat -ano | findstr :8000

Volume Mount Issues:

# Recreate the containers; -v also deletes named and anonymous volumes
docker-compose down -v
docker-compose up -d

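Memory Issues:

# Reclaims disk space (not RAM): removes stopped containers, unused networks, and all unused images
docker system prune -a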

Health Checks

FastAPI: curl http://localhost:8000/health
Prometheus: curl http://localhost:9090/-/healthy
Grafana: Visit http://localhost:3000
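
The same checks can be scripted. The sketch below polls the three URLs listed above and treats HTTP 200 as healthy; the fetch parameter is an assumption added so the logic can be exercised without running containers.

```python
import urllib.error
import urllib.request

ENDPOINTS = {
    "fastapi": "http://localhost:8000/health",
    "prometheus": "http://localhost:9090/-/healthy",
    "grafana": "http://localhost:3000",
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0         # connection refused, timeout, DNS failure, ...


def check_all(fetch=http_status) -> dict:
    """Map each service name to True when its endpoint answers with 200."""
    return {name: fetch(url) == 200 for name, url in ENDPOINTS.items()}
```

Calling check_all() with the stack running should return True for every service; a False entry points to the container whose logs are worth reading first.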

Backup and Recovery

Data Backup

# Backup data volumes (the archive is written to the current directory, not into ./data)
docker run --rm -v "$(pwd)/data":/data:ro -v "$(pwd)":/backup alpine tar czf /backup/data-backup.tar.gz -C / data

# Backup monitoring config
tar czf monitoring-backup.tar.gz monitoring/
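
Where a scripted backup is preferred, the same idea can be sketched with the standard library. This is an illustrative alternative rather than the project's tooling; it stores paths relative to the directory name so that extracting the archive in the project root recreates the directory in place.

```python
import tarfile
import tempfile
from pathlib import Path


def backup_dir(source: str, archive: str) -> list:
    """Write source into a gzip-compressed tar archive; return member names."""
    src = Path(source)
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps member paths relative, e.g. "data/raw/papers.csv".
        tar.add(src, arcname=src.name)
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()


if __name__ == "__main__":
    # Self-contained demo in a temporary directory with a made-up file.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        (data / "raw").mkdir(parents=True)
        (data / "raw" / "papers.csv").write_text("title\n")
        print(backup_dir(str(data), str(Path(tmp) / "data-backup.tar.gz")))
```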

Recovery

# Restore from backup
docker-compose down
tar xzf data-backup.tar.gz
docker-compose up -d
