Docker Architecture
PTIIKInsight runs in Docker containers to provide a consistent deployment environment. The architecture consists of three services that together deliver topic modeling and monitoring.
Overview
The Docker setup includes:
FastAPI Application: Core API service for topic modeling
Prometheus: Metrics collection and monitoring
Grafana: Visualization and dashboards
Architecture Diagram
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   FastAPI App   │    │   Prometheus    │    │     Grafana     │
│   Port: 8000    │    │   Port: 9090    │    │   Port: 3000    │
│                 │◄───┤                 │◄───┤                 │
│ - Topic Predict │    │ - Metrics Store │    │ - Dashboards    │
│ - Data Scraping │    │ - Alerting      │    │ - Visualization │
│ - Model APIs    │    │ - Time Series   │    │ - Monitoring    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
                       ┌─────────────────┐
                       │   Host System   │
                       │                 │
                       │ - Volume Mounts │
                       │ - Data Storage  │
                       │ - Model Files   │
                       └─────────────────┘
Service Details
FastAPI Application (fastapi-app)
Container: fastapi-scraper
Port: 8000
Image: Built from local Dockerfile
Key Features:
Topic prediction using BERTopic model
Web scraping for research papers
Data preprocessing and cleaning
Prometheus metrics integration
RESTful API endpoints
Volume Mounts:
./data:/app/data - Data storage
./model:/app/model - ML model files
./preprocessing:/app/preprocessing - Preprocessing scripts
./api:/app/api - API source code
Health Check: Available at the /health endpoint
Prometheus (prometheus)
Container: prometheus
Port: 9090
Image: prom/prometheus:latest
Key Features:
Scrapes metrics from FastAPI application
Stores time-series monitoring data
Provides alerting capabilities
Web-based query interface
Volume Mounts:
./monitoring/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml - Configuration
./monitoring/prometheus/rules:/etc/prometheus/rules - Alert rules
Configuration:
Scrapes FastAPI metrics from the /metrics endpoint
Stores data in an internal time-series database
Supports custom alert rules
Grafana (grafana)
Container: grafana
Port: 3000
Image: grafana/grafana:latest
Key Features:
Visual dashboards for monitoring
Connects to Prometheus as data source
Pre-configured dashboards
Dark theme by default
Volume Mounts:
./monitoring/grafana/provisioning:/etc/grafana/provisioning - Data sources config
./monitoring/grafana/dashboards:/var/lib/grafana/dashboards - Dashboard definitions
Default Credentials:
Username: admin
Password: admin
Configuration:
Automatic dashboard provisioning
Prometheus data source pre-configured
Pie chart plugin installed
Network Architecture
All services run on the default Docker network and communicate internally:
FastAPI → Prometheus: FastAPI exposes metrics at the /metrics endpoint
Prometheus → FastAPI: Prometheus scrapes metrics every 15 seconds
Grafana → Prometheus: Grafana queries Prometheus for dashboard data
Host → Services: All services accessible from host via mapped ports
Data Flow
Data Input:
Web scraping via the /scrape endpoint
Direct API calls for predictions
Processing:
Text preprocessing and cleaning
Topic modeling with BERTopic
Result storage in CSV format
Monitoring:
API metrics collected by Prometheus
Performance data visualized in Grafana
Real-time monitoring of predictions and errors
Storage:
Raw data: ./data/raw/
Cleaned data: ./data/cleaned/
Models: ./model/
Monitoring config: ./monitoring/
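The flow above can be sketched end to end with the standard library alone. This is a minimal illustration under stated assumptions, not the project's code: `clean_text`, `assign_topic`, `run_pipeline`, and the CSV column names are hypothetical, and `assign_topic` is a keyword stub standing in for the BERTopic model.

```python
import csv
import io
import re


def clean_text(raw: str) -> str:
    """Lowercase, replace non-alphanumeric characters, collapse whitespace."""
    lowered = raw.lower()
    alnum_only = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return re.sub(r"\s+", " ", alnum_only).strip()


def assign_topic(cleaned: str) -> str:
    """Hypothetical stand-in for the BERTopic prediction step."""
    keywords = {"network": "networking", "learning": "machine-learning"}
    words = cleaned.split()
    for word, topic in keywords.items():
        if word in words:
            return topic
    return "other"


def run_pipeline(titles: list[str]) -> str:
    """Clean each title, assign a topic, and return the results as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "cleaned", "topic"])  # assumed column names
    for title in titles:
        cleaned = clean_text(title)
        writer.writerow([title, cleaned, assign_topic(cleaned)])
    return buffer.getvalue()


if __name__ == "__main__":
    print(run_pipeline(["Deep Learning for Text!", "Sensor Network Routing"]))
```

In the real service the CSV text would presumably be written under ./data/ rather than returned as a string; the in-memory buffer just keeps the sketch free of side effects.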
Deployment Configuration
Environment Variables
The containers use these environment settings:
Grafana:
GF_USERS_ALLOW_SIGN_UP=false - Disable user registration
GF_USERS_DEFAULT_THEME=dark - Set dark theme
GF_INSTALL_PLUGINS=grafana-piechart-panel - Install pie chart plugin
Restart Policy
All services use restart: always to ensure automatic recovery from failures.
Resource Requirements
Minimum:
CPU: 2 cores
RAM: 4GB
Storage: 5GB
Recommended:
CPU: 4+ cores
RAM: 8GB+
Storage: 20GB+ (SSD)
Scaling Considerations
Horizontal Scaling
Currently, the architecture supports single-instance deployment. For scaling:
API Service: Can be scaled horizontally with load balancer
Prometheus: Single instance (clustering requires additional setup)
Grafana: Single instance (running several instances requires a shared external database)
Performance Optimization
Model Loading: BERTopic model loaded once at startup
Async Processing: Scraping runs in background tasks
Metrics Collection: Minimal overhead with Prometheus integration
Volume Storage: Use fast storage (SSD) for model files
Security
Network Security
Services communicate on isolated Docker network
Only necessary ports exposed to host
No inbound access from external networks is required; web scraping and Grafana plugin installation still need outbound internet access
Data Security
Local volume mounts for data persistence
No sensitive data in environment variables
Grafana admin credentials should be changed in production
Monitoring and Alerts
Available Metrics
model_predictions_total: Total predictions made
model_prediction_errors_total: Prediction failures
model_prediction_duration_seconds: Prediction timing
scraping_requests_total: Scraping requests
scraping_errors_total: Scraping failures
model_accuracy: Current model accuracy
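Prometheus reads these metrics as plain text from /metrics. The sketch below renders that text exposition format by hand to show roughly what a scrape returns. The metric types (counter or gauge), the sample values, and the `Metric`, `render_metrics`, and `error_rate` names are assumptions for illustration; the duration metric is left out because its type is not stated here, and a real service would normally use a Prometheus client library instead of a hand-written renderer.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    kind: str  # "counter" or "gauge" (assumed types, not confirmed by the docs)
    help_text: str
    value: float


def render_metrics(metrics: list[Metric]) -> str:
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for m in metrics:
        lines.append(f"# HELP {m.name} {m.help_text}")
        lines.append(f"# TYPE {m.name} {m.kind}")
        lines.append(f"{m.name} {m.value}")
    return "\n".join(lines) + "\n"


def error_rate(errors: float, total: float) -> float:
    """Fraction of failed predictions; 0.0 when nothing has been predicted yet."""
    return errors / total if total else 0.0


if __name__ == "__main__":
    # Arbitrary sample values, chosen only to make the output readable.
    sample = [
        Metric("model_predictions_total", "counter", "Total predictions made", 120),
        Metric("model_prediction_errors_total", "counter", "Prediction failures", 3),
        Metric("model_accuracy", "gauge", "Current model accuracy", 0.87),
    ]
    print(render_metrics(sample))
    print("error rate:", error_rate(3, 120))
```

A ratio such as `error_rate` is the kind of derived value a dashboard panel would compute from the two counters.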
Dashboard Features
Real-time API performance metrics
Model prediction success/failure rates
System resource utilization
Request volume and timing
Error rate monitoring
Troubleshooting
Common Issues
Services Not Starting:
docker-compose logs -f
Port Conflicts:
docker-compose down
# Find the process using port 8000 (Windows; on Linux/macOS use: lsof -i :8000)
netstat -ano | findstr :8000
Volume Mount Issues:
docker-compose down -v
docker-compose up -d
Disk Space Issues:
# Removes stopped containers, unused networks, and all unused images
docker system prune -a
Health Checks
FastAPI: curl http://localhost:8000/health
Prometheus: curl http://localhost:9090/-/healthy
Grafana: Visit http://localhost:3000
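The three checks can also be run from one script. The sketch below uses only the standard library and the URLs listed above. Treating an HTTP 200 response as healthy is an assumption (including for Grafana's root URL), and `http_status`, `check_all`, and the `fetch` parameter are hypothetical names, introduced so the logic can be exercised without running containers.

```python
import urllib.error
import urllib.request
from typing import Callable

# URLs taken from the health checks listed above.
HEALTH_URLS = {
    "fastapi": "http://localhost:8000/health",
    "prometheus": "http://localhost:9090/-/healthy",
    "grafana": "http://localhost:3000",
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for url, or 0 if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, or timeout


def check_all(fetch: Callable[[str], int] = http_status) -> dict[str, bool]:
    """Map each service name to True when its endpoint answers with HTTP 200."""
    return {name: fetch(url) == 200 for name, url in HEALTH_URLS.items()}


if __name__ == "__main__":
    for service, healthy in check_all().items():
        print(f"{service}: {'ok' if healthy else 'DOWN'}")
```

Passing a custom `fetch` function makes it easy to test the reporting logic or to swap in a different HTTP client.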
Backup and Recovery
Data Backup
# Backup data volumes (archive is written to the current directory, with paths under data/)
docker run --rm -v "$(pwd)/data:/data:ro" -v "$(pwd):/backup" alpine tar czf /backup/data-backup.tar.gz -C / data
# Backup monitoring config
tar czf monitoring-backup.tar.gz monitoring/
Recovery
# Restore from backup
docker-compose down
tar xzf data-backup.tar.gz
docker-compose up -d
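Before restoring, it can help to confirm that the archive will unpack into ./data and not somewhere else. The sketch below is one way to check this with Python's `tarfile` module; `archive_members` and `restores_into` are hypothetical helper names, and the check assumes the archive stores its paths under a top-level data/ directory without a leading ./ prefix.

```python
import tarfile


def archive_members(path: str) -> list[str]:
    """List the member paths stored in a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()


def restores_into(path: str, top_level: str = "data") -> bool:
    """True when every member is top_level itself or sits underneath it."""
    names = archive_members(path)
    return bool(names) and all(
        name == top_level or name.startswith(top_level + "/") for name in names
    )
```

Calling `restores_into("data-backup.tar.gz")` before running `tar xzf` gives a quick signal: a `False` result means the extraction would create or overwrite a directory other than ./data.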