User Interface
PTIIKInsight includes a web-based dashboard, built with Streamlit, for running topic modeling operations, crawling data, and monitoring system services.
Getting Started
Prerequisites
Before running the dashboard, ensure you have:
Python 3.8+ installed
Required dependencies installed (pip install -r dashboard/requirements.txt)
API service running on http://localhost:8000
Starting the Dashboard
# Navigate to the project directory
cd project
# Run the Streamlit dashboard
streamlit run dashboard/main.py --server.port 8501
The dashboard will be available at: http://localhost:8501
Dashboard Overview
The PTIIKInsight dashboard provides four main sections accessible through the sidebar navigation:
📊 Overview
System Status: Real-time monitoring of all services
Service Health: API, Grafana, and Prometheus status
Quick Access: Direct links to monitoring tools
🤖 Prediction
Topic Prediction: Classify text using trained BERTopic models
Multiple Input Methods: Single text, batch text, or file upload
Results Visualization: Interactive charts and downloadable results
🕷️ Crawling
Data Collection: Web scraping for research papers
Progress Monitoring: Real-time scraping status
Data Management: View, analyze, and download collected data
🎯 Training
Model Training: Train new BERTopic models
Configuration: Adjust training parameters
Progress Tracking: Monitor training status and results
Feature Details
📸 Visual Examples: For screenshots and visual examples of all features described below, see the Gallery section.
System Overview Page
Features:
Service Status Cards: Shows online/offline status for API, Grafana, and Prometheus
Health Metrics: Real-time system health indicators
Quick Navigation: Direct access to monitoring dashboards
Service Monitoring:
API Service (Port 8000): Core topic modeling functionality
Grafana (Port 3000): Monitoring dashboards and visualizations
Prometheus (Port 9090): Metrics collection and alerting
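The status cards can be approximated with a simple reachability probe. The sketch below is not the dashboard's actual implementation: the ports come from the list above, while the base URLs, the function names, and the rule "any HTTP answer below 500 counts as online" are assumptions made for illustration.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# Ports come from the list above; the base URLs are assumptions.
SERVICES = {
    "API": "http://localhost:8000",
    "Grafana": "http://localhost:3000",
    "Prometheus": "http://localhost:9090",
}


def service_status(url, timeout=2.0, opener=urlopen):
    """Return "online" if the service answers at all, otherwise "offline"."""
    try:
        with opener(url, timeout=timeout):
            return "online"
    except HTTPError as exc:
        # The server responded, so it is up; treat only 5xx as unhealthy.
        return "online" if exc.code < 500 else "offline"
    except OSError:
        # Covers URLError, refused connections, DNS failures, and timeouts.
        return "offline"


def check_all(services=SERVICES, opener=urlopen):
    """Map each service name to its status string."""
    return {
        name: service_status(url, opener=opener)
        for name, url in services.items()
    }
```

Calling check_all() returns a dictionary such as {"API": ..., "Grafana": ..., "Prometheus": ...} that a status card could render. The opener parameter exists only so the probe can be replaced in tests.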
Topic Prediction Page
Input Methods:
Single Text Input
Text area for individual document analysis
Ideal for testing and quick predictions
Real-time preview of input text
Multiple Text Input
Batch processing of multiple texts
One text per line input format
Efficient for analyzing several documents
File Upload
Supports CSV and JSON file formats
CSV requires 'text' column
JSON should contain array of text strings
Automatic validation and preview
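The input rules above (one text per line, a 'text' column for CSV, an array of strings for JSON) can be sketched with the standard library. The function names split_lines and load_texts are hypothetical, and dropping blank entries is an assumption about how validation might behave.

```python
import csv
import io
import json


def split_lines(raw_text):
    """Multiple Text Input: one text per line, blank lines ignored."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


def load_texts(filename, raw_bytes):
    """Validate an uploaded CSV or JSON file and return its texts."""
    content = raw_bytes.decode("utf-8-sig")  # tolerate a UTF-8 BOM
    name = filename.lower()
    if name.endswith(".csv"):
        reader = csv.DictReader(io.StringIO(content))
        if not reader.fieldnames or "text" not in reader.fieldnames:
            raise ValueError("CSV file must contain a 'text' column")
        texts = [row["text"] for row in reader]
    elif name.endswith(".json"):
        data = json.loads(content)
        if not isinstance(data, list) or not all(isinstance(t, str) for t in data):
            raise ValueError("JSON file must contain an array of strings")
        texts = data
    else:
        raise ValueError("Only CSV and JSON files are supported")
    # Drop empty entries so the preview shows only usable texts.
    return [t.strip() for t in texts if t and t.strip()]
```

Raising ValueError with a readable message keeps the check separate from the UI, which can decide how to display the error.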
Results Display:
Summary Metrics: Processing time, text count, timestamp
Results Table: Text input with predicted topic assignments
Topic Distribution Chart: Visual breakdown of topic frequencies
Export Options: Download results as CSV for further analysis
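As a rough illustration of how the distribution chart and CSV export relate to the results table, the sketch below assumes results arrive as (text, topic) pairs. That shape, the column names, and the function names are assumptions, not the dashboard's documented format.

```python
import csv
import io
from collections import Counter


def topic_distribution(results):
    """Count how many texts were assigned to each topic."""
    return Counter(topic for _text, topic in results)


def results_to_csv(results):
    """Serialize (text, topic) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "topic"])
    writer.writerows(results)
    return buffer.getvalue()
```

The counter feeds a bar chart directly, and the CSV string is the kind of payload a download button would serve.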
Example Workflow:
1. Choose input method (Single/Multiple/File)
2. Enter or upload text data
3. Preview texts to be processed
4. Click "Run Prediction"
5. View results and visualizations
6. Download results if needed
Data Crawling Page
Crawling Controls:
Start Crawling: Initiates web scraping process
Refresh Status: Updates current crawling progress
Real-time Monitoring: Live status updates during scraping
Data Management:
Current Data Overview: Statistics about collected data
Data Preview: Sample of scraped research papers
Data Statistics: Distribution charts and analytics
Export Options: Download collected data as CSV
Data Sources: The crawling system automatically collects:
Research paper titles and abstracts
Author information
Publication dates
Academic source metadata
Status Indicators:
Running: Crawling process is active
Completed: Data collection finished successfully
Error: Issues encountered during crawling
Model Training Page
Training Configuration:
Embedding Model Selection:
sentence-transformers/all-MiniLM-L6-v2 (default)
multilingual-e5-large (advanced)
Minimum Topic Size: Adjustable threshold (5-50 documents)
Training Duration: Typically 15-30 minutes depending on data size
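The configuration limits above can be captured in a small validated structure. This is a minimal sketch: the TrainingConfig class is hypothetical, and the default minimum topic size of 10 is an assumption, since the documentation states only the 5-50 range.

```python
from dataclasses import dataclass

# Model names as listed above; the first is the documented default.
EMBEDDING_MODELS = (
    "sentence-transformers/all-MiniLM-L6-v2",
    "multilingual-e5-large",
)


@dataclass(frozen=True)
class TrainingConfig:
    embedding_model: str = EMBEDDING_MODELS[0]
    min_topic_size: int = 10  # assumed default, not stated in the docs

    def __post_init__(self):
        if self.embedding_model not in EMBEDDING_MODELS:
            raise ValueError(f"unknown embedding model: {self.embedding_model}")
        if not 5 <= self.min_topic_size <= 50:
            raise ValueError("min_topic_size must be between 5 and 50")
```

Validating at construction time means an out-of-range value is rejected before a long training run starts.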
Training Process:
Start Training: Initiates background model training
Progress Monitoring: Real-time status updates
Completion Notification: Success/failure alerts
Model Information: Current model statistics and metadata
Model Information Display:
Model Size: File size in MB
Last Modified: Timestamp of latest training
Availability Status: Model ready for predictions
Navigation and Layout
Sidebar Navigation
Logo Display: PTIIK institutional branding
Page Selection: Four main navigation buttons
Session State: Maintains user's current page selection
Main Content Area
Wide Layout: Optimized for data visualization
Responsive Design: Adapts to different screen sizes
Custom Styling: Professional color scheme and formatting
Interactive Elements
Real-time Updates: Automatic refresh of status information
Progress Indicators: Visual feedback during long operations
Error Handling: User-friendly error messages and suggestions
Customization Options
Styling and Themes
The dashboard includes custom CSS for:
Color Scheme: Professional blue gradient theme
Status Cards: Color-coded success/warning/error indicators
Typography: Clear hierarchy with appropriate fonts
Layout: Optimized spacing and alignment
Configuration Constants
DASHBOARD_TITLE = "PTIIK Insight Dashboard"
DASHBOARD_ICON = "🔬"
DASHBOARD_LAYOUT = "wide"
TRAINING_TIMEOUT = 1800 # 30 minutes
Troubleshooting
Common Issues
Dashboard Won't Start:
# Check Python and Streamlit installation
python --version
streamlit --version
# Install missing dependencies
pip install -r dashboard/requirements.txt
# Run with specific port
streamlit run dashboard/main.py --server.port 8502
API Connection Issues:
Ensure API service is running on port 8000
Check firewall settings for port access
Verify API health at http://localhost:8000/health
File Upload Problems:
Ensure CSV files have 'text' column
JSON files must contain array of strings
Check file size limits (typically under 100MB)
Training Failures:
Verify sufficient disk space (>5GB recommended)
Check Python memory limits
Ensure data is available before training
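The disk space recommendation can be checked before training starts. The sketch below uses shutil.disk_usage from the standard library; the function name and the choice of checking the current directory are assumptions.

```python
import shutil

MIN_FREE_GB = 5  # from the ">5GB recommended" guidance above


def has_enough_disk(path=".", min_free_gb=MIN_FREE_GB, usage=shutil.disk_usage):
    """Return True when the volume holding `path` has more free space than required."""
    free_gb = usage(path).free / 1024**3
    return free_gb > min_free_gb
```

The usage parameter is only there to make the check easy to test without depending on the real disk.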
Performance Optimization
For Large Datasets:
Process data in smaller batches
Use file upload instead of copy/paste for large texts
Consider increasing system memory allocation
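Processing in smaller batches amounts to slicing the input list. A minimal sketch follows; the helper name batched and the batch size used in the example are arbitrary choices, not values taken from the dashboard.

```python
def batched(texts, batch_size):
    """Yield consecutive slices of `texts` with at most `batch_size` items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


if __name__ == "__main__":
    for batch in batched(["a", "b", "c", "d", "e"], 2):
        print(batch)
```

Each batch can then be sent for prediction separately, so no single request has to carry the whole dataset.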
For Slow Performance:
Close unused browser tabs
Restart Streamlit session periodically
Clear browser cache if issues persist
Advanced Features
Session State Management
The dashboard maintains session state for:
Prediction results across page navigation
Training status during long operations
Crawling progress and data cache
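One common way to keep such values across reruns is to seed defaults once and leave existing entries alone. The sketch below works on any dict-like object; in a Streamlit app, st.session_state could be passed in, since it supports the same membership test and item assignment. The key names and default values are hypothetical.

```python
# Hypothetical keys mirroring the three items listed above.
DEFAULTS = {
    "prediction_results": None,
    "training_status": "idle",
    "crawl_data": None,
}


def init_session_state(state, defaults=DEFAULTS):
    """Add missing keys to `state` without overwriting values already stored."""
    for key, value in defaults.items():
        if key not in state:
            state[key] = value
    return state
```

Because existing keys are never overwritten, a training status set on one page survives navigation to another.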
Background Processing
Asynchronous Operations: Crawling and training run in background
Non-blocking UI: Interface remains responsive during processing
Status Polling: Regular updates on long-running tasks
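Status polling can be sketched as a loop that asks for the current state until it reaches a terminal value or a deadline passes. The state names follow the crawling status indicators and the default timeout matches TRAINING_TIMEOUT; the function name, the two-second interval, and the get_status callback are assumptions.

```python
import time

TERMINAL_STATES = {"Completed", "Error"}


def poll_until_done(get_status, interval=2.0, timeout=1800.0, sleep=time.sleep):
    """Call `get_status` repeatedly until it reports a terminal state.

    Raises TimeoutError if the task is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout} seconds")
        sleep(interval)
```

A blocking loop like this suits a background thread or script. Inside a Streamlit page, performing one status check per rerun may fit the non-blocking design better.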
Data Export and Import
Multiple Formats: CSV, JSON export support
Batch Processing: Handle large datasets efficiently
Result Persistence: Downloadable analysis results
Integration with Other Services
API Integration
RESTful Communication: Standard HTTP requests to FastAPI backend
Error Handling: Graceful degradation if API unavailable
Health Checks: Automatic service availability detection
Monitoring Integration
Grafana Links: Direct access to monitoring dashboards
Prometheus Metrics: Real-time performance indicators
Service Discovery: Automatic detection of running services
Best Practices
Usage Recommendations
Start API First: Always ensure backend is running before dashboard
Monitor Resources: Watch system resources during training
Save Results: Download important prediction results
Regular Updates: Refresh status for accurate information
Data Management
Regular Backups: Export and save important datasets
Clean Data: Verify data quality before training
Version Control: Keep track of different model versions
Resource Monitoring: Monitor disk space and memory usage
The PTIIKInsight web dashboard provides a complete interface for managing topic modeling workflows, from data collection through model training to result analysis, all within an intuitive and responsive web application.