User Interface

PTIIKInsight includes a comprehensive web-based dashboard built with Streamlit that provides an intuitive interface for topic modeling operations, data crawling, and system monitoring.

Getting Started

Prerequisites

Before running the dashboard, ensure you have:

  • Python 3.8+ installed

  • Required dependencies installed (pip install -r dashboard/requirements.txt)

  • API service running on http://localhost:8000

Starting the Dashboard

# Navigate to the project directory
cd project

# Run the Streamlit dashboard
streamlit run dashboard/main.py --server.port 8501

The dashboard will be available at: http://localhost:8501

Dashboard Overview

The PTIIKInsight dashboard provides four main sections accessible through the sidebar navigation:

📊 Overview

  • System Status: Real-time monitoring of all services

  • Service Health: API, Grafana, and Prometheus status

  • Quick Access: Direct links to monitoring tools

🤖 Prediction

  • Topic Prediction: Classify text using trained BERTopic models

  • Multiple Input Methods: Single text, batch text, or file upload

  • Results Visualization: Interactive charts and downloadable results

🕷️ Crawling

  • Data Collection: Web scraping for research papers

  • Progress Monitoring: Real-time scraping status

  • Data Management: View, analyze, and download collected data

🎯 Training

  • Model Training: Train new BERTopic models

  • Configuration: Adjust training parameters

  • Progress Tracking: Monitor training status and results

Feature Details

📸 Visual Examples: For screenshots of all the features described below, see the Gallery section.

System Overview Page

Features:

  • Service Status Cards: Shows online/offline status for API, Grafana, and Prometheus

  • Health Metrics: Real-time system health indicators

  • Quick Navigation: Direct access to monitoring dashboards

Service Monitoring:

  • API Service (Port 8000): Core topic modeling functionality

  • Grafana (Port 3000): Monitoring dashboards and visualizations

  • Prometheus (Port 9090): Metrics collection and alerting
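
Under the hood, these status cards come down to lightweight HTTP health checks against each service. A minimal sketch of that pattern, assuming the default ports listed above and the standard Grafana and Prometheus health endpoints (the exact paths used by the dashboard may differ):

import requests

# Assumed health-check URLs; the API /health route is also noted under Troubleshooting
SERVICES = {
    "API": "http://localhost:8000/health",
    "Grafana": "http://localhost:3000/api/health",
    "Prometheus": "http://localhost:9090/-/healthy",
}

def check_services(timeout=2):
    """Return True/False availability for each monitored service."""
    status = {}
    for name, url in SERVICES.items():
        try:
            status[name] = requests.get(url, timeout=timeout).ok
        except requests.RequestException:
            status[name] = False
    return status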

Topic Prediction Page

Input Methods:

  1. Single Text Input

    • Text area for individual document analysis

    • Ideal for testing and quick predictions

    • Real-time preview of input text

  2. Multiple Text Input

    • Batch processing of multiple texts

    • One text per line input format

    • Efficient for analyzing several documents

  3. File Upload

    • Supports CSV and JSON file formats

    • CSV files must include a 'text' column

    • JSON files must contain an array of text strings

    • Automatic validation and preview (see the example formats below)
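
The sketch below shows what valid upload files look like, written out with pandas and the standard json module (file names and contents are illustrative):

import json
import pandas as pd

# CSV upload: must contain a 'text' column; one document per row
pd.DataFrame({"text": ["First abstract ...", "Second abstract ..."]}).to_csv(
    "papers.csv", index=False
)

# JSON upload: a flat array of text strings
with open("papers.json", "w") as f:
    json.dump(["First abstract ...", "Second abstract ..."], f)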

Results Display:

  • Summary Metrics: Processing time, text count, timestamp

  • Results Table: Text input with predicted topic assignments

  • Topic Distribution Chart: Visual breakdown of topic frequencies

  • Export Options: Download results as CSV for further analysis

Example Workflow:

1. Choose input method (Single/Multiple/File)
2. Enter or upload text data
3. Preview texts to be processed
4. Click "Run Prediction" 
5. View results and visualizations
6. Download results if needed
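
Behind the "Run Prediction" button, the dashboard forwards the collected texts to the API service. The sketch below illustrates that round trip with the requests library; the /predict route and payload shape are assumptions, so consult the API documentation (FastAPI serves it at http://localhost:8000/docs) for the actual contract:

import requests

texts = [
    "Deep learning approaches for medical image segmentation",
    "Topic modeling of computer science research abstracts",
]

# Hypothetical endpoint and payload; the real route is defined by the FastAPI backend
response = requests.post("http://localhost:8000/predict", json={"texts": texts}, timeout=60)
response.raise_for_status()
print(response.json())  # topic assignments for each input text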

Data Crawling Page

Crawling Controls:

  • Start Crawling: Initiates web scraping process

  • Refresh Status: Updates current crawling progress

  • Real-time Monitoring: Live status updates during scraping

Data Management:

  • Current Data Overview: Statistics about collected data

  • Data Preview: Sample of scraped research papers

  • Data Statistics: Distribution charts and analytics

  • Export Options: Download collected data as CSV

Data Sources: The crawling system automatically collects:

  • Research paper titles and abstracts

  • Author information

  • Publication dates

  • Academic source metadata

Status Indicators:

  • Running: Crawling process is active

  • Completed: Data collection finished successfully

  • Error: Issues encountered during crawling
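
In code, "Start Crawling" and "Refresh Status" reduce to a start request followed by periodic status polls. A minimal sketch, using hypothetical /crawl and /crawl/status routes (the backend defines the real ones):

import time
import requests

BASE_URL = "http://localhost:8000"

# Hypothetical routes; check the API documentation for the actual endpoints
requests.post(f"{BASE_URL}/crawl", timeout=10)  # start the scraping job

while True:
    state = requests.get(f"{BASE_URL}/crawl/status", timeout=10).json()
    print(state)  # e.g. {"status": "running", "papers_collected": 120}
    if state.get("status") in ("completed", "error"):
        break
    time.sleep(5)  # poll every few seconds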

Model Training Page

Training Configuration:

  • Embedding Model Selection:

    • sentence-transformers/all-MiniLM-L6-v2 (default)

    • multilingual-e5-large (advanced)

  • Minimum Topic Size: Adjustable threshold (5-50 documents)

  • Training Duration: Typically 15-30 minutes depending on data size
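
These options correspond to standard BERTopic and sentence-transformers arguments. A sketch of how such a configuration could be assembled (the dashboard's training code may wrap these calls differently):

from bertopic import BERTopic
from sentence_transformers import SentenceTransformer

# Embedding model chosen in the UI; all-MiniLM-L6-v2 is the default
embedding_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Minimum topic size chosen in the UI (adjustable between 5 and 50 documents)
topic_model = BERTopic(embedding_model=embedding_model, min_topic_size=10)

# topics, probabilities = topic_model.fit_transform(documents)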

Training Process:

  1. Start Training: Initiates background model training

  2. Progress Monitoring: Real-time status updates

  3. Completion Notification: Success/failure alerts

  4. Model Information: Current model statistics and metadata

Model Information Display:

  • Model Size: File size in MB

  • Last Modified: Timestamp of latest training

  • Availability Status: Model ready for predictions

Interface Layout

Sidebar Navigation

  • Logo Display: PTIIK institutional branding

  • Page Selection: Four main navigation buttons

  • Session State: Maintains the user's current page selection

Main Content Area

  • Wide Layout: Optimized for data visualization

  • Responsive Design: Adapts to different screen sizes

  • Custom Styling: Professional color scheme and formatting

Interactive Elements

  • Real-time Updates: Automatic refresh of status information

  • Progress Indicators: Visual feedback during long operations

  • Error Handling: User-friendly error messages and suggestions

Customization Options

Styling and Themes

The dashboard includes custom CSS for:

  • Color Scheme: Professional blue gradient theme

  • Status Cards: Color-coded success/warning/error indicators

  • Typography: Clear hierarchy with appropriate fonts

  • Layout: Optimized spacing and alignment
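
In Streamlit, custom CSS like this is injected through st.markdown with unsafe_allow_html enabled. An illustrative sketch (class names and colors here are examples, not the dashboard's actual rules):

import streamlit as st

CUSTOM_CSS = """
<style>
/* example color-coded status cards */
.status-card-success { background-color: #e8f5e9; border-left: 4px solid #2e7d32; }
.status-card-error   { background-color: #ffebee; border-left: 4px solid #c62828; }
</style>
"""
st.markdown(CUSTOM_CSS, unsafe_allow_html=True)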

Configuration Constants

DASHBOARD_TITLE = "PTIIK Insight Dashboard"
DASHBOARD_ICON = "🔬"
DASHBOARD_LAYOUT = "wide"
TRAINING_TIMEOUT = 1800  # 30 minutes
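
These constants are typically handed to Streamlit's page configuration once at startup, for example:

import streamlit as st

# Called at the top of dashboard/main.py, reusing the constants defined above
st.set_page_config(
    page_title=DASHBOARD_TITLE,
    page_icon=DASHBOARD_ICON,
    layout=DASHBOARD_LAYOUT,
)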

Troubleshooting

Common Issues

Dashboard Won't Start:

# Check Python and Streamlit installation
python --version
streamlit --version

# Install missing dependencies
pip install -r dashboard/requirements.txt

# Run with specific port
streamlit run dashboard/main.py --server.port 8502

API Connection Issues:

  • Ensure API service is running on port 8000

  • Check firewall settings for port access

  • Verify API health at http://localhost:8000/health
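
The same health check can be scripted, which is handy when diagnosing connection problems from Python:

import requests

try:
    response = requests.get("http://localhost:8000/health", timeout=5)
    print("API reachable:", response.status_code, response.text)
except requests.RequestException as exc:
    print("API unreachable:", exc)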

File Upload Problems:

  • Ensure CSV files have 'text' column

  • JSON files must contain an array of strings

  • Check file size limits (typically under 100MB)

Training Failures:

  • Verify sufficient disk space (>5GB recommended)

  • Check Python memory limits

  • Ensure data is available before training

Performance Optimization

For Large Datasets:

  • Process data in smaller batches

  • Use file upload instead of copy/paste for large texts

  • Consider increasing system memory allocation

For Slow Performance:

  • Close unused browser tabs

  • Restart Streamlit session periodically

  • Clear browser cache if issues persist

Advanced Features

Session State Management

The dashboard maintains session state for:

  • Prediction results across page navigation

  • Training status during long operations

  • Crawling progress and data cache
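
Streamlit's st.session_state is the mechanism behind this. A minimal sketch of the pattern (the key names are illustrative, not the dashboard's actual keys):

import streamlit as st

# Initialize keys once per browser session
if "prediction_results" not in st.session_state:
    st.session_state.prediction_results = None
if "current_page" not in st.session_state:
    st.session_state.current_page = "Overview"

# Values assigned here survive reruns and page switches within the session
st.session_state.prediction_results = {"topics": [0, 2, 1]}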

Background Processing

  • Asynchronous Operations: Crawling and training run in background

  • Non-blocking UI: Interface remains responsive during processing

  • Status Polling: Regular updates on long-running tasks

Data Export and Import

  • Multiple Formats: CSV, JSON export support

  • Batch Processing: Handle large datasets efficiently

  • Result Persistence: Downloadable analysis results

Integration with Other Services

API Integration

  • RESTful Communication: Standard HTTP requests to FastAPI backend

  • Error Handling: Graceful degradation if API unavailable

  • Health Checks: Automatic service availability detection
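
Graceful degradation usually amounts to wrapping each API call in a timeout and an exception handler so the UI falls back to a warning instead of crashing. A sketch of that pattern (the helper name and message wording are illustrative):

import requests
import streamlit as st

def api_get(path, base_url="http://localhost:8000", timeout=5):
    """Fetch a JSON payload from the API, returning None if the service is unreachable."""
    try:
        response = requests.get(f"{base_url}{path}", timeout=timeout)
        response.raise_for_status()
        return response.json()
    except requests.RequestException:
        st.warning("API service is unavailable. Start it on port 8000 and refresh the page.")
        return None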

Monitoring Integration

  • Grafana Links: Direct access to monitoring dashboards

  • Prometheus Metrics: Real-time performance indicators

  • Service Discovery: Automatic detection of running services

Best Practices

Usage Recommendations

  1. Start API First: Always ensure backend is running before dashboard

  2. Monitor Resources: Watch system resources during training

  3. Save Results: Download important prediction results

  4. Regular Updates: Refresh status for accurate information

Data Management

  1. Regular Backups: Export and save important datasets

  2. Clean Data: Verify data quality before training

  3. Version Control: Keep track of different model versions

  4. Resource Monitoring: Monitor disk space and memory usage

The PTIIKInsight web dashboard provides a complete interface for managing topic modeling workflows, from data collection through model training to result analysis, all within an intuitive and responsive web application.
