API Reference

PTIIKInsight provides a REST API for topic modeling and data scraping operations. This reference documents every endpoint in the current project implementation.

Base URL

http://localhost:8000

The API runs on port 8000 by default and provides automatic documentation at /docs.

Available Endpoints

Health Check

Check if the API service is running and healthy.

GET /health

Response:

{
  "status": "healthy",
  "timestamp": 1640995200.0,
  "model_loaded": true
}
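
A client can use this response as a readiness gate before sending predictions. The sketch below is one way to do that with only the Python standard library. The helper names `fetch_health` and `is_ready` are illustrative and not part of the project, and the check assumes the response carries the fields shown above.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default base URL from this reference


def fetch_health(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET /health and decode the JSON body (hypothetical helper)."""
    with urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def is_ready(payload: dict) -> bool:
    """Ready only if the service is healthy AND the model is loaded."""
    return payload.get("status") == "healthy" and payload.get("model_loaded") is True
```

With the service running, `is_ready(fetch_health())` returns a single boolean that a caller can check before using /predict.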

Data Scraping

Start the web scraping process to collect new research paper data.

POST /scrape

Response:

{
  "message": "Scraping sedang berjalan",
  "status": "started"
}

This endpoint runs scraping asynchronously in the background (the message above is Indonesian for "Scraping is in progress"). The process includes:

  1. Web scraping using the scraping module

  2. Data preprocessing and cleaning

  3. Storing results in CSV format
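
The three stages above can be pictured as a small pipeline. The sketch below is only an illustration: the function names, the cleaning rules (whitespace normalization and dropping records without a title), and the sample records are assumptions, and the real scraping module is replaced by a stub that returns hard-coded data. The column names are borrowed from the GET /data response in this reference.

```python
import csv
import io


def scrape_stub() -> list[dict]:
    """Stand-in for the real scraping module (hypothetical data)."""
    return [
        {"title": "  Paper   Title ", "abstract": "Paper abstract...",
         "authors": "Author names", "publication_date": "2024-01-01"},
        {"title": "", "abstract": "No title, so this one is dropped",
         "authors": "", "publication_date": ""},
    ]


def preprocess(records: list[dict]) -> list[dict]:
    """Collapse runs of whitespace and drop records that have no title."""
    cleaned = []
    for record in records:
        row = {key: " ".join(str(value).split()) for key, value in record.items()}
        if row.get("title"):
            cleaned.append(row)
    return cleaned


def to_csv(records: list[dict]) -> str:
    """Serialize records to CSV text; a real pipeline would write a file."""
    fields = ["title", "abstract", "authors", "publication_date"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Run the three stages in order: scrape, preprocess, store.
csv_text = to_csv(preprocess(scrape_stub()))
```

Keeping each stage a separate function makes it easy to test the cleaning step on its own, whatever the actual scraper returns.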

Get Scraped Data

Retrieve the processed data from previous scraping operations.

GET /data

Success Response:

{
  "message": "Data retrieved successfully",
  "count": 150,
  "data": [
    {
      "title": "Paper Title",
      "abstract": "Paper abstract...",
      "authors": "Author names",
      "publication_date": "2024-01-01"
    }
  ]
}

No Data Response (the message is Indonesian for "Data is not yet available, please run scraping."):

{
  "message": "Data belum tersedia, silakan jalankan scraping.",
  "status": "no_data"
}

Response Fields:

  • count: Number of records retrieved

  • data: Array of scraped research paper records
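
Because /scrape runs in the background, a client typically has to poll /data until records appear. The following sketch shows one way to tell the two response shapes apart and retry. `extract_records`, `poll_for_data`, and the retry parameters are illustrative assumptions, and the HTTP call is passed in as a function so the logic can be exercised without a running server.

```python
import time
from typing import Callable, Optional


def extract_records(payload: dict) -> Optional[list]:
    """Return the record list, or None when the API reports no data yet."""
    if payload.get("status") == "no_data":
        return None
    return payload.get("data", [])


def poll_for_data(fetch: Callable[[], dict], attempts: int = 10,
                  delay: float = 5.0) -> list:
    """Call fetch() until it yields records or the attempts run out."""
    for attempt in range(attempts):
        records = extract_records(fetch())
        if records is not None:
            return records
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before asking again
    raise TimeoutError("no data available after polling")
```

With the `requests` library, `fetch` could be `lambda: requests.get("http://localhost:8000/data").json()`.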

Topic Prediction

Predict topics for given text inputs using the trained BERTopic model.

POST /predict

Request Body:

{
  "texts": [
    "Machine learning applications in healthcare",
    "Deep learning for natural language processing"
  ]
}

Request Fields:

  • texts: Array of text strings for topic prediction (max 100 texts per request)
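
Inputs longer than the 100-text limit need to be split across several requests. A minimal sketch of that batching follows. The helper name `build_predict_bodies` is an assumption; only the `texts` field and the limit come from this reference.

```python
MAX_TEXTS_PER_REQUEST = 100  # limit stated in this reference


def build_predict_bodies(texts: list[str],
                         batch_size: int = MAX_TEXTS_PER_REQUEST) -> list[dict]:
    """Split texts into /predict request bodies of at most batch_size items."""
    if not 1 <= batch_size <= MAX_TEXTS_PER_REQUEST:
        raise ValueError("batch_size must be between 1 and 100")
    return [
        {"texts": texts[start:start + batch_size]}
        for start in range(0, len(texts), batch_size)
    ]


bodies = build_predict_bodies([f"abstract {i}" for i in range(250)])
print([len(body["texts"]) for body in bodies])  # prints [100, 100, 50]
```

An empty input produces no request bodies at all, so the client never sends an empty `texts` list.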

Response:

{
  "message": "Prediction completed successfully",
  "input_count": 2,
  "prediction_time": 1.23,
  "topics": [
    {
      "text": "Machine learning applications in healthcare",
      "topic_id": 5,
      "topic_label": "Healthcare AI",
      "confidence": 0.85
    },
    {
      "text": "Deep learning for natural language processing",
      "topic_id": 12,
      "topic_label": "NLP Deep Learning",
      "confidence": 0.92
    }
  ]
}

Response Fields:

  • input_count: Number of input texts processed

  • prediction_time: Time taken for prediction in seconds

  • topics: Array of prediction results with topic assignments
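
Downstream code often needs only the confident assignments. The sketch below filters the `topics` array by `confidence` and groups the texts by `topic_label`. The 0.9 threshold and the helper name are arbitrary choices for illustration, and the sample payload is the response shown above.

```python
from collections import defaultdict

sample_response = {
    "message": "Prediction completed successfully",
    "input_count": 2,
    "prediction_time": 1.23,
    "topics": [
        {"text": "Machine learning applications in healthcare",
         "topic_id": 5, "topic_label": "Healthcare AI", "confidence": 0.85},
        {"text": "Deep learning for natural language processing",
         "topic_id": 12, "topic_label": "NLP Deep Learning", "confidence": 0.92},
    ],
}


def group_confident_topics(response: dict, min_confidence: float = 0.5) -> dict:
    """Group input texts by topic label, keeping confident predictions only."""
    grouped = defaultdict(list)
    for item in response.get("topics", []):
        if item["confidence"] >= min_confidence:
            grouped[item["topic_label"]].append(item["text"])
    return dict(grouped)


# Only the 0.92 prediction clears the 0.9 threshold.
confident = group_confident_topics(sample_response, min_confidence=0.9)
```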

Model Accuracy Update

Update the model accuracy metric for monitoring purposes.

POST /update-accuracy

Query Parameters:

  • accuracy: Float value between 0 and 1

Example:

curl -X POST "http://localhost:8000/update-accuracy?accuracy=0.87"

Response:

{
  "message": "Model accuracy updated",
  "accuracy": 0.87
}
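
Since `accuracy` travels as a query parameter instead of a JSON body, it helps to validate and encode it before sending. A small sketch of that step follows. `build_update_accuracy_url` is a hypothetical helper, and whether the server itself rejects out-of-range values is not stated in this reference.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # default base URL from this reference


def build_update_accuracy_url(accuracy: float, base_url: str = BASE_URL) -> str:
    """Validate the documented 0..1 range, then build the POST target URL."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return f"{base_url}/update-accuracy?{urlencode({'accuracy': accuracy})}"


url = build_update_accuracy_url(0.87)
```

The resulting URL matches the curl example above and can be sent with any HTTP client using the POST method.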

Error Responses

Any endpoint may return an error response on failure. The examples below show typical /predict failures:

400 Bad Request:

{
  "detail": "Empty text list provided"
}

500 Internal Server Error:

{
  "detail": "Prediction failed: Model not loaded"
}
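
Both error bodies carry a single `detail` string, so a client can surface it uniformly. The sketch below assumes that shape holds for every error response, which this reference does not guarantee. `ApiError` and `raise_for_api_error` are illustrative names, and treating 5xx responses as retryable is a client-side assumption.

```python
import json


class ApiError(Exception):
    """Client-side error wrapping an HTTP status and the API's detail text."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(f"{status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail
        self.retryable = status_code >= 500  # assumption: retry server errors only


def raise_for_api_error(status_code: int, body: str) -> None:
    """Raise ApiError for 4xx/5xx responses; do nothing otherwise."""
    if status_code < 400:
        return
    try:
        detail = json.loads(body).get("detail", body)
    except (ValueError, AttributeError):
        detail = body  # not JSON, or JSON that is not an object
    raise ApiError(status_code, str(detail))
```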

Usage Examples

Complete Workflow Example

# 1. Check health
curl http://localhost:8000/health

# 2. Start scraping
curl -X POST http://localhost:8000/scrape

# 3. Wait for scraping to complete, then get data
curl http://localhost:8000/data

# 4. Predict topics for new texts
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Your research text here"]}'

Python Client Example

import requests

# API base URL
base_url = "http://localhost:8000"

# Check health
health = requests.get(f"{base_url}/health")
print(health.json())

# Start scraping
scrape_response = requests.post(f"{base_url}/scrape")
print(scrape_response.json())

# Predict topics
predict_data = {
    "texts": [
        "Machine learning in medical diagnosis",
        "Blockchain technology applications"
    ]
}
predict_response = requests.post(
    f"{base_url}/predict",
    json=predict_data
)
print(predict_response.json())

Monitoring and Metrics

The API includes Prometheus metrics for monitoring:

  • model_predictions_total: Total number of predictions made

  • model_prediction_errors_total: Total number of prediction errors

  • model_prediction_duration_seconds: Time spent on predictions

  • model_accuracy: Current model accuracy score

  • scraping_requests_total: Total scraping requests

  • scraping_errors_total: Total scraping errors

Metrics are exposed at the /metrics endpoint for Prometheus to scrape.
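
For a quick check without a Prometheus server, the text returned by /metrics can be parsed directly. The sketch below handles only simple `name value` and `name{labels} value` sample lines from the Prometheus text exposition format, and it assumes no trailing timestamps. The sample text and its values are made up for illustration, and the actual output of this API may include additional series.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Map each sample name (labels stripped) to its last seen value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]  # drop any {label="..."} part
        samples[name] = float(value)
    return samples


# Hypothetical /metrics excerpt using metric names from this reference.
sample = """\
# HELP model_predictions_total Total number of predictions made
# TYPE model_predictions_total counter
model_predictions_total 42.0
model_prediction_errors_total 1.0
model_accuracy 0.87
"""
metrics = parse_metrics(sample)
```

Because labels are stripped, series that differ only by label overwrite each other; a client library for Prometheus is a better fit for anything beyond a spot check.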

Rate Limits

  • Maximum 100 texts per /predict request

  • Scraping operations are queued and run one at a time

  • No authentication is required in the current version

API Documentation

Interactive API documentation is available at:

  • Swagger UI: http://localhost:8000/docs

  • ReDoc: http://localhost:8000/redoc
