# API Reference

PTIIKInsight provides a REST API for topic modeling and data scraping. This reference covers every available endpoint as implemented in the project.
## Base URL

```
http://localhost:8000
```

The API runs on port 8000 by default and serves automatically generated documentation at `/docs`.
## Available Endpoints

### Health Check

Check if the API service is running and healthy.

```http
GET /health
```

Response:

```json
{
  "status": "healthy",
  "timestamp": 1640995200.0,
  "model_loaded": true
}
```
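Because `/predict` fails with a 500 error when the model is not loaded (see Error Responses), a client can check `/health` first. The sketch below is an illustration only: it uses the standard-library `urllib` instead of `requests` so it has no dependencies, and the helper names `is_ready` and `fetch_health` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default base URL from this reference


def is_ready(health: dict) -> bool:
    """Return True when the service reports healthy AND the model is loaded."""
    return health.get("status") == "healthy" and bool(health.get("model_loaded"))


def fetch_health(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET /health and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)
```

Typical use is `if is_ready(fetch_health()): ...` before sending prediction requests.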
### Data Scraping

Start the web scraping process to collect new research paper data.

```http
POST /scrape
```

Response:

```json
{
  "message": "Scraping sedang berjalan",
  "status": "started"
}
```

The `message` value is Indonesian for "Scraping is running". This endpoint runs scraping asynchronously in the background, so the response arrives before the new data is ready. The process includes:

- Web scraping using the scraping module
- Data preprocessing and cleaning
- Storing the results in CSV format
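Since `/scrape` returns immediately, a client has to wait before reading `/data`. A minimal polling sketch follows, assuming the no-data response is served with HTTP 200 and detected through its `"status": "no_data"` field. One caveat: this only detects the first time data becomes available. If an earlier scrape already produced data, `/data` returns that data right away, and this reference documents no scraping-completion endpoint. The names `get_data` and `wait_for_data` are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://localhost:8000"


def get_data(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """GET /data and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/data", timeout=timeout) as resp:
        return json.load(resp)


def wait_for_data(
    fetch: Callable[[], dict] = get_data,
    interval: float = 10.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll until the response is no longer "no_data"; return None if it never changes."""
    for attempt in range(max_attempts):
        body = fetch()
        if body.get("status") != "no_data":
            return body
        # Sleep between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval)
    return None
```

Passing `fetch` and `sleep` in as parameters keeps the loop testable without a running server.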
### Get Scraped Data

Retrieve the processed data from previous scraping operations.

```http
GET /data
```

Success response:

```json
{
  "message": "Data retrieved successfully",
  "count": 150,
  "data": [
    {
      "title": "Paper Title",
      "abstract": "Paper abstract...",
      "authors": "Author names",
      "publication_date": "2024-01-01"
    }
  ]
}
```

No-data response (the `message` is Indonesian for "Data is not yet available; please run scraping."):

```json
{
  "message": "Data belum tersedia, silakan jalankan scraping.",
  "status": "no_data"
}
```

Response fields:

- `count`: Number of records retrieved
- `data`: Array of scraped research paper records
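To work with the records in typed form, a client might convert each entry into a small dataclass. The sketch below assumes every record carries the four fields shown in the example and that `publication_date` uses ISO `YYYY-MM-DD` format. Real scraped data may be less uniform. `Paper` and `parse_data_response` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Paper:
    title: str
    abstract: str
    authors: str
    publication_date: date


def parse_data_response(body: dict) -> List[Paper]:
    """Convert a /data response body into Paper objects.

    Returns an empty list for the "no_data" response shape.
    """
    if body.get("status") == "no_data":
        return []
    return [
        Paper(
            title=rec["title"],
            abstract=rec["abstract"],
            authors=rec["authors"],
            # Assumes ISO "YYYY-MM-DD" dates, as in the example response.
            publication_date=date.fromisoformat(rec["publication_date"]),
        )
        for rec in body.get("data", [])
    ]
```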
### Topic Prediction

Predict topics for given text inputs using the trained BERTopic model.

```http
POST /predict
```

Request body:

```json
{
  "texts": [
    "Machine learning applications in healthcare",
    "Deep learning for natural language processing"
  ]
}
```

Request fields:

- `texts`: Array of text strings for topic prediction (max 100 texts per request)
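Because each request accepts at most 100 texts, larger inputs have to be split across several calls. A minimal batching sketch follows. It assumes `topics` in each response lists results in input order, and it uses stdlib `urllib`. The helper names are hypothetical. An empty input produces no batches, so the client never sends the empty list that triggers the 400 error described below.

```python
import json
import urllib.request
from typing import Iterator, List

BASE_URL = "http://localhost:8000"
MAX_TEXTS_PER_REQUEST = 100  # documented /predict limit


def batches(texts: List[str], size: int = MAX_TEXTS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` texts."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def predict(texts: List[str], base_url: str = BASE_URL, timeout: float = 60.0) -> dict:
    """POST one batch to /predict and return the decoded JSON response."""
    payload = json.dumps({"texts": texts}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def predict_all(texts: List[str], base_url: str = BASE_URL) -> List[dict]:
    """Predict topics for any number of texts, one request per batch."""
    results: List[dict] = []
    for batch in batches(texts):
        results.extend(predict(batch, base_url)["topics"])
    return results
```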
Response:

```json
{
  "message": "Prediction completed successfully",
  "input_count": 2,
  "prediction_time": 1.23,
  "topics": [
    {
      "text": "Machine learning applications in healthcare",
      "topic_id": 5,
      "topic_label": "Healthcare AI",
      "confidence": 0.85
    },
    {
      "text": "Deep learning for natural language processing",
      "topic_id": 12,
      "topic_label": "NLP Deep Learning",
      "confidence": 0.92
    }
  ]
}
```

Response fields:

- `input_count`: Number of input texts processed
- `prediction_time`: Time taken for prediction, in seconds
- `topics`: Array of prediction results, each with `text`, `topic_id`, `topic_label`, and `confidence`
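A common next step is to group texts by assigned topic and set aside uncertain assignments. The sketch below does this with a caller-chosen confidence threshold. BERTopic itself uses topic `-1` for outliers. This reference does not say whether the API ever returns that ID, so treating it as "uncertain" is an assumption. `group_by_topic` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

OUTLIER_TOPIC_ID = -1  # BERTopic's outlier topic; assumed to pass through unchanged


def group_by_topic(
    response: dict, min_confidence: float = 0.5
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split /predict results into per-label groups and an uncertain list."""
    groups: Dict[str, List[str]] = defaultdict(list)
    uncertain: List[str] = []
    for item in response["topics"]:
        if item["topic_id"] == OUTLIER_TOPIC_ID or item["confidence"] < min_confidence:
            uncertain.append(item["text"])
        else:
            groups[item["topic_label"]].append(item["text"])
    return dict(groups), uncertain
```

With the example response above and `min_confidence=0.9`, only the NLP text is grouped, and the healthcare text (confidence 0.85) lands in the uncertain list.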
### Model Accuracy Update

Update the model accuracy metric for monitoring purposes.

```http
POST /update-accuracy
```

Query parameters:

- `accuracy`: Float value between 0 and 1

Example:

```bash
curl -X POST "http://localhost:8000/update-accuracy?accuracy=0.87"
```

Response:

```json
{
  "message": "Model accuracy updated",
  "accuracy": 0.87
}
```
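The value travels as a query parameter rather than in a JSON body. The sketch below builds that URL with `urllib.parse.urlencode` and rejects out-of-range values on the client side. This reference does not say whether the server validates the range itself. `accuracy_url` and `update_accuracy` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def accuracy_url(accuracy: float, base_url: str = BASE_URL) -> str:
    """Build the /update-accuracy URL, rejecting values outside [0, 1] (and NaN)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError(f"accuracy must be between 0 and 1, got {accuracy}")
    query = urllib.parse.urlencode({"accuracy": accuracy})
    return f"{base_url}/update-accuracy?{query}"


def update_accuracy(accuracy: float, base_url: str = BASE_URL) -> dict:
    """POST with the accuracy in the query string and no request body."""
    req = urllib.request.Request(accuracy_url(accuracy, base_url), method="POST")
    with urllib.request.urlopen(req, timeout=10.0) as resp:
        return json.load(resp)
```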
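## Error Responses

Any endpoint may return an error response if the request fails. The body contains a `detail` string that describes the problem.

400 Bad Request (for example, an empty `texts` list sent to `/predict`):

```json
{
  "detail": "Empty text list provided"
}
```

500 Internal Server Error:

```json
{
  "detail": "Prediction failed: Model not loaded"
}
```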
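Because errors share the `{"detail": ...}` shape, a client can convert them into a single exception type. A minimal sketch with stdlib `urllib`, which raises `urllib.error.HTTPError` for 4xx/5xx responses, follows. If the body is not a JSON object, the sketch falls back to the raw text. `APIError`, `parse_error`, and `call` are hypothetical names.

```python
import json
import urllib.error
import urllib.request


class APIError(Exception):
    """Raised when the API answers with a 4xx/5xx status."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def parse_error(status: int, body: bytes) -> APIError:
    """Build an APIError from an error body of the form {"detail": "..."}."""
    try:
        detail = json.loads(body).get("detail", "")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: keep the raw text instead.
        detail = body.decode("utf-8", errors="replace")
    return APIError(status, str(detail))


def call(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a request and return JSON, converting HTTP errors into APIError."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as exc:
        raise parse_error(exc.code, exc.read()) from exc
```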
## Usage Examples

### Complete Workflow Example

```bash
# 1. Check health
curl http://localhost:8000/health

# 2. Start scraping
curl -X POST http://localhost:8000/scrape

# 3. Wait for scraping to complete, then get data
curl http://localhost:8000/data

# 4. Predict topics for new texts
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Your research text here"]}'
```
### Python Client Example

```python
import requests

# API base URL
base_url = "http://localhost:8000"

# Check health
health = requests.get(f"{base_url}/health", timeout=10)
health.raise_for_status()
print(health.json())

# Start scraping (runs in the background; the call returns immediately)
scrape_response = requests.post(f"{base_url}/scrape", timeout=10)
scrape_response.raise_for_status()
print(scrape_response.json())

# Predict topics (at most 100 texts per request)
predict_data = {
    "texts": [
        "Machine learning in medical diagnosis",
        "Blockchain technology applications",
    ]
}
predict_response = requests.post(f"{base_url}/predict", json=predict_data, timeout=60)
predict_response.raise_for_status()
print(predict_response.json())
```
## Monitoring and Metrics

The API includes Prometheus metrics for monitoring:

- `model_predictions_total`: Total number of predictions made
- `model_prediction_errors_total`: Total number of prediction errors
- `model_prediction_duration_seconds`: Time spent on predictions
- `model_accuracy`: Current model accuracy score
- `scraping_requests_total`: Total scraping requests
- `scraping_errors_total`: Total scraping errors

Metrics are available at the `/metrics` endpoint for Prometheus scraping.
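For a quick look at these values without a Prometheus server, a script can read `/metrics` directly. The parser below is a deliberately simplified sketch, not a full implementation of the Prometheus text exposition format. It keeps only unlabelled `name value` samples and skips `# HELP`/`# TYPE` comments and labelled series such as histogram buckets. Any metric that the API exposes with labels would be skipped too. `parse_metrics` and `fetch_metrics` are hypothetical names.

```python
import urllib.request
from typing import Dict

BASE_URL = "http://localhost:8000"


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse unlabelled samples from Prometheus text exposition output."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments (# HELP / # TYPE) and labelled samples.
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        # A sample line is "name value" with an optional trailing timestamp.
        if len(parts) >= 2:
            samples[parts[0]] = float(parts[1])  # float() also accepts "+Inf" and "NaN"
    return samples


def fetch_metrics(base_url: str = BASE_URL) -> Dict[str, float]:
    """GET /metrics and parse it."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=10.0) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, `fetch_metrics().get("model_accuracy")` would report the value last set through `/update-accuracy`, provided the two are linked as the metric descriptions suggest.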
## Rate Limits

- Maximum of 100 texts per `/predict` request
- Scraping operations are queued and run one at a time
- No authentication is required in the current version
## API Documentation

Interactive API documentation is available at:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc