AI-powered Student Management System
Production-grade full-stack platform with role-based portals, real-time analytics, and a Random Forest model that predicts academic risk with 96% accuracy on a held-out synthetic test set.
Machine Learning Deep Dive
The problem
Universities often identify at-risk students too late: after they've already failed courses or dropped out. Early intervention requires predictive signals that human advisors might miss across hundreds of students. A single advisor managing 200 students can't manually track GPA trajectories, failure patterns, and credit progress every week. The model can.
The approach
A supervised classification model using a Random Forest ensemble to predict whether a student is academically at-risk based on their academic performance metrics.
Why Random Forest?
- Handles non-linear relationships between features. A student with a 3.5 GPA but 3 failed courses is still at risk: logistic regression would struggle with that interaction; Random Forest doesn't.
- Provides feature importance rankings, making predictions interpretable for advisors. "This student is at risk primarily because of low GPA (34%) and three failed courses (26%)" is actionable. "This student is at risk because the model said so" is not.
- Resists overfitting by combining 100 decision trees (estimators), each trained on a bootstrap sample of the data.
- No feature scaling required: works directly with raw GPA and count values. Saves a preprocessing step and a class of bugs.
Training data
The training set is 500 synthetic student profiles. Feature ranges are modeled after actual university patterns, and each feature is sampled uniformly within its range:
| Feature | Range | Distribution |
|---|---|---|
| GPA | 0.5 to 4.0 | Uniform |
| Courses taken | 1 to 12 | Uniform integer |
| Courses failed | 0 to 5 | Uniform integer |
| Avg grade points | 0.5 to 4.0 | Uniform |
| Credits completed | 3 to 60 | Uniform integer |
| Semesters enrolled | 1 to 8 | Uniform integer |
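The table above can be turned into a generator in a few lines. This is a minimal sketch using only the standard library; the dictionary keys, the seed, and the assumption that each feature is drawn independently are illustrative choices, not details confirmed by the project.

```python
import random


def generate_profile(rng: random.Random) -> dict:
    """Draw one synthetic student using the ranges from the table.

    Key names are hypothetical. Features are sampled independently,
    which is an assumption: it means a profile can have more failed
    courses than courses taken, something real data would not allow.
    """
    return {
        "gpa": round(rng.uniform(0.5, 4.0), 2),
        "courses_taken": rng.randint(1, 12),        # randint is inclusive on both ends
        "courses_failed": rng.randint(0, 5),
        "avg_grade_points": round(rng.uniform(0.5, 4.0), 2),
        "credits_completed": rng.randint(3, 60),
        "semesters_enrolled": rng.randint(1, 8),
    }


def generate_dataset(n: int = 500, seed: int = 42) -> list:
    """Generate n profiles reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [generate_profile(rng) for _ in range(n)]


profiles = generate_dataset()
print(len(profiles), "profiles, first one:", profiles[0])
```

Fixing the seed keeps the dataset reproducible between training runs, which makes metric changes attributable to the model rather than to the sample.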
Labeling criteria
A student is at risk if any of these apply:
- GPA below 2.0
- 3 or more courses failed
- Average grade points below 1.5
- GPA below 2.5 AND 2+ courses failed
- Fewer than 15 credits completed after 4+ semesters
These rules encode what an experienced academic advisor would flag manually. The Random Forest learns the patterns, then generalizes to combinations the rules don't explicitly cover.
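The five criteria translate directly into a labeling function. The sketch below assumes each profile is a dictionary with the hypothetical keys shown; the thresholds are the ones listed above.

```python
def is_at_risk(student: dict) -> bool:
    """Return True if the profile trips at least one labeling rule."""
    return (
        student["gpa"] < 2.0
        or student["courses_failed"] >= 3
        or student["avg_grade_points"] < 1.5
        # Combined rule: a middling GPA plus repeated failures
        or (student["gpa"] < 2.5 and student["courses_failed"] >= 2)
        # Slow progress: few credits after several semesters
        or (student["credits_completed"] < 15 and student["semesters_enrolled"] >= 4)
    )


# The example from "Why Random Forest?": a strong GPA does not offset 3 failures.
strong_gpa_three_failures = {
    "gpa": 3.5, "courses_taken": 10, "courses_failed": 3,
    "avg_grade_points": 3.2, "credits_completed": 30, "semesters_enrolled": 3,
}
steady_student = {
    "gpa": 3.2, "courses_taken": 12, "courses_failed": 0,
    "avg_grade_points": 3.1, "credits_completed": 45, "semesters_enrolled": 4,
}

print(is_at_risk(strong_gpa_three_failures))  # True (3 or more courses failed)
print(is_at_risk(steady_student))             # False (no rule applies)
```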
Results
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Not At Risk | 0.88 | 0.88 | 0.88 | 17 |
| At Risk | 0.98 | 0.98 | 0.98 | 83 |
| Accuracy | | | 0.96 | 100 |
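96% accuracy on a held-out test set of 100 profiles. The class imbalance (17 not-at-risk vs. 83 at-risk in the test set) reflects the labeling criteria: the rules are sensitive, so most generated profiles trip at least one flag. With 83% of test profiles at risk, always predicting "at risk" would already score 83%, so the per-class precision and recall are a better gauge than accuracy alone.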
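The reported figures can be cross-checked by hand. The confusion matrix below is not taken from the project: it is inferred from the report, being the only set of integer counts that matches the stated supports (17 and 83) and the rounded precision and recall values. The sketch recomputes each metric from those counts and adds the majority-class baseline for comparison.

```python
# Rows are actual classes, columns are predicted classes,
# both in the order [Not At Risk, At Risk]. Inferred, not reported.
labels = ["Not At Risk", "At Risk"]
confusion = [
    [15, 2],   # 17 actual not-at-risk: 15 correct, 2 flagged as at risk
    [2, 81],   # 83 actual at-risk: 2 missed, 81 correct
]

total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(2)) / total

metrics = {}
for i, label in enumerate(labels):
    true_positives = confusion[i][i]
    support = sum(confusion[i])                       # actual members of the class
    predicted = sum(row[i] for row in confusion)      # everything predicted as the class
    precision = true_positives / predicted
    recall = true_positives / support
    f1 = 2 * precision * recall / (precision + recall)
    metrics[label] = {"precision": precision, "recall": recall, "f1": f1, "support": support}

# Score of a trivial model that always predicts the larger class
majority_baseline = max(sum(row) for row in confusion) / total

for label, m in metrics.items():
    print(f"{label:<12} precision={m['precision']:.2f} recall={m['recall']:.2f} f1={m['f1']:.2f}")
print(f"accuracy={accuracy:.2f} majority_baseline={majority_baseline:.2f}")
```

Read this way, the model misclassifies 4 of 100 test profiles, against 17 for the always-at-risk baseline.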
For production deployment, the next step would be retraining on real anonymized university data, where the class balance is typically reversed (most students are not at risk).
Feature importance
| Feature | Importance | Interpretation |
|---|---|---|
| GPA | 33.96% | Strongest single predictor of academic success |
| Courses failed | 26.32% | Direct indicator of academic difficulty |
| Avg grade points | 20.94% | Captures grade trajectory beyond cumulative GPA |
| Credits completed | 8.49% | Progress indicator: slow progress signals risk |
| Courses taken | 5.21% | Course load context |
| Semesters enrolled | 5.07% | Time-in-program context |
GPA and failure count together account for about 60% of the model's total feature importance (33.96% + 26.32% = 60.28%). This matches advisor intuition: those are the two things they'd check first when reviewing a student's standing.
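A ranking like the one above is typically read from the fitted model itself. The sketch below assumes scikit-learn and NumPy, which the write-up does not name explicitly; the seed, the 80/20 stratified split, and the feature names are also assumptions. It regenerates synthetic data from the stated ranges, labels it with the stated rules, trains a 100-tree forest on raw values, and prints the impurity-based importances. Exact numbers will differ from the table because the original data and seed are unknown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

FEATURES = [
    "gpa", "courses_taken", "courses_failed",
    "avg_grade_points", "credits_completed", "semesters_enrolled",
]

rng = np.random.default_rng(42)  # hypothetical seed
n = 500

# Ranges from the training-data table; integers() excludes the upper bound.
X = np.column_stack([
    rng.uniform(0.5, 4.0, n),
    rng.integers(1, 13, n),
    rng.integers(0, 6, n),
    rng.uniform(0.5, 4.0, n),
    rng.integers(3, 61, n),
    rng.integers(1, 9, n),
])
gpa, _, failed, avg_points, credits_done, semesters = X.T

# Labeling criteria from the write-up, vectorized (1 = at risk).
y = (
    (gpa < 2.0)
    | (failed >= 3)
    | (avg_points < 1.5)
    | ((gpa < 2.5) & (failed >= 2))
    | ((credits_done < 15) & (semesters >= 4))
).astype(int)

# Hold out 100 of the 500 profiles, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# No scaling step: trees split on raw GPA and count values.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# Importances are normalized by scikit-learn to sum to 1.
ranked = sorted(zip(FEATURES, model.feature_importances_), key=lambda p: p[1], reverse=True)
print(f"held-out accuracy: {test_accuracy:.2%}")
for name, importance in ranked:
    print(f"{name:<20}{importance:.2%}")
```

Because these importances are normalized shares of impurity reduction, they describe how the forest splits overall, not why a specific student was flagged.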
Integration architecture
The ML model runs as an independent Flask microservice, decoupled from the Java backend:
- Staff clicks "Risk Check" on a student in the Java web app
- StudentServlet gathers the student's academic metrics from MySQL (GPA, enrollment count, failed courses, etc.)
- MLClient utility class sends an HTTP POST to http://localhost:5000/predict with the metrics as JSON
- Flask API loads the pre-trained model from disk (student_risk_model.pkl), runs inference, and returns a JSON response with prediction, confidence score, and a human-readable recommendation
- JSP renders the result with color-coded risk status, confidence percentage, risk probability meter, and specific concerns
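The inference step on the Flask side might look roughly like the function below. This is a sketch, not the project's code: the JSON field names, the 0.5 decision threshold, the recommendation wording, and the assumption that column 1 of `predict_proba` is the at-risk class are all illustrative. The logic is kept free of Flask so it can be run and tested with a stub model; the comment at the end shows how it could be attached to a route.

```python
FEATURES = (
    "gpa", "courses_taken", "courses_failed",
    "avg_grade_points", "credits_completed", "semesters_enrolled",
)


def handle_predict(payload: dict, model) -> dict:
    """Turn a JSON payload of metrics into a prediction response.

    `model` is any object with a scikit-learn style predict_proba().
    """
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))

    row = [[float(payload[name]) for name in FEATURES]]
    risk_probability = float(model.predict_proba(row)[0][1])  # assumes class 1 = at risk
    at_risk = risk_probability >= 0.5
    confidence = risk_probability if at_risk else 1.0 - risk_probability

    return {
        "prediction": "At Risk" if at_risk else "Not At Risk",
        "confidence": round(confidence, 4),
        "risk_probability": round(risk_probability, 4),
        "recommendation": (
            "Schedule an advising session to review recent grades."
            if at_risk
            else "No action needed; continue routine monitoring."
        ),
    }


class StubModel:
    """Stand-in for the unpickled model, returning a fixed probability."""

    def __init__(self, risk_probability: float):
        self.risk_probability = risk_probability

    def predict_proba(self, rows):
        return [[1.0 - self.risk_probability, self.risk_probability] for _ in rows]


example_payload = {
    "gpa": 1.8, "courses_taken": 8, "courses_failed": 3,
    "avg_grade_points": 1.7, "credits_completed": 18, "semesters_enrolled": 3,
}
example_response = handle_predict(example_payload, StubModel(0.91))
print(example_response)

# Possible Flask wiring (assumed, not taken from the project):
#   model = pickle.load(open("student_risk_model.pkl", "rb"))
#   @app.route("/predict", methods=["POST"])
#   def predict():
#       return jsonify(handle_predict(request.get_json(), model))
```

Loading the pickle once at startup rather than on every request is the usual choice, since unpickling a 100-tree forest per call adds avoidable latency.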
This microservice pattern means:
- The ML model can be retrained, updated, or replaced without touching the Java codebase
- Python and Java codebases evolve independently
- The model can be A/B tested by spinning up a second Flask service on a different port
- ML inference (CPU/GPU-bound) scales independently from the Java app (I/O-bound)
It's the same pattern used at companies running production ML: model serving is its own service, not a library import.
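For the A/B testing point, one simple way to split traffic between two model services is to hash a stable identifier. The sketch below is illustrative only: the second endpoint on port 5001, the use of a student ID as the key, and the 50/50 default are assumptions.

```python
import hashlib

# Variant A is the endpoint from the write-up; variant B is hypothetical.
MODEL_ENDPOINTS = {
    "A": "http://localhost:5000/predict",
    "B": "http://localhost:5001/predict",
}


def choose_variant(student_id: str, share_b: float = 0.5) -> str:
    """Deterministically assign a student to model A or B.

    Hashing keeps the assignment stable across requests, so the same
    student is always scored by the same model during the experiment.
    """
    digest = hashlib.sha256(student_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "B" if bucket < share_b else "A"


variant = choose_variant("S-1042")
print(variant, MODEL_ENDPOINTS[variant])
```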
Graceful degradation
If the Flask service is unavailable (down, slow, or unreachable), the app doesn't crash. The MLClient has a 5-second timeout and catches connection errors. When inference fails, the Risk Check page shows "Risk assessment service unavailable" instead of a 500 error. The rest of the app (enrollments, grades, dashboards) keeps working.
In distributed systems, every network call is a potential failure point. Designing for graceful degradation from day one is cheaper than retrofitting it after a production outage.
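The timeout-and-fallback behavior can be sketched as follows. The project's MLClient is a Java class; this is a Python analog using only the standard library, written to show the pattern rather than to mirror the real implementation. The function name and the shape of the returned dictionary are assumptions.

```python
import http.client
import json
import urllib.request

PREDICT_URL = "http://localhost:5000/predict"
TIMEOUT_SECONDS = 5
UNAVAILABLE = {"available": False, "message": "Risk assessment service unavailable"}


def check_risk(metrics: dict, url: str = PREDICT_URL, timeout: float = TIMEOUT_SECONDS) -> dict:
    """POST metrics to the model service, falling back instead of raising."""
    request = urllib.request.Request(
        url,
        data=json.dumps(metrics).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError, HTTP error statuses, refused connections,
        # and timeouts; ValueError covers a response that is not valid JSON.
        return dict(UNAVAILABLE)

    if not isinstance(body, dict):
        return dict(UNAVAILABLE)
    return {"available": True, **body}


if __name__ == "__main__":
    result = check_risk({"gpa": 1.8, "courses_failed": 3})
    # The page layer renders either the prediction or the fallback message.
    print(result.get("prediction", result.get("message")))
```

The caller never sees an exception, so the page can branch on `available` and render the fallback message without affecting any other part of the request.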
Future enhancements
- Train on real anonymized university data for higher real-world accuracy
- Add time-series features: GPA trend across semesters, grade improvement/decline patterns
- Implement model versioning and A/B testing infrastructure
- Add batch prediction for end-of-semester risk reports across the full roster
- Explore gradient boosting (XGBoost) for potential accuracy improvements
- Add explainability via SHAP values for per-prediction feature attribution