Completed: April to May 2026

AI-powered Student Management System

Production-grade full-stack platform with role-based portals, real-time analytics, and a Random Forest model that predicts academic risk with 96% accuracy.

Java 21, Jakarta EE, MySQL 8, Python, Flask, scikit-learn, Tomcat 10
Languages: Java 85.3%, CSS 10.8%, Python 2.8%, Other 1.1%

Machine Learning Deep Dive

The problem

Universities often identify at-risk students too late: after they've already failed courses or dropped out. Early intervention requires predictive signals that human advisors might miss across hundreds of students. A single advisor managing 200 students can't manually track GPA trajectories, failure patterns, and credit progress every week. The model can.

The approach

The system uses a supervised classification model, a Random Forest ensemble, to predict whether a student is academically at risk based on their academic performance metrics.

Why Random Forest?

  • Handles non-linear relationships between features. A student with a 3.5 GPA but 3 failed courses is still at risk. Logistic regression would struggle with that kind of threshold interaction unless the features were hand-engineered; Random Forest captures it directly.
  • Provides feature importance rankings, making predictions interpretable for advisors. "This student is at risk primarily because of low GPA (34%) and three failed courses (26%)" is actionable. "This student is at risk because the model said so" is not.
  • Resists overfitting: the forest averages 100 decision trees (estimators), each trained on a bootstrap sample of the data.
  • No feature scaling required: works directly with raw GPA and count values. Saves a preprocessing step and a class of bugs.

Training data

Generated 500 synthetic student profiles by sampling each feature uniformly over a range typical of university records:

| Feature | Range | Distribution |
| --- | --- | --- |
| GPA | 0.5 to 4.0 | Uniform |
| Courses taken | 1 to 12 | Uniform integer |
| Courses failed | 0 to 5 | Uniform integer |
| Avg grade points | 0.5 to 4.0 | Uniform |
| Credits completed | 3 to 60 | Uniform integer |
| Semesters enrolled | 1 to 8 | Uniform integer |
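
A minimal sketch of how profiles with these ranges could be generated using only the Python standard library. The field names, the fixed seed, and the choice to sample every feature independently are assumptions made for illustration; the project's actual generator is not shown here.

```python
import random

def generate_profiles(n=500, seed=42):
    """Sample n synthetic student profiles from the ranges listed above."""
    rng = random.Random(seed)  # local RNG so repeated runs give the same data
    profiles = []
    for _ in range(n):
        profiles.append({
            "gpa": round(rng.uniform(0.5, 4.0), 2),
            "courses_taken": rng.randint(1, 12),   # randint includes both ends
            "courses_failed": rng.randint(0, 5),
            "avg_grade_points": round(rng.uniform(0.5, 4.0), 2),
            "credits_completed": rng.randint(3, 60),
            "semesters_enrolled": rng.randint(1, 8),
        })
    return profiles

profiles = generate_profiles()
print(len(profiles))  # 500
```

Independent sampling keeps the sketch short, but it can produce combinations that real records never contain, such as more failed courses than courses taken.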

Labeling criteria

A student is at risk if any of these apply:

  • GPA below 2.0
  • 3 or more courses failed
  • Average grade points below 1.5
  • GPA below 2.5 AND 2+ courses failed
  • Fewer than 15 credits completed after 4+ semesters

These rules encode what an experienced academic advisor would flag manually. The Random Forest learns the patterns, then generalizes to combinations the rules don't explicitly cover.
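
The five criteria translate almost line for line into a labeling function. This is a sketch: the dictionary keys are hypothetical names chosen for the example, not identifiers taken from the project.

```python
def is_at_risk(p):
    """Return True if the profile trips any of the five labeling rules."""
    return (
        p["gpa"] < 2.0
        or p["courses_failed"] >= 3
        or p["avg_grade_points"] < 1.5
        or (p["gpa"] < 2.5 and p["courses_failed"] >= 2)
        or (p["credits_completed"] < 15 and p["semesters_enrolled"] >= 4)
    )

# Strong GPA, but three failed courses trips the second rule.
strong_gpa_many_failures = {
    "gpa": 3.5, "courses_failed": 3, "avg_grade_points": 3.2,
    "credits_completed": 45, "semesters_enrolled": 4,
}
# No rule applies to this profile.
on_track = {
    "gpa": 3.2, "courses_failed": 0, "avg_grade_points": 3.1,
    "credits_completed": 30, "semesters_enrolled": 2,
}

print(is_at_risk(strong_gpa_many_failures))  # True
print(is_at_risk(on_track))                  # False
```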

Results

              precision    recall  f1-score   support

 Not At Risk       0.88      0.88      0.88        17
     At Risk       0.98      0.98      0.98        83

    accuracy                           0.96       100

96% accuracy on a held-out test set of 100 profiles (20% of the 500 generated). The class imbalance (17 not-at-risk vs. 83 at-risk in test) reflects the labeling criteria: the rules are sensitive, so most generated profiles trip at least one flag.
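
The report does not include a confusion matrix, but the rounded figures are consistent with one set of counts: 15 of the 17 not-at-risk students and 81 of the 83 at-risk students classified correctly. Those counts are inferred from the report, not taken from the project. The arithmetic below reproduces the reported numbers from them.

```python
# Counts inferred from the rounded report above; treat them as an assumption.
tn, fp = 15, 2   # truly not at risk: 15 classified correctly, 2 flagged as at risk
fn, tp = 2, 81   # truly at risk: 2 missed, 81 classified correctly
total = tn + fp + fn + tp

def prf(correct, false_pos, false_neg):
    """Precision, recall, and F1 for one class, rounded to two decimals."""
    precision = correct / (correct + false_pos)
    recall = correct / (correct + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f1, 2)

# For the "Not At Risk" class, the 2 missed at-risk students are its false positives.
not_at_risk = prf(tn, fn, fp)
at_risk = prf(tp, fp, fn)
accuracy = (tn + tp) / total
always_at_risk_baseline = (fn + tp) / total  # accuracy of predicting "At Risk" for everyone

print(not_at_risk)              # (0.88, 0.88, 0.88)
print(at_risk)                  # (0.98, 0.98, 0.98)
print(accuracy)                 # 0.96
print(always_at_risk_baseline)  # 0.83
```

With 83 of 100 test profiles at risk, a classifier that always answers "At Risk" would already score 83%, which is why the per-class rows are more informative here than the headline accuracy.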

For production deployment, the next step would be retraining on real anonymized university data, where the class balance is typically reversed (most students are not at risk).

Feature importance

| Feature | Importance | Interpretation |
| --- | --- | --- |
| GPA | 33.96% | Strongest single predictor of academic success |
| Courses failed | 26.32% | Direct indicator of academic difficulty |
| Avg grade points | 20.94% | Captures grade trajectory beyond cumulative GPA |
| Credits completed | 8.49% | Progress indicator: slow progress signals risk |
| Courses taken | 5.21% | Course load context |
| Semesters enrolled | 5.07% | Time-in-program context |

GPA and failure count together account for about 60% of the model's total feature importance (33.96% + 26.32% = 60.28%). This matches advisor intuition: those are the two things they'd check first when reviewing a student's standing.
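
A short check of that arithmetic, using the percentages from the table. In scikit-learn, values like these would typically be read from a fitted forest's `feature_importances_` attribute; that this is how the project obtained them is an assumption.

```python
# Importance percentages copied from the table above.
importances = {
    "GPA": 33.96,
    "Courses failed": 26.32,
    "Avg grade points": 20.94,
    "Credits completed": 8.49,
    "Courses taken": 5.21,
    "Semesters enrolled": 5.07,
}

# Rank features from most to least important.
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
top_two = sum(value for _, value in ranked[:2])

print([name for name, _ in ranked[:2]])  # ['GPA', 'Courses failed']
print(f"{top_two:.2f}%")                 # 60.28%
```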

Integration architecture

The ML model runs as an independent Flask microservice, decoupled from the Java backend:

  1. Staff clicks "Risk Check" on a student in the Java web app
  2. StudentServlet gathers the student's academic metrics from MySQL (GPA, enrollment count, failed courses, etc.)
  3. MLClient utility class sends an HTTP POST to http://localhost:5000/predict with the metrics as JSON
  4. Flask API loads the pre-trained model from disk (student_risk_model.pkl), runs inference, and returns a JSON response with prediction, confidence score, and a human-readable recommendation
  5. JSP renders the result with color-coded risk status, confidence percentage, risk probability meter, and specific concerns
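
A framework-agnostic sketch of what step 4 might do with the JSON sent in step 3. Everything named here is hypothetical: the field names, the response keys beyond the prediction, confidence, and recommendation mentioned above, the recommendation wording, and `StubModel`, which stands in for the classifier unpickled from student_risk_model.pkl. In the real service this logic would sit inside a Flask route.

```python
FEATURES = [
    "gpa", "courses_taken", "courses_failed",
    "avg_grade_points", "credits_completed", "semesters_enrolled",
]

class StubModel:
    """Stand-in for the loaded classifier; mimics predict_proba only."""
    def predict_proba(self, rows):
        # One [P(not at risk), P(at risk)] pair per row, from a toy rule.
        return [[0.1, 0.9] if r[0] < 2.0 or r[2] >= 3 else [0.85, 0.15]
                for r in rows]

def handle_predict(payload, model):
    """Validate the request body and build a (response, status) pair."""
    missing = [f for f in FEATURES if f not in payload]
    if missing:
        return {"error": "missing fields: " + ", ".join(missing)}, 400
    row = [float(payload[f]) for f in FEATURES]
    # Assumes the second probability column is the at-risk class.
    p_safe, p_risk = model.predict_proba([row])[0]
    at_risk = p_risk >= 0.5
    body = {
        "prediction": "At Risk" if at_risk else "Not At Risk",
        "confidence": round(max(p_safe, p_risk), 2),
        "risk_probability": round(p_risk, 2),
        "recommendation": ("Schedule an advising meeting."
                           if at_risk else "No action needed."),
    }
    return body, 200

example = {
    "gpa": 1.8, "courses_taken": 6, "courses_failed": 1,
    "avg_grade_points": 1.9, "credits_completed": 18, "semesters_enrolled": 2,
}
body, status = handle_predict(example, StubModel())
print(status, body["prediction"], body["confidence"])  # 200 At Risk 0.9
```

Rejecting incomplete payloads with a 400 keeps malformed requests from reaching the model, and keeping the handler a plain function makes it testable without starting a server.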

This microservice pattern means:

  • The ML model can be retrained, updated, or replaced without touching the Java codebase
  • Python and Java codebases evolve independently
  • The model can be A/B tested by spinning up a second Flask service on a different port
  • ML inference (CPU-bound) scales independently from the Java app (I/O-bound)

It's the same pattern used at companies running production ML: model serving is its own service, not a library import.

Graceful degradation

If the Flask service is unavailable (down, slow, or unreachable), the app doesn't crash. The MLClient has a 5-second timeout and catches connection errors. When inference fails, the Risk Check page shows "Risk assessment service unavailable" instead of a 500 error. The rest of the app (enrollments, grades, dashboards) keeps working.
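
The project's MLClient is a Java class, so the following is only a Python analogue of the same behavior: a sketch assuming a JSON POST to the /predict endpoint, a 5-second timeout, and a fallback message when anything goes wrong. The function name `check_risk` and the `available` flag are hypothetical.

```python
import http.client
import json
import urllib.request

UNAVAILABLE = {
    "available": False,
    "message": "Risk assessment service unavailable",
}

def check_risk(metrics, url="http://localhost:5000/predict", timeout=5.0):
    """POST metrics as JSON; return a fallback dict instead of raising."""
    request = urllib.request.Request(
        url,
        data=json.dumps(metrics).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            result = json.load(response)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, and HTTP error statuses
        # (URLError and HTTPError are subclasses); ValueError covers a body
        # that is not valid JSON.
        return dict(UNAVAILABLE)
    if not isinstance(result, dict):
        return dict(UNAVAILABLE)
    return {"available": True, **result}
```

The caller only has to branch on `available`: render the prediction when it is true, and show the message otherwise, so a failed inference call never propagates as an exception into page rendering.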

In distributed systems, every network call is a potential failure point. Designing for graceful degradation from day one is cheaper than retrofitting it after a production outage.

Future enhancements

  • Train on real anonymized university data for higher real-world accuracy
  • Add time-series features: GPA trend across semesters, grade improvement/decline patterns
  • Implement model versioning and A/B testing infrastructure
  • Add batch prediction for end-of-semester risk reports across the full roster
  • Explore gradient boosting (XGBoost) for potential accuracy improvements
  • Add explainability via SHAP values for per-prediction feature attribution