Completed: April to May 2026

AI-powered Student Management System

Production-grade full-stack platform with role-based portals, real-time analytics, and a Random Forest model that predicts academic risk with 96% accuracy.

Java 21, Jakarta EE, MySQL 8, Python, Flask, scikit-learn, Tomcat 10
Languages: Java 85.3%, CSS 10.8%, Python 2.8%, Other 1.1%

Challenges and Solutions

Five real engineering problems I hit during this build, what I tried, what worked, and what I took away from each one. These aren't theoretical: they're the bugs that ate hours of debugging time and shaped how I think about distributed systems, servlet lifecycles, and data migrations.

1. Cross-language ML integration

Challenge: Connecting a Python ML model to a Java web application without tightly coupling the two systems. Bundling Jython or using JNI would have welded the languages together. Both approaches make the ML model impossible to retrain or replace without touching Java code.

Solution: Built the ML component as an independent Flask microservice with a clean REST API (/predict, /health). The Java backend communicates via HTTP POST using a dedicated MLClient utility class with timeout handling and error recovery. If the Flask service is unavailable, the app degrades gracefully, showing a "service unavailable" message instead of crashing.
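The timeout-and-fallback pattern described above can be sketched briefly. This is a Python stand-in for illustration only, not the project's Java MLClient: the base URL, the feature names, and the shape of the fallback dictionary are all assumptions, and only the /predict route comes from the write-up.

```python
import http.client
import json
import urllib.request

# Hypothetical address; the write-up only names the /predict and /health routes.
ML_BASE_URL = "http://127.0.0.1:5000"

# Hypothetical fallback shape shown to the caller when the service is down.
UNAVAILABLE = {"available": False, "message": "service unavailable"}


def predict_risk(features, base_url=ML_BASE_URL, timeout=2.0):
    """POST a JSON feature dict to /predict; return a fallback on any failure."""
    request = urllib.request.Request(
        base_url + "/predict",
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, and HTTP error statuses
        # (urllib.error.URLError subclasses it); ValueError covers malformed JSON.
        return dict(UNAVAILABLE)
    return {"available": True, "prediction": payload}


if __name__ == "__main__":
    # Feature names are placeholders, not the model's real inputs.
    print(predict_risk({"attendance": 0.82, "gpa": 2.9}, timeout=0.5))
```

The caller never sees an exception from the network layer: it gets either a prediction or an explicit "unavailable" result it can render, which is the graceful-degradation behaviour the section describes.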

What I learned: Microservice architecture patterns aren't just a buzzword. They're the right answer when two systems have different evolution rates, different runtime requirements, and different scaling characteristics. Cross-language API design comes down to defining a contract (the JSON schema) and treating each side as a black box. Graceful degradation isn't optional: every network call is a failure point.

2. Servlet filter redirect loops

Challenge: Implementing role-based access control with a servlet filter caused infinite redirect loops. Students logging in would bounce between /portal/home and /login indefinitely because the filter, login servlet, and portal servlet were redirecting to each other.

Solution: Added console logging to the filter to trace every request path and user state. The trace exposed two root causes:

  1. The PortalServlet mapped to /portal/* was intercepting its own JSP forwards
  2. The StudentDAO wasn't reading the user_id column, so the student record lookup always failed silently

Fixed by switching to explicit URL mappings (/portal/home, /portal/grades, /portal/risk) instead of a wildcard, and ensuring all DAO methods read the new database columns.
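Why the wildcard mapping caught the JSP forward can be illustrated with a small model of servlet mapping resolution. This is a simplified Python sketch of the precedence order in the Servlet specification (exact match, then longest path prefix, then extension, then default), not container code. The JSP location /portal/home.jsp and the servlet name JspServlet are assumptions made for the example.

```python
def resolve(path, mappings):
    """Pick the servlet for `path` using simplified Servlet-spec precedence."""
    # 1. An exact pattern wins outright.
    if path in mappings:
        return mappings[path]
    # 2. Otherwise the longest matching path-prefix pattern ("/foo/*").
    prefixes = [
        pattern for pattern in mappings
        if pattern.endswith("/*")
        and (path == pattern[:-2] or path.startswith(pattern[:-1]))
    ]
    if prefixes:
        return mappings[max(prefixes, key=len)]
    # 3. Then an extension pattern ("*.jsp").
    for pattern, servlet in mappings.items():
        if pattern.startswith("*.") and path.endswith(pattern[1:]):
            return servlet
    # 4. Finally the default servlet.
    return mappings.get("/", "DefaultServlet")


# Wildcard mapping: the prefix rule outranks "*.jsp".
WILDCARD = {"/portal/*": "PortalServlet", "*.jsp": "JspServlet"}

# Explicit mappings: nothing but the JSP engine matches the JSP path.
EXPLICIT = {
    "/portal/home": "PortalServlet",
    "/portal/grades": "PortalServlet",
    "/portal/risk": "PortalServlet",
    "*.jsp": "JspServlet",
}

if __name__ == "__main__":
    print(resolve("/portal/home.jsp", WILDCARD))  # PortalServlet
    print(resolve("/portal/home.jsp", EXPLICIT))  # JspServlet
```

Under the wildcard mapping, a forward to a JSP that lives under /portal/ resolves back to the portal servlet, because path-prefix patterns take precedence over extension patterns. With explicit mappings the same forward reaches the JSP engine.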

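What I learned: The servlet request lifecycle matters. A forward stays on the server and reuses the same request, but the container still resolves its target against the servlet mappings, which is how a wildcard-mapped servlet ends up catching its own JSP forward. A redirect makes the browser issue a new request that re-enters the filter chain. Confusing the two is the difference between a working app and an infinite loop. Debugging a request flow that crosses a filter and several servlets requires systematic tracing; guessing where the loop starts is hopeless. Add logging at every checkpoint, run one request, and read the trace top to bottom.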

3. CUDA / TensorFlow GPU compatibility

Challenge: TensorFlow couldn't detect the NVIDIA RTX 4090 despite CUDA 13.0 being installed. The pip-installed TensorFlow was built against CUDA 12.x, and the bundled CUDA libraries weren't on the system library path.

Solution: Installed TensorFlow with pip install tensorflow[and-cuda] to bundle compatible CUDA 12 libraries. Then traced the exact missing library (libcudart.so.12) using ctypes in Python. Added all NVIDIA pip package library paths to LD_LIBRARY_PATH and persisted the configuration in the virtualenv's activate script so it survives shell restarts.
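The library-tracing step can be sketched as follows. This is a minimal illustration rather than the exact script used: libcudart.so.12 is the library named above, while the helper names and the assumed wheel layout (site-packages/nvidia/<package>/lib) are assumptions.

```python
import ctypes
import glob
import os
import sysconfig


def probe_libraries(names):
    """Try to load each shared library; report the ones the loader cannot find."""
    report = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            report[name] = "ok"
        except OSError as exc:
            # The loader's own message usually names the missing file.
            report[name] = f"missing: {exc}"
    return report


def nvidia_lib_dirs():
    """List lib/ directories inside NVIDIA pip wheels (assumed layout)."""
    site_packages = sysconfig.get_paths()["purelib"]
    return sorted(glob.glob(os.path.join(site_packages, "nvidia", "*", "lib")))


if __name__ == "__main__":
    print(probe_libraries(["libcudart.so.12"]))
    dirs = nvidia_lib_dirs()
    if dirs:
        # Emit a line suitable for appending to the virtualenv's activate script.
        print('export LD_LIBRARY_PATH="' + ":".join(dirs) + ':$LD_LIBRARY_PATH"')
```

The export line belongs in the shell environment rather than in Python because the dynamic linker reads LD_LIBRARY_PATH when the process starts; changing it from inside an already running interpreter is too late.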

What I learned: GPU computing environments require careful version alignment across drivers, the CUDA toolkit, cuDNN, and framework-specific builds. The solution isn't always installing the latest release; it's matching compatible versions across the stack. When a library says "can't find GPU," the question is almost never "do I have a GPU?" It's "can the library find the right CUDA runtime, in the right version, on the right path?"

4. Database schema evolution

Challenge: Adding authentication (the users table, student IDs, the user_id foreign key on students) to an existing database with live data required careful schema migration without breaking existing functionality.

Solution: Used ALTER TABLE to add user_id and student_id columns to the students table, then wrote SQL scripts to retroactively generate student IDs and create user accounts for all existing students. Updated all DAO methods to read the new columns. A missed column read in getAllStudents() caused a subtle bug where student-user matching failed silently, visible only as a downstream auth failure and not as a SQL error.
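The migration and the silent failure it produced can be reproduced in a few lines. This sketch uses Python's built-in sqlite3 as a stand-in for MySQL 8 and JDBC; the sample rows, the S0001-style ID format, the username scheme, and the function names are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    -- Pre-migration state: students exist, accounts do not.
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO students (name) VALUES ('Ada'), ('Grace');

    -- Migration: new table plus nullable columns, so existing rows stay valid.
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT UNIQUE NOT NULL,
        role TEXT NOT NULL
    );
    ALTER TABLE students ADD COLUMN student_id TEXT;
    ALTER TABLE students ADD COLUMN user_id INTEGER REFERENCES users(id);
""")

# Backfill: generate an ID and an account for every existing student.
for row in conn.execute("SELECT id FROM students ORDER BY id").fetchall():
    student_id = f"S{row['id']:04d}"  # hypothetical ID format
    cursor = conn.execute(
        "INSERT INTO users (username, role) VALUES (?, 'student')",
        (student_id.lower(),),
    )
    conn.execute(
        "UPDATE students SET student_id = ?, user_id = ? WHERE id = ?",
        (student_id, cursor.lastrowid, row["id"]),
    )
conn.commit()


def get_all_students_stale(db):
    """Pre-migration query: still succeeds, but never reads user_id."""
    return [dict(r) for r in db.execute("SELECT id, name FROM students")]


def get_all_students(db):
    """Post-migration query: reads the new columns explicitly."""
    return [dict(r) for r in db.execute(
        "SELECT id, name, student_id, user_id FROM students")]


def find_student_for_user(students, user_id):
    """Match a logged-in user to a student record; None if no match."""
    return next((s for s in students if s.get("user_id") == user_id), None)


if __name__ == "__main__":
    print(find_student_for_user(get_all_students_stale(conn), 1))
    print(find_student_for_user(get_all_students(conn), 1))
```

The stale query raises nothing. The lookup simply returns no match, which is why the problem surfaced as an auth failure further downstream.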

What I learned: Schema migrations in production require updating every query that touches the modified table, not just the ones you think are affected. A single missed column read can cascade into auth failures that are difficult to diagnose because the SQL succeeds: it just returns incomplete data. The fix is to treat every DAO method as part of the migration's scope.

5. Centralized theming across JSP pages

Challenge: Initially, each JSP page had its own inline <style> block with hundreds of lines of CSS. Adding dark mode meant duplicating theme logic across 10+ files, which is a maintenance nightmare and a guarantee of style drift.

Solution: Extracted all styles into a single theme.css file using CSS custom properties (variables) for every color, shadow, and border. Created a theme.js script that detects the system preference (prefers-color-scheme), supports a manual toggle, and persists the choice in localStorage. Every JSP page references the same two files, so changing a color in one place updates the entire application.

What I learned: CSS architecture matters more than people give it credit for. Custom properties aren't syntactic sugar; they enable runtime theming that's impossible with static values. This is the same pattern used by design systems at GitHub, Stripe, and Linear. The lesson generalizes: centralize the things that change together. If you'd update them in the same PR anyway, they should live in the same place.