The advent of Large Language Models (LLMs) poses a significant challenge to conventional automated programming assessment: students can now readily submit functionally correct code without demonstrating that they understand it. This paper, accepted for publication at CSEDU 2026, addresses that gap by proposing a new paradigm for verifying student understanding in the age of LLM-generated code. The authors conducted a saturation-based scoping review of conversational assessment approaches and identified three dominant architectural families: rule-based/template-driven systems, LLM-based systems, and hybrid systems. While conversational agents show promise for scalable feedback and for probing deeper understanding, challenges such as hallucinations, over-reliance, privacy, and deployment constraints persist.
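To make the first two families concrete, the short sketch below contrasts template-driven question selection with free LLM generation. The class names, the template bank, and the `llm_complete` stub are illustrative assumptions, not systems from the reviewed literature.

```python
# Illustrative contrast between two of the reviewed architectural families.
# All names here are hypothetical; they do not come from the paper.

def llm_complete(prompt: str) -> str:
    """Stand-in for any chat-completion client (hosted API or local model)."""
    return "What would your function return if the input list were empty?"


class RuleBasedAssessor:
    """Rule-based/template-driven: questions come from a fixed, hand-authored bank."""
    TEMPLATES = {
        "loops": "Why did you choose a loop here rather than recursion?",
        "edge-cases": "Which inputs did you consider when testing this function?",
    }

    def ask(self, topic: str) -> str:
        return self.TEMPLATES.get(topic, "Explain this part of your code in your own words.")


class LLMAssessor:
    """LLM-based: questions are generated freely from the student's submission."""
    def ask(self, code: str) -> str:
        return llm_complete(f"Ask one probing question about this code:\n{code}")
```

Hybrid systems, the third family, combine the two: deterministic logic decides what to probe, while the LLM phrases the probe. The framework described below follows that pattern.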
Bridging the Understanding Gap with Hybrid Socratic Assessment
To address these limitations, the authors introduce a Hybrid Socratic Framework that integrates conversational verification into Automated Programming Assessment Systems (APASs). The framework combines deterministic code analysis with a dual-agent conversational layer, and it incorporates knowledge tracking, scaffolded questioning, and guardrails that anchor prompts to verifiable runtime facts. This approach moves beyond mere code correctness to assess genuine comprehension, a pressing need in the era of LLM-assisted programming.
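The sketch below is a minimal, hypothetical rendering of that flow, under stated assumptions: the deterministic layer is reduced to an in-process test harness, the dual-agent layer is modeled as a questioner plus a grounding verifier, and `llm_complete` again stands in for any real LLM client. None of the names (`RuntimeFact`, `KnowledgeState`, `QuestionerAgent`, `VerifierAgent`) come from the paper; they only illustrate how verified runtime facts can anchor and gate generated questions.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeFact:
    """A verifiable observation produced by the deterministic layer."""
    test_name: str
    passed: bool
    detail: str


@dataclass
class KnowledgeState:
    """Simple per-concept mastery estimate, updated after each student answer."""
    mastery: dict = field(default_factory=dict)

    def update(self, concept: str, correct: bool, rate: float = 0.3) -> None:
        prior = self.mastery.get(concept, 0.5)
        self.mastery[concept] = prior + rate * ((1.0 if correct else 0.0) - prior)


def run_deterministic_checks(func, cases) -> list:
    """Deterministic layer: run reference cases and record facts about the run."""
    facts = []
    for name, args, expected in cases:
        try:
            got = func(*args)
            facts.append(RuntimeFact(name, got == expected, f"got {got!r}, expected {expected!r}"))
        except Exception as exc:  # a crash is also a verifiable fact
            facts.append(RuntimeFact(name, False, f"raised {type(exc).__name__}: {exc}"))
    return facts


def llm_complete(prompt: str) -> str:
    """Stand-in for any chat-completion client; swap in a real API call here."""
    return "Can you explain why your function behaves this way?"


class QuestionerAgent:
    """First agent: asks scaffolded Socratic questions anchored to runtime facts."""

    def ask(self, code: str, fact: RuntimeFact, state: KnowledgeState) -> str:
        weakest = min(state.mastery, key=state.mastery.get, default="general understanding")
        prompt = (
            "You are a Socratic tutor. Ask ONE question about the student's code.\n"
            f"Ground it only in this verified fact: test '{fact.test_name}' "
            f"{'passed' if fact.passed else 'failed'} ({fact.detail}).\n"
            f"Target concept: {weakest}. Do not reveal the answer.\n"
            f"Student code:\n{code}"
        )
        return llm_complete(prompt)


class VerifierAgent:
    """Second agent: guardrail that keeps questions anchored to verified facts."""

    def grounded(self, question: str, fact: RuntimeFact) -> bool:
        # Crude proxy for grounding: the question must mention the anchored test.
        return fact.test_name in question


# One assessment turn: facts from the deterministic layer anchor the question;
# if the verifier cannot confirm grounding, a safe template is used instead.
def student_min(xs):  # hypothetical student submission
    return sorted(xs)[0]

facts = run_deterministic_checks(
    student_min,
    [("smallest_of_three", ([3, 1, 2],), 1), ("empty_list", ([],), None)],
)
fact = next((f for f in facts if not f.passed), facts[0])  # probe a failure first
state = KnowledgeState()
state.update("edge-cases", correct=False)

question = QuestionerAgent().ask("def student_min(xs): return sorted(xs)[0]", fact, state)
if not VerifierAgent().grounded(question, fact):
    question = f"Walk me through what happens in the test '{fact.test_name}'."
print(question)
```

The guardrail here is deliberately crude (a string check), but it captures the design choice the paragraph describes: the questioning agent only sees facts the deterministic layer has verified, and a template fallback keeps the dialogue safe whenever grounding cannot be confirmed.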