How Do You Assess Coding Skill When Every Student Has an AI?
For two years, much of what we call coding assessment has measured the AI, not the student. That is not a cheating problem; it is an instrument-validity problem, and the fix is not a better detector. It is a different kind of exam.
Examination Center is in early access, and we are onboarding institutions through our Early Access Program. The information here describes our current platform and direction and may evolve; it is not a contractual commitment.
By the Examination Center team · Last updated: 2026-06-18
We do not have a cheating problem. We have a validity problem.
Programming education has always leaned on a convenient assumption: if the code runs and passes the tests, the student probably understands it. Generative AI severed that link. A student can now submit correct, well-structured code they could not have written and cannot explain.
The artifact looks identical to mastery. The signal and the noise are now visually indistinguishable in the submitted file, so no amount of staring at the code tells you which one you are holding. The instrument we inherited, the take-home and the online code box, was built for a world where producing working code was itself evidence of understanding. That world is gone.
Why the popular fixes do not hold
Four responses dominate faculty meetings. Each fails for a structural reason, not a tooling reason:
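- AI detectors: unreliable for prose and worse for code. Code has low entropy and high convergence: correct solutions to the same problem tend to look alike no matter who or what wrote them, which leaves little signal to separate human from machine. The resulting error rate makes detector output indefensible in an integrity hearing.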
- Banning AI on take-homes: a ban you cannot observe is a wish, not a policy, and it penalizes the honest students who comply.
- Handwritten paper code: measures handwriting and memorization under stress, which is less authentic than the thing you are worried about, not more.
- Oral exams: they genuinely work, but they do not scale past about thirty students without consuming the entire teaching staff.
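To make the detector point concrete, here is a minimal sketch of the base-rate arithmetic behind it. Every number is a hypothetical assumption chosen for illustration, not a measured property of any real detector or class, and the function name `flagged_breakdown` is ours.

```python
def flagged_breakdown(class_size, ai_use_rate, true_positive_rate, false_positive_rate):
    """Expected outcome of running a detector over one class.

    Returns (flagged AI users, flagged honest students, share of flags
    that are correct). All inputs are assumptions supplied by the caller.
    """
    ai_users = class_size * ai_use_rate
    honest = class_size - ai_users
    true_flags = ai_users * true_positive_rate    # AI users the detector catches
    false_flags = honest * false_positive_rate    # honest students wrongly flagged
    total_flags = true_flags + false_flags
    precision = true_flags / total_flags if total_flags else 0.0
    return true_flags, false_flags, precision


# Hypothetical scenario: 200 students, 20% submit AI-written code, and the
# detector catches 80% of them while wrongly flagging 10% of honest work.
true_flags, false_flags, precision = flagged_breakdown(200, 0.20, 0.80, 0.10)
print(f"correct flags: {true_flags:.0f}")
print(f"honest students flagged: {false_flags:.0f}")
print(f"share of flags that are correct: {precision:.0%}")
```

Under these assumed inputs, one flagged student in three did nothing wrong. Real rates may be better or worse, but the shape of the problem holds whenever honest students outnumber the rest: even a modest false-positive rate puts a sizeable share of innocent students in front of an integrity panel.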
The reframe: separate learning from assessment
The claim that AI should be banned from the classroom and the claim that AI makes assessment impossible are the same mistake. Both assume learning and assessment must happen under the same conditions. They should not.
Learning is generative and open: students should use every tool, including AI, to build understanding. Assessment is a controlled measurement: its job is to isolate one variable (what this student can do on their own) and read it cleanly. A measurement you cannot control is not a measurement.
So the answer is not to take AI away from students. It is to give them AI for everything except the moment of measurement, and make that moment controlled, authentic, and observable. Use AI all term; turn it off, and prove it is off, for the exam.
What a defensible coding assessment requires
Once the assessment moment is a controlled measurement, the requirements fall out almost mechanically. A coding exam you could defend has to be:
- Authentic: the student writes and runs real code, the way they actually work, not multiple-choice questions about code.
- Controlled: during the exam, the assistance that invalidates the measurement (AI chat, autocomplete, second-screen paste) is off, and verifiably so.
- Observable: you can see the process, not just the final artifact. Process is the part AI cannot fake for them and the part the submitted file throws away.
- Low-friction: it runs in the browser with nothing to install and lives inside the LMS, or it gets abandoned the first busy week.
- Humane: it assumes good faith, recovers gracefully when a laptop drops Wi-Fi, and never turns a midterm into a hostage situation.
From detecting cheating to designing for integrity
Detection is a losing arms race: every detector invites a workaround and you are always one model release behind. Design is durable. If AI is genuinely unavailable during a controlled, observed exam, there is nothing to detect, because there is nothing to catch.
It is also the honest position with students. "We expect you to use AI to learn, and here is the one bounded window where we measure you without it, so your transcript means something" is a policy students respect. Integrity by design is integrity by respect.
The uncomfortable conclusion
If you teach programming, your assessment instrument is probably broken right now, and patching it with detectors and pledges will not fix it. The fix is a different kind of exam moment, authentic and controlled and observable and humane, designed on purpose and separate from the open, AI-rich way students should learn the rest of the time.
The faculties that get this right will not be the ones with the best detector. They will be the ones who stopped trying to detect and started designing the room where the measurement happens.
Related reading
How to prevent AI cheating in coding exams · Examination Center vs Google Colab · Examination Center vs JupyterHub
FAQ
Should I ban AI from my course?
No. Separate learning from assessment: let students use AI to learn all term, then run a controlled, AI-off exam to measure what they can do unaided. Banning AI from learning handicaps them; measuring while AI is present is not a measurement.
Do AI detectors work for code?
Not reliably. Source code has low entropy and high convergence, so false-positive and false-negative rates are both high, and the results are hard to defend in an integrity hearing. Designing an AI-free exam environment is more durable than detecting after the fact.