How to Design AI-Resistant Coding Exam Questions
AI coding assistants can solve most textbook programming prompts in seconds. This guide shows CS instructors how to write coding exam questions that measure real understanding, and how exam conditions back them up.
Examination Center is in early access — we're onboarding institutions through our Early Access Program. The information here describes our current platform and direction and may evolve; it is not a contractual commitment.
By the Examination Center team · Last updated: 2026-06-18
Why standard coding questions stopped working
Most classic programming prompts ask a student to produce a function from a clear, self-contained spec: "write a function that reverses a linked list" or "implement binary search." These are exactly the prompts that AI assistants handle best, because the problem is common, the spec is complete, and the answer is short. A student can paste the prompt into a chat tool and get working code with comments in one shot.
The harder truth is that you usually can't tell from the final code alone who actually did the thinking. A correct submission looks the same whether the student reasoned through it or copied it. So the fix isn't a cleverer phrasing of the same kind of question. It's a shift toward questions where the thinking is the deliverable, paired with exam conditions that make outside help difficult and visible.
Two levers matter, and you need both. Question design raises the cost of using AI to the point where it's slower than just knowing the material. Exam conditions remove the easy paths and capture evidence when something looks off. One without the other leaves a gap.
Question patterns that resist one-shot AI answers
AI-resistant coding exam questions tend to share a few traits: they depend on context the model doesn't have, they reward explanation over output, or they ask the student to engage with code rather than generate it from scratch. Here are patterns that work in practice.
- Tie the problem to your specific course context. Reference a data structure, a dataset, or an API you built together in lab. "Extend the Grid class from Lab 4 so that..." forces the student to know your code, not a generic version.
- Ask them to debug or extend code, not write it cold. Give a flawed or partial implementation and ask what's wrong and why, then have them fix it. Reading and reasoning about unfamiliar code is a different, harder-to-outsource skill.
- Require an explanation alongside the code. Ask for a short comment block on the time complexity, a trade-off they chose, or why an alternative approach would fail. Graders can probe understanding here even when the code itself is short.
- Add constraints that change the obvious answer. "Solve this without recursion," "in a single pass," or "using only the standard library functions we covered." Constraints break the default solution a model reaches for first.
- Use multi-part questions that build on each other. Part B depends on a decision made in Part A. A student who understands carries it through; a pasted answer to each part in isolation tends to drift or contradict itself.
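To make the debug-and-explain pattern concrete, here is a minimal sketch of how the generic "implement binary search" prompt could be recast. The function names and the planted bug are hypothetical, chosen only for illustration. The student receives the flawed version and is asked to say what's wrong, fix it, and justify the time complexity in a comment.

```python
def find_index_flawed(items, target):
    """Version handed to students. It contains one deliberate bug."""
    lo, hi = 0, len(items) - 1
    while lo < hi:  # BUG: exits before checking the last remaining candidate
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def find_index_fixed(items, target):
    """Possible model answer: the loop must also run when lo == hi."""
    # Explanation a grader might look for: each iteration halves the
    # remaining range of a sorted list, so the search takes O(log n) steps.
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


print(find_index_flawed([1, 3, 5, 7], 7))  # -1 (wrong: 7 is at index 3)
print(find_index_fixed([1, 3, 5, 7], 7))   # 3
```

The fix is a single character, so the marks sit almost entirely in the explanation of why the original loop condition fails, which is the part that is hardest to outsource.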
Make the questions practical, not just AI-proof
It's easy to overcorrect. Questions so obscure that no tool can help often also confuse students who do know the material, and they're harder to grade fairly. The goal is to measure understanding, not to win an arms race.
A useful test: would a student who attended your lectures and did the labs find this fair, while a student relying only on an AI tool would struggle to finish in time? If yes, you're in the right zone. Keep specs clear and keep the scope honest for the time allowed. Difficulty should come from depth of reasoning, not from ambiguity or trivia.
Calibrate with your own materials. Run a draft question through a current AI assistant yourself before the exam. If it produces a complete, correct answer instantly, revise: add course-specific context, a constraint, or an explanation requirement. If it produces something plausible but wrong, or needs information only your class has, you're likely on solid ground.
Question design alone isn't enough: exam conditions matter
Even well-designed questions lose much of their value if a student can quietly run them through an assistant during the exam. This is where the testing environment does work that wording can't. The aim is a setting that's identical for everyone, removes the easiest shortcuts, and records signals a human can review afterward.
Examination Center is built for exactly this part of the problem. It gives every student the same plain, AI-free editor: no built-in AI assistant and no autocomplete, so the environment itself doesn't hand out answers. Code runs right there, with Python in the browser (NumPy, pandas, Matplotlib) and C, C++, Fortran, and Java compiled and run in a secure server sandbox, so your questions can use real tools without students installing anything.
Just as important, it captures integrity evidence for human review: paste events, large or sudden edits, and cross-student code similarity. These are surfaced as evidence for an instructor to interpret, never as automated accusations or verdicts. A burst of pasted code right after a hard sub-question is a prompt to look closer, not a guilty finding. And because the platform does not grade or score, academic judgment stays with you.
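For a sense of what a cross-student similarity signal can look like, here is a rough sketch using Python's standard difflib module. It is an illustration only and not a description of how Examination Center computes similarity; the function name, the 0.9 threshold, and the sample submissions are all assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(submissions, threshold=0.9):
    """Return (student_a, student_b, ratio) for pairs at or above threshold.

    `submissions` maps a student identifier to that student's source text.
    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    flagged = []
    # Sort by student identifier so the output order is stable.
    for (name_a, code_a), (name_b, code_b) in combinations(
        sorted(submissions.items()), 2
    ):
        # ratio() is 1.0 for identical texts and approaches 0.0 as they diverge.
        ratio = SequenceMatcher(None, code_a, code_b, autojunk=False).ratio()
        if ratio >= threshold:
            flagged.append((name_a, name_b, ratio))
    return flagged


# Hypothetical sample data for illustration.
sample = {
    "alice": "def f(n):\n    return n * 2\n",
    "bob": "def f(n):\n    return n * 2\n",
    "carol": "x = 1\n",
}
for name_a, name_b, ratio in similar_pairs(sample):
    print(name_a, name_b, ratio)  # alice bob 1.0
```

A high ratio on a short, constrained question may simply mean there is one natural answer, which is why output like this works as a prompt for human review and not as a verdict.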
One more practical benefit during high-stakes exams: autosave and session recovery mean a frozen or closed browser doesn't wipe out a student's work, so a technical glitch never becomes an integrity dispute.
A workflow you can use this term
Pulling it together, here's a repeatable process for building an exam that holds up.
- Draft questions from your course context using the patterns above: debug-and-extend, explanation-required, constrained, and multi-part.
- Stress-test each one against a live AI tool yourself. Revise anything it solves cleanly and instantly.
- Run the exam in a controlled, AI-free environment that's the same for every student and keeps work safe with autosave.
- Review integrity evidence after the fact, treating paste, sudden-edit, and similarity signals as starting points for a conversation, not conclusions.
- Keep grading in your own workflow, where you can weigh code quality and the reasoning the questions were designed to surface.
The takeaway
No single question is truly AI-proof, and you shouldn't aim for that. What you can do is make outside help slow, awkward, and visible, while measuring the understanding you actually care about.
Good question design and a fair, monitored exam environment reinforce each other. Together they give you results you can trust and defend, without turning your exam into an arms race or your platform into an accuser.
Related reading
Run AI-free Python lab exams · Examination Center vs autograders · Glossary: AI-free exam, integrity evidence · How exams stay secure
Run fair coding exams
Early Access scope: up to 40 students and 1 exam. Indicative pricing only — see pricing or apply for Early Access.
FAQ
What makes a coding exam question "AI-resistant"?
It depends on something an AI assistant doesn't have or doesn't do well: your specific course context, reading and debugging unfamiliar code, explaining trade-offs, or working under constraints that break the obvious solution. Generic, self-contained prompts like "implement binary search" are the easiest for AI to answer in one shot, so they offer the least signal about a student's own ability.
Can I make questions completely impossible for AI to solve?
No, and chasing that usually backfires. Questions obscure enough to defeat every tool tend to confuse students who do know the material and are harder to grade fairly. A better goal is to make AI help slower than simply knowing the content, then back the questions with exam conditions that remove easy shortcuts and capture evidence for review.
How does Examination Center help with AI-resistant exams?
It provides the exam conditions that question design alone can't. Every student gets the same AI-free editor with no built-in assistant or autocomplete, code runs in the browser (Python) or a secure server sandbox (C, C++, Fortran, Java), and the platform captures integrity evidence such as paste events, sudden edits, and code similarity for human review. It does not grade or accuse; judgment stays with you.
Does the platform detect or accuse students of cheating with AI?
No. Examination Center surfaces integrity evidence for a human to interpret, never automated verdicts or accusations. A signal like a large paste after a difficult question is a reason to look more closely, not proof of misconduct. The instructor decides what it means, and grading stays entirely in your own workflow.