There's a version of SHSATlab we almost shipped.

In that version, your child would sit down, attempt an SHSAT practice problem, get it wrong, read a detailed AI explanation — and learn the wrong thing. The explanation would be confident, well-written, and incorrect.

We found this problem ourselves, before any NYC student did. Here's what it was, how common it turned out to be, and what we built to make sure it never reaches a student.

How It Happens

When an AI model generates a test question, it produces several things at once: the question text, the answer choices, the correct answer label, and an explanation.

The problem is that the model generates all of these in a single pass, as a prediction. It doesn't stop and solve the problem independently. It doesn't verify that the answer it labeled as correct is actually correct. It just assigns a label based on patterns in its training data.

For most question types, this works. But for the specific question formats on the Specialized High School Admissions Test — particularly in ELA — the model makes systematic errors.

The most common failure we found: main idea questions where the labeled correct answer only covered part of the passage.

The model would generate a four-paragraph reading comprehension passage, write four answer choices, and label as correct an answer that described the theme of the last paragraph rather than the central idea of the whole piece.

The explanation it wrote was coherent and convincing. A student who read it would nod along, think they understood, and walk away with the wrong mental model for main idea questions.

What We Did When We Found It

The first thing we did was run an audit. We pulled every question in the existing bank and ran each one through a second, independent model call — asking the model to solve the question without knowing what the first pass had labeled as correct.

The disagreement rate was higher than we expected.

Questions where the two passes disagreed were immediately removed from the student-facing bank. All of them, with no exceptions.
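
For readers who want to picture the audit, here is a minimal Python sketch of the idea, not the actual SHSATlab code. The names `Question`, `audit`, and `solve` are hypothetical, and `solve` stands in for whatever second model call is used. The key property is that the solver receives only the prompt and the choices, never the label from the first pass.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Question:
    qid: str
    prompt: str
    choices: dict[str, str]   # e.g. {"A": "...", "B": "..."}
    labeled_answer: str       # label assigned by the generating pass


# Hypothetical solver interface: takes prompt and choices, returns a choice letter.
# It is never given the labeled answer.
Solver = Callable[[str, dict[str, str]], str]


def audit(bank: Iterable[Question], solve: Solver):
    """Split a bank into kept/removed questions and report the disagreement rate."""
    kept, removed = [], []
    for q in bank:
        independent = solve(q.prompt, q.choices)
        # Any disagreement removes the question; there is no tie-breaking here.
        (kept if independent == q.labeled_answer else removed).append(q)
    total = len(kept) + len(removed)
    rate = len(removed) / total if total else 0.0
    return kept, removed, rate
```

Removing on any disagreement is deliberately conservative: it will also discard some questions whose original label was right and whose second pass was wrong, which is the trade described in the next paragraph.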

This meant temporarily having fewer questions available. We made that trade without hesitation. A wrong question in the bank is worse than an empty slot — a wrong question actively harms the student who practices with it.

The System We Built to Replace the Old One

After the audit, we rebuilt the question generation pipeline from scratch.

Every new question now goes through three stages before a student sees it:

Stage 1: Generation. The model generates the question, answer choices, labeled correct answer, and explanation.

Stage 2: Independent verification. A separate model call sees only the question and answer choices, not the answer Stage 1 labeled as correct or its explanation, and solves the question from scratch. If the two answers disagree, the question is automatically rejected.

Stage 3: Human review. Questions that pass Stage 2 go into a review queue. A human looks at each one, checking the question structure, the answer choices, the explanation, and whether the format matches the actual Specialized High School Admissions Test.

Only questions that clear all three stages enter the live bank.
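
The gating between the three stages can be sketched as a small state machine. This is an illustrative Python outline under stated assumptions, not the production pipeline: `Draft`, `Status`, `run_automated_stages`, and `record_human_review` are hypothetical names, and `generate` and `solve` are placeholders for the two separate model calls.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    DRAFT = "draft"
    REJECTED = "rejected"
    PENDING_REVIEW = "pending_review"
    LIVE = "live"


@dataclass
class Draft:
    prompt: str
    choices: dict[str, str]
    labeled_answer: str
    explanation: str
    status: Status = Status.DRAFT


def run_automated_stages(
    generate: Callable[[], Draft],
    solve: Callable[[str, dict[str, str]], str],
) -> Draft:
    draft = generate()                                # Stage 1: generation
    independent = solve(draft.prompt, draft.choices)  # Stage 2: no label, no explanation
    if independent == draft.labeled_answer:
        draft.status = Status.PENDING_REVIEW          # queued for a human
    else:
        draft.status = Status.REJECTED                # automatic rejection
    return draft


def record_human_review(draft: Draft, approved: bool) -> Draft:
    # Stage 3: only drafts that passed Stage 2 are eligible for review.
    if draft.status is not Status.PENDING_REVIEW:
        raise ValueError("only drafts that passed Stage 2 can be reviewed")
    draft.status = Status.LIVE if approved else Status.REJECTED
    return draft
```

In this sketch there is no code path from `DRAFT` or `REJECTED` to `LIVE`, so a question cannot reach the live bank without clearing both the independent check and the human review.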

We also built format-specific rules for SHSAT ELA questions that address the specific failure modes we found:

Main idea answers must account for the entire passage, not a single section

Inference answers must be directly supported by text, not just plausible

Revision/Editing questions must use the actual numbered-sentence format from the real SHSAT

The labeled correct answer on multiple-choice questions must match the independently verified answer
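
Some of these rules need human judgment, but the more mechanical ones can be checked in code. As a rough Python sketch, here is one way a numbered-sentence check might look. It assumes sentences are marked as "(1)", "(2)", and so on, which is an assumption made for illustration, and the function names are hypothetical.

```python
import re


def sentence_numbers(passage: str) -> list[int]:
    """Return every parenthesized integer marker, in order of appearance."""
    return [int(n) for n in re.findall(r"\((\d+)\)", passage)]


def has_consecutive_numbering(passage: str) -> bool:
    """True if markers exist and run 1, 2, 3, ... with no gaps or repeats."""
    nums = sentence_numbers(passage)
    return bool(nums) and nums == list(range(1, len(nums) + 1))
```

A check this simple would misread a parenthetical year such as "(1990)" as a sentence marker, so it is best treated as a first filter ahead of human review rather than a replacement for it.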

Why This Matters for NYC Families

The Specialized High School Admissions Test is a single high-stakes exam. There are no retakes for 8th graders. Every hour your child spends practicing is an investment, and like any investment, the quality of the practice material determines the return.

A student who practices 500 questions on a platform where 10–15% of the questions have wrong answer keys isn't building accurate skills. At that rate, 50 to 75 of those questions teach the wrong lesson. The student is building confident-but-wrong intuitions that will cost them on test day.

The three months we spent fixing this problem weren't visible to users. There was no feature announcement, no new screen in the app. Just a guarantee that the questions your child practices on are the ones they should be learning from.

That's the standard we hold ourselves to — and the one every SHSAT prep platform for New York City students should be held to.

We Spent 3 Months Fixing a Problem You'd Never Notice (Unless Your Child Got It Wrong)
