If Testing Companies Use AI to Grade

Source: Nick Potkalitsky Substack
Author: Nick Potkalitsky
Original source: https://nickpotkalitsky.substack.com/p/if-testing-companies-use-ai-to-grade

Private backup: the full article text is archived in the private repository at archives/articles/nickpotkalitsky-substack-com-if-testing-companies-use-ai-to-grade.source.md. It is not published on the public Quartz site.

Summary

Nick Potkalitsky investigates what it means when testing companies or districts use “AI” to grade student writing. He distinguishes Ohio’s operational use of discriminative AI, which classifies existing writing under human-validated scoring systems, from generative AI tools like ChatGPT, which produce new text and remain unreliable for grading. The article emphasizes risks such as prompt sensitivity, model drift, inconsistent harshness or leniency, and potential bias against English learners when training data is insufficient or unrepresentative. Potkalitsky argues that educators should ask precise questions about the type of AI in use, its training data, validation, human oversight, and which students might be disadvantaged.

Big ideas

Claims

Key evidence and examples

  • Ohio’s system is described as a hybrid human-AI process rather than ChatGPT-style grading.
  • The article contrasts operational scoring systems with generative AI experiments that are sensitive to prompts, model versions, and scoring persona.
  • It cites concerns that AI scoring can rate English learner essays lower than native-speaker essays that human raters judged equal.
  • The article calls for documentation, validation, representative data, bias testing, and human oversight before consequential grading use.

Education relevance

Highly relevant to standardized testing, district AI procurement, writing assessment, English learner equity, and responsible AI governance in schools.

My notes