If Testing Companies Use AI to Grade
Source: Nick Potkalitsky Substack
Author: Nick Potkalitsky
Original source: https://nickpotkalitsky.substack.com/p/if-testing-companies-use-ai-to-grade
Private backup: the full article text is archived in the private repository at archives/articles/nickpotkalitsky-substack-com-if-testing-companies-use-ai-to-grade.source.md. It is not published on the public Quartz site.
Summary
Nick Potkalitsky examines what it means when standardized tests or districts use “AI” to grade student writing. He distinguishes Ohio’s operational use of discriminative AI, which classifies existing writing within human-validated scoring systems, from generative AI tools like ChatGPT, which produce new text and remain unreliable for grading. The article highlights prompt sensitivity, model drift, inconsistent harshness or leniency, and potential bias against English learners when training data is insufficient or unrepresentative. Potkalitsky argues that educators should ask precise questions: which type of AI is used, what data it was trained on, how it was validated, what human oversight is in place, and which students might be disadvantaged.
Big ideas
- AI tools should be judged by the work they will actually do
- Schools should start with learning values before choosing AI tools
- District AI work is a long-term redesign project
- AI access tiers can widen educational inequity
Claims
- AI grading systems need transparency, validation, and bias checks
- AI tools should be tested on the real tasks they will be used for
- Schools should start with learning values before choosing AI tools
- Rushed school AI plans can worsen wellbeing and equity risks
Key evidence and examples
- Ohio’s system is described as a hybrid human-AI process rather than ChatGPT-style grading.
- The article contrasts operational scoring systems with generative AI experiments that are sensitive to prompts, model versions, and scoring persona.
- It cites concerns that essays by English learners can receive lower AI scores than native-speaker essays that human raters judged to be of equal quality.
- The article calls for documentation, validation, representative data, bias testing, and human oversight before consequential grading use.
Education relevance
Highly relevant for standardized testing, district AI procurement, writing assessment, English learner equity, and responsible AI governance in schools.