Giving Your AI a Job Interview

Source: One Useful Thing
Author: Ethan Mollick
Original source: https://www.oneusefulthing.org/p/giving-your-ai-a-job-interview

Private backup: the full article text is archived in the private repository at archives/articles/oneusefulthing-org-giving-your-ai-a-job-interview.source.md. It is not published on the public Quartz site.

Summary

Ethan Mollick argues that public AI benchmarks are useful for tracking broad model progress but insufficient for deciding which AI system is best for a specific person or organization. Benchmarks can be contaminated, poorly calibrated, narrow, or disconnected from local work. Individuals may rely on personal “vibes” tests, but organizations need rigorous evaluations built from realistic tasks, repeated trials, and expert assessment. The core metaphor is that AI systems should be evaluated less like generic software and more like candidates interviewing for the actual work and judgments they will perform.
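As an illustration only, the organization-scale evaluation described above (realistic tasks, repeated trials, expert assessment) could be sketched roughly as follows. This is not code from the article. The function names `blind` and `summarize`, the `item-N` identifiers, and the numeric scoring scale are all hypothetical choices made for this sketch, and it assumes model outputs have already been collected and that experts grade by hand.

```python
import random
import statistics
from collections import defaultdict


def blind(outputs, seed=0):
    """Hide which model produced each output before expert grading.

    outputs: list of (model, task, text) tuples, one per trial.
    Returns (blinded, key): blinded is a shuffled list of
    (item_id, task, text); key maps item_id -> (model, task).
    """
    rng = random.Random(seed)
    shuffled = list(outputs)
    rng.shuffle(shuffled)
    blinded, key = [], {}
    for i, (model, task, text) in enumerate(shuffled):
        item_id = f"item-{i}"  # hypothetical anonymous label
        key[item_id] = (model, task)
        blinded.append((item_id, task, text))
    return blinded, key


def summarize(scores, key):
    """Aggregate expert scores over repeated trials.

    scores: dict of item_id -> numeric expert score (scale is an assumption).
    key: the item_id -> (model, task) mapping returned by blind().
    Returns a dict of (model, task) -> trial count, mean, and spread.
    """
    grouped = defaultdict(list)
    for item_id, score in scores.items():
        grouped[key[item_id]].append(score)
    return {
        pair: {
            "trials": len(vals),
            "mean": statistics.mean(vals),
            # Population standard deviation; 0 when there is a single trial.
            "spread": statistics.pstdev(vals),
        }
        for pair, vals in grouped.items()
    }
```

In use, each model would be run several times on each task, the outputs passed through `blind`, graded by experts who see only the anonymous items, and the grades passed to `summarize`. Reporting the spread alongside the mean is one way to show how much a model varies across repeated trials of the same task.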

Big ideas

Claims

Key evidence and examples

  • The article lists benchmark problems including training-data contamination, uncertainty about what a benchmark actually measures, poor calibration, and errors in the questions themselves.
  • Mollick contrasts personal tests such as image prompts or small coding/writing challenges with organization-scale evaluations.
  • OpenAI’s GDPval is used as an example of workplace-task evaluation with expert-created tasks and blind expert grading.
  • The “GuacaDrone” example shows that models differ in optimism, skepticism, and risk tolerance, and that these differences shape the recommendations they give.

Education relevance

Useful for schools and universities choosing AI tools or teaching AI literacy: educators should test tools against real instructional, assessment, advising, administrative, and policy tasks rather than relying only on vendor claims or leaderboard scores.

My notes