rubric_calibration_events: 28

id: 28
teacher_id: deepseek-v4-pro
gradation_id_a: 88
gradation_id_b: 87
teacher_choice: A
correct: 1
response_time_ms: 19434
created_at: 2026-05-26 02:34:03
confidence: (empty)
perceived_difficulty: (empty)
influential_feature: evidence_quality
margin: clearly
rubric_version: 1
reasoning: Response A demonstrates a more sophisticated evaluation of evidence by identifying a credibility gap (specific data early vs. vague attributions later) and an unresolved tension between paragraphs, while Response B merely analyzes a single framing detail. This deeper critique of how evidence quality varies within the text makes A clearly superior.