rubric_calibration_events: 28
| id | teacher_id | gradation_id_a | gradation_id_b | teacher_choice | correct | response_time_ms | created_at | confidence | perceived_difficulty | influential_feature | margin | rubric_version | reasoning |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28 | deepseek-v4-pro | 88 | 87 | A | 1 | 19434 | 2026-05-26 02:34:03 |  |  | evidence_quality | clearly | 1 | Response A demonstrates a more sophisticated evaluation of evidence by identifying a credibility gap (specific data early vs. vague attributions later) and an unresolved tension between paragraphs, while Response B merely analyzes a single framing detail. This deeper critique of how evidence quality varies within the text makes A clearly superior. |