rubric_calibration_events
143 rows
| id | teacher_id | gradation_id_a | gradation_id_b | teacher_choice | correct | response_time_ms | created_at | confidence | perceived_difficulty | influential_feature | margin | rubric_version | reasoning |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | teacher-kyle | 164 | 163 | B | 0 | | 2026-05-24 00:21:28 | | | | | 1 | |
| 2 | teacher-kyle | 131 | 130 | A | 1 | | 2026-05-25 02:13:59 | | | | | 1 | |
| 3 | teacher-kyle | 136 | 135 | A | 1 | | 2026-05-25 02:14:24 | | | | | 1 | |
| 4 | teacher-kyle | 162 | 163 | A | 0 | 2065 | 2026-05-25 22:03:29 | | | | somewhat | 1 | |
| 5 | deepseek-v4-pro | 162 | 161 | A | 1 | 11713 | 2026-05-26 02:27:12 | | | independence_level | clearly | 1 | Response A shows the student independently selecting a scene, identifying relevant evidence, and connecting it to their claim with personal reasoning, while Response B relies entirely on the teacher’s direction and a provided sentence starter, demonstrating minimal self-directed effort. |
| 6 | deepseek-v4-pro | 163 | 162 | A | 1 | 16600 | 2026-05-26 02:27:29 | | | original_thinking | clearly | 1 | Response A demonstrates original thinking by interpreting the detail's emotional nuance and constructing a paragraph around that insight, while Response B follows a checklist mechanically and offers only a surface-level connection. |
| 7 | deepseek-v4-pro | 163 | 164 | B | 1 | 19508 | 2026-05-26 02:27:49 | | | original_thinking | clearly | 1 | Response B shows a higher degree of autonomy by noting that the student independently flagged a key passage on a first read before the assignment and then, on their own initiative, synthesized a second passage from discussion to create a richer comparison, demonstrating original thinking beyond a single prompted analysis. |
| 8 | deepseek-v4-pro | 165 | 166 | B | 1 | 14903 | 2026-05-26 02:28:05 | | | evidence_quality | obviously | 1 | Response B incorporates a direct quotation from the text and explains its significance, demonstrating stronger evidence use and analysis, whereas Response A merely summarizes without textual support. |
| 9 | deepseek-v4-pro | 166 | 167 | A | 0 | 23468 | 2026-05-26 02:28:29 | | | | | 1 | |
| 10 | deepseek-v4-pro | 168 | 167 | A | 1 | 22540 | 2026-05-26 02:28:52 | | | depth_of_thinking | clearly | 1 | Response A draws on evidence from two distinct points in the text to trace a character's developing complexity, showing a layered interpretation. Response B's analysis, while focused, remains confined to a single moment and lacks that cross-textual insight. |
| 11 | deepseek-v4-pro | 130 | 129 | A | 1 | 17918 | 2026-05-26 02:29:10 | | | self_awareness | clearly | 1 | Response A demonstrates self-awareness by reflecting on the specific tool (template) that guided their thinking and admitting what they wouldn't have connected on their own, showing an understanding of their learning process. In contrast, Response B simply recounts following teacher instructions and needing direct guidance, lacking this metacognitive reflection. |
| 12 | deepseek-v4-pro | 131 | 130 | A | 1 | 18293 | 2026-05-26 02:29:29 | | | original_thinking | clearly | 1 | Response A demonstrates original thinking by independently tracking how Elena redefines freedom across multiple scenes and interpreting symbolic details like the burning letter's curl; Response B explicitly credits a template for its connections, showing less independent analysis. |
| 13 | deepseek-v4-pro | 131 | 132 | B | 1 | 13235 | 2026-05-26 02:29:43 | | | original_thinking | clearly | 1 | Response B extends beyond textual analysis to forge an original intertextual connection and frames the argument as a planned discussion contribution, while Response A remains a solid but more conventional internal tracking of a character's conflict. |
| 14 | deepseek-v4-pro | 134 | 133 | A | 1 | 10939 | 2026-05-26 02:29:55 | | | depth_of_thinking | clearly | 1 | Response A delves into Elena's internal conflict and mixed feelings with specific textual support, revealing a layered character, whereas Response B stays at a superficial level, merely labeling her traits without analysis. |
| 15 | deepseek-v4-pro | 134 | 135 | B | 1 | 13411 | 2026-05-26 02:30:09 | | | depth_of_thinking | clearly | 1 | Response B analyzes how Elena's conflicting motivations connect to the story's central argument about freedom, moving beyond personal conflict to thematic significance, whereas Response A only describes her mixed feelings. |
| 16 | deepseek-v4-pro | 135 | 136 | B | 1 | 19204 | 2026-05-26 02:30:28 | | | depth_of_thinking | clearly | 1 | Response B introduces the mother’s unsent brochures to construct a generational parallel, then questions whether Elena’s choice can be free at all—shifting the interpretation from resolved transformation to unresolved cycle. This layered analysis moves beyond personal motivation to interrogate the text’s deeper argument about inherited obligation and ambiguity. |
| 17 | deepseek-v4-pro | 169 | 170 | B | 1 | 22131 | 2026-05-26 02:30:51 | | | independence_level | clearly | 1 | Response A relies entirely on teacher explanation, provided charts, and direct pointing to a passage, while Response B builds on a group discussion prompt to generate an original insight and then independently seeks out further evidence. |
| 18 | deepseek-v4-pro | 171 | 170 | A | 1 | 23114 | 2026-05-26 02:31:15 | | | independence_level | clearly | 1 | Response A independently analyzes the text with no mention of external guidance, whereas Response B relies on a group discussion prompt and acknowledges not having the insight without it. This stark difference in self-directed thinking makes A more autonomous. |
| 19 | deepseek-v4-pro | 172 | 171 | A | 1 | 17791 | 2026-05-26 02:31:33 | | | original_thinking | clearly | 1 | Response A makes an original argument about how the narrative challenges default ways of knowing and connects independently to another text, while B offers a more conventional analysis limited to the text. |
| 20 | deepseek-v4-pro | 173 | 174 | B | 1 | 8966 | 2026-05-26 02:31:43 | | | depth_of_thinking | clearly | 1 | Response B provides a deeper analysis of how Davi's limited perspective shapes the reader's experience and creates isolation, using a specific quote to support the reasoning, while Response A only identifies the narrator and basic sensory details. |
| 21 | deepseek-v4-pro | 174 | 175 | B | 1 | 16112 | 2026-05-26 02:32:00 | | | depth_of_thinking | clearly | 1 | Response B offers a nuanced analysis of how the POV creates meaning, interpreting the simile's details and connecting them to a thematic argument about isolation and perception, while Response A only describes the basic emotional effect of the limited perspective. |
| 22 | deepseek-v4-pro | 176 | 175 | A | 1 | 15534 | 2026-05-26 02:32:16 | | | depth_of_thinking | clearly | 1 | Response A delves into the epistemological implications of the point of view, analyzing how the narration exposes the sighted reader's perceptual bias and treats non-visual knowledge as authoritative, while Response B interprets the same evidence mainly to highlight theme and character isolation, offering a less layered analysis. |
| 23 | deepseek-v4-pro | 81 | 82 | B | 1 | 13746 | 2026-05-26 02:32:30 | | | independence_level | clearly | 1 | Response A depends on the teacher’s specific direction to locate evidence, while Response B uses a general checklist to independently select and justify a piece of evidence, demonstrating higher self-directed learning. |
| 24 | deepseek-v4-pro | 82 | 83 | B | 1 | 23298 | 2026-05-26 02:32:54 | | | independence_level | clearly | 1 | Response B shows a student independently evaluating evidence strength ('That felt like the strongest evidence') and building an analysis around it, while Response A merely follows a checklist to insert a required quote, indicating greater reliance on external scaffolding. |
| 25 | deepseek-v4-pro | 83 | 84 | B | 1 | 21690 | 2026-05-26 02:33:16 | | | self_awareness | clearly | 1 | Response B demonstrates a deeper self-awareness by critically evaluating the evidence's selective use and adjusting their analysis to more nuanced data, while Response A simply selects and analyzes a given piece of evidence without questioning its validity. |
| 26 | deepseek-v4-pro | 86 | 85 | A | 1 | 13493 | 2026-05-26 02:33:30 | | | evidence_quality | clearly | 1 | Response A directly quotes specific textual evidence and explains its significance, while Response B only makes general claims without citing any concrete details from the article. |
| 27 | deepseek-v4-pro | 87 | 86 | A | 1 | 11979 | 2026-05-26 02:33:43 | | | depth_of_thinking | obviously | 1 | Response A moves beyond simple identification of evidence to analyze the author's framing and rhetorical choices—explaining how the pairing of the statistic with 'accelerated' builds a policy argument and shifts responsibility. Response B merely summarizes the point of the evidence. |
| 28 | deepseek-v4-pro | 88 | 87 | A | 1 | 19434 | 2026-05-26 02:34:03 | | | evidence_quality | clearly | 1 | Response A demonstrates a more sophisticated evaluation of evidence by identifying a credibility gap (specific data early vs. vague attributions later) and an unresolved tension between paragraphs, while Response B merely analyzes a single framing detail. This deeper critique of how evidence quality varies within the text makes A clearly superior. |
| 29 | deepseek-v4-pro | 257 | 258 | B | 1 | 20644 | 2026-05-26 02:34:24 | | | self_awareness | somewhat | 1 | Response B demonstrates greater autonomy by showing self-awareness—the student recognizes a gap in their rebuttal and considers how to improve, whereas Response A simply completed the worksheet without reflection or initiative. |
| 30 | deepseek-v4-pro | 259 | 258 | A | 1 | 19709 | 2026-05-26 02:34:45 | | | original_thinking | clearly | 1 | Response A demonstrates original thinking by crafting a nuanced claim about identity narrowing rather than repeating the typical burnout argument, and by devising a sophisticated strategy to engage with data-driven counterclaims. In contrast, Response B relies on a standard claim and acknowledges that it hasn't fully developed the counterargument. |
| 31 | deepseek-v4-pro | 260 | 259 | A | 1 | 18402 | 2026-05-26 02:35:04 | | | self_awareness | clearly | 1 | Response A demonstrates deeper self-awareness by describing the iterative revision of their claim based on thoughtful consideration of counterarguments, actively seeking to 'steelman' the most challenging point to make their essay more honest. Response B also engages a counterclaim but lacks the same explicit reflection on the development of their own thinking. |
| 32 | deepseek-v4-pro | 262 | 261 | A | 1 | 13723 | 2026-05-26 02:35:18 | | | clarity_of_reasoning | obviously | 1 | Response A makes a clear claim and supports it with reasoning and evidence, while Response B only announces the topic without taking a position. |
| 33 | deepseek-v4-pro | 263 | 262 | A | 1 | 20237 | 2026-05-26 02:35:39 | | | depth_of_thinking | clearly | 1 | Response A demonstrates nuanced understanding by acknowledging valid research while critiquing its misapplication, whereas Response B offers a one-sided, simplistic argument. |
| 34 | deepseek-v4-pro | 263 | 264 | B | 1 | 21272 | 2026-05-26 02:36:01 | | | depth_of_thinking | clearly | 1 | Response B moves beyond a surface-level critique of misapplied science to examine the structural economic incentives and social dynamics that drive early specialization, demonstrating a more layered and sophisticated analysis of the issue. |
| 35 | deepseek-v4-pro | 249 | 250 | B | 1 | 19081 | 2026-05-26 02:36:20 | | | independence_level | clearly | 1 | Response B demonstrates greater autonomy by independently using a model essay and rubric to structure the argument and self-assessing weaknesses, whereas Response A relied on teacher-provided evidence and direct assistance to connect evidence to claims. |
| 36 | deepseek-v4-pro | 250 | 251 | B | 1 | 17805 | 2026-05-26 02:36:39 | | | original_thinking | clearly | 1 | Response B demonstrates original thinking by independently crafting a counterclaim that engages with specific data and a nuanced rebuttal, while Response A relies heavily on a model essay and produces a generic dismissal. |
| 37 | deepseek-v4-pro | 252 | 251 | A | 1 | 22128 | 2026-05-26 02:37:01 | | | self_awareness | clearly | 1 | Response A demonstrates self-awareness by recognizing an initial oversimplification and revising the entire argument's framing, whereas Response B only constructs a standard counterargument without reflecting on or altering its original thesis. |
| 38 | deepseek-v4-pro | 253 | 254 | B | 1 | 11771 | 2026-05-26 02:37:14 | | | evidence_quality | clearly | 1 | Response B includes a specific citation and data (elephants having less than 1% of natural range) and acknowledges a counterargument, while Response A uses only broad, unsupported claims. |
| 39 | deepseek-v4-pro | 255 | 254 | A | 1 | 20224 | 2026-05-26 02:37:35 | | | depth_of_thinking | clearly | 1 | Response A engages deeply with the counterclaim by citing specific species recovery examples and using statistical evidence to rebut, whereas B dismisses it with a simpler, less substantiated principle. |
| 40 | deepseek-v4-pro | 256 | 255 | A | 1 | 14983 | 2026-05-26 02:37:50 | | | depth_of_thinking | clearly | 1 | Response A moves beyond direct evidence to analyze the structural interdependence of entertainment and conservation, revealing a systems-level problem that complicates the ethical debate in a way Response B's more straightforward argument does not. |
| 41 | deepseek-v4-pro | 338 | 337 | A | 1 | 18422 | 2026-05-26 02:38:09 | | | self_awareness | clearly | 1 | Response A shows a student actively evaluating their own thesis and identifying a need for greater specificity, demonstrating self-awareness and initiative. Response B relies heavily on a teacher-provided frame with little evidence of independent thought. |
| 42 | deepseek-v4-pro | 339 | 338 | A | 1 | 14864 | 2026-05-26 02:38:25 | | | independence_level | clearly | 1 | Response A demonstrates self-directed revision from a simple fact to a nuanced, arguable thesis, while Response B relies on external prompts and remains uncertain about how to improve further. |
| 43 | deepseek-v4-pro | 339 | 340 | B | 1 | 18238 | 2026-05-26 02:38:44 | | | initiative | clearly | 1 | Response B demonstrates greater initiative by rewriting the thesis after research revealed a deeper structural insight, proactively restructuring the essay to reflect a more fundamental argument shift, while Response A revises primarily for nuance without the same level of proactive restructuring. |
| 44 | deepseek-v4-pro | 342 | 341 | A | 1 | 10762 | 2026-05-26 02:38:55 | | | clarity_of_reasoning | clearly | 1 | Response A presents a clear, arguable thesis with a specific cause-and-effect reasoning chain, while Response B merely notes the topic and differing opinions without taking a stance or providing any reasoning. |
| 45 | deepseek-v4-pro | 342 | 343 | B | 1 | 13742 | 2026-05-26 02:39:09 | | | overall_sophistication | clearly | 1 | Response B presents a nuanced, multi-layered argument connecting sleep science, resource allocation, and socioeconomic disparities, whereas Response A is a simple, one-dimensional claim. |
| 46 | deepseek-v4-pro | 344 | 343 | A | 1 | 15311 | 2026-05-26 02:39:25 | | | overall_sophistication | somewhat | 1 | Response A offers a more incisive critique by examining how the framing of the debate itself protects the status quo, converting a political decision into a neutral-sounding constraint, while Response B presents a clearer but more conventional evidence-based link to resource inequality. |
| 47 | deepseek-v4-pro | 305 | 306 | B | 1 | 14628 | 2026-05-26 02:39:40 | | | independence_level | clearly | 1 | Response B demonstrates significantly more autonomy by independently using a reverse outline to diagnose and fix redundancy, strategically reordering paragraphs, and choosing transitions purposefully, whereas Response A relies entirely on the teacher's provided outline and class brainstorming order without self-initiated organizational decisions. |
| 48 | deepseek-v4-pro | 306 | 307 | B | 1 | 15657 | 2026-05-26 02:39:57 | | | original_thinking | clearly | 1 | Response B shows original thinking by independently designing a purposeful organizational arc that builds argumentative stakes, while Response A's revisions were largely prompted by a teacher and an exercise. |
| 49 | deepseek-v4-pro | 308 | 307 | A | 1 | 17047 | 2026-05-26 02:40:14 | | | original_thinking | clearly | 1 | Response A shows more original thinking by inventively restructuring the essay to start with a strengthened counterargument for rhetorical impact and cutting a summary in favor of a question to maintain momentum, whereas Response B follows a more predictable progression. |
| 50 | deepseek-v4-pro | 309 | 310 | B | 1 | 10342 | 2026-05-26 02:40:25 | | | clarity_of_reasoning | clearly | 1 | Response B demonstrates clear organization with a thesis, separate body paragraphs, a counterargument, and a conclusion, while Response A presents a disjointed list of claims without logical progression or structure. |
CREATE TABLE rubric_calibration_events (
  id INTEGER PRIMARY KEY,
  teacher_id TEXT NOT NULL,
  gradation_id_a INTEGER NOT NULL,
  gradation_id_b INTEGER NOT NULL,
  teacher_choice TEXT NOT NULL,
  correct INTEGER,
  response_time_ms INTEGER,
  created_at TEXT DEFAULT (datetime('now')),
  confidence TEXT,
  perceived_difficulty TEXT,
  influential_feature TEXT,
  margin TEXT,
  rubric_version INTEGER DEFAULT 1,
  reasoning TEXT
);
CREATE INDEX idx_calibration_teacher ON rubric_calibration_events(teacher_id);
CREATE INDEX idx_calibration_correct ON rubric_calibration_events(correct);
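
One way to read the schema above is as a log of pairwise judgments that can be summarized per rater. The sketch below is illustrative only: it loads the schema into an in-memory SQLite database with a few rows copied from the listing (ids 1-4 and 8-10, reasoning and feature columns left empty), then groups by `teacher_id`. The helper names `build_demo_db` and `accuracy_by_rater` are hypothetical, and the query assumes that `correct = 1` marks a choice that agreed with the expected ordering, which the source does not state explicitly.

```python
import sqlite3

# Schema taken from the CREATE TABLE / CREATE INDEX statements in this document.
SCHEMA = """
CREATE TABLE rubric_calibration_events (
  id INTEGER PRIMARY KEY,
  teacher_id TEXT NOT NULL,
  gradation_id_a INTEGER NOT NULL,
  gradation_id_b INTEGER NOT NULL,
  teacher_choice TEXT NOT NULL,
  correct INTEGER,
  response_time_ms INTEGER,
  created_at TEXT DEFAULT (datetime('now')),
  confidence TEXT,
  perceived_difficulty TEXT,
  influential_feature TEXT,
  margin TEXT,
  rubric_version INTEGER DEFAULT 1,
  reasoning TEXT
);
CREATE INDEX idx_calibration_teacher ON rubric_calibration_events(teacher_id);
CREATE INDEX idx_calibration_correct ON rubric_calibration_events(correct);
"""

# Rows copied from the listing (ids 1-4 and 8-10); only the first eight
# columns are filled in, the remaining columns fall back to NULL or defaults.
SAMPLE_ROWS = [
    (1, "teacher-kyle", 164, 163, "B", 0, None, "2026-05-24 00:21:28"),
    (2, "teacher-kyle", 131, 130, "A", 1, None, "2026-05-25 02:13:59"),
    (3, "teacher-kyle", 136, 135, "A", 1, None, "2026-05-25 02:14:24"),
    (4, "teacher-kyle", 162, 163, "A", 0, 2065, "2026-05-25 22:03:29"),
    (8, "deepseek-v4-pro", 165, 166, "B", 1, 14903, "2026-05-26 02:28:05"),
    (9, "deepseek-v4-pro", 166, 167, "A", 0, 23468, "2026-05-26 02:28:29"),
    (10, "deepseek-v4-pro", 168, 167, "A", 1, 22540, "2026-05-26 02:28:52"),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the schema and the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO rubric_calibration_events "
        "(id, teacher_id, gradation_id_a, gradation_id_b, teacher_choice, "
        " correct, response_time_ms, created_at) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


def accuracy_by_rater(conn: sqlite3.Connection) -> list:
    """Return (teacher_id, events, correct_count, accuracy, mean_response_ms).

    Assumption: correct = 1 means the choice matched the expected ordering.
    AVG() skips NULLs, so events without a response time do not affect the mean.
    """
    query = """
        SELECT teacher_id,
               COUNT(*)              AS n_events,
               SUM(correct)          AS n_correct,
               AVG(correct)          AS accuracy,
               AVG(response_time_ms) AS mean_response_ms
        FROM rubric_calibration_events
        WHERE correct IS NOT NULL
        GROUP BY teacher_id
        ORDER BY teacher_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in accuracy_by_rater(build_demo_db()):
        print(row)
```

Because the query filters and groups on `correct` and `teacher_id`, it lines up with the two indexes defined above. The figures it prints describe only the seven sample rows, not the full 143-row table.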