rubrics: rubric_calibration

143 rows; rows with id 1 to 50 are shown, sorted by id.

id	teacher_id	gradation_id_a	gradation_id_b	teacher_choice	correct	response_time_ms	created_at	influential_feature	margin	rubric_version	reasoning
1	teacher-kyle	164	163	B	0		2026-05-24 00:21:28			1
2	teacher-kyle	131	130	A	1		2026-05-25 02:13:59			1
3	teacher-kyle	136	135	A	1		2026-05-25 02:14:24			1
4	teacher-kyle	162	163	A	0	2065	2026-05-25 22:03:29		somewhat	1
5	deepseek-v4-pro	162	161	A	1	11713	2026-05-26 02:27:12	independence_level	clearly	1	Response A shows the student independently selecting a scene, identifying relevant evidence, and connecting it to their claim with personal reasoning, while Response B relies entirely on the teacher’s direction and a provided sentence starter, demonstrating minimal self-directed effort.
6	deepseek-v4-pro	163	162	A	1	16600	2026-05-26 02:27:29	original_thinking	clearly	1	Response A demonstrates original thinking by interpreting the detail's emotional nuance and constructing a paragraph around that insight, while Response B follows a checklist mechanically and offers only a surface-level connection.
7	deepseek-v4-pro	163	164	B	1	19508	2026-05-26 02:27:49	original_thinking	clearly	1	Response B shows a higher degree of autonomy by noting that the student independently flagged a key passage on a first read before the assignment and then, on their own initiative, synthesized a second passage from discussion to create a richer comparison, demonstrating original thinking beyond a single prompted analysis.
8	deepseek-v4-pro	165	166	B	1	14903	2026-05-26 02:28:05	evidence_quality	obviously	1	Response B incorporates a direct quotation from the text and explains its significance, demonstrating stronger evidence use and analysis, whereas Response A merely summarizes without textual support.
9	deepseek-v4-pro	166	167	A	0	23468	2026-05-26 02:28:29			1
10	deepseek-v4-pro	168	167	A	1	22540	2026-05-26 02:28:52	depth_of_thinking	clearly	1	Response A draws on evidence from two distinct points in the text to trace a character's developing complexity, showing a layered interpretation. Response B's analysis, while focused, remains confined to a single moment and lacks that cross-textual insight.
11	deepseek-v4-pro	130	129	A	1	17918	2026-05-26 02:29:10	self_awareness	clearly	1	Response A demonstrates self-awareness by reflecting on the specific tool (template) that guided their thinking and admitting what they wouldn't have connected on their own, showing an understanding of their learning process. In contrast, Response B simply recounts following teacher instructions and needing direct guidance, lacking this metacognitive reflection.
12	deepseek-v4-pro	131	130	A	1	18293	2026-05-26 02:29:29	original_thinking	clearly	1	Response A demonstrates original thinking by independently tracking how Elena redefines freedom across multiple scenes and interpreting symbolic details like the burning letter's curl; Response B explicitly credits a template for its connections, showing less independent analysis.
13	deepseek-v4-pro	131	132	B	1	13235	2026-05-26 02:29:43	original_thinking	clearly	1	Response B extends beyond textual analysis to forge an original intertextual connection and frames the argument as a planned discussion contribution, while Response A remains a solid but more conventional internal tracking of a character's conflict.
14	deepseek-v4-pro	134	133	A	1	10939	2026-05-26 02:29:55	depth_of_thinking	clearly	1	Response A delves into Elena's internal conflict and mixed feelings with specific textual support, revealing a layered character, whereas Response B stays at a superficial level, merely labeling her traits without analysis.
15	deepseek-v4-pro	134	135	B	1	13411	2026-05-26 02:30:09	depth_of_thinking	clearly	1	Response B analyzes how Elena's conflicting motivations connect to the story's central argument about freedom, moving beyond personal conflict to thematic significance, whereas Response A only describes her mixed feelings.
16	deepseek-v4-pro	135	136	B	1	19204	2026-05-26 02:30:28	depth_of_thinking	clearly	1	Response B introduces the mother’s unsent brochures to construct a generational parallel, then questions whether Elena’s choice can be free at all—shifting the interpretation from resolved transformation to unresolved cycle. This layered analysis moves beyond personal motivation to interrogate the text’s deeper argument about inherited obligation and ambiguity.
17	deepseek-v4-pro	169	170	B	1	22131	2026-05-26 02:30:51	independence_level	clearly	1	Response A relies entirely on teacher explanation, provided charts, and direct pointing to a passage, while Response B builds on a group discussion prompt to generate an original insight and then independently seeks out further evidence.
18	deepseek-v4-pro	171	170	A	1	23114	2026-05-26 02:31:15	independence_level	clearly	1	Response A independently analyzes the text with no mention of external guidance, whereas Response B relies on a group discussion prompt and acknowledges not having the insight without it. This stark difference in self-directed thinking makes A more autonomous.
19	deepseek-v4-pro	172	171	A	1	17791	2026-05-26 02:31:33	original_thinking	clearly	1	Response A makes an original argument about how the narrative challenges default ways of knowing and connects independently to another text, while B offers a more conventional analysis limited to the text.
20	deepseek-v4-pro	173	174	B	1	8966	2026-05-26 02:31:43	depth_of_thinking	clearly	1	Response B provides a deeper analysis of how Davi's limited perspective shapes the reader's experience and creates isolation, using a specific quote to support the reasoning, while Response A only identifies the narrator and basic sensory details.
21	deepseek-v4-pro	174	175	B	1	16112	2026-05-26 02:32:00	depth_of_thinking	clearly	1	Response B offers a nuanced analysis of how the POV creates meaning, interpreting the simile's details and connecting them to a thematic argument about isolation and perception, while Response A only describes the basic emotional effect of the limited perspective.
22	deepseek-v4-pro	176	175	A	1	15534	2026-05-26 02:32:16	depth_of_thinking	clearly	1	Response A delves into the epistemological implications of the point of view, analyzing how the narration exposes the sighted reader's perceptual bias and treats non-visual knowledge as authoritative, while Response B interprets the same evidence mainly to highlight theme and character isolation, offering a less layered analysis.
23	deepseek-v4-pro	81	82	B	1	13746	2026-05-26 02:32:30	independence_level	clearly	1	Response A depends on the teacher’s specific direction to locate evidence, while Response B uses a general checklist to independently select and justify a piece of evidence, demonstrating higher self-directed learning.
24	deepseek-v4-pro	82	83	B	1	23298	2026-05-26 02:32:54	independence_level	clearly	1	Response B shows a student independently evaluating evidence strength ('That felt like the strongest evidence') and building an analysis around it, while Response A merely follows a checklist to insert a required quote, indicating greater reliance on external scaffolding.
25	deepseek-v4-pro	83	84	B	1	21690	2026-05-26 02:33:16	self_awareness	clearly	1	Response B demonstrates a deeper self-awareness by critically evaluating the evidence's selective use and adjusting their analysis to more nuanced data, while Response A simply selects and analyzes a given piece of evidence without questioning its validity.
26	deepseek-v4-pro	86	85	A	1	13493	2026-05-26 02:33:30	evidence_quality	clearly	1	Response A directly quotes specific textual evidence and explains its significance, while Response B only makes general claims without citing any concrete details from the article.
27	deepseek-v4-pro	87	86	A	1	11979	2026-05-26 02:33:43	depth_of_thinking	obviously	1	Response A moves beyond simple identification of evidence to analyze the author's framing and rhetorical choices—explaining how the pairing of the statistic with 'accelerated' builds a policy argument and shifts responsibility. Response B merely summarizes the point of the evidence.
28	deepseek-v4-pro	88	87	A	1	19434	2026-05-26 02:34:03	evidence_quality	clearly	1	Response A demonstrates a more sophisticated evaluation of evidence by identifying a credibility gap (specific data early vs. vague attributions later) and an unresolved tension between paragraphs, while Response B merely analyzes a single framing detail. This deeper critique of how evidence quality varies within the text makes A clearly superior.
29	deepseek-v4-pro	257	258	B	1	20644	2026-05-26 02:34:24	self_awareness	somewhat	1	Response B demonstrates greater autonomy by showing self-awareness—the student recognizes a gap in their rebuttal and considers how to improve, whereas Response A simply completed the worksheet without reflection or initiative.
30	deepseek-v4-pro	259	258	A	1	19709	2026-05-26 02:34:45	original_thinking	clearly	1	Response A demonstrates original thinking by crafting a nuanced claim about identity narrowing rather than repeating the typical burnout argument, and by devising a sophisticated strategy to engage with data-driven counterclaims. In contrast, Response B relies on a standard claim and acknowledges that it hasn't fully developed the counterargument.
31	deepseek-v4-pro	260	259	A	1	18402	2026-05-26 02:35:04	self_awareness	clearly	1	Response A demonstrates deeper self-awareness by describing the iterative revision of their claim based on thoughtful consideration of counterarguments, actively seeking to 'steelman' the most challenging point to make their essay more honest. Response B also engages a counterclaim but lacks the same explicit reflection on the development of their own thinking.
32	deepseek-v4-pro	262	261	A	1	13723	2026-05-26 02:35:18	clarity_of_reasoning	obviously	1	Response A makes a clear claim and supports it with reasoning and evidence, while Response B only announces the topic without taking a position.
33	deepseek-v4-pro	263	262	A	1	20237	2026-05-26 02:35:39	depth_of_thinking	clearly	1	Response A demonstrates nuanced understanding by acknowledging valid research while critiquing its misapplication, whereas Response B offers a one-sided, simplistic argument.
34	deepseek-v4-pro	263	264	B	1	21272	2026-05-26 02:36:01	depth_of_thinking	clearly	1	Response B moves beyond a surface-level critique of misapplied science to examine the structural economic incentives and social dynamics that drive early specialization, demonstrating a more layered and sophisticated analysis of the issue.
35	deepseek-v4-pro	249	250	B	1	19081	2026-05-26 02:36:20	independence_level	clearly	1	Response B demonstrates greater autonomy by independently using a model essay and rubric to structure the argument and self-assessing weaknesses, whereas Response A relied on teacher-provided evidence and direct assistance to connect evidence to claims.
36	deepseek-v4-pro	250	251	B	1	17805	2026-05-26 02:36:39	original_thinking	clearly	1	Response B demonstrates original thinking by independently crafting a counterclaim that engages with specific data and a nuanced rebuttal, while Response A relies heavily on a model essay and produces a generic dismissal.
37	deepseek-v4-pro	252	251	A	1	22128	2026-05-26 02:37:01	self_awareness	clearly	1	Response A demonstrates self-awareness by recognizing an initial oversimplification and revising the entire argument's framing, whereas Response B only constructs a standard counterargument without reflecting on or altering its original thesis.
38	deepseek-v4-pro	253	254	B	1	11771	2026-05-26 02:37:14	evidence_quality	clearly	1	Response B includes a specific citation and data (elephants having less than 1% of natural range) and acknowledges a counterargument, while Response A uses only broad, unsupported claims.
39	deepseek-v4-pro	255	254	A	1	20224	2026-05-26 02:37:35	depth_of_thinking	clearly	1	Response A engages deeply with the counterclaim by citing specific species recovery examples and using statistical evidence to rebut, whereas B dismisses it with a simpler, less substantiated principle.
40	deepseek-v4-pro	256	255	A	1	14983	2026-05-26 02:37:50	depth_of_thinking	clearly	1	Response A moves beyond direct evidence to analyze the structural interdependence of entertainment and conservation, revealing a systems-level problem that complicates the ethical debate in a way Response B's more straightforward argument does not.
41	deepseek-v4-pro	338	337	A	1	18422	2026-05-26 02:38:09	self_awareness	clearly	1	Response A shows a student actively evaluating their own thesis and identifying a need for greater specificity, demonstrating self-awareness and initiative. Response B relies heavily on a teacher-provided frame with little evidence of independent thought.
42	deepseek-v4-pro	339	338	A	1	14864	2026-05-26 02:38:25	independence_level	clearly	1	Response A demonstrates self-directed revision from a simple fact to a nuanced, arguable thesis, while Response B relies on external prompts and remains uncertain about how to improve further.
43	deepseek-v4-pro	339	340	B	1	18238	2026-05-26 02:38:44	initiative	clearly	1	Response B demonstrates greater initiative by rewriting the thesis after research revealed a deeper structural insight, proactively restructuring the essay to reflect a more fundamental argument shift, while Response A revises primarily for nuance without the same level of proactive restructuring.
44	deepseek-v4-pro	342	341	A	1	10762	2026-05-26 02:38:55	clarity_of_reasoning	clearly	1	Response A presents a clear, arguable thesis with a specific cause-and-effect reasoning chain, while Response B merely notes the topic and differing opinions without taking a stance or providing any reasoning.
45	deepseek-v4-pro	342	343	B	1	13742	2026-05-26 02:39:09	overall_sophistication	clearly	1	Response B presents a nuanced, multi-layered argument connecting sleep science, resource allocation, and socioeconomic disparities, whereas Response A is a simple, one-dimensional claim.
46	deepseek-v4-pro	344	343	A	1	15311	2026-05-26 02:39:25	overall_sophistication	somewhat	1	Response A offers a more incisive critique by examining how the framing of the debate itself protects the status quo, converting a political decision into a neutral-sounding constraint, while Response B presents a clearer but more conventional evidence-based link to resource inequality.
47	deepseek-v4-pro	305	306	B	1	14628	2026-05-26 02:39:40	independence_level	clearly	1	Response B demonstrates significantly more autonomy by independently using a reverse outline to diagnose and fix redundancy, strategically reordering paragraphs, and choosing transitions purposefully, whereas Response A relies entirely on the teacher's provided outline and class brainstorming order without self-initiated organizational decisions.
48	deepseek-v4-pro	306	307	B	1	15657	2026-05-26 02:39:57	original_thinking	clearly	1	Response B shows original thinking by independently designing a purposeful organizational arc that builds argumentative stakes, while Response A's revisions were largely prompted by a teacher and an exercise.
49	deepseek-v4-pro	308	307	A	1	17047	2026-05-26 02:40:14	original_thinking	clearly	1	Response A shows more original thinking by inventively restructuring the essay to start with a strengthened counterargument for rhetorical impact and cutting a summary in favor of a question to maintain momentum, whereas Response B follows a more predictable progression.
50	deepseek-v4-pro	309	310	B	1	10342	2026-05-26 02:40:25	clarity_of_reasoning	clearly	1	Response B demonstrates clear organization with a thesis, separate body paragraphs, a counterargument, and a conclusion, while Response A presents a disjointed list of claims without logical progression or structure.
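The listing can be summarized per `teacher_id`, which here holds both a human rater (`teacher-kyle`) and what appears to be a model (`deepseek-v4-pro`). The sketch below assumes that `correct = 1` means the choice matched a reference ordering of the two gradations, so the mean of `correct` reads as an agreement rate. The export does not state that interpretation. The sketch parses tab-separated rows in the header's column order, using rows 1 to 9 with the `reasoning` column dropped, and tallies `influential_feature`. The names `parse_rows`, `summarize` and `feature_counts` are illustrative only.

```python
from collections import Counter, defaultdict
from statistics import median

# Column order of the listing's header.
COLUMNS = [
    "id", "teacher_id", "gradation_id_a", "gradation_id_b", "teacher_choice",
    "correct", "response_time_ms", "created_at", "influential_feature",
    "margin", "rubric_version", "reasoning",
]

# Rows 1-9 of the listing, reasoning column omitted for brevity.
# Empty strings stand for the blank cells in the export.
SAMPLE = "\n".join("\t".join(cells) for cells in [
    ["1", "teacher-kyle", "164", "163", "B", "0", "", "2026-05-24 00:21:28", "", "", "1"],
    ["2", "teacher-kyle", "131", "130", "A", "1", "", "2026-05-25 02:13:59", "", "", "1"],
    ["3", "teacher-kyle", "136", "135", "A", "1", "", "2026-05-25 02:14:24", "", "", "1"],
    ["4", "teacher-kyle", "162", "163", "A", "0", "2065", "2026-05-25 22:03:29", "", "somewhat", "1"],
    ["5", "deepseek-v4-pro", "162", "161", "A", "1", "11713", "2026-05-26 02:27:12", "independence_level", "clearly", "1"],
    ["6", "deepseek-v4-pro", "163", "162", "A", "1", "16600", "2026-05-26 02:27:29", "original_thinking", "clearly", "1"],
    ["7", "deepseek-v4-pro", "163", "164", "B", "1", "19508", "2026-05-26 02:27:49", "original_thinking", "clearly", "1"],
    ["8", "deepseek-v4-pro", "165", "166", "B", "1", "14903", "2026-05-26 02:28:05", "evidence_quality", "obviously", "1"],
    ["9", "deepseek-v4-pro", "166", "167", "A", "0", "23468", "2026-05-26 02:28:29", "", "", "1"],
])


def parse_rows(tsv_text):
    """Split tab-separated lines into dicts, padding rows that end early."""
    rows = []
    for line in tsv_text.splitlines():
        cells = line.split("\t")
        cells += [""] * (len(COLUMNS) - len(cells))
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def summarize(rows):
    """Per teacher_id: row count, mean of `correct`, median response time."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["teacher_id"]].append(row)
    summary = {}
    for teacher, group in groups.items():
        # Blank cells are skipped, mirroring the nullable columns in the schema.
        graded = [int(r["correct"]) for r in group if r["correct"]]
        times = [int(r["response_time_ms"]) for r in group if r["response_time_ms"]]
        summary[teacher] = {
            "n": len(group),
            "agreement": sum(graded) / len(graded) if graded else None,
            "median_ms": median(times) if times else None,
        }
    return summary


def feature_counts(rows):
    """Tally non-blank influential_feature values."""
    return Counter(r["influential_feature"] for r in rows if r["influential_feature"])


if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    for teacher, stats in summarize(rows).items():
        print(teacher, stats)
    print(feature_counts(rows).most_common())
```

Blank `correct` or `response_time_ms` cells are skipped rather than counted as zero. Rows 1 to 3 have no recorded response time, so `teacher-kyle`'s median comes from row 4 alone.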

CREATE TABLE rubric_calibration_events (
    id INTEGER PRIMARY KEY,
    teacher_id TEXT NOT NULL,
    gradation_id_a INTEGER NOT NULL,
    gradation_id_b INTEGER NOT NULL,
    teacher_choice TEXT NOT NULL,
    correct INTEGER,
    response_time_ms INTEGER,
    created_at TEXT DEFAULT (datetime('now')),
    confidence TEXT,
    perceived_difficulty TEXT,
    influential_feature TEXT,
    margin TEXT,
    rubric_version INTEGER DEFAULT 1,
    reasoning TEXT
);
CREATE INDEX idx_calibration_teacher ON rubric_calibration_events(teacher_id);
CREATE INDEX idx_calibration_correct ON rubric_calibration_events(correct);
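The stored DDL is plain SQLite, so it can be loaded as-is to query the events. It defines `rubric_calibration_events`, which also carries `confidence` and `perceived_difficulty` columns that do not appear in the listing. The sketch below uses Python's built-in `sqlite3` module and keeps the same assumption about `correct`. It creates the table and both indexes in an in-memory database and inserts rows 1 to 9, leaving `created_at` and `rubric_version` to their defaults and treating blank cells as NULL. It then computes per-rater agreement for one rubric version. The helper names `open_db`, `record_event` and `agreement_by_teacher` are hypothetical.

```python
import sqlite3

# DDL from the schema, one column per line.
SCHEMA = """
CREATE TABLE rubric_calibration_events (
    id INTEGER PRIMARY KEY,
    teacher_id TEXT NOT NULL,
    gradation_id_a INTEGER NOT NULL,
    gradation_id_b INTEGER NOT NULL,
    teacher_choice TEXT NOT NULL,
    correct INTEGER,
    response_time_ms INTEGER,
    created_at TEXT DEFAULT (datetime('now')),
    confidence TEXT,
    perceived_difficulty TEXT,
    influential_feature TEXT,
    margin TEXT,
    rubric_version INTEGER DEFAULT 1,
    reasoning TEXT
);
CREATE INDEX idx_calibration_teacher ON rubric_calibration_events(teacher_id);
CREATE INDEX idx_calibration_correct ON rubric_calibration_events(correct);
"""

# Rows 1-9 of the listing; blank cells are assumed to be NULL.
# (teacher_id, a, b, choice, correct, response_time_ms, influential_feature, margin)
SEED = [
    ("teacher-kyle", 164, 163, "B", 0, None, None, None),
    ("teacher-kyle", 131, 130, "A", 1, None, None, None),
    ("teacher-kyle", 136, 135, "A", 1, None, None, None),
    ("teacher-kyle", 162, 163, "A", 0, 2065, None, "somewhat"),
    ("deepseek-v4-pro", 162, 161, "A", 1, 11713, "independence_level", "clearly"),
    ("deepseek-v4-pro", 163, 162, "A", 1, 16600, "original_thinking", "clearly"),
    ("deepseek-v4-pro", 163, 164, "B", 1, 19508, "original_thinking", "clearly"),
    ("deepseek-v4-pro", 165, 166, "B", 1, 14903, "evidence_quality", "obviously"),
    ("deepseek-v4-pro", 166, 167, "A", 0, 23468, None, None),
]

# AVG() skips NULLs, so rows without a response time do not drag the mean to 0.
AGREEMENT_SQL = """
SELECT teacher_id,
       COUNT(*) AS n,
       AVG(correct) AS agreement,
       AVG(response_time_ms) AS mean_ms
FROM rubric_calibration_events
WHERE rubric_version = ?
GROUP BY teacher_id
ORDER BY teacher_id
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_event(conn, teacher_id, a, b, choice, correct=None,
                 response_time_ms=None, influential_feature=None, margin=None):
    """Insert one comparison; created_at and rubric_version use column defaults."""
    cur = conn.execute(
        "INSERT INTO rubric_calibration_events "
        "(teacher_id, gradation_id_a, gradation_id_b, teacher_choice, correct, "
        "response_time_ms, influential_feature, margin) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (teacher_id, a, b, choice, correct, response_time_ms,
         influential_feature, margin),
    )
    return cur.lastrowid


def agreement_by_teacher(conn, rubric_version=1):
    """Return (teacher_id, n, agreement, mean_ms) tuples for one rubric version."""
    return conn.execute(AGREEMENT_SQL, (rubric_version,)).fetchall()


if __name__ == "__main__":
    conn = open_db()
    for event in SEED:
        record_event(conn, *event)
    for row in agreement_by_teacher(conn):
        print(row)
```

Filtering on `rubric_version` keeps comparisons made under different rubric revisions from being pooled. Every row shown in the listing has version 1.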