methodology

From corpus to cognition, reproducibly

A comparable multilingual corpus, a frozen annotation grid, a validated AI pipeline, and an optional reader study — every choice justified in writing, every automatic measure validated against human judgement.

corpus

Corpus design

Component	Specification
Comparable corpus	100–300 articles per language, same events, publication within a defined window (e.g. ±72 h), full texts accessible, balanced genres (hard news, analysis, wire copy).
Parallel sub-corpus	Source/translation or source/rewrite pairs (e.g. wire dispatch → localised article), explicitly flagged; target ≥ 30 pairs per direction.
Gold annotation subset	20–30 articles per language fully hand-annotated; the benchmark against which every AI output is validated (H5).

Provenance is data. Every article carries outlet, date, byline, licence note, and original-vs-translated status. Copyrighted texts are stored for analysis only; deliverables publish metadata, annotations, and derived measures — never full texts.
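The provenance fields listed above can be sketched as a record type. This is a minimal illustration, not the project's schema: the class name `ArticleRecord`, the field names, the `event_id` key, and the choice to measure the ±72 h window from a reference event time are all assumptions, and the example values are invented.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class ArticleRecord:
    """Hypothetical provenance record; field names are illustrative."""
    article_id: str
    language: str
    outlet: str
    published: datetime
    byline: Optional[str]   # None when the article is unsigned
    licence_note: str
    is_translation: bool    # original-vs-translated status
    event_id: str           # assumed key linking articles on the same event


def within_window(record: ArticleRecord, event_time: datetime, hours: int = 72) -> bool:
    """True if the article was published within +/- `hours` of the event time."""
    return abs(record.published - event_time) <= timedelta(hours=hours)


def publishable_metadata(record: ArticleRecord) -> dict:
    """Metadata suitable for release; the record never holds the full text."""
    data = asdict(record)
    data["published"] = record.published.isoformat()
    return data


# Invented example values, for illustration only.
example = ArticleRecord(
    article_id="a001", language="fr", outlet="Example Outlet",
    published=datetime(2024, 3, 2, 9, 0), byline=None,
    licence_note="analysis only", is_translation=False, event_id="e01",
)
```

Keeping the full text out of the record type altogether is one way to make the "never full texts" rule hard to break by accident.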

annotation grid

Four code families

LIN — linguistic

LIN-1 evaluative lexis · LIN-2 modal intensity · LIN-3 certainty/uncertainty markers · LIN-4 logical connectors · LIN-5 complex syntax · LIN-6 nominalisations · LIN-7 metaphor and imagery

DIS — discursive

DIS-1 event framing (closed label set) · DIS-2 information hierarchy · DIS-3 source selection · DIS-4 narrative angle · DIS-5 opposition/polarisation · DIS-6 agentivity

TRA — translational

TRA-1 omission · TRA-2 addition · TRA-3 explicitation · TRA-4 attenuation · TRA-5 amplification · TRA-6 informational reorganisation · TRA-7 register shift — coded on aligned pairs only

PRD — predictability

PRD-1 lexical surprisal (LM-computed) · PRD-2 cloze probability (piloted) · PRD-3 prediction-support devices · PRD-4 anticipatory shifts that raise target-text predictability

Quality control. Each code gets one positive and one negative example before full annotation begins; at least 15 % of the gold subset is double-annotated; Cohen's κ must reach 0.70 before annotation is scaled up; every code-book change is logged with date and rationale, so that codes never change silently mid-corpus.
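For reference, Cohen's κ compares observed agreement with the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The following is a minimal sketch for two annotators assigning one label per item; the function name and the toy labels are assumptions, and a real check would probably be run per code or per code family.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who each labelled the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy double-annotated items (invented): does the sentence carry code TRA-4?
annotator_1 = ["yes", "yes", "no", "no"]
annotator_2 = ["yes", "yes", "no", "yes"]
kappa = cohens_kappa(annotator_1, annotator_2)
ready_to_scale = kappa >= 0.70
```

In this toy case the annotators agree on three items out of four, yet κ is only 0.5 because half of that agreement is expected by chance, so the 0.70 threshold would not be met.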

ai pipeline

Seven stages, human adjudication throughout

Document pairing
Cluster articles by event across languages (multilingual sentence embeddings + date filters); verify pairs manually.
Sentence alignment
LaBSE/LASER embeddings with alignment heuristics on the parallel sub-corpus; human verification of all retained pairs.
Divergence detection
Flag aligned pairs with low semantic similarity; classify candidate shifts for human adjudication.
Feature extraction
Automatic counts feeding the grid: connectors, passives, nominalisations, modality lexicons; readability and density indices.
Frame comparison
Classification of DIS-1 labels at scale, trained and validated on the hand-annotated gold subset.
Semi-automatic annotation
Model pre-annotates; the researcher corrects; corrections are logged to measure model reliability (H5).
Predictability profiling
Token/sentence surprisal per language with monolingual language models; aligned with shift annotations to test whether translation changes local predictability (H6).
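Surprisal is the negative log probability a model assigns to a token, s(t) = −log₂ p(t), measured in bits. The sketch below shows only that arithmetic and the source/target comparison for one aligned pair. It uses an add-one-smoothed unigram model as a toy stand-in for the monolingual language models named above (a real model would condition on context), and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List


def train_unigram(tokens: List[str], alpha: float = 1.0) -> Callable[[str], float]:
    """Return a smoothed unigram probability function (toy stand-in for an LM)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot shared by all unseen tokens

    def prob(token: str) -> float:
        return (counts[token] + alpha) / (total + alpha * vocab_size)

    return prob


def token_surprisal(tokens: List[str], prob: Callable[[str], float]) -> List[float]:
    """Surprisal in bits for each token."""
    return [-math.log2(prob(t)) for t in tokens]


def mean_surprisal(tokens: List[str], prob: Callable[[str], float]) -> float:
    """Average surprisal per token for one sentence."""
    if not tokens:
        raise ValueError("sentence has no tokens")
    values = token_surprisal(tokens, prob)
    return sum(values) / len(values)


def surprisal_shift(source_tokens, target_tokens, source_prob, target_prob) -> float:
    """Target minus source mean surprisal for one aligned sentence pair."""
    return mean_surprisal(target_tokens, target_prob) - mean_surprisal(source_tokens, source_prob)
```

A negative shift would mean the target sentence is locally more predictable under its own model. Raw values from different models and tokenisations are not directly comparable across languages, so some normalisation would likely be needed; the source does not specify one.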

Validation rules — non-negotiable. Every automatic measure used in a finding is validated against the gold subset, with precision/recall or agreement reported. Prompts, model names, versions, and parameters are logged for every LLM-assisted step. Disagreements between AI output and human judgement are adjudicated by the researcher — the human decision is final.
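As an illustration of the precision/recall reporting required above, the sketch below scores automatic annotations against the gold subset. It assumes each annotation can be represented as a hashable item such as an (article id, sentence index, code) tuple; that representation, the function names, and the toy data are assumptions rather than the project's format.

```python
from typing import Hashable, Set, Tuple


def precision_recall(gold: Set[Hashable], predicted: Set[Hashable]) -> Tuple[float, float]:
    """Precision and recall of predicted annotations against gold annotations."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Invented toy data: (article id, sentence index, code).
gold = {("a1", 0, "TRA-1"), ("a1", 1, "TRA-4"), ("a1", 2, "TRA-5"), ("a2", 0, "TRA-1")}
predicted = {("a1", 2, "TRA-5"), ("a2", 0, "TRA-1"), ("a2", 3, "TRA-2")}

precision, recall = precision_recall(gold, predicted)
```

Here two of the three automatic annotations are correct and two of the four gold annotations are found, giving precision 2/3 and recall 1/2.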

reader study

Cognitive evaluation

A between-subjects (or mixed) design: each reader sees a text either in its original language or in a translated/adapted version of comparable length. Conditions cross language × status (original vs. translated), with counterbalanced topics and screened participants.
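One way to counterbalance topics across the status conditions is to rotate readers through every status × topic cell. The sketch below is an assumption about how that could be done, not the study's protocol: it treats the design as fully between-subjects, assumes assignment is run separately for each reader language, and uses hypothetical names throughout.

```python
import random
from collections import Counter
from itertools import product
from typing import Dict, List, Optional, Tuple


def counterbalance(
    participants: List[str],
    statuses: List[str],
    topics: List[str],
    seed: Optional[int] = None,
) -> Dict[str, Tuple[str, str]]:
    """Assign each participant one (status, topic) cell, filling cells evenly."""
    cells = list(product(statuses, topics))
    order = list(participants)
    random.Random(seed).shuffle(order)  # randomise who lands in which cell
    return {p: cells[i % len(cells)] for i, p in enumerate(order)}


def cell_counts(plan: Dict[str, Tuple[str, str]]) -> Counter:
    """Number of participants assigned to each (status, topic) cell."""
    return Counter(plan.values())


plan = counterbalance(
    participants=[f"p{i}" for i in range(8)],
    statuses=["original", "translated"],
    topics=["topic_1", "topic_2"],
    seed=0,
)
```

With eight readers and four cells, each cell receives exactly two readers; when the number of readers is not a multiple of the number of cells, cell sizes differ by at most one.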

Measure	Instrument
Reading time	Per text; screen-logged or timed.
Factual comprehension	Multiple-choice and short answers keyed to a fixed list of idea units per event.
Recall	Immediate free recall; optional delayed recall (48 h).
Prediction / cloze	Cloze task on key idea units; reading time, effort, and recall analysed as a function of the version's surprisal profile (H6).
Subjective cognitive load	Single-item Paas scale or NASA-TLX subset.
Perceived clarity & credibility	Likert ratings with brief justification.

Ethics first. Research-ethics approval (or documented exemption) is initiated before anything else; consent forms in all study languages; anonymised storage; pilot with 3–5 readers before any full session; pre-registration of H3–H4 and H6 recommended. If approval timelines exceed a project phase, the approval-ready protocol and instruments are themselves a deliverable.

Epistemic caution. Textual analysis grounds hypotheses about reception; only the reader study can test them. Claims are phrased accordingly: "is compatible with" or "predicts", never "demonstrates", when the evidence is textual alone.
