methodology

From corpus to cognition, reproducibly

A comparable multilingual corpus, a frozen annotation grid, a validated AI pipeline, and an optional reader study — every choice justified in writing, every automatic measure validated against human judgement.

corpus

Corpus design

Component | Specification
Comparable corpus | 100–300 articles per language, same events, publication within a defined window (e.g. ±72 h), full texts accessible, balanced genres (hard news, analysis, wire copy).
Parallel sub-corpus | Source/translation or source/rewrite pairs (e.g. wire dispatch → localised article), explicitly flagged; target ≥ 30 pairs per direction.
Gold annotation subset | 20–30 articles per language fully hand-annotated; the benchmark against which every AI output is validated (H5).

Provenance is data. Every article carries outlet, date, byline, licence note, and original-vs-translated status. Copyrighted texts are stored for analysis only; deliverables publish metadata, annotations, and derived measures, never full texts.
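One way to make the provenance rule concrete is a record type that keeps the full text next to its metadata but strips it before anything is published. The sketch below is illustrative only: the class name, field names, and both helper functions are assumptions, and the 72-hour default mirrors the example window given in the table rather than a fixed project setting.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Article:
    """One corpus entry. All field names are hypothetical."""
    article_id: str
    language: str
    outlet: str
    published: datetime
    byline: str
    licence_note: str
    is_translation: bool  # original-vs-translated status
    full_text: str        # stored for analysis only, never published


def publishable_record(article: Article) -> dict:
    """Return the metadata allowed in a deliverable, without the full text."""
    record = asdict(article)
    del record["full_text"]
    record["published"] = article.published.isoformat()
    return record


def within_window(article: Article, event_time: datetime, hours: int = 72) -> bool:
    """True if the article appeared within +/- `hours` of the event."""
    return abs(article.published - event_time) <= timedelta(hours=hours)
```

Freezing the dataclass means provenance fields cannot be edited in place once an article has entered the corpus; a correction requires creating a new record.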

annotation grid

Four code families

LIN — linguistic

LIN-1 evaluative lexis · LIN-2 modal intensity · LIN-3 certainty/uncertainty markers · LIN-4 logical connectors · LIN-5 complex syntax · LIN-6 nominalisations · LIN-7 metaphor and imagery

DIS — discursive

DIS-1 event framing (closed label set) · DIS-2 information hierarchy · DIS-3 source selection · DIS-4 narrative angle · DIS-5 opposition/polarisation · DIS-6 agentivity

TRA — translational

TRA-1 omission · TRA-2 addition · TRA-3 explicitation · TRA-4 attenuation · TRA-5 amplification · TRA-6 informational reorganisation · TRA-7 register shift — coded on aligned pairs only

PRD — predictability

PRD-1 lexical surprisal (LM-computed) · PRD-2 cloze probability (piloted) · PRD-3 prediction-support devices · PRD-4 anticipatory shifts that raise target-text predictability

Quality control. One positive and one negative example per code before full annotation; ≥ 15 % of the gold subset double-annotated; Cohen's κ ≥ 0.70 before scaling; every code-book change logged with date and rationale — codes never change silently mid-corpus.
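The κ ≥ 0.70 gate can be checked with a few lines of code. The following is a minimal sketch of Cohen's κ for two annotators who labelled the same items; the function names and the `KAPPA_THRESHOLD` constant are assumptions introduced here, and a production run might prefer an established statistics library.

```python
from collections import Counter

KAPPA_THRESHOLD = 0.70  # gate stated in the quality-control rule


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list.")
    n = len(labels_a)
    # Observed agreement: share of items given the same code.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def ready_to_scale(labels_a: list[str], labels_b: list[str]) -> bool:
    """True if agreement on the double-annotated sample meets the gate."""
    return cohens_kappa(labels_a, labels_b) >= KAPPA_THRESHOLD
```

For example, with codes `["LIN-1", "LIN-1", "LIN-2", "LIN-2"]` from one annotator and `["LIN-1", "LIN-1", "LIN-2", "LIN-1"]` from the other, observed agreement is 0.75, chance agreement is 0.5, and κ is 0.5, which falls below the gate.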

ai pipeline

Seven stages, human adjudication throughout

  1. Document pairing

    Cluster articles by event across languages (multilingual sentence embeddings + date filters); verify pairs manually.

  2. Sentence alignment

    LaBSE/LASER embeddings with alignment heuristics on the parallel sub-corpus; human verification of all retained pairs.

  3. Divergence detection

    Flag aligned pairs with low semantic similarity; classify candidate shifts for human adjudication.

  4. Feature extraction

    Automatic counts feeding the grid: connectors, passives, nominalisations, modality lexicons; readability and density indices.

  5. Frame comparison

    Classification of DIS-1 labels at scale, trained and validated on the hand-annotated gold subset.

  6. Semi-automatic annotation

    Model pre-annotates; the researcher corrects; corrections are logged to measure model reliability (H5).

  7. Predictability profiling

    Token/sentence surprisal per language with monolingual language models; aligned with shift annotations to test whether translation changes local predictability (H6).
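The surprisal measure in stage 7 is the negative log probability of a token, in bits when the log is base 2. The sketch below shows the arithmetic only. It swaps in a toy add-one-smoothed unigram model for the monolingual language models named in the pipeline, so the probabilities ignore context; the class and function names are assumptions.

```python
import math
from collections import Counter


class UnigramModel:
    """Toy stand-in for a monolingual language model (no context)."""

    def __init__(self, training_tokens: list[str]):
        self.counts = Counter(training_tokens)
        self.total = len(training_tokens)
        self.vocab_size = len(self.counts) + 1  # one extra slot for unseen tokens

    def probability(self, token: str) -> float:
        # Add-one smoothing so unseen tokens keep a non-zero probability.
        return (self.counts[token] + 1) / (self.total + self.vocab_size)


def token_surprisal(model: UnigramModel, tokens: list[str]) -> list[float]:
    """Surprisal in bits for each token: -log2 p(token)."""
    return [-math.log2(model.probability(t)) for t in tokens]


def mean_surprisal(model: UnigramModel, tokens: list[str]) -> float:
    """Sentence-level surprisal as the mean over tokens."""
    if not tokens:
        raise ValueError("Cannot profile an empty sentence.")
    values = token_surprisal(model, tokens)
    return sum(values) / len(values)


def surprisal_shift(source_model: UnigramModel, source: list[str],
                    target_model: UnigramModel, target: list[str]) -> float:
    """Target minus source mean surprisal for one aligned pair.

    A negative value means the target sentence is more predictable
    under its own language's model.
    """
    return mean_surprisal(target_model, target) - mean_surprisal(source_model, source)
```

Each side of an aligned pair is scored with the model for its own language, which is why `surprisal_shift` takes two models. Raw bit values from different models are not directly comparable, so any cross-language comparison would need a normalisation step that this sketch does not attempt.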

Validation rules — non-negotiable. Every automatic measure used in a finding is validated against the gold subset, with precision/recall or agreement reported. Prompts, model names, versions, and parameters are logged for every LLM-assisted step. Disagreements between AI output and human judgement are adjudicated by the researcher — the human decision is final.
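Validation against the gold subset comes down to comparing two sets of annotations. A minimal sketch follows, assuming each annotation is represented as a hashable item such as a (sentence id, code) pair; that representation and the function name are illustrative choices, not the project's data format.

```python
def precision_recall_f1(gold: set, predicted: set) -> tuple[float, float, float]:
    """Score automatic annotations against hand annotations.

    Each element identifies one annotation, e.g. (sentence_id, code).
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: shift codes on four aligned sentence pairs.
gold = {(1, "TRA-1"), (2, "TRA-2"), (3, "TRA-4")}
predicted = {(1, "TRA-1"), (2, "TRA-5"), (3, "TRA-4"), (4, "TRA-1")}
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

In this made-up case two of the four predicted annotations match the gold set, so precision is 0.5 and recall is 2/3. Items where the two sets differ are the ones that go to the researcher for adjudication.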

reader study

Cognitive evaluation

A between-subjects (or mixed) design: each reader sees a text either in its original language or in a translated/adapted version of comparable length. Conditions cross language × status (original vs. translated), with counterbalanced topics and screened participants.
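The crossing of language × status with counterbalanced topics can be sketched as a seeded round-robin assignment. This is an illustration under simplifying assumptions: it treats the design as purely between-subjects, ignores participant screening and reading-language constraints, and uses hypothetical language codes and topic labels.

```python
import random
from itertools import product

STATUSES = ("original", "translated")


def assign_conditions(participants: list[str], languages: list[str],
                      topics: list[str], seed: int = 0) -> dict[str, tuple[str, str, str]]:
    """Map each participant to one (language, status, topic) cell.

    Participants are shuffled with a fixed seed, then dealt round-robin
    across cells so cell sizes differ by at most one.
    """
    cells = list(product(languages, STATUSES, topics))
    order = list(participants)
    random.Random(seed).shuffle(order)  # reproducible randomisation
    return {p: cells[i % len(cells)] for i, p in enumerate(order)}


# Hypothetical usage: 16 readers, 2 languages, 2 topics -> 8 cells of 2 readers.
plan = assign_conditions([f"P{i:02d}" for i in range(16)], ["en", "fr"], ["event_a", "event_b"])
```

Logging the seed with the assignment lets the allocation be regenerated exactly, in line with the reproducibility aim of the method.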

Measure | Instrument
Reading time | Per text; screen-logged or timed.
Factual comprehension | Multiple-choice and short answers keyed to a fixed list of idea units per event.
Recall | Immediate free recall; optional delayed recall (48 h).
Prediction / cloze | Cloze task on key idea units; reading time, effort, and recall analysed as a function of the version's surprisal profile (H6).
Subjective cognitive load | Single-item Paas scale or NASA-TLX subset.
Perceived clarity & credibility | Likert ratings with brief justification.

Ethics first. Research-ethics approval (or documented exemption) is initiated before anything else; consent forms in all study languages; anonymised storage; pilot with 3–5 readers before any full session; pre-registration of H3–H4 and H6 recommended. If approval timelines exceed a project phase, the approval-ready protocol and instruments are themselves a deliverable.

Epistemic caution. Textual analysis grounds hypotheses about reception; only the reader study can test them. Claims are phrased accordingly: a textual finding "is compatible with" or "predicts" a reception effect, and is never said to demonstrate one from text alone.