RunoVerse

Lemma Quality Explorer

Explore 325,332 lemmatization conflicts between corpus annotations and DeepSeek AI analysis. Discover magnet lemmas, suffix patterns, and annotation uncertainty.

What are conflicts?

A conflict occurs when the corpus annotation and the DeepSeek AI analysis assign different lemmas to the same wordform. This does not necessarily mean that either is wrong: many conflicts are spelling variants or cross-language differences. Only true conflicts represent genuine disagreements about word identity.
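The definition above can be sketched in a few lines of Python. This is only an illustration: the `Annotation` record, its field names, and the sample rows are assumptions made for the example, not RunoVerse's actual data model, and the real pipeline may normalize lemmas before comparing them.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical record pairing one wordform with both lemma sources."""
    wordform: str
    corpus_lemma: str
    ai_lemma: str


def find_conflicts(annotations):
    """Return the annotations whose corpus and AI lemmas differ."""
    return [a for a in annotations if a.corpus_lemma != a.ai_lemma]


# Illustrative rows only, not taken from the RunoVerse data.
rows = [
    Annotation("olen", "olla", "olema"),   # lemmas differ: a conflict
    Annotation("ei", "ei", "ei"),          # lemmas agree: no conflict
    Annotation("saan", "saama", "saama"),  # lemmas agree: no conflict
]
conflicts = find_conflicts(rows)
```

Exact string comparison is the simplest possible test. Sorting the resulting conflicts into the classes listed below would need further rules that this page does not specify.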

Conflict Classes

True conflict: different lemma interpretations (216K).
Spelling variant: same word, different orthography (99K).
Cross-language: Estonian vs. Finnish lemma (10K).

Magnet Lemmas

Magnet lemmas are common lemmas (such as “ei”, “saama”, “olla”) to which the corpus pipeline assigns many wordforms, even when the AI suggests different lemmas. They reveal systematic biases in the annotation pipeline.
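One way to surface such lemmas is to count, for each corpus lemma, how many distinct conflicting wordforms it attracts. The sketch below assumes conflicts arrive as `(wordform, corpus_lemma, ai_lemma)` tuples. The function name, the `min_forms` cutoff, and the placeholder data are all hypothetical.

```python
from collections import defaultdict


def magnet_lemmas(conflicts, min_forms=2):
    """Rank corpus lemmas by the number of distinct conflicting wordforms.

    `conflicts` is an iterable of (wordform, corpus_lemma, ai_lemma) tuples.
    `min_forms` is an assumed cutoff, not a documented RunoVerse setting.
    """
    forms = defaultdict(set)
    for wordform, corpus_lemma, ai_lemma in conflicts:
        if corpus_lemma != ai_lemma:
            forms[corpus_lemma].add(wordform)
    # Largest magnets first; ties are broken alphabetically by lemma.
    ranked = sorted(forms.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(lemma, len(wfs)) for lemma, wfs in ranked if len(wfs) >= min_forms]


# Placeholder data: the wordforms and AI lemmas are stand-ins, not real entries.
sample = [
    ("form_a", "ei", "lemma_x"),
    ("form_b", "ei", "lemma_y"),
    ("form_c", "ei", "lemma_x"),
    ("form_d", "saama", "lemma_z"),
    ("form_a", "ei", "lemma_x"),  # duplicate wordform is counted once
]
top = magnet_lemmas(sample)
```

Counting distinct wordforms rather than raw annotation counts keeps one very frequent wordform from making a lemma look like a magnet on its own.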

Current Pick

For each conflict, RunoVerse currently uses a selection algorithm that considers corpus unanimity, count ratios, and cross-language signals. The “current pick” shows what the system chose.
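The page names the three signals but not how they are combined. The following is therefore a purely hypothetical sketch of how such a rule could be ordered. The `Conflict` fields, the rule order, and the `ratio_threshold` value are assumptions, not the RunoVerse algorithm.

```python
from dataclasses import dataclass


@dataclass
class Conflict:
    """Hypothetical summary of one corpus-vs-AI disagreement."""
    corpus_lemma: str
    corpus_count: int
    ai_lemma: str
    ai_count: int
    corpus_unanimous: bool  # every corpus annotation agrees on corpus_lemma
    cross_language: bool    # lemmas differ only by language (Estonian vs Finnish)


def current_pick(c, ratio_threshold=2.0):
    """Choose a lemma using an assumed ordering of the three signals."""
    # Assumed rule 1: a cross-language difference is not an error, keep the corpus lemma.
    if c.cross_language:
        return c.corpus_lemma
    # Assumed rule 2: a unanimous corpus that is at least as frequent wins.
    if c.corpus_unanimous and c.corpus_count >= c.ai_count:
        return c.corpus_lemma
    # Assumed rule 3: the AI lemma wins only if it clearly dominates the counts.
    if c.ai_count >= ratio_threshold * c.corpus_count:
        return c.ai_lemma
    # Fallback: keep the corpus lemma.
    return c.corpus_lemma


pick_a = current_pick(Conflict("a", 10, "b", 3, True, False))
pick_b = current_pick(Conflict("a", 2, "b", 10, False, False))
```

In this sketch the corpus lemma is the default and the AI lemma must clear a count ratio to override it. The real system may weigh the signals differently.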


Conflict Class Distribution

Corpus vs AI Dominance

Most Contested Wordforms

Wordforms with the highest combined annotation counts where corpus and AI disagree:

Wordform | Corpus Lemma | Count | AI Lemma | Count | Class | Current Pick

Lemma | Attracted Forms | Unanimity | True Conflicts | Spelling | Cross-Lang

Top Suffix Patterns in Conflicts

Systematic morphological patterns in conflicting lemma assignments:

# | Wordform Suffix | Lemma Suffix | Conflicts
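A suffix pattern of this kind could be derived by stripping the longest shared prefix from a wordform and its lemma, then counting the remaining suffix pairs. The sketch below assumes that definition. The page does not say how the suffixes are computed or which lemma (corpus or AI) is used, and the function names and sample pairs are illustrative.

```python
import os
from collections import Counter


def suffix_pair(wordform, lemma):
    """Return (wordform suffix, lemma suffix) after removing the common prefix."""
    # os.path.commonprefix compares character by character, so it works on any strings.
    n = len(os.path.commonprefix([wordform, lemma]))
    return wordform[n:], lemma[n:]


def top_suffix_patterns(pairs, k=10):
    """Count suffix pairs over (wordform, lemma) tuples and return the k most common."""
    return Counter(suffix_pair(w, l) for w, l in pairs).most_common(k)


# Illustrative Estonian pairs, not taken from the RunoVerse conflict data.
pairs = [("laulab", "laulma"), ("tuleb", "tulema"), ("saab", "saama")]
patterns = top_suffix_patterns(pairs, k=2)
```

With these three pairs, "tuleb" and "saab" both reduce to the pattern ("b", "ma"), while "laulab" yields ("ab", "ma").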

Magnet Lemma Size Distribution

How many lemmas attract a given number of conflicting wordforms:
