Cross-Lingual Analysis Guide
What is this guide?
A reference for the six cross-lingual analysis tools in RunoVerse. Each section below describes one tool and the data behind it, and links directly to its explorer page.
How to navigate
Scroll through the sections or use your browser's find (Ctrl/Cmd+F) to jump to a topic. Each section ends with a direct link to the corresponding explorer. The tools range from surface-level string matching (Cognates, Shared Vocabulary) to deeper semantic analysis (Concepts, Thesaurus).
Related guides
- Corpus Guide – poem collections and metadata.
- Dictionary Guide – lexicon structure and annotations.
- Poetics Guide – meter, alliteration, parallelism.
- Similarity Guide – poem and verse similarity algorithms.
Estonian and Finnish runosongs share a common Finnic origin. These tools help explore the linguistic connections between the two traditions — cognate words, shared vocabulary, etymological families, and semantic concepts that bridge the language divide.
Cognate Explorer
Browse 6,382 automatically discovered Estonian–Finnish cognate pairs. Cognates are words in the two languages that descend from a common ancestor — they often look similar and carry related meanings, reflecting the shared Finnic heritage of both runosong traditions.
The cognate pairs were identified using three discovery methods:
- Exact string matches – 1,114 pairs where the Estonian and Finnish lemma forms are identical (that is, words whose spelling has not diverged between the two languages).
- Near-exact matches – 2,390 pairs with minor orthographic differences, capturing systematic sound correspondences between Estonian and Finnish (such as vowel length differences or consonant gradation).
- Translation-bridged pairs – 2,873 pairs where both an Estonian and a Finnish word share the same English translation, suggesting a common meaning even when the surface forms have diverged further.
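The three match types above can be sketched as a simple classifier. This is a minimal illustration only, not the RunoVerse pipeline: the guide does not say how near-exact matches are scored, so the use of Levenshtein distance, the `max_edits` threshold, and the names `edit_distance` and `classify_pair` are all assumptions made for this example.

```python
from typing import Optional, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def classify_pair(et_lemma: str, fi_lemma: str,
                  et_glosses: Set[str], fi_glosses: Set[str],
                  max_edits: int = 2) -> Optional[str]:
    """Return a match type for an Estonian/Finnish lemma pair, or None.

    max_edits is a hypothetical threshold chosen for illustration.
    """
    if et_lemma == fi_lemma:
        return "exact"
    if edit_distance(et_lemma, fi_lemma) <= max_edits:
        return "near-exact"
    if et_glosses & fi_glosses:  # at least one shared English translation
        return "translation-bridged"
    return None


print(classify_pair("kala", "kala", {"fish"}, {"fish"}))
print(classify_pair("keel", "kieli", {"language"}, {"language"}))
print(classify_pair("ema", "äiti", {"mother"}, {"mother"}))
```

The checks run from strictest to loosest, so a pair is always labelled with the strongest kind of evidence available. The last example is linked only through its shared English gloss, which is why translation-bridged pairs indicate common meaning rather than similar form.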
The interactive network visualization shows how cognate pairs connect into larger word families. You can filter results by match type or search for specific words to see their cross-lingual connections.
Open Cognate Explorer →
Etymology Families
Explore approximately 49,000 etymological root families containing 174,878 lemmas. Each family groups together words that share a common historical root, based on etymological analysis extracted from DeepSeek AI annotations of the runosong corpora.
- Language family connections – see which roots are classified as Finno-Ugric, Baltic, Germanic, Slavic, or from other language families, revealing the layers of historical contact and inheritance in the runosong vocabulary.
- Cross-lingual groupings – view Estonian and Finnish lemmas grouped under the same etymological root, showing how words from a shared ancestor have evolved differently in each language.
- Corpus frequency data – each etymological family includes frequency information, showing how often words from that root appear across the three source corpora (SKVR, JR, ERAB).
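One way to picture an etymological family is as a record keyed by root that collects lemmas per language and frequencies per corpus. The sketch below assumes a hypothetical flat record layout (`root`, `origin`, `lang`, `lemma`, `corpus`, `freq`); the actual RunoVerse schema is not described in this guide, and the frequencies are invented toy values.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Family:
    root: str
    origin: str  # language-family label, e.g. "Finno-Ugric"
    lemmas: dict = field(default_factory=lambda: defaultdict(set))       # language -> lemmas
    corpus_freq: dict = field(default_factory=lambda: defaultdict(int))  # corpus -> count


def build_families(records):
    """Group flat annotation records into one Family per root."""
    families = {}
    for rec in records:
        fam = families.setdefault(rec["root"], Family(rec["root"], rec["origin"]))
        fam.lemmas[rec["lang"]].add(rec["lemma"])
        fam.corpus_freq[rec["corpus"]] += rec["freq"]
    return families


def cross_lingual(families):
    """Roots that have at least one Estonian and one Finnish lemma."""
    return [root for root, fam in families.items()
            if fam.lemmas.get("et") and fam.lemmas.get("fi")]


# Toy records; the frequencies are made up for illustration.
records = [
    {"root": "*kala", "origin": "Finno-Ugric", "lang": "et", "lemma": "kala", "corpus": "ERAB", "freq": 3},
    {"root": "*kala", "origin": "Finno-Ugric", "lang": "fi", "lemma": "kala", "corpus": "SKVR", "freq": 5},
    {"root": "*kala", "origin": "Finno-Ugric", "lang": "fi", "lemma": "kala", "corpus": "JR", "freq": 2},
]
families = build_families(records)
print(cross_lingual(families), dict(families["*kala"].corpus_freq))
```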
Shared Vocabulary
Explore the 1,240 lemmas that appear in both the Estonian and Finnish corpora as identical string forms. These are words that have survived essentially unchanged in both traditions from their common proto-language ancestor, representing the core shared vocabulary of Finnic runosong.
- Identical forms – 1,240 lemmas found in both Estonian (ERAB) and Finnish (SKVR/JR) sources with the exact same spelling.
- Cognate pairs – the full set of 6,382 automatically discovered Estonian–Finnish cognate pairs, including near-matches and translation-bridged connections beyond the identical forms.
- Interactive visualizations – compare the shared Finnic vocabulary with charts showing frequency distributions, part-of-speech breakdowns, and the overlap between the two corpora.
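Conceptually, the identical-form list is a set intersection over the lemma inventories of the two corpora. The following is a minimal sketch with a handful of toy lemmas; the function name `shared_lemmas` is hypothetical, and real inventories would come from the ERAB and SKVR/JR lexicons.

```python
def shared_lemmas(estonian_lemmas, finnish_lemmas):
    """Lemmas spelled identically in both inventories."""
    return set(estonian_lemmas) & set(finnish_lemmas)


# Toy inventories for illustration only.
estonian = ["maa", "kala", "ema", "keel"]
finnish = ["maa", "kala", "äiti", "kieli"]

print(sorted(shared_lemmas(estonian, finnish)))
```

Because the comparison is on exact spelling, pairs such as keel and kieli fall outside this list and are covered by the near-exact cognate matches instead.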
Dialectal Forms
Explore over 517,000 dialectal form pairs showing how standard language forms differ from the runosong-specific dialect forms preserved in the corpora. Runosongs were transmitted orally across centuries and geographic regions, and the texts reflect a rich diversity of dialectal variation that is often absent from standard written language.
- Estonian dialectal variation – see how Estonian runosong forms differ from modern standard Estonian, including archaic forms, regional vocabulary, and South Estonian (Võro/Seto) features.
- Finnish dialectal variation – explore the range of Finnish dialect forms across the SKVR and JR collections, from western to eastern Finnish dialects and Karelian-influenced forms.
- Search and filter – look up specific words to see all their dialectal variants, or browse by dialect region to understand the geographic distribution of linguistic features.
Concept Browser
Browse 471,241 English semantic concepts mapped to Finnic runosong wordforms. The Concept Browser functions as a cross-lingual reverse dictionary: enter an English word or meaning, and discover which Estonian and Finnish runosong lemmas express that concept.
- Cross-lingual discovery – find out how the same concept is expressed in Estonian and Finnish runosong vocabulary, revealing both shared terms and language-specific expressions.
- Semantic search – enter any English word to find all related runosong vocabulary. For example, searching “mother” reveals the various Estonian and Finnish words for mother used across the poetic traditions.
- Frequency data – each concept shows how many lemmas and word forms map to it, along with corpus frequency information from both traditions.
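The reverse-dictionary behaviour described above can be modelled as an inverted index from English concept to lemmas per language. This is an illustrative sketch, not the Concept Browser's implementation: the entry layout `(language, lemma, concepts)` and the names `build_concept_index` and `lookup` are assumptions.

```python
from collections import defaultdict


def build_concept_index(entries):
    """Map each English concept to the lemmas that express it, per language."""
    index = defaultdict(lambda: defaultdict(set))
    for lang, lemma, concepts in entries:
        for concept in concepts:
            index[concept.lower()][lang].add(lemma)
    return index


def lookup(index, query):
    """Return {language: sorted lemmas} for an English query word."""
    hit = index.get(query.strip().lower(), {})
    return {lang: sorted(lemmas) for lang, lemmas in hit.items()}


# Toy entries for illustration only.
entries = [
    ("et", "ema", ["mother"]),
    ("fi", "äiti", ["mother"]),
    ("fi", "emo", ["mother"]),
    ("et", "kala", ["fish"]),
    ("fi", "kala", ["fish"]),
]
index = build_concept_index(entries)
print(lookup(index, "Mother"))
```

Normalising the query to lower case keeps lookups case-insensitive, and the number of lemmas per concept falls out of the index directly as the size of each per-language set.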
Thematic Vocabulary
Browse Finnic runosong vocabulary organized by 25 semantic domains, providing a thematic overview of the poetic word-world shared by Estonian and Finnish traditions. Domains include Family, Nature, Animals, War, Magic, Religion, Food, Clothing, Body, and others.
- Semantic domains – 25 thematic categories organize the vocabulary into meaningful groups, making it easy to explore what words the runosong traditions used to talk about specific topics.
- Cross-lingual comparisons – for each domain, see which words exist in both Estonian and Finnish traditions and which are unique to one language, revealing how the two poetic vocabularies overlap and diverge.
- Frequency and distribution – each thematic category shows word counts and corpus frequency data, indicating which semantic domains dominate the runosong vocabulary and how usage patterns differ between traditions.