Lexicon & Dictionary Guide
What is this page?
This is a reference guide to the six dictionary and lexicon tools in RunoVerse. Each section below describes one tool, lists its main features, and links directly to it.
How to use this guide
Scroll through the sections or jump to a tool that interests you. Each section ends with an "Open" link that takes you straight to the tool. If you are new to RunoVerse, start with the Dictionary, the first tool below and the main entry point for word-level exploration.
Related explorer pages
Beyond the six tools covered here, several other pages work with word-level data: Dictionary Explorer (browse definitions from five dictionaries), Dictionary Comparison (headword overlap across dictionaries), Cognates (Estonian–Finnish cognate families), and Thesaurus (synonym and semantic clusters).
Other guides
See also: Corpus Guide, Languages Guide, Poetics Guide, and Similarity Guide.
These tools let you search, browse, and analyze the RunoVerse word-level data — 439,746 lemmas drawn from three runosong corpora and enriched with definitions from nine dictionaries and AI-generated translations. Whether you want to look up a single word, compare two entries side by side, or explore how well existing dictionaries cover this historical vocabulary, you will find the right tool below.
Dictionary
The main dictionary is the starting point for exploring the RunoVerse lexicon. Type any word — a lemma, an inflected form, or an English translation — and the dictionary returns matching entries instantly. Each entry opens a detailed view with seven tabs covering word forms, AI analysis, dictionary definitions, similar words, semantic neighbors, poem occurrences, and geographic distribution.
- Search by lemma, word form, or English translation with diacritic-insensitive matching
- Filter results by language (Estonian, Finnish, shared Finnic), part of speech, and data source
- View all inflected forms for a lemma with per-corpus frequency badges
- Read DeepSeek AI translations, etymological notes, and morphological descriptions
- Look up definitions from nine Estonian and Finnish dictionaries, including compound word breakdowns
- Explore similar word forms (edit-distance) and BERT semantic neighbors
- See which poems contain a word and read them in context
- View geographic distribution of word usage on an interactive map
- Bookmark and share direct links to any entry
- Export filtered results as CSV
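Two of the features above, diacritics-insensitive matching and edit-distance similarity, can be illustrated with a short sketch. This is not RunoVerse's actual implementation, which this guide does not describe. The function names `fold`, `edit_distance`, and `search`, and the five-word sample list, are assumptions made for illustration only.

```python
import unicodedata


def fold(text: str) -> str:
    """Lowercase the text and strip combining marks (e.g. 'Õ' -> 'o')."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling row of the DP table."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical mini-lexicon; the real word lists are far larger.
entries = ["käsi", "kasi", "kõrv", "korva", "mets"]


def search(query: str) -> list[str]:
    """Return entries whose folded form equals the folded query."""
    return [entry for entry in entries if fold(entry) == fold(query)]
```

With this sample list, `search("KORV")` finds `kõrv`, and `edit_distance("korv", "korva")` is 1. Folding is a search convenience only: letters such as õ and ä are distinct in Estonian and Finnish, so a real tool would presumably still display the original spelling.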
Word Comparison
The comparison tool places any two words side by side so you can examine their differences and similarities at a glance. Select two lemmas and the tool displays their frequencies, part-of-speech tags, dictionary definitions, word forms, and corpus distributions in parallel columns. This is especially useful for comparing near-synonyms, dialectal variants, or Estonian–Finnish cognate pairs.
- Side-by-side display of frequency, POS, and corpus breakdown for two words
- Compare dictionary definitions and DeepSeek AI annotations in parallel
- View overlapping and unique word forms between the two entries
- Shareable URLs preserve both selected words for easy reference
Dictionary Coverage
This page analyzes how thoroughly nine published dictionaries cover the vocabulary found in the runosong corpora. It shows that 83.5% of the 1,166,348 unique word forms in the corpora appear in at least one dictionary, and breaks the coverage down by individual dictionary, language, and frequency band. You can search for any word to see which dictionaries include it.
- Coverage statistics for nine dictionaries: EMS, EKSS, IMS, ERLA, VMS, Seto, SMS, KKS, and VKS
- Per-dictionary hit counts and overlap analysis
- Breakdown by Estonian and Finnish word forms
- Search to check which dictionaries contain a specific word
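A coverage figure of this kind can be thought of as the share of corpus word forms found in the union of all dictionary headword lists. The sketch below shows one way to compute it, assuming each dictionary is available as a plain set of headwords. The function name `coverage` and the toy data are hypothetical; only the abbreviations EMS and SMS come from this guide, and the real pipeline may match forms to headwords differently (for example via lemmas).

```python
def coverage(word_forms: set[str], dictionaries: dict[str, set[str]]) -> dict:
    """Share of word forms found in at least one dictionary, plus per-dictionary hits."""
    if not word_forms:
        return {"share": 0.0, "per_dictionary": {name: 0 for name in dictionaries}}
    all_headwords = set().union(*dictionaries.values())
    covered = word_forms & all_headwords
    per_dictionary = {
        name: len(headwords & word_forms)
        for name, headwords in dictionaries.items()
    }
    return {
        "share": len(covered) / len(word_forms),
        "per_dictionary": per_dictionary,
    }


# Toy data for illustration only; not actual dictionary contents.
forms = {"käsi", "mets", "kuld", "hobu"}
toy_dictionaries = {
    "EMS": {"käsi", "mets"},
    "SMS": {"mets", "kuld", "talo"},
}
result = coverage(forms, toy_dictionaries)
```

Here three of the four toy word forms appear in at least one dictionary, so `result["share"]` is 0.75, and each dictionary contributes two hits.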
Category Browser
The category browser lets you explore the 65 grammatical and poetic categories used in 1.17 million word-form annotations across the corpora. Categories include morphological cases (nominative, genitive, partitive, and others), verb forms (infinitives, participles, imperatives), and stylistic markers specific to folk poetry. Select any category to see its most frequent word forms with example contexts.
- Browse 65 annotation categories covering morphology, syntax, and poetics
- See the most frequent word forms in each category
- View total annotation counts and per-category distributions
- Click through to full dictionary entries for any word
Frequency Explorer
The frequency explorer lets you compare how often words appear in the Estonian and Finnish corpora. Enter a word to see its rank and raw count in each language, or browse the top-N most frequent lemmas. The tool also shows BERT semantic neighbors for each word, revealing which other words appear in similar poetic contexts.
- Look up frequency rank and token count for any lemma in Estonian and Finnish
- Browse top-N frequency lists for each corpus
- Compare Estonian and Finnish rankings side by side
- View BERT-based semantic neighbors that share similar poetic contexts
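To make "frequency rank" concrete: a rank is a lemma's position once all lemmas in a corpus are sorted by token count, with the most frequent lemma at rank 1. The following is a minimal sketch under that assumption. The names `rank_table` and `compare`, the tie-breaking rule, and the counts are all invented for illustration and do not reflect real corpus figures.

```python
from collections import Counter
from typing import Optional


def rank_table(counts: Counter) -> dict[str, int]:
    """Map each lemma to a 1-based rank; ties are broken alphabetically."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {lemma: rank for rank, (lemma, _count) in enumerate(ordered, start=1)}


# Invented counts for two toy corpora.
estonian = Counter({"ja": 40, "ei": 25, "kuld": 10})
finnish = Counter({"ei": 50, "ja": 45, "kulta": 12})

estonian_ranks = rank_table(estonian)
finnish_ranks = rank_table(finnish)


def compare(lemma: str) -> tuple[Optional[int], Optional[int]]:
    """Return (Estonian rank, Finnish rank); None where the lemma is absent."""
    return estonian_ranks.get(lemma), finnish_ranks.get(lemma)
```

In this toy data `compare("ja")` gives rank 1 in Estonian and rank 2 in Finnish, while `compare("kuld")` gives rank 3 in Estonian and no Finnish rank.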
Lemma Ambiguity
Many word forms in historical folk poetry can be mapped to more than one lemma. The ambiguity browser lets you explore this phenomenon: search for any word form and see all the candidate lemmas assigned to it, along with the frequency of each assignment. This is valuable for understanding the challenges of lemmatizing dialectal and archaic texts, and for assessing how reliable a given lemmatization is.
- Search among 400,000 word forms and their candidate lemma mappings
- See frequency data for each word-form-to-lemma assignment
- Identify highly ambiguous forms with many possible lemmas
- Click through to full dictionary entries for any candidate lemma
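The idea of a form-to-lemma ambiguity index can be sketched as a mapping from each word form to a tally of the lemmas it was assigned to. The example below assumes the annotations are available as (word form, lemma) pairs. The function names and the sample pairs are hypothetical, although the Estonian form "tuli" is genuinely ambiguous between the noun tuli ("fire") and a past-tense form of the verb tulema ("to come").

```python
from collections import Counter, defaultdict


def build_ambiguity_index(assignments):
    """Tally how often each word form was assigned to each lemma."""
    index = defaultdict(Counter)
    for form, lemma in assignments:
        index[form][lemma] += 1
    return index


def candidates(index, form):
    """Candidate lemmas for a form, most frequent assignment first."""
    return index[form].most_common() if form in index else []


def ambiguous_forms(index, min_lemmas=2):
    """Word forms mapped to at least `min_lemmas` distinct lemmas."""
    return sorted(form for form, lemmas in index.items() if len(lemmas) >= min_lemmas)


# Invented sample assignments for illustration.
sample = [
    ("tuli", "tulema"),
    ("tuli", "tuli"),
    ("tuli", "tulema"),
    ("mets", "mets"),
]
index = build_ambiguity_index(sample)
```

For this sample, `candidates(index, "tuli")` lists tulema (2 assignments) ahead of tuli (1), and `ambiguous_forms(index)` returns only `tuli`.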