RunoVerse

Poem & Verse Similarity Guide

What is this page?

A reference guide to the similarity system in RunoVerse. It documents the similarity algorithms (five poem-level, four verse-level, several of which share the same underlying measure) and the tools that use them to find related poems and verses across 292K poems and 4.29M verse lines.

How to navigate

Each section describes one tool: what it does and its key features, followed by a direct link to open it. The final table summarises the poem-level algorithms and how they differ. Scroll or use the browser's find-in-page (Ctrl/Cmd+F) to jump to a section.

Related explorer pages

Similarity Explorer — poem and verse similarity search · Poem Reader — per-poem similarity tabs · Verse Network — interactive similarity graph · Path Finder — shortest verse chains · Formula Explorer — formulaic patterns · Verse Analysis — cross-algorithm dashboard

RunoVerse provides multiple ways to explore how poems and verses relate to each other across the Finnish and Estonian runosong traditions. Whether you are tracing the spread of a formulaic verse line, comparing thematically related poems from different regions, or mapping cross-lingual parallels, the tools below offer complementary perspectives on the 292,092-poem, 4.29-million-verse corpus.

Poem Reader

The Poem Reader is an interactive viewer for all 292,092 poems from the SKVR (Finnish, published), JR (Finnish, unpublished), and ERAB (Estonian) collections. Each word in a poem is annotated with its standard orthography, English gloss, part-of-speech tag, and a link to its lemma in the dictionary. Clicking a word opens a dictionary lookup panel with full definitions from up to nine lexicographic sources.

Open Poem Reader →

Similarity Explorer

The Similarity Explorer is a standalone tool for investigating poem and verse similarity in depth. It operates in two modes.

Poem mode lets you select any poem and view its nearest neighbors across all five similarity algorithms. Each algorithm displays its top matches with numeric similarity scores. A side-by-side comparison panel lets you read two poems together, line by line. A network graph visualizes the similarity neighborhood, showing how poems cluster. Geographic and temporal analytics reveal where and when similar poems were collected. Cross-algorithm agreement badges highlight poems that appear as top matches in multiple algorithms.
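The cross-algorithm agreement badges can be read as a simple count over each algorithm's top-match list. The sketch below is illustrative only: the function name `agreement_counts`, the algorithm keys, the poem IDs, and the cut-off `k` are assumptions, not RunoVerse's actual implementation.

```python
from collections import Counter

def agreement_counts(top_matches, k=10):
    """Count, for each poem ID, how many algorithms rank it in their top k."""
    counts = Counter()
    for ranked_ids in top_matches.values():
        # set() guards against one algorithm listing the same ID twice
        counts.update(set(ranked_ids[:k]))
    return counts

# Hypothetical top-match lists for a single query poem
top_matches = {
    "tfidf_lemma": ["p2", "p3", "p4"],
    "jaccard": ["p3", "p5", "p2"],
    "translation_pivot": ["p3", "p6", "p7"],
}
counts = agreement_counts(top_matches, k=3)

# Poems that at least two algorithms agree on
shared = sorted(pid for pid, n in counts.items() if n >= 2)
print(shared)
```

A poem with a high count is one that looks similar to the query under several different notions of similarity at once.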

Verse mode lets you search for individual verse lines and see similar verses from across the corpus. You can also browse the top 200 formulaic patterns — the most widely distributed recurring verse lines, ranked by how many poems and collection places they span.

Open Similarity Explorer →

Verse Concordance

The Verse Concordance provides full-text search across 2,906,535 unique verse types drawn from 4.29 million total verse lines. Enter any text fragment to find matching verses. Results include a geographic distribution map showing where each verse was collected, a language breakdown (Estonian vs. Finnish), and occurrence tables listing which poems contain the verse and how frequently it appears.
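The distinction between verse types and verse lines can be illustrated with a small counting sketch. How RunoVerse actually normalizes verses into types is not stated here, so the `normalize` step (lowercasing and collapsing whitespace) is an assumption, as are the function names and the sample lines.

```python
from collections import Counter

def normalize(line):
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(line.lower().split())

def verse_types(lines):
    """Map each distinct normalized verse (a type) to its occurrence count."""
    return Counter(normalize(line) for line in lines)

def search(types, fragment):
    """Return (verse, count) pairs containing the fragment, most frequent first."""
    needle = normalize(fragment)
    hits = [(verse, n) for verse, n in types.items() if needle in verse]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

# Illustrative sample: three verse lines, two verse types
lines = ["Mieleni minun tekevi", "mieleni  minun tekevi", "aivoni ajattelevi"]
types = verse_types(lines)
print(len(types), sum(types.values()))
print(search(types, "minun"))
```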

Open Verse Concordance →

Verse Network

The Verse Network displays an interactive force-directed graph of verse similarity neighborhoods. Start from any verse and explore multi-hop connections to see how verses are linked through shared similarity across the corpus. Each edge shows a per-algorithm score breakdown, so you can see which similarity measures contribute to each connection. Results can be exported as CSV for further analysis.
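Multi-hop exploration and CSV export can be sketched as a breadth-first expansion over an adjacency map, followed by writing one row per edge. The graph layout, verse IDs, column names, and function names below are hypothetical; the columns of the real CSV export may differ.

```python
import csv
import io
from collections import deque

def neighborhood(graph, start, max_hops=2):
    """Collect the edges reachable from start within max_hops, breadth-first."""
    seen = {start}
    queue = deque([(start, 0)])
    edges = []
    while queue:
        verse, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor, scores in graph.get(verse, {}).items():
            edges.append((verse, neighbor, scores))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return edges

def edges_to_csv(edges, algorithms):
    """Write one row per edge with a score column per algorithm."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target", *algorithms])
    for source, target, scores in edges:
        # Leave the cell empty when an algorithm has no score for this edge
        writer.writerow([source, target, *(scores.get(a, "") for a in algorithms)])
    return buffer.getvalue()

# Hypothetical directed similarity graph with per-algorithm edge scores
graph = {
    "v1": {"v2": {"jaccard": 0.5, "tfidf": 0.7}},
    "v2": {"v3": {"jaccard": 0.4}},
    "v3": {"v4": {"jaccard": 0.9}},
}
edges = neighborhood(graph, "v1", max_hops=2)
csv_text = edges_to_csv(edges, ["jaccard", "tfidf"])
print(csv_text)
```

With `max_hops=2` the edge from v3 to v4 is left out, because v3 is already two hops from the starting verse.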

Open Verse Network →

Verse Path Finder

The Verse Path Finder locates the shortest chain of similar verses connecting any two verses in the Finnic runosong corpus. The result is displayed as an interactive chain showing each intermediate verse and the similarity scores between consecutive steps. This reveals how seemingly unrelated verses may be connected through a sequence of incremental textual similarities.
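One way to picture the search is as a breadth-first search over a graph whose edges link similar verses. The sketch assumes that "shortest" means fewest intermediate steps; RunoVerse may instead weight steps by similarity score. The function name `shortest_chain` and the verse IDs are hypothetical.

```python
from collections import deque

def shortest_chain(neighbors, start, goal):
    """Return the fewest-step chain from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        verse = queue.popleft()
        for nxt in neighbors.get(verse, ()):
            if nxt in previous:
                continue
            previous[nxt] = verse
            if nxt == goal:
                # Walk the predecessor links back to the start
                chain = [goal]
                while previous[chain[-1]] is not None:
                    chain.append(previous[chain[-1]])
                return chain[::-1]
            queue.append(nxt)
    return None

# Hypothetical similarity links between verse IDs
neighbors = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
chain = shortest_chain(neighbors, "a", "e")
print(chain)
```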

Open Verse Path Finder →

Formula Explorer

The Formula Explorer lets you browse 200 formulaic verse patterns ranked by frequency and geographic spread. A “formula” here is a cluster of similar verse lines found across many poems — evidence of oral tradition transmission, where singers in different times and places used recognizably similar wording. Each cluster shows its variant texts, the number of member verse occurrences across poems, geographic spread across collection places, and cross-links to the network visualization for further exploration.
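Ranking by frequency and geographic spread can be sketched as sorting clusters by how many distinct poems and collection places their member verses occur in. The exact ranking formula is not documented here, so the sort key below (poems first, places as tie-breaker) is an assumption, and the cluster names, poem IDs, and place names are made up.

```python
def rank_formulas(clusters):
    """Order cluster names by distinct poems, then distinct places, descending."""
    def spread(item):
        _, occurrences = item
        poems = {poem for poem, _ in occurrences}
        places = {place for _, place in occurrences}
        return (len(poems), len(places))

    ranked = sorted(clusters.items(), key=spread, reverse=True)
    return [name for name, _ in ranked]

# Hypothetical clusters: each occurrence is a (poem ID, collection place) pair
clusters = {
    "f1": [("p1", "Place A"), ("p2", "Place A")],
    "f2": [("p1", "Place A"), ("p2", "Place B"), ("p3", "Place C")],
    "f3": [("p4", "Place A"), ("p5", "Place B")],
}
ranking = rank_formulas(clusters)
print(ranking)
```

Here f3 ranks above f1 because both occur in two poems, but f3 spans two places and f1 only one.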

Open Formula Explorer →

Verse Similarity Analysis

The Verse Similarity Analysis is a cross-algorithm dashboard showing how the four verse-level similarity algorithms — Jaccard, TF-IDF, Translation-pivot, and CharBigram — compare across the 4.29 million verse lines in the corpus. It includes formulaic cluster analysis and geographic coverage metrics, providing a high-level view of how similarity patterns distribute across the material.
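Two of the four verse-level measures are simple enough to sketch directly. The code below is a minimal illustration, assuming whitespace tokenization for Jaccard and raw bigram counts for CharBigram; RunoVerse's own preprocessing and weighting may differ, and the second sample verse is an invented spelling variant.

```python
import math
from collections import Counter

def jaccard(a, b):
    """Jaccard index over the sets of whitespace-separated wordforms."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def char_bigrams(text):
    """Count overlapping two-character substrings."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def bigram_cosine(a, b):
    """Cosine similarity between character-bigram count vectors."""
    va, vb = char_bigrams(a), char_bigrams(b)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

a = "vaka vanha väinämöinen"
b = "vaka vanha väinämöini"  # invented variant, for illustration only
print(jaccard(a, b), bigram_cosine(a, b))
```

The pair shows why the measures complement each other: one changed word ending lowers the wordform Jaccard index to 0.5, while the character-bigram cosine stays high because most of the letters still match.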

Open Verse Similarity Analysis →

Poem-level similarity algorithms

Five algorithms are used to identify related poems. Each captures a different aspect of similarity — from shared vocabulary to cross-lingual thematic overlap to structural verse alignment. Results from all five are available in the Poem Reader and the Similarity Explorer.

| Algorithm | Basis | Description |
| --- | --- | --- |
| TF-IDF Lemma | Lemma-level | Cosine similarity on TF-IDF vectors of lemmatized poem texts. Captures thematic similarity through shared vocabulary, weighted by corpus-level term importance. |
| Wordform Overlap (Jaccard) | Exact wordforms | Jaccard index over raw wordform sets. Identifies poems sharing exact surface forms, useful for detecting formulaic lines and direct textual parallels. |
| Thematic (Translation-pivot) | Cross-lingual | Boolean-IDF cosine similarity over English translations derived from DeepSeek annotations. Enables cross-lingual comparison between Estonian and Finnish poems via a shared semantic space. |
| Alignment | Character n-gram | Verse sequence alignment using character bigram cosine similarity and dynamic programming, from the FILTER project (Janicki, Kallio & Sarv 2023). Captures structural similarity: poems that follow the same verse order score high. |
| Verse-level RRF | Verse-level fusion | Fuses Jaccard, TF-IDF, Translation, and CharBigram similarity at the verse level using Average-Best-Per-Verse aggregation, then combines all four via Reciprocal Rank Fusion into a single poem-level ranking. |
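The two steps named in the Verse-level RRF row can be sketched as follows. This is a minimal illustration under stated assumptions, not the RunoVerse implementation: the aggregation is taken to be one-directional (from poem A to poem B), the constant `k=60` is the value conventionally used for Reciprocal Rank Fusion rather than a documented setting, and all function names and poem IDs are hypothetical.

```python
def average_best_per_verse(verses_a, verses_b, sim):
    """For each verse of A take its best match in B, then average those maxima."""
    if not verses_a or not verses_b:
        return 0.0
    best = [max(sim(a, b) for b in verses_b) for a in verses_a]
    return sum(best) / len(best)

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an item's score."""
    scores = {}
    for ranked_ids in rankings:
        for rank, item in enumerate(ranked_ids, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order
    return sorted(scores, key=lambda item: (-scores[item], item))

# Toy verse similarity: 1.0 for identical verses, 0.0 otherwise
def exact(a, b):
    return 1.0 if a == b else 0.0

poem_score = average_best_per_verse(["x", "y"], ["x", "z"], exact)

# Hypothetical poem rankings from the four verse-level measures
rankings = [
    ["p1", "p2", "p3"],  # Jaccard
    ["p2", "p1", "p4"],  # TF-IDF
    ["p2", "p3", "p1"],  # Translation
    ["p1", "p2", "p4"],  # CharBigram
]
fused = reciprocal_rank_fusion(rankings)
print(poem_score, fused)
```

Because the fusion uses only ranks, the four measures do not need to produce scores on a comparable scale before they are combined.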
