How AI sees Dante's Divine Comedy in 27 words

In this Post

Text mining eternal classics with Text Visualization App

The Divine Comedy, penned by the Italian poet Dante Alighieri in the 14th century, is one of the most significant works in world literature. It’s a journey through the realms of the afterlife: Hell (Inferno), Purgatory (Purgatorio), and Paradise (Paradiso), guided by the Roman poet Virgil and later by Dante’s beloved Beatrice. While traditionally studied through literary analysis and historical context, modern computational techniques, particularly text mining, offer a fresh perspective on this timeless masterpiece.

Text Mining Divine Comedy

I would like to share some results of keyword detection applied to Dante’s Poem, simply because it is sheer pleasure to see these words together. One could ask, how do algorithms see the text? Here is the answer:

hope grace nature fire love exclaim truth circle flame woe heaven father mount beatrice blood head steep rock form ill death master fear land mortal virtue sun

«Dante’s journey through infernal depths, purging fires, and celestial heights, guided by love, reveals humanity’s quest for redemption, justice, and divine grace - a timeless odyssey of soul and spirit», - a comment from ChatGPT ;)

That is the Divine Comedy in 27 keywords, or in 29 words if you prefer ChatGPT’s sentence. An oversimplification? Of course, but the keywords are good: they really transmit the energy of the text, even in an English translation. Together they give a vivid portrait of the Poem’s thematic essence. Love, heaven, grace, and virtue illuminate the spiritual journey depicted in the text. Beatrice, a symbol of divine guidance and inspiration, appears as a central figure in the cosmic odyssey. Nature, truth, and light underscore the philosophical and theological themes explored throughout the narrative. Earthly elements like rock, flame, and air, juxtaposed with celestial imagery such as the sun and stars, create allegory and symbolism.

I tried experimenting with the Italian original, and I could immediately see the role of the word Amor in the Italian text! It simply stands out in the ‘heroes’ cluster.

In addition to the online app for extracting keywords and building tag clouds, I also offer a 3D text visualization tool. You can use it to explore texts that have already been mined, or you can upload your own data. To do that, first prepare a CSV file with the app, then upload it to the online text visualizer. The pre-mined text of Dante is available online (see the links below): you can explore the graph and download the dataset and the entire text, which you can paste into the interactive Keyword Extractor.

Try free online Keyword Extractor and Text Analyzer

View pre-mined Divine Comedy

Download Divine Comedy text for your own analysis

Purpose of Keyword Extractor and Text Analyzer

I created this text mining software so that anyone can investigate any text, in any language. You do not need a complicated setup of Python or other scripts: just copy and paste the text into an online web form. You can experiment interactively with different settings of the app to see how the results differ depending on the mining method. The results above are for aggregated words, words that are «frequent rare events»: they do not occur throughout the book, but mostly in some of its parts. This mining method is called sqrt_kp.
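To make the idea of «frequent rare events» concrete, here is a minimal Python sketch for readers who want to experiment offline. It is not the sqrt_kp algorithm, whose formula is not given in this post. As a stand-in, it uses the variance-to-mean ratio of a word's counts across equal-sized chunks of the text. The function names, the chunk size, and the `min_count` cut-off are all assumptions made for illustration.

```python
from collections import Counter
import re


def split_into_chunks(text, chunk_size=200):
    """Lowercase the text, tokenize on letters, and cut it into word chunks."""
    words = re.findall(r"[a-z']+", text.lower())
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def burstiness_scores(chunks, min_count=3):
    """Score each word by the variance-to-mean ratio of its per-chunk counts.

    A ratio near 1 or below means the word is spread fairly evenly;
    a ratio well above 1 means it clusters in a few chunks.
    This is an illustrative proxy, NOT the app's sqrt_kp method.
    """
    n = len(chunks)
    per_chunk = [Counter(chunk) for chunk in chunks]
    totals = Counter()
    for counts in per_chunk:
        totals.update(counts)

    scores = {}
    for word, total in totals.items():
        if total < min_count:
            continue  # too rare to judge its distribution
        mean = total / n
        variance = sum((c[word] - mean) ** 2 for c in per_chunk) / n
        scores[word] = variance / mean
    return scores
```

To try it, pass the downloaded text through `split_into_chunks`, call `burstiness_scores` on the result, and sort the words by score. Words concentrated in a few parts of the book should rise to the top, while evenly spread words sink.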

Another metric, which is quite the opposite of the one before, is called Fisher Information.

hear forthwith woe exclaim circle father hope steep mortal close blood grace follow heaven fear mountain stream set evil fix flame rock reach beatrice fire nature aught mount air hence till

Some words appear in both lists, but the methods differ: Fisher Information is used for mining ‘background’ words, while the sqrt_kp method is aimed at extracting main characters and heroes. Where sqrt_kp prioritizes frequent but selective words, the Fisher Information metric digs into the background of Dante’s Divine Comedy. It uncovers the underlying lexis, highlighting words that may be less prominent but play a crucial role in shaping the poem’s thematic depth and narrative context.

The keywords extracted using Fisher Information, such as power, wish, mortal, and eternal, suggest the overarching themes of human ambition, aspiration, mortality, and transcendence. Beatrice, as a central figure, still emerges here, symbolizing divine guidance and the pursuit of higher truth. However, the Fisher Information metric allows one to explore the broader landscape of Dante’s literary universe. In statistics, Fisher Information quantifies how much information the data carry about a model’s parameters, that is, how sensitive the model’s likelihood is to changes in those parameters.
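For reference, this is the textbook definition of Fisher Information for a single parameter, followed by a standard worked case. How the app turns this quantity into a per-word score is not described in this post, so the Poisson word-count model below is only an assumption chosen for illustration.

```latex
% General definition for a model with density f(x; \theta)
I(\theta) = \mathbb{E}\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^{2}\right]

% Illustrative case: a word count X ~ Poisson(\lambda)
\log f(x;\lambda) = x\log\lambda - \lambda - \log x!
\qquad
\frac{\partial}{\partial\lambda}\log f(x;\lambda) = \frac{x}{\lambda} - 1
\qquad
I(\lambda) = \frac{\operatorname{Var}(X)}{\lambda^{2}} = \frac{1}{\lambda}
```

Under this assumed model, a single observation is more informative about a low rate than about a high one, since the information falls as the rate grows.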

TF-IDF Method

How about traditional keyword extraction methods? Here are the results for the TF-IDF method, common in information retrieval:

reach deep beam set course held call joy drew fix till true mortal head land desire master form wish holy straight evil fear air song fair flame mov seen

TF-IDF (Term Frequency-Inverse Document Frequency) provides a different perspective on Dante’s Divine Comedy. In this analysis, words like reach, deep, and beam are extracted, hinting at themes of spiritual journey and enlightenment. However, compared to the other methods, the results may seem less illuminating. Try it for yourself and compare other methods of keyword detection using my text mining app, Keyword Extractor and Text Analyzer.
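For readers curious about what TF-IDF computes, here is a minimal pure-Python sketch of the classic formulation. It assumes the poem has already been split into tokenized "documents" (for example, one list of words per canto). That split is an assumption, and the app may use a different weighting or smoothing variant. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per document.

    tf  = count of the word in the document / document length
    idf = log(number of documents / number of documents containing the word)
    This is the plain textbook variant, without smoothing.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each word once per document

    scores = []
    for doc in documents:
        counts = Counter(doc)
        length = len(doc)
        scores.append({
            word: (count / length) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores
```

A word that occurs in every document gets a score of zero, which is why TF-IDF favours vocabulary specific to individual parts of the text rather than words common to the whole poem.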

Read More: Exploring Sacred Texts with Probabilistic Keyword Extractor

Utilizing quantitative methods in the study of religious and literary texts presents a learning curve for researchers primarily focused on traditional objectives such as interpretation, understanding, and exegesis, rather than programming or statistics. Therefore, expecting students in Humanities and Divinity to possess coding skills or conduct NLP may be unrealistic. This is why I developed a user-friendly probabilistic Keyword Extractor presented here. Its purpose is to empower researchers to leverage quantitative methods without the need for extensive technical expertise, enabling them to excel in their core strengths while benefiting from the insights provided by quantitative analysis.

Text: Alexandre Sotov
Comments or Questions? Contact me on LinkedIn
