Find remarkable words that carry special meaning in Sacred Texts

In the vast corpus of human literature, few genres hold as much significance and depth as Sacred Texts. Across cultures and civilizations, these timeless writings offer profound insights into spirituality, morality, and the human condition. For centuries, scholars and seekers have studied these texts to unravel their mysteries and gain the wisdom they contain. Today we offer an advanced Keyword Extractor and Text Analyzer to aid in the empirical research and exegesis of religious texts.

Project Gutenberg as a source of texts

Text mining is a computational technique for analyzing large bodies of text and extracting information that is not evident on a first reading. By applying algorithms and statistical models, text mining can help uncover patterns, relationships, and insights that are not immediately apparent to a human reader.

In the study of sacred texts, text mining opens up new avenues of exploration and discovery; in a way, it is a kind of empirical theology. Through this approach, we can go deeper into the ancient wisdom contained within these texts, accessing hidden meanings, thematic connections, and linguistic nuances.

Project Gutenberg, a large digital library of free eBooks, has scriptures from various traditions, including, of course, those of ancient India. Here are just a few examples:

  1. Bhâgavata Purâna
  2. The Yoga Sutras of Patanjali
  3. Maha-bharata, The Epic of Ancient India Condensed into English Verse
  4. The Mahabharata of Krishna-Dwaipayana Vyasa Translated into English Prose
  5. The Upanishads

These texts, although they are translations, contain profound knowledge of Hindu philosophy, spirituality, and practice.

Similarly, on Project Gutenberg one can easily find scriptures from other traditions.

How to extract keywords

To demonstrate the power of text mining in exploring sacred texts, let us take The Yoga Sutras of Patanjali. This seminal text, attributed to the sage Patanjali, outlines the principles and practices of yoga. It is a guide to meditation, concentration, and spiritual realization.

With Keyword Extractor and Text Analyzer you can instantly mine the entire text of The Yoga Sutras - or any other book for that matter. Find the book you would like to analyze (such as The Yoga Sutras). On its Project Gutenberg page, select Plain Text UTF-8. Open the link and copy the whole text.

Paste the text into the app, where it says «Paste text here». Remove the introduction and anything at the end that comes after «END OF THE PROJECT GUTENBERG EBOOK». Press Run to start the analysis.
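If you prefer to trim the Project Gutenberg boilerplate before pasting, the step can be sketched in Python. This is only an illustration and not part of the app: the function name is hypothetical, and the exact wording of the START marker line is an assumption based on how Project Gutenberg plain-text files are usually laid out (only the END marker is quoted above). A translator's introduction inside the book itself still has to be removed by hand.

```python
import re

# Assumed marker lines, e.g. "*** START OF THE PROJECT GUTENBERG EBOOK ... ***".
# Older files may say "THIS" instead of "THE", so both are accepted.
START_RE = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(raw: str) -> str:
    """Return the text between the START and END marker lines.

    If a marker is missing, that side of the text is left untouched.
    """
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw)
    return raw[begin:finish].strip()
```

The returned string can then be pasted into the app in place of the raw file contents.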

You can repeat the analysis after adjusting parameters such as text chunk size and word frequency. Tailor the settings to focus on specific aspects of the text, whether common themes, key concepts, or linguistic patterns. For larger texts, try setting the word frequency to larger numbers. Use the default text chunk size or set it to values like 100, 600, 1000, 3000 - it is like zooming in and out on a reader’s attention span. By experimenting with different text mining methods you can gain a better, more nuanced understanding of the text’s structure, topic, themes, and lexis. Keyword Extractor and Text Analyzer (see its Help page) uses various probabilistic methods of analyzing text. Some of them, like TF-IDF and entropy, are common in corpus linguistics and information retrieval, while others are based on innovative research.
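To give a feel for what chunk size and word frequency do, here is a minimal Python sketch of two of the common measures named above, TF-IDF and entropy, computed over fixed-size chunks of words. It is a textbook-style illustration under stated assumptions, not the app's actual implementation: the function names, the parameter names chunk_size and min_freq, and the naive tokenizer (suited to English translations) are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase runs of letters (an assumption, fine for English)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def score_words(text, chunk_size=600, min_freq=2):
    """Map each word to a (tfidf, entropy) pair computed across text chunks.

    chunk_size: number of words per chunk (hypothetical stand-in for "text chunk size").
    min_freq:   minimum total count for a word to be scored (stand-in for "word frequency").
    """
    tokens = tokenize(text)
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    chunk_counts = [Counter(c) for c in chunks]
    totals = Counter(tokens)

    scores = {}
    for word, freq in totals.items():
        if freq < min_freq:
            continue
        # (count in chunk, chunk length) for every chunk that contains the word
        hits = [(cc[word], len(ch)) for cc, ch in zip(chunk_counts, chunks) if word in cc]
        # IDF: words confined to few chunks get a higher weight
        idf = math.log(len(chunks) / len(hits))
        tfidf = sum(cnt / size for cnt, size in hits) * idf
        # Entropy of the word's distribution over chunks:
        # 0 when it sits in a single chunk, higher when it is spread evenly
        entropy = sum((cnt / freq) * math.log2(freq / cnt) for cnt, _ in hits)
        scores[word] = (tfidf, entropy)
    return scores
```

In this sketch a word concentrated in one passage scores high on TF-IDF and low on entropy, while a word spread evenly through the book does the opposite. Changing chunk_size changes what counts as "one passage", which is the zooming effect described above.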

More cases for analysis

Furthermore, by comparing the results of mining experiments across different sacred texts, we can identify common motifs, philosophical principles, and linguistic patterns that transcend cultural and religious boundaries. This comparative approach can enrich human understanding of the universal truths expressed in these texts, their relevance and significance.

You can even put several different texts together, even from different traditions, into the same input, especially if you are looking for synonyms across traditions (words that share traits in different Books).

Press Download to save the results locally as a CSV file.

You will see that Keyword Extractor and Text Analyzer is a sophisticated tool for exploring and unlocking the knowledge contained within texts. You can use original texts (not only translations) in any language, but bear in mind that in languages like Sanskrit word boundaries are affected by sandhi. In classical Sanskrit texts, in the Vedas and other ancient scriptures, words may be concatenated without spaces, so identifying word boundaries accurately requires knowledge of grammar and morphology. For the Vedas, therefore, Padapāṭha texts are recommended for any word frequency-based analysis. Do try any texts to see what results you can get.

After running tests in Keyword Extractor and Text Analyzer, make sure to save the data as a CSV file by pressing Download. The file is downloaded to your device. Do not modify the file. Now you can upload this file to our Semascope viewer and explore a 3D plot of how words relate to each other. The 3D plot viewer is available here: View 3D plot.

Text mining analysis of Religious Texts

You can extract keywords online interactively with Keyword Extractor and Text Analyzer. Simply copy and paste the text that you want to analyze, save the results as a CSV data file by pressing Download on the Keyword Extractor page, then go to the 3D plot viewer page and upload the file to Semascope.

Natural Language Processing (NLP) and text mining of Religious Texts are becoming increasingly common, and here is why. The study of religious texts is typically conducted by scholars of literature, philosophy, and Divinity. Through qualitative and critical research methods, scholars study religious beliefs and practices. In contrast, traditional quantitative research may oversimplify complex meanings and symbolism that are important for human understanding of the text. On the other hand, modern AI and information retrieval systems employ automated lexical analysis to extract and index words and phrases from documents, facilitating efficient searching. Computer analysis offers speed and breadth that are unattainable by human scholars. This is why, despite its limitations, computer analysis is valuable: it can process entire books rapidly, generating quantitative data for further human analysis and consideration. This approach has seen significant advancements, particularly with the digitization of texts in the last couple of decades.

There is a learning curve for anyone willing to employ quantitative methods in their research of religious texts. Their research has traditional objectives: it deals with interpretation, understanding, and exegesis, not with programming or statistics. So it would be unreasonable to expect students in Humanities and Divinity to be prepared to code and ‘do’ NLP, or to put together and test an information system for their specific purpose of qualitative analysis. This is why I wrote the user-friendly probabilistic Keyword Extractor presented here: so that others could do what they are best at - but with the help of quantitative methods.

Just searching for ‘text mining of religious texts’ shows high interest in this topic. There are many contributions, and their titles speak for themselves. Here are just a few examples:

  1. A Text Mining Discovery of Similarities and Dissimilarities Among Sacred Scriptures
  2. A Text Mining Analysis of Religious Texts
  3. Exploring the Bible through NLP: A new approach to understanding scripture
  4. Empirical theology in texts and tables: qualitative, quantitative and comparative perspectives

The list could easily be continued with the many articles appearing in the International Journal of Digital Humanities and other publications.

Hopefully, with the accessible Keyword Extractor and Text Analyzer presented here, more scholars and seekers will turn to computer methods in the exegesis of sacred texts.

Alexander Sotov

Text: Alexandre Sotov
