
Data Analysis: Text Analysis

Learn how to interpret and explore data to use data confidently, find answers and make smart decisions.

What is Text Analysis? Why learn it?

Text analysis is a computational approach to studying digital texts. Using a software tool or a programming language, you can analyze a corpus of text to identify meaningful patterns and key trends instead of reading through every text yourself. Below are a few things you can do with text mining:

  • Word Frequency: Examining the prominence of terms across one or several texts
  • Parts of Speech: Quantifying grammatical distinctions
  • Named-Entity Recognition: Detecting proper nouns based on predefined categories such as person names, organizations, locations, and times
  • Sentiment Analysis: Classifying texts as positive or negative in tone
  • Topic Modeling: Identifying themes across a body of texts
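
As a rough illustration of the first item, word frequency can be computed with nothing more than the Python standard library. This is a minimal sketch, not a recommended workflow: the function name `word_frequencies`, the simple letters-and-apostrophes tokenizer, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=5):
    """Return the top_n most common lowercase word tokens in text."""
    # Lowercase the text, then treat each run of letters/apostrophes as a token.
    # This tokenizer is a deliberate simplification (hypothetical choice).
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)

sample = "The cat sat on the mat. The mat was warm."
print(word_frequencies(sample, top_n=2))  # [('the', 3), ('mat', 2)]
```

In practice, common function words such as "the" are usually filtered out with a stop-word list before counting, which tools like Voyant do for you.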

Easy to Use Tools for Text Mining

1. Voyant

Voyant is a web-based tool for reading and analyzing your digital texts. Use it to perform lexical analysis, including studying frequency and distribution data and creating word clouds.

Access Voyant: Voyant Tools

Getting started with Voyant: Support Guide


2. Google Books Ngram Viewer

Charts how often searched words and phrases appear over time in the books digitized by Google Books.

Access Google Books Ngram Viewer: Ngram Viewer
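
For readers unfamiliar with the term, an n-gram is a sequence of n consecutive words. The sketch below counts n-grams in a single string using only the Python standard library; it illustrates the general idea and is not how the Ngram Viewer itself is implemented. The function name `ngram_counts` and the simple tokenizer are assumptions made for this example.

```python
import re
from collections import Counter

def ngram_counts(text, n=2):
    """Count every run of n consecutive word tokens in text."""
    # Simplified tokenizer (hypothetical choice): lowercase letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Zip n staggered copies of the token list to form sliding windows of size n.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)

counts = ngram_counts("to be or not to be", n=2)
print(counts["to be"])  # 2
```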


3. Wordle and Concordle

Word clouds are a popular way of visualizing how prominent individual words are in a collection of texts. Wordle and Concordle help create customizable word clouds based on the user's text data.

Access Wordle: Wordle

Access Concordle: Concordle


4. OpenRefine

Open source software that allows users to clean their datasets prior to use.

Access OpenRefine: OpenRefine


5. HathiTrust Research Center Analytics

HathiTrust Research Center (HTRC) enables computational analysis of works in the HathiTrust Digital Library (HTDL) to facilitate non-profit research and educational uses of the collection. HTRC, which is co-located at Indiana University and the University of Illinois at Urbana-Champaign, engages in research and development for computational text analysis of massive digital libraries.

Access HathiTrust: HathiTrust Analytics

Teaching materials: HTRC

HathiTrust + Bookworm, for visualizing trends in language over time: HTRC + Bookworm

Advanced Tools using Programming

Tool | Description | Access the software
KNIME | KNIME, the Konstanz Information Miner, is a free and open-source text analytics, reporting, and integration platform. Also available in UAlbany Library Computing Sites. | Download
MALLET | Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. | Download
Python + NLTK | Open source programming language widely used for manipulating and analyzing text data. | Download and Install
RStudio | Open source analysis software that relies on community-driven packages to mine data. Also available in UAlbany Library Computing Sites. | Download
WEKA | Weka contains a collection of visualization tools and algorithms for text analysis and predictive modeling. | Download
RapidMiner | Open source software for extracting insight from unstructured text content. | Download

Help to get you started

  • Discover other tools and software for text mining using TAPoR: Link
  • Where to start with text mining: Link
  • Text mining with Python and R: Link
  • Natural language processing with Python: Link
  • Text mining with KNIME: Link