1. Text Analysis with Python and NLTK
Abstract
Digital technologies have made vast amounts of text available to researchers, and this same technological moment has
provided us with the capacity to analyze that text faster than humanly possible. The first step in that analysis is to
transform texts designed for human consumption into a form a computer can analyze. Using Python and the Natural
Language Toolkit (commonly called NLTK), this workshop introduces strategies to turn qualitative texts into
quantitative objects. Through that process, we will present a variety of strategies for simple analysis of text-based data.
Learning Objectives
In this workshop, you will learn how to:
• Prepare texts for computational analysis, including strategies for transforming texts into numbers
• Use NLTK methods such as concordance and similar
• Clean and standardize your data with tools such as stemmers and lemmatizers
• Compare the frequency distributions of words in a text to quantify the narrative arc
• Identify stop words and remove them when needed
• Use part-of-speech tagging to gather insights about a text
• Transform any document that you have (or have access to) in .txt format into a text that can be analyzed
computationally
• Tokenize your data and put it in a format compatible with the Natural Language Toolkit
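To make a few of these objectives concrete, here is a minimal sketch of tokenizing a text, removing stop words, and counting word frequencies. It deliberately uses only the Python standard library as a stand-in for the NLTK tools the workshop covers (such as nltk.word_tokenize, nltk.corpus.stopwords, and nltk.FreqDist). The sample text, the tiny stop word list, and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample text and a tiny hand-picked stop word list
# (both are assumptions for illustration, not workshop data).
SAMPLE_TEXT = (
    "The whale is the largest animal. "
    "The whale swims, and the sailors watch the whale."
)
STOP_WORDS = {"the", "is", "and", "a", "an", "of", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop word list."""
    return [token for token in tokens if token not in stop_words]


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


tokens = tokenize(SAMPLE_TEXT)
content_tokens = remove_stop_words(tokens)
frequencies = word_frequencies(content_tokens)

# Show the three most frequent content words with their counts.
print(frequencies.most_common(3))
```

The same three steps (tokenize, filter, count) are what turn a text written for human readers into numbers a computer can compare.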
Estimated time
10 hours
Prerequisites
• Introduction to Python (required) This workshop relies heavily on concepts from the Python workshop, and having
a basic understanding of how to use the commands discussed in the workshop will be central for anyone who
wants to learn about text analysis with Python and NLTK.
• Introduction to the Command Line (recommended) This workshop makes some reference to concepts from the
Command Line workshop, and basic knowledge of how to use the command line will be helpful for
anyone who wants to learn about text analysis with Python and NLTK.
• Short introduction to Jupyter Notebooks (recommended) This workshop uses Jupyter Notebooks to process the
Python commands in a clear and visual way. Anyone who wants to follow along in the workshop on text analysis
with Python and NLTK should read this very short introduction to how to use Notebooks.
• Installing Python (and Anaconda) (required) This workshop uses Python, and you will need to have a Python
installation. If you choose to install a different distribution of Python, make sure it is version 3, as other versions will
not work with this workshop.
• Installing Natural Language Toolkit (required) You will need to install the NLTK package into your Python
packages for the purposes of this workshop. This guide will help you along the way.
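Before starting, it can help to confirm that the two required pieces are in place. The following is a small sketch, not part of the official setup guide, that checks the Python major version and whether a package named nltk can be found. The function name check_environment is hypothetical.

```python
import importlib.util
import sys


def check_environment(min_major=3, package="nltk"):
    """Return (python_ok, package_ok) for the current interpreter."""
    # The workshop requires Python 3.
    python_ok = sys.version_info.major >= min_major
    # find_spec returns None when a top-level package cannot be found.
    package_ok = importlib.util.find_spec(package) is not None
    return python_ok, package_ok


python_ok, nltk_ok = check_environment()
version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python {version}: {'OK' if python_ok else 'version 3 required'}")
print(f"nltk installed: {nltk_ok}")
```

If the second line reports False, follow the NLTK installation guide listed above before continuing.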
Contexts
Pre-reading suggestions
• A Beginner’s Tutorial to Jupyter Notebooks
• What is text analysis
Projects that use these skills
• Short list of academic Text & Data mining projects
• Building a Simple Chatbot from Scratch in Python
• Classifying personality type by social media posts
2. Ethical Considerations
• When working with massive amounts of text, it is easy to lose the original context. We must be aware of that and be
careful when analyzing it.
• It is important to constantly question our assumptions and the indexes we are using. Numbers and graphs do not
tell the story; our analysis does. We must be careful not to draw hasty and simplistic conclusions about things that are
complex. Just because we found that author A uses more unique words than author B, does it mean that A is a
better writer than B?
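As a small illustration of why the choice of index matters, the sketch below compares two invented excerpts in two ways: by the raw number of unique words and by the share of unique words among all words. The excerpts, variable names, and the second index are assumptions chosen for illustration; neither number says anything about writing quality.

```python
import re


def words(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def unique_word_count(text):
    """Number of distinct words in the text."""
    return len(set(words(text)))


def unique_word_ratio(text):
    """Distinct words divided by total words (sensitive to text length)."""
    tokens = words(text)
    return len(set(tokens)) / len(tokens)


# Hypothetical excerpts, invented for this example.
author_a = "the sea was calm and the sky was clear and the ship sailed on and on"
author_b = "storms broke masts"

a_unique = unique_word_count(author_a)
b_unique = unique_word_count(author_b)
a_ratio = unique_word_ratio(author_a)
b_ratio = unique_word_ratio(author_b)

print(f"Author A: {a_unique} unique words, ratio {a_ratio:.3f}")
print(f"Author B: {b_unique} unique words, ratio {b_ratio:.3f}")
```

Author A has more unique words, yet author B has the higher ratio, simply because the excerpts differ in length. The two indexes point in opposite directions, which is why the interpretation has to come from the analyst rather than from the number.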
Cheat Sheets
• Jupyter Notebook shortcuts, tips and tricks
Acknowledgements
• Current author: Rafael Davis Portela
• Past contributor: Michelle McSweeney
• Past contributor: Rachel Rakov
• Past contributor: Kalle Westerling
• Past contributor: Patrick Smyth
• Past contributor: Hannah Aizenman
• Past contributor: Kelsey Chatlosh
• Past reviewer: Filipa Calado
• Current editor: Lisa Rhody
• Current editor: Kalle Westerling