The source code for this blog is available on GitHub.

Blogs, tutorials, articles and essays about math,
coding and natural language processing.

Text Segmentation

Normalization, Tokenization, Sentence Segmentation + Useful Methods

What does normalizing a text do? We have previously called the .lower() method to turn every word lowercase, so that strings like “the” and “The” both become “the” and we don’t count them as two different words.
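As a rough illustration of why this matters for counting, here is a minimal sketch. The helper name `count_words` and the sample tokens are made up for this example and are not taken from the post itself.

```python
from collections import Counter


def count_words(words, normalize=True):
    """Count word frequencies, optionally lowercasing each word first.

    `count_words` is a hypothetical helper used only for illustration.
    """
    if normalize:
        words = [w.lower() for w in words]
    return Counter(words)


# Sample tokens invented for this sketch.
tokens = ["The", "cat", "saw", "the", "dog"]

raw_counts = count_words(tokens, normalize=False)
normalized_counts = count_words(tokens)

# Without normalization "The" and "the" are separate keys;
# after lowercasing they collapse into a single entry.
print(raw_counts["The"], raw_counts["the"], normalized_counts["the"])
```

Lowercasing is only the simplest form of normalization; it deliberately throws away case information, which may or may not be acceptable depending on the task.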

Jake Batsuuri
25 min read

More Stories

Inputting & PreProcessing Text

Input Methods, String & Unicode, Regular Expression Use Cases

NLTK ships with preprocessed texts, but we can also import and process our own. Importing: from __future__ import division; import nltk, re, pprint. To import a book as a txt, install urlopen: !pip install urlopen. And:

Jake Batsuuri
22 min read

What are Context Free Languages?

Grammars, Derivation, Expressiveness, Chomsky Hierarchy

Previously, we talked about how languages are studied using the notion of a formal language. A formal language is a mathematical construction that uses sets to describe a language and understand its properties.

Jake Batsuuri
11 min read