The source code for this blog is available on GitHub.

Words

Speech-to-text systems, such as the dictation features on most mobile phones, as well as the auditory cortex of our brains, translate audio into words or tokens.
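On the software side, once audio has been transcribed, the "words or tokens" step can be illustrated with a few lines of Python. This is only a minimal sketch: the `tokenize` name and the regular expression are assumptions made for illustration, not the method any particular dictation system uses.

```python
import re

def tokenize(text):
    """Split text into word tokens using one simple, illustrative rule."""
    # \w+ matches a run of letters, digits or underscores;
    # (?:'\w+)? keeps contractions such as "doesn't" in one token.
    return re.findall(r"\w+(?:'\w+)?", text)

tokens = tokenize("The brain doesn't store every word.")
print(tokens)  # ['The', 'brain', "doesn't", 'store', 'every', 'word']
```

Punctuation is simply dropped here; a real tokenizer would usually keep it as separate tokens.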

Phrases

From the auditory cortex, these word signals travel to Wernicke's Area, where they get parsed into phrases and small syntactical structures.
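A rough software analogue of parsing words into phrases is chunking: grouping neighbouring tokens into small units. The sketch below assumes the tokens already carry part-of-speech tags (written by hand here, in Penn Treebank style) and uses one hypothetical rule: consecutive determiners, adjectives and nouns form a phrase. It is a toy example, not a claim about how the brain or any library does it.

```python
def chunk_noun_phrases(tagged_tokens):
    """Group consecutive determiner/adjective/noun tokens into phrases."""
    phrase_tags = {"DT", "JJ", "NN"}  # assumed tag set for this toy rule
    phrases, current = [], []
    for word, tag in tagged_tokens:
        if tag in phrase_tags:
            current.append(word)      # still inside a phrase
        elif current:
            phrases.append(" ".join(current))  # any other tag closes the phrase
            current = []
    if current:                       # flush a phrase that ends the input
        phrases.append(" ".join(current))
    return phrases

# Hand-tagged example sentence (the tags are an assumption, not tagger output).
tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
          ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(chunk_noun_phrases(tagged))  # ['the little dog', 'the cat']
```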

Sentences

Because there's no upper limit on sentence length, it is difficult to say exactly where the information goes from Wernicke's Area. But brain imaging techniques have shown that it does seem to go to the Angular Gyrus, a central hub, from which it gets sent to multiple places, including the Prefrontal cortex and other parts of the Parietal and Insular cortices.

Paragraphs

The number of connections between words in a paragraph becomes too large to compute past a certain size. But our Prefrontal Cortex also seems to be able to order a sequence of events and ideas. It seems that our brains like to chunk ideas together into units as a way to deal with complexity. Chunking and sequencing...
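To get a feel for the numbers, here is a small back-of-the-envelope sketch. It assumes that a "connection" means an unordered pair of words, which is one possible reading rather than a definition from the text, and the function names are made up for illustration. It compares linking every word with every other word against chunking first and then linking only within chunks and between whole chunks.

```python
from math import comb

def pairwise_links(n_words):
    """Number of unordered word pairs: n * (n - 1) / 2."""
    return comb(n_words, 2)

def chunked_links(n_words, chunk_size):
    """Links when words are first grouped into equal-sized chunks.

    Assumes n_words is divisible by chunk_size.
    """
    n_chunks = n_words // chunk_size
    within = n_chunks * comb(chunk_size, 2)  # pairs inside each chunk
    between = comb(n_chunks, 2)              # pairs of whole chunks
    return within + between

print(pairwise_links(100))     # 4950
print(chunked_links(100, 10))  # 495
```

Under these assumptions, chunking a 100-word paragraph into ten units cuts the number of links by a factor of ten, at the cost of ignoring word-to-word links across chunks.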

Corpus

In a corpus, let's say even a small book, there are too many connections to handle with standard computing, and increasingly we are getting better results with supercomputing or distributed computing. Our brain, on the other hand, seems to reach natural language understanding by abstracting the text into stories and ideas while being extremely lossy with the details.

Blogs, tutorials, articles and essays about math,
coding and natural language processing.

Text Segmentation

Normalization, Tokenization, Sentence Segmentation + Useful Methods

What does normalizing a text do? We have previously used the .lower() method to turn all of the words lowercase, so that strings like “the” and “The” both become “the” and we don’t double count them.

Jake Batsuuri
25 min read

More Stories

Inputting & PreProcessing Text

Input Methods, String & Unicode, Regular Expression Use Cases

NLTK has preprocessed texts. But we can also import and process our own texts. Importing: from __future__ import division; import nltk, re, pprint. To import a book as a txt, install urlopen: !pip install urlopen. And:

Jake Batsuuri
22 min read

What are Context Free Languages?

Grammars, Derivation, Expressiveness, Chomsky Hierarchy

Previously, we talked about how languages are studied using the notion of a formal language. A formal language is a mathematical construction that uses sets to describe a language and understand its properties.

Jake Batsuuri
11 min read