Words
Speech-to-text systems, such as most dictation features on mobile phones, as well as the auditory cortex of our brains, translate audio into words or tokens.
Phrases
From the auditory cortex, these word signals travel to Wernicke's Area, where they are parsed into phrases and small syntactic structures.
Sentences
Because there's no upper limit on sentence length, it is difficult to say exactly where the information goes after Wernicke's Area. But brain imaging techniques have shown that it does seem to go to the Angular Gyrus, a central hub, from which it is sent to multiple places, including the Prefrontal Cortex, other parts of the Parietal Cortex and the Insular Cortex.
Paragraphs
As you can see, the number of connections between words in a paragraph starts to become too large to compute at a certain level. But our Prefrontal Cortex also seems to be able to order a sequence of events and ideas. It seems that our brains like to chunk ideas together into units as a way to deal with complexity. Chunking and sequencing...
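To get a feel for how quickly the connections pile up, here is a minimal sketch. It assumes that a "connection" means an unordered pair of words, which is only one possible reading, and the word counts for each unit are made-up round numbers rather than measurements.

```python
from math import comb


def pairwise_connections(n_words: int) -> int:
    """Number of unordered word pairs among n_words words: n * (n - 1) / 2."""
    return comb(n_words, 2)


# Illustrative sizes only (assumptions, not taken from any real text).
for label, n in [("phrase", 5), ("sentence", 20), ("paragraph", 150)]:
    print(f"{label:<10} {n:>4} words -> {pairwise_connections(n):>6} pairs")
```

Under this assumption the count grows roughly with the square of the length, so grouping words into a handful of chunks first keeps the number of links far smaller.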
Corpus
In a corpus, let's say even a small book, there are too many connections to handle with standard computing, and increasingly we are getting better results with supercomputing or distributed computing. Our brain, on the other hand, seems to be able to abstract the text into stories and ideas, while being extremely lossy with the details, and still reach natural language understanding.
Blogs, tutorials, articles and essays about math, coding and natural language processing.
Text Segmentation
Normalization, Tokenization, Sentence Segmentation + Useful Methods
What does normalizing a text do? We have previously called the .lower() method to turn all of the words lowercase, so that strings like “the” and “The” both become “the” and we don’t count them as two different words.
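A small sketch of that effect, using only the standard library. The sample sentence and the naive whitespace tokenization are assumptions made for illustration; a real pipeline would use a proper tokenizer.

```python
from collections import Counter

# Made-up sample text; periods are stripped so split() yields bare words.
text = "The cat saw the dog. The dog saw the cat."
tokens = text.replace(".", "").split()

# Without normalization, "The" and "the" are counted as different words.
raw_counts = Counter(tokens)

# Lowercasing every token first merges them into a single entry.
normalized_counts = Counter(token.lower() for token in tokens)

print(raw_counts["The"], raw_counts["the"])  # counted separately
print(normalized_counts["the"])              # counted together
```

The trade-off is that lowercasing also erases distinctions you might want later, such as the difference between a proper noun and a common word.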
More Stories
Inputting & PreProcessing Text
Input Methods, String & Unicode, Regular Expression Use Cases
NLTK has preprocessed texts. But we can also import and process our own texts.
Importing
from __future__ import division
import nltk, re, pprint
To Import a Book as a Txt
Install urlopen:
!pip install urlopen
And:
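As a rough sketch of the same idea in Python 3: there, urlopen ships with the standard library as urllib.request.urlopen, so this version uses that instead of a separately installed package. The helper names read_text and simple_tokens, the regular expression, and the example URL are all hypothetical, chosen only for illustration.

```python
import re
from urllib.request import urlopen


def read_text(url: str, encoding: str = "utf-8") -> str:
    """Open a URL and decode the raw bytes into a Python string."""
    with urlopen(url) as response:
        return response.read().decode(encoding)


def simple_tokens(raw: str) -> list[str]:
    """Lowercase the text and pull out word-like tokens with a regex."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", raw.lower())


# Hypothetical usage; replace the URL with a real plain-text book:
# raw = read_text("https://example.org/book.txt")
# print(simple_tokens(raw)[:10])
```

The decode step is where the string and Unicode questions show up: the bytes that come back only become text once an encoding is chosen, and UTF-8 here is an assumption about the source file.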
What are Context Free Languages?
Grammars, Derivation, Expressiveness, Chomsky Hierarchy
Previously, we talked about how languages are studied using the notion of a formal language. Formal language is a mathematical construction that uses sets to describe a language and understand its properties.
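To make the "language as a set" idea concrete, here is a minimal sketch built on a textbook context-free grammar, S -> a S b | ε, whose language is every string of n a's followed by n b's. The grammar and the function names derive and in_language are illustrative choices, not taken from the article.

```python
def derive(max_depth: int) -> set[str]:
    """Strings obtained by applying S -> a S b at most max_depth times."""
    language = {""}  # S -> ε gives the empty string
    current = ""
    for _ in range(max_depth):
        current = "a" + current + "b"  # one more application of S -> a S b
        language.add(current)
    return language


def in_language(s: str) -> bool:
    """Membership test for { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


print(sorted(derive(3), key=len))
print(in_language("aabb"), in_language("aab"))
```

The full language is an infinite set, so derive can only ever list a finite slice of it, while in_language answers the membership question for any string.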