The source code for this blog is available on GitHub.

We announced a new product!


What are Context Free Languages?

Cover Image for What are Context Free Languages?
Jake Batsuuri
Jake Batsuuri
min read

Previously, we talked about how languages are studied using the notion of a formal language. Formal language is a mathematical construction that uses sets to describe a language and understand its properties.

We introduced the notion of a string, which is a word or sequence of characters, symbols or letters. Then we formally defined the alphabet, which is a set of symbols. The alphabet often goes hand in hand with the language because we define a formal language as a set of strings over a unique alphabet.

Then we explored some operations on the string.

Then we explored some operations on languages.

Exploring operations naturally leads us to the properties of the languages and the resulting languages from these operations.

Two main topics of the article were regular expressions and finite-state automata. Which are computational paradigms that help us express or generate regular languages.

Then we explored the properties of the finite state automata itself, we said that the FSA is an information generating machine, and this process of information generation can be deterministic or non deterministic.

And the main topic of the article was to explore the operations on FSA, and whether they are closed under these operations.

Upon proving that these operations, such as concatenation, are closed we apply them to morphology. In the sense that we can define a FSA that modularly build affixes to form words.

Beyond Regular

Eventually we would like to study and understand natural languages, such as English. So far we learned about regular languages, which are neat, but we know English can’t be a regular language.

How do we know English is not a Regular Language?

If we consider English as just a set of words, which it is.

And we construct regular expressions for each word, which we technically could.

English is a regular language, since union of regular languages is a regular language.

**Union of two regular languages - Wikipedia***In formal language theory, and in particular the theory of nondeterministic finite automata, it is known that the union…*

Higher Linguistic Abstractions

But the problem arises when we realize that English is more than just set of words. For example, take the sentence “I ate an apple”, which is a really simple sentence.

If English was a set of words, then order of a group of words shouldn’t matter.

  [ 'i', 'ate', 'an', 'apple' ],
  [ 'ate', 'i', 'an', 'apple' ],
  [ 'an', 'i', 'ate', 'apple' ],
  [ 'i', 'an', 'ate', 'apple' ],
  [ 'ate', 'an', 'i', 'apple' ],
  [ 'an', 'ate', 'i', 'apple' ],
  [ 'apple', 'ate', 'i', 'an' ],
  [ 'ate', 'apple', 'i', 'an' ],
  [ 'i', 'apple', 'ate', 'an' ],
  [ 'apple', 'i', 'ate', 'an' ],
  [ 'ate', 'i', 'apple', 'an' ],
  [ 'i', 'ate', 'apple', 'an' ],
  [ 'i', 'an', 'apple', 'ate' ],
  [ 'an', 'i', 'apple', 'ate' ],
  [ 'apple', 'i', 'an', 'ate' ],
  [ 'i', 'apple', 'an', 'ate' ],
  [ 'an', 'apple', 'i', 'ate' ],
  [ 'apple', 'an', 'i', 'ate' ],
  [ 'apple', 'an', 'ate', 'i' ],
  [ 'an', 'apple', 'ate', 'i' ],
  [ 'ate', 'apple', 'an', 'i' ],
  [ 'apple', 'ate', 'an', 'i' ],
  [ 'an', 'ate', 'apple', 'i' ],
  [ 'ate', 'an', 'apple', 'i' ]

So all these permutations of the words should have the same meaning. But it doesn’t. Consider the bolded permutations.

  • I ate an apple.
  • Apple ate an I.

If we disregard the “I/me” rule, it’s kinda clear that the meaning changes. Furthermore, it's also clear many of these permutations are meaningless.

So order matters. Which makes sentences a sequence, not a set.

But what is a sentence? Sentence is a group of words where repetition is allowed and order matters.

Need for Rules

Notice that when we have a sentence of 4 words, we have 4! ways to form sentences. So out of the 24 different ways to order the words, only 1 really makes sense.

For an average sentence length of 20 words in modern English, we find that is 2432902008176640000 different permutations to check. This is not really computationally tractable for computers or brains. Hence the need for ways to speed up sentence creation.

The solution to create well formed sentences is rules or grammars. Think of these as computational shortcuts.


Set of rules that manipulate the symbols.


Elements of the target language

Elements of some finite set:

  • Words
  • Letters


Auxiliary symbols that facilitate the specification

Syntactic categories:

  • Sentence
  • Noun phrase
  • Verb phrase


A rule is a non-empty sequence of symbols, a mixture of terminals and non-terminals, with the only requirement that the first element in the sequence be a non-terminal one (alternatively, one can define a rule as an ordered pair whose first element is a non-terminal symbol and whose second element is a sequence of symbols)

Given this we are really creating recipes for creating sentences and phrases.

We write such rules with a special symbol, ‘→,’ separating the distinguished leftmost non-terminal from the rest of the sequence. The leftmost non-terminal is sometimes referred to as the head of the rule, while the rest of the symbols are called the body of the rule.

  • Non-terminal → Terminal
  • Head of the rule → Body of the rule

For natural languages it is the case that the body is either all terminal or non terminal, meaning they don’t mix ever. But there is nothing in the formalism that requires such a rule for grammars.

Given the terminals {the, cat, in, the, hat} and non terminals {D, N, P, NP, PP}, where:

  • D is determiner
  • N is noun
  • P is preposition
  • NP is noun phrase
  • PP is preposition phrase

Class I (all terminal) examples of rules:

  • D → the
  • N → cat
  • N → hat
  • P → in

Class II (all non-terminal) examples of rules:

  • NP → D N
  • PP → P NP
  • NP → NP PP

Context Free Grammars

DEFINITION 1. Context Free Grammar, CFG is a four tuple G = <V, Σ, P, S>, where:

  • V is a finite set of non-terminal symbols
    • S ∈ V is the start symbol
  • Σ is an alphabet of terminal symbols
  • P ⊂ V × (V ∪ Σ)* is a set of rules

Generally we introduce grammars by listing their rules only, which is denoted by P.


DEFINITION 2. Let G = <V, Σ, P, S> be a grammar. The set of forms induced by G is (V ∪ Σ)*:

  • A form α immediately derives a form β, denoted by α ⇒ β, if and only if there exist γₗ, γᵣ ∈ (V ∪ Σ)* such that α = γₗAγᵣ and β=γₗγγᵣ and A → γ is a rule in P
  • A is called the selected symbol

The the set of terminal and non terminal symbols are all forms. So:

  • NP
  • D
  • the
  • cat
  • ...

Are all forms.

DEFINITION 3. The derivation relation, denoted $\alpha \stackrel{}{\Rightarrow} \beta$ is defined recursively as follows: $\alpha \stackrel{}{\Rightarrow} \beta$ if α = β, or if α ⇒ γ and $\gamma\stackrel{*}{\Rightarrow} \beta$.

This is an odd recursive definition because what happens when you consider $\gamma\stackrel{*}{\Rightarrow} \beta$ now?

To use this we say:

  • $\langle NP \rangle {\Rightarrow} \langle \textrm{D N} \rangle$
    • $\langle \textrm{D N} \rangle {\Rightarrow} \langle \textrm{the N} \rangle$
      • $\langle \textrm{the N} \rangle {\Rightarrow} \langle \textrm{the cat} \rangle$


  • $\langle \textrm{D N} \rangle \stackrel{*}{\Rightarrow} \langle \textrm{the N} \rangle$

So what do forms and derivations have anything to do with context free languages?

We formally define these concepts so that we can define the language generated by a context free grammar.

DEFINITION 4. Let G = <V, Σ, P, S> be a grammar. The language of a non-terminal A$\in$V is:

$$L_G(A)={a_1... a_n |\textrm{ } a_i \in \Sigma \textrm{ for } 1\le i \le n \textrm{ and } \langle A \rangle \stackrel{*}{\Rightarrow} \langle a_1, ..., a_n \rangle } $$

Language of the grammar G is $L(G) = L_G(S)$

This basically says that the language generated by a grammar is denoted by its start symbol.

We know that a formal language has an alphabet and is a set of words. And regular languages are a type of formal language, so the complement of regular language in the set of formal languages would be all the languages that are not regular. So what types of languages are in this complement set?

EXAMPLE 1. Set of possible rules over these two sets:

  • V = {D, N, P, NP, PP},
  • Σ = {the, cat, in, hat },


  • D → the
  • N → cat
  • N → hat
  • P → in
  • NP → D N
  • PP → P NP
  • NP → NP PP

EXAMPLE 2. Let grammar be G = <V, Σ, P, S> where:

  • V = {D, N, P, NP, PP},
  • Σ = {the, cat, in, hat },
  • P is the set of rules, and
  • the start symbol S is NP.

Derivation Trees

These are sometimes called parse trees, they help us visualize what the derivations are doing. You can think of them as imposing a structure or form on words.

Some tree terminology:

  • Vertex is each node
    • Root is the top most node, which is the start symbol S
    • If a vertex has an outgoing branch, then it must be a non-terminal symbol
      • This implies that this parent node is the head of the rule and the child node or children are the bodies of the rule
    • Leaf is a node with no outgoing branches
      • Derivation trees should have a natural or left to right order,
        • If you read the leaves like this, we call this the yield of the tree


We can express ambiguity using derivation trees. Consider these sentences:

  • (the cat in the hat) in the hat
  • the cat in (the hat in the hat)

These have different meanings even though without the brackets they're exactly the same sentence.

We call this structural or syntactic ambiguity.


We will show here that we can express all regular expressions using grammars, and show that grammars can generate even more abstractions, implying that the class of regular languages is contained within the class of context free languages.

We say that context free languages are more expressive than regular languages.

Right Linear Grammar

In this type of grammar, the body of the rule consists of a sequence of terminal symbols:

  • i.e. Σ = {the, cat, in, hat }

optionally followed by one non-terminal. Restricting that there should never be more than one non-terminal in a body.

If there is then that is a leaf node.

This sub-type of grammar generates all and only regular languages.

Chomsky Hierarchy

We learned so far about context free grammars and right linear grammars and the languages they generate.

If we relax our definition of context free grammars, we can introduce context sensitive grammars, these have even more expressiveness than CFGs.

Context Sensitive Languages

In the head of the body, we not only have the head, we would also have the context. The formalism that corresponds to this class of languages is called the Context Sensitive Grammar,

Note that context free languages are a special case of context sensitive languages, where the context is empty.

Recursively Enumerable Languages

Now above the context sensitive grammars is the general phrase-structure grammars also called unrestricted grammars, which generate the recursively enumerable languages.

Natural Languages

Natural languages are mostly context free and only some specific constructions in natural languages necessitate us to use context sensitive grammars. This motivated us to develop the mildly context sensitive category.

Weak Generative Capacity

Weak generative capacities are a type of property of computational formalisms. For example when we say "Weak generative capacity of regular expressions is insufficient for generating English, we are saying "regular expressions can't generate all the words or sentences in English".

Strong Generative Capacity

So far we have considered only the generation of words or sentences by grammars. But the structure is as important if not more. Strong generative capacity helps us weakly generate all the strings in the language AND assign correct structures to them.

Mildly Context Sensitive Languages

These are almost an extension of the Context Free Languages, this class of languages have the properties:

  • properly contains all the context free languages
  • can be parsed in polynomial time
  • can properly account for the construction in natural languages that CFGs can't
    • cross serial dependencies
  • has the linear growth property

Tree Adjoining Grammars

TAG constructs larger trees from smaller ones, so that basic operations that take place during derivations are not limited to string concatenation.

These operations facilitate nesting of one tree within another, resulting in extended expressiveness.

Natural Language Grammars

Several other grammars are proposed as adequate for expressing natural languages:

  • Head Grammars
  • Linear Indexed Grammars
  • Combinatory Categorical Grammars

All three are weakly equivalent to TAG, which could imply that TAG may just be the correct formalism class in which all natural languages reside.

Proposed Approach To Learning Grammar

We know that given a large corpus we can find probability sequences to words. To illustrate this, consider the word “there…”. To guess what word comes next, we can take some candidates:

  • is
  • are
  • exists
  • cow

We know “there is…”, “there are…”, “there exists…” are all pretty probable, but “there cow…” doesn’t really make sense.

So we can create 2 word probability chains.

If we imagine creating a long chain from something like this. The odds are, our sentence will either converge to a repeating loop, or diverge to complete nonsense.

To make a meaningful sentence, our word generator should take into account:

  • 2 word probability
  • phrase probability
  • sentence context
  • paragraph memory

These pockets of localized high probability sequence of words can be a computational way to learn grammar without resorting to part of speech tags.