Show / Hide Table of Contents

Class EarleyParser

The main class responsible for parsing using the Earley algorithm (Earley 1970, Stolcke 1995).

Inheritance
object
EarleyParser
EarleyGenerator
Inherited Members
object.Equals(object)
object.Equals(object, object)
object.GetHashCode()
object.GetType()
object.MemberwiseClone()
object.ReferenceEquals(object, object)
object.ToString()
Namespace: EarleyParser
Assembly: EarleyParser.dll
Syntax
public class EarleyParser
Remarks

This is a chart parser with dynamic programming that builds a parse table where:

  • Each column corresponds to a position in the input
  • Each state represents a partially matched rule
  • Three operations (predict, scan, complete) fill the table

The parser implements the following key features:

  1. Dynamic programming through a chart structure
  2. Support for both context-free and linear indexed grammars
  3. Handling of ambiguous parses
  4. Generation of all possible parse trees
  5. Analysis of parse tree properties (recursion, basic trees)

The parser can be used in two modes:

  1. Parser mode: Parses a given input sentence according to the grammar
  2. Generator mode: Produces all possible sentences of a grammar up to a specified length

The implementation follows Stolcke's 1995 optimizations for handling ambiguity efficiently.

Constructors

| Edit this page View Source

EarleyParser(Grammar, Vocabulary, string[], int)

Initializes a new instance of the EarleyParser class.

Declaration
public EarleyParser(Grammar g, Vocabulary v, string[] text, int maxWords = 0)
Parameters
Type Name Description
Grammar g

The context-free grammar to use for parsing

Vocabulary v

The vocabulary that maps words to their possible parts of speech

string[] text

The input tokens to parse

int maxWords

Maximum length when generating sentences. Only used when calling GenerateSentence(). Default is 0, which means no generation.

Fields

| Edit this page View Source

Grammar

Gets the grammar used for parsing.

Declaration
public Grammar Grammar
Field Value
Type Description
Grammar
| Edit this page View Source

Voc

Gets the vocabulary that maps words to their possible parts of speech.

Declaration
protected Vocabulary Voc
Field Value
Type Description
Vocabulary
| Edit this page View Source

_table

Gets the Earley parsing table containing columns for each position in the input.

Declaration
protected EarleyColumn[] _table
Field Value
Type Description
EarleyColumn[]

Methods

| Edit this page View Source

AddLexicalizedRules(List<Rule>, int, bool)

Adds lexicalized rules to the grammar.

Declaration
protected virtual void AddLexicalizedRules(List<Rule> lexicalRules, int i, bool preprocess = true)
Parameters
Type Name Description
List<Rule> lexicalRules

The list of lexical rules to add.

int i

The index of the current column.

bool preprocess

Whether to preprocess the rules before adding them.

| Edit this page View Source

CountDerivations()

Counts the number of distinct parse trees that can be derived for the input.

Declaration
public int CountDerivations()
Returns
Type Description
int

The total number of valid parse trees for the input

| Edit this page View Source

CountLengthsOfBasicTrees()

Analyzes the parse trees to determine the counts of lengths of all "basic trees" in the derivation. A basic tree is a non-pumped tree according to the pumping lemma.

Declaration
public List<int> CountLengthsOfBasicTrees()
Returns
Type Description
List<int>

A list of lengths for all basic trees found in the parse, or null if no parse exists

| Edit this page View Source

GenerateSentence()

Generates all possible trees for the grammar up to input length n.

Declaration
public void GenerateSentence()
Remarks

This method builds the complete parse forest for all possible sentences up to the length specified in the constructor. The generated trees can be retrieved using GetFormattedString() to get either bracketed representations or sequences of parts of speech.

Exceptions
Type Condition
TooManyEarleyItemsGeneratedException

Thrown when the number of generated states exceeds the maximum allowed threshold.

| Edit this page View Source

GetCompletedStartNonterminal(int)

Retrieves the completed parse node for the start symbol of the grammar.

Declaration
public EarleySpan GetCompletedStartNonterminal(int columnIndex = 0)
Parameters
Type Name Description
int columnIndex

The offset from the end of the input to look for the completed parse. Default is 0, which means the entire input.

Returns
Type Description
EarleySpan

The EarleySpan representing the completed parse, or null if no such parse exists

| Edit this page View Source

GetFormattedString(int, bool)

Gets a formatted string representation of the parse tree.

Declaration
public string[] GetFormattedString(int columnIndex = 0, bool onlyPartsOfSpeechSequences = false)
Parameters
Type Name Description
int columnIndex

The index of the column to get the string from.

bool onlyPartsOfSpeechSequences

Whether to only include parts of speech in the output.

Returns
Type Description
string[]

Array of string representations of the parse tree.

| Edit this page View Source

GetPOSRulesInEachDerivation(int)

Extracts all part-of-speech rules used in each possible parse derivation.

Declaration
public List<HashSet<Rule>> GetPOSRulesInEachDerivation(int i)
Parameters
Type Name Description
int i

The derivation index to analyze (when there are multiple parses)

Returns
Type Description
List<HashSet<Rule>>

A list of rule sets, where each set contains the POS rules for one derivation

| Edit this page View Source

GetPossibleSyntacticCategoriesForToken(string)

Gets the possible syntactic categories for a given token.

Declaration
protected virtual HashSet<string> GetPossibleSyntacticCategoriesForToken(string nextScannableTerm)
Parameters
Type Name Description
string nextScannableTerm

The token to get categories for.

Returns
Type Description
HashSet<string>

Set of possible syntactic categories for the token.

| Edit this page View Source

HasDerivation()

Determines whether there exists a valid derivation for the input.

Declaration
public bool HasDerivation()
Returns
Type Description
bool

True if there is at least one valid derivation; otherwise, false.

| Edit this page View Source

ParseSentence()

The main parsing loop that processes all states in the Earley chart.

Declaration
public (bool, string[]) ParseSentence()
Returns
Type Description
(bool, string[])

A tuple containing:

  • bool: true if parse was accepted, false if rejected due to state count threshold
  • string[]: array of bracketed string representations of all valid parses
Remarks

The method traverses the chart in two phases:

  1. First processes completed states (priority queue ordered by decreasing start indices)
  2. Then processes predicted states (queue of non-terminals to predict)

Epsilon transitions may cause looping back to completion phase. Parsing is rejected if the number of completed states in any column exceeds the maximum threshold.

| Edit this page View Source

PrepareEarleyTable(string[], int)

Creates and initializes the Earley parsing table with columns for each position in the input. The first column (index 0) is initialized with an empty token.

Declaration
protected virtual EarleyColumn[] PrepareEarleyTable(string[] text, int maxWord)
Parameters
Type Name Description
string[] text

The input tokens to be parsed

int maxWord

The maximum length when generating sentences (used in generation mode)

Returns
Type Description
EarleyColumn[]

An array of EarleyColumn objects representing the parse table

  • Edit this page
  • View Source
In this article
Back to top Generated by DocFX