Class EarleyParser

Parser mode: Parses a given input sentence according to the grammar
Generator mode: Produces all possible sentences of a grammar up to a specified length

The implementation follows Stolcke's 1995 optimizations for handling ambiguity efficiently.

Constructors

| Edit this page View Source

EarleyParser(Grammar, Vocabulary, string[], int)

Initializes a new instance of the EarleyParser class.

Declaration

public EarleyParser(Grammar g, Vocabulary v, string[] text, int maxWords = 0)

Parameters

Type	Name	Description
Grammar	g	The context-free grammar to use for parsing
Vocabulary	v	The vocabulary that maps words to their possible parts of speech
string[]	text	The input tokens to parse
int	maxWords	Maximum length when generating sentences. Only used when calling GenerateSentence(). Default is 0, which means no generation.

Fields

| Edit this page View Source

Grammar

Gets the grammar used for parsing.

Declaration

public Grammar Grammar

Field Value

Type	Description
Grammar

| Edit this page View Source

Voc

Gets the vocabulary that maps words to their possible parts of speech.

Declaration

protected Vocabulary Voc

Field Value

Type	Description
Vocabulary

| Edit this page View Source

_table

Gets the Earley parsing table containing columns for each position in the input.

Declaration

protected EarleyColumn[] _table

Field Value

Type	Description
EarleyColumn[]

Methods

| Edit this page View Source

AddLexicalizedRules(List<Rule>, int, bool)

Adds lexicalized rules to the grammar.

Declaration

protected virtual void AddLexicalizedRules(List<Rule> lexicalRules, int i, bool preprocess = true)

Parameters

Type	Name	Description
List<Rule>	lexicalRules	The list of lexical rules to add.
int	i	The index of the current column.
bool	preprocess	Whether to preprocess the rules before adding them.

| Edit this page View Source

CountDerivations()

Counts the number of distinct parse trees that can be derived for the input.

Declaration

public int CountDerivations()

Returns

Type	Description
int	The total number of valid parse trees for the input

| Edit this page View Source

CountLengthsOfBasicTrees()

Analyzes the parse trees to determine the counts of lengths of all "basic trees" in the derivation. A basic tree is a non-pumped tree according to the pumping lemma.

Declaration

public List<int> CountLengthsOfBasicTrees()

Returns

Type	Description
List<int>	A list of lengths for all basic trees found in the parse, or null if no parse exists

| Edit this page View Source

GenerateSentence()

Generates all possible trees for the grammar up to input length n.

Declaration

public void GenerateSentence()

Remarks

This method builds the complete parse forest for all possible sentences up to the length specified in the constructor. The generated trees can be retrieved using GetFormattedString() to get either bracketed representations or sequences of parts of speech.

Exceptions

Type	Condition
TooManyEarleyItemsGeneratedException	Thrown when the number of generated states exceeds the maximum allowed threshold.

| Edit this page View Source

GetCompletedStartNonterminal(int)

Retrieves the completed parse node for the start symbol of the grammar.

Declaration

public EarleySpan GetCompletedStartNonterminal(int columnIndex = 0)

Parameters

Type	Name	Description
int	columnIndex	The offset from the end of the input to look for the completed parse. Default is 0, which means the entire input.

Returns

Type	Description
EarleySpan	The EarleySpan representing the completed parse, or null if no such parse exists

| Edit this page View Source

GetFormattedString(int, bool)

Gets a formatted string representation of the parse tree.

Declaration

public string[] GetFormattedString(int columnIndex = 0, bool onlyPartsOfSpeechSequences = false)

Parameters

Type	Name	Description
int	columnIndex	The index of the column to get the string from.
bool	onlyPartsOfSpeechSequences	Whether to only include parts of speech in the output.

Returns

Type	Description
string[]	Array of string representations of the parse tree.

| Edit this page View Source

GetPOSRulesInEachDerivation(int)

Extracts all part-of-speech rules used in each possible parse derivation.

Declaration

public List<HashSet<Rule>> GetPOSRulesInEachDerivation(int i)

Parameters

Type	Name	Description
int	i	The derivation index to analyze (when there are multiple parses)

Returns

Type	Description
List<HashSet<Rule>>	A list of rule sets, where each set contains the POS rules for one derivation

| Edit this page View Source

GetPossibleSyntacticCategoriesForToken(string)

Gets the possible syntactic categories for a given token.

Declaration

protected virtual HashSet<string> GetPossibleSyntacticCategoriesForToken(string nextScannableTerm)

Parameters

Type	Name	Description
string	nextScannableTerm	The token to get categories for.

Returns

Type	Description
HashSet<string>	Set of possible syntactic categories for the token.

| Edit this page View Source

HasDerivation()

Determines whether there exists a valid derivation for the input.

Declaration

public bool HasDerivation()

Returns

Type	Description
bool	True if there is at least one valid derivation; otherwise, false.

| Edit this page View Source

ParseSentence()

The main parsing loop that processes all states in the Earley chart.

Declaration

public (bool, string[]) ParseSentence()

Returns

Type	Description
(bool, string[])	A tuple containing: bool: true if parse was accepted, false if rejected due to state count threshold string[]: array of bracketed string representations of all valid parses

Type

Description

(bool, string[])

A tuple containing:

bool: true if parse was accepted, false if rejected due to state count threshold
string[]: array of bracketed string representations of all valid parses

Remarks

The method traverses the chart in two phases:

First processes completed states (priority queue ordered by decreasing start indices)
Then processes predicted states (queue of non-terminals to predict)

Epsilon transitions may cause looping back to completion phase. Parsing is rejected if the number of completed states in any column exceeds the maximum threshold.

| Edit this page View Source

PrepareEarleyTable(string[], int)

Creates and initializes the Earley parsing table with columns for each position in the input. The first column (index 0) is initialized with an empty token.

Declaration

protected virtual EarleyColumn[] PrepareEarleyTable(string[] text, int maxWord)

Parameters

Type	Name	Description
string[]	text	The input tokens to be parsed
int	maxWord	The maximum length when generating sentences (used in generation mode)

Returns

Type	Description
EarleyColumn[]	An array of EarleyColumn objects representing the parse table