
Class Vocabulary

Manages the vocabulary and part-of-speech relationships for the parser.

Inheritance
object
Vocabulary
Inherited Members
object.Equals(object)
object.Equals(object, object)
object.GetHashCode()
object.GetType()
object.MemberwiseClone()
object.ReferenceEquals(object, object)
object.ToString()
Namespace: EarleyParser
Assembly: EarleyParser.dll
Syntax
public class Vocabulary
Remarks

This class maintains mappings between words and their possible parts of speech (POS). The corresponding production rules (PART-OF-SPEECH -> 'token') are stored here rather than in the Grammar class, for clarity and functionality. The Grammar class does allow lexicalized rules (e.g., A -> 'John', B -> 'John' 'left', C -> 'John' D).

In a pre-processing step, the EarleyParser uses this vocabulary to create the relevant Earley items [PART-OF-SPEECH -> 'token', i, i] for the words of the input sentence.
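The pre-processing step described above can be sketched as follows. This is a minimal illustration and not the parser's actual code: the `LexicalItems` helper, the tuple layout, and the sample words are hypothetical, and skipping unknown words is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: one (pos, token, start, end) tuple per part of speech
// of each word in the sentence. Skipping unknown words is an assumption.
static IEnumerable<(string Pos, string Token, int Start, int End)> LexicalItems(
    string[] words, Dictionary<string, HashSet<string>> lexicon)
{
    for (int i = 0; i < words.Length; i++)
    {
        if (!lexicon.TryGetValue(words[i], out var tags))
            continue;
        foreach (var pos in tags)
            yield return (pos, words[i], i, i); // [POS -> 'token', i, i]
    }
}

// Stand-in for the word -> POS mapping; not the real Vocabulary class.
var wordToPos = new Dictionary<string, HashSet<string>>
{
    ["the"] = new HashSet<string> { "D" },
    ["dog"] = new HashSet<string> { "N" },
    ["barks"] = new HashSet<string> { "N", "V" },
};

string[] sentence = { "the", "dog", "barks" };
var items = new List<(string Pos, string Token, int Start, int End)>(
    LexicalItems(sentence, wordToPos));

foreach (var item in items)
    Console.WriteLine($"[{item.Pos} -> '{item.Token}', {item.Start}, {item.End}]");
```

An ambiguous word such as 'barks' contributes one item per part of speech, so this sample sentence yields four items.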

An example vocabulary can be found in Vocabulary.json.

Constructors


Vocabulary()

Initializes a new instance of the Vocabulary class with empty dictionaries.

Declaration
public Vocabulary()

Vocabulary(List<Rule>)

Initializes a new instance of the Vocabulary class from a list of part-of-speech rules.

Declaration
public Vocabulary(List<Rule> POSRules)
Parameters
| Type | Name | Description |
|---|---|---|
| List<Rule> | POSRules | List of rules defining part-of-speech assignments. |
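As a rough sketch of what building a vocabulary from part-of-speech rules involves, the snippet below fills both mappings from a list of pairs. The members of `Rule` are not shown on this page, so a hypothetical `(Pos, Word)` tuple stands in for it; the local dictionaries mirror POSWithPossibleWords and WordWithPossiblePOS but are not the class itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for List<Rule>: each entry reads POS -> 'word'.
var posRules = new List<(string Pos, string Word)>
{
    ("D", "the"), ("N", "dog"), ("N", "barks"), ("V", "barks"),
};

var posWithPossibleWords = new Dictionary<string, HashSet<string>>();
var wordWithPossiblePos = new Dictionary<string, HashSet<string>>();

foreach (var (pos, word) in posRules)
{
    // Register the pair in both directions so lookups work either way.
    if (!posWithPossibleWords.TryGetValue(pos, out var words))
        posWithPossibleWords[pos] = words = new HashSet<string>();
    words.Add(word);

    if (!wordWithPossiblePos.TryGetValue(word, out var tags))
        wordWithPossiblePos[word] = tags = new HashSet<string>();
    tags.Add(pos);
}

Console.WriteLine($"'barks' has {wordWithPossiblePos["barks"].Count} possible parts of speech");
```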

Properties


this[string]

Gets the possible parts of speech for a given word.

Declaration
[JsonIgnore]
public HashSet<string> this[string word] { get; }
Parameters
| Type | Name | Description |
|---|---|---|
| string | word | The word to look up. |

Property Value
| Type | Description |
|---|---|
| HashSet<string> | A set of possible parts of speech for the word, or null if the word is not found. |
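Because the indexer is documented to return null for unknown words, callers should check the result before enumerating it. The sketch below imitates that contract with a hypothetical `Lookup` helper over a plain dictionary; it does not call the real indexer.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in mirroring the documented contract:
// the set of parts of speech, or null when the word is unknown.
static HashSet<string>? Lookup(Dictionary<string, HashSet<string>> lexicon, string word)
    => lexicon.TryGetValue(word, out var found) ? found : null;

var wordToPos = new Dictionary<string, HashSet<string>>
{
    ["dog"] = new HashSet<string> { "N" },
};

foreach (var word in new[] { "dog", "xyzzy" })
{
    var tags = Lookup(wordToPos, word);
    // Guard against null before touching the set.
    Console.WriteLine(tags is null
        ? $"{word}: unknown word"
        : $"{word}: {string.Join(", ", tags)}");
}
```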


POSWithPossibleWords

Gets or sets the dictionary mapping parts of speech to their possible words.

Declaration
public Dictionary<string, HashSet<string>> POSWithPossibleWords { get; set; }
Property Value
Type Description
Dictionary<string, HashSet<string>>

WordWithPossiblePOS

Gets or sets the dictionary mapping words to their possible parts of speech.

Declaration
[JsonIgnore]
public Dictionary<string, HashSet<string>> WordWithPossiblePOS { get; set; }
Property Value
Type Description
Dictionary<string, HashSet<string>>

Methods


ContainsPOS(string)

Determines whether a given part of speech exists in the vocabulary.

Declaration
public bool ContainsPOS(string pos)
Parameters
| Type | Name | Description |
|---|---|---|
| string | pos | The part of speech to check. |

Returns
| Type | Description |
|---|---|
| bool | True if the part of speech exists; otherwise, false. |


Disambiguate()

Removes ambiguous words from the vocabulary.

Declaration
public void Disambiguate()
Remarks

A word is considered ambiguous if it has more than one possible part of speech. This method removes such words from both WordWithPossiblePOS and POSWithPossibleWords.
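The effect described above can be illustrated on two plain dictionaries. This is a sketch under assumptions, not the method's source: the sample data is invented, and whether the real method also drops a part of speech whose word set becomes empty is not stated here, so the sketch leaves such entries in place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var wordWithPossiblePos = new Dictionary<string, HashSet<string>>
{
    ["the"] = new HashSet<string> { "D" },
    ["dog"] = new HashSet<string> { "N" },
    ["barks"] = new HashSet<string> { "N", "V" },
};
var posWithPossibleWords = new Dictionary<string, HashSet<string>>
{
    ["D"] = new HashSet<string> { "the" },
    ["N"] = new HashSet<string> { "dog", "barks" },
    ["V"] = new HashSet<string> { "barks" },
};

// Collect the ambiguous words first; removing entries from a dictionary
// while enumerating it is not safe.
var ambiguous = wordWithPossiblePos
    .Where(entry => entry.Value.Count > 1)
    .Select(entry => entry.Key)
    .ToList();

foreach (var word in ambiguous)
{
    // Remove the word from every part of speech that lists it...
    foreach (var pos in wordWithPossiblePos[word])
        posWithPossibleWords[pos].Remove(word);
    // ...and then from the word -> POS mapping itself.
    wordWithPossiblePos.Remove(word);
}

Console.WriteLine(wordWithPossiblePos.ContainsKey("barks")); // False
```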


GetBigramsOfData(string[][])

Extracts all possible part-of-speech bigrams from the given data.

Declaration
public HashSet<(string rhs1, string rhs2)> GetBigramsOfData(string[][] data)
Parameters
| Type | Name | Description |
|---|---|---|
| string[][] | data | Array of word sequences to analyze. |

Returns
| Type | Description |
|---|---|
| HashSet<(string rhs1, string rhs2)> | A set of tuples containing all possible part-of-speech pairs that could appear consecutively. |
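One plausible reading of this description is that every pair of adjacent words contributes the cross product of their possible parts of speech. The sketch below implements that reading with a hypothetical `PosBigrams` helper over a plain dictionary; the handling of unknown words (the pair is skipped) is an assumption, and the real method may behave differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: for each pair of adjacent words, add every combination
// of their possible parts of speech.
static HashSet<(string rhs1, string rhs2)> PosBigrams(
    string[][] data, Dictionary<string, HashSet<string>> lexicon)
{
    var bigrams = new HashSet<(string rhs1, string rhs2)>();
    foreach (var sentence in data)
    {
        for (int i = 0; i + 1 < sentence.Length; i++)
        {
            // Assumption: a pair is skipped if either word is unknown.
            if (!lexicon.TryGetValue(sentence[i], out var left) ||
                !lexicon.TryGetValue(sentence[i + 1], out var right))
                continue;
            foreach (var l in left)
                foreach (var r in right)
                    bigrams.Add((l, r));
        }
    }
    return bigrams;
}

var wordToPos = new Dictionary<string, HashSet<string>>
{
    ["the"] = new HashSet<string> { "D" },
    ["dog"] = new HashSet<string> { "N" },
    ["barks"] = new HashSet<string> { "N", "V" },
};

var data = new[] { new[] { "the", "dog", "barks" } };
var result = PosBigrams(data, wordToPos);
Console.WriteLine(result.Count); // 3: (D, N), (N, N), (N, V)
```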


ReadVocabularyFromFile(string)

Reads a vocabulary from a JSON file.

Declaration
public static Vocabulary ReadVocabularyFromFile(string jsonFileName)
Parameters
| Type | Name | Description |
|---|---|---|
| string | jsonFileName | The name of the JSON file containing the vocabulary. |

Returns
| Type | Description |
|---|---|
| Vocabulary | A new Vocabulary instance populated with the data from the file. |
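Since WordWithPossiblePOS is marked [JsonIgnore], a reasonable guess is that only POSWithPossibleWords is stored in the file and the reverse mapping is rebuilt after loading. The sketch below shows that idea with System.Text.Json on an in-memory string. The JSON layout is assumed and the real Vocabulary.json may differ; which JSON library the class actually uses is also not stated on this page.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Assumed file content; the real layout of Vocabulary.json may differ.
// A file on disk could be loaded with File.ReadAllText instead.
string json = @"{ ""POSWithPossibleWords"": {
    ""D"": [""the""], ""N"": [""dog"", ""barks""], ""V"": [""barks""] } }";

var root = JsonSerializer
    .Deserialize<Dictionary<string, Dictionary<string, HashSet<string>>>>(json)
    ?? throw new InvalidOperationException("The JSON document was empty.");
var posWithPossibleWords = root["POSWithPossibleWords"];

// Rebuild the ignored word -> POS mapping from the deserialized one.
var wordWithPossiblePos = new Dictionary<string, HashSet<string>>();
foreach (var (pos, words) in posWithPossibleWords)
{
    foreach (var word in words)
    {
        if (!wordWithPossiblePos.TryGetValue(word, out var tags))
            wordWithPossiblePos[word] = tags = new HashSet<string>();
        tags.Add(pos);
    }
}

Console.WriteLine(wordWithPossiblePos.Count); // 3 distinct words
```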

Generated by DocFX