Class Vocabulary
Manages the vocabulary and part-of-speech relationships for the parser.
Namespace: EarleyParser
Assembly: EarleyParser.dll
Syntax
public class Vocabulary
Remarks
This class maintains mappings between words and their possible parts of speech (POS). The corresponding production rules (PART-OF-SPEECH -> 'token') are kept here, separate from the Grammar class, for clarity and functionality. The Grammar class itself still allows lexicalized rules (e.g., A -> 'John', B -> 'John' 'left', C -> 'John' D).
In a pre-processing step, the EarleyParser uses this vocabulary to create the relevant Earley items [PART-OF-SPEECH -> 'token', i, i] for the tokens of the input sentence.
An example vocabulary can be found in Vocabulary.json.
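The two-way word/POS mapping described above can be illustrated with a minimal standalone sketch. It does not use the EarleyParser assembly: the `posRules` list of tuples is a hypothetical stand-in for `List<Rule>`, and the `AddTo` and `Lookup` helpers are illustrative names, not members of this class. Only the dictionary names and the "null if the word is not found" behavior are taken from this page. The sketch is written as a top-level program.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for POS rules of the form PART-OF-SPEECH -> 'token'.
var posRules = new List<(string Pos, string Token)>
{
    ("N", "duck"), ("V", "duck"), ("N", "dog"), ("D", "the"),
};

// Two mirrored dictionaries, named after the properties documented on this page.
var posWithPossibleWords = new Dictionary<string, HashSet<string>>();
var wordWithPossiblePOS = new Dictionary<string, HashSet<string>>();

foreach (var (pos, token) in posRules)
{
    AddTo(posWithPossibleWords, pos, token);
    AddTo(wordWithPossiblePOS, token, pos);
}

// Lookup that mirrors the documented indexer contract: null for unknown words.
HashSet<string> Lookup(string word) =>
    wordWithPossiblePOS.TryGetValue(word, out var tags) ? tags : null;

Console.WriteLine(string.Join(", ", Lookup("duck") ?? new HashSet<string>()));
Console.WriteLine(Lookup("cat") is null);

// Adds value to the set stored under key, creating the set on first use.
static void AddTo(Dictionary<string, HashSet<string>> map, string key, string value)
{
    if (!map.TryGetValue(key, out var set))
    {
        set = new HashSet<string>();
        map[key] = set;
    }
    set.Add(value);
}
```

Keeping both directions in sync makes word-to-POS lookups (needed when scanning a sentence) and POS-to-word lookups equally cheap, at the cost of updating two dictionaries on every change.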
Constructors
Vocabulary()
Initializes a new instance of the Vocabulary class with empty dictionaries.
Declaration
public Vocabulary()
Vocabulary(List<Rule>)
Initializes a new instance of the Vocabulary class from a list of part-of-speech rules.
Declaration
public Vocabulary(List<Rule> POSRules)
Parameters
| Type | Name | Description |
|---|---|---|
| List<Rule> | POSRules | List of rules defining part-of-speech assignments. |
Properties
this[string]
Gets the possible parts of speech for a given word.
Declaration
[JsonIgnore]
public HashSet<string> this[string word] { get; }
Parameters
| Type | Name | Description |
|---|---|---|
| string | word | The word to look up. |
Property Value
| Type | Description |
|---|---|
| HashSet<string> | A set of possible parts of speech for the word, or null if the word is not found. |
POSWithPossibleWords
Gets or sets the dictionary mapping parts of speech to their possible words.
Declaration
public Dictionary<string, HashSet<string>> POSWithPossibleWords { get; set; }
Property Value
| Type | Description |
|---|---|
| Dictionary<string, HashSet<string>> | A dictionary mapping each part of speech to the set of words that can take it. |
WordWithPossiblePOS
Gets or sets the dictionary mapping words to their possible parts of speech.
Declaration
[JsonIgnore]
public Dictionary<string, HashSet<string>> WordWithPossiblePOS { get; set; }
Property Value
| Type | Description |
|---|---|
| Dictionary<string, HashSet<string>> | A dictionary mapping each word to the set of its possible parts of speech. |
Methods
ContainsPOS(string)
Determines whether a given part of speech exists in the vocabulary.
Declaration
public bool ContainsPOS(string pos)
Parameters
| Type | Name | Description |
|---|---|---|
| string | pos | The part of speech to check. |
Returns
| Type | Description |
|---|---|
| bool | true if the part of speech exists; otherwise, false. |
Disambiguate()
Removes ambiguous words from the vocabulary.
Declaration
public void Disambiguate()
Remarks
A word is considered ambiguous if it has more than one possible part of speech. This method removes such words from both WordWithPossiblePOS and POSWithPossibleWords.
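The removal rule above can be sketched on two plain dictionaries. This is an illustration of the described behavior, not the library's source. In particular, this page does not say whether a part of speech left with no words is dropped as well; the sketch assumes it is kept with an empty set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var wordWithPossiblePOS = new Dictionary<string, HashSet<string>>
{
    ["duck"] = new HashSet<string> { "N", "V" },
    ["dog"] = new HashSet<string> { "N" },
    ["the"] = new HashSet<string> { "D" },
};
var posWithPossibleWords = new Dictionary<string, HashSet<string>>
{
    ["N"] = new HashSet<string> { "duck", "dog" },
    ["V"] = new HashSet<string> { "duck" },
    ["D"] = new HashSet<string> { "the" },
};

// Collect first, then remove: a dictionary must not be modified while it is enumerated.
var ambiguousWords = wordWithPossiblePOS
    .Where(entry => entry.Value.Count > 1)
    .Select(entry => entry.Key)
    .ToList();

foreach (var word in ambiguousWords)
{
    // Drop the word from every POS entry that lists it.
    foreach (var pos in wordWithPossiblePOS[word])
    {
        posWithPossibleWords[pos].Remove(word);
    }
    // Assumption: a POS whose word set becomes empty (here "V") stays in the dictionary.
    wordWithPossiblePOS.Remove(word);
}

Console.WriteLine(string.Join(", ", wordWithPossiblePOS.Keys));
```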
GetBigramsOfData(string[][])
Extracts all possible part-of-speech bigrams from the given data.
Declaration
public HashSet<(string rhs1, string rhs2)> GetBigramsOfData(string[][] data)
Parameters
| Type | Name | Description |
|---|---|---|
| string[][] | data | Array of word sequences to analyze. |
Returns
| Type | Description |
|---|---|
| HashSet<(string rhs1, string rhs2)> | A set of tuples containing all possible part-of-speech pairs that could appear consecutively. |
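One plausible reading of "all possible part-of-speech pairs that could appear consecutively" is: for every pair of adjacent words, take every combination of their possible parts of speech. The sketch below implements that reading on a plain dictionary and is not the library's source. Skipping pairs that involve a word missing from the vocabulary is an assumption, as is the sample data.

```csharp
using System;
using System.Collections.Generic;

var wordWithPossiblePOS = new Dictionary<string, HashSet<string>>
{
    ["the"] = new HashSet<string> { "D" },
    ["duck"] = new HashSet<string> { "N", "V" },
    ["swims"] = new HashSet<string> { "V" },
};

string[][] data =
{
    new[] { "the", "duck", "swims" },
    new[] { "the", "heron" }, // "heron" is not in the vocabulary
};

var bigrams = new HashSet<(string rhs1, string rhs2)>();

foreach (var sentence in data)
{
    for (int i = 0; i + 1 < sentence.Length; i++)
    {
        // Assumption: pairs involving an unknown word are skipped.
        if (!wordWithPossiblePOS.TryGetValue(sentence[i], out var firstTags) ||
            !wordWithPossiblePOS.TryGetValue(sentence[i + 1], out var secondTags))
        {
            continue;
        }

        // Cross product of the POS sets of the two adjacent words.
        foreach (var first in firstTags)
        {
            foreach (var second in secondTags)
            {
                bigrams.Add((first, second));
            }
        }
    }
}

Console.WriteLine(bigrams.Count);
```

Because the result is a set, a POS pair that occurs in several sentences is recorded only once.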
ReadVocabularyFromFile(string)
Reads a vocabulary from a JSON file.
Declaration
public static Vocabulary ReadVocabularyFromFile(string jsonFileName)
Parameters
| Type | Name | Description |
|---|---|---|
| string | jsonFileName | The name of the JSON file containing the vocabulary. |
Returns
| Type | Description |
|---|---|
| Vocabulary | A new Vocabulary instance populated with the data from the file. |