mirror of https://github.com/ogoun/Zero.git
Latest commit: e965b22304 (Ogoun, 9 months ago)

File | Last changed
---|---
ILexProvider.cs | 5 years ago
ILexer.cs | 6 years ago
Languages.cs | 6 years ago
LexToken.cs | 9 months ago
README.txt | 6 years ago
WordToken.cs | 6 years ago
README.txt
This module provides the basis for semantic processing of text.
LexProvider - extracts tokens from text, where a token is some normalized form of a word.
For example, a token can be the word itself, its stem, or its lemma.
Two factory implementations are provided:
SnowbolLexProviderFactory - returns providers based on the Snowball stemmer
JustWordLexProviderFactory - returns a provider that uses the word itself as the token, unchanged apart from conversion to lower case
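To illustrate what a word-as-token provider does, here is a minimal sketch under stated assumptions: the text is split into words and each word is passed through a lexer that lower-cases it. The `ILexer` interface below only mirrors the `Lex(string)` method described in this README; `JustWordLexer`, `SimpleLexProvider`, `ExtractTokens` and the `\w+` word-splitting rule are hypothetical names and choices for illustration, not the library's actual `LexProvider` or factory code.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Mirrors the single-method contract described in the README (assumed shape).
public interface ILexer
{
    string Lex(string word);
}

// Hypothetical "just word" lexer: the token is the word itself in lower case.
public sealed class JustWordLexer : ILexer
{
    public string Lex(string word) => word.ToLowerInvariant();
}

// Hypothetical provider: finds words with a simple regex and normalizes each via the lexer.
public sealed class SimpleLexProvider
{
    // \w+ treats any run of letters, digits or underscores as a word.
    private static readonly Regex WordPattern = new Regex(@"\w+", RegexOptions.Compiled);
    private readonly ILexer _lexer;

    public SimpleLexProvider(ILexer lexer) => _lexer = lexer;

    public IReadOnlyList<string> ExtractTokens(string text) =>
        WordPattern.Matches(text)
            .Cast<Match>()
            .Select(m => _lexer.Lex(m.Value))
            .ToList();
}
```

With this sketch, `new SimpleLexProvider(new JustWordLexer()).ExtractTokens("Hello, World! hello")` returns `hello`, `world`, `hello`. Keeping word splitting in the provider and normalization in the lexer is what lets the same provider be reused with a stemming or lemmatizing lexer; note that the naive `\w+` rule splits contractions such as "don't" into two words.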
To implement your own provider, create a class that implements the ILexer interface and implement its Lex method,
which performs whatever normalization of the word the given semantic context requires.
For example:
public class LemmaLexer : ILexer
{
    // Lemmatizer stands for whatever lemmatization component you use.
    public string Lex(string word) { return Lemmatizer.Lemma(word); }
}
Then you can create a provider based on it:
var provider = new LexProvider(new LemmaLexer());
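The `Lemmatizer.Lemma` call in the example above is not defined in this README. As a self-contained sketch, assuming a small hand-written lemma dictionary in place of a real lemmatizer, a lemma lexer could look like this. `DictionaryLemmaLexer` is a hypothetical name, and the `ILexer` interface is repeated here only to mirror the `Lex(string)` method described above.

```csharp
using System.Collections.Generic;

// Mirrors the single-method contract described in the README (assumed shape).
public interface ILexer
{
    string Lex(string word);
}

// Hypothetical lemma lexer backed by a lookup table; a real lemmatizer
// would rely on a full morphological dictionary instead.
public sealed class DictionaryLemmaLexer : ILexer
{
    private readonly IReadOnlyDictionary<string, string> _lemmas;

    // Keys are expected in lower case, e.g. "mice" -> "mouse".
    public DictionaryLemmaLexer(IReadOnlyDictionary<string, string> lemmas) => _lemmas = lemmas;

    public string Lex(string word)
    {
        var key = word.ToLowerInvariant();
        // Words missing from the table fall back to their lower-case form.
        return _lemmas.TryGetValue(key, out var lemma) ? lemma : key;
    }
}
```

Following the README's pattern, it would be wrapped the same way as LemmaLexer, e.g. `new LexProvider(new DictionaryLemmaLexer(lemmas))`. Falling back to the lower-cased word means unknown words still produce stable tokens instead of being dropped.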