Title: | A Stemming Algorithm for the Portuguese Language |
---|---|
Description: | Implements the "Stemming Algorithm for the Portuguese Language" <DOI:10.1109/SPIRE.2001.10024>. |
Authors: | Daniel Falbel [aut, cre] |
Maintainer: | Daniel Falbel |
License: | MIT + file LICENSE |
Version: | 0.2.0 |
Built: | 2024-11-13 04:13:29 UTC |
Source: | https://github.com/dfalbel/rslp |
Apply rules
apply_rules(word, name, steprules)
word |
word to which you want to apply the rules |
name |
the rule name; possible values are 'Plural', 'Feminine', 'Adverb', 'Augmentative', 'Noun', 'Verb' and 'Vowel'. |
steprules |
steprules as obtained from the function extract_rules. |
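The RSLP paper describes each rule as a suffix to remove, a minimum stem size, a replacement suffix and a list of exception words. The base R sketch below applies one step's rules to a single word under that reading. The function name `apply_step_sketch`, the data frame columns (`suffix`, `min_size`, `replacement`, `exceptions`) and the toy rules are hypothetical and are not necessarily the structure returned by extract_rules. It also assumes that rules are ordered longest suffix first, that the first applicable rule wins, and that a rule blocked by its size limit or an exception is skipped in favour of the next one.

```r
# Hypothetical sketch: apply the rules of one RSLP step to a single word.
# Assumed rule layout (not necessarily what extract_rules() returns):
#   suffix      - suffix to strip from the end of the word
#   min_size    - minimum number of characters the remaining stem must keep
#   replacement - text appended to the stem after stripping
#   exceptions  - list column of words the rule must leave untouched
apply_step_sketch <- function(word, rules) {
  for (i in seq_len(nrow(rules))) {
    suffix <- rules$suffix[i]
    if (!endsWith(word, suffix)) next
    stem_len <- nchar(word) - nchar(suffix)
    # Assumption: a blocked rule is skipped and the next rule is tried
    if (stem_len < rules$min_size[i]) next
    if (word %in% rules$exceptions[[i]]) next
    # First applicable rule wins: strip the suffix, then add the replacement
    return(paste0(substr(word, 1, stem_len), rules$replacement[i]))
  }
  word  # no rule applied, return the word unchanged
}

# Toy plural rules, loosely modelled on the paper (longest suffix first)
plural_rules <- data.frame(
  suffix = c("ns", "s"),
  min_size = c(1, 2),
  replacement = c("m", ""),
  stringsAsFactors = FALSE
)
plural_rules$exceptions <- list(character(0), c("mais", "pois", "dois"))

apply_step_sketch("bons", plural_rules)   # "bom"
apply_step_sketch("casas", plural_rules)  # "casa"
apply_step_sketch("pois", plural_rules)   # "pois" (listed exception)
```

Trying "ns" before "s" matters here: "bons" also ends in "s", but the longer suffix is matched first and yields "bom".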
Separate the seven kinds of rules
extract_raw_rules(raw_rules)
raw_rules |
a character string containing the raw rules. |
Parse the raw replacement rules.
extract_replacement_rules(raw_repl)
raw_repl |
the part of each step rule that contains its replacement rules. |
Extract all info for one rule
extract_rule_info(rule)
rule |
the rule from which to extract information |
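To illustrate what extracting the information of one rule involves, the sketch below splits a rule string into the four fields the RSLP paper describes (suffix, minimum stem size, replacement, exceptions) using base R regular expressions. It assumes the textual form {"suffix", min_size, "replacement", {"exception", ...}}, which may differ from the exact syntax of steprules.txt; `parse_rule_sketch` is a hypothetical name and is not the package's extract_rule_info.

```r
# Hypothetical parser for one rule written as
#   {"suffix", min_size, "replacement", {"exception1", "exception2", ...}}
# The format is an assumption and may not match steprules.txt exactly.
parse_rule_sketch <- function(rule) {
  # Every double-quoted string, in order of appearance (empty strings included)
  quoted <- regmatches(rule, gregexpr('"[^"]*"', rule))[[1]]
  quoted <- gsub('"', "", quoted, fixed = TRUE)
  # The first run of digits is taken as the minimum stem size
  min_size <- as.integer(regmatches(rule, regexpr("[0-9]+", rule)))
  list(
    suffix      = quoted[1],
    min_size    = min_size,
    # A rule without a replacement field gets an empty replacement
    replacement = if (length(quoted) >= 2) quoted[2] else "",
    # Any remaining quoted strings are treated as exceptions
    exceptions  = quoted[-(1:2)]
  )
}

rule <- parse_rule_sketch('{"inho", 3, "", {"caminho", "carinho"}}')
str(rule)
```

A collection of such strings (a hypothetical character vector `rule_strings`) could then be parsed with lapply(rule_strings, parse_rule_sketch), which is the kind of job the documentation assigns to extract_rules_info.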
This function parses the rules available in the RSLP C source. The rules file was downloaded and is installed with the package; its path can be found using system.file("steprules.txt", package = "rslp"). A parsed version is also installed with the package, and its path can be found using system.file("steprules.rds", package = "rslp").
extract_rules(path = system.file("steprules.txt", package = "rslp"))
path |
path to the raw step rules file. In most cases you do not need to change it. |
Extract all info from all rules
extract_rules_info(rules)
rules |
rules previously parsed by extract_rule_info |
A wrapper around the stringi package that removes accents from a string.
remove_accents(s)
s |
the string from which to remove accents |
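stringi offers general transliteration through stri_trans_general(); the sketch below is instead a dependency-free base R alternative, not the package's remove_accents. It covers only the lowercase accented letters common in Portuguese and maps each to its unaccented base letter with chartr(). The name `remove_accents_sketch` is hypothetical.

```r
# Hypothetical base R alternative to a stringi-based accent remover.
# Handles only the lowercase accented letters common in Portuguese;
# everything else (including uppercase letters) is returned unchanged.
remove_accents_sketch <- function(s) {
  # a/e/i/o/u/c with acute, grave, circumflex, tilde, diaeresis or cedilla,
  # written as Unicode escapes
  accented <- "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00ed\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7"
  plain    <- "aaaaeeiooouuc"
  # chartr() replaces each character of `accented` with the one at the same position in `plain`
  chartr(accented, plain, s)
}

remove_accents_sketch(c("a\u00e7\u00e3o", "caf\u00e9"))  # "acao" "cafe"
```

A fixed character map keeps the result identical across platforms, whereas iconv() transliteration can vary between systems.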
Apply the Stemming Algorithm for the Portuguese Language to a vector of words.
rslp( words, steprules = readRDS(system.file("steprules.rds", package = "rslp")) )
words |
vector of words that you want to stem. |
steprules |
as obtained from the function extract_rules. Only set this argument if you are sure you need different rules. The default is to use the parsed version of the rules installed with the package. |
V. Orengo, C. Huyck, "A Stemming Algorithm for the Portuguese Language", International Symposium on String Processing and Information Retrieval (SPIRE), 2001, p. 186, doi:10.1109/SPIRE.2001.10024
words <- c("gostou", "gosto", "gostaram")
rslp(words)
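rslp() works on a vector of words, while rslp_() handles a single word. One natural way to relate the two is to map the single-word function over the vector. The sketch below shows that pattern with vapply() and a hypothetical toy stemmer, `stem_one_sketch`, that only strips a final "s" from words longer than three letters; it illustrates the vectorisation, not the actual RSLP rules or the package's internals.

```r
# Hypothetical single-word stemmer: strips a final "s" from words longer than 3 letters.
stem_one_sketch <- function(word) {
  if (nchar(word) > 3 && endsWith(word, "s")) substr(word, 1, nchar(word) - 1) else word
}

# Map the single-word stemmer over a character vector, returning an unnamed character vector
stem_vector_sketch <- function(words, stem_one = stem_one_sketch) {
  vapply(words, stem_one, character(1), USE.NAMES = FALSE)
}

stem_vector_sketch(c("frutas", "pois", "os", "bem"))
# returns "fruta" "poi" "os" "bem"
```

vapply() is used rather than sapply() because it guarantees a character vector of the same length as the input, even for an empty input.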
Apply the Stemming Algorithm for the Portuguese Language to a word.
rslp_( word, steprules = readRDS(system.file("steprules.rds", package = "rslp")) )
word |
word to be stemmed. |
steprules |
as obtained from the function extract_rules. |
Apply the Stemming Algorithm for the Portuguese Language to a vector of documents. Words are extracted using the regex "\b[:alpha:]\b".
rslp_doc( docs, steprules = readRDS(system.file("steprules.rds", package = "rslp")) )
docs |
character vector of documents |
steprules |
as obtained from the function extract_rules. Only set this argument if you are sure you need different rules. The default is to use the parsed version of the rules installed with the package. |
V. Orengo, C. Huyck, "A Stemming Algorithm for the Portuguese Language", International Symposium on String Processing and Information Retrieval (SPIRE), 2001, p. 186, doi:10.1109/SPIRE.2001.10024
docs <- c("coma frutas pois elas fazem bem para.")
rslp_doc(docs)
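Stemming documents amounts to finding each word, stemming it and writing it back in place while leaving spaces and punctuation alone. The base R sketch below does this with gregexpr() and the replacement form of regmatches(). It uses "[[:alpha:]]+" (base R syntax, with "+" so that each match is a whole word) as an assumption rather than reproducing the package's regex, and `stem_docs_sketch` and the toy stemmer `toy_stem` are hypothetical stand-ins for rslp_doc and rslp.

```r
# Hypothetical sketch: stem every run of letters in each document, keeping punctuation.
stem_docs_sketch <- function(docs, stem_words) {
  # Assumption: a word is a maximal run of alphabetic characters
  matches <- gregexpr("[[:alpha:]]+", docs)
  words <- regmatches(docs, matches)
  # Replace each matched word with its stem, document by document
  regmatches(docs, matches) <- lapply(words, stem_words)
  docs
}

# Toy vectorised stemmer (hypothetical): drop a final "s" when at least 3 letters precede it
toy_stem <- function(words) sub("^(.{3,})s$", "\\1", words)

stem_docs_sketch("coma frutas pois elas fazem bem para.", toy_stem)
# returns "coma fruta poi ela fazem bem para."
```

Because regmatches<- writes the stems back into the original strings, the trailing "." and the spacing of each document survive unchanged.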
Given a list of suffixes, returns a logical vector indicating, for each suffix, whether the word ends with it.
verify_sufix(word, rep_rules)
word |
the word for which to check the replacement rules |
rep_rules |
data.frame of rules as specified in steprules$replacement_rule |
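A check of this kind can be expressed with base R's endsWith(), which recycles its arguments and so returns one TRUE or FALSE per suffix. The sketch below assumes the suffixes sit in a column named `suffix`; that column name and the function name `verify_suffix_sketch` are hypothetical and may not match steprules$replacement_rule.

```r
# Hypothetical sketch of a suffix check: one TRUE/FALSE per suffix.
# Assumes the suffixes are stored in a column named `suffix` (an illustrative name).
verify_suffix_sketch <- function(word, rep_rules) {
  # endsWith() recycles the single word against every suffix
  endsWith(word, rep_rules$suffix)
}

rules <- data.frame(suffix = c("aram", "ou", "ram"), stringsAsFactors = FALSE)
verify_suffix_sketch("gostaram", rules)
# returns TRUE FALSE TRUE
```

The position of the first TRUE, which(...)[1], would then identify the first rule whose suffix matches.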