WordSegment API Reference
- wordsegment.clean(text)
  Return text lower-cased with non-alphanumeric characters removed.
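As a rough illustration of the behaviour described for clean, the following standalone sketch lower-cases its input and keeps only ASCII letters and digits. The name clean_sketch and the exact set of allowed characters are assumptions made for this example; the library's own character set may differ.

```python
import string

# Assumption: "alphanumeric" means ASCII letters and digits only.
ALLOWED = set(string.ascii_lowercase + string.digits)


def clean_sketch(text):
    """Lower-case text and drop every character outside ALLOWED."""
    return ''.join(ch for ch in text.lower() if ch in ALLOWED)


print(clean_sketch("Hello, World! 42"))  # helloworld42
```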
- wordsegment.divide(text)
  Yield (prefix, suffix) pairs from text with len(prefix) not exceeding limit.
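The splitting described for divide can be sketched as a small generator. Here limit is modelled as a keyword argument with a default of 24; both the parameter placement and the default value are assumptions for illustration, since the signature above does not show them.

```python
def divide_sketch(text, limit=24):
    """Yield (prefix, suffix) pairs with 1 <= len(prefix) <= limit."""
    # Assumption: limit is a keyword argument; 24 is an illustrative default.
    for pos in range(1, min(len(text), limit) + 1):
        yield text[:pos], text[pos:]


pairs = list(divide_sketch("abcd", limit=3))
print(pairs)  # [('a', 'bcd'), ('ab', 'cd'), ('abc', 'd')]
```

Capping the prefix length keeps the number of candidate splits per position bounded, which matters when the input is a long unbroken string.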
- wordsegment.load()
  Load unigram and bigram counts from disk.
- wordsegment.score(word, prev=None)
  Score a word in the context of the previous word, prev.
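One plausible way to score a word in the context of a previous word is to estimate a conditional probability from bigram counts and fall back to a unigram probability when no bigram is available. The sketch below does this with toy counts invented for the example. It is a simplified stand-in and is not claimed to be the formula wordsegment uses; score_sketch, the count tables, and the unknown-word penalty are all hypothetical.

```python
# Toy counts, invented for illustration only.
unigrams = {'new': 40.0, 'york': 10.0, 'the': 50.0}
bigrams = {'new york': 8.0}  # keys are two words joined by a space
total = sum(unigrams.values())  # 100.0


def score_sketch(word, prev=None):
    """Return an estimated probability of word, using prev when possible."""
    if prev is not None:
        key = prev + ' ' + word
        if key in bigrams and prev in unigrams:
            # P(word | prev) estimated as count(prev word) / count(prev).
            return bigrams[key] / unigrams[prev]
    if word in unigrams:
        # No usable bigram: fall back to the unigram probability.
        return unigrams[word] / total
    # Assumption: unknown words get a penalty that shrinks with length.
    return 10.0 / (total * 10 ** len(word))


print(score_sketch('york', 'new'))  # uses the bigram count
print(score_sketch('york'))         # falls back to the unigram count
```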
- wordsegment.isegment(text)
  Return an iterator over the words in the best segmentation of text.
- wordsegment.segment(text)
  Return a list of the words in the best segmentation of text.
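To make "best segmentation" concrete, the sketch below picks the split that maximises the sum of per-word log-probabilities, using memoised recursion over every prefix and suffix split. It relies on toy unigram counts only and ignores the previous-word context, so it is a simplified illustration under those assumptions and not the library's implementation. All names here (COUNTS, word_logp, best, segment_sketch) are hypothetical.

```python
import math
from functools import lru_cache

# Toy unigram counts, invented for illustration only.
COUNTS = {'this': 30.0, 'is': 40.0, 'a': 20.0, 'test': 10.0}
TOTAL = sum(COUNTS.values())


def word_logp(word):
    """Return log10 probability of word, penalising unknown words by length."""
    if word in COUNTS:
        return math.log10(COUNTS[word] / TOTAL)
    return math.log10(10.0 / (TOTAL * 10 ** len(word)))


@lru_cache(maxsize=None)
def best(text):
    """Return (log-probability, words) for the highest-scoring split of text."""
    if not text:
        return 0.0, ()
    candidates = []
    for pos in range(1, len(text) + 1):
        prefix, suffix = text[:pos], text[pos:]
        rest_logp, rest_words = best(suffix)
        candidates.append((word_logp(prefix) + rest_logp, (prefix,) + rest_words))
    return max(candidates)


def segment_sketch(text):
    """Return the best split as a list of words."""
    return list(best(text)[1])


print(segment_sketch('thisisatest'))  # ['this', 'is', 'a', 'test']
```

Working with log-probabilities turns a product of small numbers into a sum, which avoids floating-point underflow on long inputs.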
- wordsegment.UNIGRAMS
  Mapping of (unigram, count) pairs. Loaded from the file ‘wordsegment/unigrams.txt’.
- wordsegment.BIGRAMS
  Mapping of (bigram, count) pairs. Bigram keys are joined by a space. Loaded from the file ‘wordsegment/bigrams.txt’.
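The key convention for the two mappings can be shown with plain dictionaries. The counts below are made up, and bigram_count is a hypothetical helper; only the space-joined key format is taken from the description above.

```python
# Stand-ins for the real mappings, with invented counts.
unigram_counts = {'new': 40.0, 'york': 10.0}
bigram_counts = {'new york': 8.0}


def bigram_count(prev, word, table):
    """Look up the count for the pair (prev, word) using a space-joined key."""
    return table.get(prev + ' ' + word, 0.0)


print(unigram_counts.get('york', 0.0))              # 10.0
print(bigram_count('new', 'york', bigram_counts))   # 8.0
print(bigram_count('york', 'new', bigram_counts))   # 0.0
```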