WordSegment API Reference


wordsegment.clean(text)
Return text lower-cased with non-alphanumeric characters removed.
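The behavior described above can be approximated in a few lines. This is a minimal sketch, not the library's code: the name clean_sketch is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits only.

```python
def clean_sketch(text):
    """Lower-case text and drop every character that is not an ASCII letter or digit."""
    # Assumption: only a-z and 0-9 count as alphanumeric here.
    alphabet = set("abcdefghijklmnopqrstuvwxyz0123456789")
    return "".join(ch for ch in text.lower() if ch in alphabet)


print(clean_sketch("Hello, World! 42"))  # helloworld42
```

Spaces and punctuation are removed along with everything else, so the result is a single unbroken string ready to be segmented.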
wordsegment.divide(text)
Yield (prefix, suffix) pairs from text, where len(prefix) does not exceed a maximum prefix length (limit).
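To make the prefix/suffix splitting concrete, here is a small sketch. The description mentions a limit that the signature above does not take as an argument, so this version passes it explicitly; the function name divide_sketch and the default of 24 are assumptions chosen for illustration.

```python
def divide_sketch(text, limit=24):
    """Yield (prefix, suffix) pairs; prefixes grow from 1 up to limit characters."""
    # limit and its default are hypothetical; they cap how long a prefix may be.
    for pos in range(1, min(len(text), limit) + 1):
        yield text[:pos], text[pos:]


print(list(divide_sketch("cat", limit=2)))  # [('c', 'at'), ('ca', 't')]
```

Capping the prefix length keeps the number of candidate splits per position bounded, which matters when the input is a long string.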
wordsegment.load()
Load unigram and bigram counts from disk.
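As a rough illustration of what loading counts from disk might involve, the sketch below parses lines of the form key, tab, count into a dictionary. The tab-separated layout, the helper name parse_counts_sketch, and the sample data are all assumptions made for this example; they are not a description of the library's actual files or loader.

```python
import io


def parse_counts_sketch(lines):
    """Build a {key: count} dict from 'key<TAB>count' lines (assumed format)."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, count = line.split("\t")
        counts[key] = float(count)
    return counts


# An in-memory stand-in for a counts file (made-up numbers).
sample = io.StringIO("the\t100\nof\t60\nof the\t25\n")
counts = parse_counts_sketch(sample)
print(counts["of the"])  # 25.0
```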
wordsegment.score(word, prev=None)
Score a word in the context of the previous word, prev.
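One common way to score a word given its predecessor is to use the bigram's conditional probability when the bigram is known and to fall back to a unigram probability otherwise. The sketch below takes that approach as an assumption rather than as the library's exact formula; score_sketch, the toy counts, the total, and the unknown-word penalty are all hypothetical.

```python
# Toy counts (made-up values); real counts would come from the loaded mappings.
UNIGRAMS_TOY = {"new": 50.0, "york": 20.0, "city": 30.0}
BIGRAMS_TOY = {"new york": 15.0}
TOTAL_TOY = 100.0


def score_sketch(word, prev=None):
    """Return an estimated probability of word, optionally conditioned on prev."""
    if prev is not None:
        bigram = prev + " " + word
        if bigram in BIGRAMS_TOY and prev in UNIGRAMS_TOY:
            # Conditional probability: count(prev word) / count(prev).
            return BIGRAMS_TOY[bigram] / UNIGRAMS_TOY[prev]
    if word in UNIGRAMS_TOY:
        # Fall back to the plain unigram probability.
        return UNIGRAMS_TOY[word] / TOTAL_TOY
    # Unknown word: the penalty grows with the word's length.
    return 10.0 / (TOTAL_TOY * 10 ** len(word))


print(score_sketch("york"))         # 0.2
print(score_sketch("york", "new"))  # 0.3
```

With these toy numbers, "york" scores higher after "new" than it does on its own, which is the effect the prev argument is meant to capture.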
wordsegment.isegment(text)
Return an iterator over the words in the best segmentation of text.
wordsegment.segment(text)
Return a list of the words in the best segmentation of text.
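The relationship between the iterator and list forms, and the idea of searching for a "best" segmentation, can be sketched with a memoized search over every prefix/suffix split. This is a simplified illustration under stated assumptions: it scores words with toy unigram counts only (no previous-word context), and the names best, word_logp, isegment_sketch, and segment_sketch are hypothetical.

```python
import math
from functools import lru_cache

# Toy unigram counts (made-up values, for illustration only).
COUNTS = {"choose": 5.0, "spain": 5.0, "chooses": 1.0, "pain": 4.0}
TOTAL = 100.0


def word_logp(word):
    """Log10 probability of a word; unknown words get a length-based penalty."""
    if word in COUNTS:
        return math.log10(COUNTS[word] / TOTAL)
    return math.log10(10.0 / (TOTAL * 10 ** len(word)))


@lru_cache(maxsize=None)
def best(text):
    """Return (log-score, words) for the highest-scoring split of text."""
    if not text:
        return 0.0, ()
    candidates = []
    for pos in range(1, len(text) + 1):
        prefix, suffix = text[:pos], text[pos:]
        rest_score, rest_words = best(suffix)
        candidates.append((word_logp(prefix) + rest_score, (prefix,) + rest_words))
    return max(candidates)


def isegment_sketch(text):
    """Yield the words of the best segmentation one at a time."""
    yield from best(text)[1]


def segment_sketch(text):
    """Return the same words collected into a list."""
    return list(isegment_sketch(text))


print(segment_sketch("choosespain"))  # ['choose', 'spain']
```

Summing log probabilities rather than multiplying raw probabilities avoids floating-point underflow on long inputs, and the cache keeps each suffix from being solved more than once.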
wordsegment.UNIGRAMS
Mapping of (unigram, count) pairs. Loaded from the file ‘wordsegment/unigrams.txt’.
wordsegment.BIGRAMS
Mapping of (bigram, count) pairs. Bigram keys are joined by a space. Loaded from the file ‘wordsegment/bigrams.txt’.
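Because bigram keys are joined by a space, looking up a pair of words means building a single string key first. The sketch below uses small stand-in dictionaries with made-up counts; the helper bigram_count is hypothetical and not part of the module.

```python
# Stand-ins for the two mappings, with made-up counts.
UNIGRAMS_TOY = {"new": 50.0, "york": 20.0}
BIGRAMS_TOY = {"new york": 15.0}


def bigram_count(prev, word, bigrams=BIGRAMS_TOY):
    """Look up a bigram count using a single space-joined key."""
    return bigrams.get(prev + " " + word, 0.0)


print(UNIGRAMS_TOY["new"])          # 50.0
print(bigram_count("new", "york"))  # 15.0
print(bigram_count("york", "new"))  # 0.0
```

Word order matters in the key, so "new york" and "york new" are distinct entries.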