WordSegment API Reference

WordSegment API reference.

wordsegment.clean(text): Return text lower-cased with non-alphanumeric characters removed.
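The behaviour described for clean can be sketched in a few lines. This is an illustration only, not the library's source: the name clean_sketch is hypothetical, and it assumes "alphanumeric" means ASCII letters and digits.

```python
import string

# Assumption: only ASCII letters and digits count as alphanumeric here.
ALLOWED = set(string.ascii_lowercase + string.digits)


def clean_sketch(text):
    """Lower-case text and drop every character outside ALLOWED."""
    return "".join(ch for ch in text.lower() if ch in ALLOWED)


print(clean_sketch("Hello, World! 42"))  # helloworld42
```

Under that assumption, spaces, punctuation, and accented letters are all dropped, so only lower-case ASCII letters and digits reach the segmenter.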

wordsegment.divide(text): Yield (prefix, suffix) pairs from text with len(prefix) not exceeding limit.
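A minimal sketch of the splitting described for divide follows. The name divide_sketch is hypothetical, and passing limit as a keyword argument with a default of 24 is an assumption made for illustration; the signature above takes only text.

```python
def divide_sketch(text, limit=24):
    """Yield (prefix, suffix) pairs with 1 <= len(prefix) <= limit.

    Assumption: `limit` is passed explicitly; 24 is an illustrative default.
    """
    for pos in range(1, min(len(text), limit) + 1):
        yield text[:pos], text[pos:]


for prefix, suffix in divide_sketch("abc", limit=2):
    print(prefix, suffix)
# a bc
# ab c
```

Capping the prefix length keeps the number of candidate splits per position bounded, whatever the length of the input.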

wordsegment.load(): Load unigram and bigram counts from disk.

wordsegment.score(word, prev=None): Score a word in the context of the previous word, prev.

wordsegment.isegment(text): Return an iterator of words that is the best segmentation of text.

wordsegment.segment(text): Return a list of words that is the best segmentation of text.
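To illustrate how an iterator-returning and a list-returning segmenter relate, here is a toy, self-contained sketch. It is not WordSegment's implementation: the names (toy_score, toy_isegment, toy_segment), the tiny count table, the unigram-only scoring, and the unseen-word penalty are all assumptions chosen to keep the example short.

```python
import math
from functools import lru_cache

# Toy counts invented for illustration; not the library's data.
TOY_UNIGRAMS = {"this": 50, "is": 80, "a": 100, "test": 20}
TOY_TOTAL = sum(TOY_UNIGRAMS.values())


def toy_score(word):
    """Log10 probability of word; unseen words get a length-based penalty."""
    if word in TOY_UNIGRAMS:
        return math.log10(TOY_UNIGRAMS[word] / TOY_TOTAL)
    # Assumption: penalise unseen words more heavily the longer they are.
    return math.log10(10.0 / (TOY_TOTAL * 10 ** len(word)))


def toy_isegment(text, limit=24):
    """Yield the words of the highest-scoring split of text."""

    @lru_cache(maxsize=None)
    def best(rest):
        # Returns (total score, tuple of words) for the best split of rest.
        if not rest:
            return 0.0, ()
        candidates = []
        for pos in range(1, min(len(rest), limit) + 1):
            prefix, suffix = rest[:pos], rest[pos:]
            tail_score, tail_words = best(suffix)
            candidates.append((toy_score(prefix) + tail_score, (prefix,) + tail_words))
        return max(candidates)

    yield from best(text)[1]


def toy_segment(text):
    """List form of toy_isegment."""
    return list(toy_isegment(text))


print(toy_segment("thisisatest"))  # ['this', 'is', 'a', 'test']
```

In this sketch the list form simply consumes the iterator, so the two always agree; the iterator form is the one to reach for when the words are processed one at a time.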

wordsegment.UNIGRAMS: Mapping of (unigram, count) pairs. Loaded from the file ‘wordsegment/unigrams.txt’.

wordsegment.BIGRAMS: Mapping of (bigram, count) pairs. Bigram keys are joined by a space. Loaded from the file ‘wordsegment/bigrams.txt’.
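The key format described for the bigram mapping can be sketched with a toy table. The counts below are invented, and bigram_count is a hypothetical helper rather than part of the library.

```python
# Toy counts invented for illustration; the real tables come from the data files.
TOY_BIGRAMS = {"new york": 25, "york city": 10}


def bigram_count(prev, word, bigrams=TOY_BIGRAMS):
    """Return the count for `word` following `prev`, or 0 if the pair is unseen."""
    key = f"{prev} {word}"  # bigram keys are the two words joined by a space
    return bigrams.get(key, 0)


print(bigram_count("new", "york"))  # 25
print(bigram_count("york", "new"))  # 0
```

Because the key is an ordered pair joined by a single space, word order matters: "new york" and "york new" are distinct entries.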