Documentation
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type Document (added in v0.1.1)
type Document interface {
	Summarize(length int, threshold float64, focus string) ([]*Sentence, error)
	Highlight(length int, merge bool) ([]*Keyword, error)
	Characters() (int, int)
}
A Document represents a given text, and is responsible for handling the summarization and keyword extraction process.
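As a sketch only, here is one way a Document might be driven once obtained. How a Document is constructed is not covered by this reference (the Functions section above is empty), the snippet is written as if it lives alongside the types above, and the meaning of the length, threshold, focus, and merge arguments is inferred from their names.

// summarizeAndHighlight drives an existing Document. It is an illustrative
// sketch: the argument values are arbitrary, and their exact semantics are
// not described in this reference.
func summarizeAndHighlight(doc Document) ([]*Sentence, []*Keyword, error) {
	// Up to three summary sentences, a 0.3 similarity threshold, no focus string.
	sents, err := doc.Summarize(3, 0.3, "")
	if err != nil {
		return nil, nil, err
	}
	// Up to five keywords, with merge set to true (its effect is not documented here).
	kws, err := doc.Highlight(5, true)
	if err != nil {
		return nil, nil, err
	}
	return sents, kws, nil
}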
type Highlighter (added in v0.1.1)
type Highlighter interface {
	Initialize(tokens []*Token, filter TokenFilter, window int)
	Rank(iters int)
	Highlight(length int, merge bool) ([]*Keyword, error)
}
A Highlighter is responsible for extracting key words from a document.
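A sketch of driving a concrete Highlighter, assuming TokenFilter is a per-token predicate of the form func(*Token) bool; that signature is not expanded in this reference.

// runHighlighter drives a concrete Highlighter h over a token stream.
// The filter literal below compiles only under the assumed TokenFilter
// signature func(*Token) bool.
func runHighlighter(h Highlighter, tokens []*Token) ([]*Keyword, error) {
	// Consider only nouns and adjectives (Penn Treebank-style tags).
	contentWords := func(t *Token) bool {
		switch t.Tag {
		case "NN", "NNS", "NNP", "NNPS", "JJ":
			return true
		}
		return false
	}
	// The window of 2 and the 30 ranking iterations are arbitrary illustrative values.
	h.Initialize(tokens, contentWords, 2)
	h.Rank(30)
	// Up to ten keywords; merge is set to true (its effect is not documented here).
	return h.Highlight(10, true)
}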
type Keyword (added in v0.2.0)
A Keyword is a single keyword extracted from a highlighted document. It contains the raw word and its associated weight.
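The field layout is not expanded in this reference; based only on the description above, a plausible (hypothetical) shape is:

// Hypothetical layout inferred from the description; the real field names
// and types may differ.
type Keyword struct {
	Word   string  // The raw word.
	Weight float64 // The weight assigned during ranking.
}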
type Parser (added in v0.1.1)
A Parser is responsible for parsing and tokenizing a document into sentences and words. A Parser also performs additional tasks such as POS tagging and sentiment analysis.
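The Parser's method set is likewise not expanded here. Purely as an illustration of the responsibilities described above, with hypothetical method names:

// exampleParser illustrates the responsibilities described for Parser;
// every method name here is hypothetical and not part of the documented API.
type exampleParser interface {
	// Split raw text into sentences of POS-tagged tokens.
	ParseSentences(text string) []*Sentence
	// Score the sentiment of a single sentence.
	Sentiment(s *Sentence) float64
}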
type Sentence (added in v0.1.1)
type Sentence struct {
	Raw       string   // Raw sentence string.
	Tokens    []*Token // Tokenized sentence.
	Sentiment float64  // Sentiment score.
	Score     float64  // Score (weight) of the sentence.
	Bias      float64  // Bias assigned to the sentence for ranking.
	Order     int      // The sentence's order in the text.
}
A Sentence represents an individual sentence within the text.
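For illustration, a hand-built Sentence as a Parser might produce it; the token list is abbreviated and the tag values (Penn Treebank-style) are chosen arbitrarily.

var exampleSentence = &Sentence{
	Raw: "Go is expressive, concise, clean, and efficient.",
	Tokens: []*Token{
		{Tag: "NNP", Text: "Go", Order: 0},
		{Tag: "VBZ", Text: "is", Order: 1},
		{Tag: "JJ", Text: "expressive", Order: 2},
		// Remaining tokens omitted for brevity.
	},
	Sentiment: 0.8, // Sentiment score produced by the Parser.
	Score:     0,   // Filled in later by a Summarizer during ranking.
	Bias:      0,   // Ranking bias; its exact effect is not specified here.
	Order:     0,   // First sentence in the text.
}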
type Similarity (added in v0.1.1)
type Similarity func(n1, n2 []*Token, filter TokenFilter) float64
A Similarity computes the similarity of two sentences after applying the token filter.
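As an illustration, here is a Jaccard-style Similarity over the filtered tokens' text. It assumes TokenFilter is a per-token predicate (func(*Token) bool), which this reference does not confirm.

// jaccard scores two token sequences by the overlap of their filtered word
// sets: |intersection| / |union|. It is an example Similarity, not the
// package's own implementation, and it assumes TokenFilter is func(*Token) bool.
func jaccard(n1, n2 []*Token, filter TokenFilter) float64 {
	set := func(toks []*Token) map[string]bool {
		m := make(map[string]bool)
		for _, t := range toks {
			if filter(t) {
				m[t.Text] = true
			}
		}
		return m
	}
	a, b := set(n1), set(n2)
	inter := 0
	for w := range a {
		if b[w] {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

Because its underlying type matches, such a function can be assigned to a Similarity value: var sim Similarity = jaccard.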
type Summarizer (added in v0.1.1)
type Summarizer interface {
	Initialize(sents []*Sentence, similar Similarity, filter TokenFilter,
		focusString *Sentence, threshold float64)
	Rank(iters int)
}
A Summarizer is responsible for extracting key sentences from a document.
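A toy implementation makes the contract concrete: each sentence is scored by its total similarity to every other sentence, and the focus sentence, threshold, and iteration count are ignored to keep the sketch short. Real implementations (for example, iterative graph-based ranking) are more involved.

// centroidSummarizer is an illustrative Summarizer, not the package's own
// implementation. It stores what Initialize provides and, on Rank, scores
// each sentence by its summed similarity to the others.
type centroidSummarizer struct {
	sents   []*Sentence
	similar Similarity
	filter  TokenFilter
}

// Compile-time check that the toy type satisfies the interface.
var _ Summarizer = (*centroidSummarizer)(nil)

func (c *centroidSummarizer) Initialize(sents []*Sentence, similar Similarity,
	filter TokenFilter, focusString *Sentence, threshold float64) {
	// focusString and threshold are accepted but ignored in this sketch.
	c.sents, c.similar, c.filter = sents, similar, filter
}

func (c *centroidSummarizer) Rank(iters int) {
	// This toy scorer is not iterative, so iters is unused.
	for _, s := range c.sents {
		s.Score = 0
		for _, other := range c.sents {
			if other != s {
				s.Score += c.similar(s.Tokens, other.Tokens, c.filter)
			}
		}
	}
}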
type Token (added in v0.1.1)
type Token struct {
	Tag   string // The token's part-of-speech tag.
	Text  string // The token's actual content.
	Order int    // The token's order in the text.
}
A Token represents an individual token of text such as a word or punctuation symbol.
type TokenFilter (added in v0.1.1)
A TokenFilter represents a filter (blacklist or whitelist) applied to tokens before similarity calculations.
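Assuming TokenFilter is a per-token predicate of the form func(*Token) bool (an assumption; the underlying type is not expanded in this reference), a whitelist filter that keeps only nouns and adjectives could look like:

// keepContentWords is an example whitelist filter passing only nouns and
// adjectives (Penn Treebank-style tags). The assignment below compiles only
// under the assumed TokenFilter signature func(*Token) bool.
var keepContentWords TokenFilter = func(t *Token) bool {
	switch t.Tag {
	case "NN", "NNS", "NNP", "NNPS", "JJ", "JJR", "JJS":
		return true
	}
	return false
}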
Directories

| Path | Synopsis |
|---|---|
| internal | |
| internal/prose | Package prose is a repository of packages related to text processing, including tokenization, part-of-speech tagging, and named-entity extraction. |