By Gerard Salton
Presents a theory of indexing that can rank index terms, or content identifiers, in decreasing order of importance. This leads to the choice of good document representations, and it also accounts for the role of phrases and of thesaurus classes in the indexing process.
This study is typical of theoretical work in automatic information organization and retrieval, in that it draws on methods from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from a suitable combination of these three disciplines.
Similar probability books
Well-written introduction covering probability theory for two or more random variables, reliability of such multivariable structures, theory of random functions, Monte Carlo methods for problems incapable of exact solution, and more.
This book examines statistical models for frequency data. The primary focus is on log-linear models for contingency tables, but this second edition places greater emphasis on logistic regression. Topics such as logistic discrimination and generalized linear models are also explored. The treatment is designed for students with prior knowledge of analysis of variance and regression.
Interest in the temporal fluctuations of biological populations can be traced to the dawn of civilization. How can mathematics be used to gain an understanding of population dynamics? This monograph introduces the theory of structured population dynamics and its applications, focusing on the asymptotic dynamics of deterministic models.
- Credit risk: modeling, valuation, and hedging
- Nonequilibrium phenomena 2: from stochastics to hydrodynamics
- Stochastic Methods for Flow in Porous Media: Coping with Uncertainties
- Epistemology and Probability: Bohr, Heisenberg, Schrödinger, and the Nature of Quantum-Theoretical Thinking
- Applied Stochastic Control of Jump Diffusions
- Weak Convergence of Measures
Additional info for A Theory of Indexing
…11 is in fact an accurate representation of the indexing value of the terms, it must be possible to improve retrieval performance by breaking up terms with negative discrimination values so that lower-frequency terms are produced from higher-frequency components, with correspondingly better discrimination values. Specifically, if the high-frequency nondiscriminators are taken in groups and "phrases" are formed from co-occurring sets of nondiscriminators, the phrases will necessarily exhibit lower document frequencies than the original components.
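The frequency argument above can be checked on a toy collection: a group of terms can only occur in documents that contain every member, so its document frequency can never exceed that of its rarest component. The sketch below is illustrative only. The collection is invented, and picking "nondiscriminators" with a simple document-frequency threshold is an assumed stand-in for the book's criterion (a negative discrimination value). The helper names `doc_freq` and `phrases_from` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy collection: each document is a set of word stems.
docs = [
    {"system", "retrieval", "index", "model"},
    {"system", "retrieval", "query"},
    {"system", "analysis", "model"},
    {"retrieval", "analysis", "index"},
    {"system", "retrieval", "analysis"},
]

def doc_freq(term_set, docs):
    """Number of documents that contain every term in term_set."""
    return sum(1 for d in docs if term_set <= d)

def phrases_from(terms, docs, size=2, min_df=1):
    """Form 'phrases' from co-occurring groups of the given terms.

    Returns a dict mapping each group (a frozenset) to its document
    frequency, keeping only groups that co-occur in >= min_df documents.
    """
    result = {}
    for group in combinations(sorted(terms), size):
        df = doc_freq(set(group), docs)
        if df >= min_df:
            result[frozenset(group)] = df
    return result

# Assumption: treat terms occurring in at least 3 of the 5 documents as
# high-frequency nondiscriminators (a stand-in for a negative discrimination value).
vocabulary = set().union(*docs)
nondiscriminators = {t for t in vocabulary if doc_freq({t}, docs) >= 3}
pairs = phrases_from(nondiscriminators, docs, size=2)

for group, df in sorted(pairs.items(), key=lambda kv: -kv[1]):
    print(sorted(group), df)
```

Here "system" and "retrieval" each occur in four documents, but the pair occurs in only three, which illustrates why phrase formation pushes high-frequency terms toward lower frequencies.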
For practical purposes, the average discriminators are terms that occur with a term frequency of 1 in relatively few documents of a collection. The poor discriminators, finally, have high document frequencies and collection frequencies two or three times as large as their document frequencies. The number of documents in which these terms occur with low frequency is very large, which accounts for their low discrimination values. Whereas no clear correlation was found between the S/N (signal-to-noise) ratings and the document or collection frequencies of the corresponding terms, a direct relation appears to exist for the discrimination value rankings.
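A discrimination value ranking of this kind can be sketched in a few lines. One common formulation, assumed here rather than taken from this excerpt, measures the "space density" Q as the average cosine similarity between each document vector and the collection centroid. The discrimination value of term k is then DV_k = Q_k - Q, where Q_k is the density after deleting term k. A positive DV_k means that removing the term packs documents closer together, so the term helps separate them. The toy vectors and function names are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def density(vectors):
    """Assumed density measure: mean cosine of each document to the centroid."""
    n = len(vectors)
    centroid = Counter()
    for v in vectors:
        for t, w in v.items():
            centroid[t] += w / n
    return sum(cosine(v, centroid) for v in vectors) / n

def discrimination_values(vectors):
    """DV_k = Q_k - Q, where Q_k is the density with term k deleted."""
    q = density(vectors)
    terms = {t for v in vectors for t in v}
    dv = {}
    for k in terms:
        reduced = [{t: w for t, w in v.items() if t != k} for v in vectors]
        dv[k] = density(reduced) - q
    return dv

# Hypothetical collection: "system" and "retrieval" occur everywhere.
vectors = [
    {"system": 1, "retrieval": 1, "index": 1},
    {"system": 1, "retrieval": 1, "query": 1},
    {"system": 1, "retrieval": 1, "model": 1},
]
dv = discrimination_values(vectors)
ranking = sorted(dv, key=dv.get, reverse=True)  # decreasing order of value
print(ranking)
```

In this toy case the terms present in every document get negative values, and each term confined to a single document gets a positive one. This agrees with the excerpt's observation that terms with very high document frequency are poor discriminators.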
- TF: standard term frequency weighting (word stem run).
- PT + SPT: pairs and triples derived from nondiscriminators, plus singles, pairs, and triples obtained from discriminators.
- TF • IDF: a term weight consisting of the term frequency multiplied by the inverse document frequency.

TABLE 22. Statistical significance output for selected runs of Table 21 (probability that run B is significantly better than run A, except where A > B indicates that the test is made in the reverse direction).

t-test A. …
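The TF • IDF weight listed above multiplies a term's frequency in a document by an inverse document frequency factor. The excerpt does not give the exact IDF formula, so the sketch below assumes the common form log(N / df), where N is the number of documents and df is the number of documents containing the term. The collection and function names are hypothetical.

```python
import math
from collections import Counter

def document_frequencies(collection):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for doc in collection:
        df.update(set(doc))
    return df

def tf_idf_weights(collection):
    """Weight each term as tf * log(N / df) (assumed IDF form)."""
    n = len(collection)
    df = document_frequencies(collection)
    return [{t: tf * math.log(n / df[t]) for t, tf in doc.items()}
            for doc in collection]

# Hypothetical term-frequency vectors (word stems mapped to counts).
collection = [
    Counter({"system": 3, "index": 1}),
    Counter({"system": 1, "query": 2}),
    Counter({"system": 2, "model": 1}),
    Counter({"system": 1, "retrieval": 1}),
]
weights = tf_idf_weights(collection)
print(weights[0])
```

A term that occurs in every document, such as "system" here, gets an IDF of log(1) = 0 and therefore a zero weight regardless of its term frequency. Rarer terms keep weight in proportion to how often they occur.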