Phrase queries

  • There are two main approaches to phrase queries:
  1. Biword indexes
  2. Positional indexes

Biword indexes

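Disadvantage: the dictionary grows larger.
Advantage: search time is reduced, because every pair of consecutive words is indexed as a single term.
① Remove stop words.
② Form biwords only from nouns.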

ex: cost overruns on a power plant
=> Biword (cost overruns) (overruns power) (power plant)
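
The biword construction in the example above can be sketched in a few lines of Python. This is a minimal sketch, not a full indexer: the `STOP_WORDS` set and the `biwords` function are hypothetical names introduced here, and tokenization is assumed to be a plain whitespace split.

```python
# Hypothetical stop-word list for illustration only; real lists are longer.
STOP_WORDS = {"a", "an", "the", "on", "of", "in", "to"}

def biwords(text, stop_words=STOP_WORDS):
    """Return the adjacent word pairs that remain after dropping stop words."""
    tokens = [t for t in text.lower().split() if t not in stop_words]
    # Pair each token with its successor: (t0, t1), (t1, t2), ...
    return list(zip(tokens, tokens[1:]))

pairs = biwords("cost overruns on a power plant")
print(pairs)
# [('cost', 'overruns'), ('overruns', 'power'), ('power', 'plant')]
```

Each pair becomes one dictionary term, which is why the dictionary grows while a two-word phrase lookup becomes a single term lookup.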

Positional indexes

< be: 993427
1: 7,18,72,86,231;
2: 2,149;
4: 17,191,291,430,434;
5: 363,367; ...>

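ex: to be or not to be can occur in Doc 4 and Doc 5, because the two occurrences of be in the phrase are 4 positions apart (three words lie between them), which matches positions 430 and 434 in Doc 4 and positions 363 and 367 in Doc 5.
A positional index can therefore also answer proximity queries.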
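
The position check for be can be sketched as follows. In the phrase to be or not to be, be sits at word positions 2 and 6, so a candidate document needs two occurrences of be exactly 4 positions apart. The dictionary `BE_POSTINGS` copies the postings listed above; the function name `docs_with_gap` is an assumption made for this sketch. It only filters on the term be, so it yields candidates; a complete phrase check would also have to align the postings of to, or and not, which these notes do not list.

```python
# Positional postings for "be" from the list above: doc ID -> positions.
BE_POSTINGS = {
    1: [7, 18, 72, 86, 231],
    2: [2, 149],
    4: [17, 191, 291, 430, 434],
    5: [363, 367],
}

def docs_with_gap(postings, gap):
    """Return doc IDs where two occurrences lie exactly `gap` positions apart."""
    hits = []
    for doc_id, positions in sorted(postings.items()):
        seen = set(positions)
        # A hit needs some position p with another occurrence at p + gap.
        if any(p + gap in seen for p in positions):
            hits.append(doc_id)
    return hits

candidates = docs_with_gap(BE_POSTINGS, 4)
print(candidates)  # [4, 5]
```

Replacing the exact test `p + gap in seen` with a range test (within k positions) would turn the same loop into a simple proximity check.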

Exercise2-3:

Assume a biword index. Give an example of a document which will be returned for a query of New York University but is actually a false positive which should not be returned.

solution:
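With a biword index, the query New York University is processed as:
Ans: biword (New York) AND (York University)
Any document that contains both biwords without the full phrase is a false positive, for example: "She left New York to teach at York University."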
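
To make the false positive concrete, here is a small sketch that compares the biword AND test with a true phrase test. The helper names (`biword_set`, `biword_match`, `phrase_match`) and the sample document are made up for illustration; no stop-word removal is applied and tokens are split on whitespace.

```python
def biword_set(text):
    """Return the set of adjacent word pairs in a text."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))

def biword_match(query, doc):
    """True if every biword of the query occurs somewhere in the document."""
    return biword_set(query) <= biword_set(doc)

def phrase_match(query, doc):
    """True if the query tokens occur consecutively in the document."""
    q, d = query.lower().split(), doc.lower().split()
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

# Hypothetical document, invented for this example.
doc = "she left new york to teach at york university"
query = "new york university"
print(biword_match(query, doc))  # True
print(phrase_match(query, doc))  # False
```

The biword test accepts the document because both pairs are present, while the phrase test rejects it because the three words never appear consecutively.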

Exercise2-4:

Using a positional index, determine which documents may contain each of the following phrases.

(1): fool rush in

solution:
Ans: Doc 2, 4, 7

(2): "fool rush in" AND "angels fear to tread"

solution:
Ans: Doc 4

Exercise2-5:

Are the following statements true or false?

a. In a Boolean retrieval system, stemming never lowers precision.
b. In a Boolean retrieval system, stemming never lowers recall.
c. Stemming increases the size of the vocabulary.
d. Stemming should be invoked at indexing time but not while processing a query.

solution:
Ans: a. False b. True c. False d. False
