Logged-in premium users have access to note search.
An example query:
https://www.documentcloud.org/documents/?q=Higganum+complaint
The first few documents rank so highly not because their content is especially relevant, but because they have a LOT of notes containing the word "complaint". I tested in the Django shell with my user (which has note search enabled) and with a user without it. The scores for the top 5 documents Solr returns are as follows:
WITHOUT notes:
id: 2704406 score: 51.575443
id: 3535180 score: 45.707706
id: 336686 score: 44.942238
id: 26948698 score: 41.152092
id: 6953986 score: 41.038723
WITH notes:
id: 321087 score: 397.05875
id: 3034115 score: 186.30943
id: 28167531 score: 147.98016
id: 23789873 score: 147.8687
id: 1369433 score: 132.84674
Looking deeper into the score explanation for the top result:
doc 321087 has 106 notes matching 'complaint' in title:
note: 47030 title: Telephone-related complaint score: 3.3627408
note: 47031 title: Telephone-related complaint score: 3.3627408
note: 47032 title: Telephone-related complaint score: 3.3627408
note: 47033 title: Telephone-related complaint score: 3.3627408
note: 47036 title: Telephone-related complaint score: 3.3627408
We should add some kind of diminishing-returns mechanism here, so that the first few note hits affect the score but each additional hit contributes progressively less.
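One possible shape for this, as a rough sketch only: weight the note hits geometrically, so the best hit counts in full, the next at half, the next at a quarter, and so on. The function name `damped_note_score` and the `decay` parameter are made up for illustration, and this is plain Python applied to the per-note scores listed above, not the actual Solr query. It also assumes all 106 matching notes score the same as the five shown. How this would be expressed in the real query is still an open question.

```python
def damped_note_score(note_scores, decay=0.5):
    """Combine per-note scores so that each extra hit counts for less.

    Hypothetical helper: the best-scoring note gets weight 1, the next
    gets `decay`, the next `decay ** 2`, and so on.
    """
    ranked = sorted(note_scores, reverse=True)
    return sum(score * decay ** rank for rank, score in enumerate(ranked))


# Assumption: all 106 matching notes on doc 321087 score 3.3627408,
# like the five listed in the explanation output.
note_scores = [3.3627408] * 106

plain_total = sum(note_scores)                  # every hit counts in full
damped_total = damped_note_score(note_scores)   # later hits count for less

print(f"plain sum of note scores:  {plain_total:.4f}")
print(f"damped sum of note scores: {damped_total:.4f}")
```

With `decay=0.5` the note contribution can never exceed twice the best single note score, no matter how many notes match, so a document with 106 near-identical notes no longer swamps documents whose content actually matches. A gentler curve (a larger `decay`, or a logarithm of the hit count) would be the knob to tune if this turns out too aggressive.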