
I think you'd probably get the best result with both. There's real merit to a keyword-level understanding of which terms appear in the title or as a named entity, for example.


Usually you'd want weighted n-grams, with 1-grams getting the lowest weights; in many cases it's better to zero them out entirely. For English, 4-grams mostly fire on generic phrases and idioms, 5-grams and longer are way too selective, and keywords alone (1-grams) usually hurt relevance too much. 2- and 3-grams work best.
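To make that concrete, here's a rough sketch of what weighted n-gram matching could look like. The weight values, the whitespace tokenizer, and the names `NGRAM_WEIGHTS`, `ngrams` and `weighted_ngram_score` are all made up for illustration; real weights would need tuning on your own data.

```python
from collections import Counter

# Hypothetical weights: 1-grams zeroed out, 2- and 3-grams dominate,
# 4-grams discounted, 5+ left out entirely. These are assumptions, not tuned values.
NGRAM_WEIGHTS = {1: 0.0, 2: 1.0, 3: 1.0, 4: 0.25}


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def weighted_ngram_score(query, document, weights=NGRAM_WEIGHTS):
    """Sum the weights of distinct query n-grams that also occur in the document."""
    q_tokens = query.lower().split()  # naive whitespace tokenizer (assumption)
    d_tokens = document.lower().split()
    score = 0.0
    for n, weight in weights.items():
        if weight == 0.0:
            continue  # zero-weighted sizes contribute nothing, so skip them
        doc_grams = Counter(ngrams(d_tokens, n))
        matches = sum(1 for g in set(ngrams(q_tokens, n)) if g in doc_grams)
        score += weight * matches
    return score


print(weighted_ngram_score("weighted n-gram search ranking",
                           "a weighted n-gram search engine"))
```

With these weights, a document that only shares isolated keywords with the query scores zero, while shared 2- and 3-word phrases drive the ranking.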



