How RSS Brain Shows Related Articles | Bin Wang - My Personal Blog
The score is computed with tf-idf, which considers both how frequent a term is in this article and how frequent it is across all articles. The more frequent the term is in this article, the higher the score, since the term better represents the article. The more frequent it is across all articles, the lower the score, since the term is not distinctive enough to characterize this article. Once we have a term vector for each article, we can measure similarity by computing the distance between these vectors.

So we have the algorithm; how does RSS Brain implement it in code? We use ElasticSearch under the hood, which is widely used and open source. You can check the code for implementation details on the APIs that RSS Brain uses. ElasticSearch has an API to get the term vector for an article, and we use the scores in the term vector to build a boosting query, that is, a search with a different weight for each query term. For example, in the example above, to find related articles for article A, we first get its term vector, then convert it into a boosting query that searches for related articles with the query apple^0.5 boy^0.5 cat^1.
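The idea in the excerpt can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not RSS Brain's code and not ElasticSearch's scoring: the function names (tf_idf_vectors, cosine_similarity, to_boost_query) are hypothetical, the idf formula is one common smoothed variant, and cosine similarity stands in for the unspecified "distance" between term vectors.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each document name to a {term: tf-idf score} vector.

    docs: dict of name -> list of tokens (assumed already tokenized).
    Uses a smoothed idf, log((1 + N) / (1 + df)) + 1, which is an
    assumption for this sketch, not the formula ElasticSearch uses.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = {}
    for name, tokens in docs.items():
        term_counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_counts.items()
        }
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def to_boost_query(vector, top_n=10):
    """Turn a term vector into a 'term^weight' query string.

    Keeps the top_n highest-scoring terms, then lists them
    alphabetically so the output is deterministic.
    """
    top = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return " ".join(f"{term}^{score:g}" for term, score in sorted(top))


# The term vector for article A from the excerpt's example.
print(to_boost_query({"apple": 0.5, "boy": 0.5, "cat": 1.0}))
# prints: apple^0.5 boy^0.5 cat^1
```

In a real setup the term vector would come from the search engine rather than from tf_idf_vectors, and the resulting string would be sent as a query; the sketch only shows how per-term scores become per-term boosts.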
·binwang.me·