John Mueller of Google Discusses TF-IDF Algo by @martinibuster

Google's John Mueller explained the role of TF-IDF in Google's algorithm. He discussed what it is and proposed a better way to optimize web pages for ranking.

What is TF-IDF?

Wikipedia has a concise definition of what TF-IDF is:

"… tf-idf or TFIDF, short for term frequency-inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus … The tf-idf value increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general."

The key point is that TF-IDF is a metric tied to the whole "collection" or "corpus," meaning all the web pages that contain a specific word or phrase. In the case of web search, this means the metric depends on how often the word or phrase appears across every web page that exists online.

The part about "some words appear more frequently in general" describes how TF-IDF is used to identify and discount commonly used words. It is important to suppress common words (like a, and, the) for ranking purposes.
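The definition quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-sentence corpus is made up, tokenization is a plain whitespace split, and the weighting is the textbook tf × log(N / df) variant. Nothing here is meant to represent how Google computes anything.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # Term frequency is the share of the document taken up by the term.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang a song",
]
scores = tf_idf(corpus)
print(scores[0]["the"])  # 0.0, because "the" appears in every document
print(scores[0]["mat"])  # highest score in document 0: "mat" appears nowhere else
```

Even though "the" is the most frequent word in the first document, its score is zero because it appears in every document of the corpus. That is the "offset" the Wikipedia definition describes.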

TF-IDF is used to create statistical measures of how words and phrases are used across the web. It is not the magic content solution that some people suggest it is.

Here is the question:

"What do you think about TF-IDF keywords? Does Google use a similar mechanism?

Should we use it to improve our content?"

John Mueller replied:

"… TF-IDF is essentially a metric that is used in information retrieval."

That reference to "information retrieval" is a reference to the general field of information retrieval. This includes the science of search in the Gmail inbox. Information retrieval is a somewhat broad term.

He then said this:

"With regard to trying to understand which words are relevant on a page, we use a ton of different techniques from information retrieval. And there are tons of these metrics that have come out over the years."

This alludes to the fact that focusing on an old metric that is useful for finding "stop words" is not helpful, because many other techniques are in use.

TF-IDF and Ranking in Google

"… My general recommendation here is not to focus on these kinds of artificial metrics … because you cannot reproduce this metric directly, since it is based on the overall index of all the content on the web.

So it's not that you can say, this is what I have to do, because you don't really have that metric overall."

This means that it is impossible to calculate the TF-IDF metric yourself, because it is based on statistics for the entire web.
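A small sketch can make the corpus dependence concrete. The `idf` helper and both sample corpora below are hypothetical and exist only for illustration. The point is that the same word on the same page gets a different weight depending on which collection of documents it is measured against, so a score computed from a hand-picked sample says little about a score computed from a full web index.

```python
import math

def idf(term, corpus):
    """Inverse document frequency: log(N / number of documents containing term)."""
    containing = sum(1 for doc in corpus if term in doc.lower().split())
    # Returning 0.0 for an unseen term is a simplifying choice for this sketch.
    return math.log(len(corpus) / containing) if containing else 0.0

page = "tf-idf scores depend on the corpus"

# Two hypothetical corpora that both include the same page.
small_sample = [page, "cooking pasta at home", "hiking trails near the lake"]
seo_sample = [page, "tf-idf for seo", "what tf-idf measures", "tf-idf myths"]

print(idf("tf-idf", small_sample))  # log(3 / 1): the term looks rare and important
print(idf("tf-idf", seo_sample))    # 0.0: every document in this sample contains it
```

Neither number is "the" TF-IDF weight of the word. Each is only valid relative to the sample it was computed from.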

John Mueller's Recommendations for Ranking

John Mueller then described a better alternative to focusing on TF-IDF:

"I would strongly recommend focusing on your website and its users, and making sure that what you offer is something Google will always recognize and continue to use as something valuable."

Mueller noted that it is a very old metric, implying that modern information retrieval has become more sophisticated:

"Another thing is … it's a pretty old metric and things have evolved a lot over the years. … There are also many other metrics."

He then said that focusing on users is a better approach because it is resilient to algorithm changes. Google strives to provide the most useful search results. If you focus on useful content, the page will probably remain popular and continue to be shown in Google.

This is what Mueller said:

"If you blindly focus on one kind of theoretical metric and try to insert those words into your pages, I don't think that is a useful thing.

I think it's short-sighted thinking, because you're focusing solely on a search engine in which those words have a stronger effect.

So don't just focus on artificially adding keywords. Make sure that all the new algorithms continue to look at your pages and say, this is really great. We should show it more prominently in the search results."


One of the main uses of TF-IDF is to identify stop words like a, the and and. It is an old and basic content metric. There are numerous content metrics that are better than the basic and simple TF-IDF metric.

In a world where AI, neural networks and machine learning are the norm, TF-IDF is like a child's bike with training wheels compared to a Ferrari.

Mueller spoke of its use for eliminating stop words (words such as and, the, etc.). That seems an appropriate use for such an old technology. Such a basic metric could very well be limited to contributing to the simple task of identifying stop words.

We cannot be sure, but the fact that Mueller mentioned TF-IDF in the context of stop words may be a hint about how it is used.

Watch the Google Webmaster Hangout here.

Screenshots by the author, Modified by the author