Google Hit by Major Internal Documentation Leak: Insights and Secrets Exposed
On May 28, 2024, Google faced a massive leak of internal documentation.
The inner workings of Google’s search engine are among the most secretive, closely guarded black boxes in the world. And then they leaked online, all because a developer confused a private repository with a public one on GitHub.
Nothing of this magnitude and detail has leaked from Google’s search division in the past ten years.
Executives from SparkToro, iPullRank, and other top SEO firms have showcased leaked documents about how Google’s search ranking works. In reality, it’s a bit more nuanced: what leaked is documentation for various APIs around the search engine, but even from these APIs much can be inferred. In total, about 2,500 pages of internal Google Search documentation were made public.
And naturally, the documents contradict what Google has been saying about site promotion in its search engine for years.
This article will be useful for everyone who owns a domain or is involved in website promotion; some very interesting secrets were revealed. Let’s dive in!
What’s the Deal and Can You Trust the Leak?
Numerous checks with former and current Googlers indicate that this is not a fake and not a joke but a very real leak, one that all SEO researchers are now eagerly investigating.
Documentation like this exists in many Google teams: it explains APIs to help familiarize project members with the available data. The leaked files match documentation in public GitHub repositories and the Google Cloud API docs in appearance, using the same notation style, formatting, and even the same names of processes, modules, and functions, and the same links.
In short, they leaked instructions for members of the Google search system team.
The leak apparently originated on GitHub. Someone briefly made the documentation public, evidently mistaking a private Google repository for a public one. While the folder was exposed, sometime between March and May 2024, the API documentation was picked up by Hexdocs, and from there it was downloaded by anyone interested.
Those who are into website optimization and promotion have long known that Google persistently denies using user data for ranking and search quality.
Back in 2012, there were rumors that clicks on links in Google Chrome mattered more than any others, because Google could measure those clicks. There was much talk of the search engine using behavioral data collected through Chrome and its extensions for ranking and promoting sites, and at the time Google officially stated that it did not. SEO specialists never believed those statements. As it turns out, they were right.
How can you be sure of the authenticity of the leak? After all, Google could have abandoned some functions, used others exclusively for testing or internal projects, or even made API functions available that were never used.
Obviously, the search engine changes significantly from year to year, and recent features like AI do not figure in this leak. However, the documentation contains references to deprecated functions, with explicit notes on others saying they should no longer be used. This suggests that the functions not marked as deprecated were still in active use at the time of the leak in March 2024.
Where Google Lied
“We don’t rank sites by authority”
Google claimed it doesn’t use “domain authority.” What does this mean? The “Domain Authority” metric was created by Moz, and it assesses how authoritative a site is overall, based primarily on citations: the number and quality of links pointing to the site. Moz developed it as a way to estimate how likely a site is to rank well in search results.
Firstly, when Google says it doesn’t use “domain authority,” it doesn’t mean no such thing exists. It only means Google doesn’t use Moz’s specific metric, and it certainly doesn’t mean Google has no way of assessing site authority. It may simply not use the same methodology and algorithms as Moz.
Secondly, Google might not assess the authority of a site for a specific topic. That is, they might not measure how important a site is in a particular field of knowledge or for a specific subject.
However, this doesn’t mean that Google doesn’t assess the quality or importance of sites at all. In fact, they have “siteAuthority,” which helps determine how reliable and authoritative a site is. Their metric is based on many factors, like content quality, site structure, and user time on site.
Overall, Google is just playing with words when talking about ways to assess the authority of sites.
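To make the wordplay concrete, here is a minimal sketch of what a composite “siteAuthority”-style score might look like. The attribute name comes from the leak, but the inputs, weights, and formula below are purely illustrative assumptions, not anything Google has disclosed.

```python
# Purely illustrative: the attribute name "siteAuthority" appears in the leak,
# but Google has never disclosed its formula. The inputs and weights below are
# assumptions made up for this sketch.

def site_authority(link_quality: float, content_quality: float,
                   avg_dwell_seconds: float) -> float:
    """Toy composite score in [0, 1], loosely analogous to Moz's Domain Authority."""
    # Cap dwell time at 5 minutes so one signal cannot dominate the score.
    dwell = min(avg_dwell_seconds, 300.0) / 300.0
    # Hypothetical weights; the real signal mix is unknown.
    return 0.4 * link_quality + 0.35 * content_quality + 0.25 * dwell

print(site_authority(link_quality=0.8, content_quality=0.6, avg_dwell_seconds=120))
# -> 0.63
```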
“We don’t use clicks for ranking”
Another lie from Google: the claim that it doesn’t track clicks when ranking sites.
Recently, Google’s Vice President of Search testified in the US antitrust trial and revealed some interesting things. Specifically, he talked about the ranking systems Glue and NavBoost.
NavBoost is based on clicks. The system analyzes how often users click on particular search results; if a link gets clicked frequently, that may indicate it is more relevant and useful. Based on the collected data, NavBoost can raise or lower the positions of results.
Glue, on the other hand, deals not with web links but with everything else on the results page: news, images, videos, and so on. It can also analyze behavior, but it focuses on interaction with that content.
Besides, everyone has long known that clicks are the best metric for tracking a site’s promotion. Even so, it was never entirely clear how exactly to act on that.
It’s all due to Google’s evasive answers and, let’s be honest, a lot of complimentary articles about the search engine that just repeat official statements.
Furthermore, the leaked documentation treats users as “voters.” Each click counts as a vote for a page’s relevance: the more clicks, the more likely the page is useful. Naturally, all clicks are segmented by geolocation, device, and other parameters.
Moreover, through NavBoost and Glue, the search engine tracks the last time a user successfully found the information they needed on a page by clicking through to it. So if a page gets no clicks for a long time, it is considered stale and can be demoted in the results.
Additionally, not only the number of clicks is considered, but also the time spent on the page after clicking the link. The longer the user stays on the page, the higher the chance they found something useful.
But if the user closes the page or hits “back” almost immediately after clicking, the page evidently was not useful.
All this is taken into account during ranking.
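To make these mechanics concrete, here is a minimal sketch of how click-and-dwell “votes” like those attributed to NavBoost might be aggregated. The thresholds and weights below are hypothetical illustrations, not Google’s actual code.

```python
from dataclasses import dataclass

# Hypothetical sketch of NavBoost-style click scoring; the thresholds and
# weights are assumptions, not values from the leaked documentation.

@dataclass
class Click:
    dwell_seconds: float  # time spent on the page after the click
    bounced: bool         # user hit "back" almost immediately

def click_score(clicks: list[Click]) -> float:
    """Each click is a 'vote': long dwells count for a page, quick bounces against it."""
    score = 0.0
    for c in clicks:
        if c.bounced or c.dwell_seconds < 10:
            score -= 0.5   # "bad click": the page was not useful
        elif c.dwell_seconds >= 60:
            score += 1.0   # "good click": the user likely found what they needed
        else:
            score += 0.3   # middling engagement
    return score

clicks = [Click(120, False), Click(5, True), Click(90, False)]
print(click_score(clicks))  # 1.5 -> positive: a candidate for a boost
```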
In the leaked documentation, NavBoost is mentioned 84 times. There is also evidence that its evaluation is applied at the subdomain, root domain, and URL level, which implies different treatment for different levels of a site.
If Forbes.com/Cats/ gets no clicks, it is marked “low quality,” and its links are ignored. But if Forbes.com/Dogs/ gets a large volume of clicks, it is marked “high quality.”
In short, based on the clicks a page gets, pages are divided into three categories, each with its own “quality rank,” and pages that are more popular by clicks contribute more to PageRank, i.e., their links are more valuable.
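As a toy illustration of that tiering, the sketch below buckets pages by click volume and weights their outgoing links accordingly. The cutoffs and weights are invented for the example; the real thresholds are not in the leak.

```python
# Invented cutoffs and weights to illustrate the three click-based quality
# tiers described above; the real thresholds are not in the leak.

def quality_tier(monthly_clicks: int) -> str:
    if monthly_clicks < 100:
        return "low"      # e.g. the Forbes.com/Cats/ case: its links are ignored
    elif monthly_clicks < 10_000:
        return "medium"
    return "high"         # popular pages pass more PageRank value

# A link's contribution weighted by the linking page's tier.
LINK_WEIGHT = {"low": 0.0, "medium": 1.0, "high": 2.0}

for page, clicks in [("Forbes.com/Cats/", 12), ("Forbes.com/Dogs/", 50_000)]:
    tier = quality_tier(clicks)
    print(page, tier, "link weight:", LINK_WEIGHT[tier])
```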
So yes, Google doesn’t openly talk about click tracking and data collection through Google Chrome, but there is now proof to the contrary: Google uses clicks and post-click behavior in its ranking algorithms.
“We don’t boost sites”
Another thing that came out: Google search maintains lists of sites that are deliberately boosted. Traffic is driven to them, and they sit at the top of the results. Such lists are known only for certain topics, such as the 2020 US elections and COVID, but there are clearly others.
COVID was a very controversial topic, and we still do not know exactly whether vaccination did more good or harm.
But what do people do when they need more information? They… Google it! And if Google showed every site, with opinions from both sides, “right” and “wrong” alike, it would provoke tension. So they take the easy way out and suppress one side in the search results.
Whether it’s right or not is more of an ethical question.
“There is no Sandbox”
For years, Google has insisted that the sandbox, where sites land because of their young age or lack of trust signals, does not exist. In case you don’t know, the sandbox is a filter that holds back young, newly created sites. Such a filter helps discard “junk” sites created purely for promotion or as intermediaries.
But good sites fall into this filter too. Say someone built a one-page site, unrelated to their main site, for a sales funnel, hoping to increase reach. And then, bam, it doesn’t even show up in search. How can that be, and what to do? Google said there was no such dumping ground! But it turns out there is.
“We don’t do manual work”
And the most interesting part: data from EWOK is used directly in search. EWOK is a system where real people are paid to evaluate which search result is better.
Apparently, there are users who, with their own eyes and opinions, determine which of several sites is better for a particular query.
So don’t underestimate how much it matters that quality raters perceive and rate your site well. It’s like impressing a concierge: either you go about your business, or you stand in the lobby waiting for someone from the 12th floor to come down for you.
And another revelation from the leak: Google weighs the size of a site’s brand based not only on the site itself but on mentions of it across the internet, even without links. In principle, this was obvious.
How to Attract Organic Traffic Now
So, what to do now?
If someone asks you how to attract traffic to a site, answer boldly; you won’t go wrong: create a noticeable, popular, well-recognized brand outside of Google.
E-E-A-T is actually not as important as SEO specialists think. As it turns out, practically the only place where E-E-A-T is explicitly needed is reviews on Google Maps. How relevant is that? Try to find even ten friends who use Google Maps to find a website.
The real aspects of E-E-A-T, not Google’s claims, are either hidden, indirect, or so deeply buried that they have little to do with specific elements of ranking and promotion systems.
When you have a brand, Google starts to perceive you as a distinct “entity,” not just content or a collection of links in one place. And then all the benefits of SEO open up to you. In case you didn’t know, Google defines an entity as “a thing or concept that is singular, unique, well-defined, and distinguishable.”
Building influence as a content author can indeed lift you higher in search. And a little browsing shows there are many powerful brands that rank very well without joining the rat race of boosting E-E-A-T metrics. Content and links actually take a back seat when there is clear user intent to find something. Suppose many people in the center of NY search for “Joe’s Burgers” and scroll through one, two, three pages of results until they find the café called Joe’s Burgers, then click through to its site. Pretty quickly, the search engine will understand that this is exactly what people in this area want from this query.
Even if the first result is a Wikipedia article about the very first Joe who created the first-ever burger, and it generates plenty of clicks and views, it is unlikely to outweigh the signals of users who want a burger in the center of NY.
If you extend this example to the broader web and search in general: if you can create demand for your site among enough potential searchers in the regions you target, you can bypass the need for classic SEO methods like links, anchor texts, optimized content, and so on.
NavBoost and user intent in a specific area are likely now the most powerful ranking factors in Google’s system.
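As a rough illustration of that claim, the sketch below blends classic relevance signals with a region-specific click-demand boost. The field names and weights are assumptions made for this sketch, not leaked values.

```python
# Illustrative only: blending a classic relevance score with a region-specific
# click-demand boost. The weights and field names are assumptions for this
# sketch, not leaked values.

def final_score(text_relevance: float, pagerank: float,
                regional_click_share: float) -> float:
    """regional_click_share: fraction of searchers in the target area who
    end up clicking this result for the query."""
    classic = 0.5 * text_relevance + 0.5 * pagerank
    navboost = 2.0 * regional_click_share  # dominant signal, per this reading of the leak
    return classic + navboost

# Wikipedia's burger-history page vs. the actual Joe's Burgers site
# for "joe's burgers" searched in central NY:
print(final_score(0.9, 0.95, 0.05))  # strong classic signals, little local demand -> 1.025
print(final_score(0.6, 0.30, 0.70))  # weaker classic signals, strong local demand -> 1.85
```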
Classic ranking factors, however, have not gone anywhere. Though PageRank, anchors, and text matching have been losing significance for many years, page titles are still very important. The leak hints that many versions of PageRank have been created and retired over the years.
In general, anyone who relies on classic SEO alone will likely end up with low profit, weak traffic, or even a loss. You need to build authority in the search engine, citations, navigational demand, and a strong reputation with your audience. In short, drive leads to your site by all means, fair and foul.
Conclusion
Well, to dispel any doubts: Google has confirmed the leak. This is undoubtedly the most significant leak about Google search in recent years. It turns out Google has often lied in its recommendations and statements about the search engine. It’s time to stop believing that “content is king” or that clickbait and bot farms are the way to success in SEO.
What to do now? We’ve already figured it out: build your brand, gain authority, be a quotable source, and a search target in specific areas. What’s your SEO strategy? Will you change it now?
We hope this article was helpful to you.