as Internal Live PR; Evidence and Findings

March 27 update

In Matt Cutts' latest post: http://www.mattcutts.com/blog/q-a-thread-march-27-2006/:

Q: Is the RK parameter turned off, or should we expect to see it again? A: I wouldn't expect to see the RK parameter have a non-zero value again.

Q: What's an RK parameter? A: It's a parameter that you could see in a Google toolbar query. Some people outside of Google had speculated that it was live PageRank, that PageRank differed between Bigdaddy and the older infrastructure, etc.

So my assumption that the values were set to zero because they were not supposed to be public seems to be valid.

Well, it was fun while it lasted.

March 25 update

Matt Cutts of Google read this article and also posted a funny comment on this post.

I was very curious and so of course I sent him an e-mail.

His answer:

I'm sorry, I can't shed light on that at this time. :)

Best wishes though, Matt

Matt, I hope you are OK with me posting this :). Anyway, here is my theory:

The RK "Internal Live PR" values were not supposed to be public, so Google has now set them all to 0.

/ Jim

March 24 update

Google has now set the RK value to zero for all URLs.

Either it is some temporary glitch OR Google didn't like that value being public ...

Let's wait and see.

/ Jim

Original Article

--------------------------

"PR prediction tools"

In May 2004 the checksum algorithm that Google uses to query a site's PageRank from its servers was cracked and released here.

This spread across the internet and the code was translated into PHP. People started to write scripts to get the PR value without a toolbar and even though Google changed their checksum algorithm it was shortly cracked again. Then the popular PageRank prediction tools started to emerge from different places.

None of these tools work reliably, as nobody fully understood what the underlying value is, not even the tool creators.

The discovery

In a WebmasterWorld thread of Feb 12, 2006 a member asked what the mysterious figures such as Rank_1:1:6 Rank_1:1:5 Rank_1:1:4 that are displayed in one of the "PR prediction" tools when clicking "check" actually meant.

The discussion starts in the thread and another member (arran) posts that if you drop &features=Rank from the URL you get an XML feed, visible here.

It was also found that you can add &start=0&num=10 at the end of the URL (protellix) to set the start position and the number of listings returned, and that you can replace Rank in the &features=Rank part of the URL with one of the following features (phish):

CacheSize
Filter
Hostname
Level
Link
Rank
Results
Summary
Title
URL
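How these URL parameters fit together can be sketched in a few lines of Python. This is only an illustration: the base URL is a made-up placeholder, the q parameter name is my assumption, and the checksum that the real toolbar query needs is deliberately left out.

```python
from urllib.parse import urlencode

# Feature names reported in the thread.
KNOWN_FEATURES = {
    "CacheSize", "Filter", "Hostname", "Level", "Link",
    "Rank", "Results", "Summary", "Title", "URL",
}

# Hypothetical placeholder, NOT the real endpoint or path.
BASE_URL = "http://dc.example.invalid/search"


def build_feed_url(query, start=0, num=10, features=None, base_url=BASE_URL):
    """Assemble a feed URL from the parameters discussed in the thread.

    The "q" parameter name is an assumption; start, num and features
    are the names quoted from the thread.
    """
    params = {"q": query, "start": start, "num": num}
    if features is not None:
        if features not in KNOWN_FEATURES:
            raise ValueError("unknown feature: " + features)
        params["features"] = features
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Leaving features out corresponds to dropping &features=Rank.
    print(build_feed_url("site:example.com", start=0, num=100))
    print(build_feed_url("site:example.com", features="Rank"))
```

Leaving features as None mirrors dropping &features=Rank from the URL, which is what returned the full XML feed.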

We realised that this XML feed is in fact a search query of the Google SERPs.

In message 75 of that thread, member Hanu posts a script that has the checksum decoder built in and will get the XML feed including any of the above features. I will not post this tool here as it violates Google's TOS.

Anyway, when using that tool:

uncheck info:
leave Features empty
set Start: to 0 and Num: to 100 and
type any query into Query: or a site: command of your site.
Optionally, put 64.233.179.104 into the Host: field in order to see the BigDaddy results.

That will display the XML feed of the SERP in a standard way with 100 results on a BigDaddy datacenter. Kudos to Hanu for this tool!

Don't use toolbarqueries.google.com, as Google uses DNS-based load balancing and you effectively get a random DC served to you (the same happens when you check SERPs on Google.com, btw). Use the IP address of a DC instead.

Explaining the SERP XML Feed

Let's look at the above XML example.

This is what the abbreviations mean according to Google XML Tag Definitions, an official Google document:

<gsp VER="3.2"> GSP = "Google Search Protocol", Provides an encapsulation for all data returned in the Google XML search results
<tm> - Total server time to return search results, measured in seconds.
<q> - The search query submitted to the Google search engine to generate these results.
<param> - The input parameters submitted to the Google search engine to generate these results.
<res SN="1" EN="10"> - In simple English, the start and finish of the SERP listings for the query, controlled by &start=0&num=10 in the URL.
<m> - The estimated total number of results for the search.
<fi /> - Indicates that document filtering was performed during this search. Note: See the section on Automatic Filtering for more details
<nb> - Navigation data for next and previous results
<pu> - Previous results
<nu> - Next results

<r N="1"> - Provides encapsulation for the details of an individual search result, in other words a SERP listing. N is the rank of the SERP listing; L says whether it is indented (2) or not (1). Note: Currently this value will always be 1 unless directory crowding occurs. In this case, the second directory result will have a value of 2. Read that link btw, it has interesting data on filtering and dup content.
<u> - The URL of the search result.
<ue> - No data found, same as URL.
<t> - The title of the search result.
<rk> - Provides a general rating of the relevance of the search result
<s> - Search result snippet for the search result.
<lang> - Language of the SERP listing.
<has> - Provides encapsulation for any special features supported for this search request. This is in other words the last line of the SERP listing. It has the URL, cache data, cache size, cache ID, supplemental results and related pages.
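
To make the tag list concrete, here is a small Python sketch that pulls the rank, URL, title and RK out of a feed. The sample feed is made up by me for illustration (it is not real Google output), and the nesting of <r> inside <res> is my assumption based on the order of the tags above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, written by hand using the tags described above.
SAMPLE_FEED = """<gsp VER="3.2">
  <tm>0.05</tm>
  <q>example query</q>
  <res SN="1" EN="2">
    <m>2</m>
    <r N="1">
      <u>http://www.example.com/</u>
      <t>Example title</t>
      <rk>6</rk>
      <s>Example snippet</s>
      <lang>en</lang>
    </r>
    <r N="2">
      <u>http://www.example.org/page</u>
      <t>Another title</t>
      <rk>3</rk>
      <s>Another snippet</s>
      <lang>en</lang>
    </r>
  </res>
</gsp>"""


def extract_listings(feed_xml):
    """Return one dict per <r> element: SERP rank, URL, title and RK."""
    root = ET.fromstring(feed_xml)
    listings = []
    for r in root.iter("r"):
        rk_text = r.findtext("rk")
        listings.append({
            "rank": int(r.get("N")),
            "url": r.findtext("u"),
            "title": r.findtext("t"),
            # RK may be absent on a listing; keep None in that case.
            "rk": int(rk_text) if rk_text is not None else None,
        })
    return listings


if __name__ == "__main__":
    for listing in extract_listings(SAMPLE_FEED):
        print(listing["rank"], listing["rk"], listing["url"])
```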

It is the <rk> here that is interesting. Some recent "PR prediction tools" or "Live PageRank tools" have been using this value.

Looking at various SEO forums now during the PR update, we see that these kinds of tools are the most liked and the most accurate, though not 100% accurate: they usually show a value 1 higher. I found the reason.

Observations and tests of RK to see if it is Live PR

Looking at Matt Cutts' different blog posts with tools that show the PR value across multiple datacenters, we see that the PR now being exported to the toolbars across 40+ DCs dates from Feb 4-7, 2006, which is around 20 days ago.

I then used a tool that shows the RK values for a URL across 40+ DCs on the more recent blog posts by Cutts, and I found some incredible things.

Matt Cutts' post from Feb 17 shows RK 3 on the DCs that have found it and nothing on the rest, see the results from the tool here. A post from Feb 15 shows RK 6 on half of the DCs and RK 3 on the other half, see it yourself.

This means that the RK values are updated regularly.

And now this:

On the DCs where the RK of the blog post was 6, he ranked higher in the SERPs than on the DCs where it was 3!

Example 1:

Difference in RK: 6 compared to 3
Name of post: "WSJ on SEO contests"
Date blogged: Feb 15
Query used: WSJ SEO contest
Difference in rank: 3 compared to 5

Example 2:

Difference in RK: 7 compared to 6
Name of post: "Road trip: Ask Jeeves in Campbell"
Date blogged: Feb 13
Query used: ask jeeves Albuquerque
Difference in rank: 3 compared to 5
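
The two examples above can be written down as data and checked with a few lines of Python. This is just a sanity check of the pattern, not proof of anything. I assume here that the first RK value in each example goes with the first rank value (RK 6 with rank 3, and so on), and that a lower rank number means a better position.

```python
# The two observations from the examples above.
# Each tuple is (DC group A, DC group B); the pairing order is an assumption.
observations = [
    {
        "post": "WSJ on SEO contests",
        "query": "WSJ SEO contest",
        "rk": (6, 3),
        "rank": (3, 5),
    },
    {
        "post": "Road trip: Ask Jeeves in Campbell",
        "query": "ask jeeves Albuquerque",
        "rk": (7, 6),
        "rank": (3, 5),
    },
]


def higher_rk_ranks_better(obs):
    """True if the DC group with the higher RK also has the better rank."""
    rk_a, rk_b = obs["rk"]
    rank_a, rank_b = obs["rank"]
    # A lower rank number is a better SERP position.
    return (rk_a > rk_b) == (rank_a < rank_b)


if __name__ == "__main__":
    for obs in observations:
        print(obs["post"], higher_rk_ranks_better(obs))
```

Two data points are of course very few, so this only shows that the examples are consistent with each other.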

Why do older blog posts by Cutts have a higher RK? Most probably, Googlebot has not yet found the many RSS feeds and links that point to his newer posts.

I have also seen the RK values changing by the day, and according to the person who made the LivePR tool, he has been seeing RK values increase on the same day that Google caches new backlinks on that particular datacenter!

If you do search queries in the tool that Hanu provided you will see that the RK value is static and the same - no matter which query you use.
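
If you want to check this yourself on feed data you have collected, a sketch like the following would do it. The rows below are made-up sample data, not real results, and the function name is my own.

```python
from collections import defaultdict


def urls_with_varying_rk(rows):
    """Return the URLs whose RK differs between queries.

    rows is an iterable of (query, url, rk) tuples taken from feeds
    fetched on the same datacenter.
    """
    seen = defaultdict(set)
    for _query, url, rk in rows:
        seen[url].add(rk)
    return sorted(url for url, values in seen.items() if len(values) > 1)


if __name__ == "__main__":
    # Hypothetical sample rows for illustration only.
    sample_rows = [
        ("query one", "http://www.example.com/", 6),
        ("query two", "http://www.example.com/", 6),
        ("query one", "http://www.example.org/page", 3),
        ("query three", "http://www.example.org/page", 3),
    ]
    # An empty list means RK was the same for every URL across queries.
    print(urls_with_varying_rk(sample_rows))
```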

BigDaddy and non-BigDaddy RK values

The recent PR tools that use RK query the BigDaddy datacenters, and those values are usually 1-2 higher than the toolbar PR. Using this tool we also see that the RK value on the BigDaddy datacenters differs from the value on the rest of the datacenters. The reason has to do with the new infrastructure, but I don't know exactly what.

I will update here later when I find out.

Difference of toolbar RK and SERP RK

I found something else very interesting.

Look at this article explaining how the PageRank toolbar works. After information is sent to a Google server, data in the form of an XML document with data about the URL is sent back.

There are values and info on a lot of things, and guess what the field holding the PR value is called?

<RK>

The same name as in the SERP XML, but in the SERP XML the RK is not the toolbar PR; it just has the same name.

Google definition of RK

Let's look at the definition of <rk> from Google.

From their official "Google XML Reference":

"Provides a general rating of the relevance of the search result"

Where does this come from? It seems totally wacko, and yes, it is a mistake.

First of all, it is the XML document for the Google Search Appliance, not the general search API.

I found a very interesting document from Google called: "Google's Search Results Protocols", hosted by some guy that mirrors controversial and important documents "that is in danger of censorship".

And there it says:

Definition of RK: "Google's rating of how good a single search result is"

But check this:

In that same document it defines what is a "single search result".

And it says:

"R - A single search result - Contains a U; an optional T; an RK; any number of F's; an optional S; and a HAS"

That is the SERP XML!

Every SERP listing in the XML starts with an <R>.

The old definition of R as per that same document is:

"A single search result"

The new definition from Google XML Tag Definitions is:

"Provides encapsulation for the details of an individual search result"

So the guy who wrote the new version of this document, now called "Google XML Reference" and earlier called "Google's Search Results Protocols", translated RK:

From:

"Google's rating of how good a single search result is"

To:

"Provides a general rating of the relevance of the search result"

Which is totally wrong; the person didn't see that there was a special definition for "single search result".

And this has caused headaches for SEOs ever since ...

A "single search result" is meant to be a listing in the SERP.

Which means that RK is:

"Google's rating of how good a listing in the SERP is"

Which is: PageRank!

To further prove the point:

Old version:

U - The URL of a single search result
T - The title of a single search result
RK - Google's rating of how good a single search result is

New version:

U - The URL of the search result
T - The title of the search result
RK - Provides a general rating of the relevance of the search result

The RK is a static value and has nothing to do with relevance, check yourself.

What does RK stand for?

My theory:

It is Rank. Why?

The RK values show up on "&features=Rank".

PR is not an official Google abbreviation, it is something SEOs made up. They have PageRank and use the Rank part of the word, simple.

Checking the example above of Cutts' blog posts shows that the DCs that have a higher RK have a higher rank as well.

What we now know

There are 3 kinds of values.

Toolbar PR

BigDaddy RK values

Non-BigDaddy RK values

I now believe RK is the internal PR; if it is not, it must be something that is very close to it.

What we now can do with this if RK is live internal PR

See the live internal PR instantly for any URL, there is already a FF plugin with a toolbar for RK here (but a few things on it need to be fixed).
Track RK values to see what happens to backlinks on an almost daily basis.
See dates when particular DCs are updating.
Make some excellent tools.

And more

Questions remaining

If RK is internal live PR - why no decimals on the numbers? Low or high PR 8 is a huge difference.

Update of this article

I will update this article regularly as soon as I find more evidence and findings. Please also comment so I get more info.

The article is discussed on cre8asite forums here.

And the original discovery thread on WMW from message 151, here.

I think this is very interesting.

EDIT: Nice forum post by Matt Cutts :)