Matt Cutts on BigDaddy, linking, crawling etc

Matt Cutts just recently wrote a major post. In it he explains a great deal about the new BigDaddy infrastructure, the new rules for crawling, and how the relevancy of links has now become a lot more important.

I will quote below what I personally find very interesting from his post and his comments.

- After looking at the example sites, I could tell the issue in a few minutes. The sites that fit “no pages in Bigdaddy” criteria were sites where our algorithms had very low trust in the inlinks or the outlinks of that site. Examples that might cause that include excessive reciprocal links, linking to spammy neighborhoods on the web, or link buying/selling. The Bigdaddy update is independent of our supplemental results, so when Bigdaddy didn’t select pages from a site, that would expose more supplemental results for a site.

Considering the amount of code that changed, I consider Bigdaddy pretty successful in that I only saw two complaints. The first was one that I mentioned, where we didn’t index pages from sites with less trusted links, and we responded and started indexing more pages from those sites pretty quickly.

- Okay, let’s check one from May 11th. The owner sent only a url, with no text or explanation at all, but let’s tackle it. This is also a real estate site, this time about an Eastern European country. I see 387 pages indexed currently. Aha, checking out the bottom of the page, I see this:

[Image of low quality footer links]

Linking to a free ringtones site, an SEO contest, and an Omega 3 fish oil site? I think I’ve found your problem. I’d think about the quality of your links if you’d prefer to have more pages crawled. As these indexing changes have rolled out, we’ve been improving how we handle reciprocal link exchanges and link buying/selling.

Some folks that were doing a lot of reciprocal links might see less crawling. If your site has very few links where you’d be on the fringe of the crawl, then it’s relatively normal that changes in the crawl may change how much of your site we crawl. And if you’ve got an affiliate site, it makes sense to think about the amount of value-add that your site provides; you want to provide a reason why users would prefer your site.

Quotes from his comments:

With Bigdaddy, it’s expected behavior that we’ll crawl some more pages than we index. That’s done so that we can improve our crawling and indexing over time, and it doesn’t mean that we don’t like your site.

arubicus, typically the depth of the directory doesn’t make any difference for us; PageRank is a much larger factor. So without knowing your site, I’d look at trying to make sure that your site is using your PageRank well. A tree structure with a certain fanout at each level is usually a good way of doing it.
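To make the fanout remark concrete, here is a small sketch of my own (it is not from Matt's post and says nothing about how Google actually crawls). It assumes an idealized site where every page links to the same number of new child pages, and simply counts how many pages sit within a given number of clicks of the home page. The function names `pages_within_depth` and `clicks_needed` are made up for this illustration.

```python
def pages_within_depth(fanout: int, depth: int) -> int:
    """Count pages reachable within `depth` clicks of the home page,
    assuming every page links to exactly `fanout` new child pages."""
    if fanout < 1 or depth < 0:
        raise ValueError("fanout must be >= 1 and depth must be >= 0")
    # Level 0 is the home page, level 1 its children, and so on.
    return sum(fanout ** level for level in range(depth + 1))


def clicks_needed(total_pages: int, fanout: int) -> int:
    """Smallest click depth at which such a tree can hold `total_pages`."""
    depth = 0
    while pages_within_depth(fanout, depth) < total_pages:
        depth += 1
    return depth


if __name__ == "__main__":
    # A 1,000-page site with 10 links per page fits within 3 clicks;
    # with only 2 links per page the deepest pages are 9 clicks away.
    print(clicks_needed(1000, 10))
    print(clicks_needed(1000, 2))
```

Under these assumptions a wider fanout keeps every page only a few clicks from the home page, which is one plausible reading of why a tree structure uses a site's PageRank well.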

BTW, CrankyDave, your site seems like an example of one of those sites that might have been crawled more before because of link exchanges. I picked five at random and they were all just traded links. Google is less likely to give those links as much weight now. That’s the simple explanation for why we don’t crawl you as deeply, in my opinion.

To the degree that search engines reflect reputation on the web, the best way to gather links is to offer services or information that attract visitors and links on your own. Things like blogs are a great way to attract links because you’re offering a look behind the curtain of whatever your subject is, for example.

On the other hand, don’t expect that just listing a sitemap is enough to get a domain crawled. If no one ever links to your site, that makes Googlebot less likely to crawl your pages.

graywolf, it’s true that if you had N backlinks and some fraction of those are considered lower quality, we’d crawl your site less than if all N were fantastic. Hope that makes sense. Light crawling can also mean “we just didn’t see many links to your domain” as well though.

There’s SEO and there’s QUALITY and there’s also finding the hook or angle that captivates a visitor and gets word-of-mouth or return visits. First I’d work on QUALITY. Then there’s factual SEO. Things like: are all of my pages reachable with a text browser from a root page without going through exotic stuff. Or having a site map on your site. After your site is crawlable, then I’d work on the HOOK that makes your site interesting/useful.
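The "reachable from a root page" check can be sketched in a few lines. This is my own rough illustration, not a tool Matt mentions: it assumes you already have your internal link structure as a dictionary mapping each page to the plain text links found on it. The `site` data and the function name `unreachable_pages` are hypothetical.

```python
from collections import deque


def unreachable_pages(links: dict[str, list[str]], root: str) -> set[str]:
    """Return the pages in `links` that cannot be reached from `root`
    by following plain links (breadth-first search)."""
    seen = {root}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # Ignore links to pages outside the known site.
            if target in links and target not in seen:
                seen.add(target)
                queue.append(target)
    return set(links) - seen


if __name__ == "__main__":
    # Hypothetical site: "/orphan" links out but nothing links to it.
    site = {
        "/": ["/about", "/listings"],
        "/about": ["/"],
        "/listings": ["/listings/1"],
        "/listings/1": [],
        "/orphan": ["/"],
    }
    print(unreachable_pages(site, "/"))
```

Any page this reports is one a crawler starting at the home page would have no internal path to, so it would need a link from somewhere else to be found.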

BTW, just because Yahoo reports nofollow links in the Site Explorer, I wouldn’t assume that those links are counting for Yahoo!Rank (or whatever you want to call it).

For example, if you don’t have as much PageRank relative to other sites, you may see fewer pages crawled/indexed in our main results; that would often be visible by having more supplemental results listed.

“Sounds like Google is now actually penalizing for poor quality inbound links.” Mike, that isn’t what’s happening in the examples that I mentioned. It’s just that those links aren’t helping the site.

David Burdon, no, off-topic links wouldn’t cause a penalty by themselves. Now if the off-topic links are spammy, that could cause a problem. But if a hardware company links to a software package, that’s often a good link even though some people might think of the link as off-topic.

To go to your other question. I wouldn’t be thinking in terms of “if I link to Yahoo/Google/ODP/whatever, I’ll get some cred because those sites are good.” If it’s natural and good to link to a particular site because it would help your users, I’d do it. But I wouldn’t expect to get a lot of benefit from linking to a bunch of high-PageRank sites.

Jack Mitchell, you said “Saying you can’t do reciprocal linking is just sheer idiocy. How does Google expect you to get back links?” I’m not saying not to do reciprocal links. I only said that in the cases that I checked out, some of the sites were probably being crawled less because those reciprocal links weren’t counting as much. As far as how to get back links, things like offering tools (robots.txt checkers), information (newsletters, blogs), services, or interesting hooks (e.g. seobuzzbox doing interviews) can really jumpstart links. Building up a reputation with a community helps (doing forums on your own site or participating in other forums can help). As far as hooks, I’d study things like digg, slashdot, reddit, techmeme, tailrank to get an idea of what captures people’s attention. For example, contests and controversy attract links, but can be overused. That would be my quick take.

Spam Reporter, a lot of the time we’ll give a relatively short penalty (e.g. 30 days) for the first instance of hidden text. You might submit again because sometimes we’ll decide to take stronger action.


4 responses
  1. Nor Cal Cars said on May 23, 2006 at 9:37 pm

    Interesting blog. Seems this will likely hurt small regionalized sites that by nature will not have earth-shattering new content that will get the attention of the search engines, even though it may be very relevant to people interested in the business.

    One example is a landscaping supply site I’m working on. Not likely to have content of interest world wide but it is of interest to the people in the region it’s located. Recip links are pretty much the only reliable way to get links.

  2. Aaron Pratt said on May 25, 2006 at 3:07 pm

    Jim - If you visit google groups you will see hundreds of people who either:

    1.) Have no backlinks and are on lame free hosted subdomains.
    2.) People who took part in reciprocal link trading.

    All have the same complaint, “our site was dropped from the index”!

    Of course there are Google’s admitted issues at the same time, so everyone is a little confused.

    Thank God I didn’t ever do much link trading but at the same time I also didn’t do much link building and that still seems to be needed.

  3. Linda Lynch said on July 8, 2006 at 10:25 am

    Greetings
    In my view, Google’s rules of the road will result in a huge loss to both middle-to-small businesses on the web AND search engine users. While I truly appreciate Matt’s advice and insight, we have to get down to the grass roots, the backbone of the web, and look long and hard. Blogs and RSS, forums et al. are waaaaaaay beyond the capability of many who own sites. The knowledge in the biz community about SEO is just coming into its own. Another trend has been quietly gathering speed: people searching are now skipping the first page!! Why? Because, they tell me, they figure that these sites have ‘paid’ for their ranking or have used ‘high tech’ tricks to get there; they want the ‘real deal’! Google should do some surveys of medium and small businesses on the web and see that the majority are not web savvy, are struggling to find an IT budget, and are just happy to have a web presence. They have great products and services and have made a significant (to them) investment to get on the web, and there just isn’t any money or people to do the Google website ‘tweak’. The fact is that many whose sites were created before the Google rules of the road happened don’t have budgets for redesigns, etc. Another problem is folks who maybe design their own with minimal skill and have no idea about this ‘stuff’. Personally, I think Google should have two search options “top guns”, Newbies-…………..
    Cheers!
