Cloaking

MATT CUTTS: Hi, everybody. It’s Matt Cutts. And we’re back to talk a little bit about cloaking today. A lot of people have questions about cloaking. What exactly is it? How does Google define it? Why is it high risk behavior? All those sorts of things. And there’s a lot of HTML documentation. We’ve done a lot of blog posts. But I wanted to sort of do the definitive cloaking video, and answer some of those questions, and give people a few rules of thumb to make sure that you’re not in a high risk area.

So first off, what
is cloaking? Cloaking is essentially showing different content to users than to Googlebot. So imagine that you have a web server right here. And a user comes and asks for a page. So here’s your user. You give him some sort of page. Everybody’s happy. And now, let’s have Googlebot come and ask for a page as well. And you give Googlebot a page. Now in the vast majority of situations, the same content goes to Googlebot and to users. Everybody’s happy.

Cloaking is when you show different content to users than to Googlebot. And it’s definitely high risk. That’s a violation of our quality guidelines. If you do a search for quality guidelines on Google, you’ll find a list of all the stuff, a lot of auxiliary documentation about how to find out whether you’re in a high risk area. But let’s just talk through this a little bit.

Why do we consider cloaking bad, or why does Google not like cloaking? Well, the answer is sort of in the ancient days of search engines, when you’d see a lot of people do really deceptive or misleading things with cloaking. So for example, when Googlebot came, the web server that was cloaking might return a page all about cartoons: Disney cartoons, whatever. But when a user came and visited the page, the web server might return something like porn. And so if you do a search for Disney cartoons on Google, you’d get a page that looked like it would be about cartoons, you’d click on it, and then you’d get porn. That’s a hugely bad experience. People complain about it. It’s an awful experience for users.

So we say that all types of cloaking are against our quality guidelines. So there’s no such thing as white hat cloaking. Certainly, when somebody’s doing something especially deceptive or misleading, that’s when we care the most. That’s when the web spam team really gets involved. But any type of cloaking is against our guidelines.

OK. So what are some rules of thumb
to sort of save you the trouble or help you stay out of a high risk area? One way to think about cloaking is, almost take the page, like you Wget it or you cURL it. You somehow fetch it, and you take a hash of that page. So take all the different content and boil it down to one number. And then you pretend to be Googlebot, with a Googlebot user agent. We even have a Fetch as Googlebot feature in Google Webmaster Tools. So you fetch a page as Googlebot, and you hash that page as well. And if those numbers are different, then that could be a little bit tricky. That could be something where you might be in a high risk area. Now pages can be dynamic. You might have things like timestamps, the ads might change, so it’s not a hard and fast rule.

Another simple heuristic to keep in mind is if you were to look through the code of your web server, would you find something that deliberately checks for a user agent of Googlebot specifically or Googlebot’s IP address specifically? Because if you’re doing something very different, or special, or unusual for Googlebot (either its user agent or its IP address), that’s the potential to maybe be showing different content to Googlebot than to users. And that’s the stuff that’s high risk. So keep those kinds of things in mind.

Now one question we get from a
lot of people who are white hat, and don’t want to be involved in cloaking in any way, and want to make sure that they steer clear of high risk areas, are what about geolocation and mobile user agents, so phones and that sort of thing. And the good news, the executive sort of summary, is that you don’t really need to worry about that. But let’s talk through exactly why geolocation and handling mobile phones is not cloaking.

OK. So until now, we’ve had one user. Now let’s go ahead and say this user is coming from France. And let’s have a completely different user, and let’s say maybe they’re coming from the United Kingdom. In an ideal world, if you have your content available on a .fr domain, or .uk domain, or in different languages, because you’ve gone through the work to translate them, it’s really, really helpful if someone coming from a French IP address gets their content in French. They’re going to be much happier about that.

So what geolocation does is whenever a request comes in to the web server, you look at the IP address and you say, ah, this is a French IP address. I’m going to send them the French language version or send them to .fr version of my domain. If someone comes in and their browser language is English, or their IP address is something from America or Canada, something like that, then you say, aha, English is probably the best message, unless they’re coming from the French part of Canada, of course. So what that is doing is you’re making the decision based on the IP address.

As long as you’re not making some specific country that Googlebot belongs to (Googlandia or something like that), then you’re not doing something special or different for Googlebot. At least currently, when we’re making this video, Googlebot crawls from the United States. And so you would treat Googlebot just like a visitor from the United States. You’d serve up content in English. And we typically recommend that you treat Googlebot just like a regular desktop browser, so Internet Explorer 7 or whatever a very common desktop browser is for your particular site.

So geolocation (that is, looking at the IP address and reacting to that) is totally fine, as long as you’re not reacting specifically to the IP address of just Googlebot, just that very narrow range. Instead, you’re looking at OK, what’s the best user experience overall depending on the IP address?

In the same way, if someone
now comes in, and let’s say that they’re coming in from a mobile phone, so they’re accessing it via an iPhone or an Android phone. And you can figure out OK, that is a completely different user agent. It’s got completely different capabilities. It’s totally fine to respond to that user agent and give them a more squeezed version of the website or something that fits better on a smaller screen. Again, the difference is if you’re treating Googlebot like a desktop user (so that user agent doesn’t have anything special or different that you’re doing), then you should be in perfectly fine shape. So you’re looking at the capabilities of the mobile phone, you’re returning an appropriately customized page, but you’re not trying to do anything deceptive or misleading. You’re not treating Googlebot really differently, based on its user agent. And you should be fine there.

So the one last thing I want
to mention, and this is a little bit of a power user kind of thing, is some people are like, OK, I won’t make the distinction based on the exact user agent string or the exact IP address range that Googlebot comes from, but maybe I’ll say check for cookies. And if somebody doesn’t respond to cookies or if they don’t treat JavaScript the same way, then I’ll carve out and I’ll treat that differently. And the litmus test there is are you basically using that as an excuse to try to find a way to treat Googlebot differently or try to find some way to segment Googlebot and make it do a completely different thing?

So again the instinct behind cloaking is are you treating users the same way as you’re treating Googlebot? We want to score and return roughly the same page that the user is going to see. So we want the end user experience when they click on a Google result to be the same as if they’d just come to the page themselves. So that’s why you shouldn’t treat Googlebot differently. That’s why cloaking is a bad experience, why it violates our quality guidelines. And that’s why we do pay attention to it. There’s no such thing as white hat cloaking. We really do want to make sure that the page the user sees is the same page that Googlebot saw.

OK, so I hope that kind of helps. I hope that explains a little bit about cloaking, some simple rules of thumb. And again, if you get nothing else from this video, basically ask yourself, do I have special code that looks exactly for the user agent Googlebot or the exact IP address of Googlebot and treat it differently somehow? If you treat it just like everybody else (so you send it based on geolocation, you look at the user agent for phones), that sort of thing is fine. It’s just if you’re looking for Googlebot specifically, and you’re doing something different, that’s where you start to get into a high risk area.

We’ve got more documentation on our website. So we’ll probably have links to that, if you look at the metadata for this video. But I hope that explains a little bit about why we feel the way we do about cloaking, why we take it seriously, and how we look at the overall effect in trying to decide whether something is cloaking. The end user effect is what we’re ultimately looking at. And so regardless of what your code is, if something is served up that’s radically different to Googlebot than to users, that’s something that we’re probably going to be concerned about. Hope that helps.
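
The hash rule of thumb from the video (fetch the page as a regular visitor, fetch it again with a Googlebot user agent, and compare the two hashes) might look roughly like the Python sketch below. It is only an illustration under stated assumptions: the two user-agent strings are example values, the function names are made up for this sketch, and SHA-256 is just one reasonable choice of hash. Sending a Googlebot user agent from your own machine only exercises user-agent checks, not checks on Googlebot's IP address.

```python
import hashlib
import urllib.request

# Example user-agent strings, assumed for illustration only.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch(url: str, user_agent: str, timeout: float = 10.0) -> bytes:
    """Fetch the raw response body while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def page_hash(body: bytes) -> str:
    """Boil the whole page down to one number (here, a SHA-256 hex digest)."""
    return hashlib.sha256(body).hexdigest()


def same_content(body_for_user: bytes, body_for_bot: bytes) -> bool:
    """Return True when both fetches produced byte-for-byte identical pages."""
    return page_hash(body_for_user) == page_hash(body_for_bot)


def check_url(url: str) -> bool:
    """Fetch the URL twice, once per user agent, and compare the hashes."""
    return same_content(fetch(url, BROWSER_UA), fetch(url, GOOGLEBOT_UA))
```

Calling `check_url` on one of your own pages returns `False` whenever the two responses differ at all. As noted in the video, timestamps and rotating ads can change a dynamic page between fetches, so a mismatch is a prompt to look closer rather than proof of cloaking.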

60 Comments

  • Dan Fabulich

    It's really disappointing that this video doesn't mention Google's "First Click Free" program (which requires you to check for the useragent "Googlebot") or Google's AJAX Crawling specification, which lets you return HTMLified JavaScript just for Googlebot.

  • dicrox

    Hi, if Googlebot crawls only from the US, how can it crawl a Spanish website that redirects to the English version? In this case, is the Spanish website kind of invisible, or what?

  • dicrox

    @adithecool hmm, then the user will be redirected again when they click on the Spanish site. Do you mean to disable the redirection if the user comes through the link and not redirected if is coming directly? It's a bit weird, no?

  • lemannequin

    To clarify (and because there wasn't space for more characters).

    Most users (those with JS enabled or a particular user agent) will get the basic page that will get some enhancements during loading/rendering time.
    Other visitors (those with JS disabled, or GoogleBot) will get the basic page, with the relevant/expected content, without all the other extra elements (navigation, links to related content, etc)

  • Keith Carberry

    so, it seems to me that a short url service is not good either? especially if it comes with a frame? I think I may have missed something here… I need to study this a bit more… very interesting though thank you!

  • Carrie Eller

    Hi Matt, the last comment you made regarding the litmus test for "power user kind of thing" did not answer the question fully if you would be in a high risk area. Our UX team often has questions about testing and having navigation hidden on certain pages through JS when a page loads. This would not be attempting to Spam Google or influence rankings, but to A/B test user interaction w/in our site limitations. Would this still be considered cloaking and therefore bad?

  • Sadie-Michaela Harris

    Hello… very helpful, thanks
    Slightly off topic: what's the Google thinking on the use of stealth forwarding please? Sadie

  • rebelseo

    Matt, on the Google developer's page on "Video Search – Webmaster EDU" the description for both contentURL and embedURL suggest the following: "Best practice: Ensure that only Googlebot accesses your content by using a reverse DNS lookup."
    Doesn't that conflict with what you have told us here about NOT carving out specific content for Googlebot? Or are these meta properties a different story?
    How would one go about only showing that meta tag to Googlebot?

  • Kevin Cox

    If I have a version of my site for text-only browsers (Lynx, w3m) that I serve based on user-agent, is it alright if I add the GoogleBot UA to that list? Or do you always want the desktop version? It seems silly that I have to render it in something like phantomjs when the text only page has the same content just presented differently.

  • Matthew Marr

    I'm here for the story about the paper clip. Where's the god damn paper clip? I think we're talking about 2 different types of cloaking…

  • Olvin Abarca

    The way I understand it, is that you should have a link to the Spanish version from the English version of your site so that Googlebot can visit it later

  • Will Farley

    SEO is retarded. The only way to get on top of search results is to bribe a starving google engineer.
    In every one of his videos, Matt (sarcastically) concludes; time spent on UI will have a better ROI than SEO voodoo.

  • hab1b1

    Since Googlebot crawls from the US. you'd be treating googlebot and the user the SAME way: you would be redirecting them to the english version. As long as they are the SAME experience, it isn't cloaking

  • Wayne Phillips

    Matt,

    In the not too distant future, I'm hoping to get into internet marketing in a big way. As such, I've found this video to be quite useful and informative. (Actually, that applies to all of your videos I've viewed so far.)

    It goes without saying that I've both "liked" this video and subscribed to your channel!

    Keep 'em comin'!

  • svommams566

    You can set it up such that / on your domain redirects, but /index.html does not redirect. Then you can add links allowing the users to choose a specific version of the website, and have those link to the relevant site's index.html file.

  • svommams566

    I have a page which rendered incorrectly in the Google preview of the page because of the way Googlebot spent several days crawling some dynamic content, which a browser would download in a matter of seconds.

    In the end I made a special case for Googlebot in order to get it to see something more like the user would see. Is that considered cloaking?

  • AJ Mihalic

    Why no mention of multivariate & A/B testing? Based on the definition of "cloaking" above, then any testing program like "Adobe Test & Target" will be considered cloaking, as they intentionally exclude Google from their tests.

  • David Little

    This doesn't pertain to Google, but I find it annoying that Canada, which spells like the U.K., does not have an option in which it appears you're from the U.K., not Canada, or the other choice is to appear like an American. Either way, Canadians appear to be from somewhere else and the equivalent of Googlebot is given false info.

  • Dharmender Singh Dehiya

    But what about Cloaking Affiliate Links ??
    If a user lands on the specific page he was looking for, likes the product or service on that page, and then clicks a link on that page, the user lands on another page with relevant content and makes a purchase.

    If any one can answer it ?? Please help.

  • mark goat

    Hello sir,
    You did not explain one particular thing: What if our website is purely ajax based with no graceful degradation, and when we detect a Googlebot we serve it an HTML page which is equivalent to what a user would see after the ajax calls have loaded his page?

    Specifically, do things like "Seoserver" fall into the "Cloaking" category?

    Seoserver link: https://github.com/thomasdavis/seoserver

    "Seo Server is a command line tool that runs a server that allows GoogleBot(and any other crawlers) to crawl your heavily Javascript built websites. Seo Server runs PhantomJs(headless webkit browser) which renders the page fully and returns the fully executed code to GoogleBot."

  • Narasimham Krovvidi

    To cover up what isn't working and divert from the real issues, these useless and unnecessary controversies are brought to the forefront. Otherwise, what purpose is served and what gain is made by these rumormongers?

  • Miss Pixels

    Does it count as cloaking if you just got rid of an index of a search result that you cannot delete online e.g. a comment from a site that never allows you to delete any comment you post? Or is it still treating both users and googlebots differently?

  • Rob Paul

    6 years later and google is still indexing cloaked spam pages and actually removing the original page from their index…

    Shameful…

  • Cycling Around The World

    Thank you so much for the info. When I searched for cloaking online, it looked too complicated to understand. After watching for one minute, it is so easy to understand! Awesome!

  • FMGM1PCE7L

    What if I send you a bit different version just to make it easier to parse? I'm not gonna put different text, images, links. How to promote a website the best way? I'm not gonna trick your bot to put my website higher.

  • Solomon Ucko

    What about the Accept-Language header? Not everyone in a given country necessarily speaks that country's language very well. Tourists are a good example. IP address is a decent backup if there is no Accept-Language header or it contains no available languages.
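
The geolocation approach described in the video, combined with the Accept-Language point raised in the comment above, could be sketched along these lines in Python. This is a simplified illustration, not a definitive implementation: the language set, the country table, and the function names are hypothetical, the header parsing handles only the common `tag;q=value` form, and the country code is assumed to come from some IP lookup that is not shown. The function takes no user-agent argument, so a crawler arriving from a US IP address gets exactly what any other US visitor gets.

```python
from typing import List, Optional

# Hypothetical site configuration, for illustration only.
AVAILABLE_LANGUAGES = {"en", "fr"}
COUNTRY_DEFAULTS = {"FR": "fr", "GB": "en", "US": "en"}


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags in an Accept-Language header, most preferred first."""
    weighted = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        if not tag or tag == "*":
            continue
        quality = 1.0  # a tag without an explicit q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            weighted.append((quality, tag))
    # The sort is stable, so equally weighted tags keep their header order.
    weighted.sort(key=lambda item: item[0], reverse=True)
    return [tag for _, tag in weighted]


def choose_language(accept_language: Optional[str],
                    country_code: Optional[str],
                    default: str = "en") -> str:
    """Pick a language from the browser's preferences, then fall back to the IP country."""
    for tag in parse_accept_language(accept_language or ""):
        primary = tag.split("-")[0]  # "fr-ca" -> "fr"
        if primary in AVAILABLE_LANGUAGES:
            return primary
    return COUNTRY_DEFAULTS.get((country_code or "").upper(), default)
```

For example, `choose_language("fr-CA,fr;q=0.9,en;q=0.8", "US")` picks French for a French-speaking visitor on a US IP address, while a request with no Accept-Language header from a French IP address falls back to the country table.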
