English Google Webmaster Central office-hours hangout on best practices

JOHN MUELLER: Welcome everyone
to today’s Google Webmaster Central office-hours Hangout. My name is John Mueller. I am a webmaster trends analyst
here at Google Switzerland, and part of what I do is I
talk to webmasters like you all to make sure that
you have information you need to make
fantastic websites and so that our engineers
have information from you guys to see what we need to do to
improve our search results. So one thing we
talked about, I think in one of the
previous Hangouts was to do some more general
information, some themes maybe. I compiled a short list of
best practices and myths that I’d love to share with
you guys just start off with, and after that, we’ll
start off with the Q and A. There are lots of questions
submitted already, and feel free to submit
more as we go along. All right. Let’s see if I can switch
over to the presentation here. So I guess we can start
off with something basic, essentially something you would assume: one of the things we love to see is that we can actually look at the content, for example. So we recommend using fast and reusable technologies as a way to present your content so that it works across all
different types of devices. So Googlebot can
render pages now so JavaScript isn’t something
you need to avoid completely, but I’d still make sure that
it can actually view it, so check on a smartphone,
check with the Fetch as Googlebot feature
in Webmaster Tools, check with the
webpagetest.org tool to see that it’s actually
loading in a reasonable time. Responsive Web
Design is a great way to create one
website that you can reuse across a
number of devices. So if you don’t have anything
specific for mobile yet, that might be
something to look into.
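As a rough sketch of what the responsive approach can look like in practice, a page declares a viewport and adapts its layout with CSS media queries; the breakpoint below is just an assumption for illustration:

    <meta name="viewport" content="width=device-width, initial-scale=1">

    <style>
      /* One column by default (small screens) */
      .content { width: 100%; }
      /* Constrain and center the main column on wider screens (example breakpoint) */
      @media (min-width: 768px) {
        .content { max-width: 720px; margin: 0 auto; }
      }
    </style>

I’d recommend avoiding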
Flash and other rich media for textual content. So if there’s text on
your page and you’d like to have it
indexed directly, make sure it’s
actually on the page, make sure it’s loaded
as HTML somehow so that we can pick it up and
index it as well as we can. Someone has a bit of
noise in the background. Let me just mute you. Feel free to unmute if
you have any questions. If videos are critical, make
sure that they work everywhere. Certain types of videos
don’t work on mobile phones, for example. So there’s a special HTML5 tag
you can use to kind of provide alternate media
for those devices, or if you use a common video
system like YouTube or Vimeo, then those almost
always work as well.
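The HTML5 tag in question is the video element, which can list several sources so that each device picks a format it supports; the file names here are just placeholders:

    <video controls width="640">
      <!-- The browser plays the first source format it supports -->
      <source src="clip.mp4" type="video/mp4">
      <source src="clip.webm" type="video/webm">
      <!-- Fallback text for clients without HTML5 video support -->
      Your browser does not support HTML5 video.
    </video>

And regarding your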
website architecture, most CMS systems work
out of the box nowadays. You don’t need to tweak anything. You don’t need to change everything to use search engine-friendly
URLs or anything like that. Most of them pretty
much just work. For indexing, we recommend using
sitemaps and RSS or Atom feeds. Sitemaps let us know
about all of your URLs, so these are essentially files
where you list everything that you have on your website
and you let us know about that. We can use that
to recognize URLs that we might have
missed along the way. RSS and Atom feeds
are a great way to let us know about
new or updated URLs. So if you push both of
these at the same time, then usually what will happen
is we’ll crawl the sitemap files regularly, but we’ll crawl
their feed a lot more frequently because usually
it’s a smaller file. It’s easier for your
server to serve that. We can use something
like pubsubhubbub to fetch it
essentially immediately once you publish something. An important aspect
there is to make sure you use the right dates
in both of these files, so don’t just use a
current server date, make sure this is
actually the date that matches your primary content. So if you have, for
example, a sidebar that changes randomly
according to daily news, that’s not something you’d use as the change date for these pages. It should really be
the primary content. You can include
comments if you want, but essentially it should be
the primary content, the date [INAUDIBLE].
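As a small illustration, a sitemap entry whose lastmod reflects when the primary content actually changed might look something like this (the URL and date are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/blue-shoes</loc>
        <!-- The date the primary content last changed, not today's server date -->
        <lastmod>2014-09-12</lastmod>
      </url>
    </urlset>

And the indexed URL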
count confirms indexing of exactly the
URLs that you submitted. If the index count
doesn’t match what you think you have indexed, for
example, if you check an index status, then usually that’s
a sign that the URLs you’re submitting there are slightly
different than the ones we find during crawling. So that’s something where I’d really double check the exact URLs that are submitted there. I’m happy to also help with that if you have a thread in a forum
or on Google+ where you have an example where it looks like
the index count is lower than it should be, but in almost
all cases that I’ve looked at so far, essentially the URLs
have to match exactly what was actually found during crawling. We recommend using the rel
canonical where you can. This is a great way to let us
know about the preferred URL that you’d like to have indexed. If you use tracking
parameters like analytics, then that’s something
that sometimes is shown. Sometimes people will
link to those URLs with tracking parameters,
and the rel canonical lets us know that
this is actually the URL that you
want to have indexed. It can have the right upper or lower case. It can be on the right hostname with the right protocol. That’s essentially
the one you want. It’s fine to have that
on the page itself.
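In practice this is just a link element in the head of each page variant, pointing at the URL you’d prefer to have indexed; the URLs here are placeholders:

    <!-- On https://www.example.com/blue-shoes?utm_source=newsletter -->
    <link rel="canonical" href="https://www.example.com/blue-shoes">

Make sure you set it up correctly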
so that they don’t all point at the home page. That’s a common mistake
we see, and it’s important that the pages be
equivalent, so don’t do this across different types of pages. So if you have a blue
shoe and a red shoe and different pages
for that, those pages aren’t really equivalent, so
I’d recommend not setting a rel canonical across those pages. Sorry. Was there a question? AUDIENCE: Oh, yes. JOHN MUELLER: OK. AUDIENCE: I’d like to ask
you about canonical setting. My question is right or wrong. So can I send a group job? JOHN MUELLER: Sure. AUDIENCE: [INAUDIBLE]? JOHN MUELLER: Oh, wow. OK. Long question. Best practice for
canonical setting, small ecommerce site with not
so many items with pagination, faceted navigation. I’d have to take a look
at exactly what you’re looking at there,
but I’d recommend taking a look at our
blog posts we did, a blog post on faceted
navigation earlier this year. AUDIENCE: Yeah. I’ve already read this. But in my example,
the faceted stops are not so important to
the main canonical page, so does it matter if it is
canonical or in [INAUDIBLE]? JOHN MUELLER: Yeah. It should be essentially
equivalent content, so one thing the
engineers sometimes do is they take a look at the
text that actually matches between these pages and
if the text matches, then that’s a good sign. If the order is
slightly different, then that’s less of a
problem, but the text itself should be matching. So if you have different
items in a different order, that’s fine. If you have completely
different items, then that’s something
where I’d say don’t use a rel canonical there. AUDIENCE: I see. Thanks. JOHN MUELLER: All right. Let’s look at the next one here. This is one we see
almost in every Hangout, a duplicate content question. So there’s a myth that
duplicate content will penalize your site, and that’s
definitely not the case. I certainly wouldn’t worry
about duplicate content within your website. We’re pretty good at recognizing
these kind of duplicates and ignoring them. It still makes sense to
clean them up for crawling, and if you’re reusing
content from other websites, if that’s the kind of
duplicate content you have, then just make sure you’re
being reasonable with that. Make sure you’re using that
with reasonable amounts and not just taking all of your
content from other people’s websites. With regards to where duplicate
content does affect your site, I made this nice
and simple diagram, but maybe not so simple. Essentially this is our somewhat
simplified web search pipeline. We start off with
a bunch of URLs that we know about
from your website, we set them to a
scheduler which makes sure that we don’t crash your
server by crawling everything all the time, and Googlebot
goes off and crawls them from your website and brings
the results back and says, oh, I found a bunch of new URLs. These are here. It also takes the content
that it found on those pages and passes them on to indexing. And essentially what
happens is, if you have a lot of duplicate content
with different URL parameters, for example, we’ll collect a
ton of URLs for your website that we’ll know about with all
these different variations. We try to schedule them
regularly so that we can like double check to make
sure that we’re not missing anything, and we’ll essentially
be kind of stuck in this cycle here crawling all
of these duplicates instead of being able to
focus on your main content. So if you have things that
change quickly on your website, we might be stuck
here kind of crawling all over your duplicates
in the meantime, which makes it a little
bit slower for us to pick up your actual content. So that’s one aspect
where duplicate content can make a difference. This is especially true if
you have a lot of duplication. If you just have the www and non-www duplicates, then that’s less of a problem. That’s essentially everything twice, but it’s not that much of a problem. If you have everything 10 times or 20 times or 100 times, then that’s going to make a big difference in how we can crawl your website. Then once we’ve
picked up the content, we send it off to indexing. Indexing tries to recognize
duplicates as well and will reject those kind
of duplicates and say, hey, I already know about this URL. I already know about this
content under a different URL. I don’t really need to
keep an extra copy of this. So this is another
place where we’ll run through your duplicates. We’ll pass them onto the
system here, and at this point, the system will say,
well, I don’t actually need this content. We didn’t need to waste
our time essentially trying to pick it up. Sometimes what also happens is
that pages aren’t completely duplicate, and in
cases like that, for example, you
have one article that you post on your
US English website and you post the same
article on your UK English website, which makes sense. It’s your company website. The same article might
be relevant for both of your audiences,
that’s essentially fine. And in those cases, we
do actually index some, but we try to filter them out
during the search results. So that looks a little bit like
this with a very simplified figure here where essentially we
have these three pages that we were crawling and indexing
and the green part here is essentially
the content that’s the same across these pages. So this could be, for
example, the same article that you have on your UK
English page, your US English page, and Australia
English page, for example. So the article is the same. The headings are different. There’s some other content
on these pages maybe. And what happens in these cases is we’ll index all of these three pages, and in the search results, if someone searches for part of the generic content, we’ll try to match one of these
URLs depending on whichever one we think is the most
relevant that might be, for example, the location–
these are different country versions– and we’ll
just show that one. So one of these
will be shown here. That’s one we’ll show. We’ll show other
search results as well, but essentially, we’re
just picking one of these to show in the search results. And what happens then if someone searches for part of the generic content plus something specific to the page itself, then, of course, we’ll show that specific version in the search results. So the generic text, of course, that people were searching for, as well as something unique. So if this article is an article
about a certain type of shoe, for example, if someone searches
for that shoe type directly, we’ll show one of
these pages depending on whatever our algorithms
think makes sense. If someone is searching
for this type of shoe and mentions that they
want to buy it in the UK, then we’ll probably pick
the version that actually has a UK address on it and show
that one in the search results. So this isn’t something where
we’re penalizing your website, but essentially we’re
taking these three pages and trying to pick the best
one to show in the search results, and the others will
be filtered out because it doesn’t
make sense to show the same content to the
user multiple times. So I hope that kind of helps
with the duplicate content question. AUDIENCE: And John, does that have any sort of play in this whole thing? You know, when looking at quality across all the pages, would you in an ideal world be better off doing something different? I mean, in our circumstances, obviously, we have locations in close proximity to each other. Or perhaps if you’re looking for office space within a certain distance that has one desk available, you’re going to start to get these groupings of locations where there is duplicate content that’s quite suitable across the board. And I wonder, because people are still talking about this in multiple forums, is Panda still going to have some effect on your site, in other words, on the quality? JOHN MUELLER: So
primarily this type of filtering that we do
for duplicate content is a technical thing
that we do just to find the right match
there, and it’s not something that we primarily use
as a quality factor. There are lots of
really good reasons why you’d duplicate some of your
content across different sites. There are technical reasons,
there are marketing reasons, there might be legal reasons
why you would want to do that. That’s essentially
not something that we would say this technical
thing that you’re doing is a sign of
lower quality content. On the other hand,
this is something that users might notice
if you do it excessively, if you create doorway pages, if
you kind of duplicate content from other people’s websites
and act like it’s your content. We see that a lot with
Wikipedia content, for example. And those are the
type of things where the duplicate content itself
isn’t really the problem, but essentially the
problem is you’re not providing any unique value
of your own on these pages. You’re not providing
a reason for us to actually index and show
these pages in the search results. In this case, where you
have multiple versions for different
countries, for example, there’s a good reason to have
all of these different pages. There’s a good reason
to have them indexed. There’s a good reason to
have them shown in search sometimes because it is
very relevant for the type of queries here. But if you kind of
take this excessively and you create pages for every
city and actually the content itself isn’t really relevant
for every city, then that’s something where
our algorithms would start seeing that as a sign of lower
quality content, maybe even webspam. So that’s a situation you’d
want to watch out for. Technically, having content
duplicated by itself is not a reason for us to say
this is lower quality content. There can always be
really good reasons why you’d want to do that. AUDIENCE: Sure. Thanks, John. Is there a sort of a
set amount of words you use as a guide
to what you would consider to be
duplicate content. I mean, would anything let’s
say over five words that are similar be considered to be
duplicate content, or would you simply take it right down to single words or sentences, or do you take it in batches of sort
of 10 or 20 words in a row? Does that make
sense, that question? JOHN MUELLER: Yeah. So when it comes to technically
recognizing duplicate content like this, we use a
variety of ways to do that, and sometimes we’ll
even look at things like saying, well, this
looks very similar. It’s not exactly duplicate, but
it’s very similar, therefore, we’ll treat it as
being the same. So that’s something
where we wouldn’t say you need to focus on
a certain number of words or where you’d need to like
alternate every other sentence or something like that. I don’t think this
type of filtering is something you’d need
to kind of work around. This is essentially
the normal part of web search, a
normal thing that you don’t need to kind of like
eliminate from your website. AUDIENCE: Yeah. OK. Excellent. Thanks, John. JOHN MUELLER: Sure. AUDIENCE: Just a
question on when you mentioned duplicate
content from other websites. Would you recommend
putting a kind of link or citation
back to that website just to kind of ensure
that you kind of want to show that you’re not trying
to steal that content, you’re actually just referencing it? JOHN MUELLER: You can do that. It’s not something that
our algorithms specifically look for, but I think that’s
always a good practice. It helps people to understand
your content a little bit better, but it’s not
something where we technically watch out for. And similarly using
something like, I think, the HTML5 cite
tag is something you can do if you
want to do that.
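For example, a quoted passage with that kind of citation markup might look something like this (the URL is a placeholder):

    <blockquote cite="https://www.example.com/original-article">
      The quoted passage from the other site.
    </blockquote>
    <p>Our own commentary on the quote, with a visible link to
      <a href="https://www.example.com/original-article">the original source</a>.</p>

It’s not something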
that we would say, oh, this looks like a
quote from another website, therefore, we shouldn’t
count it for this website. Sometimes discussions
around a quote are just as valuable
as a quote itself. So it’s something I
think is a good practice, but it’s not something that our
algorithms would specifically look for. AUDIENCE: OK. No, that’s fine. JOHN MUELLER: All right. Let’s look at a
few more of these. Robots.txt, we see a lot
around robots.txt nowadays. So it used to be
that people would say they used robots.txt to
eliminate duplicate content. We’d see a lot of things
like this on a site where the robots.txt
file disallows everything with parameters in it.
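A rule along these lines is the kind of thing that refers to; a hypothetical example that blocks every URL containing a query string:

    User-agent: *
    # Blocks every URL with parameters, e.g. /shoes?color=blue&sort=price
    Disallow: /*?

And that’s essentially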
causing more problems than it solves, because if we
can’t crawl those pages, then we can’t recognize that they’re duplicates, and we can’t filter them out afterwards. Well, essentially you have
your original content, all of these duplicates with
the parameters, which we’ll index just with the
URL alone, and we won’t know that they
actually belong together. So if someone were to
link to one of these URLs with a parameter, we wouldn’t
know that we can actually forward this link onto
your main content. We’d say, oh well,
someone is linking to this URL that
happens to be roboted. It must be relevant,
so maybe we’ll show it in the search
results as well. So robots.txt is
a really bad way of dealing with
duplicate content. Best is, of course, to
have a clean URL structure from the start. That’s not always
possible if you’re working with an existing site. Using 301 redirects is a
great way to clean that up. Using rel canonical
is very useful. The parameter handling
tool also helps you in situations
where you might not be able to use redirects
or rel canonical.
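As one illustration of such a cleanup, a 301 redirect from the non-www host to the www host might look something like this on an Apache server (a sketch assuming mod_rewrite is available; the hosts are placeholders, so adjust to your own setup):

    RewriteEngine On
    # Permanently redirect example.com URLs to the www version
    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

AUDIENCE: John, just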
going back to that. You’re saying just let
Google crawl everything, because we do have some of that
stuff blocked in robots where the dropdowns for search
and dropdowns for filtering are all being or
were being indexed, and so we’d have
thousands of pages. And previously that
was a general feeling that that was duplicate content,
so it doesn’t look good. Particularly if you go into
the web master talks forums and ask that question, you
get jumped on straight away. The general feeling
in those forums is, block it, get
rid of it, don’t have it expire if
it’s duplicate. JOHN MUELLER: Yeah. I think it’s always
a good practice to clean up duplicate
content if you recognize it. I think that’s
something I’d recommend doing if you can do that. AUDIENCE: No, the
duplicate I’m talking about is category pages where
someone has, like on our side, for example, which
I know you know, someone sorts everything in
California by price, but then by location, and then by
something, activity type. You’ve got three
different filters there, which gives half of [INAUDIBLE]
depending on how you do it. You can’t clean that up. You have to let people, the
users, use those dropdowns, because they’re
useful for the user. So you either let Google
crawl those dropdowns and crawl a half a
million URLs and decide, or just you say
actually it’s one, or do you call everything
in canonical back to the main category? JOHN MUELLER: It depends. So again, we have
a blog post I think on faceted navigation
from a while back. I’d double check that. I’m really exactly– AUDIENCE: The one
with the gummy candy that I think was
referenced there. JOHN MUELLER: No, I
think it’s a newer one from this year or
late last year maybe, but it kind of goes into the
faceted navigation aspect there where it’s
tricky sometimes because we have all these
different categories and filters and options
and ways to kind of sort this content out. But I touch upon that I
think a little bit back here, so we can look at that
a bit later as well. But those are always
tricky to handle, the faceted navigation
and how much of that should you allow it to
crawl, how much of that should be no index
or rel canonical. That’s always hard. AUDIENCE: Is there a [INAUDIBLE]
or don’t we just stick it in [INAUDIBLE]? JOHN MUELLER: it really
depends on the website. Some content it makes
sense to index separately because it provides
value when you index it like that or
people are searching for events in California then
having a California landing page for that. AUDIENCE: Right. Which we know we do have. JOHN MUELLER: Yeah. That makes sense. AUDIENCE: But some
people in California are just actually looking by
price and they’re not that. JOHN MUELLER: Yeah. But I think that’s the
situation where people would go to your website
and they kind of use it first where nobody would
be going to search and say, I’d like to have all
events in California listed by price in search. AUDIENCE: No. I mean, it does
happen a little bit from everything in California
with gifts under $100. People would do search for that
stuff in particular, these days search for a budget because they
assume Google knows everything. So it will do it. JOHN MUELLER: Yeah. I mean, that’s
something where you have to make a judgment call
yourself, like how relevant are these specific pages
or how random are they. Are they’re just like people
entering search queries and we’re indexing every word
that people are searching for, and that’s probably
not that useful, but that really depends
a lot on your website and how you have that set up. AUDIENCE: But in
general, best practice is to canonical all
of the search results into the main category. JOHN MUELLER: It depends. I think this is a good
topic for another Hangout. Yeah. I can see that there
are lots of aspects here that we could cover. So let me just go
through these and– AUDIENCE: One little bit of
advice for everybody regarding that robot subtext
though is that I actually blocked a hell of a lot of
pages on my site in the effort to stop Googlebot from
crawling all of those pages and wasting its time doing it. What I’ve done by homing
in on what Googlebot needs to look at, it’s
actually now indexing calling my pages faster than
ever before because it’s not wasting time crawling
pages it doesn’t need to, and we’ve seen an incredible
change in robust calls as a result of that. So in my experience, if
you know what you’re doing, block whatever
you can and you’re going to allow
Googlebot to really do a good job on your site. That’s my opinion. JOHN MUELLER: I’d have to take
a look at how you implemented that before saying, yes, but the
kind of the caveat if you know what they’re doing
applies to a lot of things especially around websites. AUDIENCE: Yeah. JOHN MUELLER: OK. So let’s go through
some more of these. With regard to
robots.txt, we especially want to be able to crawl CSS and JavaScript files nowadays because we want to know what the pages actually look like, which is especially important when you have a mobile-friendly page. Because if we can’t crawl your CSS and JavaScript, we can’t recognize that this page is a really great mobile page, and then it wouldn’t be treated as such in search. Another aspect is error pages
where if you have a URL, for example, where all of
your 404 pages redirect to, if we can’t crawl that
URL, we can’t recognize that these URLs
are errors, and we will try to recrawl those
URLs more often than we might otherwise. So being able to see
that a URL as an error is really useful for us
and not something that causes problems for you. Mobile pages, we sometimes
see that the m. domain, for example, is
blocked by robots.txt. That’s a problem. International and translations,
we still occasionally see that people say,
well, this is my main page, and this is my page for Germany,
but the German version is just a translation,
therefore I’ll kind of block Google from
kind of crawling that. That might be a
problem because then we can’t see the German version. With regards to robots.txt
best practices, if your content shouldn’t
be seen by Google at all, then by all means block it. If there is, for example,
legal reasons why you don’t want that
indexed, that’s something you might want to
just block it by robots.txt. If you have
resource-expensive content, for example,
complicated searches, if you have tools that
take a long time to run, then those are type of
things you might also want to block by robots.txt
so that Googlebot doesn’t go through your site
and say, oh well, let me try possible words
that I found on the internet and insert them into
your search page because I might be able to
find something new there. That might cause a lot
of kind of CPU usage on your website
slowing things down. So that’s the type of
thing where you’d probably want to block that
with robots.txt.
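Putting those pieces together, a robots.txt along these lines (the paths are just examples) keeps rendering resources crawlable while blocking a resource-expensive internal search feature:

    User-agent: *
    # Keep CSS and JavaScript crawlable so the pages can be rendered
    Allow: /assets/css/
    Allow: /assets/js/
    # Block an expensive internal search results page
    Disallow: /search

Robots.txt doesn’t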
prevent indexing, so if you don’t want it indexed, then I’d recommend using noindex
or server-side authentication.
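The noindex option is a robots meta tag on the page itself (or an equivalent X-Robots-Tag HTTP header), and it only works if the page is not blocked in robots.txt, because we have to be able to see the tag:

    <!-- Allow crawling, but keep this page out of the index -->
    <meta name="robots" content="noindex">

For example, if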
there is something confidential on your
website, the robots.txt file isn’t going to prevent it from
ever showing up in search. You really need to block
that on the server itself using something
like a password so that people, when
they find the URL can’t actually
access the content. As I mentioned, don’t
block JavaScript, CSS, other embedded resources. If you use AJAX, don’t
block those replies from being crawled so
that we can actually pick up all of this content
and use it for indexing. Another myth we always
hear is that my website worked for the last five years. Why is it suddenly not
showing up in search anymore, and where webmasters
essentially say, well, it’s been working
so far, so I’m not going to change anything. And it’s important to keep in
mind that the web constantly changes. Just because it
worked earlier doesn’t mean it will continue working. Google also constantly
works on its algorithms, and the user needs to
change over time as well. So I definitely recommend
staying on top of things and making sure that
you’re not just sticking to your old version for no real reason. So make sure that you’re
kind of going with the times and really offering something
that users want now in a way that they can use now,
which sometimes means kind of enabling
mobile-friendly websites, for example, to let all the new
users who are using smartphones as a primary device
also get your content. Some other myths that
we see regularly, shared IP addresses,
that’s fine. We know that there’s a limited
number of IP addresses. I wouldn’t worry about
it if someone else is using the same IP address. That’s something that
happens on a lot of hosters. Too many 404 pages, that’s
also fine unless, of course, these pages are ones that
you want to have indexed. So that’s something you
want to watch out for, but if we’re crawling
pages that shouldn’t exist, and we see a 404
and we report that in Webmaster Tools, that’s fine. That’s the way it should be. Affiliate sites are also
OK from our point of view, but you should really
have your own content. So don’t be an
affiliate site and just copy all of the affiliate
content as everyone else has. Really make sure
that you’re providing something useful of your own. The value should be with your
content, not with the link that you’re providing. Disavow files, we sometimes
see webmasters say, I don’t want to submit
one because then Google would think I did
something wrong, and that’s totally wrong. You shouldn’t hold off
on using a disavow file. If you find something
that is problematic that’s linking to your
website that you don’t want to be associated with, go
ahead and disavow that. For us, it’s primarily
a technical tool. It takes these links out of our system. It treats them similarly to nofollow links, and then you don’t have to
worry about those links anymore. So even if those
links are things that you didn’t have
anything to do with, maybe a previous SEO set them up and you don’t want to kind of admit that maybe they did something wrong, use this and make sure
that they’re out of the system so that you don’t have
to worry about it.
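A disavow file itself is just a plain text list, one entry per line, for example (the domains here are placeholders):

    # Links we had nothing to do with and don't want counted
    http://spammy-directory.example.com/our-listing.html
    domain:low-quality-links.example.net

The order of text in an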
HTML file isn’t important. You can put your main content
on the top or the bottom. We can get pretty large
HTML files in the meantime and still recognize
the content there, so that’s not
something where you have to kind of micro
optimize at that level. Keyword density is something
we always hear regularly, just write naturally instead. AUDIENCE: So in order
to manage HTML5 file, you’re saying text though. That’s only from a
coding point of view. You’re not referencing it
from a visual point of view, so the content text
wise I think you’ve discussed before is essentially
probably better being higher up rather than from
a visual aspect? JOHN MUELLER: I’d
definitely make sure that at least part of
your primary content is visible when the user
first lands on your page so that when they click
on the search result, they can recognize, oh, this
is what I was looking for. And from that point
of view, that’s something that’s
generally higher up. It doesn’t have to be the
first thing on the page. But essentially what
I mentioned here is sometimes people
think that if they move the div with
the primary content to the top of their page
and then use CSS to show it at the right position, then
that would be better than just having a div wherever
it is in the HTML, and that’s not something
you need to worry about. AUDIENCE: Yeah. I mean, we’re actually
internally discussing right now we’re about to put nice,
large images at the very top of our location
pages, and this is one of the questions I actually
wanted to put to you today. They’re going to be somewhere
in the region of sort of an 800 by 400 view. So the first thing really
you’re going to see is this beautiful
image of a location that you were
actually looking for and our customers identify
better with an image than they do with a lot of text
but more on the term they’re just interested in pricing. So are we going to be
affected by maybe Hummingbird or something like
that by doing this? JOHN MUELLER: No. You should be fine. AUDIENCE: [INAUDIBLE]? JOHN MUELLER: Yeah. I mean, this is something
where if this image is your primary content for
that page, that’s fine. I wouldn’t do it in
a way that the logo is the primary
image on this page. So if your company’s logo
is taking up the whole page and you have some
random information from a sidebar showing
up on the first page, then that’s probably
not that great, but if the image of this
specific location, the image matching this specific
content is primarily visible, that’s fine. AUDIENCE: Yeah. So Googlebot is pretty
much or some of your crawls are able to kind of distinguish
the difference between what is a consistent
top image as a logo and what is a unique image
to that specific page. JOHN MUELLER: Yeah. AUDIENCE: OK. Wonderful Thanks. JOHN MUELLER: Another
one I didn’t put on here, but we get a lot of questions
about is the keywords meta tag. It’s one of our most
popular blog posts even now, and essentially we don’t
use the keyword meta tag at all for ranking. So I imagine some of
you well know that. If you’re new to
this area, then that might be something
where you think, oh, this a great way to
put keywords for ranking, but we don’t use that at all. Let’s see. This is the last slide so– AUDIENCE: John? JOHN MUELLER: Yes? AUDIENCE: Which
other tags would you say are almost
completely irrelevant? JOHN MUELLER: There are
a lot of tags out there. AUDIENCE: Yes, a lot of them
like the ones that say follow and abating and stuff
like that, things that– JOHN MUELLER: Those are the
ones that we essentially ignore. We ignore, what is it,
the revisit after tag. That’s something that’s
very commonly used. That’s something we
don’t use at all. I’d have to think which
ones we don’t use. That’s always harder than
which ones we do use. But essentially we try to
look at the primary content on the pages instead and not
focus so much on the meta tags there. AUDIENCE: In fact, it’s
important, but in many cases, it doesn’t get used. It’s not relevant
enough to query. Is that right? JOHN MUELLER: Which
one did you mean? AUDIENCE: This description. JOHN MUELLER: Description. Yeah, we use that for the
snippet in the search results, but we don’t use
that for ranking. So if you have a specific
snippet you want to have shown, that’s something you can
use, but that’s not something where you need to
stuff any keywords. Essentially make
it so that users will recognize what
your content is about. And maybe it includes
a keyword say we’re searching
for so that we know this is relevant
for their query. AUDIENCE: A lot of people
have used different tools that will put matching keywords
and all their tags and URL, and surprisingly this
is still very common. There’s the popular SEO
tool used in WordPress, and I suggest
people not use those because it makes everything
very much the same. JOHN MUELLER: Yeah. That’s something
where I’d primarily focus on what works
for your users. Sometimes it’s easy for
users to work with wordy URLs and with like
identifiers in the URL, but essentially that’s something
that we wouldn’t focus on primarily. So if you’re spending
a lot of time tweaking those kind of keywords,
you’re probably spending time on something
that we’re not really valuing that much. All right. Let me go through
these last four and then we can jump to
the Q and A. One thing that I think is
important is make sure you have all the
versions of your site listed in Webmaster Tools. We treat these sites
as separate sites, so you’ll have
different information potentially for
some of these sites. If you have a clear canonical
setup for your website, then we’ll have most
of your information in that canonical version. If you’ve never set up a
canonical for your website, then we might have some
of this information split across different versions
of the URL than we do indexed, so that can include things
like links to your site. It can include things
like the index status information, those
kind of things. The Fetch as Google render view
is an extremely valuable tool that I recommend
using regularly. Go, for example, to the search
queries feature in Webmaster Tools, pull out the top
10 URLs from your website, and make sure that they
really render properly with Fetch as Google
so that there’s no embedded content that’s
blocked by robots.txt that it kind of matches
what you would see when you look at
it in a browser. Mobile is extremely
important at the moment. There are lots of people who
are using mobile primarily to access the
internet, so make sure you can use your
site on a smartphone, and don’t just look
at your home page. Try to actually complete a task. So if you have an
e-commerce site, try to search for something
within your website. Try to actually order it. See that you can fill out all
of the fields that are required, that’s it’s not
a complete hassle to actually go through
an order something there. Kind of take it
step by step and go from someone who’s
first visiting your website to actually
completing whatever task that you’d like them to do. And finally don’t
get comfortable. Always measure. Always ask for feedback. Always think about ways that
you can improve your website. The whole internet is
changing very quickly, and our algorithms
are changing quickly. What users want is
changing quickly, and if you get too
comfortable, then it’s easy to get stuck in a situation
that everything has changed around you and
suddenly your website isn’t as relevant
as it used to be. So make sure you’re kind
of staying with the trends. And with that, I think that’s
it with this presentation. AUDIENCE: John, why is the
Fetch as Google limited when it comes to submissions? And so you’ve got I think it’s
500 submissions for singular pages and only 10 for
the larger option, and why is it limited
to Webmaster Tools it counts, specifically,
because it’s not a site. JOHN MUELLER: I
don’t know why we chose to do that
specifically there, but that’s something
where we kind of have to reprocess those
URLs in a special way. So that’s something where we’d
like to kind of limited the use there so that it doesn’t
become like the primary way that content is
embedded in Google. And in general, if you make
normal changes on your website, we can pick that
up fairly quickly. I wouldn’t use this tool
as a normal way of kind of letting us know about
changes on your website, instead I’d use
things like sitemaps and feeds to kind of
automate that whole process. So that something where
they’re usually better ways to get this content into
Google, but there’s sometimes exceptions where you
need to kind of manually tell Google to, OK, go ahead
and re index this content as quickly as you can. And that’s kind of why we have
that there and to avoid people from kind of overusing
this for things that it doesn’t actually
need to be used for. We have those limits there. AUDIENCE: So if you were to
set something in your sitemap and to say this page was
updated 10 minutes ago, whatever it is, if
Google then picks up on that sitemap on a
very regular basis, the minute it picks
that up, will it kind of perform
a similar action? JOHN MUELLER: Usually, yeah. It’s something where if we
can trust your sitemap pile, if we can see that the URLs
you submit there are real URLs, if we can trust your
dates to be correct, then we’ll try to take that into
account as quickly as possible, and that can in some cases, be
seconds or minutes after you submit the site that file. For new sites, for example,
that happens really quickly, if we see news content
on a site that we know has been
submitting good sitemap files with the
proper dates in them, then we pick that up
within a few seconds. So that’s something that
is fairly automated. It works really well,
and it’s a great way to kind of get the new content
into the search results. AUDIENCE: And so if you were
to incorrectly index something to say these 20 pages
will get updated weekly but really it’s more monthly
or sometimes six monthly, is there a signal that kind
of distrusts your sitemap? JOHN MUELLER: It’s not so
much that we kind of have a signal about your website and
say we don’t trust you on that, but if we look at
a site map file and we see all the pages on
the whole website change just 5 seconds ago, then chances
are your server is kind of just sending the
server date, and that’s the kind of thing
we’d be looking for. If we look at the site
map file, and we say, well, this kind of matches what
we expect from this website and there are some
new pages there, then that’s kind of a really
strong signal for us to say, OK well, this time around this
site map file looks great. We should trust it. We should pick up
this new content. And that’s kind of
what we’re looking for. That change frequency
is something we don’t use that
much from a sitemap file so if you can, just submit
the actual date of the change instead. AUDIENCE: Excellent. Thanks, John. AUDIENCE: Just a very quick
question on the sitemap bit. Do you use priority in sitemaps? So I should prioritize
pages that you think is more important on
your website than others. Is that something
you use at all? JOHN MUELLER: I think
we don’t use that at the moment for web search. I think it might be used
for custom search engines, but I’m not 100% sure. But for web search,
we’ve gone back and forth about using that. We’ve looked at ways that
people are using this, and to a large extent, it hasn’t
been as useful as we initially thought, so we’re not
really using that. AUDIENCE: OK. No, that’s perfect. And just very
quickly on RSS feeds. Is there anywhere in
particular that you kind of advise to upload that feature? You don’t upload it
as a sitemap do you within Webmaster Tools,
that sort of thing? JOHN MUELLER: You can submit
it as a sitemap in Webmaster Tools. What I’d recommend
doing if you can do that, if you have a really
like fast-changing website is make sure you’re using
pubsubhubbub as well. So pick a hub and
make sure that you’re kind of setup on
your CMS supports pubsubhubbub so that you can
push the content that way, because with pubsubhubbub,
you’re essentially telling us every time you make a
change on your website then we can pick
that up immediately.
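Setting that up usually just means advertising a hub in your Atom or RSS feed and pinging that hub whenever you publish; a trimmed Atom sketch might look something like this (the hub shown is the commonly used public Google hub, and the other values are placeholders):

    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>Example site updates</title>
      <!-- The hub that subscribers (including search engines) can be notified through -->
      <link rel="hub" href="https://pubsubhubbub.appspot.com/"/>
      <link rel="self" href="https://www.example.com/feed.atom"/>
      <updated>2014-09-12T09:30:00Z</updated>
    </feed>

AUDIENCE: No, that’s perfect. Thank you. JOHN MUELLER: All right. Let’s go through some of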
the submitted questions and a bunch of these that
people voted on as well. You spoke of duplicate
content within the website. What about duplicate content
on a site that has copied or scraped you? I realize we should attempt to try a DMCA removal request, but is there any way
Google can better determine the original content? We do try to do
that to some extent. Sometimes that
works really well. Sometimes it doesn’t
work so well. Oftentimes, when I see
it not working so well, I see that the site
that’s being copied has a lot of other
issues as well. So it’s always a
tricky situation. If we run across one
website that’s essentially where we can tell that actually
this is the original content but this website has
so many bad signals attached to it that
we don’t really know how much of this
content we should trust, how much you trust this website. So that’s something where if
you’re seeing scrapers ranking above you, I’d just
double check to make sure that your website is
doing things right as well and maybe use a DMCA tool if
that’s something you can use. Check with your lawyers,
your legal team. If you can’t use a DMCA
or if something just doesn’t work right
with the ranking where you’re seeing these
scrapers rank above your site and you’re actually doing
everything properly, then that’s something
you can definitely always send feedback to us about
so that we can take a look and see what we should
be doing different here. That’s always a
tricky situation. I know our teams are looking at
this kind of a problem as well and seeing what we
can do to kind of make that easier for the web master. Links added in the
source code, but not present on the front end. Does Google consider
this as a backlink? Is it something spammy or a way
of connecting to your website? So I think the
question is more around like hidden text, hidden
content, hidden links, those kind of things. In general, we recognize when
text or content is hidden and we try to kind of
discount it for web search. The same applies to
links essentially. So if these are
hidden links, then I wouldn’t count on those
always being used the same as something that’s
directly visible. So if this is a
link that you want to have counted
for your website, then that’s something I’d make
sure that within your website or however you’re linking that
is something that’s actually visible to users,
that’s usable for users so that users can use
those links as well and it’s not just
something that you’re kind of hiding in your HTML
code for technical reasons, so make sure it’s
actually something that works for your users. Is it true that Google
is in the final stages of releasing Penguin. Would you estimate it’s
days or weeks away now? Yes, we’re working on it. I estimate probably like a few
weeks, not too much longer. But as always, these
kind of time frames are really hard to
judge because we do a lot of testing
towards the end, and we need to make sure
that things are working right before we actually release
these kind of changes. So that’s something where
if it’s not released yet, then I can’t really
promise you that it will be released whenever. I know it’s pretty close
and I know a lot of you are waiting for that,
so soon, but not today. When rebranding,
would you recommend going through backlinks in
requesting the anchor text if it’s a brand name in the
URL we changed in the new brand name and domain? We definitely
recommend making sure that the links change to your
new domain as much as possible so that we can
forward kind of page rank or the value of that link
on specifically to the right URL directly instead of having
to jump through redirects. Also for users, when they click
on those links, the redirect, sometimes, for example, if
they’re on a mobile device takes fairly long to get
processed and kind of go through. So if you can have those links
go directly to your site, if these are important
links, then that’s something I recommend
checking in with them and having them update. Whenever I make a change to
any of the content on my site, I see a drop in performance
often a few weeks before I’m back to where I used to be. Why is this? When you make significant
changes on your websites specifically around the layout
and the text [INAUDIBLE] of that, I think that’s
normal because we have to reprocess
everything, especially if you’re changing things
like your internal linking structure. If you’re switching to a
slightly different CMS, then that’s something
where we essentially have to really understand
the whole website again, and that would be normal to
see changes and fluctuations in search because of that. If you’re just fixing typos
or changing small text pieces on a page, then
usually that shouldn’t be something where you’d see
fluctuations in search for. So it kind of depends
on what kind of changes you are making there. For small textual
changes, I wouldn’t worry about that causing
any kind of fluctuations unless, of course,
you’re removing something that people are very
desperately searching for. So if you have a
page about blue shoes and everyone is
searching for blue shoes because they’re really
popular at the moment and you change that page to
talk about green t-shirts, for example, then,
of course, you’re going to see some changes
there because people aren’t finding what
they’re looking for. Our algorithms can’t confirm
that this is actually a good page anymore. We have to understand it again. Why is my site
crawled every week? How often should it be crawled? Well, that kind of
depends on your website and how often you’re
changing your content. The important thing
to keep in mind is we’re not crawling the
sites, we’re crawling pages. So usually we kind
of differentiate between the types of
pages on our site. We’ll try to pick up a home
page maybe a little bit more frequently than some of
the lower-level pages especially when we can
recognize that something has to change for
a really long time. So if you have an
archive from 1995, chances are those articles
aren’t changing that regularly and we don’t have to recrawl
them every couple of days. So maybe we won’t recall
those every couple of months since then. On the other hand,
maybe the home page has current news
events on it, so we’ll have to recrawl it
every couple of hours or every couple of days. So that’s something where there
isn’t any fixed time where I’d say this page should be
recrawled this frequently. It really depends
on your website, on the number of changes
you make better, but also, to some extent, also on
how we value your website and how important we
think it is to pick up all of these changes. So we’ll sometimes see that with
like lower-quality blogs that are essentially
just like resharing copied content from other
sites already that sometimes it will happen that we’ll
kind of slow down our crawling of those sites
just because we’re saying, well, there’s nothing really
important that we’ve missed here if we went to crawling
every couple of days instead of every couple of minutes. So that’s something
where we’d probably adjusted our crawling
scheme, but if you have a normal website and you
have regular changes on there, you’re using something
like sitemaps or RSS to let us know
about those changes, then we should be kind
of keeping up with that and trying to keep up
with the crawling there. The relationship between
the usability and SEO. I guess that’s really kind of
a big and almost philosophical topic. Essentially if your website
isn’t usable for users, then users aren’t
going to recommend it and that’s something that will
recognize in search as well and try to kind of
reflect in search, but it’s not the case that
we read any kind of usability tests on normal
web pages to say, oh, this has, I don’t know,
wrong text color, therefore, people can’t read
it directly, and we should be kind of
demote it in search. So that’s something
where there’s more of an indirect
relationship there. What are some
current SEO tactics as per the latest
algorithm updates? I guess the best tactic, if
you will, that is still current is having good content. Go ahead, Joshua. JOSHUA: I think maybe
that’s tactics in quotes. Latest tactics. JOHN MUELLER: I think, I
mean, the good parts about all of this at the moment is that
if you’re using a normal CMS system, then the technical
foundation for your website is probably pretty sane,
and it’s not something where you’d have to apply
any kind of special tactics to kind of get that into Google. And the actual
tactics on ranking better are more about
like finding ways to create really good
and useful content, finding ways to be
timely in search, finding ways to kind of provide
something that users value, and that’s not a technical
tactic, if you will. There’s no step-by-step
guide to getting there. That’s something where you
have to work with your users and find the right solution. Is it true that any
changes you do to site, disavow bad links, for
example, won’t come into effect before the next
algorithm update? Disavow file is
processed continuously, so that’s taking into
account with every change that we do on our side. Let me see if there
are any higher– AUDIENCE: Rodney had a question. AUDIENCE: John, can I ask one? JOHN MUELLER: Sure. AUDIENCE: Let me just throw this
doc into the chat, if I can. Is someone playing
squash in the background? JOHN MUELLER: Ooh. Ken, let me mute you
out for a second. I can’t seem to mute you. OK. Go ahead. AUDIENCE: OK. I just pasted a URL,
a doc into the chat, and it’s something
that I’ve asked before, and Gary has asked
before in relation to hreflang and secure. So I wondered if given that
we have two versions of each now on four sites,
as per that diagram, can we effectively cut out the
middle and not bother with– I don’t know if everyone
else can see that, but I know other
people have have asked similar questions before. JOHN MUELLER: Yeah. Oh, gosh. Yeah, I wanted to
make a slide on that. Yeah. I know someone who is
working on a blog post– AUDIENCE: I did one for you. JOHN MUELLER: I know
someone who is working on a blog post around
hreflang and canonicals, so I think that would
apply there as well. But I think in a case like
this, essentially what you have is your two sites. I’m just going to assume
like one is US, one is UK. AUDIENCE: No, it’s all US. We’re only using the
hreflang as a kind of work around as you know. They’re both US so we just
had the one site which was previously subject to
some unknown algorithm issues unless you have an
update for me on that. But moving with hreflang to also
a US-based site helped that. But then the secure
thing was releasing, if possible, use secure. So we did, but we actually
saw a drastic drop when secure was released and the same
as when Barry Schwartz was on the call a couple of weeks ago and he said his secure site saw it. He showed you his Webmaster
Tools if you remember, and we had very similar
percentage drops. We’re now actually considering
unwinding that because it’s not recovered within
the last 30 days. After moving to secure
we’ve lost another 50% of our traffic, but
it’s just not recovered. So we’re thinking
of unwinding that. But aside from that,
we think there’s no point in having four sites
in the mix when we could just have two and go from the
original to the secure new rather than the
original to secure then over to new non-secure
then over to new secure. Because it’s just more
work and more processing for– the more steps
the more that Google can in one way or
another misunderstand what we’re trying to do
whether rightly or wrongly. JOHN MUELLER: So essentially
with the hreflang, you need to make sure
that you’re doing it between URLs that are canonical. So for example, if you have one
URL with a parameter and one URL without a parameter, and
you say you’re rel canonical or you’re a redirect is going to
the one without the parameter, then that’s the one you should
be using for the hreflang pair. So essentially if
you’re using redirects that point to one
version of your site, then that’s the
version you should be using kind of between
the hreflang settings there.
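As a simplified illustration, assuming a US and a UK version that are both the canonical, indexed URLs, the pair would reference each other directly (the URLs are placeholders):

    <!-- On the canonical US page; the UK page carries the same pair -->
    <link rel="alternate" hreflang="en-us" href="https://www.example.com/us/">
    <link rel="alternate" hreflang="en-gb" href="https://www.example.com/uk/">

AUDIENCE: These four are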
absolutely identical, just so you. They are absolutely identical
apart from HTTP versus HTTPS. The content is 100% identical. So going from one
to four, given what you said there would
make sense rather than going from one to two,
from three to four and then two to four. JOHN MUELLER: I’m trying
to visualize how– AUDIENCE: I may draw
you a diagram for you so you can visualize easily. JOHN MUELLER: I see some of
that but I’m not actually sure how it’s actually
implemented at practice, but essentially you just
need to make sure that you’re kind of connecting the
hreflang between canonical URLs because if we see an hreflang
tag pointing at a URL that we’re not
indexing like that, then essentially we
ignore that hreflang tag. So it really needs to
be between the URLs that we’re actually indexing. So if you’re using
301s or rel canonical to point to one
version of the URL, then that’s the one you should
be using for the hreflang param, and if you’re
not using 301s, if you’re like keeping
one version like this and the other version
is the other canonical, then using the hreflang
directly between those two is fine as well. But it should just be the one
that we’re actually picking up for search and
indexing like that. AUDIENCE: Right. Which I believe is– JOHN MUELLER: I’ll
double check your sites because I’m a bit confused
which one is which and how you have
that actually set up. If you want, feel free to send
me a note on Google+ and I’ll– AUDIENCE: Yeah, I’ll send
another email with what we– because I know I
answered your last one. So I’ll send you another
note of what we did, but I think it’s come home again
because of the secure issue, and I was hoping Barry will
be here so he could say, but if anyone else in the
chat, in the call area has had the same
issues with secure. I haven’t seen it on any forums. JOHN MUELLER: Yeah. So we looked at a
whole bunch of clients. AUDIENCE: [INAUDIBLE]
Barry’s does. JOHN MUELLER: Yeah. We looked at a
whole bunch of sites because we thought it
seemed strange to hear this from a handful of
people at the same time, but for most of them, it’s
actually doing the right thing. So I think there might
be some kind of quirks in some of our algorithms there still, but by and large, I think, moving to HTTPS should
just work out for most sites. But I am happy to take another
look at how you have yours set specifically because
that sounds like it’s not the typical hreflang or
canonical or secure site move. Yeah. AUDIENCE: Right. I mean, if all else,
they equal the boost that you would normally
get from secure I assume is negligible anyway
if everything else is normal, so it’s better to unwind it
than going the 50% traffic back, because you’re
never going to gain 50% by having secure
versus non-secure. JOHN MUELLER: Yeah. I think if you can confirm that
it’s really from that then that definitely makes sense,
or it’s something where you can say,
well, I will just role this back to the moment
and reconsider it maybe in a couple of months. When I see that other
people are posting lots of good experiences
about that move, that might be something to do. I think that’s always
a sane approach when it comes to these
type of issues. AUDIENCE: OK. I’ll drop you an email. with diagrams. JOHN MUELLER: Great. Yeah, I’ll double
check the pages. OK. So with that, I think
we’re a bit out of time. Thank you all for joining in. Lots of good questions. Lots of good feedback. I hope I’ll see you guys again
in one of the future Hangouts as well. AUDIENCE: Thanks, John. AUDIENCE: Excellent Hangout. Thanks, John. Have a wonderful weekend. JOHN MUELLER: Thank
you, everyone. AUDIENCE: Bye. JOHN MUELLER: Have
a good weekend. Bye. AUDIENCE: Thanks so much.
