How To Optimise Your Crawl Budget – Francois Goube at BrightonSEO


Thank you. OK, hi everybody, my name is Francois and I'm very happy to be with you today. As you may have noticed from my accent, I am French, and I'm really happy to be here because we're going to talk about something that is very underestimated in SEO communities around the world. As Chris pointed out in his talk about crawl budget, we're going to go deeper into that concept, and you will see that you can boost your rankings if you manage your crawl budget right. I'm the CEO and founder of OnCrawl, and I'm also the French Majestic ambassador. What you need to know about me is that I'm a nerd: I love reading Google patents, and that's why I love technical SEO. Today my mission is to help you grow your inner SEO superhero. To do that we need some knowledge and some tools, so I'll try to point you to the right metrics and the right tools you can use to optimise your crawl budget. Let's get the party started.
First things first, you need to have a look at what Google has to say about its crawl budget. To sum it up, Google says there are a lot of things involved in what people call crawl budget and that, basically, you shouldn't focus too much on it because most of the time they are doing a very good job at indexing your website. But I'm sorry, this is not true; this is bullshit. The truth is that in most cases Google does not know your whole website. To be aware of that, you need to dive into your log files.
The problem with log files is that a log file looks like that: it's ugly, and you need the proper tools to understand what's in it. You can use software like OnCrawl, there are plenty of SaaS solutions on the market, and there are plenty of open source initiatives so that you can do it for free. We even forked Kibana at OnCrawl to tailor it for SEO purposes, and you can download it from our GitHub, so you don't have any excuse not to look at your log files.
The main thing with log file analysis is that you can see what Google is doing on your web pages: how often Googlebot is coming to your website, which pages it fetches first, stuff like that. And the coolest thing is that you also have all the referral traffic data within your logs, so you can spot which zones of your website are receiving most of your traffic. I think every webmaster should keep an eye on their crawl budget. First, as Chris said, you can check it within your Search Console, in the Crawl Stats report, but you need to explain that line, and the only way to explain it is to look at your log files, because there can be many issues, many parameters, that affect how Google behaves on your website. What you need to know is that by doing some log analysis you can spot some weird behaviour.
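To make this concrete, here is a minimal sketch of that kind of log analysis in Python. It assumes an Apache/Nginx "combined" format file named access.log (the filename, the regex and the sample entry are my own illustration, not OnCrawl's tooling), and it skips the reverse-DNS check you would normally use to confirm that a hit really comes from Googlebot:

```python
import re
from collections import defaultdict

# A raw "combined" log entry for a Googlebot hit looks something like this (made-up values):
# 66.249.66.1 - - [14/Sep/2018:10:12:43 +0000] "GET /tyres/winter/205-55-r16 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

hits_per_day = defaultdict(int)    # total Googlebot hits per day
pages_per_day = defaultdict(set)   # unique URLs fetched by Googlebot per day

with open("access.log") as f:
    for line in f:
        m = LINE_RE.match(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue
        day = m.group("day")       # e.g. "14/Sep/2018"
        hits_per_day[day] += 1
        pages_per_day[day].add(m.group("path"))

# Lexical sort is enough for a quick look; parse the date for true chronological order.
for day in sorted(hits_per_day):
    print(day, "-", hits_per_day[day], "hits,", len(pages_per_day[day]), "unique pages")
```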
This chart shows the number of bot hits and the number of pages crawled on a particular day, and we found out that when Google is crawling JavaScript resources with rendering, it doesn't do it very often. It does it once in a while, and you can spot when Google is crawling your website while rendering JavaScript. The reason they don't do it very often is that it costs a lot. To give you an accurate view on that: we crawl websites with JavaScript rendering as well, and it costs something like ten times more than crawling a standard URL. Even Google has limited resources, and it's very important to understand that, because it's the key to understanding the crawl budget.
Google tries to save its resources while crawling the web, and what you need to learn from this slide is that when you optimise your Google crawl budget, you will end up with better rankings. You should think of the crawl budget as, basically, the number of pages Google wants to fetch every day, and you will see that this number does not vary a lot, except if you are doing the right optimisations; that's what we are going to see now. Just one point: as an SEO you want to point Google to the right pages, call them money pages, priority pages, whatever… all your pages are not born equal, so you want Google to see your top-selling products, your brand pages, stuff like that.
Let me give you a quick example. This is log monitoring for a website I have mapped, an e-commerce vendor selling tyres, one of the biggest players in France. Here you can see the unique number of pages Google is fetching, category by category. They have a category called "other": garbage pages with no value at all, and it's the tiny grey line on top of each bar in the chart. When you look at the number of pages matching that "other" category, it's just a hundred unique pages, a very small number. But when you look at how often Google is crawling those pages, you can see that Google is wasting something like 70 to 80% of its resources crawling those low-value pages. This means you have an internal linking structure problem, and you may have other problems as well. You can also see that they made a new release of their website and they improved things, but there is still a lot of work to do.
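If you want to reproduce that kind of breakdown, here is a minimal sketch, assuming you have already extracted the URLs hit by Googlebot from your logs (for example with the snippet above) and that your page categories can be derived from the first path segment; the URLs and category names below are invented for illustration:

```python
from collections import Counter
from urllib.parse import urlsplit

def category(url: str) -> str:
    """Bucket a URL by its first path segment, e.g. /tyres/winter/... -> 'tyres'."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[0] if segments else "home"

# One entry per Googlebot request (duplicates included), e.g. taken from the log parsing above.
googlebot_hits = [
    "/tyres/winter/205-55-r16",
    "/tyres/summer/195-65-r15",
    "/brand/michelin",
    "/search?page=412",   # typically the kind of low-value URL that eats crawl budget
    "/search?page=413",
]

hits_by_category = Counter(category(u) for u in googlebot_hits)
total = sum(hits_by_category.values())

for cat, hits in hits_by_category.most_common():
    print(f"{cat:<10} {hits:>4} hits   {100 * hits / total:5.1f}% of Googlebot's activity")
```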
To understand how Google behaves, we went through tons of patents from Google, and what we've learned is that Google is trying to crawl the best pages on the web. To do that, they rely on several parameters, and we're going to go into them, but something you need to learn is that crawl scheduling is a big thing: because Google is trying to save its resources, scheduling is very important, so that it crawls online media first and not spammy porn websites, and there are many ways to do that. The slides will be shared on SlideShare, so you will have access to all these patents if you're interested.
Something very important that we found out looking at those patents is page importance. Page importance is not PageRank; page importance is a score computed by Google to help it schedule its crawl. It's a score based on several parameters, such as the page depth, what you could call authority (think Trust Flow from Majestic), the internal popularity of a page (at OnCrawl we have a score for that, which we call InRank), the kind of documents you have, whether your pages are included in sitemaps, whether you have variety and, most of all, no duplicates, the size of your content, stuff like that.
To understand how Google may be affected by those parameters, you need to combine crawl data and log data to see what Google is doing. So the main question is: what are the factors that really influence Google's crawl budget? And again, all websites are not born equal, so depending on the size of your website and your topic, Google won't behave the same way; if you are a classifieds website or a niche e-commerce player, different parameters will come into play.
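As a rough illustration of what combining the two data sources can look like, here is a minimal sketch. It assumes a hypothetical crawl export named crawl.csv with url and depth columns, plus a dictionary of Googlebot hits per URL built from your logs; the same grouping works for word count, number of inlinks, and so on:

```python
import csv
from collections import defaultdict

# Googlebot hits per URL path, built from the logs, e.g. {"/tyres/winter/205-55-r16": 14, ...}
googlebot_hits_per_url = {}

# depth -> [pages in the crawl, pages actually hit by Googlebot]
crawl_ratio_by_depth = defaultdict(lambda: [0, 0])

with open("crawl.csv", newline="") as f:
    for row in csv.DictReader(f):
        depth = int(row["depth"])
        crawl_ratio_by_depth[depth][0] += 1
        if googlebot_hits_per_url.get(row["url"], 0) > 0:
            crawl_ratio_by_depth[depth][1] += 1

for depth in sorted(crawl_ratio_by_depth):
    pages, hit = crawl_ratio_by_depth[depth]
    print(f"depth {depth}: {hit}/{pages} pages crawled by Googlebot ({100 * hit / pages:.1f}%)")
```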
Just a few examples. Here is page depth compared to what Google is doing: you can see that the deeper you go into your architecture, the less Google wants to crawl your pages. Number of words: does it have an impact? Of course, as an SEO you are thinking, "OK, let's add some content, make it unique", but the big question is how many words you need to add to your product pages to get them crawled. This is a way to understand from which level you can be confident your pages will be crawled or not.
The quality of your HTML code and of your heading (hn) optimisation also matters. Again, all websites are not born equal, so depending on the size of your website and its topic these parameters may vary, but on this website you can see that when your h1 is unique you get a lot of pages crawled, whereas when it's duplicated you are shooting yourself in the foot.
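As a quick illustration of that last point, here is one way you could flag duplicated h1s from a crawl export, assuming a hypothetical crawl.csv with url and h1 columns:

```python
import csv
from collections import defaultdict

pages_by_h1 = defaultdict(list)

with open("crawl.csv", newline="") as f:
    for row in csv.DictReader(f):
        pages_by_h1[row["h1"].strip().lower()].append(row["url"])

for h1, urls in pages_by_h1.items():
    if len(urls) > 1:
        print(f"h1 duplicated across {len(urls)} pages: {h1!r}")
```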
The same thing goes for the number of internal links pointing to your pages: depending on the number of links a particular page gets, you are driving more or less internal popularity to it. Again, it seems obvious, right? As an SEO, if I get links, I will get crawled. But again, the big question is how many links I need to have. You can also check the crawl frequency and ask how strong the influence of that kind of parameter really is; you can check it for any ranking factor.
Something you need to think about as well is orphan pages. An orphan page is a page that is not linked from your internal linking structure but that Google knows about. There can be many cases: out-of-stock products, because your online shop has a rule saying that if a product is out of stock, the link to that product page is removed; a 2016 collection that is not available any more, so you deleted all the links, but the pages still respond 200 OK and Google keeps fetching them even though they don't get any links; mistakes made while revamping your internal linking structure, your menus, your faceted navigation, stuff like that. There can also be orphan pages because they are included in your sitemap but not in your internal linking structure. This can be a problem because those pages don't get any links and they don't get any internal popularity, so chances are they can't rank unless they get backlinks, and you're wasting Google's crawl budget on them.
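One practical way to spot orphan pages is to diff the URLs your crawler can reach by following internal links against the URLs Googlebot hits in your logs and the URLs listed in your sitemaps. A minimal sketch, assuming you already have those three sets of paths (the example values are invented):

```python
# Three sets of URL paths, however you build them:
#  - crawled: every URL reachable by following internal links from the homepage
#  - in_logs: every URL Googlebot fetched, extracted from the access logs
#  - in_sitemaps: every URL listed in the XML sitemaps
crawled = {"/", "/tyres/winter/205-55-r16", "/brand/michelin"}
in_logs = {"/", "/tyres/winter/205-55-r16", "/collection-2016/old-product"}
in_sitemaps = {"/", "/tyres/winter/205-55-r16", "/landing/never-linked"}

known_to_google = in_logs | in_sitemaps
orphans = known_to_google - crawled   # known to Google but unreachable through internal links

for url in sorted(orphans):
    source = "logs" if url in in_logs else "sitemap only"
    print(f"orphan page: {url} (seen in {source})")
```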
So the main question is how to deal with orphan pages. First things first, you need to ask: is it normal, or is it a mistake, that I don't link to those orphan pages? If it's normal that I don't want them linked, do they receive organic traffic? If I don't want them linked, it usually means I don't want them on my website, so if they are receiving organic traffic I should use a redirect, because I want to send that juice and that traffic to other parts of my website. If they don't receive traffic, ask yourself whether the page is valuable for your current business, and if the answer is yes, ask the first question again. If it's not normal that you don't point a link to those orphan pages, then obviously add a link from your structure; and if the page is not valuable for your current business, put a noindex on it.
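Here is how that decision flow could be sketched in code; this is purely illustrative, my own reading of the flow described above rather than an official checklist:

```python
def handle_orphan_page(unlinked_on_purpose: bool,
                       gets_organic_traffic: bool,
                       valuable_for_business: bool) -> str:
    """Rough encoding of the orphan-page decision flow described above."""
    if not unlinked_on_purpose:
        # It is a mistake: the page should be part of the structure if it is worth keeping.
        if valuable_for_business:
            return "add an internal link from your structure"
        return "put a noindex on it"
    # Unlinked on purpose: the page probably shouldn't live on the site as-is.
    if gets_organic_traffic:
        return "301 redirect it, to send the traffic and the juice elsewhere"
    if valuable_for_business:
        return "then reconsider: maybe it should be linked after all"
    return "put a noindex on it and stop wasting crawl budget"

print(handle_orphan_page(unlinked_on_purpose=True,
                         gets_organic_traffic=False,
                         valuable_for_business=False))
```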
Robots.txt matters a lot here, because if you only add a noindex in your meta robots, Google still has to fetch the page to read the meta robots tag, so you're not saving Google's crawl budget, and it will take some time for Google to understand that it doesn't have to fetch that kind of page any more. And if you can't answer those questions, ask an agency; I know there are lots of them here in the UK. You should ask an expert to find a way to improve that kind of stuff.
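The difference is that a meta robots noindex is only discovered after the fetch, whereas a robots.txt disallow prevents the fetch in the first place. You can sanity-check your rules with Python's standard library; the rule below is just an illustration:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical rule keeping Googlebot away from crawl-wasting internal search pages.
rules = """
User-agent: Googlebot
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for url in ("/tyres/winter/205-55-r16", "/search?page=412"):
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "->", "crawlable" if allowed else "blocked by robots.txt")
```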
What you can expect from optimising Google's crawl budget is that you can do more with less, by pointing Google in the right direction, towards your priority pages. You will end up with more active pages because, remember, Google wants to fetch roughly the same number of pages every day, so by pointing it to the right zones of your website you are making Google understand that these are the parts of your website that are valuable. You will end up with more active pages, better indexation, and of course you will boost your organic traffic. So feel free to use noindex: it's a weapon of mass destruction, but if you follow Chris's advice I believe you will do a good job. Don't be afraid to use noindex, because it will fix a lot of problems very quickly. With that in mind, you can grow your organic traffic with fewer pages.
Thank you all, and thank you for having me here. If you want to stop by our booth, we're at booth 29 downstairs, and you can try to win a one-year subscription to the OnCrawl log analyser and SEO crawler. Feel free to ask questions as well. Thank you.
