Log File Analysis for Better SEO

Hi, guys. Thank you for coming here today.
My name is Omi Sido. As most of you know, I am very active online and I love talking
about digital marketing and SEO. Today I’m gonna be talking about server log
analysis and how understanding what the Googlebot does and what the Googlebot sees on your website
can massively boost your digital marketing efforts. I’m gonna start with a short story.
A few months ago – this is a real story by the way – a few months ago a guy came to me
and asked me to teach him the basics of SEO so he could collaborate with his SEO team. Of
course, I said yes. So I gave him some books and some blogs to read and we agreed to meet
again in two weeks. Hi, David. So in two weeks we went to Costa and of course,
I asked him about his SEO knowledge. His answer: content is king, and currently
he is reading a book about content creation and semantic search. While he was talking
I wrote a short poem, an SEO poem, on a piece of paper and after he paused
I simply asked him: Can you go and upload this poem on the Internet? He
was confused, obviously: Omi, what do you mean? And I was like: Well, you’ve just told
me that content is king. So upload this poem on the Internet and make me famous. I wanna
be the King of the Internet. Of course, again confused, he said something like: But Omi,
I need a website. End of story. A lot of people come to me, a lot of companies,
and tell me: Omi, our website is beautiful. Our pictures are amazing. We publish content
on a regular basis but somehow we are not ranking well. Why?
I said it before and I will say it again: Yes, content is King. But every King needs
a castle, a home to live in. Technical SEO is this castle, and by analysing your server
logs you basically know whether you’ve got a solid structure or not.
If every time the Googlebot comes to your website it can’t understand the structure
of your website, if every time the Googlebot comes to your website it ignores strategic
sections of your website, your content counts for nothing. Shall I repeat this one? Your
content counts for nothing. Sorry, I can see a lot of people tagging
me. The only way for the Googlebot to understand
what’s on your website is to literally come and crawl all your pages. Now some of you
may say ‘But what is a log file?’ This is a log file. I know it’s a bit confusing but
in all honesty, you don’t have to be very technical, unless you wanna be an SEO geek,
to benefit from the data, from the information, from the wisdom coming from the server logs.
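For readers who have never opened a log file, here is a minimal sketch of what one line might look like and how it could be read in Python. It assumes the server writes the widely used Apache/Nginx "combined" log format; the sample line, the URL in it and the helper name `parse_line` are made up for illustration and do not come from the talk.

```python
import re

# Pattern for the "combined" access log format (an assumption about the server setup).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Made-up sample line, for illustration only.
SAMPLE_LINE = (
    '66.249.66.1 - - [12/Mar/2018:06:25:24 +0000] '
    '"GET /products/blue-widget HTTP/1.1" 200 5123 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Simplification: a user-agent substring check. User agents can be faked,
    # so a real analysis would also verify the requesting IP.
    entry["is_googlebot"] = "Googlebot" in entry["user_agent"]
    return entry


entry = parse_line(SAMPLE_LINE)
print(entry["path"], entry["status"], entry["is_googlebot"])
```

Each line is one request: who asked (IP and user agent), when, for which URL, and what the server answered. Everything later in the talk is built on counting and grouping lines like this one.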
So to make all this very simple I will give
you another story. Six months ago a company came to me and asked
me to analyse their website and give them some recommendations for improving their rankings.
Obviously, I started crawling their website and the first thing I saw was half a million
pages but only roughly 20 000 of them active pages. You all know this website. I’m not
gonna mention the name because of the NDA but you’ve been there at least once in the
last month. Half a million pages but only twenty thousand
of them active pages. By active pages, you know what I mean: pages that get organic traffic.
So I started analysing their website, I started crawling it and of course analysing
the server logs, and this is what I saw. OnCrawl. 154 orphan pages. Yes, 154 000 orphan pages.
Some websites don’t even have 154 000 pages. Out of 154 000 pages only 3 000 of them active
pages. Now, what are orphan pages? ‘Omi, what’s an active page? How do you qualify
that?’ Getting organic visits. Three thousand, remember
this number. So what are orphan pages? Orphan pages are
pages that are not linked from anywhere in your website structure. This is the definition
you see online. What is my definition for orphan pages? Omi
Sido’s definition for orphan pages: Stop hurting your SEO. Please.
How do we find orphan pages? The only way to find orphan pages is to crawl, fully crawl
a website, take all the log file data, combine it together and analyse it. In this case out
of 154 000 pages only 3 000 active pages. I had no choice but to literally delete all
inactive pages. And I know it sounds a little bit harsh.
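The combination of crawl data and log data described above can be sketched as a simple set comparison. This is an illustration under assumptions, not how OnCrawl works internally: the function name `find_orphan_pages`, the three inputs and the tiny data set are all hypothetical, and "active" is taken to mean "has at least one organic visit", as defined earlier in the talk.

```python
def find_orphan_pages(crawled_urls, logged_urls, organic_visits):
    """Split URLs seen in the logs but not in the crawl into active and inactive.

    crawled_urls   -- URLs reached by following internal links during a full crawl
    logged_urls    -- URLs that appear in the server logs
    organic_visits -- dict mapping URL to number of organic visits (hypothetical input)
    """
    # An orphan page is known to the server but not linked from the site structure.
    orphans = set(logged_urls) - set(crawled_urls)
    active = {url for url in orphans if organic_visits.get(url, 0) > 0}
    return {"orphans": orphans, "active": active, "inactive": orphans - active}


# Tiny made-up example.
crawled = ["/", "/shoes", "/bags"]
logged = ["/", "/shoes", "/old-sale", "/test-page"]
visits = {"/shoes": 120, "/old-sale": 15}

report = find_orphan_pages(crawled, logged, visits)
print(sorted(report["orphans"]))   # ['/old-sale', '/test-page']
print(sorted(report["active"]))    # ['/old-sale']
print(sorted(report["inactive"]))  # ['/test-page']
```

The inactive group is the list of candidates for deletion; the active group is the list of pages that probably deserve internal links instead.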
Then I continued analysing this website and I found that the bot was spending a lot of time,
literally stuck, in a section full of non-compliant pages instead of crawling the sections with
compliant pages. Remember what I said earlier about the strategic crawling of your website.
I had no choice but to delete another big chunk of this website. And then what I call
duplicate URL crawling. By analysing this website I realised that the bot was spending
a lot of its resources crawling pages with parameters even though they were properly
canonicalized. We had to literally reshuffle the whole navigation and stuff like that.
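One way to put a number on this kind of duplicate URL crawling is to measure what share of the bot's requests went to URLs carrying a query string. A minimal sketch, assuming you already have the list of paths requested by the bot (for example extracted from the logs); the function name `parameter_crawl_share` and the sample paths are hypothetical.

```python
from urllib.parse import urlsplit


def parameter_crawl_share(bot_paths):
    """Return the fraction of bot requests that hit URLs with query parameters."""
    total = len(bot_paths)
    if total == 0:
        return 0.0
    # urlsplit(...).query is an empty string when the URL has no parameters.
    with_params = sum(1 for path in bot_paths if urlsplit(path).query)
    return with_params / total


# Made-up list of paths requested by the bot.
hits = [
    "/shoes",
    "/shoes?colour=red",
    "/shoes?colour=red&sort=price",
    "/bags",
]
print(parameter_crawl_share(hits))  # 0.5
```

A high share on a site whose parameter pages are canonicalized suggests the bot is spending crawl budget on URLs that will never rank on their own.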
And now have a look at this picture. Yeah, this graph. Pages crawled and not crawled
by depth against SEO visits distribution by depth. As I told you earlier you don’t have
to be very technical to understand the importance of analysing your log files. Have a look at
this section of the graph. Only 49% are crawled. Yet, this section gives the most SEO organic
visits. By the way, I don’t like calling them organic. For me, they are just SEO visits.
But anyway. Very strange: for this website page depth five is giving more organic visits
than those two. So I had a lot of conversations. Basically the idea was: what’s gonna happen
if I literally delete this group? What’s gonna happen if I force the bot to crawl this section
more often and index more pages? What’s gonna happen if I actually combine those two without
deleting this one, or I combine those two? Notice this is only 19%. I hope you can see
it from far. Nineteen percent. You have to really think how you wanna spend your crawl
budget. In all honesty, we’ve deleted more than 60% of this website. It went
to the bin. ‘Just for clarity, these 60%, is any of this unique content, stuff like that?
You are not suggesting...’ No, I am not suggesting that. Normally orphan pages that are not visited
by anybody, even the bots, are. So ok. So to explain. Let’s go a little bit to orphan
pages. Normally those are development mistakes, expired product pages. Do we agree on this
one? Ok, thanks. So we’ve deleted. It’s ok. No, no, of course, yeah. The point I am trying
to make, and thank you very much because many people don’t actually know what orphan pages
are. You are absolutely right. We’ve deleted more than 60%. More than 60% of this website
went to the bin. Yet, six months down the line this client sells more products than
ever. I didn’t say visits, I said money. They sell more products than ever. Now the
bot is allowed to crawl the good pages more often, resulting in more pages present in the
SERPs and in a better position. Some of you may say ‘Omi, this is a big website. They
literally had room for deleting pages’ and in fact, I have a lot of clients coming to
me telling me ‘Omi, I don’t care about analysing log files because I only have 10-20 thousand
pages’. So let me give you a quick example of a relatively small website. This website
was about to be migrated 6-7 months ago and I was asked to analyse it. 22 000 pages in
the structure. 8 000 orphan pages. Eight thousand. After finding this one they nearly fired the
whole Digital Marketing team. Thank God they didn’t. I’ve got more followers on LinkedIn.
23% are only bringing 3% of organic visits. Literally. Three percent of organic visits.
Is it worth keeping those pages? On the other side, in the previous example – sorry I can’t
find it now – in the previous example the orphan pages were actually bringing 37% of
organic visits. Why are you not linking to them? Internally linking to them, first so
your customers can find them when they come on your website and second so you can improve
their SEO value so they bring even more visits in the future. Guys, I hope I’ve
given you a good idea of how to. ‘I have a question.’ Of course. ‘Have you done any analysis
of incoming backlinks to those pages that you?’ Yes, you have to do that. You have to
do that. ‘How does that all feed into the whole process?’ With the example I gave you
there were literally no backlinks. But you have to do that, you know. Guys, I hope I’ve
given you enough information but by all means, ask questions. ‘Only 3000 of them. What do you
mean by that?’ Yeah, they are organic visits. Literally coming
from Google. They could have. I doubt it. I doubt it. With 37% of organic visits, I
doubt they would all have been bookmarked. So 3000 pages out of 154 000 orphan pages, out
of 500 000 altogether. 3000 orphan pages bringing 37% of organic visits. So that’s why the question:
why are you not linking to them? Obviously, they are important.
No, no nothing to do with organic visits. It’s all about the way you link them in your
website structure. They are just not linked from anywhere. And they are not necessarily
bad pages. You know. I can give you a good example. If you type ‘Kentucky Fried Chicken
takeaway’ the first page is actually an orphan page.
Those are pages from your website that you haven’t linked to. Yes, they are not linked
from anywhere. Yes, you have to crawl your website and you
have to analyse, combine the data. There is no other way to find them, unfortunately.
Unless there is a way. I was actually hoping that somebody would say something. Yes, yes
go on. Not you. Yes, you can do that. In this example, 140, a quarter of a million, you know
it’s a bit difficult to go through all of them, you know. So the only thing I really
wanted to know is how many people are actually hitting those pages. So if nobody is hitting
them for a year. You have to have something like a benchmark for your business. If nobody
is coming to those pages for a very long time you really have to start questioning yourself
why do I have those pages in my website structure? But I am not talking about any pages, I am
only talking about orphan pages really. Let’s make it clear, because if you just start deleting
pages from your website for no particular reason you’re gonna end up in a very bad place.
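The benchmark idea above (nobody hitting an orphan page for a long time) can be sketched as a date filter. This is only an illustration: the function name `stale_orphans`, the 365-day default and the sample dates are assumptions, and the right window depends on your business. The output is a list of pages to question, not an automatic delete list.

```python
from datetime import date, timedelta


def stale_orphans(last_hit_by_url, today, benchmark_days=365):
    """Return orphan URLs with no recorded visit inside the benchmark window.

    last_hit_by_url -- dict mapping orphan URL to the date of its last visit,
                       or None if no visit was ever recorded (hypothetical input)
    """
    cutoff = today - timedelta(days=benchmark_days)
    return sorted(
        url
        for url, last_hit in last_hit_by_url.items()
        if last_hit is None or last_hit < cutoff
    )


# Made-up data for three orphan pages.
last_hits = {
    "/old-sale": date(2018, 5, 1),
    "/expired-product": date(2016, 1, 1),
    "/dev-test": None,
}
print(stale_orphans(last_hits, today=date(2018, 6, 1)))
# ['/dev-test', '/expired-product']
```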
He will be at the bar.
