
What hardware and software powers Googlebot?


Today’s question comes from Cardiff, UK. Tristan Perry asks a fun question: “Hi Matt. Could you give any insight into the sort of hardware and/or server-side software which powers a typical Googlebot (web crawler) server?” What a fun question!

So one of the secrets of Google is that rather than employing mainframe machines, this heavy-iron, big-iron kind of stuff, if you were to go into a Google data center and look at an example rack, it would look a lot like a PC. There are commodity PC parts; it’s the sort of thing where you’d recognize a lot of the parts from having opened up your own computer. And what’s interesting is that rather than having special Googlebot web-crawling servers, we tend to say, OK, build a whole bunch of servers that can be used interchangeably for things like Googlebot, or web serving, or indexing. Then we have this fleet, this armada of machines, and we can deploy it on different types of tasks and different types of processing. So hardware-wise, they’re not exactly the same, but they look a lot like regular commodity PCs. And there’s no difference between Googlebot servers and regular servers at Google. You might have differences in RAM or hard disk, but in general it’s the same sort of stuff.
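To make that “interchangeable fleet” idea concrete, here’s a minimal Python sketch, purely illustrative: identical machines with no fixed purpose get handed whichever role (crawling, serving, indexing) currently needs capacity. The Machine class, role names, and weights are made up for illustration; they are not Google’s actual systems.

```python
# Purely illustrative sketch (not Google's actual software): a homogeneous
# pool of commodity machines with no fixed purpose, reassigned to whatever
# role (crawling, serving, indexing) currently needs capacity.
from collections import Counter


class Machine:
    """One interchangeable commodity server."""

    def __init__(self, machine_id: int):
        self.machine_id = machine_id
        self.role = None  # identical hardware, so no role is baked in

    def assign(self, role: str) -> None:
        self.role = role


def rebalance(fleet: list[Machine], demand: dict[str, int]) -> None:
    """Assign machines to roles in proportion to integer demand weights."""
    total = sum(demand.values())
    # Expand the weights into one role slot per machine, e.g. 2:5:3 over 10 machines.
    # (Integer division; any leftover machines simply keep their current role.)
    slots = [role
             for role, weight in demand.items()
             for _ in range(len(fleet) * weight // total)]
    for machine, role in zip(fleet, slots):
        machine.assign(role)


fleet = [Machine(i) for i in range(10)]
rebalance(fleet, {"crawl": 2, "serve": 5, "index": 3})
print(Counter(m.role for m in fleet))  # e.g. Counter({'serve': 5, 'index': 3, 'crawl': 2})
```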
Now, as far as server-side software, there’s a little bit of a joke at Google: we don’t just build the cars ourselves, and we don’t just build the tires ourselves; we actually vulcanize the rubber on the tires ourselves. So we tend to look at everything all the way down to the metal. If you think about it, there’s data center efficiency, and there’s power efficiency on the motherboards. If you can keep an eye on everything all the way down, you can make your stuff a lot more efficient and a lot more powerful. You’re not wasting anything because you used some outside vendor and it’s a black box.
So Google tends to use a lot of Linux-based machines, Linux-based servers; we’ve got a lot of Linux kernel hackers. And we tend to have software that we’ve built pretty much from the ground up to do all the different specialized tasks, even down to our web servers. We don’t use Apache. We don’t use IIS. We use something called GWS, which stands for Google Web Server. By having our own binaries that we’ve built ourselves, and building that stack all the way up, it really unlocks a lot of efficiency. It makes sure there’s nothing you can’t go in and tweak to get performance gains, or fix if you find bugs.
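One small way to see a hint of this from the outside: Google’s front ends typically report “gws” in the HTTP Server response header, rather than Apache or IIS. Here’s a quick Python check (assuming outbound network access from wherever you run it):

```python
# Peek at the Server header Google's front ends send back;
# it typically reports "gws" rather than Apache or IIS.
from urllib.request import urlopen

with urlopen("https://www.google.com/") as response:
    print(response.headers.get("Server"))  # usually prints: gws
```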
So that’s just a little bit of a view of the hardware and software side, as far as what goes on behind Googlebot and crawling the web.
