Evolution of Search – Whiteboard Friday – Danny Sullivan

Hey Moz fans. Welcome to Whiteboard Friday.
I’m not Rand. I’m Danny Sullivan, the founding editor of SearchEngineLand.com and MarketingLand.com.
Because it’s 8,000 degrees here in Seattle, Rand has decided not to be around, and I am
here sweating like a pig, because I walked over here. So I’m very excited to be doing
a Whiteboard Friday. This is my first solo one, and I’m told I have to do it in 11 minutes,
and in 1.5 takes. No, just one take. The topic today will be the evolution of search, trademark
Google. No, they don’t own search. There was a time when they didn’t own search,
which brings us to Search 1.0. Did you know, kids, that search engines used to be multiple,
that we didn’t talk about Googling things? We actually used things like AltaVista, Lycos,
and WebCrawler. Do you remember those names? There were things like OpenText, and what
was that other one, Magellan. Well, these were search engines that existed before Google,
and they went out onto the web and they crawled up all the pages, about a dozen pages that
existed at the time, and then we would do our searches, and they would figure out how to rank
them all. That was all determined by just the words
that were out on the page. So if you wanted to rank well for, I don’t know, something
like movies, you would put movies on your page 100 times in a row. Then if somebody
else wanted to outrank you, they’d put movies on their page 150 times in a row, because
a search engine said, “Hey, we think relevancy is all about the number of words on the page,
and a little bit about the location of those words.” The words at the top of the page would
count for a little bit more than if they were further on down below. Bottom line is this was pretty easy to spam.
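The word-counting scheme Danny just described can be sketched in a few lines of Python. This is a toy illustration of the general idea, not any real engine’s formula, and the position weighting is made up:

```python
def score_page(query, page_text):
    """Toy Search 1.0 relevancy: count occurrences of the query terms,
    weighting words near the top of the page slightly higher."""
    words = page_text.lower().split()
    query_terms = set(query.lower().split())
    score = 0.0
    for position, word in enumerate(words):
        if word in query_terms:
            # Earlier positions count for a bit more, so a match at the
            # top of the page beats the same match near the bottom.
            score += 1.0 + 1.0 / (1 + position)
    return score

# Saying "movies" 150 times simply beats saying it 100 times --
# which is exactly why this scheme was so easy to spam.
assert score_page("movies", "movies " * 150) > score_page("movies", "movies " * 100)
```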
The search engines didn’t really want you to be doing better for movies because you
said the word “movies” 150 times over somebody who said it 100 times. They needed to come
up with a better signal. That signal, they took their time getting around to. Long story short, they weren’t making a lot
of money off of search so they really didn’t pay attention to it. But Google, they were
sitting over there thinking, “You know what? If we create a search engine, someday someone
might make a movie with Owen Wilson and Vince Vaughn. So let’s go out there and come up
with a better system,” and that brought us into Search 2.0. We are now here. Search 2.0 started looking
at things that we refer to as off-the-page ranking factors, because all of the on-the-page
stuff was in the complete control of the publisher. The publisher could change it all around.
There was even a time, when you used Infoseek, where you could submit a web page, and it
was instantly added to the index, and you could see how well you ranked. If you didn’t
like it, you’d instantly make a change and put it back out again. Then you could move
up that way. So off-the-page kind of said, “Let’s go out there and get some recommendations
from beyond the publisher and decide what other people think about these web pages,
because maybe that’s less spammable and would give us better quality search results.” By the way, I said not Yahoo over here, because
I’m talking about search engines in terms of crawler-based search engines, the ones
that use automation to go out there and find web pages. Yahoo for the longest time – well
it feels that way to me – was a directory, or a human-based search engine where they
listed stuff because some human being actually went to a website, wrote up a review, and
added it. Now back to Search 2.0, Google came along
and started making much more use of something called link analysis. So the other search
engines kind of played with it, but hadn’t really gotten the formula right and didn’t
really depend on it so much. But Google, new kid on the block, said, “We’re
going to do this a lot. We’re going to consider links to be like votes, and people with a
lot of links pointing at them, maybe they got a lot of votes, and we should count them
a little bit higher and rank them better.” It wasn’t just about sheer numbers,
however. Google also then wanted to know who has the best votes, who is the real authority
out there. So they tried to look at the quality of those links as well. You’ve got other people who were doing some
off-the-page stuff. One of them, you might recall, was by the name of Direct Hit. They
actually looked at things like click through. They would look and they’d say, “Well, we’ve
looked at 10 search results, and we can see that people are clicking on the third search
result completely out of proportion to the normal way that we would expect. Rather than
it getting say 20% of the clicks, it’s pulling 80% of the clicks.” That might tell them that
they should move it up to number one, and then they could move things that were down
a bit further. These are some of the things that we started
doing, but it was really links that carried us along for about a decade. Now links, off-the-page
stuff, that’s been powering and still to this day kind of powers the web search results
and how they start ranking better, but we have a little bit of an intermission, which
we would call or I call Search 3.0. By the way, I made all this stuff up, so you can
disagree with it or you can figure out however you want to kind of go with it. But a few
years ago I was trying to explain how I had seen the evolution of search and some of these
changes that were coming along. What happened in this Search 3.0 era is that,
even though we were using these links and we were getting better quality results, there
was also so much information coming in that the signals alone weren’t enough.
You needed another way to get more relevancy, and the way the search engines started doing
that was saying, “Instead of having you search through 100 billion pages, let’s let
you search through a smaller collection of pages of just focused content.” That’s called
vertical search. Now in horizontal search, you’d do a search
for things like news, sports, entertainment, shopping, and you just throw it all into one
big search box. It goes out there, and it tries to come back with all the pages from
across the web that it thinks are relevant to whatever you searched for. In vertical
search, it’s like a vertical slice, and that vertical slice of the web is just only the
news content. Then when you do a search for something like NSA, it’s only going to look
through the news content to find the answers about news that is relating to the NSA at
the moment. Not trying to go over there and see if maybe there is some sports information
or shopping information that may match up with that as well. That’s important right now, by the way. You
have all this talk about something like PRISM that is happening. It’s a spy program or an
eavesdropping program or a data mining program, depending on who you want to talk to, that
the US government is running. Prism is also something that you use just to filter light,
and so if you are doing a search and you are just trying to get information about filtering
light, you probably don’t want to turn to a news search engine because right now the
news stuff is full of the PRISM stuff. On the other hand, if you want the latest stuff
that is happening just within this whole Prism area, then turning to the news search engine
is important, because you won’t get all of the other stuff that is not necessarily related. So we have this Search 3.0 thing, vertical
search, and Google, in particular, referred to it as universal search, trying to solve
the problem that, if someone types into a box “pictures of flowers,” they should actually
show you pictures of flowers, rather than 10 links that lead you to maybe pictures of
flowers. Now we’re pretty solid on this right now. Bing does these sorts of things as well.
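The vertical-versus-horizontal idea can be sketched the same way. This is a toy illustration with made-up documents; real universal search blending is far more involved:

```python
# Toy vertical search: the index is sliced by content type, and a
# vertical query only matches within that slice. The data is made up.
DOCUMENTS = [
    {"vertical": "news",     "title": "NSA PRISM surveillance program revealed"},
    {"vertical": "shopping", "title": "Glass prism for filtering light"},
    {"vertical": "sports",   "title": "Prism FC wins the league"},
]

def search(query, vertical=None):
    """Horizontal search when vertical is None; otherwise search only
    the documents belonging to that vertical slice of the index."""
    pool = DOCUMENTS if vertical is None else [
        d for d in DOCUMENTS if d["vertical"] == vertical
    ]
    return [d["title"] for d in pool if query.lower() in d["title"].lower()]

# Horizontal search mixes news, shopping, and sports results...
assert len(search("prism")) == 3
# ...while the news vertical ignores the light-filtering prism entirely.
assert search("prism", vertical="news") == ["NSA PRISM surveillance program revealed"]
```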
They have their own blending that goes on there. Then it’s Search 4.0. Now we are here, or
right here just because I feel compelled to write something on that board. Search 4.0
is kind of a return to what Yahoo over here was using, which was human beings. By the
way, I don’t write very much anymore because of the typing thing. Getting back to using human beings, one of the
biggest things that has happened with search engines is that they, in a very short period,
completely changed how we sought out information. For thousands of years, if you needed to know
something, you talked to a human being. Even when we had libraries and people had all that
kind of data, typically you would go into a library and you would talk to a librarian
and say, “Hey, I’m trying to find some information about such and such.” Or you would need a
plumber, you would ask somebody, “Hey, you know a good plumber?” Babysitter, doctor,
or is this a good product? Does anybody know this TV? Does this work well? Should I buy
that? You would tend to turn to human beings or things that were written by human beings. Then all of a sudden we had these search engines
come along, and they just took all these pages out there, and they really weren’t using a
huge amount of human data. Yeah, the links were put in there by human beings. Yeah, some
human being had to write the content as well, but we kind of lost another aspect of the
human element that was out there, the recommendations that were out there en masse. That is kind of what has been going on with
Search 4.0. The first thing that is going on with Search 4.0 is that they started looking
at the things that we had searched for over time. If they can tell that you constantly
go back to say a computing site, like The Verge or CNET, then they might say, “Well, the next
time you search for something, let me give those sites a little bit of a higher
bump, because you really seem to like the stuff that’s there. So let’s kind of reward
them in that regard.” Or “I can see that you’re searching for travel right now, and I can
see that you just searched for New York. Rather than pretending that these things are disconnected,
let me put them together on your subsequent searches because you are probably looking
for information about New York travel, even though you didn’t put in all those words.
So I’ll make use of your history that’s going there.” The other thing that they have been doing,
and some of this mixes across in the earlier times, but they are looking at your location.
You do a search for football in the UK, you really don’t want to get information about
the NFL for the most part. You want information about what Americans would call soccer. So
looking and knowing that you’re in the UK when you do a search for football, it helps
the search engine say, “We should go through and we should just come up with information
that is relevant to the UK, or relevant to the US, based on where you’re at.” That greatly
changed though, and these days it goes down even to your metropolitan area. You do a search
for zoos, you’re in Seattle, you’re going to get information about zoos that are in
Seattle rather than the Washington Zoo, or zoos that are in Detroit or so on. The last thing, the really, really exciting
thing is the use of social, which the search engines are still trying to get their head
around. I talked earlier about the idea of links as being like votes, and I always like
to use this analogy that, if links are like votes and links are somehow the democracy
of the web, which is how Google still will describe them on some of their pages, then
the democracy of the web is like the democracy of the United States when it started, when to vote
you had to be 25 years or older, white, and own property. That wasn’t really representative
of everybody that was out there. In order for you to vote in this kind of system,
you really have to say, “Wow, that was a great restaurant I went to. I want to go through
now and I want to write a blog post about that restaurant, and I’m going to link to
the restaurant, and I’m going to make sure that when I link to it, I’m going to use a
platform that doesn’t automatically put things like nofollow on the link so that
the link doesn’t pass credit. Oh, and because it’s a great restaurant, I’m going to remember
to make sure that the anchor text, or the words near the anchor text, say things like
great restaurant because I need to make sure that the link is relevant and passing along
that kind of context. Now when I’ve done all that, I’ve cast my vote.” Probably the 99 other people that went to
the restaurant are not going to do that. But what those people are likely to do is like
it on Facebook, plus it on Google+, make a recommendation on Yelp, use any one of a
number of social systems that effectively enable people to vote much more easily. So
I think a lot of the future where we are going to be going is in this social direction. These
social signals are very, very important in the future as to how the search engines will
determine what are the best pages that are out there. Unfortunately, they’ve put so much into this
whole link system and figuring out that this is a good link, this is a bad link, this is
a link that we are going to disavow, this is a link that you disavowed, and so on and
so on and so on, that they still need to work on making all this social stuff better. That’s
going to become important as well. Not saying the links are going to go away, but I think
the social stuff is going to be coming up much more heavily as we go forward into the
future. Now on the way up here I was thinking, because
I was asked, “Will you talk about the evolution of search?” I’m like, “Yeah, no problem because
I’ve done this whole Search 1 through 4 thing before.” There’s a whole blog post if you
search for Search 4.0. Search for Search 4.0 and you’ll find it. I was thinking, “What is coming after that?”
On the way up, as I was sweating coming up the staircase (there’s a staircase, because
I was at sea level and apparently had to climb up to 300 feet here, where we are located
in the Moz building), if there was a swear jar, I would put a dollar
into it. Search 5.0 is really about search where there is no page at all. Remember
on-the-page factors and off-the-page factors, which are really off this page but on some
other page? With this stuff, I don’t even care that it’s a page.
I did a blog post, and I can’t remember the title of it. But if you search for “Google
conversational search,” you’ll find it. If you don’t find it, clearly Google is a very
bad search engine. In the conversational search thing that I
was demonstrating, if you have Chrome and you click on the microphone, you can talk
to Google now on your desktop, kind of like how you can do it on the phone. You can say,
“Barack Obama,” and Google will come along and it will show you results for Barack Obama,
and it will talk back to you and say, “Barack Obama is President of the United States,”
blah blah blah blah. It gives you a little box for him, and he appears and there is a
little description they pull from Wikipedia. Then you can say to it, “How old is he,” or
something very similar to that. Then the search engine will come back, Google will come back
and will say, “Barack Obama is . . .” I can’t remember how old he is. But you should Google
it and use that voice search thing. It will come back and say Barack Obama is this age.
You can go further and say, “Well, how tall is he?” It will say, “Barack Obama is . . .” I
think he is 6 foot 1. And you say, “Who is he married to?” Then it comes back and it
says, “Barack Obama is married to Michelle Obama.” And you say, “How old is she?” Then
Google will come back and say, “It’s really an impolite thing to ask a woman, but she’s
a certain age.” I believe 39. Yeah, you’re usually safe with that. To do all of that it has to understand that
Barack Obama, when you searched for him, wasn’t just these letters on a web page. It had to
understand that he is a person, that he is an entity, if you will, a person, place, or
thing, a noun, but an entity, that there is a thing out there called Barack Obama that
it can link up to and know about. When you asked for his age and said, “How old is
he,” it had to understand that “he” wasn’t just words, but that “he” actually refers
to an entity that you had specified before, the entity being Barack Obama. When you said,
“his age,” that age wasn’t just a bunch of letters that match on a web page, but age
is equal to a value that it knows of because Barack Obama has an age value over here, and
it’s connecting it there. When you said, “How tall is he,” same thing.
That tall wasn’t just letters, but tall is actually a height element that it knows. That
says height, trust me. When you said, “Who’s his wife,” that wife, with an f kids, not
a v, later we’ll do potatoes without an e, that his wife is a person that is equal to
spouse, which is a thing that it understands, an entity. It’s not just words again. It’s
a thing that it actually understands, and it knows that that is Michelle and
that she has all of these things about her, and [inaudible 15:38]. All those sorts of
things along there. That is much different than Search 1.0 where,
when we were searching, we were really just looking for letters on a page. When you typed
in “movies,” it’s going, “How many pages out there do I have that have these six letters
in this order? Start counting them up and putting it together.” Now we are looking for entities, and the
Google Knowledge Graph is that kind of demonstration of where things are going forward.
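The entity lookups Danny walks through can be sketched as a toy knowledge-graph session. The data and the keyword-matching logic here are purely illustrative, nothing like Google’s actual systems:

```python
# Toy sketch of entity-based, conversational answering: a query
# resolves to an entity with typed attributes, and follow-up pronouns
# resolve to the entity currently in context. All data and matching
# logic here are illustrative only.
KNOWLEDGE_GRAPH = {
    "Barack Obama": {"type": "Person", "height": "6 ft 1 in",
                     "spouse": "Michelle Obama"},
    "Michelle Obama": {"type": "Person", "spouse": "Barack Obama"},
}

class ConversationalSearch:
    def __init__(self):
        self.context = None  # the entity that "he"/"she" refers to

    def ask(self, question):
        q = question.lower()
        # A direct mention of a known entity updates the context.
        for name in KNOWLEDGE_GRAPH:
            if name.lower() in q:
                self.context = name
        if self.context is None:
            return "No entity in context."
        facts = KNOWLEDGE_GRAPH[self.context]
        # "tall" is treated as the height attribute, "married"/"wife"
        # as the spouse attribute -- attributes, not letters on a page.
        if "tall" in q and "height" in facts:
            return f"{self.context} is {facts['height']}."
        if ("married" in q or "wife" in q) and "spouse" in facts:
            answer = f"{self.context} is married to {facts['spouse']}."
            self.context = facts["spouse"]  # "she" now means the spouse
            return answer
        return f"{self.context} is a {facts['type']}."

session = ConversationalSearch()
assert session.ask("Barack Obama") == "Barack Obama is a Person."
assert session.ask("How tall is he?") == "Barack Obama is 6 ft 1 in."
assert session.ask("Who is he married to?") == "Barack Obama is married to Michelle Obama."
```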
That’s all very exciting as well, because, for one thing as a marketer, it’s always exciting
when your space changes because if you’re staying on top of things and you’re seeing
where it’s going, there are always new opportunities that come along. It’s also exciting because
some of these things are broken and they don’t work as well, so this has the opportunity
to better reward things that are coming along. It’s a little scary though because as Google
learns about entities and it learns about things like facts, it also decides that, “You
know what, you’re looking for movies in a place. I have a database of all those movies.
I no longer need to point at a web page that has that sort of stuff.” The big takeaway
from that is, if your job is just creating web pages that are all about known facts that
are out there, it’s going to get harder, because people are no longer going to get pointed
off of Google to your facts. People are going to get pointed to facts that Google
can answer directly. Your job is to make sure that you always have the information that
Google doesn’t have, the facts that aren’t easily found that are out there. As for Search 6.0, it involved this PRISM
system, but we can’t talk about that anymore, so that’s sort of gone away, and we’ll leave
that off. A few years from now, it won’t make any sense. Right now, hopefully, it’s still
very timely. I think that’s probably it. So I thank you
for your indulgence with my first solo Whiteboard Friday. I hope I didn’t go too fast. I hope
that all makes sense, and thank you very much.

One Comment

  • Oremo Ochillo

Wow Danny, how did they ever pull you away from Search Engine Land? Better yet, how did they trick you into leaving California to come to Seattle?
