How Search Engines Really Work

Arthur C. Clarke once wrote that “any sufficiently advanced technology is indistinguishable from magic,” an insight that sheds a great deal of light on why our historical predecessors, without access to much of the knowledge we take for granted today, believed some of what they did. But it also applies to contemporary technologies, some of which we depend upon greatly yet understand only in part (or perhaps not at all).

The evolution of the word “Google” from proper noun to verb corresponds with the increasing disconnect between web users and search technology. Ten years ago, searching for content on the web was a difficult process, but today one has only to enter a few words into Google’s search bar, and presto (magical incantation intended): instant and accurate results. As much as this might seem like magic, it’s a thoroughly mundane, albeit ingenious, technology at work. But if search engine technology is indistinguishable from magic, the process of optimizing web content for search engines will seem just as mysterious. Unfortunately, it’s difficult to trust what we don’t understand, and mistrust breeds the very kind of problems that are rampant in the search engine optimization industry: myths, abuses, and profit for those who would rather be seen as magicians than marketers.

Fortunately, we know enough about how search engines work to optimize our content with words, not wands. While there is some value in examining the myths and abuses of SEO, I think it makes sense to first explore how search engines themselves work.

How Google Works

Ultimately, Google’s purpose is to index and rank web content in order to help searchers find what they are looking for. While this is done, in part, by organizing pages on the basis of authority, the goal of Google’s increasingly sophisticated algorithm is to understand the particular queries users submit (which are more likely to be specific than general, like “synthetic insulation shell” rather than “coat”) in order to direct them to the best source for the information they need. I like the way Alexis Madrigal put it in a recent Atlantic Monthly article. While he was writing primarily about online matchmaking, I think he gets right at the heart of what Google is all about without being too technical:

“If only you could Google your way to The One. The search engine, in its own profane way, is a kadosh generator. Its primary goal is to find the perfect Web page for you out of all the Web pages in the world, to elevate it to No. 1.”

So how does Google know which pages are the most authoritative? Actually, Google outsources some of this work to us. Google’s PageRank algorithm (named for cofounder Larry Page) took an entirely new approach: rather than calculating the frequency of keywords within a page’s content to discern which web pages were authoritative on any given subject, it ranked pages purely on the basis of their incoming links. Each link counts as a vote, and the votes are weighted: the more important a website is (the more incoming links it has), the more influential its outgoing links will be. So a link from the New York Times website, which has a PageRank of 9 out of 10, will have a greater influence over the PageRank of the site being linked to than one from a local news source, like wral.com, which has a PageRank of 7.

PageRank ranks web pages based upon the number and influence of incoming links.
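
To make the idea of weighted votes concrete, here is a minimal Python sketch of the iterative calculation usually described as PageRank. It is an illustration under assumptions, not Google’s production code: the page names and the link graph are made up, the damping factor of 0.85 is the value commonly cited from the original PageRank paper, and the scores it returns are fractions that sum to 1 rather than the 0 to 10 figures quoted above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores for a small link graph.

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: big_news attracts more links than local_news.
links = {
    "blog1": ["big_news"],
    "blog2": ["big_news"],
    "blog3": ["big_news", "local_news"],
    "big_news": ["site_a"],
    "local_news": ["site_b"],
    "site_a": [],
    "site_b": [],
}

ranks = pagerank(links)
for page, value in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {value:.3f}")
```

In this toy graph, site_a ends up with a higher score than site_b even though each has exactly one incoming link, because its single link comes from the more heavily linked page.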

Authority or Influence?

But PageRank is only one piece of the authority puzzle. Because it is primarily concerned with scoring a website based upon the volume of its incoming links, PageRank isn’t as much an indicator of authority over a particular subject as it is of authority in general, so let’s call that “influence” instead. And this differentiation is really for the best. After all, even though the New York Times is a nationally trusted news source, you probably wouldn’t expect it to be a better source for information on SEO than, say, this website, even though Newfangled.com’s PageRank is only 6. (Go ahead and search for “how to do SEO.” There we are, the 5th result on the first page, but the New York Times is nowhere to be seen.) By balancing PageRank with its constantly changing index of the web’s content, Google can provide search results that represent the most influential and authoritative sources, even as those sources shift in either respect. So a site with a lower PageRank, or less overall influence on the web, could have much greater authority over a particular subject. This insight is what Chris Anderson and Clay Shirky had in mind when they popularized the idea of the long tail.
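
A toy example may help show how a less influential site can still win for a specific query. The sketch below is purely hypothetical: the relevance measure (the share of query words found in a page’s text), the way it is blended with influence, and the two sample pages are all assumptions made for illustration, and none of it reflects how Google actually combines its signals. The influence numbers simply borrow the 9 and 6 mentioned above.

```python
def relevance(query, text):
    """Share of query words that appear in the page text (a crude stand-in)."""
    query_words = query.lower().split()
    if not query_words:
        return 0.0
    page_words = set(text.lower().split())
    matches = sum(1 for word in query_words if word in page_words)
    return matches / len(query_words)


def score(query, page):
    """Hypothetical blend: topical relevance, boosted slightly by influence."""
    return relevance(query, page["text"]) * (1 + page["influence"] / 10)


# Made-up pages; "influence" stands in for a 0 to 10 PageRank-style figure.
pages = [
    {"name": "national_paper", "influence": 9,
     "text": "world news politics business sports"},
    {"name": "seo_blog", "influence": 6,
     "text": "how to do seo for your website content"},
]


def rank_pages(query):
    """Return page names ordered from best to worst match for the query."""
    ordered = sorted(pages, key=lambda page: score(query, page), reverse=True)
    return [page["name"] for page in ordered]


print(rank_pages("how to do seo"))
print(rank_pages("world news"))
```

With these made-up inputs, the specialist page comes first for the SEO query despite its lower influence, while the national paper comes first for the general news query.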

It is also this differentiation that makes search engine optimization possible. Being in control of “on page” factors—those that frame a page’s content using metadata, heading specifications, friendly links, etc.—enables you to compete in the marketplace of authority. So, in my next post, I’ll cover just that. Stay tuned…

2 thoughts on “How Search Engines Really Work”

  1. Chris Butler

    Tio! 
    You raise a great question. Other than the crowdsourcing approach to guarding truth online–which shouldn’t be taken lightly as it seems to work well for Wikipedia–I’m not sure of any larger services that exist to do just that. Factcheck.org is probably a needed service as no news outlet is going to be without some kind of bias. The only other thing I can think of that attempts to aggregate opinion is metacritic.com, which gathers the various critical entries for books/films/music/etc. so you can get an overview of how they are received all over.
    So, who watches the watchers? GREAT question. I wish I had the answer!
    Chris

  2. Tio

    Hello Chris,
    Thank you for that explanation.  It seems to me your distinction between influence and authority is critically important.  My interest may be no more than tangential (if that) to many of your readers.  While influence would perhaps be more important to a service provider, accuracy might be more important to a consumer like me.  Although the NYT certainly has a great reputation, there are some who think that McClatchy and Knight-Ridder did a better job of sifting fact from fiction during the run-up to and conduct of much of the Iraq war.  Others think that while that may be true, NYT at least tried, while other news services (perhaps Fox?) simply accepted disinformation, ahem, information disseminated by the then federal administration without concern for veracity.
    Can assurance of authority be had by an end user in preference to assurance of influence?  I sometimes refer to factcheck.org re accuracy of particular stories in the news.  Is there any service for checking the statistical reliability of websites for facts asserted on those websites?  And if so . . . who watches the watchers?
