Being a Good Search Engine Citizen

Five don'ts and one do for getting your site indexed by Google.

A high ranking in Google can mean a great deal of traffic. Because of that, there are lots of people spending lots of time trying to figure out the infallible way to get a high ranking from Google. Add this. Remove that. Get a link from this. Don't post a link to that.


Submitting your site to Google to be indexed is simple enough. Google's got a site submission form (https://www.google.com/addurl.html), though they say if your site has at least a few inbound links (other sites that link to you), they should find you that way. In fact, Google encourages URL submitters to get listed on The Open Directory Project (DMOZ, http://www.dmoz.org/) or Yahoo! (http://www.yahoo.com/).

Nobody knows the holy-grail secret of a high PageRank without effort. Google uses a variety of elements, including page popularity, to determine PageRank, and PageRank is one of the factors determining how high up a page appears in search results. But there are several things you should not do, and one big thing you absolutely should.

Does breaking one of these rules mean that you're automatically going to be thrown out of Google's index? No; there are over 2 billion pages in Google's index at this writing, and it's unlikely that they'll find out about your rule-breaking immediately. But there's a good chance they'll find out eventually. Is it worth having your site removed from the most popular search engine on the Internet?

Thou shalt not:

  • Cloak. "Cloaking" is when your web site is set up such that search engine spiders get different pages from those human surfers get. How does the web site know which are the spiders and which are the humans? By identifying the spider's User Agent or IP address - the latter being the more reliable method. An IP (Internet Protocol) address is the network address from which a spider connects. Everything that connects to the Internet has an IP address. Sometimes the IP address is always the same, as with web sites. Sometimes the IP address changes - that's called a dynamic address. (If you use a dial-up modem, chances are good that every time you log on to the Internet your IP address is different.) A "User Agent" is the way a program that surfs the Web identifies itself. Internet browsers like Mozilla send User Agents, as do search engine spiders. There are literally dozens of different kinds of User Agents; see the Web Robots Database (http://www.robotstxt.org/wc/active.html) for an extensive list. Advocates of cloaking claim that it is useful for optimizing content specifically for spiders. Critics claim that cloaking is an easy way to misrepresent site content - feeding a spider a page that's designed to get the site hits for pudding cups when actually it's all about baseball bats. You can get more details about cloaking and different perspectives on it at http://pandecta.com/, http://www.apromotionguide.com/cloaking.html, and http://www.webopedia.com/TERM/C/cloaking.html.
  • Hide text. Text is hidden by putting words or links in a web page that are the same color as the page's background - putting white words on a white background, for example. This is also called "fontmatching." Why would you do this? Because a search engine spider could read the words you've hidden on the page while a human visitor couldn't. Again, doing this and getting caught could get you banned from Google's index, so don't. That goes for other page content tricks too, like title stacking (putting multiple copies of a title tag on one page), putting keywords in comment tags, keyword stuffing (putting multiple copies of keywords in a very small font on a page), putting keywords not relevant to your site in your META tags, and so on. Google doesn't provide an exhaustive list of these types of tricks on their site, but any attempt to circumvent or fool their ranking system is likely to be frowned upon. Their attitude is more like: "You can do anything you want to with your pages, and we can do anything we want to with our index - like exclude your pages."

  • Use doorway pages. Sometimes doorway pages are called "gateway pages." These are pages that are aimed very specifically at one topic, which don't have a lot of their own original content, and which lead to the main page of a site (thus the name doorway pages). For example, say you have a site devoted to cooking. You create doorway pages for several genres of cooking - French cooking, Chinese cooking, vegetarian cooking, etc. The pages contain terms and META tags relevant to each genre, but most of the text is a copy of all the other doorway pages, and all each page does is point to your main site. This is against Google's rules and annoying to the Google user; don't do it. You can learn more about doorway pages at http://searchenginewatch.com/webmasters/bridge.html or http://www.searchengineguide.com/whalen/2002/0530_jw1.html.
  • Check your link rank with automated queries. Using automated queries (except for the sanctioned Google API) is against Google's Terms of Service anyway. Using an automated query to check your PageRank every 12 seconds is triple bad; it's not what the search engine was built for and Google probably considers it a waste of their time and resources.
  • Link to "bad neighborhoods". Bad neighborhoods are those sites that exist only to propagate links. Because link popularity is one aspect of how Google determines PageRank, some sites have set up "link farms" - sites that exist only for the purpose of building site popularity with bunches of links. The links are not topical, like a specialty subject index, and they're not well-reviewed, like Yahoo!; they're just a pile of links. Another example of a "bad neighborhood" is a general FFA page. FFA stands for "free for all"; it's a page where anyone can add their link. Linking to pages like that is grounds for a penalty from Google. Now, what happens if a page like that links to you? Will Google penalize your page? No. Google accepts that you have no control over who links to your site.
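
If you've inherited a site and aren't sure whether it's guilty of any of the page content tricks above, a small script can give you a first look. What follows is only a rough sketch in Python, not anything Google provides or uses: it counts title tags (to catch title stacking) and flags elements whose inline style sets the text color to exactly the same value as the background color (the simplest form of fontmatching). The names PageAudit and audit are made up for this illustration, and the check deliberately ignores stylesheets, inherited colors, and near-matching shades, so a clean result doesn't prove a page is clean.

```python
from html.parser import HTMLParser


def _style_props(style):
    """Split an inline style string into a {property: value} dict, lowercased."""
    props = {}
    for decl in style.split(";"):
        if ":" in decl:
            name, value = decl.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props


class PageAudit(HTMLParser):
    """Hypothetical helper: counts <title> tags and notes fontmatched elements."""

    def __init__(self):
        super().__init__()
        self.title_count = 0
        self.hidden_text_tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.title_count += 1
        # An attribute with no value arrives as None, so fall back to "".
        props = _style_props(dict(attrs).get("style") or "")
        fg = props.get("color")
        bg = props.get("background-color")
        # Only an exact match on the same element is flagged (an assumption
        # made to keep the sketch short).
        if fg is not None and fg == bg:
            self.hidden_text_tags.append(tag)


def audit(html):
    """Return the title count and the tags that look fontmatched."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    return {
        "title_count": parser.title_count,
        "hidden_text_tags": parser.hidden_text_tags,
    }
```

Feed audit() the HTML of one of your own pages as a string. A title_count above 1 or a non-empty hidden_text_tags list means the page deserves a closer look by hand.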

Thou shalt:

  • Create great content. All the HTML contortions in the world will do you little good if you've got lousy, old, or limited content. If you create great content and promote it without playing search engine games, you'll get noticed and you'll get links. Remember Sturgeon's Law: "Ninety percent of everything is crud." Why not make your web site an exception?

What Happens if You Reform?

Maybe you've got a site that's not exactly the work of a good search engine citizen. Maybe you've got 500 doorway pages, 10 title tags per page, and enough hidden text to make an O'Reilly Pocket Guide. But maybe now you want to reform. You want to have a clean, lovely site and leave the doorway pages to Better Homes and Gardens. Are you doomed? Will Google ban your site for the rest of its life?

No. The first thing you need to do is clean up your site - remove all traces of rule breaking. Next, send a note about your site changes and the URL to help@google.com. Note that Google really doesn't have the resources to answer every email about why they did or didn't index a site - otherwise, they'd be answering emails all day - and there's no guarantee that they will reindex your kinder, gentler site. But they will look at your message.

What Happens if You Spot Google Abusers in the Index?

What if some other site that you come across in your Google searching is abusing Google's spider and PageRank mechanism? You have two options. You can send an email to spamreport@google.com or fill out the form at https://www.google.com/contact/spamreport.html. (I'd fill out the form; it reports the abuse in a standard format that Google's used to seeing.)