Searching Article Archives

Google serves as a handy searchable archive for back issues of online publications.

Not all sites have their own search engines, and even the ones that do are sometimes difficult to use. Complicated or incomplete search engines are more pain than gain when attempting to search through archives of published articles. If you follow a couple of rules, Google is handy for finding back issues of published resources.

The trick is to use a common phrase to find the information you're looking for. Let's use the New York Times as an example.

Articles from the NYT

Your first intuition when searching for previously published articles from NYTimes.com might be to simply use site:nytimes.com in your Google query. For example, if I wanted to find articles on George Bush, why not use:

"george bush" site:nytimes.com

This will indeed find the articles mentioning George Bush that are published on NYTimes.com and indexed by Google. What it won't find are the articles produced by the New York Times but republished elsewhere.

While doing research, keep credibility firmly in mind. If you're doing casual research, maybe you don't need to double-check a story to make sure it actually comes from the New York Times, but if you're researching a term paper, double-check the veracity of every article you find that isn't actually on the New York Times site.

What you actually want is a clear identifier, no matter the site of origin, that an article comes from the New York Times. Copyright disclaimers are perfect for the job. A New York Times copyright notice typically reads:

Copyright 2001 The New York Times Company

Of course, this would only find articles from 2001. A simple workaround is to replace the year with a Google full-word wildcard [Tip #13]:

Copyright * The New York Times Company
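
If you collect copyright notices from several publications, swapping the year for a wildcard can be automated. The following is a minimal Python sketch, not a definitive tool: the function names and the assumption that years fall between 1900 and 2099 are illustrative choices, not part of Google's syntax.

```python
import re

# Matches a standalone four-digit year from 1900 to 2099 (an assumed range).
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def generalize_notice(notice: str) -> str:
    """Replace each year in a copyright notice with the * wildcard."""
    return YEAR.sub("*", notice)


def as_phrase(text: str) -> str:
    """Wrap text in double quotes so it is searched as an exact phrase."""
    return f'"{text}"'


if __name__ == "__main__":
    notice = "Copyright 2001 The New York Times Company"
    print(as_phrase(generalize_notice(notice)))
```

A notice with no year in it passes through unchanged, so the same helper works for disclaimers where the date sits elsewhere on the page.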

Let's try that George Bush search again, this time using the snippet of copyright disclaimer instead of the site: restriction:

"Copyright * The New York Times Company" "George Bush"

At this writing, you get over three times as many results for this search as for the earlier attempt.
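
To compare the two approaches side by side, it can help to generate both queries from the same inputs. This Python sketch builds the site-restricted query and the copyright-notice query, then turns each into a search URL. The helper names are hypothetical, and the URL simply places the query in the q parameter of Google's ordinary search page; it does not fetch or count results.

```python
from urllib.parse import urlencode


def phrase(text: str) -> str:
    """Quote text so Google treats it as an exact phrase."""
    return f'"{text}"'


def site_query(topic: str, domain: str) -> str:
    """Restrict the search to pages hosted on one domain."""
    return f"{phrase(topic)} site:{domain}"


def notice_query(topic: str, notice: str) -> str:
    """Search any site for pages carrying the publisher's copyright notice."""
    return f"{phrase(notice)} {phrase(topic)}"


def search_url(query: str) -> str:
    """Build a Google search URL with the query percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    queries = [
        site_query("george bush", "nytimes.com"),
        notice_query("George Bush", "Copyright * The New York Times Company"),
    ]
    for q in queries:
        print(q)
        print(search_url(q))
```

Opening both URLs in a browser lets you compare the result counts yourself, since those counts change over time.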

Magazine Articles

Copyright disclaimers are also useful for finding magazine articles. For example, Scientific American's typical copyright disclaimer looks like this:

Scientific American, Inc. All rights reserved.

(The date appears before the disclaimer, so I just dropped it to avoid having to bother with wildcards.)

Using that disclaimer as a quote-delimited phrase along with a search word - hologram, for example - yields the Google query:

hologram "Scientific American, Inc. All rights reserved."

At this writing, you'll get one result, which seems like a small number for a general query like hologram. When you get fewer results than you'd expect, fall back on using the site: syntax to go back to the originating site itself.

hologram site:sciam.com

In this example, you'll find several results that you can grab from Google's cache but that are no longer available on the Scientific American site.
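
The fall-back rule can be written down as a small decision function. The sketch below is only an illustration under stated assumptions: count_results is a hypothetical callable that you would supply (for instance, a function that returns a result count you looked up by hand), and min_results is an arbitrary threshold for "fewer results than you'd expect". Nothing here queries Google.

```python
from typing import Callable


def choose_query(
    term: str,
    disclaimer: str,
    domain: str,
    count_results: Callable[[str], int],  # hypothetical: supplied by the caller
    min_results: int = 5,                 # arbitrary threshold, adjust to taste
) -> str:
    """Prefer the disclaimer query; fall back to site: when results are sparse."""
    primary = f'{term} "{disclaimer}"'
    if count_results(primary) >= min_results:
        return primary
    # Too few hits: go back to the originating site itself.
    return f"{term} site:{domain}"


if __name__ == "__main__":
    # Stand-in counter that reports a single hit for any query.
    one_hit = lambda query: 1
    print(choose_query(
        "hologram",
        "Scientific American, Inc. All rights reserved.",
        "sciam.com",
        one_hit,
    ))
```

Passing the counter in as a parameter keeps the decision logic separate from however you obtain result counts.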

Most publications that I've come across have some kind of common text string that you can use when searching Google for their archives. Usually it's a copyright disclaimer, and most often it's at the bottom of the page. Use Google to search for that string along with whatever query words you're interested in, and if that doesn't work, fall back on searching for the query words and the publication's domain name with the site: syntax.