Performing Proximity Searches


GAPS performs a proximity check between two words.

Sometimes it's advantageous to search both forward and backward. For example, if you're doing genealogy research, you might find your uncle John Smith listed as either John Smith or Smith John. Similarly, some pages might include John's middle initial: John Q Smith or Smith John Q.

If all you're after is query permutations, the Permute tip [Tip #62] might do the trick.

You might also need to find concepts that appear near each other but don't form a phrase. For example, you might want to learn about keeping squirrels out of your bird feeder. Attempts to express this idea as an exact phrase might not work, while simply searching for several words might not return specific enough results.

GAPS, created by Kevin Shay, allows you to run searches both forward and backward and within a certain number of spaces of each other. GAPS stands for "Google API Proximity Search," and that's exactly what this application is: a way to search for topics within a few words of each other without having to run several queries in a row. The program runs the queries and organizes the results automatically.

You enter two terms (there is an option to add more terms that will not be searched for proximity) and specify how far apart you want them (1, 2, or 3 words). You can specify that the words be found only in the order you request (wordA, wordB) or in either order (wordA, wordB and wordB, wordA). You can also specify how many results you want and how they are sorted (by title, URL, ranking, or proximity).
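To make the idea of "several queries in a row" concrete, here is a minimal Python sketch of how such a set of queries could be generated. It is not the GAPS source (which is written separately and linked later in this hack). The function name `proximity_queries`, its parameters, and the use of Google's full-word `*` wildcard inside a quoted phrase are all assumptions made for illustration, as is the choice to treat a distance of 1 as "adjacent words."

```python
def proximity_queries(word_a, word_b, max_distance=3, either_order=False, extra_terms=""):
    """Build one quoted phrase query per distance and word order.

    Hypothetical helper: assumes a distance of 1 means the words are adjacent
    and that each additional step is expressed with one "*" wildcard.
    """
    if not 1 <= max_distance <= 3:
        raise ValueError("max_distance must be 1, 2, or 3")

    # Forward order always; reverse order only when requested.
    orders = [(word_a, word_b)]
    if either_order:
        orders.append((word_b, word_a))

    queries = []
    for distance in range(1, max_distance + 1):
        gap = ["*"] * (distance - 1)  # wildcards standing in for the words between
        for first, second in orders:
            query = '"' + " ".join([first, *gap, second]) + '"'
            if extra_terms:
                # Extra terms are appended outside the phrase, so they are
                # not part of the proximity check.
                query += " " + extra_terms
            queries.append((distance, query))
    return queries


if __name__ == "__main__":
    for distance, query in proximity_queries("google", "tips", 2, either_order=True):
        print(distance, query)
```

Each generated query would then be sent to the search engine in turn, and the distance attached to it tells you how far apart the words were in any page it returns.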

Search results are formatted much like regular Google results, only they include a distance ranking beside each title. The distance ranking, between one and three, specifies how far apart the two query words were on the page. Figure 6-14 shows a GAPS search for google and tips within two words of one another, order intact.

Figure 6-14. GAPS search for "google" and "tips" within two words of one another

Click the distance ranking link to pass the generated query on to Google directly.
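The distance ranking and the proximity sort order can be pictured with a short Python sketch. This is only an illustration under stated assumptions: it measures distance by counting word positions in a piece of text, whereas GAPS may well derive the distance from whichever generated query matched. The names `word_distance` and `sort_by_proximity`, and the `snippet` field on each result, are hypothetical.

```python
import re


def word_distance(text, word_a, word_b, ordered=False):
    """Return the smallest gap, in word positions, between the two words.

    Returns None if the words never occur together in the text (or, when
    ordered=True, if word_b never follows word_a).
    """
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    gaps = [
        (b - a) if ordered else abs(b - a)
        for a in pos_a
        for b in pos_b
        if (b > a if ordered else b != a)
    ]
    return min(gaps) if gaps else None


def sort_by_proximity(results, word_a, word_b):
    """Sort results by distance, keeping the original ranking as a tiebreaker."""
    scored = []
    for rank, result in enumerate(results, start=1):
        distance = word_distance(result["snippet"], word_a, word_b)
        if distance is not None:
            scored.append({**result, "rank": rank, "distance": distance})
    return sorted(scored, key=lambda r: (r["distance"], r["rank"]))


if __name__ == "__main__":
    # Made-up results, for illustration only.
    sample = [
        {"title": "Page one", "snippet": "Google has many search tips"},
        {"title": "Page two", "snippet": "Handy Google tips"},
    ]
    for r in sort_by_proximity(sample, "google", "tips"):
        print(r["distance"], r["title"])
```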

Making the Most of GAPS

GAPS works best when an ordinary search returns pages on which your words appear but are only loosely related to one another, or not related at all. For example, if you're looking for information on Google and search engine optimization, you might find that searching for the words Google and SEO doesn't find the results you want, while using GAPS to search for the words Google and SEO within three words of each other finds material focused much more on search engine optimization for Google.

GAPS also works well when you're searching for information about two famous people who might often appear on the same page, though not necessarily in proximity to each other. For example, you might want information on Bill Clinton and Alan Greenspan, but might find that you're getting too many pages that happen to list the two of them. By searching for their names in proximity to each other, you'll get better results.

Finally, you might find GAPS useful in medical research. Search results often include "index pages" that list several symptoms. Searching for symptoms or other medical terms within a few words of each other can help you find more relevant results. Note that this technique will take some experimentation. Many pages about medical conditions contain long lists of symptoms and effects, and there's no guarantee that one symptom will appear within a few words of another.

The Code

The GAPS source code is rather lengthy so we're not making it available here. You can, however, get it online at http://www.staggernation.com/gaps/readme.html.

Other Staggernation Scripts

If you like GAPS, you might want to try a couple of other scripts from Staggernation:

  • GAWSH (http://www.staggernation.com/gawsh/)
    Stands for Google API Web Search by Host. This program allows you to enter a query and get a list of domains that contain information on that query. Click on the triangle beside any domain name, and you'll get a list of pages in that domain that match your query. This program uses DHTML, which means it'll only work with Internet Explorer or Mozilla/Netscape.
  • GARBO (http://www.staggernation.com/garbo/)
    Stands for Google API Relation Browsing Outliner. Like GAWSH, this program uses DHTML, so it'll only work with Mozilla and Internet Explorer. When you enter a URL, GARBO searches for either pages that link to that URL or pages related to it. Run a search and you'll get a list of URLs with triangles beside them. Click on a triangle, and you'll get a list of pages that either link to or are related to the URL you clicked, depending on which option you chose in the initial query.
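The by-host listing that GAWSH presents can be pictured as a simple grouping of result URLs by hostname. The Python sketch below is not taken from the GAWSH source; the function name `group_by_host` and the sample URLs are made up for illustration, and it assumes "domain" simply means the host portion of each URL.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls):
    """Group result URLs under the host they belong to (hypothetical helper)."""
    groups = defaultdict(list)
    for url in urls:
        # urlsplit().hostname is lowercased and excludes any port number.
        host = urlsplit(url).hostname or ""
        groups[host].append(url)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "http://www.example.com/birds/feeders.html",
        "http://www.example.org/squirrels.html",
        "http://www.example.com/birds/seed.html",
    ]
    for host, pages in group_by_host(sample).items():
        print(host)
        for page in pages:
            print("   ", page)
```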