Searching Google Topics


A tip that runs a query against some of the available Google API specialty topics.


Google doesn't talk about it much, but it does offer specialty web searches. I'm not talking about searches limited to a certain domain; I'm talking about searches devoted to a particular topic. The Google API makes four of these available: the U.S. Government, Linux, BSD, and Macintosh.

In this tip, we'll look at a program that takes a query from a web form and reports, for each specialty topic and for Google as a whole, how many results the query returns and what the top result is.
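The heart of the program is one search per topic. Here is a minimal, network-free sketch of that loop, assuming a hypothetical `$search` callback that stands in for the SOAP call `doGoogleSearch` and returns the estimated result count for a query and a restrict value. The empty restrict string means no topic restriction, that is, all of Google.

```perl
use strict;
use warnings;

# Restrict values used by the tip, mapped to display names.
# '' means "no restriction", i.e. all of Google.
my %TOPICS = (
    ''       => 'All of Google',
    unclesam => 'U.S. Government',
    linux    => 'Linux',
    mac      => 'Macintosh',
    bsd      => 'FreeBSD',
);

# topic_counts($query, $search)
# $search is a hypothetical code reference standing in for the real
# API call: it receives ($query, $restrict) and returns a count.
# Returns a hash reference of display name => count.
sub topic_counts {
    my ($query, $search) = @_;
    my %counts;
    for my $restrict (sort keys %TOPICS) {
        $counts{ $TOPICS{$restrict} } = $search->($query, $restrict);
    }
    return \%counts;
}
```

Passing a wrapper around the real API call would fill in the script's Count column; passing a stub lets you exercise the loop offline.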

Why Topic Search?

Why would you want to search by topic? Because Google currently indexes over 3 billion pages. Unless your query is very specific, you might find yourself with far too many results. If you narrow your search by topic, you can get good results without having to pin your query down exactly.

You can also use topic search for some decidedly unscientific research. Which topic contains more occurrences of the phrase "open source"? Which contains the most pages from .edu (educational) domains? Which topic, Macintosh or FreeBSD, has more on user interfaces? Which topic holds the most for Monty Python fans?

The Code

#!/usr/local/bin/perl
# gootopic.cgi
# Queries across Google Topics (and All of Google), returning
# number of results and top result for each topic.
# gootopic.cgi is called as a CGI with form input

use strict;
use SOAP::Lite;
use CGI qw/:standard *table/;

# Your Google API developer's key
my $google_key = 'insert key here';

# Location of the GoogleSearch WSDL file
my $google_wsdl = "./GoogleSearch.wsdl";

# Google Topics: restrict value => display name
# ('' applies no restriction, i.e. searches all of Google)
my %topics = (
  ''       => 'All of Google',
  unclesam => 'U.S. Government',
  linux    => 'Linux',
  mac      => 'Macintosh',
  bsd      => 'FreeBSD'
);

# Display the query form
print
  header( ),
  start_html("GooTopic"),
  h1("GooTopic"),
  start_form(-method=>'GET'),
  'Query: ', textfield(-name=>'query'), '   ',
  submit(-name=>'submit', -value=>'Search'),
  end_form( ), p( );

my $google_search = SOAP::Lite->service("file:$google_wsdl");

# Perform the queries, one for each topic area
if (param('query')) {
  print
    start_table({-cellpadding=>'10', -border=>'1'}),
    Tr([th({-align=>'left'}, ['Topic', 'Count', 'Top Result'])]);

  # Sorted for a stable row order; '' sorts first, so All of Google leads
  foreach my $topic (sort keys %topics) {
    # Arguments: key, query, start, maxResults, filter,
    # restrict (the topic), safeSearch, lr, ie, oe
    my $results = $google_search ->
      doGoogleSearch(
        $google_key, param('query'), 0, 10, "false", $topic, "false",
        "", "latin1", "latin1"
      );

    my $result_count = $results->{'estimatedTotalResultsCount'};
    my $top_result = 'no results';

    if ( $result_count ) {
      # First element of the result list is the top result
      my $t = $results->{'resultElements'}[0];
      $top_result =
        b($t->{title}||'no title') . br( ) .
        a({href=>$t->{URL}}, $t->{URL}) . br( ) .
        i($t->{snippet}||'no snippet');
    }

    # Output one table row per topic
    print Tr([ td([
      $topics{$topic},
      $result_count,
      $top_result
    ])
    ]);
  }

  print end_table( );
}

print end_html( );
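The result hash the script reads back from `doGoogleSearch` carries `estimatedTotalResultsCount` and `resultElements`, an array of hashes with `title`, `URL`, and `snippet`. So that the logic can be tried without an API key, here is a sketch that works on an already-parsed result of that shape (an assumption taken from the fields the script uses) and returns the top result as plain text instead of HTML. It also falls back to "no results" when the count is nonzero but the element list is empty.

```perl
use strict;
use warnings;

# top_result_text($results)
# $results mirrors the hash the tip reads from doGoogleSearch:
#   estimatedTotalResultsCount => number
#   resultElements => [ { title => ..., URL => ..., snippet => ... }, ... ]
# Returns (count, summary), where summary is plain text.
sub top_result_text {
    my ($results) = @_;
    my $count    = $results->{estimatedTotalResultsCount} || 0;
    my $elements = $results->{resultElements} || [];

    # Same fallback text the CGI script shows when nothing is found.
    return ($count, 'no results') unless $count && @$elements;

    my $t = $elements->[0];
    my $summary = join "\n",
        ($t->{title}   || 'no title'),
        ($t->{URL}     || ''),
        ($t->{snippet} || 'no snippet');
    return ($count, $summary);
}
```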

Running the Tip

The form is built into the script, so just call the script by its URL. For example, if I were running the program on researchbuzz.com under the name gootopic.cgi, my URL might look like http://www.researchbuzz.com/cgi-bin/gootopic.cgi.
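Because the form submits with GET and the script only looks at the `query` parameter, you can also run (or bookmark) a search by putting the query directly in the URL. Here's a small sketch using core Perl only. The base URL is the example one above, and the encoding rule (escape everything except the RFC 3986 unreserved characters) is our own choice. It assumes byte strings, such as Latin-1 text matching the script's `latin1` setting.

```perl
use strict;
use warnings;

# Percent-encode every byte except RFC 3986 unreserved characters.
sub uri_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# gootopic_url($base, $query): URL that runs the tip for $query.
sub gootopic_url {
    my ($base, $query) = @_;
    return $base . '?query=' . uri_encode($query);
}

print gootopic_url('http://www.researchbuzz.com/cgi-bin/gootopic.cgi',
                   'user interface'), "\n";
```

This prints http://www.researchbuzz.com/cgi-bin/gootopic.cgi?query=user%20interface.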

Provide a query and the script will search for it in each special topic area, giving you an overall ("All of Google") count, a count for each topic area, and the top result for each. Figure 6-21 shows a sample run for "user interface", with Macintosh coming out on top.
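The script prints its rows as it goes and doesn't pick a winner. If you want to find the leading topic automatically, a sketch might sort the counts and leave out the unrestricted "All of Google" row, which will normally be largest. It assumes you have already collected the counts into a hash of display name => count; that hash is our assumption, not something the script builds.

```perl
use strict;
use warnings;

# rank_topics(\%counts): topic names ordered by result count, highest first.
# 'All of Google' is excluded so only the specialty topics compete;
# ties are broken alphabetically so the order is stable.
sub rank_topics {
    my ($counts) = @_;
    my @topics = grep { $_ ne 'All of Google' } keys %$counts;
    return sort {
        ($counts->{$b} || 0) <=> ($counts->{$a} || 0) or $a cmp $b
    } @topics;
}
```

The first element of the returned list is the topic that "comes out on top".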

Figure 6-21. Google API topic search for "user interface"

Search Ideas

Finding out how many pages each topic holds for particular top-level domains (e.g., .com, .edu, .uk) is rather interesting. Query for inurl:xx site:xx, where xx is the top-level domain you're interested in. For example, inurl:va site:va searches for any of the Vatican's pages in the various topics; there aren't any. inurl:mil site:mil finds an overwhelming number of results in the U.S. Government special topic, which is no surprise.
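Building the inurl:xx site:xx query for each domain you want to try takes only a few lines. In this sketch, the list of top-level domains is just the examples mentioned above.

```perl
use strict;
use warnings;

# tld_query($tld): the "inurl:xx site:xx" query for one top-level domain.
sub tld_query {
    my ($tld) = @_;
    $tld =~ s/^\.//;    # accept ".edu" as well as "edu"
    return "inurl:$tld site:$tld";
}

# Example TLDs from the text; paste each query into the form.
print "$_\n" for map { tld_query($_) } qw(.com .edu .uk va mil);
```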

If you are in the mood for a party game, try to find the weirdest possible queries that return results in all the special topics. "Papa Smurf" is as good a query as any. In fact, at this writing, that search has more results in the U.S. Government specialty search than in the others.
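To check whether a query qualifies for the game, confirm that every specialty topic returns at least one result. Here's a sketch that assumes you've collected the counts into a hash of display name => count (a hypothetical shape; the script itself only prints them) and uses the topic names from the script.

```perl
use strict;
use warnings;

# The four specialty topics offered by the API, by display name.
my @SPECIALTY = ('U.S. Government', 'Linux', 'Macintosh', 'FreeBSD');

# in_all_topics(\%counts): true if every specialty topic has a result.
sub in_all_topics {
    my ($counts) = @_;
    for my $topic (@SPECIALTY) {
        return 0 unless ($counts->{$topic} || 0) > 0;
    }
    return 1;
}
```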