Finding the Largest Page

We all know about Feeling Lucky with Google. But how about Feeling Large?

Google sorts your search results by PageRank, which certainly makes sense. Sometimes, however, you may have a different focus in mind and want the results ordered in some other way. Recency is one ordering that comes to mind; size is another.

Just as Google's "I'm Feeling Lucky" button redirects you to the search result with the highest PageRank, this tip sends you directly to the largest page (in kilobytes) among the results it checks.

This tip works rather nicely in combination with repetition [Tip #7].


The Code

#!/usr/local/bin/perl
# goolarge.cgi
# A take-off on "I'm Feeling Lucky"; redirects the browser to the largest
# (size in K) document found in the first n results. n is set by number
# of loops x 10 results per loop.
# goolarge.cgi is called as a CGI with form input

# Your Google API developer's key
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file
my $google_wdsl = "./GoogleSearch.wsdl";

# Number of times to loop, retrieving 10 results at a time
my $loops = 10;

use strict;
use SOAP::Lite;
use CGI qw/:standard/;

# Display the query form
unless (param('query')) {
  print
    header( ),
    start_html("GooLarge"),
    h1("GooLarge"),
    start_form(-method=>'GET'),
    'Query: ', textfield(-name=>'query'),
    '   ',
    submit(-name=>'submit', -value=>"I'm Feeling Large"),
    end_form( ), p( );
}

# Run the query
else {
  my $google_search = SOAP::Lite->service("file:$google_wdsl");
  my($largest_size, $largest_url) = (0, undef);

  for (my $offset = 0; $offset < $loops*10; $offset += 10) {
    my $results = $google_search ->
      doGoogleSearch(
        $google_key, param('query'), $offset,
        10, "false", "", "false", "", "latin1", "latin1"
      );

    # Stop looping once Google has no more results to return
    @{$results->{'resultElements'} || []} or last;

    # Keep track of the largest size and its associated URL
    foreach (@{$results->{'resultElements'}}) {
      # cachedSize looks like "12k"; drop the trailing "k" before comparing
      my $size = substr($_->{cachedSize}, 0, -1);
      $size > $largest_size and
        ($largest_size, $largest_url) = ($size, $_->{URL});
    }
  }

  # Redirect the browser to the largest result, or say so if none was found
  if (defined $largest_url) {
    print redirect($largest_url);
  }
  else {
    print header( ), start_html("GooLarge"), p('No results'), end_html( );
  }
}
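
The heart of the script is the comparison of cachedSize values. The following is a minimal, offline sketch of that step only, not the script itself: the sample URLs and sizes are made up, largest_result is a hypothetical helper name, and it assumes cachedSize arrives as a string such as "12k" (or empty when there is no cached copy).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for one page of API results.
# The URLs and sizes are invented for illustration.
my @result_elements = (
    { URL => 'http://example.com/a', cachedSize => '12k'  },
    { URL => 'http://example.com/b', cachedSize => '101k' },
    { URL => 'http://example.com/c', cachedSize => ''     },  # no cached copy
    { URL => 'http://example.com/d', cachedSize => '9k'   },
);

# Return (size, url) for the largest element, or an empty list
# if no element carries a usable size.
sub largest_result {
    my @elements = @_;
    my ($largest_size, $largest_url) = (0, undef);

    foreach my $element (@elements) {
        # Accept only values shaped like "<digits>k"; skip everything else.
        next unless ($element->{cachedSize} // '') =~ /^(\d+)k$/;
        my $size = $1;

        # Numeric comparison, so 101 beats 12.
        ($largest_size, $largest_url) = ($size, $element->{URL})
            if $size > $largest_size;
    }

    return defined $largest_url ? ($largest_size, $largest_url) : ();
}

my ($size, $url) = largest_result(@result_elements);
print "${size}k $url\n";   # prints "101k http://example.com/b"
```

The comparison has to be numeric (>) rather than string-wise (gt); compared as strings, "12" would sort after "101".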

Running the Tip

Call up the CGI script in your web browser. Enter a query and click the "I'm Feeling Large" button. You'll be transported directly to the largest page matching your query - within the first specified number of results, that is.
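
Because the form submits with GET and the script only looks at the query parameter, a query can also be expressed as a plain URL, which is handy for bookmarks. Here is a small sketch of building such a URL. The server location is an assumption, url_escape and goolarge_url are hypothetical helper names, and the escaping assumes a plain ASCII query.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical location of the installed script; adjust to your own server.
my $base_url = 'http://localhost/cgi-bin/goolarge.cgi';

# Percent-encode everything except the unreserved URL characters.
# Assumes an ASCII query string.
sub url_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Build a URL that runs the given query through goolarge.cgi.
sub goolarge_url {
    my ($query) = @_;
    return $base_url . '?query=' . url_escape($query);
}

print goolarge_url('"state bird" Ohio'), "\n";
# prints http://localhost/cgi-bin/goolarge.cgi?query=%22state%20bird%22%20Ohio
```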

Usage Examples

Perhaps you're looking for bibliographic information for a famous person. You might find that a regular Google search doesn't net you any more than a mention on a plethora of content-light web pages. Running the same query through this tip sometimes turns up pages with extensive bibliographies.

Maybe you're looking for information about a state. Try queries for the state name along with related terms like motto, capital, or state bird.

Hacking the Tip

This tip isn't so much hacked as tweaked. By changing the value assigned to the $loops variable in my $loops = 10;, you can alter the number of results the script checks before redirecting you to what it has found to be the largest. Remember, the maximum number of results is the number of loops multiplied by 10 results per loop. The default of 10 considers the top 100 results. A $loops value of 5 would consider only the top 50; 20, the top 200; and so forth.
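
To make the arithmetic concrete, the sketch below lists the start offsets that would be requested for a few $loops values. It assumes, as the description above implies, that the loop stops before the offset reaches $loops times 10; offsets_for is a hypothetical helper name, not part of goolarge.cgi.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the start offsets requested for a given number of loops,
# at 10 results per request.
sub offsets_for {
    my ($loops) = @_;
    return map { $_ * 10 } 0 .. $loops - 1;
}

foreach my $loops (5, 10, 20) {
    my @offsets = offsets_for($loops);
    printf "loops=%d: %d requests, offsets %d..%d, up to %d results\n",
        $loops, scalar @offsets, $offsets[0], $offsets[-1], @offsets * 10;
}
```

With the default of 10 loops, the last request starts at offset 90 and covers results 91 through 100. Each loop is one API call, so larger values mean a slower redirect.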