Preface
Search engines for large collections of data preceded the World Wide Web by decades. There were the massive library catalogs, hand-typed with painstaking precision on index cards and eventually, to varying degrees, automated. There were the large data collections of professional information companies such as Dialog and LexisNexis. And then there are the still-extant private, expensive medical, real estate, and legal search services.
Those data collections were not always easy to search, but with a little finesse and a lot of patience, it was always possible to search them thoroughly. Information was grouped according to established ontologies, and data was preformatted according to particular guidelines.
Then came the Web.
Information on the Web, as anyone who's ever looked at half a dozen web pages knows, is not all formatted the same way. Nor is it necessarily accurate. Nor up to date. Nor spellchecked. Nonetheless, search engines cropped up, trying to make sense of the rapidly increasing index of information online. Eventually, special syntaxes were added for searching common parts of the average web page (such as the title or URL). Search engines evolved rapidly, trying to encompass all the nuances of the billions of documents online, and they continue to evolve today.
Google™ threw its hat into the ring in 1998. The second incarnation of a search engine service known as BackRub, Google took its name from a play on the word "googol," a one followed by a hundred zeros. From the beginning, Google was different from the other major search engines online: AltaVista, Excite, HotBot, and others.
Was it the technology? Partially. The relevance of Google's search results was outstanding and worthy of comment. But more than that, Google's focus and more human face made it stand out online.
Given its friendly presentation and constantly expanding set of options, it's no surprise that Google continues to win fans. There are weblogs devoted to it. Search engine newsletters, such as ResearchBuzz, spend a lot of time covering Google. Legions of devoted fans spend hours uncovering undocumented features, creating games (like Google whacking), and even coining new words (like "Googling," the practice of checking out a prospective date or hire via Google's search engine).
In April 2002, Google reached out to its fan base by offering the Google API. The Google API gives developers a legal way to access Google's search results with automated queries (any other way of accessing Google's search results with automated software is against Google's Terms of Service).