How to Check for Copied Content


So how do you determine whether the content is copied or copied with minimal alteration? How do you identify the original source of the content? This can be difficult, but please follow these steps.

• Copy a sentence or a series of several words in the text. It may be necessary to try a few sentences or phrases just to be sure. When deciding what sentence or phrase to copy, try to find a sentence or series of several words without punctuation, unusual characters, or suspicious words that may have replaced the original text.

• Search using Google by putting the entire sentence or phrase within quotation marks inside the search box.

For example, try searching for the sentence [“Many details are omitted or altered while many of the perils that Dorothy encountered in the novel are not at all mentioned in the feature film”] or the phrase [“timid Munchkins come out of hiding to celebrate the demise”]. Sometimes, it is helpful to try the same search without the quotes, e.g. [timid Munchkins come out of hiding to celebrate the demise].

• Compare the pages you find that match the sentence or phrase. Is most of their MC the same? If so, does one clearly come from a highly authoritative source that is known for original content creation (newspaper, magazine, medical foundation, etc.)? Does one source appear to have the earliest publication date? Does one source reasonably seem to be the original?
Use your best judgment. Sometimes it is clear that the content is copied from somewhere, but you cannot tell what the original source is. Or sometimes the content found on the original source has changed enough that searches for sentences or phrases may no longer match the original source. For example, Wikipedia articles can change dramatically over time. Old copies may not match the current content. If you strongly suspect the page you are evaluating is not the original source, go ahead and use the Low quality MC rating.
Sometimes content is intentionally revised to make it difficult to determine that the content has been copied. Content may even be put through a translator to revise it. For example, if the original content is in English, it may be put through a translator twice: first to change it to a foreign language and second to translate it back to English. Text that has been changed in this way will often sound nonsensical.
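
The phrase-selection and search steps above can be sketched in code. This is only an illustration under stated assumptions, not part of the rating process: the helper names (`candidate_phrases`, `exact_query`, `search_url`), the minimum length of eight words, and the rule for what counts as an "unusual character" are all hypothetical choices, and the search URL simply uses the common `https://www.google.com/search?q=` query form.

```python
import re
from urllib.parse import quote_plus


def candidate_phrases(text, min_words=8, max_candidates=3):
    """Return sentences that look safe to search for verbatim.

    Assumption: a "safe" sentence has at least `min_words` words and
    contains only letters, digits, spaces, apostrophes, and hyphens.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = []
    for sentence in sentences:
        body = sentence.rstrip(".!?")
        # Very short sentences match too many unrelated pages.
        if len(body.split()) < min_words:
            continue
        # Skip sentences with internal punctuation or unusual characters.
        if not re.fullmatch(r"[A-Za-z0-9' -]+", body):
            continue
        candidates.append(body)
    # Longer sentences are more distinctive, so list them first.
    candidates.sort(key=lambda s: len(s.split()), reverse=True)
    return candidates[:max_candidates]


def exact_query(phrase):
    """Wrap the phrase in quotation marks for an exact-match search."""
    return f'"{phrase}"'


def search_url(query):
    """Build a search URL for the query (assumed URL form)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    sample = (
        "Notice something odd? Many details are omitted or altered while "
        "many of the perils that Dorothy encountered in the novel are not "
        "at all mentioned in the feature film. It was great, really!"
    )
    for phrase in candidate_phrases(sample):
        print(search_url(exact_query(phrase)))  # search with quotes
        print(search_url(phrase))               # same search without quotes
```

Trying both the quoted and the unquoted query mirrors the advice above: the quoted form finds verbatim copies, while the unquoted form can still surface pages where a few words were altered.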

Any time you find copied content or suspect the page has copied content, please explain in the comment box. Please include the original source (URL or description) if you are able to find it.
We will now walk you through two examples to determine if the content is copied.

Example 1 – No clear original source

There is a paragraph at the top, followed by a line. Then there is an article below.

Notice something a bit funny? Instead of the word "flower" or "flowers," this article uses "f" or "fs" in the text. This page looks suspicious. Let's try to figure out if the content is copied from elsewhere. Let's use the sentence: "Flowers which last only one day, like day lilies, do not dry well." It does not have the odd "fs" abbreviation.


• Do a search on Google with that sentence in quotes: ["Flowers which last only one day, like day lilies, do not dry well."].

You will see that there are many webpages with this sentence, though most use the word "flowers" rather than "fs" in the text. In fact, if you go to the last page of Google results, you'll find this:

"In order to show you the most relevant results, we have omitted some entries very similar to the 25 already displayed. If you like, you can repeat the search with the omitted results included."
Clicking on the blue link will give you over a hundred results for this sentence, many of which contain this article text or links to pages containing this article text. But none of these results seems to be very authoritative, and no single web result looks like it is the original source of the article.
In some cases, even though the original source of content may be difficult or impossible to identify, you can still be fairly certain you have a copy. In the example above, it's highly unlikely that this is the original source of the content. It has the odd "fs" and "f" abbreviations (probably done so that search engines will not detect that this is copied content). There are also copying errors and other alterations on this page. Look at the bottom and you'll see "For Ber Colors: Rapid drying in a very warm, dry and bly-lit place will produce b blossoms; s drying in a more humid spot will produce more muted colors."
While we cannot be sure of the original source, this is clearly a copy. It's actually less helpful than an unaltered copy. The abbreviations make it difficult to understand the text in places. If you see similar issues when rating, please make a note in your comments.
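
To illustrate why a lightly altered copy is still recognizable as a copy, here is a minimal sketch that measures how many short word sequences two texts share. Everything in it is an assumption made for illustration: the function names, the sequence length of four words, and the two sample sentences (which are invented, not taken from the page in this example). It is not a tool used in rating; it only shows that swapping a word for an abbreviation leaves much of the surrounding text intact.

```python
import re


def shingles(text, size=4):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(page_text, other_text, size=4):
    """Fraction of page_text's word sequences that also occur in other_text."""
    page = shingles(page_text, size)
    other = shingles(other_text, size)
    if not page:
        # Text shorter than `size` words has nothing to compare.
        return 0.0
    return len(page & other) / len(page)


if __name__ == "__main__":
    # Hypothetical sentences: the second replaces "flowers" with "fs".
    other_page = (
        "Dried flowers keep their color longest when the flowers "
        "are stored away from direct sunlight"
    )
    suspect_page = (
        "Dried fs keep their color longest when the fs "
        "are stored away from direct sunlight"
    )
    print(overlap(suspect_page, other_page))
```

A high overlap does not say which page is the original; it only suggests that the two pages share text, which is the point at which the comparison of authority and publication dates described earlier comes in.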

Example 2 – Clear original source


This example is from an actual page quality rating task.

Some raters indicated that this page has copied content. Does it? Let's continue through the steps to see.


• Search for this phrase: “A master of creating the illusion of three-dimensional forms and figures on flat walls.” You will find many results. We should try to determine if there is one plausible original source for this content.


• Let's start with our URL. On the landing page, click the "about us" link to see who is responsible for the content of the website: About us page of clear original source. You'll find this information, which shows that this is a very authoritative source for the content:

Since the laying of the Capitol cornerstone by George Washington in 1793, the Architect of the Capitol (AOC) has served the United States as builder and steward of many of the nation's most iconic and indelible landmark buildings. These include the U.S. Capitol, Capitol Visitor Center, Senate Office Buildings, House Office Buildings, Supreme Court, Library of Congress, U.S. Botanic Garden and Capitol Grounds.
Now let's look at this result, which came up when we searched for the phrase on Google: Page with copied content. This is a far less authoritative source. In addition, the Credits section at the bottom of the article cites the Architect of the Capitol: “Images and descriptions online, courtesy Architect of the Capitol.”
There are other copies of this article on the web, but we can see that the original URL is a page on a highly authoritative website for this content and that other sources cite this page. At this point, we can conclude that Clear original source is the original source of the article, though there are many copies on other websites.

