Plagiarism of website content is all too easy – and therefore very common. Content may be stolen by ‘scraper’ sites (spam sites that automatically copy and paste content from other sites) or by individuals, who may reword the odd phrase to avoid detection.
If you own a business website or publish a blog, sites which copy your content and republish it as their own are not just annoying – they may hurt your own site’s ranking in search engines like Google, so fewer people find your site.
Sites can republish your work so quickly that their plagiarized copy might be indexed by search engines before your original version, i.e. it could look as if you had stolen the content from them. In this case your site could be penalized and drop down the search rankings.
Tip: WordPress publishers – use the PubSubHubbub plugin to notify Google and other services of your newly published RSS feed content in real time. This also helps ensure that Google and other search engines know that you are the original content creator.
Another good reason to check for plagiarism is when you plan to buy content from a paid writer or post an article contributed by a guest. You need to be sure their work is not just copied from an article posted elsewhere, or you will face the same search engine penalties as above – and possibly claims for copyright infringement too.
Tip: to find out if anyone is stealing your website’s pictures, use Google Image Search to do a reverse image search.
To find out who is stealing your website content you could just copy a paragraph from your own (or a paid/guest writer’s) article into Google and search for matches. However, this is very time-consuming because such searches are limited in length – you can’t look for the whole article in one go, so you have to search it piece by piece.
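If you do want to try the manual approach, the piece-by-piece searching can at least be prepared automatically. The sketch below is only an illustration: the function name `phrase_queries` and the choice of 8 words per phrase and 5 phrases per article are my own assumptions, not limits published by Google. It picks the opening words of the longer sentences and wraps them in double quotes, which asks the search engine for that exact wording.

```python
import re
from typing import List


def phrase_queries(article: str, words_per_query: int = 8,
                   max_queries: int = 5) -> List[str]:
    """Return a few quoted phrases from the article to paste into a search box.

    words_per_query and max_queries are illustrative defaults, not
    documented search-engine limits.
    """
    # Rough sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    queries: List[str] = []
    for sentence in sentences:
        # Drop double quotes so they don't clash with the ones we add.
        words = sentence.replace('"', "").split()
        if len(words) < words_per_query:
            continue  # too short to be distinctive
        phrase = " ".join(words[:words_per_query])
        queries.append(f'"{phrase}"')
        if len(queries) == max_queries:
            break
    return queries


if __name__ == "__main__":
    sample = ("Short one. This sentence has definitely more than eight "
              "words in it. Another fairly long sentence that also has "
              "plenty of words.")
    for query in phrase_queries(sample):
        print(query)
```

Each printed line can be pasted into the search box as-is. It is still manual work, which is why the dedicated services below are usually the better option.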
A better solution is to use a plagiarism search service – here are four to choose from, both paid and free options:
Copyscape – Probably the best known online plagiarism detection service – ranked the best solution in independent tests (in 2008). The free option finds duplicate content of your web page online – visit Copyscape and type your page URL then press Go.
The free option is limited to 10 results – click a result to see it compared with the content on your site, with blocks of duplicate text highlighted in color. Copyscape Premium is a paid service (5 cents per search) with many extra features. It lifts the limit of 10 results and also lets you paste text into the search box, not just a URL – useful for checking content not yet published, e.g. from guest authors.
Another paid option called CopySentry provides proactive protection – it automatically scans the web daily or weekly and emails you when new copies of your content are found. Costs vary from $4.95 to $19.95 per month for weekly/daily protection respectively (up to 10 pages).
Plagium – Plagium offers a free search for simple text up to 25,000 characters and recently added scanning of text within Facebook postings and Twitter feeds.
Any larger text, or a paragraph-by-paragraph ‘deep search’ analysis, requires the purchase of credits.
Search results are ranked by percentage and can be opened in a popup window with the requested text highlighted in the article. You can open up the search result to visit the target webpage.
Plagiarism-Detect – The site looks basic and has little information on it, not even a contact form.
However, it is free, has no word length restrictions and offers checking of essays/articles (upload text file or copy/paste text) or websites via URL.
Interestingly it is driven by Microsoft’s Bing rather than Google so it may produce different results to the others.
The service summarizes the number of words/sentences in the text and provides a ‘plagiarized from source’ percentage score against each result. It provides direct links to the result pages and options to save, print or hide less relevant sources.
It may be basic but I liked this service – it takes longer than the others to produce a report but the interface is neat and results useful.
CopyGator – CopyGator is a free service that monitors your RSS feed and finds where your content has been republished. It looked promising and offers automatic notifications when your post is copied to another feed.
However, it does need to validate the feed first – a process which is supposed to take 10-15 minutes, but which I left running for hours without it ever completing.
Maybe I was unlucky and hit it on a particularly busy day but in the end I gave up.
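As an aside on the percentage scores that some of these services report: none of them documents exactly how the figure is calculated, so the sketch below is purely my own illustration of one common idea, comparing overlapping runs of words (often called ‘shingles’). The function names and the 5-word run length are assumptions, not anything taken from the services above.

```python
import re
from typing import Set, Tuple


def shingles(text: str, size: int = 5) -> Set[Tuple[str, ...]]:
    """Return the set of overlapping word runs of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_percentage(original: str, suspect: str, size: int = 5) -> float:
    """Percentage of the original's word runs that also appear in the suspect text."""
    original_shingles = shingles(original, size)
    if not original_shingles:
        return 0.0  # original is shorter than one run; nothing to compare
    shared = original_shingles & shingles(suspect, size)
    return 100.0 * len(shared) / len(original_shingles)


if __name__ == "__main__":
    mine = "the quick brown fox jumps over the lazy dog near the river bank"
    theirs = "a quick brown fox jumps over the lazy dog by the old mill"
    print(f"{copied_percentage(mine, theirs):.0f}% of runs reused")
```

A shorter run length catches lightly reworded copies but produces more false matches, while a longer one does the opposite. Choices like this may be one reason different services give different scores for the same page.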
The four free services offer similar but subtly different features – searching is performed via URL, text, RSS feed or a combination of the three. Most gave useful results but they did vary, presumably due to the methods each one uses for searching.
In my limited testing there wasn’t an overall winner – each had its own merits. Try a free option before paying for any additional features to check if it meets all your requirements.