Every night before I go to bed I brush my teeth, pray to the mighty gods of SEO, and then usually read the latest happenings on the Google Panda update. A few of my favorite sites were affected by Panda and this mishap has been the center of my attention for the past 2 months.
A recent thread I found questioned the connection between the Panda slap and scrapers reposting dupe content (Are DMCAs the Answer to Panda?). This thread poses an interesting question: Could you be getting punished because someone stole your content?
It’s a scary thought, but some are coming to the conclusion that this just might be the case. For some people, the websites hit hardest by Panda are also the ones most scraped and plagiarised.
I wanted to check whether that was the case for me. Could my fitness blog, hit hard by Panda, just be a victim of hardcore copyright infringement?
Scaling the Plagiarism Checking
After fumbling around with Copyscape for, oh I don’t know, 3 minutes, I realized this just wasn’t going to work.
I needed to find a better solution, something that could check my websites in BULK.
I eventually came across this thread @ BHW: Check for Duplicate Content in Bulk
After reading that thread I downloaded Uncover, a freeware tool made by Textbroker.
Uncover: A Free Tool for Finding Duplicate Content in Bulk
Basically you can point Uncover to your sitemap or archives page and it will grab all the links on the page. You can even go several levels deep (more than 1) but I don’t recommend doing this. It will collect a bunch of URLs that you don’t need checked (like feed URLs and plugin URLs) and will make the checking a lot slower.
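If you want to build a clean URL list yourself before handing it to a checker, here is a rough Python sketch of the idea. To be clear, this is not how Uncover works internally (I have no idea what it does under the hood). The `article_urls` function, the `SKIP_MARKERS` patterns, and the example.com sitemap are all made up for illustration, so adjust the patterns to whatever junk URLs your own site produces.

```python
import xml.etree.ElementTree as ET

# Hypothetical substrings that mark URLs not worth checking (feeds, plugin files).
SKIP_MARKERS = ("/feed", "/wp-content/plugins/")

# Made-up sitemap used only to demonstrate the function.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/post-one/</loc></url>
  <url><loc>https://example.com/post-one/feed/</loc></url>
  <url><loc>https://example.com/wp-content/plugins/some-plugin/readme.txt</loc></url>
  <url><loc>https://example.com/post-two/</loc></url>
</urlset>"""


def article_urls(sitemap_xml, skip_markers=SKIP_MARKERS):
    """Return the <loc> URLs from a sitemap, minus feed and plugin URLs."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for element in root.iter():
        # Sitemap tags are namespaced ("{namespace}loc"), so compare the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            url = element.text.strip()
            if not any(marker in url for marker in skip_markers):
                urls.append(url)
    return urls


if __name__ == "__main__":
    for url in article_urls(EXAMPLE_SITEMAP):
        print(url)
```

Filtering up front like this keeps the list down to actual articles, which is the same reason I stick to one level deep in Uncover.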
After it’s grabbed all the URLs you want checked you can click the button and off it goes. This runs on your computer and, depending on how many articles you’ve got, it should be done within 10-20 minutes or so.
After it’s complete you can go through each checked URL one by one. For each one it gives a little list of potential copies, how many words are copied, and what percentage of your article was copied.
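To give a feel for what a "words copied" count and a "percent copied" figure can mean, here is a minimal Python sketch. It is only my guess at one reasonable way to measure it (matching runs of 3 consecutive words, often called shingles), not Uncover’s or Copyscape’s actual method. The `copied_stats` function and the run length `n=3` are assumptions for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def copied_stats(original, suspect, n=3):
    """Return (copied_word_count, percent_of_original_copied).

    A word in the original counts as copied when it sits inside a run of
    n consecutive words that also appears somewhere in the suspect text.
    """
    orig_words = tokenize(original)
    suspect_words = tokenize(suspect)
    # Every run of n consecutive words found in the suspect text.
    suspect_runs = {
        tuple(suspect_words[i:i + n])
        for i in range(len(suspect_words) - n + 1)
    }
    copied = [False] * len(orig_words)
    for i in range(len(orig_words) - n + 1):
        if tuple(orig_words[i:i + n]) in suspect_runs:
            for j in range(i, i + n):
                copied[j] = True
    count = sum(copied)
    percent = 100.0 * count / len(orig_words) if orig_words else 0.0
    return count, percent


if __name__ == "__main__":
    mine = "the quick brown fox jumps over the lazy dog"
    theirs = "a quick brown fox jumps high"
    count, percent = copied_stats(mine, theirs)
    print(f"{count} words copied, {percent:.1f}% of the original")
```

Requiring runs of several words instead of single matching words keeps common words like "the" from inflating the score.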
An interesting detail about this program is that it seems to work even better than Copyscape itself. I cross-referenced the dupe content it was spitting out with Copyscape and it was picking up MAJOR copies that Copyscape wasn’t even reporting. That’s pretty interesting considering Copyscape seems to be the industry standard, and missing articles that are 80% copied is a bit alarming.
My Results from Using Uncover For Just 10 Minutes….Holy Auto-Scraping Batman
After checking only the first 5 posts I came across a user account that was set up about 3 years ago on Zimbio to scrape several of my blogs at a time using the autoimport feature. It was taking posts from a lot of my blogs the very day they were published and slapping them up there.
Same with a feed site I found.
What kind of effect this could have on my sites I’m not sure, but it certainly can’t be healthy. A couple of support e-mails and I’m sure the problem will be taken care of.
Usually I am just focused on building links/content, not protecting what I’ve already done. So I guess it’s understandable that it’s taken me this long to catch these CONTENT OFFENDERS.
There are a couple of other random sites I found copying my exercise site’s content. I think a DMCA notice or two should do the trick here.
Filing DMCAs and Releasing the Dogs
This is the step I’m currently at.
Here is the click path:
- Web Search ->
- “I have a legal issue that is not mentioned above” ->
- “I have found content that may violate my copyright” ->
- “Yes, I am the copyright owner *or am authorized to act on behalf of the owner of an exclusive right that is allegedly infringed.”
*Outsource this if you have a ton of these to do.
I’ve seen some people suggest finding the offending site’s e-mail address and asking them to take the content down. Unless it’s some kind of reasonably credible site with a support e-mail, I think that’s pretty much a waste of time.
Therefore, we must release the DMCA dogs across all four corners of the web to smite our foes.
Will keep you updated.