Duplicate Content Revisited
Google tries to select the dupes and then put all but one of them into the “supplemental index”. If a domain has just a few instances of duplication like this in the Google index, things tend to go on as normal. But when many, many URLs start showing up, all with identical content, then something seems to get tripped at Google and a site can start to see trouble.
One of my domains in supplemental hell (where not even my home page can be found on the first page of a site: query) uses a few directories to link off to sponsors. That adds up to a few thousand supplemental URLs out of 7,000 indexed. I also have a session-ID-style seed in my URLs that results in thousands more supplementals. The result? I’m seeing even unique pages end up being listed by Google as supplemental.
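To get a feel for how many of my URLs are really the same page wearing a different session ID, something like the following rough Python sketch would do. The parameter names in `SESSION_PARAMS` are hypothetical placeholders (swap in whatever your own site appends), and this says nothing about how Google itself groups duplicates; it only collapses URLs that differ by a session-style query parameter.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; replace with the ones your site uses.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_params(url):
    """Return the URL with session-style query parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # The fragment is dropped because it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Map each cleaned URL to the original URLs that collapse onto it (groups of 2+ only)."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_session_params(url)].append(url)
    return {canon: members for canon, members in groups.items() if len(members) > 1}
```

Feed `group_duplicates` a list of URLs pulled from your server logs or a site: query, and every group it returns is a set of addresses that a crawler could be seeing as separate pages with identical content.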
Still, didn’t Matt Cutts say there’s no such thing as a “duplicate content penalty”? What tedster says doesn’t sound like a mere filter. It sounds like a sitewide penalty. Then again, I never fully trust what anyone says. I mean… Google didn’t even notice the sandbox phenomenon until someone pointed it out to them :)
Why doesn’t Google just forget supplementals? (speculation) Because if any page points to them, Moz-bot will have to recrawl them later, and it’s better to keep records of these URLs to avoid wasting time recrawling and indexing them.
Makes you want to start a few domains over from scratch, doesn’t it? My only hope at this point is to increase the number of incoming links, beef up PR sitewide and write more content. Still, seeing site:domain.com return the same SERP for the last 2-3 weeks isn’t encouraging.