For years, the topic of duplicate content has generated a series of myths in the SEO community that still confuse many beginners and even experts. Basically, content is said to be duplicated when it appears several times at different URLs, whether on blogs, e-commerce sites, or elsewhere. You may be wondering why this is a problem; to answer that question, consider a few situations:
Search engines: Google does not know which version of the page should be included in the index, or which version should be shown to users in the search results.
Webmasters: when building links, some webmasters may link to an unofficial copy of the content, costing the official page important backlinks.
Plagiarism: an attacker could force the indexing of his copy of someone else's content before Google visits the official version, exploiting the efforts of the real owner.
According to Raven Tools ⁽²³⁾, about 29% of the web contains duplicate content. In recent years, Google has repeatedly warned webmasters about this topic. For example, with the Digital Millennium Copyright Act (DMCA) tool, the real owner can request the removal of all copied pages from the search engine index. To debunk one myth: Google does not officially penalize duplicate content, because it now has various ways to recognize it, but it does take action when it verifies that someone is deliberately using it to manipulate search
results or to perform similar manipulative activities. Where does duplicate content appear?
Equivalent URLs: some CMSs can generate different URLs for the same content (e.g. site.com/article-1?p=true vs. site.com/article-1?p=false).
Versions of the site: if separate versions of the site run on http and https, or with and without www, several URLs can end up serving the same content.
Content theft: many people still copy and republish other people's articles.
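The equivalent-URL and site-version cases can be illustrated with a small normalization sketch in Python; the host name and the rule about the `p` parameter are assumptions for illustration, since the right rules depend on your own site:

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url: str) -> str:
    """Collapse common duplicate-URL variants (http vs https, www vs
    non-www, content-irrelevant query parameters) into one canonical form."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    scheme = "https"                       # prefer the https version
    if netloc.startswith("www."):
        netloc = netloc[4:]                # prefer the non-www host
    # Assumption: on this site, the "p" parameter does not change the content.
    query = "&".join(part for part in query.split("&")
                     if part and not part.startswith("p="))
    return urlunsplit((scheme, netloc, path.rstrip("/") or "/", query, ""))

# Both variants map to the same canonical address.
print(canonical_url("http://www.site.com/article-1?p=true"))
print(canonical_url("https://site.com/article-1/?p=false"))
```

A real site would apply such rules either in the CMS (when generating links) or at the crawler-facing edge, so that only one URL per piece of content circulates.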
What are the solutions to these situations?
Canonicalization: in the <head> section of all duplicate pages, enter a <link rel="canonical" href="official-page-URL"> tag to indicate to Google which is the official version.
Redirect: use a 301 redirect to send anyone who visits a duplicate page to the correct one. This practice also applies to http vs https and www vs non-www.
Noindex: use a meta robots tag with the noindex attribute on duplicate pages to prevent them from being indexed.
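A 301 redirect is usually configured in the web server or application, but the decision logic itself is simple. As a sketch (the canonical host is an assumption, and this is not any specific framework's API):

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

CANONICAL_SCHEME = "https"
CANONICAL_HOST = "site.com"   # assumed canonical host (https, non-www)

def redirect_target(url: str) -> Optional[str]:
    """Return the URL to send in a 301 redirect's Location header,
    or None when the request already hits the canonical version."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    host = netloc[4:] if netloc.startswith("www.") else netloc
    if scheme == CANONICAL_SCHEME and netloc == CANONICAL_HOST:
        return None            # already canonical: serve the page normally
    if host != CANONICAL_HOST:
        return None            # not our site: nothing to redirect
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, query, fragment))

print(redirect_target("http://www.site.com/article-1"))  # needs a 301
print(redirect_target("https://site.com/article-1"))     # already canonical
```

The same mapping covers both the http-vs-https and the www-vs-non-www cases mentioned above, so every duplicate entry point funnels visitors (and link equity) to one official URL.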
To find out who is copying you, you can use services like Copyscape.com, or manually search Google for one of your phrases in quotation marks, and then contact the webmaster to ask for the removal of the copied content.
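Once a suspect page is located, a rough way to confirm the copying is to compare the two texts programmatically. A minimal sketch using Python's standard difflib; the 0.8 threshold is an arbitrary assumption, not an established cutoff:

```python
from difflib import SequenceMatcher

def looks_copied(original: str, suspect: str, threshold: float = 0.8) -> bool:
    """Return True when the suspect text is nearly identical to the original.
    SequenceMatcher.ratio() is 1.0 for identical strings, near 0 for unrelated ones."""
    ratio = SequenceMatcher(None, original.lower(), suspect.lower()).ratio()
    return ratio >= threshold

article = "Duplicate content appears several times on different URLs."
stolen  = "Duplicate content appears several times on different URLs!"
print(looks_copied(article, stolen))                           # near-identical copy
print(looks_copied(article, "A totally unrelated sentence."))  # no match
```

This only checks verbatim or lightly edited copies; services like Copyscape also handle paraphrased text and crawl the web for you.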