Equal but different: a contextual analysis of duplicated videos on YouTube

  • Tiago Rodrigues¹,
  • Fabrício Benevenuto¹,
  • Virgílio Almeida¹ (corresponding author),
  • Jussara Almeida¹ and
  • Marcos Gonçalves¹
Journal of the Brazilian Computer Society 2010, 16:19

Received: 8 March 2010

Accepted: 23 July 2010

Published: 19 August 2010


Videos have become a predominant part of users’ daily lives on the Web, especially with the emergence of online video sharing systems such as YouTube. Because users share videos independently in these systems, some videos may be duplicates (i.e., identical or very similar videos). Despite having the same content, duplicates can differ in context, for example, in their associated metadata (e.g., tags, title) and their popularity scores (e.g., number of views, comments). Quantifying these differences is important for understanding how users associate metadata with videos and what factors may influence video popularity, which in turn matters for video information retrieval mechanisms, the association of advertisements with videos, and performance issues related to the use of caches and content distribution networks (CDNs). This work presents a broad quantitative characterization of the context differences among identical contents. Using a large video sample collected from YouTube, we construct a dataset of duplicates. Our measurement analysis yields several findings with implications for how videos should be retrieved in video sharing websites, as well as for advertising systems that need to understand the role users play when they create content in services such as YouTube.


Keywords: Video duplicates; Metadata association; Social network; YouTube