Equal but different: a contextual analysis of duplicated videos on YouTube

Rodrigues, Tiago; Benevenuto, Fabrício; Almeida, Virgílio; Almeida, Jussara; Gonçalves, Marcos

doi:10.1007/s13173-010-0019-x

Original Paper
Open access
Published: 19 August 2010

Journal of the Brazilian Computer Society, volume 16, pages 201–214 (2010)

Abstract

Videos have become a predominant part of users’ daily lives on the Web, especially with the emergence of online video sharing systems such as YouTube. Because users share videos independently in these systems, some videos may be duplicates (i.e., identical or very similar videos). Despite having the same content, duplicates can differ in context, for example, in their associated metadata (e.g., tags, title) and their popularity scores (e.g., number of views, comments). Quantifying these differences is important for understanding how users associate metadata with videos and what may influence the popularity of videos, which is crucial for video information retrieval mechanisms, the association of advertisements with videos, and performance issues related to the use of caches and content distribution networks (CDNs). This work presents a broad quantitative characterization of the context differences among identical contents. Using a large video sample collected from YouTube, we construct a dataset of duplicates. Our measurement analysis yields several findings with implications for how videos should be retrieved in video sharing websites, as well as for advertising systems that need to understand the role that users play when they create content in services such as YouTube.

Author information

Authors and Affiliations

Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
Tiago Rodrigues, Fabrício Benevenuto, Virgílio Almeida, Jussara Almeida & Marcos Gonçalves

Corresponding author

Correspondence to Virgílio Almeida.

Rights and permissions

Open Access: This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Rodrigues, T., Benevenuto, F., Almeida, V. et al. Equal but different: a contextual analysis of duplicated videos on YouTube. J Braz Comput Soc 16, 201–214 (2010). https://doi.org/10.1007/s13173-010-0019-x

Received: 08 March 2010
Accepted: 23 July 2010
Published: 19 August 2010
Issue Date: September 2010