- Original Paper
- Open access
Equal but different: a contextual analysis of duplicated videos on YouTube
Journal of the Brazilian Computer Society volume 16, pages 201–214 (2010)
Abstract
Videos have become a predominant part of users' daily lives on the Web, especially with the emergence of online video sharing systems such as YouTube. Since users can independently share videos in these systems, some videos may be duplicates (i.e., identical or very similar videos). Despite having the same content, duplicates may differ in their context, for example, in their associated metadata (e.g., tags, title) and their popularity scores (e.g., number of views, comments). Quantifying these differences is important for understanding how users associate metadata with videos and which factors may influence video popularity. This understanding is crucial for video information retrieval mechanisms, the association of advertisements with videos, and performance issues related to the use of caches and content distribution networks (CDNs). This work presents a broad quantitative characterization of the context differences among identical content. Using a large video sample collected from YouTube, we construct a dataset of duplicates. Our measurement analysis provides several interesting findings that can have implications for how videos should be retrieved on video sharing websites, as well as for advertising systems that need to understand the role users play when they create content in services such as YouTube.
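To make "context differences among duplicates" concrete, the sketch below compares two duplicate videos on one metadata dimension (tags) and one popularity dimension (views). It is a minimal illustration under stated assumptions, not the authors' method: the `VideoContext` record, the choice of Jaccard similarity for tags, and the max/min view ratio for popularity are all hypothetical choices made here for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VideoContext:
    """Hypothetical record of a video's context (not a YouTube API type)."""
    video_id: str
    tags: frozenset[str]
    views: int


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty tag sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def compare_duplicates(x: VideoContext, y: VideoContext) -> tuple[float, float]:
    """Return (tag similarity, popularity ratio) for two duplicate videos.

    Assumed measures: Jaccard similarity over tag sets, and the ratio
    max(views) / min(views), so 1.0 means equal popularity. A view count
    of 0 is clamped to 1 to avoid division by zero.
    """
    tag_sim = jaccard(x.tags, y.tags)
    low, high = sorted((max(x.views, 1), max(y.views, 1)))
    return tag_sim, high / low
```

For instance, a copy tagged {cat, funny, piano} with 120000 views and a duplicate tagged {cat, piano, keyboard, pets} with 3000 views share 2 of 5 distinct tags (similarity 0.4), while one is 40 times more popular than the other: equal content, different context.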
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Rodrigues, T., Benevenuto, F., Almeida, V. et al. Equal but different: a contextual analysis of duplicated videos on YouTube. J Braz Comput Soc 16, 201–214 (2010). https://doi.org/10.1007/s13173-010-0019-x
DOI: https://doi.org/10.1007/s13173-010-0019-x