Topical Video Search: Analysing Video Concept Annotation through Crowdsourcing Games

Riste Gligorov, Michiel Hildebrand, Jacco van Ossenbruggen, Lora Aroyo, Guus Schreiber


Games with a purpose (GWAPs) are increasingly used in audio-visual collections as a mechanism for annotating videos through tagging. One such GWAP is Waisda?, a video labeling game in which players tag streaming video and win points by reaching consensus on tags with other players. The open-ended and unconstrained manner of tagging in the fast-paced setting of the game has a fundamental impact on the resulting tags. We find that Waisda? tags predominantly describe visual objects and rarely refer to the topics of the videos. In this study we evaluate to what extent the tags entered by players can be regarded as topical descriptors of the video material. Moreover, we characterize the quality of the user tags as topical descriptors with the aim of detecting and filtering out the bad ones. Our results show that, after filtering, game tags perform as well as the manually crafted metadata when it comes to accessing the videos by topic. An important consequence of this finding is that tagging games can provide a cost-effective alternative in situations where manual annotation by professionals is too costly.


Games with a purpose; video; tagging; tag quality; topical video retrieval





