Predicting the Working Time of Microtasks Based on Workers' Perception of Prediction Errors

Authors

  • Susumu Saito, Waseda University
  • Chun-Wei Chiang, West Virginia University
  • Saiph Savage, Universidad Nacional Autónoma de México
  • Teppei Nakano, Waseda University
  • Tetsunori Kobayashi, Waseda University
  • Jeffrey Bigham, Carnegie Mellon University

DOI:

https://doi.org/10.15346/hc.v6i1.10

Keywords:

Amazon Mechanical Turk, Working time prediction

Abstract

Crowd workers struggle to earn adequate wages. Given the limited task-related information provided on crowd platforms, workers often fail to estimate how long certain microtasks will take to complete. Although a few third-party tools and online communities provide estimates of working times, such information is limited to microtasks that other workers have already completed, and those tasks are usually taken immediately by experienced workers. This paper presents a computational technique for predicting microtask working times (i.e., how much time it takes to complete a microtask) based on workers' past experience with similar tasks. Two challenges were addressed in developing the predictive model: (i) collecting sufficient training data labeled with accurate working times, and (ii) evaluating and optimizing the prediction model. The paper first describes how 7,303 microtask submission records were collected through a web browser extension, installed by 83 Amazon Mechanical Turk (AMT) workers, that was designed to characterize the diversity of worker behavior and to record working times accurately. It then describes the challenges of defining evaluation and objective functions that account for workers' tolerance of prediction errors. To this end, surveys were conducted on AMT asking workers how they would feel about working-time prediction errors made by an "imaginary" AI system for simulated microtasks. Based on 91,060 survey responses submitted by 875 workers, objective and evaluation functions were derived that reflect whether a given prediction error would be tolerated by workers. Evaluation based on these worker perceptions of prediction errors showed that the proposed model predicted worker-tolerable working times in 73.6% of all tested microtask cases. Furthermore, the derived objective function enabled accurate predictions across microtasks with more diverse durations.
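For readers who want a concrete picture of the tolerance-based evaluation sketched above, the snippet below is a minimal, hypothetical illustration rather than the authors' implementation: it trains a generic gradient-boosted regressor on the logarithm of synthetic working times and scores the share of predictions falling within an assumed +/-30% relative-error band, whereas the paper derives its tolerance-based objective and evaluation functions from the 91,060 survey responses.

```python
# Minimal sketch (not the authors' code): a gradient-boosted regressor for
# working times plus a tolerance-based evaluation. The +/-30% relative-error
# band is a hypothetical stand-in for the survey-derived tolerance.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for microtask features and working times (seconds);
# in the paper, features and labels come from browser-extension logs.
X = rng.normal(size=(2000, 8))
true_secs = np.exp(4.0 + 0.6 * X[:, 0] + rng.normal(scale=0.3, size=2000))

X_tr, X_te, y_tr, y_te = train_test_split(X, true_secs, random_state=0)

# Regress on log(working time) so errors behave multiplicatively,
# in the spirit of Weber-Fechner-style relative perception.
model = GradientBoostingRegressor(random_state=0)
model.fit(X_tr, np.log(y_tr))
pred_secs = np.exp(model.predict(X_te))

def tolerable_rate(y_true, y_pred, rel_tol=0.30):
    """Fraction of tasks whose prediction error stays within a
    (hypothetical) worker-tolerable relative band."""
    rel_err = np.abs(y_pred - y_true) / y_true
    return float(np.mean(rel_err <= rel_tol))

print(f"worker-tolerable predictions: {tolerable_rate(y_te, pred_secs):.1%}")
```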

Author Biography

Susumu Saito, Waseda University

Currently a Ph.D. student in the Dept. of Computer Science and Communications Engineering (degree expected March 2020). Visiting scholar at Carnegie Mellon University, Pittsburgh, PA (September 2017 - March 2018). B.E. (March 2015) and M.E. (March 2017) in the Dept. of Computer Science and Communications Engineering.

Published

2019-12-31

How to Cite

Saito, S., Chiang, C.-W., Savage, S., Nakano, T., Kobayashi, T., & Bigham, J. (2019). Predicting the Working Time of Microtasks Based on Workers’ Perception of Prediction Errors. Human Computation, 6(1), 192-219. https://doi.org/10.15346/hc.v6i1.10

Section

Research