Big data, Google and unemployment

  • Raymundo M. Campos Vázquez El Colegio de México, A. C.
  • Sergio E. López-Araiza B. El Colegio de México, A. C.
Keywords: unemployment, Google, big data, machine learning, prediction
JEL Classification: C52, C53, E24, J64, O54


We use Google Trends data on searches related to employment opportunities to forecast the unemployment rate in Mexico. We begin by discussing the literature on big data and nowcasting in which user-generated data is used to forecast unemployment. We then explain the basics of several machine learning algorithms. Finally, we implement these algorithms to find the best model for predicting unemployment using both Google Trends queries and unemployment lags. From a public policy perspective, we believe that both user-generated data and new statistical methods may provide valuable tools for the design of policy interventions.

