By Camilo Arias Martelo
The effective design, implementation, and evaluation of public policies rely on accurate socioeconomic data. It would be impossible, for instance, to implement a poverty alleviation program without knowing who the target population is, where they live, and what their economic status is. As the United Nations Population Fund puts it, “without accurate data, those most in need remain invisible.”
Traditionally, governments have relied on national censuses to gather this critical data. Data from these country-wide surveys tend to be fairly accurate, but censuses are expensive, in some cases too expensive for poorer countries to conduct on a regular basis. The Democratic Republic of the Congo and Eritrea took their last census in 1985, Myanmar’s last census was in 1983, and Lebanon has not had a census since 1932.
Professor Joshua Blumenstock from the University of California, Berkeley’s School of Information recently published a study in the American Economic Association Papers and Proceedings, in which he described a machine learning approach he developed to estimate socioeconomic characteristics using “call detail records”: records of phone calls, text messages, airtime purchases, mobile money use and other user data. In his study, Blumenstock approximated the wealth of people in Rwanda and Afghanistan using just two months of call detail records, achieving accuracy comparable to that of a five-year-old national census at a fraction of the cost.
The idea underlying Blumenstock’s research is that a person’s wealth is correlated with patterns in their mobile phone transactions. Once these correlations have been measured, a researcher can use call detail records to estimate wealth for an individual, a region or even an entire country.
Blumenstock gathered wealth information from 856 mobile phone users in Rwanda in 2005 and 1,234 users in Afghanistan in 2016 through phone interviews, and then obtained the last two months of call detail records for each respondent. The resulting dataset contained tens of thousands of transactions, each described with fields including the caller’s and receiver’s identities, the date, duration and cost of the transaction, and the location of the cellphone tower nearest to each party. Blumenstock then used a machine learning algorithm to build a model that linked the call detail records to wealth levels.
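The article does not specify which features or algorithm the study used, so the following is only a minimal sketch of the general idea: summarize each surveyed user's transactions into a few numbers, then fit a model that maps those numbers to the wealth reported in the survey. The record layout `(user_id, duration, cost)`, the chosen features, the ridge regression model and all function names are assumptions made for illustration, not Blumenstock's method.

```python
from collections import defaultdict


def build_features(records):
    """Aggregate raw transactions into one feature vector per user.

    `records` is a hypothetical list of (user_id, duration_seconds, cost).
    Features: intercept, transaction count, mean duration, total cost.
    """
    totals = defaultdict(lambda: [0, 0.0, 0.0])  # count, duration, cost
    for user_id, duration, cost in records:
        entry = totals[user_id]
        entry[0] += 1
        entry[1] += duration
        entry[2] += cost
    return {u: [1.0, float(n), d / n, c] for u, (n, d, c) in totals.items()}


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ridge(X, y, lam=1e-6):
    """Fit weights w minimizing ||Xw - y||^2 + lam * ||w||^2
    via the normal equations (X^T X + lam I) w = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * target for row, target in zip(X, y))
           for i in range(k)]
    return solve(xtx, xty)


def predict(w, x):
    """Predicted wealth for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

In this sketch, `fit_ridge` would be trained on the feature vectors of the surveyed users together with their reported wealth, and `predict` would then be applied to the feature vectors of users who were never surveyed. A real application would need far richer features and held-out evaluation.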
For both countries, the resulting models were, on average, about as accurate as a five-year-old national census. In terms of scale, Blumenstock’s models were robust enough to approximate the wealth levels of 30% of the entire population of Afghanistan, and 10% of the population of Rwanda. More data could expand these models to encompass entire nations.
The potential applications of machine learning methods in socioeconomic data-gathering are vast. While Blumenstock acknowledged that machine learning cannot match a national census in the amount and accuracy of information it yields, this method could still offer a general measure when a census would be prohibitively expensive. For more specific policy objectives, call detail records and machine learning could identify population segments for which to design targeted programs. Moreover, machine learning may also help improve impact evaluations, since its low cost enables more frequent monitoring of the same population over time.
Given that phone use patterns differ across communities and evolve over time, further research could address how well this model holds up across geographic distance and over time. Future models could also combine call detail records with other indirect sources of socioeconomic information, such as nighttime lighting, which researchers used in 2016 in conjunction with machine learning techniques to predict world poverty distribution. Finally, with the increasing penetration of mobile technology (which by 2025 could reach 71% of the total world population and could surpass 50% in the world’s poorest regions), we can expect the availability of useful data from phones to increase over time, enabling further data-driven policy applications.
This article was originally published in the Chicago Policy Review on April 16, 2019.