“The labor essential to data work, such as moderation and annotation, is systematically hidden from those who profit from the fruits of that labor. A new project shines a spotlight on the lived experiences of data workers around the world, showing firsthand the costs and opportunities of doing tech work overseas.”
Much of the boring, thankless, and psychologically damaging work is outsourced to poor countries, where workers are happy to take on jobs for a fraction of U.S. or European wages. The labor market for this work puts it in the ranks of “boring, dirty, and dangerous” jobs, such as electronics “recycling” or ship-breaking. The conditions of moderation and annotation work are unlikely to result in amputation or cancer, but that doesn’t make them safe, much less enjoyable or rewarding.
The Data Workers Survey, a joint project between the AI ethics research group DAIR and the Technical University of Berlin, is nominally modelled on Marx’s late-19th-century work, documenting working conditions in reports that are “collectively produced and politically actionable.”
The full report is freely available and was launched today at an online event, where the people running the project discussed it.
The ever-expanding scope of AI applications is inevitably built on human expertise. And to this day, that expertise is purchased for the lowest price a company can offer without causing a PR problem. When you report a post, the platform doesn’t say, “Great, we’ll send this to someone in Syria and they’ll respond for 3 cents.” But the volume of reports (and report-worthy content) is so high that, for the companies involved, no solution makes much sense other than outsourcing large amounts of the work to cheap labor markets.
A closer look at the reports reveals that many of them are deliberately anecdotal: They are closer to systematic anthropological observation than to quantitative analysis.
Quantifying these experiences often fails to capture the true costs. The end result is the kind of statistics companies love to tout (and therefore demand in studies): higher wages than others in the area, jobs created, savings passed on to customers, etc. Moderators losing sleep to nightmares and rampant drug addiction are rarely mentioned, let alone measured or presented.
Take Fasica Berhane Gebrekidan’s report on Kenyan data workers struggling with mental health and drug issues.
She and her colleagues worked at Sama, which billed itself as a more ethical data work pipeline, but as the workers described it, the reality of the job was unrelenting misery and a lack of support from field offices.
Whistleblower image of Samasource’s moderation workspace in Kenya. Image credit: Fasica Berhane Gebrekidan
Hired to process tickets (i.e. flagged content) in local languages and dialects, they are exposed to an endless stream of violence, cruelty, sexual abuse, hate speech, and other content that they must immediately review and “respond” to, or risk having their pay docked for performance below expectations, the report said. Some of them view more than one ticket per minute, which translates to a minimum of around 500 such pieces of content per day. (One might wonder where the AI is in this; the workers are likely the ones providing its training data.)
“It’s absolutely heartbreaking. I’ve seen the worst scenes you can imagine and I’m afraid I will be scarred for the rest of my life for having done this work,” said Rahel Gebrekirkos, one of the contractors interviewed.
Support staff were “ill-equipped, unprofessional and underqualified” and moderators frequently turned to drugs to cope, complaining of intrusive thoughts, depression and other problems.
We’ve heard stories like this before, but it is worth knowing that they are still happening. There are several reports of this kind; others are more personal or take different formats.
For example, Yasser Yousef Al-Rayes works as a data annotator in Syria to help pay for his higher education. He and his roommate collaborate on visual annotation tasks such as image-text analysis, but as he points out, the demands from clients are often frustrating and poorly defined.
He chose to document his work in an eight-minute short film that is well worth watching.
Workers like Yasser are often hidden behind many organizational layers and work as subcontractors of subcontractors, making it unclear who is responsible when problems or lawsuits arise.
Milagros Micheli of DAIR and the Technical University of Berlin, one of the project’s leaders, said that while she hadn’t seen any comments or changes from the companies named in the report, it was still early. But the results seem strong enough to warrant further investigation. “We plan to continue this work with a second group of data workers, possibly from Brazil, Finland, China and India,” she wrote.
No doubt some will discount these reports for the very quality that makes them valuable: their anecdotal nature. But while it is easy to lie with statistics, anecdotes always contain at least some truth, because they are taken directly from the source. Even if these were the only dozen moderators in Kenya, Syria, and Venezuela with these troubles, their statements should worry anyone who relies on them, which is pretty much everyone.