Ghost work powering AI-based services

AI in the sustainable cities of the future, part 4 / AI i framtidens hållbara städer, del 4

Magnus Boman
Viable Cities

--

Ghost graffiti by Yair Talmor (2014), in the public domain.

I recently came across a very colourful pollution map of a popular area in the city of Stockholm. The map showed parts of the area as red, indicating high pollution from particles originating from nearby roads and a car tunnel, and other parts as blue, indicating low pollution. The map looked nothing like the graphs produced by the City of Stockholm to report PM10 particle measurements. For instance, the measurement height above ground is usually part of an environmental consequence study, as in this picture from the same area (left is 15 meters above ground, right is 25).

Example taken from an environmental consequence study from Stockholm. The yearly norm for PM10 is that the average should not exceed 40 micrograms per cubic meter of air, which is easy to plot and easy to interpret.
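As a hedged illustration of how simple that norm check is, here is a minimal sketch in Python with invented hourly readings; the values and names are assumptions for illustration, not official data.

```python
# Minimal sketch of checking the yearly PM10 norm against hypothetical data.
# Assumes hourly readings in micrograms per cubic meter over one year.
YEARLY_NORM = 40.0  # yearly average must not exceed 40 micrograms per cubic meter

def exceeds_yearly_norm(hourly_readings):
    """Return True if the yearly average of the readings exceeds the norm."""
    return sum(hourly_readings) / len(hourly_readings) > YEARLY_NORM

# Example: a year of moderate background levels with some heavy-traffic spikes.
readings = [25.0] * 8000 + [120.0] * 760
print(exceeds_yearly_norm(readings))  # False: the average lands just above 33
```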

After closer examination, it turned out that the map I saw was part of a sales pitch for a flat that sat safely inside a blue area. Neither the height of the flat above ground nor any references to official measurements were given. The broker who put the sales material together instead spoke of sensor networks picking up particle readings, implying that some automation was involved. But since cheap Internet of Things-style sensors cannot measure such small particles, the ones small enough to harm humans in high concentrations, humans were obviously involved in the calculations underlying this map, assuming the map was not a fake.

A dozen or so years ago, I gave a conference keynote. My take-home message was simple: the Semantic Web should not be made useful and efficient by letting humans tag data for machines to read. I illustrated the last slide with a Chinese tea thermos, because in China, India, and some countries in Southern Europe, sweatshops had opened up with the purpose of making M2M (machines talking to machines) on the Web more efficient. The law of large numbers and other factors would supposedly make it unnecessary to rate the human tagging efforts: it would be enough to rate the intelligent Web applications that could be built on top of all those tags. Intelligent agents would now be able to roam the Web in new ways, standing on the solid foundation provided by ontologies and standards. And tags.

Ontologically, in the original sense of the philosophical term, not the one appropriated by Web intelligence enthusiasts, this was significant because the Internet Protocol and most of the ways Web nodes talked to each other followed the client/server model. The client/server architecture meant that there were masters and slaves rather than peers talking to each other. Masters delegated tasks to slaves, and slaves adopted the tasks as best they could. The intelligent agent world, on the other hand, promoted P2P communication (peers talking to peers, and reasoning with each other about tasks), since multi-agent systems stressed the distributed nature of group reasoning at their core. To have a centralised master agent that could override voting protocols or game-theoretic reasoning, for instance, was considered cheating, or at best an ugly, old-school monolithic solution. So now a Web application (they were not yet called apps) could consist of anything from one up to thousands of intelligent agents, collaborating within the application but also with agents from other applications. Great!
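To make the contrast concrete, here is a minimal, hypothetical sketch of peer-to-peer group reasoning: a handful of agents settle a task by simple majority vote, with no central master able to override the outcome. The agent names and the task options are invented for illustration.

```python
# Hypothetical sketch: peers deciding by majority vote, with no master agent.
from collections import Counter

class PeerAgent:
    def __init__(self, name, preference):
        self.name = name
        self.preference = preference  # the option this peer argues for

    def vote(self, options):
        # A peer votes for its own preference if it is on the table,
        # otherwise it falls back to the first option offered.
        return self.preference if self.preference in options else options[0]

def majority_decision(peers, options):
    """Tally one vote per peer; the most common option wins, with no override."""
    tally = Counter(peer.vote(options) for peer in peers)
    return tally.most_common(1)[0][0]

peers = [PeerAgent("a1", "reroute traffic"),
         PeerAgent("a2", "reroute traffic"),
         PeerAgent("a3", "do nothing")]
print(majority_decision(peers, ["reroute traffic", "do nothing"]))
# -> reroute traffic, decided by the group rather than delegated by a master
```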

When large companies, start-ups, and government agencies started to produce Web applications, most of them found that it was hard to get user acceptance, not to mention willingness to pay. The main reason for this was that the applications sucked. The worst part was the bootstrapping: you needed to fill out your profile in order to get the customised and personalised intelligent service you deserved. Many of us got tired of this quickly, after having filled out a hundred or so long forms with personal information. And this was well before privacy concerns, the sale of data to third parties, GDPR, and so on. User interfaces were clunky, and the business models were often wrong. Users were mixed up with usage, and ad agencies often had the final say on how things were presented to prospective users.

Data quality and quantity were generally deemed to be bottlenecks, and one proposed solution was to annotate more of the data being used by agents. Because no algorithms for automated annotation of good enough quality were available at the time, humans were set to do the annotating. Algorithms could be used to link data to metadata, or even to build metadata from data automatically (or rather, automagically: "this looks like a collection of photographs of 19th century paintings, from museums all over the world" would be an example of non-standardised free-text metadata). Metadata standardised according to some so-called Web ontology would be fully structured data entries, but even so, each line of those entries would typically be put there by a human.
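A hypothetical sketch of the difference, with invented field values and field names loosely in the style of a Web ontology vocabulary such as Dublin Core:

```python
# Hypothetical sketch: free-text metadata versus a structured metadata entry.
# Field names are illustrative only, loosely in the style of Dublin Core.

free_text_metadata = (
    "this looks like a collection of photographs of 19th century paintings, "
    "from museums all over the world"
)

structured_metadata = {
    "type": "StillImage",
    "subject": "19th century paintings",
    "creator": "unknown",            # each line typically filled in by a human
    "spatial": "museums worldwide",  # annotator judgement, not sensor output
    "format": "image/jpeg",
}

# A machine can filter on the structured fields; the free text it can only store.
if structured_metadata["subject"].startswith("19th century"):
    print("matched by a simple, machine-readable rule")
```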

Annotating data and describing it by adding structured metadata is mind-numbingly boring work. To be clear, in the history of office work, it might not even have qualified as boring, since a lot of early IT-based office work consisted of just registering data entries on a dumb terminal connected to a mainframe computer. But in the early days of the Web and intelligent agent technology, in the mid-1990s, most of the dumb terminals had started to communicate with each other, en route to becoming personal computers. In short, offices were opening up to giving their clerks much more interesting and varied things to do. In this light, going back to dumb-terminal tagging and registering, for the purpose of letting machines rather than people do more of the communicating, seemed to me at the time of that keynote to be a very bad idea. And it never really took off. The Semantic Web did not become a separate stratum of the Internet on which M2M transformed B2B (business to business). Corporations were not represented in negotiations by intelligent agents. City dwellers did not find out what went down in their city by monitoring the agents of their elected politicians. They did not book their holiday trips using intelligent agents. What did happen was that back office services changed: the system that the staff at the holiday booking agency saw on their personal computers now had bookings (like getting you a specific seat on a specific flight) partly automated and possibly made more efficient. But since this did not seem to make my own holiday booking any easier, as a customer, I did not care.

Ghost work is when humans do unacknowledged or invisible work for the purpose of letting machines do work. An example is crowdsourcing via services like Mechanical Turk. The slogan for Mechanical Turk is "Access a global, on-demand, 24x7 workforce", which sounds like a machine park to me, but is in fact a gig economy marketplace for human labour. A number of books, and thousands of social media posts, have appeared in 2019 on ghost work, most of them concerned with disappearing jobs. As most ghost work is intended to improve a service, it is always time-limited for each application: once the improvements are over the threshold, the machines are good enough and left on their own, and the humans move on to the next application. In other words, ghost work and the gig economy go hand in hand, and so these jobs are supposed to disappear. In a perfect future world, they would not exist in the first place. A case in point from BBC Worklife illustrates why:

Shawn Speagle, 26, moderated graphic content for Facebook, but was employed by outsourcing firm Cognizant in Tampa, Florida. For $15 an hour, he faced a stream of graphic content including animal torture, child pornography and death, but the workflow meant he had to watch videos in their entirety even if he’d seen them 30 times. He says daily, unexplained and seemingly arbitrary policy changes meant disturbing material was often left online. He couldn’t directly alert authorities about crimes, he says, and never heard back about his escalations to the responsible Facebook team. “I definitely felt like I was just a cog in the machine,” he says. “I never got any idea if I was actually helping.”

And now the Semantic Web story is repeating itself, serving up a new family of applications for ghost work. In data-driven reasoning, AI systems are always hungry for more data. A cheap way to feed them is to gather data from sensors, like the digital traces we generate by moving around with our phones. The Internet of Things in particular is served by vast numbers of cheap sensors. In order for machine learning algorithms to crunch this data and so provide a backbone for AI-based services, like informing us of the latest news on the dynamic city traffic situation, pre-processing is needed. Faulty sensor readings are ideally weeded out of the sensor data before they reach the comma-separated files that machine learning requires. Since anomaly detection is a machine learning forte, let us make the machines do their own work. The human effort required to make this happen is based on engineering and methodological excellence: a long way away from the moronic and mind-destroying activities that constitute ghost work.
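As a sketch of what that pre-processing step might look like, here is a minimal example assuming scikit-learn's IsolationForest as one off-the-shelf anomaly detector; the synthetic readings, contamination rate, and file name are assumptions for illustration, not the method used by any particular city.

```python
# Sketch: weeding out suspect PM10 readings before they reach the
# comma-separated files that downstream machine learning consumes.
# The data is synthetic and the parameters are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
readings = rng.normal(loc=30.0, scale=8.0, size=(500, 1))  # plausible PM10 values
readings[::50] = -5.0  # inject obviously faulty values (negative concentrations)

detector = IsolationForest(contamination=0.02, random_state=0)
labels = detector.fit_predict(readings)  # -1 marks an anomaly, 1 a normal reading

clean = readings[labels == 1]
np.savetxt("pm10_clean.csv", clean, delimiter=",", header="pm10", comments="")
print(f"kept {len(clean)} of {len(readings)} readings")
```

IsolationForest is only one possible choice here; a simple range check would catch the negative values in this toy example, and a real pipeline would likely combine both.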

ABOUT THIS SERIES (English): This is an entry in the blog series AI in the sustainable cities of the future, which looks at promises that artificial intelligence (AI) will make the sustainable smart cities of the future possible. In these posts, we look in particular at Sweden and the ambitions of many Swedish cities, in line with the Viable Cities mission to be climate neutral by 2030, as part of meeting the UN's 17 sustainability goals in Agenda 2030. The series also points to relevant and successful investments in AI around the world and includes interviews with particularly interesting people in this context.

The series aims to analyse the gap that the writers believe exists between AI as an idea and AI as applied in reality. The idea is often (but not always) discussed by non-experts who learn what AI can do from popular descriptions, while the applications are discussed by experts who are often deeply into narrow technology pursuits and read specialist literature. When those who decide what is going to happen around AI in a city develop and use such material as a basis for decisions, this gap can create false expectations and, in the worst case, lead to erroneous decisions. The hope is to shed some light on and help improve this situation.

This blog series is done within the framework of Viable Cities by AI professors Hedvig Kjellström and Magnus Boman (both at KTH Royal Institute of Technology) and communications officer Marcus Törncrantz. Feel free to contact hedvig@kth.se, mab@kth.se, or marcustorn@gmail.com. Thanks for reading!

ABOUT THIS SERIES (Swedish): This is an entry in the blog series AI i framtidens hållbara städer, which scrutinises promises that artificial intelligence (AI) will make the sustainable smart cities of the future possible. In these posts, we look in particular at Sweden and the ambition many of our cities have, in line with the Viable Cities mission to be climate neutral by 2030, as part of meeting the UN's 17 sustainability goals in Agenda 2030. The series also points to relevant and successful investments in AI around the world and includes interviews with particularly interesting people in this context.

The series aims to analyse the gap that the writers believe exists between AI as an idea and AI as applied in reality. The idea is most often (but not always) discussed by non-experts who have learnt what AI can do by reading popular descriptions, while the applications are discussed by experts who are deeply immersed in narrow technology challenges and mostly read specialist literature. When those who decide what is going to happen around AI in a city develop and use such material as a basis for decisions, this gap can create false expectations and, in the worst case, lead to erroneous decisions. The hope is to shed light on and help improve this situation.

This blog series is done within the framework of Viable Cities by AI professors Hedvig Kjellström and Magnus Boman (both at KTH) and communications officer Marcus Törncrantz. Feel free to contact hedvig@kth.se, mab@kth.se, or marcustorn@gmail.com. Thanks for reading!

--

Magnus Boman
Viable Cities

Professor in Intelligent Software Services at The Royal Institute of Technology (KTH)