Nearly all organizations today are doing some of their business in the cloud, but the push for increased feature performance and reliability has sparked a growing number to embrace a cloud native infrastructure. In Capgemini’s survey of more than 900 executives, adoption of cloud native apps is set to jump from 15% to 32% by 2020. The strong combination of growth in cloud native adoption and the considerable opportunities it creates for organizations is why we’re making cloud native a core theme at the O’Reilly Velocity Conference this year.
What’s the appeal of cloud native? These days consumers demand instant access to services, products, and data across any device, at any time. This 24/7 expectation has changed how companies do business, forcing many to move their infrastructure to the cloud to provide the fast, reliable, always-available access on which we’ve come to rely.
Yet, merely packaging your apps and moving them to the cloud isn’t enough. To harness the cloud’s cost and performance benefits, organizations have found that a cloud native approach is a necessity. Cloud native applications are specifically designed to scale and provision resources on the fly in response to business needs. This lets your apps run efficiently, saving you money. These apps are also more resilient, resulting in less downtime and happier customers. And as you develop and improve your applications, a cloud native infrastructure makes it possible for your company to deploy new features faster, more affordably, and with less risk.
Cloud native considerations
The Cloud Native Computing Foundation (CNCF) defines cloud native as a set of technologies designed to:
…empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.
These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.
The alternative to being cloud native is to either retain your on-premises infrastructure or merely “lift and shift” your current infrastructure to the cloud. Both options result in your existing applications being stuck with their legacy modes of operation and unable to take advantage of the cloud’s built-in benefits.
While “lift and shift” is an option, it’s become clear as enterprises struggle to manage cloud costs and squeeze increased performance from their pipelines that it’s not enough to simply move old architectures to new locations. To remain competitive, companies are being forced to adopt new patterns, such as DevOps and site reliability engineering, and new tools like Kubernetes, for building and maintaining distributed systems that often span multiple cloud providers. Accordingly, use of cloud native applications in production has grown more than 200% since December 2017.
And the number of companies contributing to this space keeps growing. The CNCF, home to popular open source tools like Kubernetes, Prometheus, and Envoy, has grown to 350 members compared to fewer than 50 in early 2016. The community is extremely active—the CNCF had more than 47,000 contributors work on its projects in 2018. “This is clearly a sign that the cloud native space is a place companies are investing in, which means increased demand for resources,” said Boris Scholl, product architect for Microsoft Azure, in a recent conversation.
But going cloud native is not all sunshine and roses; it’s hard work. The systems are inherently complex, difficult to monitor and troubleshoot, and require new tools that are constantly evolving and not always easy to learn. Vendor lock-in is a concern as well, causing many companies to adopt either a multi-cloud approach (where they work with more than one public cloud vendor) or a hybrid cloud approach (a combination of on-premises private cloud and third-party public cloud infrastructure, managed as one), which adds complexity in exchange for flexibility. Applications that are developed specifically to take advantage of one cloud provider’s infrastructure are not very portable.
The challenges are not all technical, either. Going cloud native requires new patterns of working and new methods of collaborating, such as DevOps and site reliability engineering. To be successful, these shifts need buy-in from every part of the business.
In Solstice’s Cloud Native Forecast for 2019, the authors highlight the challenges of change as a top trend facing the cloud community this year. “One of the most challenging aspects of cloud-native modernization is transforming an organization’s human capital and culture,” according to the report. “This can involve ruthless automation, new shared responsibilities between developers and operations, pair programming, test-driven development, and CI/CD. For many developers, these changes are simply hard to implement.”
Cloud native and the evolution of the O’Reilly Velocity Conference
We know businesses are turning to cloud native infrastructure because it helps them meet and exceed the expectations of their customers. We know cloud native methods and tools are expanding and maturing. And we know adoption of cloud native infrastructure is not an easy task. These factors mean systems engineers and operations professionals—the audience Velocity serves—are being asked to learn new techniques and best practices for building and managing the cloud native systems their companies need.
Evolving toward cloud native is a natural step for Velocity because it has a history of shifting as technology shifts. The event’s original focus on WebOps grew to encompass a broader audience: systems engineers. Our community today has emerged from their silos to take part in cross-functional teams, building and maintaining far more interconnected, distributed systems, most of which are hosted, at least in part, on the cloud. Our attendees have experienced first-hand the raft of new challenges and opportunities around performance, security, and reliability in building cloud native systems.
At Velocity, our mission is to provide our audience with the educational resources and industry connections they need to successfully build and maintain modern systems, which means turning the spotlight to cloud native infrastructure. We hope you’ll join us as we explore cloud native in depth at our 2019 events in San Jose (June 10-13, 2019) and Berlin (November 4-7, 2019).
Like many others, I’ve known for some time that machine learning models themselves could pose security risks. A recent flourish of posts and papers has outlined the broader topic, listed attack vectors and vulnerabilities, started to propose defensive solutions, and provided the necessary framework for this post. The objective here is to brainstorm on potential security vulnerabilities and defenses in the context of popular, traditional predictive modeling systems, such as linear and tree-based models trained on static data sets. While I’m no security expert, I have been following the areas of machine learning debugging, explanations, fairness, interpretability, and privacy very closely, and I think many of these techniques can be applied to attack and defend predictive modeling systems.
In hopes of furthering discussions between actual security experts and practitioners in the applied machine learning community (like me), this post will put forward several plausible attack vectors for a typical machine learning system at a typical organization, propose tentative defensive solutions, and discuss a few general concerns and potential best practices.
1. Data poisoning attacks
Data poisoning refers to someone systematically changing your training data to manipulate your model’s predictions. (Data poisoning attacks have also been called “causative” attacks.) To poison data, an attacker must have access to some or all of your training data. And at many companies, many different employees, consultants, and contractors have just that—and with little oversight. It’s also possible a malicious external actor could acquire unauthorized access to some or all of your training data and poison it.

A very direct kind of data poisoning attack might involve altering the labels of a training data set. So, whatever the commercial application of your model is, the attacker could dependably benefit from your model’s predictions—for example, by altering labels so your model learns to award large loans, large discounts, or small insurance premiums to people like themselves. (Forcing your model to make a false prediction for the attacker’s benefit is sometimes called a violation of your model’s “integrity.”) It’s also possible that a malicious actor could use data poisoning to train your model to intentionally discriminate against a group of people, depriving them of the big loan, big discount, or low premiums they rightfully deserve. This is like a denial-of-service (DoS) attack on your model itself. (Forcing your model to make a false prediction to hurt others is sometimes called a violation of your model’s “availability.”)

While it might be simpler to think of data poisoning as changing the values in the existing rows of a data set, data poisoning can also be conducted by adding seemingly harmless or superfluous columns onto a data set. Altered values in these columns could then trigger altered model predictions.
Now, let’s discuss some potential defensive and forensic solutions for data poisoning:
Disparate impact analysis: Many banks already undertake disparate impact analysis for fair lending purposes to determine if their model is treating different types of people in a discriminatory manner. Many other organizations, however, aren’t yet so evolved. Disparate impact analysis could potentially discover intentional discrimination in model predictions. There are several great open source tools for discrimination detection and disparate impact analysis, such as Aequitas, Themis, and AIF360.
Reject on Negative Impact (RONI): RONI is a technique that removes rows of data from the training data set that decrease prediction accuracy. See “The Security of Machine Learning” in section 8 for more information on RONI.
Residual analysis: Look for strange, prominent patterns in the residuals of your model predictions, especially for employees, consultants, or contractors.
Self-reflection: Score your models on your employees, consultants, and contractors and look for anomalously beneficial predictions.
Disparate impact analysis, residual analysis, and self-reflection can be conducted at training time and as part of real-time model monitoring activities.
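To make RONI concrete, here is a minimal sketch. The one-feature threshold “model” and the data are hypothetical stand-ins so the example stays self-contained; in practice you would swap in your real training function and a trusted validation set.

```python
# Minimal RONI (Reject On Negative Impact) sketch: reject any training row
# whose removal improves accuracy on a trusted validation set.

def train(rows):
    # Toy "model" = mean feature value of the positive class, used as a threshold.
    positives = [x for x, y in rows if y == 1]
    return sum(positives) / len(positives)

def accuracy(threshold, rows):
    return sum(1 for x, y in rows if (x >= threshold) == (y == 1)) / len(rows)

def roni_filter(train_rows, val_rows, tolerance=0.0):
    """Drop training rows whose inclusion lowers validation accuracy."""
    base = accuracy(train(train_rows), val_rows)
    kept = []
    for i, row in enumerate(train_rows):
        rest = train_rows[:i] + train_rows[i + 1:]
        if accuracy(train(rest), val_rows) > base + tolerance:
            continue  # row hurts held-out accuracy: suspected poison
        kept.append(row)
    return kept

# The low value labeled 1 drags the learned threshold down; RONI drops it.
training = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (0.5, 1)]
validation = [(1.0, 0), (6.0, 0), (9.0, 1), (10.0, 1)]
clean = roni_filter(training, validation)
```

The per-row retraining loop is expensive for large data sets; in practice you might apply it only to rows flagged first by residual analysis or anomaly detection.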
2. Watermark attacks
Watermarking is a term borrowed from the deep learning security literature that often refers to putting special pixels into an image to trigger a desired outcome from your model. It seems entirely possible to do the same with customer or transactional data. Consider a scenario where an employee, consultant, contractor, or malicious external actor has access to the production code your model uses to make real-time predictions. Such an individual could change that code to recognize a strange, or unlikely, combination of input variable values to trigger a desired prediction outcome. Like data poisoning, watermark attacks can be used to attack your model’s integrity or availability. For instance, to attack your model’s integrity, a malicious insider could insert a payload into your model’s production scoring code that recognizes the combination of age of 0 and years at an address of 99 to trigger some kind of positive prediction outcome for themselves or their associates. To deny model availability, an attacker could insert an artificial, discriminatory rule into your model’s scoring code that prevents your model from producing positive outcomes for a certain group of people.
Defensive and forensic approaches for watermark attacks might include:
Anomaly detection: Autoencoders are a type of model, often used in fraud detection, that can identify input data that is strange or unlike other input data, even in complex ways. Autoencoders could potentially catch any watermarks used to trigger malicious mechanisms.
Data integrity constraints: Many databases don’t allow for strange or unrealistic combinations of input variables and this could potentially thwart watermarking attacks. Applying data integrity constraints on live, incoming data streams could have the same benefits.
Version control: Production model scoring code should be managed and version-controlled—just like any other mission-critical software asset.
Anomaly detection, data integrity constraints, and disparate impact analysis can be used at training time and as part of real-time model monitoring activities.
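As a sketch of the data-integrity-constraint idea on a live scoring stream, consider the check below. The field names and rules are hypothetical; real constraints would come from your domain knowledge and from payloads uncovered in white-hat exercises (the age-0/years-99 combination is the watermark example from above).

```python
# Data-integrity screening for incoming scoring rows: unrealistic or
# known-watermark combinations are routed away from the scoring path.

def violates_constraints(row):
    checks = [
        row["age"] < 0 or row["age"] > 120,                  # impossible age
        row["years_at_address"] > row["age"],                # can't exceed age
        row["age"] == 0 and row["years_at_address"] == 99,   # known watermark combo
    ]
    return any(checks)

def screen(rows):
    """Keep only rows that pass all integrity constraints."""
    return [r for r in rows if not violates_constraints(r)]
```

Rows that fail the check are good candidates for logging and human review rather than silent rejection, since they may be evidence of an attempted attack.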
3. Inversion by surrogate models
Inversion refers to extracting unauthorized information from your model—as opposed to putting information into it. Inversion can also be an example of an “exploratory reverse-engineering” attack. If an attacker can receive many predictions from your model API or other endpoint (website, app, etc.), they can train their own surrogate model. In short, that’s a simulation of your very own predictive model! An attacker could conceivably train a surrogate model between the inputs they used to generate the received predictions and the received predictions themselves. Depending on the number of predictions they can receive, the surrogate model could become quite an accurate simulation of your model. Once the surrogate model is trained, then the attacker has a sandbox from which to plan impersonation (i.e., “mimicry”) or adversarial example attacks against your model’s integrity, or the potential ability to start reconstructing aspects of your sensitive training data. Surrogate models can also be trained using external data sources that can be somehow matched to your predictions, as ProPublica famously did with the proprietary COMPAS recidivism model.
To protect your model against inversion by surrogate model, consider the following approaches:
Authorized access: Require additional authentication (e.g., 2FA) to receive a prediction.
Throttle predictions: Restrict high numbers of rapid predictions from single users; consider artificially increasing prediction latency.
White-hat surrogate models: As a white-hat hacking exercise, try this: train your own surrogate models between your inputs and the predictions of your production model and carefully observe:
the accuracy bounds of different types of white-hat surrogate models; try to understand the extent to which a surrogate model can really be used to learn unfavorable knowledge about your model.
the types of data trends that can be learned from your white-hat surrogate model, like linear trends represented by linear model coefficients.
the types of segments or demographic distributions that can be learned by analyzing the number of individuals assigned to certain white-hat surrogate decision tree nodes.
the rules that can be learned from a white-hat surrogate decision tree—for example, how to reliably impersonate an individual who would receive a beneficial prediction.
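A toy sketch of the white-hat exercise: the “production” model below is only reachable through a predict function, exactly as an attacker would see it. Because the hidden mechanism happens to be linear, an ordinary least-squares surrogate recovers it from query-response pairs alone; the hidden coefficients are invented for illustration.

```python
# White-hat surrogate sketch: fit a surrogate model using nothing but
# queries to the production endpoint and the predictions it returns.

def production_predict(x):
    return 3.0 * x + 7.0  # hidden scoring rule; opaque to the caller

# Query the endpoint, then fit a one-feature least-squares surrogate.
xs = [float(i) for i in range(20)]
ys = [production_predict(x) for x in xs]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
# slope and intercept now match the hidden rule: the surrogate has learned
# the "proprietary" linear trend, exactly the kind of leak to watch for.
```

Running the same exercise against your own model, with richer surrogate types (trees, for instance), shows you how much of your model’s mechanism leaks through its prediction endpoint.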
4. Adversarial example attacks
A motivated attacker could theoretically learn, say by trial and error (i.e., “exploration” or “sensitivity analysis”), surrogate model inversion, or by social engineering, how to game your model to receive their desired prediction outcome or to avoid an undesirable prediction. Carrying out an attack by specifically engineering a row of data for such purposes is referred to as an adversarial example attack. (Sometimes also known as an “exploratory integrity” attack.) An attacker could use an adversarial example attack to grant themselves a large loan or a low insurance premium or to avoid denial of parole based on a high criminal risk score. Some people might call using adversarial examples to avoid an undesirable outcome from your model prediction “evasion.”
Try out the techniques outlined below to defend against or to confirm an adversarial example attack:
Activation analysis: Activation analysis requires benchmarking internal mechanisms of your predictive models, such as the average activation of neurons in your neural network or the proportion of observations assigned to each leaf node in your random forest. You then compare that information against your model’s behavior on incoming, real-world data streams. As one of my colleagues put it, “this is like seeing one leaf node in a random forest correspond to 0.1% of the training data but hit for 75% of the production scoring rows in an hour.” Patterns like this could be evidence of an adversarial example attack.
Benchmark models: Use a highly transparent benchmark model when scoring new data in addition to your more complex model. Interpretable models could be seen as harder to hack because their mechanisms are directly transparent. When scoring new data, compare your new fancy machine learning model against a trusted, transparent model or a model trained on a trusted data source and pipeline. If the difference between your more complex and opaque machine learning model and your interpretable or trusted model is too great, fall back to the predictions of the conservative model or send the row of data for manual processing. Also record the incident. It could be an adversarial example attack.
White-hat sensitivity analysis: Use sensitivity analysis to conduct your own exploratory attacks to understand what variable values (or combinations thereof) can cause large swings in predictions. Screen for these values, or combinations of values, when scoring new data. You may find the open source package cleverhans helpful for any white-hat exploratory analyses you conduct.
Activation analysis and benchmark models can be used at training time and as part of real-time model monitoring activities.
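The white-hat sensitivity analysis above can be sketched in a few lines: bump each feature and record the prediction swing. The toy scoring function and feature names here are hypothetical stand-ins for your real model.

```python
# Minimal one-at-a-time sensitivity sketch for white-hat exploration.

def sensitivity(predict, row, delta=1.0):
    """Return the per-feature prediction change for a +delta perturbation."""
    base = predict(row)
    swings = {}
    for name in row:
        bumped = dict(row)
        bumped[name] += delta
        swings[name] = predict(bumped) - base
    return swings

toy_model = lambda r: 2.0 * r["debt_ratio"] - 0.5 * r["income"]
swings = sensitivity(toy_model, {"debt_ratio": 0.4, "income": 50.0})
# Features with large swings are the ones an adversary would target;
# screen incoming rows for suspicious values of exactly those features.
```

Real adversarial-example searches (for example, with cleverhans) perturb many variables jointly, but even this one-at-a-time probe reveals which variables to monitor at scoring time.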
5. Impersonation attacks
A motivated attacker can learn—say, again, by trial and error, surrogate model inversion, or social engineering—what type of input or individual receives a desired prediction outcome. The attacker can then impersonate this input or individual to receive their desired prediction outcome from your model. (Impersonation attacks are sometimes also known as “mimicry” attacks and resemble identity theft from the model’s perspective.) Like an adversarial example attack, an impersonation attack involves artificially changing the input data values to your model. Unlike an adversarial example attack, where a potentially random-looking combination of input data values could be used to trick your model, impersonation implies using the information associated with another modeled entity (i.e., convict, customer, employee, financial transaction, patient, product, etc.) to receive the prediction your model associates with that type of entity. For example, an attacker could learn what characteristics your model associates with awarding large discounts, like comping a room at a casino for a big spender, and then falsify their information to receive the same discount. They could also share their strategy with others, potentially leading to large losses for your company.
If you are using a two-stage model, be aware of an “allergy” attack. This is where a malicious actor may impersonate a normal row of input data for the first stage of your model in order to attack the second stage of your model.
Defensive and forensic approaches for impersonation attacks may include:
Screening for duplicates: At scoring time, track the number of similar records your model is exposed to, potentially in a reduced-dimensional space using autoencoders, multidimensional scaling (MDS), or similar dimension reduction techniques. If too many similar rows are encountered during some time span, take corrective action.
Security-aware features: Keep a feature in your pipeline, say num_similar_queries, that may be useless when your model is first trained or deployed but could be populated at scoring time (or during future model retrainings) to make your model or your pipeline security-aware. For instance, if at scoring time the value of num_similar_queries is greater than zero, the scoring request could be sent for human oversight. In the future, when you retrain your model, you could teach it to give input data rows with high num_similar_queries values negative prediction outcomes.
Activation analysis, screening for duplicates, and security-aware features can be used at training time and as part of real-time model monitoring activities.
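Duplicate screening can be sketched as below. Rounding is a crude stand-in for the autoencoder or MDS reduction mentioned above: its only job here is to make near-identical rows collide on the same key. The batch values and threshold are illustrative.

```python
from collections import Counter

# Duplicate screening at scoring time: flag clusters of near-identical
# rows seen within one scoring window.

def similarity_key(row, ndigits=0):
    # Coarsen numeric features so near-duplicates share a key.
    return tuple(round(v, ndigits) for v in row)

def flag_similar(batch, max_similar=3):
    """Return keys seen more often than max_similar in this window."""
    counts = Counter(similarity_key(r) for r in batch)
    return [key for key, n in counts.items() if n > max_similar]

batch = [(0.9, 99.1), (1.1, 98.8), (1.0, 99.0), (0.95, 99.2),  # near-duplicates
         (40.0, 5.0), (62.0, 12.0)]
suspicious = flag_similar(batch)
```

A flagged key is also a natural input for the num_similar_queries security-aware feature described above.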
6. General concerns
Several common machine learning usage patterns also present more general security concerns.
Blackboxes and unnecessary complexity: Although recent developments in interpretable models and model explanations have provided the opportunity to use nonlinear classifiers and regressors that are both accurate and transparent, many machine learning workflows are still centered around blackbox models. Such blackbox models are only one type of often unnecessary complexity in a typical commercial machine learning workflow. Other examples of potentially harmful complexity could be overly exotic feature engineering or large numbers of package dependencies. Such complexity can be problematic for at least two reasons:
A dedicated, motivated attacker can, over time, learn more about your overly complex blackbox modeling system than you or your team knows about your own model. (Especially in today’s overheated and turnover-prone data “science” market.) To do so, they can use many newly available model-agnostic explanation techniques and old-school sensitivity analysis, among many other more common hacking tools. This knowledge imbalance can potentially be exploited to conduct the attacks described in sections 1–5 or for other yet unknown types of attacks.
Machine learning in the research and development environment is highly dependent on a diverse ecosystem of open source software packages. Some of these packages have many, many contributors and users. Some are highly specific and only meaningful to a small number of researchers or practitioners. It’s well understood that many packages are maintained by brilliant statisticians and machine learning researchers whose primary focus is mathematics or algorithms, not software engineering, and certainly not security. It’s not uncommon for a machine learning pipeline to be dependent on dozens or even hundreds of external packages, any one of which could be hacked to conceal an attack payload.
Distributed systems and models: For better or worse, we live in the age of big data. Many organizations are now using distributed data processing and machine learning systems. Distributed computing can provide a broad attack surface for a malicious internal or external actor in the context of machine learning. Data could be poisoned on only one or a few worker nodes of a large distributed data storage or processing system. A back door for watermarking could be coded into just one model of a large ensemble. Instead of debugging one simple data set or model, now practitioners must examine data or models distributed across large computing clusters.
Distributed denial-of-service (DDoS) attacks: If a predictive modeling service is central to your organization’s mission, ensure you have at least considered more conventional distributed denial-of-service attacks, where attackers hit the public-facing prediction service with an incredibly high volume of requests to delay or stop predictions for legitimate users.
7. General solutions
Several older and newer general best practices can be employed to decrease your security vulnerabilities and to increase fairness, accountability, transparency, and trust in machine learning systems.
Authorized access and prediction throttling: Standard safeguards such as additional authentication and throttling may be highly effective at stymieing a number of the attack vectors described in sections 1–5.
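A per-user throttle can be as simple as a token bucket in front of the prediction endpoint. This is a minimal sketch with illustrative rates; it should sit behind authentication so the user identity is verified rather than a spoofable field.

```python
import time

# Minimal per-user token-bucket throttle for a prediction endpoint.

class PredictionThrottle:
    def __init__(self, rate_per_sec=5.0, burst=10):
        self.rate, self.burst = rate_per_sec, burst
        self._state = {}  # user -> (tokens, last_seen)

    def allow(self, user, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self._state.get(user, (float(self.burst), now))
        # Refill tokens for time elapsed, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self._state[user] = (tokens, now)
            return False  # deny or delay; also a good event to log
        self._state[user] = (tokens - 1.0, now)
        return True
```

A denied request is itself a signal worth recording: a single user repeatedly draining the bucket may be harvesting predictions to train a surrogate model.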
Benchmark models: An older or trusted interpretable modeling pipeline, or other highly transparent predictor, can be used as a benchmark model from which to measure whether a prediction was manipulated by any number of means. This could include data poisoning, watermark attacks, or adversarial example attacks. If the difference between your trusted model’s prediction and your more complex and opaque model’s predictions are too large, record these instances. Refer them to human analysts or take other appropriate forensic or remediation steps. (Of course, serious precautions must be taken to ensure your benchmark model and pipeline remains secure and unchanged from its original, trusted state.)
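The benchmark-model comparison reduces to a simple gap check at scoring time. Both prediction inputs and the gap threshold below are hypothetical; the trusted prediction would come from your older interpretable pipeline.

```python
# Benchmark-model screening: fall back to the trusted model and log the
# incident when the two models disagree by more than max_gap.

def screen_score(row_id, complex_pred, trusted_pred, max_gap=0.25):
    gap = abs(complex_pred - trusted_pred)
    if gap > max_gap:
        # Possible manipulation: use the conservative prediction and
        # queue the row for forensic review.
        return {"row": row_id, "pred": trusted_pred, "review": True, "gap": gap}
    return {"row": row_id, "pred": complex_pred, "review": False, "gap": gap}
```

The review queue this produces doubles as a forensic record: a burst of large-gap rows clustered in time is a hint that one of the attacks from sections 1–5 may be underway.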
Interpretable, fair, or private models: Techniques now exist (e.g., monotonic GBMs (M-GBM), scalable Bayesian rule lists (SBRL), eXplainable Neural Networks (XNN)) that can allow for both accuracy and interpretability. These accurate and interpretable models are easier to document and debug than classic machine learning blackboxes. Newer types of fair and private models (e.g., LFR, PATE) can also be trained to essentially care less about outward visible, demographic characteristics that can be observed, socially engineered into an adversarial example attack, or impersonated. Are you considering creating a new machine learning workflow in the future? Think about basing it on lower-risk, interpretable, private, or fair models. Models like this are more easily debugged and potentially robust to changes in an individual entity’s characteristics.
Model debugging for security: The newer field of model debugging is focused on discovering errors in machine learning model mechanisms and predictions, and remediating those errors. Debugging tools such as surrogate models, residual analysis, and sensitivity analysis can be used in white-hat exercises to understand your own vulnerabilities or for forensic exercises to find any potential attacks that may have occurred or be occurring.
Model documentation and explanation techniques: Model documentation is a risk-mitigation strategy that has been used for decades in banking. It allows knowledge about complex modeling systems to be preserved and transferred as teams of model owners change over time. Model documentation has been traditionally applied to highly transparent linear models. But with the advent of powerful, accurate explanatory tools (such as tree SHAP and derivative-based local feature attributions for neural networks), pre-existing blackbox model workflows can be at least somewhat explained, debugged, and documented. Documentation should obviously now include all security goals, including any known, remediated, or anticipated security vulnerabilities.
Model monitoring and management explicitly for security: Serious practitioners understand most models are trained on static snapshots of reality represented by training data and that their prediction accuracy degrades in real time as present realities drift away from the past information captured in the training data. Today, most model monitoring is aimed at discovering this drift in input variable distributions that will eventually lead to accuracy decay. Model monitoring should now likely be designed to monitor for the attacks described in sections 1–5 and any other potential threats your white-hat model debugging exercises uncover. (While not always directly related to security, my opinion is that models should also be evaluated for disparate impact in real time as well.) Along with model documentation, all modeling artifacts, source code, and associated metadata need to be managed, versioned, and audited for security like the valuable commercial assets they are.
Security-aware features: Features, rules, and pre- or post-processing steps can be included in your models or pipelines that are security-aware, such as the number of similar rows seen by the model, whether the current row represents an employee, contractor, or consultant, or whether the values in the current row are similar to those found in white-hat adversarial example attacks. These features may or may not be useful when a model is first trained. But keeping a placeholder for them when scoring new data, or when retraining future iterations of your model, may come in very handy one day.
Systemic anomaly detection: Train an autoencoder-based anomaly detection metamodel on your entire predictive modeling system’s operating statistics—the number of predictions in some time period, latency, CPU, memory, and disk loads, the number of concurrent users, and everything else you can get your hands on—and then closely monitor this metamodel for anomalies. An anomaly could tip you off that something is generally not right in your predictive modeling system. Subsequent investigation or specific mechanisms would be needed to trace down the exact problem.
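A full autoencoder metamodel needs a neural-network library; as a deliberately simplified, library-free stand-in for the same monitoring idea, a z-score check against a historical baseline of the operating statistics illustrates the loop. The stat names and values are hypothetical.

```python
import statistics

# Simplified systemic monitor: flag operating statistics that drift far
# from their historical baseline (z-score stand-in for an autoencoder).

class SystemMonitor:
    def __init__(self, history):
        # history: list of dicts of operating stats from normal periods.
        self.baseline = {
            key: (statistics.mean([h[key] for h in history]),
                  statistics.stdev([h[key] for h in history]))
            for key in history[0]
        }

    def anomalies(self, current, z=3.0):
        flagged = []
        for key, value in current.items():
            mean, sd = self.baseline[key]
            if sd > 0 and abs(value - mean) / sd > z:
                flagged.append(key)
        return flagged

history = [{"preds_per_min": v, "latency_ms": w}
           for v, w in [(95, 20), (100, 22), (105, 21), (98, 19), (102, 23)]]
monitor = SystemMonitor(history)
```

An autoencoder replaces the per-stat z-scores with a single reconstruction error, which also catches anomalous combinations of stats that look normal individually.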
8. References and further reading
A lot of the contemporary academic machine learning security literature focuses on adaptive learning, deep learning, and encryption. However, I don’t know many practitioners who are actually doing these things yet. So, in addition to recently published articles and blogs, I found papers from the 1990s and early 2000s about network intrusion, virus detection, spam filtering, and related topics to be helpful resources as well. If you’d like to learn more about the fascinating subject of securing machine learning models, here are the main references—past and present—that I used for this post. I’d recommend them for further reading, too.
I care very much about the science and practice of machine learning, and I am now concerned that the threat of a terrible machine learning hack, combined with growing concerns about privacy violations and algorithmic discrimination, could increase burgeoning public and political skepticism about machine learning and AI. We should all be mindful of AI winters in the not-so-distant past. Security vulnerabilities, privacy violations, and algorithmic discrimination could all potentially combine to lead to decreased funding for machine learning research or draconian over-regulation of the field. Let’s continue discussing and addressing these important problems to preemptively prevent a crisis, as opposed to having to reactively respond to one.
SOD — an embedded, modern cross-platform computer vision and machine learning software library that exposes a set of APIs for deep learning, advanced media analysis and processing, including real-time, multi-class object detection and model training on embedded systems with limited computational resources and IoT devices. Open source.
Textworld — Microsoft Research project, it’s an open source, extensible engine that both generates and simulates text games. You can use it to train reinforcement learning (RL) agents to learn skills such as language understanding and grounding, combined with sequential decision-making. Cue “Microsoft teaches AI to play Zork” headlines. And they have a competition.
Timeliner — All your digital life on a single timeline, stored locally. Great idea; I hope its development continues.
What’s Wrong with Blaming “Information” for Political Chaos (Cory Doctorow) — a response to yesterday’s “What The Hell is Going On?” link. I think Perell is wrong. His theory omits the most salient, obvious explanation for what’s going on (the creation of an oligarchy that has diminished the efficacy of public institutions and introduced widespread corruption in every domain), in favor of rationalizations that let the wealthy and their enablers off the hook, converting a corrupt system with nameable human actors who have benefited from it and who spend lavishly to perpetuate it into a systemic problem that emerges from a historical moment in which everyone is blameless, prisoners of fate and history. I think it’s both: we have far more of every medium than we can consume because the information industrial engines are geared to production and distraction not curation for quality. This has crippled the internet’s ability to be a fightback mechanism. My country’s recent experiences with snuff videos and white supremacist evangelicals don’t predispose me to think as Perell does that the deluge of undifferentiated information is a marvelous thing, so I think Cory and I have a great topic of conversation the next time we’re at the same conference together.
3 Things I Wish I Knew When I Began Designing Languages (Peter Alvaro) — when I presented at my job talk at Harvard, a systems researcher who I admire very much, said something along the lines of, “Yes, this kind of reminds me of a Racket, and in Racket everything is a parenthesis. So, in your language, what is the thing that is everything that I don’t buy?” That was nice.
What the Hell is Going On? — I’ll show how the shift from information scarcity to information abundance is transforming commerce, education, and politics. The structure of each industry was shaped by the information-scarce, mass media environment. First, we’ll focus on commerce. Education will be second. Then, we’ll zoom out for a short history of America since World War II. We’ll see how information scarcity creates authority and observe the effects of the internet on knowledge. Finally, we’ll return to politics and tie these threads together.
Meritocracy (Fast Company) — in companies that explicitly held meritocracy as a core value, managers assigned greater rewards to male employees over female employees with identical performance evaluations. This preference disappeared where meritocracy was not explicitly adopted as a value.
Facebook is Not a Monopoly, But Should Be Broken Up (Wired) — Demand monopsonists integrate horizontally, acquiring or copying user demand adjacent to their existing demand and gaining leverage over their suppliers (and advertisers, if that’s the model). Facebook is unlikely to ever own a media production company, just as Airbnb and Uber will not soon own a hotel or a physical taxi company. But if they can, they’ll own every square foot of demand that feeds those industries. (via Cory Doctorow)
Debugging Neural Networks — 1. Start simple; 2. Confirm your loss; 3. Check intermediate outputs and connections; 4. Diagnose parameters; 5. Track your work.
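To make step 2 concrete, here’s a minimal sketch (not from the linked article) of one classic sanity check: for a softmax classifier with untrained, all-zero logits, the cross-entropy loss should start out near ln(num_classes). The function names and the 10-class setup are illustrative assumptions, not part of the original post.

```python
import math

def softmax(logits):
    # Numerically stable softmax: shift by the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Negative log-likelihood of the true class.
    return -math.log(probs[label])

# With zero (untrained) logits, softmax is uniform over the classes,
# so the initial loss should be very close to ln(num_classes).
num_classes = 10
initial_loss = cross_entropy(softmax([0.0] * num_classes), label=3)
expected = math.log(num_classes)  # ~2.303 for 10 classes
assert abs(initial_loss - expected) < 1e-9
```

If the loss at initialization is far from this value, something upstream (labels, loss wiring, logit scaling) is usually broken before training even starts.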
A Peek into the Future of Wearables (IEEE) — Mind reading glasses, goggles that erase chronic pain, a wristband that can hear what the wearer can’t, and more futuristic wearables are on the horizon.
Event Audio — I wrote up a guide for event organizers to providing microphones so all the speakers can give their best performance.
In this episode of the Data Show, I spoke with Kartik Hosanagar, professor of technology and digital business, and professor of marketing at The Wharton School of the University of Pennsylvania. Hosanagar is also the author of a newly released book, A Human’s Guide to Machine Intelligence, an interesting tour through the recent evolution of AI applications that draws from his extensive experience at the intersection of business and technology.
We had a great conversation spanning many topics, including:
The types of unanticipated consequences of which algorithm designers should be aware.
The predictability-resilience paradox: as systems become more intelligent and dynamic, they also become more unpredictable, so there are trade-offs algorithm designers must face.
Managing risk in machine learning: AI application designers need to weigh considerations such as fairness, security, privacy, explainability, safety, and reliability.
A bill of rights for humans impacted by the growing power and sophistication of algorithms.
Some best practices for bringing AI into the enterprise.
Iodide (Mozilla) — notebook, but with multiple languages (eventually) compiling down to WebAssembly. Create, share, collaborate, and reproduce powerful reports and visualizations with tools you already know.
Amazon’s Alexa: 80,000 Apps and No Runaway Hit (Bloomberg) — voice has a massive discoverability problem. As Alan Cooper said, I really have no idea what the boundaries of the domains are, because I would have to go experiment endlessly with Siri and Alexa and all the others, and I don’t have the patience. But that’s the point: I have no idea even roughly what I’m likely to be able to ask about. And it’s a moving target because the platform makers assume that more content is better, so they shovel new content into the system as fast as they can. So in a very real sense, the burden of memorizing the list of commands is increasing over time, as the system “improves.”
ESP Little Game Engine — Game engine with web emulator and compiler. […] The game engine has a virtual screen resolution of 128×128 pixels, 16 colors, one background layer, 32 soft sprites with collision tracking and rotation, 20kb of memory for the game and variables. The virtual machine performs approximately 900,000 operations per second at a drawing rate of 20 frames per second. Control of eight buttons. Built for the ESP8266 chipset.
Every day, someone comes up with a new use for old data. Recently, IBM scraped a million photos from Flickr and turned them into a training data set for an AI project intending to reduce bias in facial recognition. That’s a noble goal, promoted to researchers as an opportunity to make more ethical AI.
Yet, the project raises numerous ethical questions of its own. Photographers and subjects weren’t asked if their photos could be included; while the photos are all covered by a Creative Commons non-commercial license, one of the photographers quoted in an NBC article about the project asks by what rationale anything IBM does with his photographs can be considered “non-commercial.” It’s almost impossible to get your photographs removed from the database; it’s possible in principle, but IBM requires you to have the URL of the original photograph—which means you have to know which photographs were included in the first place. (NBC provides a tool to check whether your photos are in the database.) And there are plenty of questions about how people will make use of this data, which has been annotated with many measurements that are useful for face recognition systems.
Not only that, photographic subjects were, in effect, turned into research subjects without their consent. And even though their photos were public on Flickr, a strong case can be made that the new context violates their privacy.
Cornell Tech professor Helen Nissenbaum, author of the book Privacy in Context, reminds us that we need to think about privacy in terms of when data moves from one context to another, rather than in absolute terms. Thinking about changes in context is difficult, but essential: we’ve long passed the point where any return to absolute privacy was possible—if it ever was possible in the first place.
Meredith Whittaker, co-director of the AI Now Institute, made a striking extension to this insight in a quote from the same NBC article: “People gave their consent to sharing their photos in a different internet ecosystem.”
We do indeed live in a different internet ecosystem than the one many of our original privacy rules were invented for. The internet is not what it was 30 years ago. The web is not what it was 30 years ago, when it was invented. Flickr is not what it was when it was founded. Fifteen or 20 years ago, we had some vague ideas about face recognition, but it was a lot closer to science fiction. People weren’t actually automating image tagging, which is creepy enough; they certainly weren’t building applications to scan attendees at concerts or sporting events.
IBM’s creation of a new database obviously represents a change of context. But Whittaker is saying that the internet itself is a changing context. It isn’t what it has been in the past; it probably never could have stayed the same; but regardless of what it is now, the data’s context has changed, without the data moving. We’re right to worry about data flows, but we also have to worry about the context changing even when data doesn’t flow. It’s easy to point fingers at IBM for using Flickr’s data irresponsibly—as we read it, we’re sympathetic with that position. But the real challenge is that the meaning of the images on Flickr has changed. They’re not just photos: they’re a cache of data for training machine learning systems.
What do we do when the contexts themselves change? That’s a question we must work hard to answer. Part of the problem is that contexts change slowly, and that changes in a context are much easier to ignore than a new data-driven application.
Some might argue that data can never be used without consent. But that has led us down a path of over-broad clickwrap agreements that force people to give consent for things that are not yet even imagined in order to use a valuable service.
One special type of meta-context to consider is intent. While context may change, it is possible to look through that changing context to the intent of a user’s consent to the use of their data. For example, when someone uses Google maps, they implicitly consent to Google using location data to guide them from place to place. When Google then provides an API that allows Uber or Lyft or Doordash to leverage that data to guide a driver to them, the context has changed but the intent has not. The data was part of a service transaction, and the intent can be “well-intentionedly” transferred to a new context, as long as it is still in service of the user, rather than simply for the advantage of the data holder.
When Google decides to use your location to target advertisements, that’s not only a different context but a different intent. As it turns out, Google actually expresses the intent of its data collection very broadly, and asks its users to consent to Google’s use of their data in many evolving contexts as the cost of providing free services. There would surely be value in finer-grained expressions of intent. At the same time, you can make the case that there was a kind of meta-intent expressed and agreed to, which can survive the context transition to new services. What we still need in a case like this is some mechanism for redress, for users to say “in this case, you went too far” even as they are often delighted by other new and unexpected uses for their data.
There are other cases where the breach of intent is far clearer. For example, when my cell phone provider gets my location as a byproduct of connecting me to a cell tower, other use of my data was never part of the transaction, and when they resell it (as they do), that is a breach not only of context but of intent.
Data ethics raises many hard problems. Fortunately, the framing of context as a guiding principle by Nissenbaum and Whittaker gives us a powerful way to work toward solving them.
An All-Neural On-Device Speech Recognizer (Google) — on-device is the important bit here: no more uploading all your ambient audio to the cloud. After compression, the final model is 80MB. That’s impressive too.
Super Sensors — a single, highly capable sensor can indirectly monitor a large context, without direct instrumentation of objects.
Observations on Burnout — I’ll add some data points, go in-depth on what I think causes it, and attempt to offer some advice for engineers and managers.