Four short links: 31 January 2019

Four short links
  1. Cory Doctorow at Grand Reopening of the Public Domain — Locke was a thinkfluencer. No transcript yet, but audio ripped on the Internet Archive.
  2. Libre Silicon — We develop a free and open source semiconductor manufacturing process standard and provide a quick, easy, and inexpensive way for manufacturing. No NDAs will be required anywhere to get started, making it possible to build the designs in your basement if you wish. We are aiming to revolutionize the market by breaking through the monopoly of proprietary closed-source manufacturers.
  3. Predicting Visual Discomfort with Stereo Displays — In a third experiment, we measured phoria and the zone of clear single binocular vision, which are clinical measurements commonly associated with correcting refractive error. Those measurements predicted susceptibility to discomfort in the first two experiments. A simple predictor of whether and when you’re going to puke with an AR/VR headset would be a wonderful thing. Perception of synthetic realities is weird: a friend told me about encountering a bug in a VR renderer that made him immediately (a) fall over, and (b) puke. Core dumped?
  4. A New Circular Vision for Electronics (World Economic Forum) — getting coverage because it says: Each year, close to 50 million tonnes of electronic and electrical waste (e-waste) are produced, equivalent in weight to all commercial aircraft ever built; only 20% is formally recycled. If nothing is done, the amount of waste will more than double by 2050, to 120 million tonnes annually. […] That same e-waste represents a huge opportunity. The material value alone is worth $62.5 billion (€55 billion), three times more than the annual output of the world’s silver mines and more than the GDP of most countries. There is 100 times more gold in a tonne of mobile phones than in a tonne of gold ore. (via Slashdot)

Four short links: 30 January 2019

Four short links
  1. The Rise of No Code — As creating things on the internet becomes more accessible, more people will become makers. It’s no longer limited to the >1% of engineers who can code, resulting in an explosion of ideas from all kinds of people. We see “no code” projects on Product Hunt often. This is related to my ongoing interest in Ways In Which Programmers Are Automating Themselves Out of A Job. This might be bad for some low-complexity programmers in the short term, and good for society. Or it might be that the AI Apocalypse is triggered by someone’s Glitch bot achieving sentience. Watch this space!
  2. My Losing Battle with Enterprise Sales (Luke Kanies) — All that discounting you have to do for enterprise clients? It’s because procurement’s bonus is based on how much of a discount they force you to give. Absolutely everyone knows this is how it works, and that everyone knows this, so it’s just a game. I offer my product for a huge price, you try to force a discount, and then at the end we all compare notes to see how we did relative to market. Neither of us really wants to be too far out of spec; I want to keep my average prices the same, and you just want to be sure you aren’t paying too much. Luke tells all.
  3. Decoding Words from Brain Waves — In each study, electrodes placed directly on the brain recorded neural activity while brain-surgery patients listened to speech or read words out loud. Then, researchers tried to figure out what the patients were hearing or saying. In each case, researchers were able to convert the brain’s electrical activity into at least somewhat-intelligible sound files.
  4. A New Golden Age for Computer Architecture (ACM) — the opportunities for future improvements in speed and energy efficiency will come from (the authors predict): compiler tech and domain-specific architectures. This is a very good overview of how we got here, by way of Moore’s Law, Dennard’s Law, and Amdahl’s Law.
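
The last link leans on Amdahl's Law, which is worth making concrete: accelerating only part of a workload caps the overall speedup, which is exactly why domain-specific architectures pay off only for the fraction of work they cover. A quick worked example (mine, not the paper's), in TypeScript:

```typescript
// Amdahl's Law: overall speedup when a fraction p of the work is sped up
// by a factor s and the remaining (1 - p) is left unchanged.
function amdahlSpeedup(p: number, s: number): number {
  return 1 / ((1 - p) + p / s);
}

// A domain-specific accelerator that makes 80% of the work 10x faster
// yields only about 3.6x overall; even an infinitely fast one tops out at 5x.
console.log(amdahlSpeedup(0.8, 10));       // ~3.57
console.log(amdahlSpeedup(0.8, Infinity)); // 5
```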

How companies are building sustainable AI and ML initiatives

Interfaces

(source: Pixabay)

In 2017, we published “How Companies Are Putting AI to Work Through Deep Learning,” a report based on a survey we ran aiming to help leaders better understand how organizations are applying AI through deep learning. We found companies were planning to use deep learning over the next 12-18 months. In 2018, we decided to run a follow-up survey to determine whether companies’ machine learning (ML) and AI initiatives are sustainable—the results of which are in our recently published report, “Evolving Data Infrastructure.”

The current generation of AI and ML methods and technologies rely on large amounts of data—specifically, labeled training data. In order to have a longstanding AI and ML practice, companies need to have data infrastructure in place to collect, transform, store, and manage data. On one hand, we wanted to see whether companies were building out key components. On the other hand, we wanted to measure the sophistication of their use of these components. In other words, could we see a roadmap for transitioning from legacy cases (perhaps some business intelligence) toward data science practices, and from there into the tooling required for more substantial AI adoption?

Here are some notable findings from the survey:

  • Companies are serious about machine learning and AI. Fifty-eight percent of respondents indicated that they were either building or evaluating data science platform solutions. Data science (or machine learning) platforms are essential for companies that are keen on growing their data science teams and machine learning capabilities.
  • Companies are building or evaluating solutions in foundational technologies needed to sustain success in analytics and AI. These include data integration and extract, transform, and load (ETL) (60% of respondents indicated they were building or evaluating solutions), data preparation and cleaning (52%), data governance (31%), metadata analysis and management (28%), and data lineage management (21%).
  • Data scientists and data engineers are in demand. When asked which were the main skills related to data that their teams needed to strengthen, 44% chose data science and 41% chose data engineering.
  • Companies are building data infrastructure in the cloud. Eighty-five percent indicated that they had data infrastructure in at least one of the seven cloud providers we listed, with nearly two-thirds (63%) using Amazon Web Services (AWS) for some portion of their data infrastructure. We found that users of AWS, Microsoft Azure, and Google Cloud Platform (GCP) tended to use multiple cloud providers.

Four short links: 29 January 2019

Four short links
  1. git-absorb — git commit --fixup, but automatic. (A rough sketch of the idea follows this list.)
  2. Coding the Matrix — linear algebra was where math broke me at university, so my eyes are always drawn to presentations of the subject that promise relevance and comprehensibility. (via Academic Torrents)
  3. A List of Useful Steganography Tools and Resources — what it says on the box.
  4. Analyzing the Performance of WebAssembly vs. Native Code — Across the SPEC CPU suite of benchmarks, we find a substantial performance gap: applications compiled to WebAssembly run slower by an average of 50% (Firefox) to 89% (Chrome), with peak slowdowns of 2.6x (Firefox) and 3.14x (Chrome). We identify the causes of this performance degradation, some of which are due to missing optimizations and code generation issues, while others are inherent to the WebAssembly platform.
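
git-absorb's trick is mostly bookkeeping: for each staged hunk, work out which earlier commit the change belongs to, then create a fixup commit targeting it. The sketch below is a rough TypeScript illustration of that idea, not git-absorb's actual implementation; it only suggests targets, and it ignores renames, new files, and hunks whose lines blame to several commits.

```typescript
// Rough sketch of the idea behind git-absorb (illustrative, not the real tool):
// for each staged hunk, blame the lines it modifies and suggest the commit
// that last touched them as a `git commit --fixup=<sha>` target.
import { execSync } from "child_process";

function git(args: string): string {
  return execSync(`git ${args}`, { encoding: "utf8" });
}

// Parse `git diff --cached -U0` into (file, old start line, old line count).
function stagedHunks(): Array<{ file: string; start: number; count: number }> {
  const hunks: Array<{ file: string; start: number; count: number }> = [];
  let file = "";
  for (const line of git("diff --cached --unified=0").split("\n")) {
    const f = line.match(/^--- a\/(.+)$/);
    if (f) file = f[1];
    const h = line.match(/^@@ -(\d+)(?:,(\d+))? \+\d+(?:,\d+)? @@/);
    if (h && file) {
      const count = h[2] === undefined ? 1 : parseInt(h[2], 10);
      // Skip pure additions (count 0); they have no old lines to blame.
      if (count > 0) hunks.push({ file, start: parseInt(h[1], 10), count });
    }
  }
  return hunks;
}

// Blame the pre-change side of each hunk and pick the most common commit.
for (const { file, start, count } of stagedHunks()) {
  const end = start + count - 1;
  const tally = new Map<string, number>();
  for (const line of git(`blame -l -L ${start},${end} HEAD -- ${file}`).split("\n")) {
    const sha = line.split(" ")[0].replace(/^\^/, "");
    if (sha) tally.set(sha, (tally.get(sha) ?? 0) + 1);
  }
  const target = [...tally.entries()].sort((a, b) => b[1] - a[1])[0]?.[0];
  if (target) console.log(`${file}:${start}-${end}  git commit --fixup=${target}`);
}
```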

Four short links: 28 January 2019

Four short links
  1. AI Helps Amputees Walk With a Robotic Knee (IEEE) — Normally, human technicians spend hours working with amputees to manually adjust robotic limbs to work well with each person’s style of walking. By comparison, the reinforcement learning technique automatically tuned a robotic knee, enabling the prosthetic wearers to walk smoothly on level ground within 10 minutes.
  2. Penelope — a cloud-based, open, and modular platform that consists of tools and techniques for mapping landscapes of opinions expressed in online (social) media. The platform is used for analyzing the opinions that dominate the debate on certain crucial social issues, such as immigration, climate change, and national identity. Penelope is part of the H2020 EU project ODYCCEUS (Opinion Dynamics and Cultural Conflict in European Spaces).
  3. What MMOs Can Teach Us About Real-Life Politics — Larry Lessig is designing the political mechanics for a videogame, and this interview is very intriguing. Lessig is also interested in possibly implementing an in-game process in which democracy doesn’t depend on voting: “I’m eager to experiment or enable the experimentation of systems that don’t need to be tied so much to election.” (via BoingBoing)
  4. The AtomSpace: a Typed Graphical Distributed in-RAM Knowledgebase (OpenCog) — Here’s my sales pitch: you want a graph database with a sophisticated type system built into it. Maybe you don’t know this yet. But you do. You will. You’ll have trouble doing anything reasonable with your knowledge (like reasoning, inferencing, and learning) if you don’t. This is why the OpenCog AtomSpace is a graph database, with types.
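
If "a graph database with a sophisticated type system" sounds abstract, here is a loose TypeScript sketch of the shape of the idea (mine, not OpenCog's actual API): nodes and links are both typed atoms, and links can point at any atoms, including other links.

```typescript
// Loose sketch of the AtomSpace idea (not the OpenCog API): everything is a
// typed atom; nodes carry names, links carry an ordered outgoing set of atoms,
// so relationships are themselves atoms that other links can refer to.
type AtomType = "ConceptNode" | "PredicateNode" | "InheritanceLink" | "EvaluationLink";

interface Atom {
  type: AtomType;
  name?: string;     // nodes have names
  outgoing?: Atom[]; // links have an outgoing set
}

const cat: Atom = { type: "ConceptNode", name: "cat" };
const mammal: Atom = { type: "ConceptNode", name: "mammal" };

// "cat inherits from mammal", expressed as a typed link between typed nodes.
const isa: Atom = { type: "InheritanceLink", outgoing: [cat, mammal] };

// A trivially typed query: everything x inherits from.
function parentsOf(atoms: Atom[], x: Atom): Atom[] {
  return atoms
    .filter(a => a.type === "InheritanceLink" && a.outgoing?.[0] === x)
    .map(a => a.outgoing![1]);
}

console.log(parentsOf([cat, mammal, isa], cat).map(a => a.name)); // [ "mammal" ]
```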

Rethinking informed consent

Wrong Way

(source: Colin Knowles on Flickr)

Informed consent is part of the bedrock of data ethics. DJ Patil, Hilary Mason, and I have written about it, as have many others. It’s rightfully part of every code of data ethics I’ve seen. But I have to admit misgivings—not so much about the need for consent, but about what it means. Obtaining consent to collect and use data isn’t the end of the process; at best, it’s the beginning, and perhaps not a very good one.

Helen Nissenbaum, in an interview with Scott Berinato, articulates some of the problems. It’s easy to talk about informed consent, but what do we mean by “informed”? Almost everyone who reads this article has consented to some kind of medical procedure; did any of us have a real understanding of what the procedure was and what the risks were? We rely on the prestige or status of the doctor, but unless we’re medical professionals, or have done significant online research, we have, at best, a vague notion of what’s going to happen and what the risks are. In medicine, for the most part, things come out all right. The problems with consent to data collection are much deeper.

The problem starts with the origin of the consent criterion. It comes from medicine and the social sciences, in which consenting to data collection and to being a research subject has a substantial history. It arose out of experiments with mind-boggling ethical problems (for example, the Tuskegee syphilis experiment), and it still isn’t always observed (paternalism is still a thing). “Consent” in medicine is limited: whether or not you understand what you’re consenting to, you are consenting to a single procedure (plus emergency measures if things go badly wrong). The doctor can’t come back and do a second operation without further consent. And likewise, “consent” in the social sciences is limited to a single study: you become a single point in an array of data that ceases to exist when the study is complete.

That may have been true years ago, but those limitations on how consent is used seem very shaky, as Nissenbaum argues. Consent is fundamentally an assurance about context: consenting to a medical procedure means the doctors do their stuff, and that’s it. The outcome might not be what you want, but you’ve agreed to take the risk. But what about the insurance companies? They get the data, and they can repackage and exchange it. What happens when, a few years down the road, you’re denied coverage because of a “pre-existing condition”? That data has moved beyond the bounds of an operating room. What happens when data from an online survey or social media profile is shared with another organization and combined and re-combined with other data? When it is used in other contexts, can it be de-anonymized and used to harm the participants? That single point in an array of data has now become a constellation of points feeding many experiments, not all of which are benign.

I’m haunted by the question, “what are users consenting to?” Technologists rarely think through the consequences of their work carefully enough; but even if they did, there will always be consequences that can’t be foreseen or understood, particularly when data from different sources is combined. So, consenting to data collection, whether it’s clicking on the ever-present checkbox about cookies or agreeing to Facebook’s license agreement, is significantly different from agreeing to surgery. We really don’t know how that data is used, or might be used, or could be used in the future. To use Nissenbaum’s language, we don’t know where data will flow, nor can we predict the contexts in which it will be used.

Consent frequently isn’t optional, but compelled. Writing about the #DeleteFacebook movement, Jillian York argues that for many, deleting Facebook is not an option: “for people with marginalized identities, chronic illnesses, or families spread across the world, walking away [from Facebook] means leaving behind a potentially vital safety net of support.” She continues by writing that small businesses, media outlets, artists, and activists rely on it to reach audiences. While no one is compelled to sign up, or to remain a user, for many “deleting facebook” means becoming a non-entity. If Facebook is your only way to communicate with friends, relatives, and support communities, refusing “consent” may not be an option; consent is effectively compelled. The ability to withdraw consent from Facebook is a sign of privilege. If you lack privilege, an untrustworthy tool may be better than no tool at all.

One alternative to consent is the idea that you own the data and should be compensated for its use. Eric Posner, Glen Weyl, and others have made this argument, which essentially substitutes a market economy for consent: if you pay me enough, I’ll let you use my data. However, markets don’t solve many problems. In “It’s time for a bill of data rights,” Martin Tisne argues that data ownership is inadequate. When everything you do creates data, it’s no more meaningful to own your “digital shadow” than your physical one. How do you “own” your demographic profile? Do you even “own” your medical record? Tisne writes: “A person doesn’t ‘own’ the fact that she has diabetes—but she can have the right not to be discriminated against because of it… But absent government regulation to prevent health insurance companies from using data about preexisting conditions, individual consumers lack the ability to withhold consent. … Consent, to put it bluntly, does not work.” And it doesn’t work whether or not consent is mediated by a market. At best, the market may give some incremental income, but at worst, it gives users incentives to act against their best interest.

It’s also easy to forget that in many situations, users are compensated for their data: we’re compensated by the services that Facebook, Twitter, Google, and Amazon provide. And that compensation is significant; how many of us could do our jobs without Google? The economic value of those services to me is large, and the value of my data is actually quite limited. To Google, the dozens of Google searches I do in a day are worth a few cents at most. Google’s market valuation doesn’t derive from the value of my data or yours in isolation, but the added value that comes from aggregating data across billions of searches and other sources. Who owns that added value? Not me. An economic model for consent (I consent to let you use my data if you pay me) misses the point: data’s value doesn’t live with the individual.

It would be tragic to abandon consent, though I agree with Nissenbaum that we urgently need to get beyond “incremental improvement to consent mechanisms.” It is time to recognize that consent has serious limitations, due partly to its academic and historical origins. It’s important to gain consent for participation in an experiment; otherwise, the subject isn’t a participant but a victim. However, while understanding the consequences of any action has never been easy, the consent criterion arose when consequences were far more limited and data didn’t spread at the speed of light.

So, the question is: how do we get beyond consent? What kinds of controls can we place on the collection and use of data that align better with the problems we’re facing? Tisne suggests a “data bill of rights”: a set of general legal principles about how data can be used. The GDPR is a step in this direction; the Montreal Declaration for the Responsible Development of Artificial Intelligence could be reformulated as a “bill of data rights.” But a data bill of rights assumes a new legal infrastructure, and by nature such infrastructures place the burden of redress on the user. Would one bring a legal action against Facebook or Google for violation of one’s data rights? Europe’s enforcement of GDPR will provide an important test case, particularly since this case is essentially about data flows and contexts. It isn’t clear that our current legal institutions can keep pace with the many flows and contexts in which data travels.

Nissenbaum starts from the knowledge that data moves, and that the important questions aren’t around how our data is used, but where our data travels. This shift in perspective is important precisely because data sets become more powerful when they’re combined; because it isn’t possible to anticipate all the ways data might be used; and because once data has started flowing, it’s very hard to stop it. But we have to admit we don’t yet know how to ask for consent about data flows or how to prove they are under control. Which data flows should be allowed? Which shouldn’t? We want to enable medical research on large aggregated data sets without jeopardizing the insurance coverage of the people whose data are in those sets. Data would need to carry metadata with it that describes where it could be transferred and how it could be used once it’s transferred; it makes no sense to talk about controlling data flows if that control can’t be automated.

As Ben Lorica and I have argued, the only way forward is through more automation, not less; issues of scale won’t let us have it any other way. In a conversation, Andrew Zaldivar told me of his work with Margaret Mitchell, Timnit Gebru, and others, on model cards that describe the behavior of a machine learning model, and of Timnit Gebru’s work on Datasheets for Datasets, which specify how a data set was collected, how it is intended to be used, and other information. Model cards and data set datasheets are a step toward the kind of metadata we’d need to automate control over data flows, to build automated tools that manage where data can and can’t travel, to protect public goods as well as personal privacy. In the past year, we’ve seen how easy it is to be overly optimistic about tool building, but we are all already using data at the scale of Google and Facebook. There will need to be human systems that override automatic control over data flows, but automation is an essential ingredient.
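
To make the idea of machine-checkable flow control slightly more concrete, here is a purely illustrative TypeScript sketch of a datasheet-style record that travels with a data set, plus the kind of automated gate a pipeline could run before moving it. The field names are invented for illustration; they are not taken from the Model Cards or Datasheets for Datasets papers.

```typescript
// Hypothetical sketch: datasheet-style metadata that travels with a data set,
// plus an automated gate a pipeline could run before moving the data.
// Field names are invented for illustration.
interface DataSheet {
  source: string;             // where the data was collected
  collectedFor: string[];     // purposes the subjects consented to
  allowedRecipients: string[];// organizations the data may flow to
  prohibitedUses: string[];   // e.g. "insurance-underwriting"
  deidentified: boolean;
}

interface FlowRequest {
  recipient: string;
  purpose: string;
}

function allowFlow(sheet: DataSheet, req: FlowRequest): boolean {
  if (sheet.prohibitedUses.includes(req.purpose)) return false;
  if (!sheet.allowedRecipients.includes(req.recipient)) return false;
  return sheet.collectedFor.includes(req.purpose);
}

const sheet: DataSheet = {
  source: "clinic intake survey",
  collectedFor: ["medical-research"],
  allowedRecipients: ["university-hospital"],
  prohibitedUses: ["insurance-underwriting"],
  deidentified: true,
};

// Aggregated research use passes; repackaging for underwriting does not.
allowFlow(sheet, { recipient: "university-hospital", purpose: "medical-research" }); // true
allowFlow(sheet, { recipient: "insurer", purpose: "insurance-underwriting" });       // false
```

The point isn't the particular fields; it's that once the policy is itself data, the check can run at every hop without a human in the loop.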

Consent is the first step along the path toward ethical use of data, but not the last one. What is the next step?

Four short links: 25 January 2019

Four short links
  1. Biggest IT Failures of 2018 (IEEE) — a coding error with the spot-welding robots at Subaru’s Indiana Automotive plant in Lafayette, Ind., meant 293 of its new Subaru Ascents had to be sent to the car crusher. A similar problem is suspected as the reason behind the welding problems affecting the steering on Fiat Chrysler Jeep Wranglers. This is not the “crushing it” that brogrammers intended.
  2. Programming Paradigms for Dummies: What Every Programmer Should Know — This chapter gives an introduction to all the main programming paradigms, their underlying concepts, and the relationships between them. We give a broad view to help programmers choose the right concepts they need to solve the problems at hand. We give a taxonomy of almost 30 useful programming paradigms and how they are related. Most of them differ only in one or a few concepts, but this can make a world of difference in programming. (via Adrian Colyer)
  3. Proposed Model Governance — Singapore Government’s work on regulating AI.
  4. Talent Shortage in Quantum Computing (MIT) — an argument that we need special training for quantum computing, as it’s a mix of engineering and science at this stage in its evolution. This chap would disagree, colorfully: when a subject which claims to be a technology, which lacks even the rudiments of experiment that may one day make it into a technology, you can know with absolute certainty that this “technology” is total nonsense. That was the politest quote I could make.

Four short links: 24 January 2019

Four short links
  1. Computational Periscopy with an Ordinary Camera (Nature) — Here we introduce a two-dimensional computational periscopy technique that requires only a single photograph captured with an ordinary digital camera. Our technique recovers the position of an opaque object and the scene behind (but not completely obscured by) the object, when both the object and scene are outside the line of sight of the camera, without requiring controlled or time-varying illumination. Such recovery is based on the visible penumbra of the opaque object having a linear dependence on the hidden scene that can be modeled through ray optics. Computation and vision, whether deep learning or this kind of mathematical witchcraft, have brought about an age of truly amazing advances. Digital cameras are going to make film cameras look like pinhole cameras because the digital feature set will be staggering. (All requiring computational power, on- or off-device.)
  2. The Data Calculator: Data Structure Design and Cost Synthesis From First Principles, and Learned Cost Models — We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. It brings two innovations. First, it offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay out data, and how they are positioned relative to each other. This allows for a structured description of the universe of possible data structure designs that can be synthesized as combinations of those primitives. The second innovation is computation of performance using learned cost models. I’m always interested in augmentation for programmers. (via Adrian Colyer)
  3. Confluo (Berkeley) — open source system for real-time distributed analysis of multiple data streams. Confluo simultaneously supports high throughput concurrent writes, online queries at millisecond timescales, and CPU-efficient ad hoc queries via a combination of data structures carefully designed for the specialized case of multiple data streams, and an end-to-end optimized system design. The home page has more information. Designing for multiple data streams is an interesting architectural choice. Any interesting business will track multiple data streams, but will they do that in one system or bolt together multiple?
  4. Open-Sourcing Bioinstruments — story of the poseidon syringe pump system, which has free hardware designs and software.

7 web dev trends on our radar

Stones on a beach

(source: Free-Photos via Pixabay)

The Greek philosopher Heraclitus’ saying that “change is the only constant in life” resonates strongly with web developers. We asked our community of experts for their take on the tools and trends that will usher in the greatest changes for the web development world in the coming months.

GraphQL leaves the nest

2019 is going to be a big year for figuring out how larger organizations are going to work with GraphQL at scale. For companies that aren’t set up like Facebook, reconciling large-scale service-oriented architecture (SOA) efforts with GraphQL will require thinking around schema composition and quite a bit of exciting tooling to make development fast and easy. — Adam Neary, Tech Lead at Airbnb

Machine learning in the browser

With machine learning (ML) shifting toward a larger developer audience, we expect to see new use cases for ML in the browser and connected IoT devices such as the Raspberry Pi and Google AIY projects. Tools like TensorFlow and TensorFlow.js are enabling developers to build ML-enabled applications without first completing their PhDs, while easier-to-use APIs in TensorFlow.js and TensorFlow with Keras are lowering the barrier to building deep learning models. These advances make it possible to quickly deploy off-the-shelf models and research paper models into production environments. — Nick Kreeger, Senior Software Engineer, Google
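
To give a sense of how low the barrier has become, here is roughly what defining and training a toy model with TensorFlow.js looks like (a minimal sketch, not production code):

```typescript
// Minimal TensorFlow.js sketch: fit y ≈ 2x - 1 from a handful of points,
// entirely in the browser (or Node), with no Python toolchain required.
import * as tf from "@tensorflow/tfjs";

async function run(): Promise<void> {
  const model = tf.sequential();
  model.add(tf.layers.dense({ units: 1, inputShape: [1] }));
  model.compile({ optimizer: "sgd", loss: "meanSquaredError" });

  const xs = tf.tensor2d([0, 1, 2, 3, 4], [5, 1]);
  const ys = tf.tensor2d([-1, 1, 3, 5, 7], [5, 1]);

  await model.fit(xs, ys, { epochs: 200 });

  // Should print a value close to 9 (2 * 5 - 1).
  (model.predict(tf.tensor2d([5], [1, 1])) as tf.Tensor).print();
}

run();
```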

React introduces the notion of hooks

The announcement of React Hooks demonstrates how the React team is making great decisions about the future of the library. In 2019, teams using React will be able to opt in to new features, and it’s very likely that other organizations will move to React due to the strength of these new proposals. — Alex Banks and Eve Porcello, Software Engineers, Moon Highway
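
For readers who haven't looked at the proposal yet, the appeal is that function components can hold state without being rewritten as classes. A minimal useState example in the spirit of the announcement:

```typescript
// Minimal example of the Hooks proposal: a function component that holds
// its own state via useState, with no class or lifecycle methods.
import React, { useState } from "react";

function Counter(): JSX.Element {
  const [count, setCount] = useState(0);

  return (
    <button onClick={() => setCount(count + 1)}>
      Clicked {count} times
    </button>
  );
}

export default Counter;
```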

Micro-frontends and/or ES6 modules scale frontend applications

The frontend ecosystem is looking for a better way to collaborate with distributed teams in medium to large projects in order to speed up the delivery of new features or products. Micro-frontend principles and ES6 modules are the answer to this challenge, bringing to the table a smart and consistent way to slice an application via subdomains. — Luca Mezzalira, Chief Architect, DAZN
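
One common way this slicing shows up in practice is a thin shell that maps routes to independently deployed bundles and loads each one as an ES module exposing a mount function. A hedged sketch follows; the URLs and the mount() contract are invented for illustration:

```typescript
// Hypothetical micro-frontend shell: each team ships its subdomain as an
// ES module that exports a mount(el) function; the shell resolves the
// current route to a module URL and loads it with a dynamic import().
type MicroFrontend = { mount: (el: HTMLElement) => void };

const routes: Record<string, string> = {
  "/catalog": "https://static.example.com/catalog/v12/index.js",
  "/account": "https://static.example.com/account/v7/index.js",
};

async function bootstrap(path: string): Promise<void> {
  const url = routes[path];
  if (!url) throw new Error(`No micro-frontend registered for ${path}`);

  // import() keeps each team's bundle independently built and deployed.
  const app = (await import(url)) as MicroFrontend;
  app.mount(document.getElementById("root")!);
}

bootstrap(window.location.pathname);
```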

Less typing for mobile users

For mobile web users, web authentication and web payments will finally appear on a lot of sites in 2019, allowing users to log in and pay without entering any details in a web form, just by accessing native features from the browsers. — Maximiliano Firtman, Mobile and Web Developer and Consultant
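
The payments half of that relies on the Payment Request API, where the browser's own UI collects the details instead of a web form. A rough sketch (labels, amounts, and the supported payment method are placeholders):

```typescript
// Sketch of the Payment Request API: the browser collects payment details
// through its own native UI, so the user never types into a form.
// Amounts, labels, and the supported method are placeholders.
async function checkout(): Promise<void> {
  const request = new PaymentRequest(
    [{ supportedMethods: "basic-card" }],
    {
      total: {
        label: "Order total",
        amount: { currency: "USD", value: "19.99" },
      },
    }
  );

  const response = await request.show(); // native browser payment sheet
  // ...send response.details to the server for processing...
  await response.complete("success");
}
```

The authentication half follows the same pattern, delegating credential creation and sign-in to the browser and authenticator through navigator.credentials rather than a password field.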

Progressive web apps catch on

Ever since Alex Russell and Frances Berriman first described them in 2015, people have been talking about progressive web apps (PWAs). In 2019 they’ll become increasingly pervasive. Developers are likely to write more PWAs primarily because the libraries they use, such as Redux and Firebase, encourage them to design apps that align with a PWA architecture. — David Griffiths and Dawn Griffiths, authors of Head First Kotlin
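
Much of that architecture reduces to registering a service worker and letting it answer fetches from a cache. A minimal sketch, with the page-side and worker-side pieces shown together (file names and the asset list are placeholders):

```typescript
// Minimal PWA plumbing sketch. File names and the asset list are placeholders.

// In the page: register the service worker if the browser supports it.
if ("serviceWorker" in navigator) {
  navigator.serviceWorker.register("/sw.js");
}

// In sw.js: pre-cache the app shell and answer fetches cache-first,
// falling back to the network, so the app still loads offline.
const SHELL = ["/", "/index.html", "/app.js", "/styles.css"];

self.addEventListener("install", (event: any) => {
  event.waitUntil(caches.open("shell-v1").then(cache => cache.addAll(SHELL)));
});

self.addEventListener("fetch", (event: any) => {
  event.respondWith(
    caches.match(event.request).then(cached => cached ?? fetch(event.request))
  );
});
```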

MobX and MobX State Tree usage expands

MobX has gotten a lot of traction in the past year, and it will be consolidated even further in 2019. It’s the perfect companion for working in combination with React. Its reactivity model makes it really easy to implement an application end to end—particularly when you use the MobX State Tree to provide a structure to your projects. — Luca Mezzalira

Four short links: 23 January 2019

Four short links
  1. Zero-Shot Transfer Across 93 Languages (Facebook) — we have significantly expanded and enhanced our LASER (Language-Agnostic SEntence Representations) toolkit. We are now open-sourcing our work, making LASER the first successful exploration of massively multilingual sentence representations to be shared publicly with the NLP community. The toolkit now works with more than 90 languages, written in 28 different alphabets.
  2. Formally Verified Software in the Real World (CACM) — This was not the first autonomous flight of the AH-6, dubbed the Unmanned Little Bird (ULB); it had been doing them for years. This time, however, the aircraft was subjected to mid-flight cyber attacks. The central mission computer was attacked by rogue camera software as well as by a virus delivered through a compromised USB stick that had been inserted during maintenance. The attack compromised some subsystems but could not affect the safe operation of the aircraft.
  3. The Linux of Social Media: How LiveJournal Pioneered Then Lost Web Blogging“We were always saying we were fighting for the users, that we would run everything by the community before we did anything,” says Mark Smith, a software engineer who worked on LiveJournal and became the co-creator of Dreamwidth. “Well, as it turns out, when you do that, you end up with the community telling you they want everything to stay the same, forever.”
  4. Monica — open source personal CRM. Monica helps you organize the social interactions with your loved ones.