Automated ML: A journey from CRISPR.ML to Azure ML

This is a keynote from the O’Reilly Artificial Intelligence Conference in New York 2019. See other highlights from the event.

This keynote was sponsored by Microsoft.

Checking in on AI tools

This is a keynote from the O’Reilly Artificial Intelligence Conference in New York 2019. Watch more full keynotes from this event on the O’Reilly online learning platform.

You can also see other highlights from the event.

Fast, flexible, and functional: 4 real-world AI deployments at enterprise scale

This is a keynote highlight from the O’Reilly Artificial Intelligence Conference in New York 2019. Watch the full version of this keynote on the O’Reilly online learning platform.

You can also see other highlights from the event.

AI and the robotics revolution

This is a keynote highlight from the O’Reilly Artificial Intelligence Conference in New York 2019. Watch the full version of this keynote on the O’Reilly online learning platform.

You can also see other highlights from the event.

Four short links: 18 April 2019

Four short links
  1. Geomancer — a geospatial feature engineering library. It leverages geospatial data such as OpenStreetMap (OSM) alongside a data warehouse like BigQuery. You can use this to create, share, and iterate geospatial features for your downstream tasks (analysis, modeling, visualization, etc.).
  2. Meshroom — a free, open source 3D Reconstruction Software based on the AliceVision framework.
  3. Bling Fire — A lightning fast finite state machine and regular expression manipulation library. […] We use Fire for many linguistic operations inside Bing such as tokenization, multi-word expression matching, unknown word-guessing, stemming / lemmatization, just to mention a few. Cf. NLTK.
  4. Learning ZIL — what the Infocom games were written in, decades before Inform. Andrew Plotkin wrote an intro that explains how it sits in the universe. (Note: this is useless but historically interesting.)
Four short links: 17 April 2019

Four short links
  1. Infocom Source Code Uploaded — with some version control (retroactively manufactured from different versions of the source code). Uploaded from a hard drive of Infocom material copied at the time of the acquisition. Jason Scott described the contents. See also DECWAR source.
  2. I Kind of Hate Twitter (Jason Lefkowitz) — a very good product analysis of why Twitter drives unproductive behaviour. Example: Push delivery makes it hard to ignore what people are saying about you. If someone’s talking about you on the web, you have to go into Google and search to find that out. If someone’s talking about you on Twitter, though, it’s very likely right in your face. This can be flattering if people are saying nice things, but if they’re not, it can feel embarrassing and/or painful; and people who are embarrassed or wounded tend to do stupid things, like lash back at the person who did the wounding, that they regret later when the pain has worn off.
  3. New Ways of Seeing — new BBC show from James Bridle which looks to be great. (via The Guardian)
  4. Why Software Projects Take Longer Than You Think—a Statistical Model — A reasonable model for the “blowup factor” would be something like a log-normal distribution. If the estimate is one week, then let’s model the real outcome as a random variable distributed according to the log-normal distribution around one week. This has the property that the median of the distribution is exactly one week, but the mean is much larger […]
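The log-normal claim in item 4 is easy to check numerically. A minimal sketch (not from the post; the one-week estimate and the spread parameter `sigma` are assumed values):

```python
import math
import random
import statistics

random.seed(42)

estimate_weeks = 1.0  # the up-front estimate (assumed value)
sigma = 1.0           # spread of the blowup factor in log space (assumed value)

# Actual completion time = estimate * log-normal blowup factor.
actuals = [estimate_weeks * math.exp(random.gauss(0.0, sigma))
           for _ in range(200_000)]

# The median matches the estimate, but the mean is exp(sigma**2 / 2) times larger.
print(f"median: {statistics.median(actuals):.2f} weeks")
print(f"mean:   {statistics.fmean(actuals):.2f} weeks")
```

With `sigma = 1`, the median stays at the one-week estimate while the mean lands near 1.65 weeks, which is the asymmetry the article's model is built on.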

Four short links: 16 April 2019

Four short links
  1. Facebook Transparency Tool (Buzzfeed) — A transparency tool on Facebook inadvertently provides a window into the confusing maze of companies you’ve never heard of who appear to have your data.
  2. Microsoft’s AI Research with Chinese Military University Fuels Concerns (SCMP) — “The new methods and technologies described in their joint papers could very well be contributing to China’s crackdown on minorities in Xinjiang, for which they are using facial recognition technology,” said Helena Legarda, a research associate at the Mercator Institute for China Studies, who focuses on China’s foreign and security policies.
  3. @justsaysinmice — points out bogus science claims by adding “in mice” where appropriate. Genius.
  4. What Machine Learning Needs from Hardware (Pete Warden) — More arithmetic; Inference; Low Precision; Compatibility; Codesign.

Four short links: 15 April 2019

Four short links
  1. You Should Organize a Study Group/Book Club/Online Group/Event! Tips on How to Do It (Stephanie Hurlburt) — good advice on how to get people together.
  2. Berkeley Open Arms — Berkeley Open Arms manufactures the BLUE robot arm that was developed at UC Berkeley’s Robot Learning Lab. Paper (arXiv link).
  3. Human Contact is a Luxury Good (NYT) — Life for anyone but the very rich—the physical experience of learning, living, and dying—is increasingly mediated by screens. Not only are screens themselves cheap to make, but they also make things cheaper. […] The rich do not live like this. The rich have grown afraid of screens. They want their children to play with blocks, and tech-free private schools are booming. Humans are more expensive, and rich people are willing and able to pay for them. Conspicuous human interaction—living without a phone for a day, quitting social networks and not answering email—has become a status symbol.
  4. ArchiveBox — The open source self-hosted web archive. Takes browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more.

Strata San Francisco, 2019: Opportunities and Risks

The Strata Data Conference in San Francisco was filled with speakers talking about opportunity. But those opportunities were balanced against risks—risks that loom large as we discover more powerful ways to apply data using machine learning and artificial intelligence. It’s a necessary tension we’ll need to understand as we continue on the journey into the age of data.

Cloudera’s merger with Hortonworks demonstrates some of the opportunities. They “drank their own champagne” (a metaphor preferable to eating dog food) by using machine learning to merge the two companies: clustering similar customers, predicting sales opportunities, and integrating the two teams.

In his keynote, program co-chair Ben Lorica gave excellent advice for organizations that are just starting on the road to machine learning: companies that have been successful with machine learning have either built on existing data products or services, or used machine learning to modernize existing applications. Companies that attempt to make a leap into the void, working with data and services they don’t understand well, will have a rough time. Machine learning grows out of your current data practices. It may be revolutionary, but if you haven’t prepared for the revolution by developing your data sources, learning how to clean your data, preparing for data governance, and more, you’ll inevitably fall behind. Fortunately, there are tools—both open source and commercial—to help in all these areas.

Some of the most important opportunities are for democratizing data: not just making data accessible, but making it usable by everyone in the organization, even those without programming skills. Jeremy Howard’s session showed how a subject expert with no prior programming knowledge can make an AI application. Howard told me about a dermatologist who has built an application that classifies burns. (He also recommended against watching the demo before lunch.) Efforts like this are key to building AI systems that create a better world. Emergency responders need tools that assist them in the field, tools that can be built into their phones, and let them make decisions without waiting for an MD.

According to Mike Olson, the most important thing we’ve learned from cloud computing is that “easy seriously matters.” Easy doesn’t just mean you can pay for computing with your credit card, or add and subtract servers at a moment’s notice. And it doesn’t just mean providing good tools for analytics. Easy applies to every aspect of computing, particularly self-service data. Easy means making tools for building data pipelines that don’t care where the data is physically located (in a data center or the cloud), and that understand regulations governing that data and how it is used, and that make data accessible without requiring programming skills. These are tools that can be used by anyone, not just engineers and data analysts: managers, executives, and sales and marketing folks.

Moving data and computing to the cloud remains a tremendous opportunity. We’re still in the early days of cloud computing: many companies that could move their data to the cloud haven’t yet done so. Jordan Tigani of Google talked about the many opportunities the cloud represents, starting with decoupling data storage from computation, reducing administrative overhead, building real-time pipelines, eliminating silos, and enabling access for all users. All these benefits flow naturally from moving data to the cloud and relying on the scale of infrastructure that only cloud providers give you.

What about the risks? Several speakers, including Peter Singer and David Sanger, talked about the dangers of an increasingly militarized network. Peter Singer said: “There is no silver bullet. There will continue to be marketing, politics, wars, all taking place online. We need new strategies for dealing with it.” These dangers increase as our tools become more powerful; Singer said that we can look forward to “deep fakes” (fake videos), and Elizabeth Svoboda discussed how neuroscience is already used to construct political messages that trigger fear responses.

We also heard about progress toward meeting these challenges. Shafi Goldwasser challenged developers to create “Safe ML”: machine learning that can’t be abused. Machine learning needs to ensure privacy, both of the training data and the model, and needs to be fair and invulnerable to tampering. The tools we need to create Safe ML have been under development among cryptographers for the past 30 years, well before modern machine learning became practical. The challenge facing machine learning developers is taking these tools—federated learning, multiparty cryptography, homomorphic encryption, and differential privacy—and putting them to use. Her points were echoed in several other sessions throughout the conference.
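To make one of those tools concrete, the Laplace mechanism is the textbook building block of differential privacy: clip each record's influence, then add calibrated noise to the released statistic. A minimal sketch (not from the talk; `private_mean`, the clipping bounds, and the `epsilon` value are illustrative choices):

```python
import random

random.seed(0)

def laplace(scale):
    # The difference of two i.i.d. exponential draws is Laplace-distributed.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def private_mean(values, lower, upper, epsilon):
    """Epsilon-differentially private mean via the Laplace mechanism.

    Clipping each value to [lower, upper] bounds any one record's
    influence on the mean at (upper - lower) / n, so Laplace noise
    with scale sensitivity / epsilon suffices.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    return sum(clipped) / len(clipped) + laplace(sensitivity / epsilon)

ages = [34, 29, 41, 52, 38, 45, 31, 27, 60, 36]
print(private_mean(ages, lower=18, upper=80, epsilon=1.0))
```

Any single query is noisy, but the noise is unbiased: averaged over many runs, the released value centers on the true mean while no individual record is exposed.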

At the ethics summit, participants discussed the many problems in building software systems ethically. There are clearly dangers here: hardly a day goes by without news of data abuse. But perhaps the most interesting discussion was whether ethics is a zero-sum game or a business opportunity. Does treating customers fairly and respecting their individuality and their privacy represent an opportunity? There are a lot of things you can say about Amazon’s business practices, but almost nobody criticizes the ease with which you can return merchandise. What other opportunities are there? Many customers have become cynical, and expect to be treated badly; too few companies have thought seriously about using data to make their customers’ lives better. That may be changing.

These themes were echoed in the Future of the Firm track, which focused on rethinking the corporation for the digital era. The future isn’t just about “implementing AI,” but about building organizations that work better: that support their employees’ training needs, that listen to their employees on ethical issues, that take a human-centered approach to AI. The future of the firm is about taking advantage of data—but it’s about taking advantage of data to build a better future for customers, employees, and investors.

Putting data to work is an opportunity; we’ve been making that point since the first Strata conference. The risks of a hostile, militarized network are real. But the opportunities—for corporations, for employees, for customers—are far greater.

Four short links: 12 April 2019

Four short links
  1. Tea: A High-level Language and Runtime System for Automating Statistical Analysis — In Tea, users express their study design, any parametric assumptions, and their hypotheses. Tea compiles these high-level specifications into a constraint satisfaction problem that determines the set of valid statistical tests, and then executes them to test the hypothesis. Open source.
  2. Chinese AI — the things that you probably don’t realize about Chinese AI, such as the language gap disadvantaging Western researchers. (via BoingBoing)
  3. It’s Time to Think about Jurisdictional Data Sovereignty (Kris Constable) — not something that Americans think about, but which the rest of the world is chewing on.
  4. The Curious Case of Public Sans (Matthew Butterick) — Public Sans is a derivative work of Franklin Sans, which requires derivatives to be released under Open Font License (OFL). But work of a government employee or agency is in the public domain. Oof.