Four short links: 4 December 2018

  1. Fifteen Unconventional Uses of Voice Technology (Nicole He) — Students had half a semester to learn tools like the Web Speech API, Dialogflow, and Actions on Google, and then were tasked with making something…interesting. The in-class code examples we used are on GitHub. Here are 15 funny, subversive, and impressively weird final projects from the class.
  2. Summary of 2018’s Most Important AI Papers: To help you catch up, we’ve summarized 10 important AI research papers from 2018 to give you a broad overview of machine learning advancements this year. There are many more breakthrough papers worth reading as well, but we think this is a good list for you to start with.
  3. arbtt — a time tracker that sits in the background. You write rules that tell it how to categorize your activity.
  4. Microsoft Simple Encrypted Arithmetic Library: an easy-to-use but powerful homomorphic encryption library written in C++. It supports both the BFV and the CKKS encryption schemes. (via Microsoft Research Blog)

Four short links: 5 November 2018

  1. Stormchecker: A modern model checker for probabilistic systems. Test your models of your distributed system.
  2. MonoCorpus: a note-taking app for software and machine learning engineers meant to encourage learning, sharing, and easier development. Increase documentation for yourself and your team without slowing your velocity. Take notes as part of your process instead of dedicating time to writing them. An interesting use for notebooks.
  3. Odin: Deploy your 12-factor applications to AWS easily and securely with Odin, an AWS Step Function based on the step framework that deploys services as auto-scaling groups (ASGs).
  4. Toward an AI Physicist for Unsupervised Learning: We investigate opportunities and challenges for improving unsupervised machine learning using four common strategies with a long history in physics: divide-and-conquer, Occam’s Razor, unification, and lifelong learning. Instead of using one model to learn everything, we propose a novel paradigm centered around the learning and manipulation of *theories*, which parsimoniously predict both aspects of the future (from past observations) and the domain in which these predictions are accurate. (see also MIT TR)

The freedom of Kubernetes

This is a keynote highlight from the O’Reilly Velocity Conference in London 2018. Watch the full version of this keynote on O’Reilly’s online learning platform.

You can also see other highlights from the event.

What changes when we go offline-first?

This is a keynote highlight from the O’Reilly Velocity Conference in London 2018. Watch the full version of this keynote on O’Reilly’s online learning platform.

You can also see other highlights from the event.

Learning from the web of life

This is a keynote highlight from the O’Reilly Velocity Conference in London 2018. Watch the full version of this keynote on O’Reilly’s online learning platform.

You can also see other highlights from the event.

Four short links: 3 December 2018

  1. Amazon is Competing with Its Customers: What’s more, Kreps said, Amazon has not contributed a single line of code to the Apache Kafka open source software and is not reselling Confluent’s cloud tool. Sometimes Amazon contributes back, but increasingly often it seems like its software MO is exploitation, not co-creation. This is what prompted the creation of various “open except if you resell it as a cloud service”-source licenses, like the Commons Clause.
  2. kbd-audio: tools for capturing and analyzing keyboard input paired with microphone capture.
  3. Kubernetes is the OS That Matters (Matt Asay) — provocative clickbait title, but the point is important: if single-machine apps are the exception, then the lowest layer of critical shared software is no longer the OS but instead the cluster manager.
  4. Software Sprawl, The Golden Path, and Scaling Teams with Agency (Charity Majors) — good talk on how to recover from “we’re using too many shiny tools, and it’s hard to make progress because there’s no common set of tools, so everyone’s reinventing the wheel, and omg fire.”

Four short links: 8 November 2018

  1. ASAP: Fast, Approximate, Graph Pattern Mining at Scale (Usenix) — we present A Swift Approximate Pattern-miner (ASAP), a system that enables both fast and scalable pattern mining. ASAP is motivated by one key observation: in many pattern mining tasks, it is often not necessary to output the exact answer […] an approximate count is good enough. (via Morning Paper)
  2. Binci — tackling the same problem space as Docker Compose, but aimed at ephemeral containers rather than long-running ones (e.g., for test/CI systems).
  3. Metrics for Investors (Andrew Chen) — detailed take on the metrics through which investors view SaaS businesses.
  4. How to Fit Large Neural Networks on the Edge: This blog explores a few techniques that can be used to fit neural networks in memory-constrained settings. Different techniques are used for the “training” and “inference” stages, and hence they are discussed separately.

Four short links: 7 November 2018

  1. Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting: Inspired by how humans summarize long documents, we propose an accurate and fast summarization model that first selects salient sentences and then rewrites them abstractively (i.e., compresses and paraphrases) to generate a concise overall summary. We use a novel sentence-level policy gradient method to bridge the non-differentiable computation between these two neural networks in a hierarchical way, while maintaining language fluency. Source code available.
  2. KBpedia: a comprehensive knowledge structure for promoting data interoperability and knowledge-based artificial intelligence, [which] combines seven “core” public knowledge bases (Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and UMBEL) into an integrated whole. Now has a serious open source offering.
  3. Baidu Opens AI Park in Beijing: autonomous buses, smart walkways that track people’s steps using facial recognition, intelligent pavilions equipped with the company’s conversational DuerOS system, and augmented reality Tai Chi lessons. It’s theatre, but theatre sets perceptions. In this case, the perception that China is miles ahead of America in AI. It was the AR Tai Chi that caught my eye.
  4. TRE: A Regex Engine with Approximate Matching. It does this by calculating the Levenshtein distance (number of insertions, deletions, or substitutions it would take to make the strings equal) as it searches for a match.

140 live online training courses opened for November, December, and January

Computer and paper on a desk

(source: Pexels via Pixabay)

Learn new topics and refine your skills with 140 live online training courses we opened up for November, December, and January on our learning platform.

Artificial intelligence and machine learning

Artificial Intelligence for Big Data, November 28-29

Essential Machine Learning and Exploratory Data Analysis with Python and Jupyter Notebook, December 3

Deep Learning for Machine Vision, December 4

Beginning Machine Learning with Scikit-Learn, December 5

Managed Machine Learning Systems and Internet of Things, December 5-6

Natural Language Processing (NLP) from Scratch, December 7

Machine Learning in Practice, December 7

Deep Learning with TensorFlow, December 12

Getting Started with Machine Learning, December 12

Essential Machine Learning and Exploratory Data Analysis with Python and Jupyter Notebook, January 7-8

Artificial Intelligence: AI for Business, January 9

Managed Machine Learning Systems and Internet of Things, January 9-10

Applied Deep Learning for Coders with Apache MXNet, January 10-11

Artificial Intelligence: An Overview of AI and Machine Learning, January 15

Hands-On Machine Learning with Python: Classification and Regression, January 16

Hands-On Machine Learning with Python: Clustering, Dimension Reduction, and Time Series Analysis, January 17


Blockchain

Building Smart Contracts on the Blockchain, November 29-30

Introducing Blockchain, December 7

Understanding Hyperledger Fabric Blockchain, December 10-11

Blockchain Applications and Smart Contracts: Developing Decentralized, December 13


Business

Spotlight on Innovation: The Future Beyond Digital, Entering a New Era of Exploration and Collaboration, November 28

Negotiation Fundamentals, December 7

Applying Critical Thinking, December 10

How to Give Great Presentations, December 10

Performance Goals for Growth, December 12

Leadership Communication Skills for Managers, January 9

Introduction to Critical Thinking, January 10

Introduction to Delegation Skills, January 10

Why Smart Leaders Fail, January 15

Data science and data tools

Real-Time Data Foundations: Kafka, December 3

Real-Time Data Foundations: Spark, December 4

Getting Started with Pandas, December 5

Getting Started with Python 3, December 5-6

Mastering Pandas, December 6

Real-Time Data Foundations: Flink, December 7

Apache Hadoop, Spark, and Big Data Foundations, December 10

Real-Time Data Foundations: Time Series Architectures, December 10

Sentiment Analysis for Chatbots in Python, December 11

Hands-on Introduction to Apache Hadoop and Spark Programming, December 12-13

Building Intelligent Bots in Python, December 13

Intermediate Machine Learning with Scikit-Learn, December 17


Design

3ds Max and V-Ray: The Path Towards Photorealism, December 14


Programming

Java Full Throttle with Paul Deitel: A One-Day, Code-Intensive Java Standard Edition Presentation, November 15

Next-Generation Java Testing with JUnit 5, November 15

Mastering the Basics of Relational SQL Querying, November 19-20

Designing Bots and Conversational Apps for Work, November 29

Pythonic Object-Oriented Programming, December 3

Bash Shell Scripting in 3 Hours, December 3

Beyond Python Scripts: Logging, Modules, and Dependency Management, December 5

Linux Filesystem Administration, December 5-6

Beyond Python Scripts: Exceptions, Error Handling, and Command-Line Interfaces, December 6

Next Level Git – Master your Workflow, December 6

Programming with Java 8 Lambdas and Streams, December 6

Consumer Driven Contracts – A Hands-On Guide to Spring Cloud Contract, December 10

SQL for Any IT Professional, December 10

Linux Under the Hood, December 10

Linux Troubleshooting, December 11

Scalable Concurrency with the Java Executor Framework, December 11

Next Level Git – Master your Content, December 13

Linux Performance Optimization, December 13

Mastering Go for UNIX Administrators, UNIX Developers, and Web Developers, December 13-14

Getting Started with Java: From Core Concepts to Real Code in 4 Hours, December 17

Reactive Spring Boot, December 17

Scala Fundamentals: From Core Concepts to Real Code in 5 Hours, December 18

Programming with Data: Python and Pandas, December 18

Spring Boot and Kotlin, December 18

Julia 1.0 Essentials, December 18

Functional Design for Java 8, December 18-19

Java 8 Generics in 3 Hours, December 20

Python: The Next Level, January 7-8

Design Patterns Boot Camp, January 9-10

Learning Python 3 by Example, January 10

Modern JavaScript, January 14

Learn the Basics of Scala, January 14

Getting Started with Pandas, January 14

Introduction to JavaScript Programming, January 14-15

Getting Started with Python 3, January 14-15

Mastering Pandas, January 15

Scaling Python with Generators, January 15

Getting Started with Pytest, January 16

OCA Java SE 8 Programmer Certification Crash Course, January 16-18

Mastering Python’s Pytest, January 17

Pythonic Design Patterns, January 18

Visualization in Python with Matplotlib, January 18


Security

Cybersecurity Offensive and Defensive Techniques in 3 Hours, December 7

Cyber Security Fundamentals, December 10-11

Certified Ethical Hacker (CEH) Crash Course, December 13-14

Intense Introduction to Hacking Web Applications, December 17

CCNA Security Crash Course, December 18-19

CompTIA PenTest+ Crash Course, December 18-19

CompTIA Security+ SY0-501 Crash Course, January 7-8

AWS Certified Security – Specialty Crash Course, January 7-8

AWS Advanced Security with Config, GuardDuty, and Macie, January 14

Ethical Hacking Bootcamp with Hands-on Labs, January 15-17

CompTIA Security+ SY0-501 Certification Practice Questions and Exam Strategies, January 16

Cyber Ops SECFND 210-250 Crash Course, January 16

CCNA Cyber Ops SECOPS 210-255 Crash Course, January 18

Software architecture

Developing Incremental Architecture, December 10

Implementing Evolutionary Architectures, December 13-14

Architecture for Continuous Delivery, December 17

Architecture by Example, December 17-18

Comparing Service-Based Architectures, December 18

Software Architecture for Developers, January 7

Systems engineering and operations

Automating with Ansible, December 3

An Introduction to DevOps with AWS, December 3

Red Hat Certified Engineer (RHCE) Crash Course, December 4-7

9 Steps to Awesome with Kubernetes, December 5

Ansible for Managing Network Devices, December 5

Amazon Web Services: AWS Managed Services, December 5-6

Network Troubleshooting Using the Half Split and OODA, December 6

Google Cloud Certified Associate Cloud Engineer Crash Course, December 6-7

Getting Started with Continuous Delivery (CD), December 10

AWS Monitoring Strategies, December 10

Practical Docker, December 11

Getting Started with Amazon Web Services (AWS), December 11-12

Amazon Web Services: Architect Associate Certification – AWS Core Architecture Concepts, December 11-12

CCNP R/S ROUTE (300-101) Crash Course, December 11-13

Ansible in 3 Hours, December 12

Amazon Web Services: AWS Design Fundamentals, December 13-14

Deploying Container-Based Microservices on AWS, December 13-14

Kubernetes in 3 Hours, December 14

Jenkins 2: Up and Running, December 17

CompTIA Cloud+ CV0-002 Exam Prep, December 17

CCNP R/S SWITCH (300-115) Crash Course, December 17-19

Google Cloud Platform (GCP) for AWS Professionals, December 18

AWS CloudFormation Deep Dive, January 7-8

Red Hat Certified System Administrator (RHCSA) Crash Course, January 7-10

Building a Cloud Roadmap, January 9

Implementing and Troubleshooting TCP/IP, January 9

Docker: Up and Running, January 9-10

Building Distributed Pipelines for Data Science Using Kafka, Spark, and Cassandra, January 9-11

Understanding AWS Cloud Compute Options, January 10-11

Istio on Kubernetes: Enter the Service Mesh, January 16

AWS Certified SysOps Administrator (Associate) Crash Course, January 16-17

Chaos Engineering: Planning and Running Your First Game Day, January 17

Visualizing Software Architecture with the C4 Model, January 18

Web programming

Hands-On Chatbot and Conversational UI Development, December 3-4

Building APIs with Django REST Framework, December 17

Developing Web Apps with Angular and TypeScript, December 17-19

Rethinking REST: A Hands-On Guide to GraphQL and Queryable APIs, December 18

Kubernetes’ scheduling magic revealed

Magic composition

(source: Pixabay)

Kubernetes is an industry-changing technology that makes orchestrating containers at massive scale simple. Most of us happily push thousands of deployments and pods to Kubernetes every day. Have you ever wondered what sorcery determines where in the cluster all those pods will be created? That is the job of the kube-scheduler.

Understanding how the Kubernetes scheduler makes its decisions is critical to ensuring consistent performance and optimal resource utilization. All scheduling in Kubernetes is based on a few key pieces of information. First, the scheduler uses information about the worker node to determine the node's total capacity. Running kubectl describe node shows how the scheduler sees that node:

Capacity:
 cpu:                4
 ephemeral-storage:  103079200Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             16427940Ki
 pods:               110
Allocatable:
 cpu:                3600m
 ephemeral-storage:  98127962034
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             14932524020
 pods:               110

This output shows what the scheduler treats as the total capacity of the worker node, as well as its allocatable capacity. The allocatable numbers factor in the kubelet settings for Kubernetes-reserved and system-reserved resources. Allocatable represents the total resources the scheduler has to work with on a given node.
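To make the gap between capacity and allocatable concrete, here is a minimal Python sketch that subtracts the two sets of figures shown above. The helper names are hypothetical and are not part of any Kubernetes client library, and the parser handles only the quantity forms that appear in this output (plain integers, the "m" millicore suffix, and binary suffixes such as "Ki").

```python
# Hypothetical helpers for illustration; not a Kubernetes API.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '4' or '3600m' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_memory_bytes(quantity: str) -> int:
    """Convert '16427940Ki' or a plain byte count such as '14932524020' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


# Figures copied from the kubectl describe node output above.
capacity = {"cpu": "4", "memory": "16427940Ki"}
allocatable = {"cpu": "3600m", "memory": "14932524020"}

reserved_cpu = parse_cpu_millicores(capacity["cpu"]) - parse_cpu_millicores(allocatable["cpu"])
reserved_memory = parse_memory_bytes(capacity["memory"]) - parse_memory_bytes(allocatable["memory"])

print(f"CPU held back from scheduling: {reserved_cpu}m")
print(f"Memory held back from scheduling: {reserved_memory} bytes")
```

On this node, 400m of CPU and roughly 1.76 GiB of memory are held back and never offered to pods.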

Next, we need to look at how we tell the scheduler about our workload. Note that the scheduler does not consider the actual CPU and memory utilization of a workload; it factors in only the resource descriptions provided by the developer or operator. Here is an example from a pod object definition:

    resources:
      limits:
        cpu: 100m
        memory: 170Mi
      requests:
        cpu: 100m
        memory: 170Mi

These are the specifications provided at the container level. The developer must provide these resource requests and limits per container, not per pod. What do these specifications mean? The limits are considered only by the kubelet and are not a factor during scheduling. Here, the limits mean that the container's cgroup will be set to cap CPU utilization at 10% of a single CPU core, and that if memory utilization exceeds 170 MiB, the process will be killed and restarted; there is no “soft” memory limit in Kubernetes' use of cgroups. The requests are used by the scheduler to determine the best worker node on which to place the workload. Note that the scheduler sums the resource requests of all containers in the pod to decide where to place it, whereas the kubelet enforces limits per container.
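The per-pod summing described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the scheduler's code: the function name and dictionary keys are hypothetical, and it ignores init containers and any other adjustments the real scheduler may make.

```python
from typing import Dict, List

MI = 2**20  # bytes in one mebibyte


def pod_requests(containers: List[Dict[str, int]]) -> Dict[str, int]:
    """Sum the per-container requests into one pod-level total."""
    totals = {"cpu_millicores": 0, "memory_bytes": 0}
    for container in containers:
        totals["cpu_millicores"] += container.get("cpu_millicores", 0)
        totals["memory_bytes"] += container.get("memory_bytes", 0)
    return totals


# A hypothetical pod with two containers, each using the requests shown above.
containers = [
    {"cpu_millicores": 100, "memory_bytes": 170 * MI},
    {"cpu_millicores": 100, "memory_bytes": 170 * MI},
]

total = pod_requests(containers)
print(f"{total['cpu_millicores']}m CPU, {total['memory_bytes'] // MI}Mi memory")
```

A node must have at least this combined amount unrequested for the pod to fit, even though each container's limit is still enforced individually.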

We now have enough information to understand the basic resource-based scheduling logic that Kubernetes uses. When a new pod is created, the scheduler looks at the pod's total resource requests and then attempts to find the worker node that has the most available resources. The scheduler tracks this for each node, as seen in the output of kubectl describe node:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits   Memory Requests  Memory Limits
  ------------  ----------   ---------------  -------------
  1333m (37%)   2138m (59%)  1033593344 (6%)  1514539264 (10%)
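The percentages in this table are consistent with dividing each total by the node's allocatable capacity (3600m CPU and 14932524020 bytes of memory, from the earlier output) and dropping the fractional part. The short Python check below reproduces them; the function name is hypothetical, and the truncating division is an assumption that happens to match all four figures.

```python
def percent_of_allocatable(amount: int, allocatable: int) -> int:
    """Whole-number percentage of allocatable capacity, truncated."""
    return amount * 100 // allocatable


ALLOCATABLE_CPU_M = 3600
ALLOCATABLE_MEMORY_BYTES = 14932524020

cpu_requests_pct = percent_of_allocatable(1333, ALLOCATABLE_CPU_M)
cpu_limits_pct = percent_of_allocatable(2138, ALLOCATABLE_CPU_M)
memory_requests_pct = percent_of_allocatable(1033593344, ALLOCATABLE_MEMORY_BYTES)
memory_limits_pct = percent_of_allocatable(1514539264, ALLOCATABLE_MEMORY_BYTES)

print(cpu_requests_pct, cpu_limits_pct, memory_requests_pct, memory_limits_pct)
```

Because limits are not reserved at scheduling time, the limit columns can legitimately add up to more than 100 percent, which is the overcommitment the output mentions.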

You can investigate the exact details of the Kubernetes scheduler via the source code. There are two key concepts in scheduling. On the first pass, the scheduler filters the nodes down to those capable of running a given pod, based on resource requests and other scheduling requirements. On the second pass, the scheduler weighs the eligible nodes based on their absolute and relative resource utilization and other factors. The highest-weighted eligible node is selected to run the pod.
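The two passes can be illustrated with a deliberately simplified Python sketch. It is not the kube-scheduler's algorithm: the Node class, the function names, and the scoring rule (the average fraction of CPU and memory left unrequested after placement) are all assumptions chosen for illustration, and the real scheduler applies many more filters and weights.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int
    allocatable_memory: int
    requested_cpu_m: int
    requested_memory: int


def fits(node: Node, cpu_m: int, memory: int) -> bool:
    """First pass: does the pod's total request fit in what is still unrequested?"""
    return (node.requested_cpu_m + cpu_m <= node.allocatable_cpu_m
            and node.requested_memory + memory <= node.allocatable_memory)


def score(node: Node, cpu_m: int, memory: int) -> float:
    """Second pass (assumed rule): average unrequested fraction after placement."""
    cpu_free = 1 - (node.requested_cpu_m + cpu_m) / node.allocatable_cpu_m
    memory_free = 1 - (node.requested_memory + memory) / node.allocatable_memory
    return (cpu_free + memory_free) / 2


def pick_node(nodes: List[Node], cpu_m: int, memory: int) -> Optional[Node]:
    """Filter, then return the highest-scoring eligible node (None if nothing fits)."""
    eligible = [n for n in nodes if fits(n, cpu_m, memory)]
    if not eligible:
        return None
    return max(eligible, key=lambda n: score(n, cpu_m, memory))


GI = 2**30
MI = 2**20
nodes = [
    Node("node-a", 3600, 14 * GI, 3550, 2 * GI),  # filtered out: only 50m CPU left
    Node("node-b", 3600, 14 * GI, 1333, 1 * GI),
    Node("node-c", 3600, 14 * GI, 2000, 8 * GI),
]

chosen = pick_node(nodes, cpu_m=100, memory=170 * MI)
print(chosen.name if chosen else "unschedulable")
```

In this example node-a is removed by the filter, and node-b wins the scoring pass because it has the most unrequested CPU and memory.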

This post is part of a collaboration between O’Reilly and IBM. See our statement of editorial independence.
