  1. Why Data is Never Raw — In scientific research, the choice of what to measure and how is fundamental. But in many cases, especially in the social sciences, what we want to capture doesn’t already have a clear measurement. It must therefore be “operationalized” somehow—meaning we must create a technique for measuring it. This necessarily requires emphasizing some aspects over others. Just as thought involves focusing, data collection involves narrowing attention; something is always left out.
  2. Jericho — Microsoft’s open source environment that connects learning agents with interactive fiction games. Using the fabulous Frotz, of course.
  3. Algorithms — new textbook from UIUC professor Jeff Erickson.
  4. The Digital Revolution Isn’t Over, But Has Turned Into Something Else (George Dyson) — The digital revolution began when stored-program computers broke the distinction between numbers that mean things and numbers that do things. Numbers that do things now rule the world. But who rules over the machines? (via BoingBoing)
  1. Tokyo Cafe Staffed by Robots Controlled by Paralyzed People — Developed by Ory, a startup that specializes in robotics for disabled people, the OriHime-D is a 120 cm (4-foot) tall robot that can be operated remotely from a paralyzed person’s home. Even if the operator only has control of their eyes, they can command OriHime-D to move, look around, speak with people, and handle objects. (via Dan Hon)
  2. The Reunion — a new science fiction story about surveillance in China by Chen Qiufan, published in MIT TR.
  3. Lessons from Running a Small-Scale Electronics Factory in my Guest Bedroom — hardware is hard. Lots of things you only learn by getting amongst it.
  4. Inter UI — a typeface specially designed for user interfaces with a focus on high legibility of small-to-medium sized text on computer screens.
  1. Amazon Marketplace Scams — As Amazon has escalated its war on fake reviews, sellers have realized that the most effective tactic is not buying them for yourself, but buying them for your competitors—the more obviously fraudulent the better. A handful of glowing testimonials, preferably in broken English about unrelated products and written by a known review purveyor on Fiverr, can not only take out a competitor and allow you to move up a slot in Amazon’s search results, it can land your rival in the bewildering morass of Amazon’s suspension system. (via Marginal Revolution)
  2. Growing Public Domain — the public domain now includes “In the Orchard” and “Mrs Dalloway in Bond Street,” by Virginia Woolf; “The Ego and the Id,” by Sigmund Freud (original German version); “Towards a New Architecture,” by Le Corbusier (original French version); “The Murder of Roger Ackroyd” and “The Murder on the Links,” by Agatha Christie; “The Lurking Fear,” by H.P. Lovecraft; “Duino Elegies,” by Rainer Maria Rilke (original German version); “Safety Last!” and “Why Worry?,” by Harold Lloyd; “Dolphins,” by M. C. Escher; “The Pipes of Pan” and “Paulo on a Donkey,” by Pablo Picasso; and “Architecture, Tightrope Walker, and Masks,” by Paul Klee.
  3. Russia vs. Telegram: Technical Notes on the Battle — a CCC talk. Spoiler alert: Russia didn’t succeed, and in trying, they also banned IP addresses of major local businesses (VKontakte, Yandex, and others), presumably by mistake. A flaw in the filter was exploited to bring one of the major ISPs down for a while. The Moscow internet exchange point announced that a similar flaw in the filter could be used to disrupt peering.
  4. Guesstimate — open source spreadsheet for things that aren’t certain, where you can create Fermi estimates and perform Monte Carlo estimates. I’ve linked to this before, but I hadn’t realized it’s open source. Development has slowed and the founders are busy elsewhere, but it’s a promising idea.
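To make the Fermi-plus-Monte-Carlo idea concrete, here is a minimal sketch in Python. It is not Guesstimate's implementation; the function name `estimate_tuners` and every input range below are made-up assumptions for the classic "how many piano tuners are in a city" question. Each uncertain input is drawn from a range instead of being fixed to a single number, and the spread of the results stands in for the uncertainty of the estimate.

```python
import random


def estimate_tuners(trials=10_000, seed=0):
    """Return (5th percentile, median, 95th percentile) of a Fermi estimate.

    All ranges below are hypothetical placeholders, not real data.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    samples = []
    for _ in range(trials):
        households = rng.uniform(800_000, 1_200_000)       # assumed city size
        share_with_piano = rng.uniform(0.02, 0.05)         # assumed ownership rate
        tunings_per_piano = rng.uniform(0.5, 1.5)          # assumed tunings per year
        tunings_per_tuner = rng.uniform(600, 1000)         # assumed yearly capacity
        demand = households * share_with_piano * tunings_per_piano
        samples.append(demand / tunings_per_tuner)
    samples.sort()
    # Read percentiles straight off the sorted samples.
    low = samples[int(0.05 * trials)]
    median = samples[int(0.50 * trials)]
    high = samples[int(0.95 * trials)]
    return low, median, high


if __name__ == "__main__":
    low, median, high = estimate_tuners()
    print(f"90% interval: {low:.0f} to {high:.0f} tuners (median {median:.0f})")
```

Reporting an interval rather than a point value is the whole point of the exercise: the answer carries the uncertainty of its inputs with it.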
  1. SchemaCrawler — Free database schema discovery and comprehension tool. Make sense of the databases you inherit.
  2. EU To Fund Bug Bounties for Open Source Projects (ZD Net) — this is good, but insufficient. See Katie Moussouris.
  3. Essential C — a sweet little summary of C, an even terser K&R.
  4. AI, Game Theory, and Poker (YouTube) — a talk by Tuomas Sandholm, CMU professor and co-creator of Libratus, which is the first AI system to beat top human players at the game of Heads-Up No-Limit Texas Hold’em. From the AI Podcast.
  1. Updating: A Set of Bayesian Notes — Notes on Bayesian methods, written to supplement CS&SS/STAT 564: Bayesian Statistics for the Social Sciences.
  2. How Much of the Internet is Fake? (NY Mag) — What’s gone from the internet, after all, isn’t “truth,” but trust: the sense that the people and things we encounter are what they represent themselves to be.
  3. TensorFlow Privacy — Library for training machine learning models with privacy for training data.
  4. Universally Unique Lexicographically Sortable Identifiers — 128-bit compatibility with UUID; 1.21e+24 unique ULIDs per millisecond; lexicographically sortable; canonically encoded as a 26-character string, as opposed to the 36-character UUID; uses Crockford’s base32 for better efficiency and readability (5 bits per character); case insensitive; no special characters (URL safe); monotonic sort order (correctly detects and handles the same millisecond).
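The numbers in the ULID item fit together: a 48-bit millisecond timestamp plus 80 random bits gives 128 bits, 2^80 is about 1.21e+24, and 26 base32 characters carry 130 bits, enough for all 128. The following is a minimal sketch of that encoding in Python, not the reference implementation. The function names `encode_ulid` and `new_ulid` are illustrative, and the sketch leaves out the monotonic handling of several IDs generated in the same millisecond.

```python
import os
import time

# Crockford's base32 alphabet: digits and uppercase letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_ulid(timestamp_ms, randomness):
    """Encode a 48-bit timestamp and 10 random bytes as a 26-character string."""
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp must fit in 48 bits")
    if len(randomness) != 10:
        raise ValueError("randomness must be exactly 80 bits (10 bytes)")
    # The timestamp occupies the high bits, so string order follows time order.
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])  # take 5 bits at a time
        value >>= 5
    return "".join(reversed(chars))


def new_ulid():
    """Build an identifier from the current time and OS-provided randomness."""
    return encode_ulid(int(time.time() * 1000), os.urandom(10))


if __name__ == "__main__":
    print(new_ulid())
```

Because the alphabet is in ascending ASCII order and every identifier has the same length, plain string comparison sorts identifiers by creation time, which is what "lexicographically sortable" buys you.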
  1. Reading Rats’ Minds (MIT) — In recent years, scientists have shown that by recording the electrical activity of groups of neurons in key areas of the brain, they could read a rat’s thoughts of where it was, both after it actually ran the maze and also later when it would dream of running the maze in its sleep—a key process in consolidating its memory. In the new study, several of the scientists involved in pioneering such mind-reading methods now report they can read out those signals in real time as the rat runs the maze, with a high degree of accuracy and the ability to account for the statistical relevance of the readings almost instantly after they are made. […] The software of the system is open source and available for fellow neuroscientists to download and use freely, Chen and Wilson say. Rats not included. The paper is open access, too.
  2. yyyy and YYYY: Why Your Year May Be Wrong (Erica Sadun) — The presence of YYYY in the date format without its expected supporting information reduces to “start of year, go back one week, report the first day.” (I’ll explain this more in just a little bit.)
  3. Conversation with Juergen Schmidhuber — the co-creator of long short-term memory networks (LSTMs) that are used in billions of devices today for speech recognition, translation, and much more. … The history of science is the history of compression progress. Metalearning, self-referential programs, and more. It’s a dry discussion of fiery ideas. (via hardmaru)
  4. Scanning 250 Pages/Minute — Our system continuously observes 3D deformation of each flipped page at 500 times per second and recognizes the best moment for book image digitization. The video is hypnotic. (via Reza Zadeh)
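The yyyy versus YYYY item in the list above is easier to see with concrete dates. The sketch below uses Python's ISO 8601 week-numbering year as a stand-in for the week-based year that `YYYY` requests. That is an assumption: the exact `YYYY` result on a given platform depends on its locale's week rules, so treat this as an illustration of the mismatch and not as a reproduction of any particular formatter. The helper name `calendar_and_week_year` is hypothetical.

```python
from datetime import date


def calendar_and_week_year(d):
    """Return (calendar year, ISO week-numbering year) for a date."""
    iso_year = d.isocalendar()[0]  # the year that the date's ISO week belongs to
    return d.year, iso_year


if __name__ == "__main__":
    for d in (date(2018, 6, 15), date(2018, 12, 31), date(2017, 1, 1)):
        cal, week = calendar_and_week_year(d)
        flag = "" if cal == week else "  <- years disagree"
        print(f"{d.isoformat()}: calendar year {cal}, week-based year {week}{flag}")
```

The two years agree for most of the year and differ only for a few days around New Year, which is why this kind of bug tends to survive testing and surface in late December.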
  1. Evil FizzBuzz (Jason Gorman) — a really clever CI exercise for a team.
  2. EmuTOS — open source reimplementation of the original Atari ST operating system. (via Hacker News)
  3. Teach Yourself Logic: A Study Guide — a wonderfully chatty book that functions as an introduction to logic for mathematicians and philosophers.
  4. Lenia: Biology of Artificial Life — a new model of artificial life called Lenia (from Latin lenis “smooth”), a two-dimensional cellular automaton with continuous space-time-state and generalized local rule. Computer simulations show that Lenia supports a great diversity of complex autonomous patterns or “lifeforms” bearing resemblance to real-world microscopic organisms. More than 400 species in 18 families have been identified, many discovered via interactive evolutionary computation. They differ from other cellular automata patterns in being geometric, metameric, fuzzy, resilient, adaptive, and rule-generic. Implementation with source.
  1. Maxclave (Bunnie Huang) — you thought software testing was hard? Welcome to the world of hardware testing.
  2. Biological One-Way Functions for Secure Key Generation — It is demonstrated that the spatiotemporal dynamics of an ensemble of living organisms such as T cells can be used for maximum entropy, high‐density, and high‐speed key generation.
  3. Christmas Robot Roundup (IEEE) — selection of holiday greetings from various robots and robotics companies. I for one welcome our new tinsel-and-holly-clad industrial apparatus overlords.
  4. Congress Votes to Make Open Government Data the Default in the United States — The Open, Public, Electronic, and Necessary Government Data Act (AKA the OPEN Government Data Act) is about to become law […]. This codifies two canonical principles for democracy in the 21st century: 1. public information should be open by default to the public in a machine-readable format, where such publication doesn’t harm privacy or security. 2. federal agencies should use evidence when they make public policy. Merry Christmas, democracy; here’s a small present in a bad year.
  1. Solving Murder with Prolog — if THIS was the motivating example for Prolog, I’d have taken to it a lot sooner! I love those logic puzzle books.
  2. The Machine Learning Race is Really a Data Race (MIT Sloan Review) — Organizations that hope to make AI a differentiator need to draw from alternative data sets—ones they may have to create themselves.
  3. Photo Wakeup: 3-D Character Animation from a Single Photo — this is incredible work. Watch the video if nothing else.
  4. Etcher — Flash OS images to SD cards and USB drives, safely and easily. Open source.
