THE ML ENGINEER 🤖
Issue #21
 
 
This week in Issue #21:
Alibi black box ML explanations, Karpathy's tips on NNs, Nando on learning to learn, a gentle intro to ImageNet, Spark Streaming with Kafka, computer vision, ML streaming libraries, upcoming ML conferences, data science / ML engineering jobs and more 🚀.
 
Support the ML Engineer!
Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!
 
If you would like to suggest articles, ideas, papers, libraries, jobs or events, or provide feedback, just hit reply or send us an email at a@ethical.institute! We have received a lot of great suggestions in the past; thank you all very much for your support!
 
 
 
The team behind SeldonIO has open sourced Alibi, a library for explaining the predictions of black box machine learning models. The initial release brings together three incredibly interesting (and very well documented) approaches to explainability: anchor methods, contrastive explanations, and trust scores. We highly recommend reading through the documentation, which gives thorough high-level explanations of each approach together with references to really interesting research in this area.
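As a rough illustration of the third approach, here is a simplified trust score: the distance from a test point to the nearest training point of a *different* class, divided by its distance to the nearest training point of the *predicted* class. This is a minimal sketch and not Alibi's API; the function name and toy data are hypothetical, and it assumes k = 1, Euclidean distance and no filtering of low-density training points (the published method filters those out first).

```python
import math
from typing import Hashable, Sequence

Point = Sequence[float]


def trust_score(x: Point, predicted_class: Hashable,
                train_points: Sequence[Point],
                train_labels: Sequence[Hashable]) -> float:
    """Simplified trust score (hypothetical helper, not Alibi's API).

    Ratio of the distance to the closest training point of any other class
    over the distance to the closest training point of the predicted class.
    Values above 1 mean x sits closer to the predicted class than to any
    alternative. Assumes k = 1 and no density filtering.
    """
    d_pred = math.inf   # nearest neighbour carrying the predicted label
    d_other = math.inf  # nearest neighbour carrying any other label
    for point, label in zip(train_points, train_labels):
        d = math.dist(x, point)
        if label == predicted_class:
            d_pred = min(d_pred, d)
        else:
            d_other = min(d_other, d)
    if d_pred == 0.0:
        return math.inf  # x coincides with a training point of its class
    return d_other / d_pred


# Hypothetical toy data: two well separated clusters.
X_train = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
y_train = [0, 0, 1, 1]
print(trust_score((0.0, 0.5), 0, X_train, y_train))  # about 13.45: trustworthy
print(trust_score((0.0, 0.5), 1, X_train, y_train))  # about 0.074: suspicious
```

A low score flags predictions that disagree with the local structure of the training data, which is the intuition behind using trust scores alongside a model's own confidence.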
 
 
Andrej Karpathy has put together yet another incredible resource, this time with tips on training neural networks. The post was inspired by his recent tweet outlining the most common mistakes made when training neural networks. In it, Andrej proposes a recipe for dealing with the leaky abstractions and silent failures that naturally come with neural networks. His steps include "becoming one with the data", setting up an end-to-end training/evaluation skeleton, deliberately overfitting first, then regularizing, tuning, and finally "squeezing the juice".
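Two of the sanity checks from the post, verifying the loss at initialisation and overfitting a single small batch, can be sketched on a tiny logistic regression. This is our own minimal illustration in plain Python, not code from the post; the data and learning rate are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def mean_log_loss(w, b, xs, ys) -> float:
    eps = 1e-12  # avoid log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(predict(w, b, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def train(xs, ys, lr: float = 0.5, steps: int = 2000) -> Tuple[List[float], float]:
    """Full-batch gradient descent on the mean log loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y  # dLoss/dz for sigmoid + log loss
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical tiny batch that a working training loop must be able to memorise.
xs = [(1.0, 2.0), (2.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]
ys = [1, 1, 0, 0]

# Check 1: loss at init. Zero weights predict 0.5 everywhere, so with two
# balanced classes the loss should equal log(2).
init_loss = mean_log_loss([0.0, 0.0], 0.0, xs, ys)
assert abs(init_loss - math.log(2)) < 1e-9

# Check 2: overfit one batch. The loss on these four points should approach 0.
w, b = train(xs, ys)
final_loss = mean_log_loss(w, b, xs, ys)
assert final_loss < 0.01, final_loss
print(init_loss, final_loss)
```

If either check fails on a real model, the bug is in the pipeline (data, loss, or update step) rather than in the choice of architecture, which is exactly the kind of silent failure the post warns about.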
 
 
Nando de Freitas gave a talk in London last week at the Alan Turing Institute, covering a wide range of topics: a high-level introduction to machine learning, followed by a more specific overview of the area he is currently focusing on, meta-learning. The fields of meta-learning and multi-task learning are incredibly interesting, as they require models to succeed at a wide variety of tasks without access to the sheer amounts of data that deep learning usually requires. A few years back Nando co-authored the paper "Learning to learn by gradient descent by gradient descent", which shows how an optimization algorithm can itself be learned with gradient descent, opening the door to more general and even reusable algorithms.
 
 
Machine Learning Mastery is back this week with a great post shedding light on a topic you may have heard of repeatedly: the "ImageNet Challenge". In this tutorial, Jason provides an overview of what the ImageNet Challenge is, gives insight into the dataset (roughly 21k classes in the full dataset, with a 1,000-class subset of over 1 million images used for the challenge), and talks about the deep learning achievements it has showcased over the last few years.
 
 
Following up on our talk last week on Real Time Machine Learning using Kafka and Spark Streaming, this week we have open sourced the one-click deployment foundation built on docker-compose, together with a simple tutorial that gets you started quickly with real-time ML streams. The brief overview explains how to 1) run the whole stack after installing docker-compose, 2) run a producer that pushes data to the stream, 3) run a consumer that processes the data, and 4) monitor the whole stack using Grafana and Kafka Manager.
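The producer/consumer flow in steps 2 and 3 can be sketched in-process. This is not the repository's code: a `queue.Queue` stands in for a Kafka topic, the function names are hypothetical, and the "model" is a placeholder threshold rule.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

END_OF_STREAM = object()  # sentinel marking the end of this finite stream


def producer(topic: queue.Queue, records: Iterable[Any]) -> None:
    """Push each record onto the stream, then signal that the stream is done."""
    for record in records:
        topic.put(record)
    topic.put(END_OF_STREAM)


def consumer(topic: queue.Queue, model: Callable[[Any], Any],
             results: List[Any]) -> None:
    """Pull records off the stream and apply the model to each one."""
    while True:
        record = topic.get()
        if record is END_OF_STREAM:
            break
        results.append(model(record))


def run_pipeline(records: Iterable[Any], model: Callable[[Any], Any]) -> List[Any]:
    topic: queue.Queue = queue.Queue()  # in-memory stand-in for a Kafka topic
    results: List[Any] = []
    threads = [
        threading.Thread(target=producer, args=(topic, records)),
        threading.Thread(target=consumer, args=(topic, model, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Hypothetical "model": flag sensor readings above a threshold.
print(run_pipeline([0.2, 0.9, 0.5], lambda reading: reading > 0.6))
# prints [False, True, False]
```

With a real broker the producer and consumer run as separate processes and the topic persists messages, but the division of responsibilities is the same.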
 
 
Tryolabs has put together a great, extensive introduction to computer vision, outlining the areas in which computer vision has been applied. The post covers all things computer vision from a high level and provides links that can give newcomers to the field a good intuition not only about the techniques but also about the business and practical applications of these tools.
 
 
 
 
MLOps: Featured OS Libraries
The theme for this week's featured ML libraries is real-time machine learning with data streaming pipelines, which falls under our Responsible ML Principle #4. This week we dive deeper and feature some fast-growing libraries in this space. The four featured data stream processing libraries are:
 
  • Apache Flink - Open source stream processing framework with powerful stream and batch processing capabilities.
  • Faust - Stream processing library built on top of Python's asyncio, using an async Kafka client and inspired by Kafka Streams.
  • Kafka Streams - Kafka client library for building applications and microservices where the input and output are stored in Kafka clusters.
  • Spark Streaming - Micro-batch processing for streams using the Apache Spark framework as a backend, supporting stateful exactly-once semantics.
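The main design difference in this list is how the stream is sliced: Flink and Kafka Streams process one record at a time, while Spark Streaming groups records into micro-batches. A minimal plain-Python sketch of both styles follows; it uses none of these libraries' APIs, the helper names are hypothetical, and, as an assumption to keep it deterministic, batches are cut by count, whereas Spark Streaming cuts them by a time interval.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def record_at_a_time(stream: Iterable[T], score: Callable[[T], R]) -> Iterator[R]:
    """Handle each event as soon as it arrives (lowest latency per event)."""
    for event in stream:
        yield score(event)


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group the stream into small batches (count-based, an assumption here)."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def score_batch(batch: List[int]) -> List[int]:
    """Hypothetical model call applied to a whole micro-batch at once."""
    return [2 * x for x in batch]


events = range(7)
print(list(record_at_a_time(events, lambda x: 2 * x)))
print([score_batch(b) for b in micro_batches(events, batch_size=3)])
```

Micro-batching trades a little latency for throughput, since a model can score a whole batch in one call; record-at-a-time processing reacts to each event immediately.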
 
If you know of any libraries that are not in the "Awesome MLOps" list, please give us a heads up or feel free to open a pull request.
 
 
 
We feature conferences that have core ML tracks (primarily in Europe for now) to help our community stay up to date with great events coming up.
 
Technical Conferences
 
 
  • AI Conference Beijing [18/06/2019] - O'Reilly's signature applied AI conference in Asia, held in Beijing, China.
 
 
  • Data Natives [21/11/2019] - Data conference in Berlin, Germany.
 
  • ODSC Europe [19/11/2019] - The Open Data Science Conference in London, UK.
 
 
 
 
 
Business Conferences
 
  • World Summit AI Americas [10/04/2019] - Large scale AI summit in Montreal, Canada.
    • Come join our panel on AI Ethics and Tools.
 
 
 
  • Big Data LDN 2019 [13/11/2019] - Conference for strategy and tech on big data in London, UK.
 
 
 
We showcase Machine Learning Engineering jobs (primarily in London for now) to help our community stay up to date with great opportunities that come up. It seems that the demand for data scientists continues to rise!
 
Leadership Opportunities
 
Mid-level Opportunities
 
Junior Opportunities
 
 
 
 
 
© 2018 The Institute for Ethical AI & Machine Learning