Subscribe to the Machine Learning Engineer Newsletter

Receive curated articles, tutorials and blog posts from experienced Machine Learning professionals.

Issue #17
This week in Issue #17:
The spaCy universe of resources, introducing Plotly Express, preprocessing images with Keras, calling out statistical significance, time series with TensorFlow Probability, massive multi-task learning, data streaming libraries, upcoming AI conferences, new Machine Learning jobs and more 🚀.
Support the ML Engineer!
Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!
If you would like to suggest articles, ideas, papers, libraries, jobs or events, or to provide feedback, just hit reply or send us an email! We have received a lot of great suggestions in the past, thank you very much for everyone's support!
The industrial-strength NLP framework spaCy is known not only for its great features, but also for the excellent learning and documentation resources available. ExplosionAI cofounder Ines Montani shares two incredibly useful resources: 1) a two-page spaCy cheatsheet, and 2) a fully fledged, hands-on online spaCy course. If that's not cool enough, spaCy is hosting its first IRL 2-day conference on the 4th of July in Berlin, which we have featured in our upcoming ML conferences list below.
Check out this great tutorial on pre-processing image data using Keras. It provides hands-on examples of some of the most common pre-processing approaches for image data, including normalising, centering and standardising images. The article contains details on: 1) How to configure and use the ImageDataGenerator class (in Keras) for train, validation, and test datasets of images. 2) How to use the ImageDataGenerator to normalize pixel values when fitting and evaluating a convolutional neural network model. 3) How to use the ImageDataGenerator to center and standardize pixel values when fitting and evaluating a convolutional neural network model.
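To make the three terms concrete, here is a minimal sketch of normalising, centering and standardising in plain Python. It is not the Keras ImageDataGenerator API and not code from the tutorial; the function names and the tiny list of pixel values are hypothetical, chosen only to illustrate the arithmetic on a flat list of 8-bit pixel values.

```python
from statistics import fmean, pstdev

# Illustrative only: these helpers are hypothetical and are not part of Keras.

def normalize(pixels):
    """Rescale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [p / 255.0 for p in pixels]

def center(pixels):
    """Subtract the mean so the values average to zero."""
    mean = fmean(pixels)
    return [p - mean for p in pixels]

def standardize(pixels):
    """Center the values, then divide by the standard deviation."""
    mean = fmean(pixels)
    std = pstdev(pixels)
    if std == 0:
        # A constant image has no spread; return zeros instead of dividing by zero.
        return [0.0 for _ in pixels]
    return [(p - mean) / std for p in pixels]

# Made-up pixel values standing in for a flattened image.
pixels = [0, 64, 128, 255]
print(normalize(pixels))
print(center(pixels))
print(standardize(pixels))
```

In practice the mean and standard deviation would typically be computed on the training set and then reused for validation and test images, so that all three splits are scaled the same way.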
The international journal of science "Nature" has released a very interesting article which draws attention to a key challenge in the scientific community: it is "calling to retire statistical significance and use confidence intervals" instead. The article argues that significant p-values may not always be fully representative, an issue which has led to overhyped claims and even the dismissal of possibly crucial effects. There are some really great initiatives in the machine learning community which help raise the bar for quality through reproducibility, so that results can be evaluated properly. One very exciting initiative which we mentioned a few weeks ago is Papers With Code, which just released a new feature to provide GitHub badges that show SotA performance.
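As a rough illustration of reporting an interval rather than a yes/no significance verdict, the sketch below computes a confidence interval for a sample mean. It is not taken from the Nature article; it assumes a normal approximation (for small samples a t-distribution would be more appropriate), and the function name and the sample values are made up.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the sample mean using a normal approximation."""
    n = len(sample)
    mean = fmean(sample)
    # Standard error of the mean, using the sample standard deviation.
    sem = stdev(sample) / sqrt(n)
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem

# Made-up measurements, for illustration only.
sample = [2.1, 2.5, 1.9, 2.8, 2.4, 2.2, 2.6, 2.0]
low, high = mean_confidence_interval(sample)
print(f"mean={fmean(sample):.3f}, 95% interval=({low:.3f}, {high:.3f})")
```

The width of the interval conveys how uncertain the estimate is, which a bare "significant / not significant" label hides.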
Great announcement of a simple, high-level wrapper for Plotly: Plotly Express. It exposes a simple syntax for complex charts. Inspired by Seaborn and ggplot2, it was specifically designed to have a terse, consistent and easy-to-learn API. This library has been added to the visualisation frameworks section in our Awesome Machine Learning list. Plotly Express includes faceting, maps, animations, and trendlines. Check out Plotly Express, as well as the fully fledged Plotly documentation.
Forecasting can be an incredibly valuable analytical skill to have in your toolset. This short article provides a great introduction to the family of probability models for time series called "structural time series models" - this family of models encompasses autoregressive processes, moving averages, local linear trends, seasonality and regression. The article also provides a hands-on example using the TensorFlow Probability library, forecasting CO2 concentration using data from the Mauna Loa observatory in Hawaii.
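To give a feel for what a structural time series looks like, here is a small sketch that simulates one as the sum of a local linear trend and a repeating seasonal effect, plus noise. It does not use TensorFlow Probability and is not the article's code; the function name, parameters and default values are hypothetical, and it only generates data rather than fitting or forecasting.

```python
import random

def simulate_structural_series(
    n_steps,
    level=0.0,
    slope=0.1,
    seasonal_effects=(1.0, 0.0, -1.0, 0.0),
    level_scale=0.0,
    slope_scale=0.0,
    obs_scale=0.0,
    seed=0,
):
    """Simulate observations = local linear trend + seasonal effect + noise."""
    rng = random.Random(seed)
    series = []
    for t in range(n_steps):
        # The seasonal component cycles through a fixed set of effects.
        seasonal = seasonal_effects[t % len(seasonal_effects)]
        series.append(level + seasonal + rng.gauss(0.0, obs_scale))
        # Local linear trend: the level drifts by the slope,
        # and both the level and the slope take random-walk steps.
        level = level + slope + rng.gauss(0.0, level_scale)
        slope = slope + rng.gauss(0.0, slope_scale)
    return series

# With all noise scales at zero the series is a deterministic trend plus seasonality.
clean = simulate_structural_series(8)
# Small noise scales give a more realistic looking series.
noisy = simulate_structural_series(8, level_scale=0.05, slope_scale=0.01, obs_scale=0.1)
print([round(x, 2) for x in clean])
print([round(x, 2) for x in noisy])
```

A structural time series library works in the other direction: given observed data, it infers the trend, seasonal and noise components and uses them to forecast.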
A group of Stanford researchers has released an update on their work with Snorkel MeTaL to tackle massive multi-task learning in natural language understanding. In this post, they talk about how they use Snorkel MeTaL to construct a simple model (pretrained BERT + linear task heads) and incorporate a variety of supervision signals (traditional supervision, transfer learning, multi-task learning, weak supervision, and ensembling) in a Massive Multi-Task Learning (MMTL) setting, achieving a new state-of-the-art score on the GLUE Benchmark and four of its nine component tasks (CoLA, SST-2, MRPC, STS-B).
MLOps = Featured OS Libraries
We are excited to add a new section to the MLOps library on data stream processing! Data stream processing falls under our Responsible ML Principle #4. The four featured libraries on data stream processing this week are:
  • Apache Flink - Open source stream processing framework with powerful stream and batch processing capabilities.
  • Faust - Streaming library built on top of Python's asyncio library, using an async Kafka client and inspired by the Kafka Streams library.
  • Kafka Streams - Kafka client library for building applications and microservices where the input and output are stored in Kafka clusters.
  • Spark Streaming - Micro-batch processing for streams using the Apache Spark framework as a backend, supporting stateful exactly-once semantics.
If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to open a pull request.
We feature conferences that have core ML tracks (primarily in Europe for now) to help our community stay up to date with great events coming up.
Technical Conferences
  • DataFest19 [11/03/2019] - Two-week festival of Data Innovation hosted across Scotland, UK.
  • AI Conference Beijing [18/06/2019] - O'Reilly's signature applied AI conference in Asia in Beijing, China.
  • Data Natives [21/11/2019] - Data conference in Berlin, Germany.
  • ODSC Europe [19/11/2019] - The Open Data Science Conference in London, UK.
Business Conferences
  • World Summit AI Americas [10/04/2019] - Large-scale AI summit in Montreal, Canada.
    • Come join our panel on AI Ethics and Tools.
  • Big Data LDN 2019 [13/11/2019] - Conference for strategy and tech on big data in London, UK.
We showcase Machine Learning Engineering jobs (primarily in London for now) to help our community stay up to date with great opportunities that come up. It seems that the demand for data scientists continues to rise!
Junior Opportunities
Mid-level Opportunities
Leadership Opportunities
© 2018 The Institute for Ethical AI & Machine Learning