The Institute for Ethical AI & Machine Learning

THE ML ENGINEER
Issue #4

This week in Issue #4:

Andrew Ng on reproducibility, 2018 machine learning nostalgia, the magic of feature engineering, explainability & bias evaluation with TensorFlow, ensemble awesomeness, CI/CD for ML, 4 AutoML libraries and more!

Support the ML Engineer!

Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!

If you would like to suggest articles, ideas, tutorials or libraries, or to provide feedback, just hit reply or send us an email at a@ethical.institute!

Andrew Ng and Reproducibility

Andrew Ng kicks off the year with a very interesting discussion on Twitter that touches upon our Principle #4 on infrastructure for reproducible operations. In the second edition of The Machine Learning Engineer newsletter we highlighted several awesome open source libraries that have been pushing the limits of machine learning reproducibility, including Data Version Control (DVC), QuiltData, Pachyderm and ModelDB.

2018 machine learning nostalgia

The year has begun with a lot of great posts looking back on the machine learning highlights of 2018. AtlasML, who brought us Papers with Code in last week's MLE newsletter, are back this time with an awesome review of the state of deep learning in 2018.

Magic of Feature Engineering

Feature engineering is still one of the most powerful areas of machine learning. In this great Kaggle kernel, Feature Labs data scientist Will Koehrsen provides a comprehensive overview of how to approach the end-to-end data science workflow using the Home Credit Default Risk Competition dataset.

Explainability/bias with TensorFlow

A hands-on talk by Alejandro Saucedo showcasing bias and explainability in deep learning by building a TensorFlow model that automates a "loan approval process". One of the core points is that teams should focus on making sure the right touch-points, skillsets and best practices are in place throughout the design, development and deployment of machine learning systems. The talk covers techniques for data analysis, feature importance and model evaluation, including SHAP and LIME.

Ensembles and more ensembles

Ensembles are just awesome. They keep amazing us with their performance, combining the best of different worlds through a large variety of methods. The great team at Machine Learning Mastery has put together an impressive series on ensembles. They first provide a great (and broad) overview. They then provide hands-on examples of averaging ensembles, bagging ensembles (making decision trees cool again since 2001), voting ensembles, weighted average ensembles, stacking ensembles and snapshot ensembles.
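To give a flavour of two of the methods listed above, here is a minimal sketch in plain Python of a voting ensemble and a weighted average ensemble. It is not taken from the Machine Learning Mastery series; the function names `majority_vote` and `weighted_average` are made up for illustration, and the "models" are just hard-coded prediction lists standing in for real classifiers.

```python
from collections import Counter


def majority_vote(predictions):
    """Voting ensemble: each model casts one label per sample.

    `predictions` is a list with one entry per model; each entry is a list
    of predicted labels, one per sample. Ties are resolved by whichever
    label was counted first, so use an odd number of models for binary tasks.
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


def weighted_average(probabilities, weights):
    """Weighted average ensemble: combine per-model scores for each sample.

    `probabilities` holds one list of scores per model; `weights` holds one
    non-negative weight per model (hypothetical values, e.g. chosen from
    validation performance).
    """
    total = sum(weights)
    n_samples = len(probabilities[0])
    return [
        sum(w * model_probs[i] for w, model_probs in zip(weights, probabilities)) / total
        for i in range(n_samples)
    ]


# Three toy "models" voting on three samples.
labels = majority_vote([[1, 0, 1], [1, 1, 0], [0, 1, 1]])

# Two toy "models", the second trusted three times as much as the first.
scores = weighted_average([[0.25, 0.75], [0.75, 0.25]], weights=[1, 3])
```

In practice the weights are usually tuned on a held-out validation set, and libraries such as scikit-learn ship ready-made voting and stacking estimators, so a sketch like this is mainly useful for seeing what those estimators do under the hood.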

CI/CD for Machine Learning

Great post by Dillon from Paperspace on continuous integration and deployment for AI & machine learning. This was a highly discussed topic in 2018, and it will certainly see interesting innovations in 2019. The post breaks down ML CI/CD into the areas of 1) data, 2) hardware, 3) training steps and 4) retraining/online.

MLOps = ML Operations

In this edition we are focusing on feature engineering, and more specifically the automation side of it. The machine learning feature engineering libraries we're showcasing this week are:

auto-sklearn - Framework to automate algorithm selection and hyperparameter tuning for sklearn
TPOT - Automation of sklearn pipeline creation (including feature selection, pre-processing, etc.)
tsfresh - Automatic extraction of relevant features from time series
Featuretools - An open source framework for automated feature engineering

If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to open a pull request!