The Institute for Ethical AI & Machine Learning

THE ML ENGINEER 🤖

Issue #39

This week in Issue #39:

Management insights for data science
Outlier selection vs detection
10 rules for sharing notebooks
AutoML and AI at Google
5 sampling algorithms for everyone
ML Explainability libraries
AI conferences
ML jobs
+ more 🚀

Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!

If you would like to suggest articles, ideas, papers, libraries, jobs or events, or to provide feedback, just hit reply or send us an email at a@ethical.institute! We have received a lot of great suggestions in the past, so thank you all very much for your support!

Management for Data Science

Managing data science teams has become a major challenge for organisations looking to introduce and grow them. This article provides an excellent ground-up overview of management from a practical perspective. It first introduces high-level concepts you have probably come across, then dives into a hands-on case study of building end-to-end data science infrastructure for a recommendation app startup, detailing the stakeholders involved together with their interactions and processes.

Selection vs Detection of outliers

Anomaly detection algorithms have been gaining popularity due to their practical use beyond traditional areas like fraud detection. This article does a great job of defining the nuanced terminology in this area, namely the difference between selection and detection of outliers. The blog provides code that lets you use sklearn's algorithms to tackle these challenges with a "one size fits all" approach, following a pattern that can generalise to other techniques useful in similar contexts.
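As a rough illustration of why sklearn lends itself to a reusable pattern (this is our own minimal sketch, not the article's code), several of its outlier estimators expose the same fit_predict call, which returns 1 for inliers and -1 for outliers. The synthetic data, the choice of the two estimators and the contamination value below are assumptions made purely for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data (assumed for illustration): a Gaussian blob plus three far-away points.
rng = np.random.RandomState(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
far_points = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])
X = np.vstack([inliers, far_points])

# Two different algorithms behind one shared interface.
# contamination is the fraction of points we choose to treat as outliers.
detectors = {
    "isolation_forest": IsolationForest(contamination=0.02, random_state=0),
    "local_outlier_factor": LocalOutlierFactor(n_neighbors=20, contamination=0.02),
}

flagged = {}
for name, detector in detectors.items():
    labels = detector.fit_predict(X)  # 1 = inlier, -1 = outlier
    flagged[name] = np.where(labels == -1)[0]
    print(name, "flagged row indices:", flagged[name].tolist())
```

Because the loop only relies on fit_predict, swapping in another estimator with the same method should not require changing the surrounding code.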

Rules for sharing notebooks

With the rapid increase in adoption of Jupyter (and other) notebook technologies, there has also been an increase in the complexity of collaborating on and maintaining notebooks. Last week we showed how Netflix tackles their challenges with data and software infrastructure. This week, we see another great piece that instead proposes 10 rules you can follow as an individual to make your notebooks easier to digest, maintain and extend.

AutoML and AI at Google

Large-scale use of machine learning has introduced new complexities, including the large amount of manual work involved in finding the best parameters for an ML algorithm (such as a neural network, random forest classifier and beyond). Practical AI brings us an excellent podcast with Google's Sherol Chen diving into AutoML and AI at Google, covering some of the most popular topics in machine learning at this time.
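To make the "finding the best parameters" problem concrete, here is a minimal random search sketch. It is not how Google's AutoML works; it is just one of the simplest ways to automate parameter tuning. The validation_loss function, the parameter names and the search ranges are all hypothetical stand-ins for training and scoring a real model.

```python
import random


def validation_loss(learning_rate, num_trees):
    """Hypothetical stand-in for training a model and scoring it on held-out data."""
    return (learning_rate - 0.1) ** 2 + ((num_trees - 200) / 1000) ** 2


def random_search(objective, n_trials, rng):
    """Sample parameter sets at random and keep the one with the lowest loss."""
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            # Log-uniform draw between 0.001 and 1 (assumed range).
            "learning_rate": 10 ** rng.uniform(-3, 0),
            # Uniform integer draw, inclusive on both ends (assumed range).
            "num_trees": rng.randint(10, 500),
        }
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(validation_loss, n_trials=200, rng=random.Random(0))
print(best_params, best_loss)
```

In practice the objective would wrap a real training run, which is exactly why each trial is expensive and why automating the search is attractive.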

5 sampling algos for everyone

Data imbalance and how well training data represents production data often become a huge challenge, and are certainly a key point whenever the topic of "algorithmic bias" is raised. Here is a great blog post that introduces 5 common sampling techniques that every ML practitioner should be familiar with.
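The five techniques are not listed in this blurb, so as a hedged illustration here are two widely used sampling methods sketched with the Python standard library only: reservoir sampling (a uniform sample from a stream of unknown length) and random oversampling (duplicating minority-class rows to balance classes). The function names and the toy data are our own assumptions, not taken from the post.

```python
import random
from collections import Counter, defaultdict


def reservoir_sample(stream, k, rng):
    """Uniformly sample k items from an iterable of unknown length (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def random_oversample(rows, label_of, rng):
    """Duplicate minority-class rows at random until every class matches the largest."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


rng = random.Random(42)
sample = reservoir_sample(range(10_000), k=5, rng=rng)

# Toy imbalanced dataset (assumed): 95 rows of class 0 and 5 rows of class 1.
rows = [("a", 0)] * 95 + [("b", 1)] * 5
balanced = random_oversample(rows, label_of=lambda row: row[1], rng=rng)
counts = Counter(label for _, label in balanced)
print(sample, dict(counts))
```

Oversampling should only ever be applied to the training split; duplicating rows before splitting leaks copies of the same example into the evaluation data.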

OSS: Explainability Libraries

The theme for this week's featured ML libraries is ML Explainability, and we're happy to share brand new additions to that section. The four featured libraries this week are:

tensorflow's lucid - Lucid is a collection of infrastructure and tools for research in neural network interpretability.
rationale - Code for learning rationales behind predictions, from the paper "Rationalizing Neural Predictions".
anchor - Code for the paper "High precision model agnostic explanations", a model-agnostic system that explains the behaviour of complex models with high-precision rules called anchors.
woe - Tools for WoE Transformation mostly used in ScoreCard Model for credit rating

If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to open a pull request!

MLConf = Conferences & Events

We feature conferences that have core ML tracks (primarily in Europe for now) to help our community stay up to date with great events coming up.

Technical & Scientific Conferences

EURNLP 2019 [11/10/2019] - European NLP Research summit in London, UK.

Data Natives [21/11/2019] - Data conference in Berlin, Germany.

ODSC Europe [19/11/2019] - The Open Data Science Conference in London, UK.

Khipu AI [11/11/2019] - Latin American Meeting in Artificial Intelligence in Montevideo, Uruguay.

Business Conferences

Predictive Analytics World [18/11/2019] - Conference for Business AI in Berlin, Germany.

Big Data LDN 2019 [13/11/2019] - Conference for strategy and tech on big data in London, UK.

MLJobs = Jobs & Careers

We showcase Machine Learning Engineering jobs (primarily in London for now) to help our community stay up to date with great opportunities that come up.

Leadership Opportunities

Algorithmia is hiring for a VP of Engineering in Seattle, USA
Fractal Labs is hiring for a VP of Engineering in London

Mid-level Opportunities

Seldon is hiring for a Senior Machine Learning Engineer in London
Proportunity is hiring for a Senior Machine Learning Engineer in London
Atlas ML is hiring for a Lead NLP Engineer in London
StreetBees is hiring for a Senior Data Scientist in London
Tractable is hiring for a Senior Deep Learning Engineer

Junior Opportunities

Migacore is hiring for a Machine Learning Engineer in London
Babylon Health is hiring for a Machine Learning Engineer in London