The Institute for Ethical AI & Machine Learning

THE ML ENGINEER 🤖

Issue #60

This week in Issue #60:

An MLOps Framework for Machine Learning at Scale
Why ML Degrades in Production
Kaggle Kernel on Interpretability
Building Domain Specific NLP
Bayesian Product Ranking at Wayfair
Featured OSS Production ML Libraries
Awesome AI Guidelines to check out this week
+ more 🚀

Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!

If you would like to suggest articles, ideas, papers, libraries, jobs, events or provide feedback just hit reply or send us an email to a@ethical.institute! We have received a lot of great suggestions in the past, thank you very much for everyone's support!

Hands on MLOps for AI at Scale

Production machine learning systems bring fundamentally different challenges from those in traditional software engineering. Last week in our talk at FOSDEM 2020 we presented a practical CI/CD framework for running production machine learning at scale. In this talk we define the concept of MLOps, cover some of the challenges that production machine learning brings to the table, and walk through a hands-on example that uses Seldon Core and Jenkins X to build machine learning pipelines that can scale to hundreds of models.

Why ML Degrades in Production

The lifecycle of a machine learning model only begins when it's deployed. Degrading performance is a big challenge: it takes the right processes and infrastructure to monitor a model in production, so that drift is caught before skewed predictions have a business impact.
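
To make the idea of monitoring for drift a little more concrete, here is a minimal sketch of one common check, the Population Stability Index (PSI), which compares the distribution of a feature at training time with the distribution seen in production. This is our own illustration and is not taken from the linked article; the function name, the synthetic data and the equal-width binning are all assumptions, and the 0.1 / 0.25 cut-offs in the comments are only a widely quoted rule of thumb.

```python
import math
import random


def population_stability_index(reference, live, bins=10):
    """PSI between two samples of one numeric feature.

    Bins are equal-width and fitted on the reference (training) sample.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the feature is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into the edge bins
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, live_p = proportions(reference), proportions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_p, live_p))


# Synthetic data (assumption): one stable feature and one whose mean has shifted.
rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same_dist = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_stable = population_stability_index(training, same_dist)
psi_drift = population_stability_index(training, shifted)

# Rule of thumb (not a standard): below 0.1 is stable, above 0.25 is worth an alert.
print(f"stable feature PSI:  {psi_stable:.3f}")
print(f"shifted feature PSI: {psi_drift:.3f}")
```

In practice a check like this would run on a schedule against recent production inputs, with an alert raised when the score crosses whatever threshold suits the use case.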

Kaggle Kernel on Interpretability

Machine learning interpretability is key in high-risk use cases. There is a large number of techniques available, each with its own tradeoffs, and it's important that these tradeoffs are understood. This Kaggle Kernel gives a high-level overview of why machine learning interpretability matters, together with hands-on examples of permutation importance, partial dependence plots and SHAP.
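
For readers who have not come across permutation importance before, the sketch below shows the core idea in plain Python: shuffle one feature at a time and measure how much the model's error increases. It is our own toy illustration rather than code from the Kernel; the `model` function is a hypothetical stand-in for a fitted model, and the data is synthetic.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_features, n_repeats=5, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild every row with feature j replaced by a shuffled value.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            score = mean_squared_error(targets, [predict(r) for r in permuted])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic data (assumption): the target depends on feature 0 only.
rng = random.Random(42)
rows = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
targets = [3.0 * x0 for x0, _ in rows]


def model(row):
    # Hypothetical stand-in for a fitted model: uses feature 0, ignores feature 1.
    return 3.0 * row[0]


importances = permutation_importance(model, rows, targets, n_features=2)
print(importances)
```

Shuffling feature 0 breaks the link between inputs and targets, so the error rises sharply; shuffling feature 1 changes nothing, because the model never reads it. One known tradeoff is that correlated features can share or hide importance under this method.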

Building Domain Specific NLP

In this episode of the Data Exchange, Chief Scientist Ben Lorica speaks with David Talby, co-creator of Spark NLP, an open source, highly scalable, production-grade natural language processing (NLP) library. Spark NLP has become one of the more popular NLP libraries and is available on PyPI, Conda, Maven, and Spark Packages. With recent advances in research on large-scale natural language models, there is strong interest in domain-specific natural language applications, and in this podcast they dive into some of these.

Bayesian Product Ranking at Wayfair

Wayfair has a huge catalog of over 14 million items spanning very broad categories. The sheer size of the catalog, however, makes it hard for customers to find the perfect item among all of the possible options. In this post Wayfair introduces its new Bayesian system, which was developed to (1) identify the most appealing products and (2) present them to customers.
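
As a generic illustration of why a Bayesian approach helps with ranking (and explicitly not a description of Wayfair's system, whose details are in the post), the sketch below ranks products by the posterior mean of an order rate under a Beta prior. The product names, counts and prior parameters are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    orders: int
    impressions: int


def posterior_mean(product, prior_alpha=1.0, prior_beta=19.0):
    """Mean of the Beta posterior for the order rate.

    The Beta(1, 19) prior (assumption) encodes a belief that a typical
    product converts around 5% of the time, worth about 20 impressions.
    """
    return (prior_alpha + product.orders) / (
        prior_alpha + prior_beta + product.impressions
    )


# Hypothetical catalog: one item with almost no data, two with plenty.
catalog = [
    Product("new_lamp", orders=1, impressions=2),           # raw rate 0.50
    Product("popular_sofa", orders=120, impressions=1000),  # raw rate 0.12
    Product("plain_rug", orders=30, impressions=1000),      # raw rate 0.03
]

ranked = sorted(catalog, key=posterior_mean, reverse=True)
ranked_names = [p.name for p in ranked]
print(ranked_names)
```

Ranking by raw rate would put the lamp first on the strength of a single order. The prior pulls low-evidence items towards the catalog average, so the sofa, which has a long track record, ranks above it.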

OSS: ETL & Batch Processing

The topic for this week's featured production machine learning libraries is ETL and Batch Processing. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to open a PR. The four featured libraries this week are:

Apache Airflow - Data Pipeline framework built in Python, including scheduler, DAG definition and a UI for visualisation
Argo Workflows - Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition).
Luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs, handling dependency resolution, workflow management, visualisation, etc.
Genie - Job orchestration engine to interface and trigger the execution of jobs from Hadoop-based systems

If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to add a pull request!
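
The dependency resolution that the pipeline tools above take care of can be sketched with nothing more than the Python standard library. The example below is not the API of Airflow, Luigi or any other listed library; the task names are hypothetical and `graphlib` (Python 3.9+) simply stands in for a scheduler deciding a valid run order.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical names).
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "features": {"clean"},
    "train": {"features"},
    "report": {"clean", "train"},
}

run_log = []


def run(task):
    # Placeholder for real work; here we only record the execution order.
    run_log.append(task)


# static_order() yields every task only after all of its dependencies.
for task in TopologicalSorter(dependencies).static_order():
    run(task)

print(run_log)
```

Real workflow engines add scheduling, retries, parallel execution and a UI on top, but the underlying model is the same directed acyclic graph of tasks.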

OSS: Awesome AI Guidelines

As AI systems become more prevalent in society, we face bigger and tougher societal challenges. We have seen a large number of resources that aim to tackle these challenges in the form of AI Guidelines, Principles, Ethics Frameworks, etc. However, there are so many resources that the space is hard to navigate. Because of this we started an Open Source initiative that aims to map the ecosystem and make it simpler to navigate. Every week we will be showcasing a few resources from our list so you can check them out. This week's resources are:

IEEE's Ethically Aligned Design - A Vision for Prioritizing Human Wellbeing with Artificial Intelligence and Autonomous Systems that encourages technologists to prioritize ethical considerations in the creation of autonomous and intelligent technologies.
Montréal Declaration for a responsible development of artificial intelligence - Ethical principles and values that promote the fundamental interests of people and groups, created as an initiative by Université de Montréal.
PwC's Responsible AI - PwC has put together a survey and a set of principles that abstract some of the key areas they've identified for responsible AI.
Singapore Personal Data Protection Commission's AI Governance Principles - The Singapore government's Personal Data Protection Commission has put together a set of guiding principles on data protection and human involvement in automated systems, accompanied by a report that breaks down the guiding principles and their motivations.

If you know of any guidelines that are not in the "Awesome AI Guidelines" list, please do give us a heads up or feel free to add a pull request!