The Institute for Ethical AI & Machine Learning

THE ML ENGINEER 🤖

Issue #58

This week in Issue #58:

Feature Stores for Machine Learning
Key AI & Data Trends for 2020
LF AI 2019 Year in Review
From local interpretability to global understanding
Sampling methods for imbalanced classes
Featured OSS Production ML Libraries
Awesome AI Guidelines to check out this week
+ more 🚀

Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!

If you would like to suggest articles, ideas, papers, libraries, jobs, events or provide feedback just hit reply or send us an email to a@ethical.institute! We have received a lot of great suggestions in the past, thank you very much for everyone's support!

Feature Stores for ML

Duplicated work in data science grows as projects and teams scale. Feature stores are now seen as a core part of the solution for re-usability, but there is still a lot of ambiguity around their definition, architecture and best practices. This site contains an excellent list of resources that map a large part of the ecosystem to drive the conversation forward, including videos, articles and beyond.

Key AI & Data Trends for 2020

The Data Exchange podcast comes back this week with an excellent deep dive into the key AI, Machine Learning and data trends for 2020. This episode covers types of machine learning, real-life applications, infrastructure/tools, and other topics such as managing risks and trends to watch.

LF AI 2019 Year in Review

It has been an incredible journey at the LF AI since we became an organisational member, and we could not be more excited about the great leaps it has achieved, and more importantly what it has yet to achieve. This great post offers insight into some of the achievements and updates from 2019. Massive shoutout especially to the core team for their great work driving this forward. Here is to yet another great year in 2020.

From local to global XAI

Tree-based models have seen a steady increase in adoption in production use-cases, and with that adoption has also come demand for compliance and reduction of operational risks. This paper proposes a solution that improves the interpretability of tree-based models through three main contributions. Contributions like this are what further the area of interpretability in machine learning.

Sampling methods for imbalances

Machine learning techniques often fail or give misleadingly optimistic performance on classification datasets with an imbalanced class distribution. The reason is that many machine learning algorithms are designed to operate on classification data with an equal number of observations for each class. When this is not the case, algorithms can learn that the few minority-class examples are not important and can be ignored in order to achieve good performance. In this article, Machine Learning Mastery dives into a set of practical sampling methods that can be used when facing imbalanced datasets.
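
To make the idea concrete, here is a minimal sketch of one common sampling method, random oversampling, which duplicates randomly chosen minority-class rows until every class is as large as the biggest one. It is our own illustration using only the Python standard library, not code from the article; the function name random_oversample and the toy data are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(features, labels):
        by_class[label].append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(rows) for rows in by_class.values())

    out_features, out_labels = [], []
    for label, rows in by_class.items():
        # Sample with replacement to fill the gap for this class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            out_features.append(row)
            out_labels.append(label)
    return out_features, out_labels


# Toy dataset: 8 examples of class 0 and 2 examples of class 1.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have 8 rows
```

Resampling like this is normally applied to the training split only, so that evaluation still reflects the original class distribution.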

OSS: Feature Engineering

We're excited to add a new section to our Production ML Libraries which focuses on Feature Stores. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to open a PR. The four featured libraries this week are:

Hopsworks Feature Store - Offline/Online Feature Store for Machine Learning.
Feature Store for Machine Learning (FEAST) - Feast (Feature Store) is a tool for managing and serving machine learning features. Feast is the bridge between models and data.
Veri - Veri is a Feature Label Store. A Feature Label Store allows storing features as keys and labels as values. Querying values is only possible with kNN using features. Veri also supports creating sub sample spaces of data by default.
Ivory - Ivory defines a specification for how to store feature data and provides a set of tools for querying it. It does not provide any tooling for producing feature data in the first place. All Ivory commands run as MapReduce jobs, so it is assumed that feature data is maintained on HDFS.

If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to add a pull request!

OSS: Awesome AI Guidelines

As AI systems become more prevalent in society, we face bigger and tougher societal challenges. We have seen a large number of resources that aim to tackle these challenges in the form of AI Guidelines, Principles, Ethics Frameworks, etc. However, there are so many resources that they are hard to navigate. Because of this we started an Open Source initiative that aims to map the ecosystem to make it simpler to navigate. Every week we showcase three resources from our list for you to check out. This week's resources are:

UK Government's Data Ethics Workbook - A resource put together by the Department for Digital, Culture, Media and Sport (DCMS) which provides a set of questions that practitioners in the public sector can ask, addressing each of the principles in their Data Ethics Framework.
World Economic Forum's Guidelines for Procurement - The WEF has put together a set of guidelines for governments to be able to safely and reliably procure machine learning related systems, which has been trialled with the UK government.
San Francisco City's Ethics & Algorithms Toolkit - A risk management framework for government leaders and staff who work with algorithms, providing a two-part assessment process: an algorithmic assessment process, and a process to address the risks.

If you know of any guidelines that are not in the "Awesome AI Guidelines" list, please do give us a heads up or feel free to add a pull request!