Subscribe to the Machine Learning Engineer Newsletter

Receive curated articles, tutorials and blog posts from experienced Machine Learning professionals.

Issue #58
This week in Issue #58:
Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!
If you would like to suggest articles, ideas, papers, libraries, jobs, events or provide feedback, just hit reply or send us an email! We have received a lot of great suggestions in the past — thank you to everyone for your support!
Duplicated work in data science grows as projects and teams scale. Feature stores are now seen as a core part of the solution for re-usability; however, there is still a lot of ambiguity around their definition, architecture and best practices. This site contains an excellent list of resources — including videos, articles and beyond — that maps a large part of the ecosystem to drive the conversation forward.
The Data Exchange podcast comes back this week with an excellent deep dive into the key AI, machine learning and data trends for 2020. In this episode they cover types of machine learning, real-life applications, infrastructure and tools, and other topics such as managing risks and trends to watch.
It has been an incredible journey at the LF AI since we became an organisational member, and we could not be more excited about the great leaps it has achieved — and, more importantly, what it has yet to achieve. This great post provides insight into some of the achievements and updates from 2019. A massive shoutout to the core team for their great work driving this forward; here's to another great year in 2020.
Tree-based models have seen a steady increase in adoption in production use-cases, and with that adoption has come demand for compliance and reduction of operational risk. This paper proposes a solution that improves the interpretability of tree-based models through three main contributions. Contributions like this further the area of interpretability in machine learning.
Machine learning techniques often fail, or give misleadingly optimistic performance, on classification datasets with an imbalanced class distribution. The reason is that many machine learning algorithms are designed to operate on classification data with an equal number of observations for each class. When this is not the case, an algorithm can learn that the rare class contributes little to its objective and effectively ignore it while still achieving apparently good performance. In this article, Machine Learning Mastery dives into a set of practical sampling methods that can be used when facing imbalanced datasets.
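As a quick illustration of the kind of technique the article covers, here is a minimal sketch of random oversampling in plain Python — naively duplicating minority-class samples until all classes are balanced. The function name and toy data are our own for illustration, not taken from the article, which also covers more sophisticated methods such as SMOTE.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Naively duplicate minority-class samples until every class
    has as many observations as the majority class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of all samples belonging to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Duplicate randomly chosen samples until the class reaches `target`
        for _ in range(target - count):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out

# Toy imbalanced dataset: 6 negatives, 2 positives
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [5.0], [5.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have 6 samples
```

Note that oversampling should only ever be applied to the training split — duplicating samples before the train/test split leaks data into evaluation.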
OSS: Feature Engineering
We're excited to add a new section to our Production ML Libraries which focuses on Feature Stores. We are currently looking for more libraries to add — if you know of any that are not listed, please let us know or feel free to open a PR. The four featured libraries this week are:
  • Hopsworks Feature Store - Offline/Online Feature Store for Machine Learning.
  • Feature Store for Machine Learning (FEAST) - Feast (Feature Store) is a tool for managing and serving machine learning features. Feast is the bridge between models and data.
  • Veri - Veri is a feature label store, which stores features as keys and labels as values. Querying values is only possible via k-NN search on the features. Veri also supports creating sub-sample spaces of the data by default.
  • Ivory - Ivory defines a specification for how to store feature data and provides a set of tools for querying it. It does not provide any tooling for producing feature data in the first place. All Ivory commands run as MapReduce jobs, so it is assumed that feature data is maintained on HDFS.
If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to open a pull request.
As AI systems become more prevalent in society, we face bigger and tougher societal challenges. We have seen a large number of resources that aim to tackle these challenges in the form of AI guidelines, principles, ethics frameworks, etc.; however, there are so many that they are hard to navigate. Because of this we started an open-source initiative that aims to map the ecosystem to make it simpler to navigate. Each week we showcase three resources from our list so we can check them out. This week's resources are:
If you know of any guidelines that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to open a pull request.
© 2018 The Institute for Ethical AI & Machine Learning