Subscribe to the Machine Learning Engineer Newsletter

Receive curated articles, tutorials and blog posts from experienced Machine Learning professionals.

Issue #56
This week in Issue #56:
If you would like to suggest articles, ideas, papers, libraries, jobs or events, or provide feedback, just hit reply or send us an email to! We have received a lot of great suggestions in the past; thank you all very much for your support!
Chief Data Scientist Ben Lorica comes back with another great episode of The Data Exchange Podcast, in conversation with Rajat Monga, one of the founding members of the TensorFlow engineering team. Until recently, Rajat was the engineering manager for TensorFlow at Google. In this episode they dive into TFX, a production-scale ML platform based on TensorFlow, and talk about Multi-Level Intermediate Representation (MLIR), deep learning and the state of machine learning infrastructure.
Re-work has put together its Women in AI list of the year, which highlights individuals who have spearheaded or taken part in great research in 2019 and deserve recognition for it.
People give massive amounts of their personal data to companies every day, and this data is used to generate tremendous business value. Some economists and politicians argue that, given the value of their data, there are situations in which people should be paid for it. Furthermore, for organisations holding data, that data carries both a value and a risk, and both are currently hard to quantify. This article discusses methods proposed in Berkeley papers that attempt to answer this question in the ML context.
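One approach explored in this research area is to value each training point by its Shapley value: its marginal contribution to a model's utility, averaged over every order in which the data could be added. The sketch below computes exact Shapley values by brute-force enumeration of orderings. It assumes a toy, hypothetical utility (the fraction of classes covered by a subset); in practice the utility would be something like validation accuracy of a model trained on that subset, and since exact computation grows factorially with the number of points, real systems rely on approximations.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    points: Sequence[str], utility: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley value of each data point (brute force over all orderings)."""
    values = {p: 0.0 for p in points}
    for order in permutations(points):
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            # Marginal contribution of p when added to the points before it.
            with_p = coalition | {p}
            values[p] += utility(with_p) - utility(coalition)
            coalition = with_p
    n_orders = factorial(len(points))
    return {p: v / n_orders for p, v in values.items()}


# Hypothetical toy dataset: two "cat" points and a single "dog" point.
LABELS = {"a": "cat", "b": "cat", "c": "dog"}


def coverage_utility(subset: FrozenSet[str]) -> float:
    # Hypothetical utility: fraction of the two classes represented in the subset.
    return len({LABELS[p] for p in subset}) / 2


vals = shapley_values(["a", "b", "c"], coverage_utility)
print(vals)  # {'a': 0.25, 'b': 0.25, 'c': 0.5}
```

The unique "dog" point is worth twice as much as either redundant "cat" point, and the values sum to the utility of the full dataset, which is the property that makes Shapley values attractive for splitting value among data contributors.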
The discussion of ethics in AI has become more critical as more applications that affect the real world make their way into production. We're organising a London meetup on January 24th covering an introduction to AI, where HATLAB Deputy Director James Kingston will help us get our bearings with a survey of the AI ethics landscape, followed by an open discussion. Come join us!
Most machine learning models are trained using data from files. Logical Clocks Co-Founder James Dowling has put together this guide to the popular file formats used by open source machine learning frameworks in Python, including TensorFlow/Keras, PyTorch, Scikit-Learn and PySpark. The post also describes how a Feature Store can make the Data Scientist’s life easier by generating training/test data in a file format of choice on a file system of choice.
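As a small illustration of why the choice of file format matters, the sketch below round-trips feature rows through CSV using only Python's standard library. CSV stores no schema, so every value comes back as a string and the reader must re-apply the types itself; formats that embed a schema (Parquet, for example) avoid this step. The feature names and the schema dictionary are hypothetical.

```python
import csv
import io

# Hypothetical feature rows: an int, a float and a bool per example.
rows = [
    {"user_id": 1, "avg_spend": 12.5, "is_active": True},
    {"user_id": 2, "avg_spend": 3.0, "is_active": False},
]

# Write the rows to an in-memory CSV "file".
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["user_id", "avg_spend", "is_active"])
writer.writeheader()
writer.writerows(rows)

# Read them back: CSV carries no schema, so every value is a string.
buf.seek(0)
loaded = list(csv.DictReader(buf))
print(loaded[0])  # {'user_id': '1', 'avg_spend': '12.5', 'is_active': 'True'}

# The reader has to know the schema and re-apply it explicitly.
SCHEMA = {"user_id": int, "avg_spend": float, "is_active": lambda s: s == "True"}
typed = [{k: SCHEMA[k](v) for k, v in r.items()} for r in loaded]
assert typed == rows
```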
OSS: Feature Engineering
We're excited to add a new section to our Production ML Libraries, focused on Feature Stores. We are currently looking for more libraries to add: if you know of any that are not listed, please let us know or feel free to open a PR. The four libraries featured this week are:
  • Hopsworks Feature Store - Offline/Online Feature Store for Machine Learning.
  • Feature Store for Machine Learning (FEAST) - Feast (Feature Store) is a tool for managing and serving machine learning features. Feast is the bridge between models and data.
  • Veri - Veri is a Feature Label Store. A Feature Label Store stores features as keys and labels as values, and values can only be queried via k-nearest-neighbour (kNN) search over the features. Veri also supports creating subsample spaces of the data by default.
  • Ivory - Ivory defines a specification for how to store feature data and provides a set of tools for querying it. It does not provide any tooling for producing feature data in the first place. All Ivory commands run as MapReduce jobs, so it is assumed that feature data is kept on HDFS.
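The feature-label store idea described for Veri (features as keys, labels as values, queried only by kNN over the features) can be sketched in a few lines. This is a hypothetical in-memory toy, not Veri's API: the class and method names are made up, distance is Euclidean, and `predict` takes a majority vote over the labels of the k nearest stored feature vectors.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


class ToyFeatureLabelStore:
    """Hypothetical in-memory store: feature vectors are keys, labels are values."""

    def __init__(self) -> None:
        self._items: List[Tuple[Tuple[float, ...], str]] = []

    def insert(self, features: Sequence[float], label: str) -> None:
        self._items.append((tuple(features), label))

    def knn(self, query: Sequence[float], k: int = 3) -> List[str]:
        # No exact-key lookup: labels are only reachable by nearest-neighbour search.
        ranked = sorted(self._items, key=lambda item: math.dist(item[0], query))
        return [label for _, label in ranked[:k]]

    def predict(self, query: Sequence[float], k: int = 3) -> str:
        # Majority vote over the labels of the k nearest stored feature vectors.
        return Counter(self.knn(query, k)).most_common(1)[0][0]


store = ToyFeatureLabelStore()
store.insert([0.0, 0.0], "low")
store.insert([0.1, 0.2], "low")
store.insert([5.0, 5.0], "high")
store.insert([5.2, 4.9], "high")
store.insert([4.8, 5.1], "high")

print(store.knn([0.05, 0.1], k=2))  # ['low', 'low']
print(store.predict([5.0, 5.1], k=3))  # high
```

A linear scan like this is fine for illustration; a real store would use an index structure to keep kNN queries fast as the data grows.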
If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to add a pull request.
As AI systems become more prevalent in society, we face bigger and tougher societal challenges. We have seen a large number of resources that aim to tackle these challenges in the form of AI guidelines, principles, ethics frameworks, etc.; however, there are so many of them that they are hard to navigate. Because of this, we started an open source initiative that aims to map the ecosystem and make it simpler to navigate. Every week we will showcase three resources from our list so you can check them out. This week's resources are:
© 2018 The Institute for Ethical AI & Machine Learning