The Institute for Ethical AI & Machine Learning

Subscribe to the Machine Learning Engineer Newsletter

Receive curated articles, tutorials and blog posts from experienced Machine Learning professionals.

THE ML ENGINEER 🤖

Issue #55

This week in Issue #55:

Machine Learning System Design
Machine Learning Interviews
A Deep Dive into Online Learning
Unsupervised NLU via GPT-2
Open Source Business Models
Featured OSS Production ML Libraries
Awesome AI Guidelines to check out this week
AI conferences
+ more 🚀

Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!

If you would like to suggest articles, ideas, papers, libraries, jobs, events or provide feedback, just hit reply or send us an email at a@ethical.institute! We have received a lot of great suggestions in the past; thank you all very much for your support!

Machine Learning System Design

With the rise of large-scale machine learning applications, it is becoming increasingly critical for practitioners to learn best practices in machine learning system design. This great booklet covers the four main steps of designing machine learning systems: 1) project setup, 2) data pipeline, 3) modeling, and 4) serving. It also contains 27 open-ended machine learning system design questions that might come up in machine learning interviews.

Machine Learning Interviews

As the role of machine learning engineer becomes more prominent in industry, the community is contributing more and more useful content that defines the role, sets out best practices and even gives advice on job interviews. This presentation by Machine Learning Engineer Chip Huyen provides great insight into the role of the MLE, together with advice on how best to approach machine learning interviews.

A Deep Dive into Online Learning

A fantastic resource that provides a very accessible introduction to online learning, in the form of a set of lecture notes from Boston University's “Introduction to Online Learning” course. The first lecture gives an initial insight into the topic, with a strong technical foundation as well as an exercise to put what you have learned into practice.
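To make the online setting concrete: the learner sees one example at a time, commits to a prediction, only then observes the true label, pays a loss, and updates. The snippet below is a minimal sketch of online gradient descent for linear regression with squared loss, one of the basic algorithms in this area; it is our own illustration rather than code from the lecture notes, and the function name, learning rate and toy data are assumptions chosen for the example.

```python
from typing import Iterable, List, Tuple


def online_gradient_descent(
    stream: Iterable[Tuple[List[float], float]],
    dim: int,
    learning_rate: float = 0.1,
) -> Tuple[List[float], List[float]]:
    """Illustrative online linear regression: predict, observe label, pay loss, update.

    Returns the final weights and the squared loss suffered in each round.
    """
    w = [0.0] * dim
    losses: List[float] = []
    for x, y in stream:
        # 1) Predict with the current weights, before the label is revealed.
        y_hat = sum(wi * xi for wi, xi in zip(w, x))
        # 2) The label arrives: record this round's squared loss.
        err = y_hat - y
        losses.append(err * err)
        # 3) Gradient step on this single example; d/dw (y_hat - y)^2 = 2 * err * x.
        w = [wi - learning_rate * 2.0 * err * xi for wi, xi in zip(w, x)]
    return w, losses


# Toy stream generated by y = 2*x1 - x2, replayed 100 times (hypothetical data).
stream = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)] * 100
weights, losses = online_gradient_descent(stream, dim=2)
```

Because the update uses only the current example, memory does not grow with the length of the stream; the per-round losses are what regret bounds in online learning reason about.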

Unsupervised NLU via GPT-2

Amazon Applied Scientist Rakesh Chada has put together a great post that showcases the power of GPT-2. OpenAI's GPT-2 is one of the most coherent generative text models out there. While its generation capabilities are impressive, Rakesh finds its ability to perform some Natural Language Understanding (NLU) tasks zero-shot even more fascinating. The post highlights some of those capabilities and takes a deep dive into one fun use case: converting singular English nouns to their plural forms (and vice versa).
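The core trick is to phrase the task as text the language model can simply continue, with no fine-tuning. The sketch below only builds such a prompt and parses a completion; the "singular:/plural:" format, the example pairs and both helper names are hypothetical and may differ from the prompt used in the post, and no model call is included.

```python
from typing import List, Tuple

# Hypothetical example pairs used to prime the model; not taken from the post.
EXAMPLES: List[Tuple[str, str]] = [
    ("dog", "dogs"),
    ("city", "cities"),
    ("child", "children"),
]


def build_plural_prompt(noun: str, examples: List[Tuple[str, str]] = EXAMPLES) -> str:
    """Build a prompt whose most natural continuation is the plural of `noun`."""
    blocks = [f"singular: {s}\nplural: {p}\n" for s, p in examples]
    # Leave the final "plural:" open so the model fills it in.
    blocks.append(f"singular: {noun}\nplural:")
    return "\n".join(blocks)


def extract_answer(prompt: str, generated: str) -> str:
    """Keep the first word of the continuation, with or without the prompt prepended."""
    continuation = generated[len(prompt):] if generated.startswith(prompt) else generated
    words = continuation.strip().split()
    return words[0] if words else ""
```

The resulting string could be passed to any text-generation model, for instance GPT-2 through the Hugging Face `transformers` text-generation pipeline, and the generated text handed to `extract_answer`. Swapping the labels ("plural:" first, then "singular:") would target the reverse direction.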

Open Source Business Models

The open source software (OSS) movement has created some of our most important and widely used technologies, including operating systems, web browsers, databases and (of course) machine learning frameworks. Our world would not function, or at least not function as well, without open source software. In this podcast, Peter Levine draws on his experience with open source as a developer, entrepreneur and investor to discuss business models for open source projects.

OSS: Data Stream Processing

The theme for this week's featured ML libraries is Data Stream Processing. The four featured libraries this week are:

Apache Flink - Open source stream processing framework with powerful stream and batch processing capabilities.
Faust - Stream processing library built on top of Python's asyncio, using an async Kafka client and inspired by the Kafka Streams library.
Kafka Streams - Kafka client library for building applications and microservices where the input and output are stored in Kafka clusters.
Spark Streaming - Micro-batch stream processing using the Apache Spark framework as a backend, supporting stateful exactly-once semantics.
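Stateful windowed aggregation is one of the core operations these frameworks provide. The plain-Python sketch below illustrates the idea with a tumbling (fixed-size, non-overlapping) window count per key; it is not the API of Flink, Faust, Kafka Streams or Spark Streaming, and it assumes events arrive in timestamp order, whereas real engines also handle late data, watermarks, fault-tolerant state and distribution. The function name and event format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

Event = Tuple[float, str]  # hypothetical format: (event timestamp in seconds, key)


def tumbling_window_counts(
    events: Iterable[Event], window_size: float
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Count events per key in fixed, non-overlapping time windows.

    Assumes in-order events: a window is emitted as soon as an event from a
    later window arrives, and the final window is emitted when the stream ends.
    """
    current_start: Optional[float] = None
    counts: Dict[str, int] = defaultdict(int)
    for ts, key in events:
        # Align the timestamp to the start of its window.
        start = (ts // window_size) * window_size
        if current_start is not None and start != current_start:
            # The previous window is complete: emit it and reset the state.
            yield current_start, dict(counts)
            counts = defaultdict(int)
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)


# Usage with a small hypothetical stream and 2-second windows.
events = [(0.5, "a"), (1.2, "b"), (3.9, "a"), (4.0, "a"), (9.1, "b")]
for window_start, window_counts in tumbling_window_counts(events, 2.0):
    print(window_start, window_counts)
```

Keeping only the current window's counts in memory is what makes the aggregation "stateful" yet bounded; the frameworks above make that state durable and partition it across machines.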

If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to add a pull request!

OSS: Awesome AI Guidelines

As AI systems become more prevalent in society, we face bigger and tougher societal challenges. We have seen a large number of resources that aim to tackle these challenges in the form of AI guidelines, principles, ethics frameworks, etc.; however, there are so many resources that it is hard to navigate them. Because of this, we started an open source initiative that aims to map the ecosystem and make it simpler to navigate. Every week we showcase three resources from our list so you can check them out. This week's resources are:

UK Government's Data Ethics Workbook - A resource put together by the Department for Digital, Culture, Media and Sport (DCMS) that provides a set of questions practitioners in the public sector can ask, addressing each of the principles in its Data Ethics Framework.
World Economic Forum's Guidelines for Procurement - The WEF has put together a set of guidelines to help governments safely and reliably procure machine learning related systems; the guidelines have been trialled with the UK government.
San Francisco City's Ethics & Algorithms Toolkit - A risk management framework for government leaders and staff who work with algorithms, providing a two-part process: an assessment of the algorithm, followed by a process to address the identified risks.

If you know of any guidelines that are not in the "Awesome AI Guidelines" list, please do give us a heads up or feel free to add a pull request!