The Institute for Ethical AI & Machine Learning

THE ML ENGINEER 🤖

Issue #59

This week in Issue #59:

Table Detection, Information Extraction with Deep Learning
Google's general AI conversational agent
TF Encrypted and the state of privacy-preserving ML
Airbnb's new Distributed Job Queueing System
Confidence models in financial research & practice
Featured OSS Production ML Libraries
Awesome AI Guidelines to check out this week
+ more 🚀

Forward the email, or share the online version on 🐦 Twitter, 💼 Linkedin and 📕 Facebook!

If you would like to suggest articles, ideas, papers, libraries, jobs, events or provide feedback just hit reply or send us an email to a@ethical.institute! We have received a lot of great suggestions in the past, thank you very much for everyone's support!

Table Detection & NLP with DL

The amount of data being collected is increasing drastically day by day, with applications, tools and online platforms booming in the present technological era. To handle and access this enormous volume of data productively, it's necessary to develop effective information extraction tools. One of the sub-areas demanding attention in the Information Extraction field is fetching and accessing data held in tabular form. Table Extraction (TE) is the task of detecting and decomposing table information in a document. In this article they cover the motivations, techniques and solutions for how this can be achieved.

Towards general conv. agent

Current open-domain chatbots have a critical flaw: they often don't make sense. They sometimes contradict themselves, and they lack common sense and basic knowledge of the world. In this research, they present Meena, a 2.6 billion parameter end-to-end trained neural conversational model which can conduct conversations that are more sensible and specific than existing state-of-the-art chatbots. These improvements are reflected in a human evaluation metric proposed for open-domain chatbots called Sensibleness and Specificity Average (SSA).
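
As a rough illustration of how an SSA-style score could be computed, the sketch below assumes each chatbot response has been labelled by human raters as sensible or not and specific or not, and takes SSA as the simple average of the two rates. The `Label` type, the `ssa` function and the sample labels are hypothetical and are not taken from the research.

```python
from dataclasses import dataclass


@dataclass
class Label:
    """Hypothetical human judgement for a single chatbot response."""
    sensible: bool
    specific: bool


def ssa(labels):
    """Average the sensibleness rate and the specificity rate."""
    if not labels:
        raise ValueError("need at least one labelled response")
    n = len(labels)
    sensibleness = sum(label.sensible for label in labels) / n
    specificity = sum(label.specific for label in labels) / n
    return (sensibleness + specificity) / 2


# Made-up labels: 3 of 4 responses are sensible, 2 of 4 are specific.
labels = [
    Label(sensible=True, specific=True),
    Label(sensible=True, specific=False),
    Label(sensible=False, specific=False),
    Label(sensible=True, specific=True),
]
score = ssa(labels)  # (0.75 + 0.5) / 2 = 0.625
```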

State of privacy preserving ML

Ben Lorica comes back this week with another great episode of the Data Exchange podcast, where he talks with Morten Dahl, research scientist at Dropout Labs, a startup building a platform and tools for privacy-preserving machine learning (and the person behind TF Encrypted). In this conversation they dive into the current state of TF Encrypted, federated learning (FL) and secure aggregation for FL, privacy-preserving ML solutions, differential privacy, homomorphic encryption, and RISELab's stack for coopetitive learning (MC2).

Distributed Delayed Job Queueing

Asynchronous background jobs can often dramatically improve the scalability of web applications by moving time-consuming, resource-intensive tasks to the background. These tasks are often prone to failures, and retrying mechanisms often make it even more expensive to operate applications with such jobs. Having a background queue helps the web servers handle incoming web requests promptly, and reduces the likelihood of performance issues that occur when requests become backlogged. At Airbnb, they built a job scheduling system called Dynein for very critical use cases. In this article, they walk through the history of job queuing systems at Airbnb, explain why they built Dynein, and describe how they were able to achieve its high scalability.
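
To make the idea of delayed background jobs with retries concrete, here is a minimal in-memory sketch. It is not how Dynein works; the `DelayedJobQueue` class, its parameters and the `flaky` job are hypothetical, and a production system would need persistence and distribution across machines.

```python
import heapq
import itertools


class DelayedJobQueue:
    """Toy delayed job queue: jobs run once their scheduled time has passed."""

    def __init__(self, max_attempts=3, retry_delay=5.0):
        self._heap = []                    # entries: (run_at, seq, attempt, job)
        self._counter = itertools.count()  # tie-breaker so jobs are never compared
        self.max_attempts = max_attempts
        self.retry_delay = retry_delay

    def schedule(self, run_at, job, attempt=1):
        heapq.heappush(self._heap, (run_at, next(self._counter), attempt, job))

    def run_due(self, now):
        """Run every job due at `now`; reschedule failures until attempts run out."""
        completed = []
        while self._heap and self._heap[0][0] <= now:
            _, _, attempt, job = heapq.heappop(self._heap)
            try:
                completed.append(job())
            except Exception:
                if attempt < self.max_attempts:
                    self.schedule(now + self.retry_delay, job, attempt + 1)
        return completed


calls = {"n": 0}


def flaky():
    """Hypothetical job that fails on its first call and succeeds afterwards."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")
    return "done"


queue = DelayedJobQueue()
queue.schedule(run_at=10.0, job=flaky)
first = queue.run_due(now=10.0)   # job fails and is rescheduled for 15.0
second = queue.run_due(now=15.0)  # retry succeeds
```

The web request only has to call `schedule`, which is cheap; the expensive work and its retries happen later when a worker calls `run_due`.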

Applying confidence models

Financial models are at the mercy of errors in model specifications, errors in model parameter estimates, and errors resulting from the failure of a model to adapt to structural changes in its environment. Because of this trifecta of errors, it's important for dynamic models to quantify the uncertainty inherent in financial estimates and predictions. In this post they explore three types of errors in applying confidence intervals that are common in financial research and practice.
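
For readers less familiar with confidence intervals, the sketch below shows one common way to compute one for a mean return, using a normal approximation. It is a generic illustration rather than the method from the post; the function name and the sample returns are made up, and with so few observations a t-distribution would usually be the more appropriate choice.

```python
import math
import statistics


def mean_confidence_interval(returns, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `returns`."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(returns)
    std_err = statistics.stdev(returns) / math.sqrt(n)
    # Two-sided critical value, roughly 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Made-up monthly returns, for illustration only.
returns = [0.012, -0.004, 0.021, 0.007, -0.015, 0.009, 0.003, 0.018]
low, high = mean_confidence_interval(returns)
```

The interval is only as trustworthy as its assumptions: if returns are not independent or the underlying process changes over time, the reported range can understate the real uncertainty.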

OSS: ETL & Batch Processing

The topic for this week's featured production machine learning libraries is ETL and Batch Processing. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to add a PR. The four featured libraries this week are:

Apache Airflow - Data Pipeline framework built in Python, including scheduler, DAG definition and a UI for visualisation
Argo Workflows - Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition).
Luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs, handling dependency resolution, workflow management, visualisation, etc
Genie - Job orchestration engine to interface and trigger the execution of jobs from Hadoop-based systems

If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to add a pull request!

OSS: Awesome AI Guidelines

As AI systems become more prevalent in society, we face bigger and tougher societal challenges. We have seen a large number of resources that aim to tackle these challenges in the form of AI Guidelines, Principles, Ethics Frameworks, etc. However, there are so many resources that the landscape is hard to navigate. Because of this we started an Open Source initiative that aims to map the ecosystem and make it simpler to navigate. Each week we showcase four resources from our list so you can check them out. This week's resources are:

IEEE's Ethically Aligned Design - A Vision for Prioritizing Human Wellbeing with Artificial Intelligence and Autonomous Systems that encourages technologists to prioritize ethical considerations in the creation of autonomous and intelligent technologies.
Montréal Declaration for a responsible development of artificial intelligence - Ethical principles and values that promote the fundamental interests of people and groups, created as an initiative by Université de Montréal.
PWC's Responsible AI - PWC has put together a survey and a set of principles that abstract some of the key areas they've identified for responsible AI.
Singapore Personal Data Protection Commission's AI Governance Principles - The Singapore government's Personal Data Protection Commission has put together a set of guiding principles on data protection and human involvement in automated systems, along with a report that breaks down the guiding principles and motivations.

If you know of any guidelines that are not in the "Awesome AI Guidelines" list, please do give us a heads up or feel free to add a pull request!

You received this email because you are registered with The Institute for Ethical AI & Machine Learning's newsletter "The Machine Learning Engineer"
