Subscribe to the Machine Learning Engineer Newsletter

Receive curated articles, tutorials and blog posts from experienced Machine Learning professionals.

Issue #59
This week in Issue #59:
If you would like to suggest articles, ideas, papers, libraries, jobs or events, or to provide feedback, just hit reply or send us an email! We have received a lot of great suggestions in the past; thank you very much for everyone's support!
The amount of data being collected is increasing drastically day by day, with applications, tools and online platforms booming in the present technological era. To handle and access this volume of data productively, it’s necessary to develop useful information extraction tools. One sub-area of Information Extraction that’s demanding attention is fetching and accessing data from tabular forms. Table Extraction (TE) is the task of detecting and decomposing table information in a document. In this article they cover the motivations, techniques and solutions for achieving this.
Current open-domain chatbots have a critical flaw: they often don’t make sense. They sometimes say things that are inconsistent, or lack common sense and basic knowledge of the world. In this research, they present Meena, a 2.6 billion parameter end-to-end trained neural conversation model that can conduct conversations that are more sensible and specific than existing state-of-the-art chatbots. The improvements are reflected in a human evaluation metric proposed for open-domain chatbots, called Sensibleness and Specificity Average (SSA).
Ben Lorica comes back this week with yet another great episode of the Data Exchange podcast, where he talks with Morten Dahl, research scientist at Dropout Labs, a startup building a platform and tools for privacy-preserving machine learning (and the person behind TF Encrypted). In this conversation they dive into the current state of TF Encrypted, federated learning (FL) and secure aggregation for FL, privacy-preserving ML solutions, differential privacy, homomorphic encryption, and RISELab’s stack for coopetitive learning (MC2).
Asynchronous background jobs can dramatically improve the scalability of web applications by moving time-consuming, resource-intensive tasks to the background. These tasks are often prone to failure, and retry mechanisms can make applications with such jobs even more expensive to operate. Having a background queue helps the web servers handle incoming web requests promptly, and reduces the likelihood of performance issues that occur when requests become backlogged. At Airbnb, they built a job scheduling system called Dynein for very critical use cases. In this article, they walk through the history of job queuing systems at Airbnb, explain why they built Dynein, and describe how they were able to achieve its high scalability.
Financial models are at the mercy of errors in model specification, errors in model parameter estimates, and errors resulting from the failure of a model to adapt to structural changes in its environment. Because of this trifecta of errors, it's important for dynamic models to quantify the uncertainty inherent in financial estimates and predictions. In this post they explore three types of errors in applying confidence intervals that are common in financial research and practice.
The topic for this week's featured production machine learning libraries is ETL and Batch Processing. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to add a PR. The four featured libraries this week are:
  • Apache Airflow - Data Pipeline framework built in Python, including scheduler, DAG definition and a UI for visualisation
  • Argo Workflows - Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition).
  • Luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs, handling dependency resolution, workflow management, visualisation, etc.
  • Genie - Job orchestration engine to interface and trigger the execution of jobs from Hadoop-based systems
If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to add a pull request.
As AI systems become more prevalent in society, we face bigger and tougher societal challenges. We have seen a large number of resources that aim to tackle these challenges in the form of AI guidelines, principles, ethics frameworks, etc.; however, there are so many resources that they are hard to navigate. Because of this we started an open source initiative that aims to map the ecosystem to make it simpler to navigate. We will be showcasing three resources from our list every week so you can check them out. This week's resources are:
If you know of any guidelines that are not in the "Awesome AI Guidelines" list, please do give us a heads up or feel free to add a pull request.
You received this email because you are registered with The Institute for Ethical AI & Machine Learning's newsletter "The Machine Learning Engineer"
© 2018 The Institute for Ethical AI & Machine Learning