
THE ML ENGINEER 🤖
Issue #53
 
 
This week in Issue #53:
 
 
 
If you would like to suggest articles, ideas, papers, libraries, jobs or events, or to provide feedback, just hit reply or send us an email at a@ethical.institute! We have received a lot of great suggestions in the past, so thank you very much for everyone's support!
 
 
 
Chief Data Scientist Ben Lorica is back with another fantastic episode of the Data Exchange podcast, this time in conversation with Reza Zadeh on large-scale and real-time computer vision use cases, adversarial attacks, deepfakes, fairness, privacy, and security. In this episode they dive into: 1) Challenges in building large-scale, real-time computer vision applications. 2) Robustness of computer vision applications (adversarial attacks, deepfakes). 3) Impact of computer vision technologies on society: security, privacy and surveillance.
 
 
 
Ray is an open-source system for scaling Python applications from single machines to large clusters. Dean Wampler, from the newly announced company founded by some of the core Ray team, has put together a great article that builds an intuitive understanding of Ray for distributed data processing.
 
 
 
In production data science use cases, scheduling and managing data processing tasks at scale becomes increasingly complex. Apache Airflow's popularity has skyrocketed since its debut as a key tool for workflow management, and with the growth of cloud native / Kubernetes technologies, Airflow has been able to ride the wave by providing more integrated Kubernetes support. Zulily has put together a great overview of how they extended their Airflow production infrastructure on Kubernetes, together with lessons learned along the way.
 
 
 
It's year end again, and that means it's time for KDnuggets' annual year-end expert analysis and predictions. This year they posed the question: What were the main developments in AI, Data Science, Deep Learning, and Machine Learning in 2019, and what key trends do you expect in 2020? They brought together insights from various renowned experts in the field, which are available in this article.
 
 
 
Machine Learning Mastery has put together a very accessible introduction to "imbalanced classification". This tutorial covers three key points: 1) Imbalanced classification is the problem of classification when there is an unequal distribution of classes in the training dataset. 2) The imbalance in the class distribution may vary, but a severe imbalance is more challenging to model and may require specialized techniques. 3) Many real-world classification problems have an imbalanced class distribution, such as fraud detection, spam detection, and churn prediction.
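
To make the idea of an "unequal distribution of classes" concrete, here is a minimal sketch in Python that measures how imbalanced a set of labels is. It is not taken from the tutorial: the helper names (class_distribution, imbalance_ratio) and the fraud-detection labels are hypothetical and only illustrate the definition.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the dataset as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical fraud-detection labels: 990 legitimate and 10 fraudulent examples.
labels = ["legit"] * 990 + ["fraud"] * 10

dist = class_distribution(labels)
ratio = imbalance_ratio(labels)

print(dist)   # share of each class in the dataset
print(ratio)  # how many majority examples exist per minority example
```

A ratio close to 1 indicates a balanced dataset, while a large ratio like the one in this made-up example is the kind of severe imbalance that may call for specialized techniques.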
 
 
 
 
 
 
The theme for this week's featured ML libraries is Data Science Notebooks. The four featured libraries this week are:
 
  • ML Workspace - All-in-one web IDE for machine learning and data science. Combines Jupyter, VS Code, TensorFlow, and many other tools/libraries into one Docker image.
  • Polynote - Polynote is an experimental polyglot notebook environment. Currently, it supports Scala and Python (with or without Spark), SQL, and Vega.
  • Stencila - Stencila is a platform for creating, collaborating on, and sharing data-driven content that is transparent and reproducible.
  • RMarkdown - The rmarkdown package is a next generation implementation of R Markdown based on Pandoc.
 
If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to open a pull request.
 
 
 
 
As AI systems become more prevalent in society, we face bigger and tougher societal challenges. We have seen a large number of resources that aim to tackle these challenges in the form of AI Guidelines, Principles, Ethics Frameworks, etc. However, there are so many resources that they are hard to navigate. Because of this, we started an Open Source initiative that aims to map the ecosystem to make it simpler to navigate. Every week we showcase a few resources from our list so you can check them out. This week's resources are:
 
  • Oxford's Recommendations for AI Governance - A set of recommendations from Oxford's Future of Humanity Institute which focus on the infrastructure and attributes required for efficient design, development, and research around the ongoing work of building and implementing AI standards.
  • San Francisco City's Ethics & Algorithms Toolkit - A risk management framework for government leaders and staff who work with algorithms, providing a two-part process: an algorithmic assessment, and a process to address the risks.
  • ISO/IEC's Standards for Artificial Intelligence - The ISO's initiative for Artificial Intelligence standards, which includes a large set of subsequent standards ranging across Big Data, AI Terminology, Machine Learning frameworks, etc.
  • Linux Foundation AI Landscape - The official list of tools in the AI landscape curated by the Linux Foundation, which contains well-maintained and widely used tools and frameworks.
 
If you know of any resources that are not in our list, please do give us a heads up or feel free to open a pull request.
 
 
About us
 
The Institute for Ethical AI & Machine Learning is a UK-based research centre that carries out world-class research into responsible machine learning systems.
 
Check out our website
 
© 2018 The Institute for Ethical AI & Machine Learning