

THE ML ENGINEER 🤖
Issue #49
 
 
This week in Issue #49:
 
 
Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!
 
If you would like to suggest articles, ideas, papers, libraries, jobs or events, or to provide feedback, just hit reply or send us an email at a@ethical.institute! We have received many great suggestions in the past; thank you very much for everyone's support!
 
 
 
Machine learning models, and software systems in general, are often monitored with very basic rules, e.g. send an alert if a metric drops below a certain threshold. However, this can flag a lot of false positives, creating so much noise that the alerts end up being ignored (and real issues go uncaught). The Seldon team has released a new open source machine learning library that focuses on outlier, adversarial and concept drift detection, enabling smarter monitoring techniques. The package aims to cover both online and offline detectors for tabular data, text, images and time series. It is built with production machine learning use cases in mind, and has continuously updated integrations with the ML deployment framework Seldon Core.
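To make the contrast concrete, here is a minimal sketch of threshold-based alerting versus a simple statistical outlier rule, using a plain z-score as a stand-in rather than Alibi Detect's own detectors:

```python
import statistics

def threshold_alert(values, limit):
    """Naive rule: alert whenever a value drops below a fixed limit."""
    return [v < limit for v in values]

def zscore_outliers(values, z=2.5):
    """Statistical rule: alert only on values far from the series'
    own mean, measured in standard deviations."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [abs(v - mu) / sigma > z for v in values]

metrics = [100, 98, 101, 97, 99, 102, 40, 100, 98]  # one genuine outlier

# A fixed threshold of 99 fires on almost half the points...
print(sum(threshold_alert(metrics, 99)))  # → 4 (noisy)
# ...while the z-score rule flags only the real anomaly.
print([i for i, flag in enumerate(zscore_outliers(metrics)) if flag])  # → [6]
```

The same fit-then-flag pattern underlies the library's detectors, which replace the z-score with models suited to high-dimensional data such as images and text.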
 
 
 
As machine learning is adopted in more critical use cases, the common phrase that tech startups have used, "move fast and break things", becomes less desirable. In this great article, Seldon Open Source Engineer Ryan Dawson breaks down the AI Governance Dilemma. Ryan provides an introduction to the challenges of ML being deployed in critical use cases, together with the different areas that should be taken into account, including outliers, concept drift, bias, privacy and other risks.
 
 
 
Neural word representations have proven useful in Natural Language Processing (NLP) tasks due to their ability to efficiently model complex semantic and syntactic word relationships. However, most techniques model only one representation per word, despite the fact that a single word can have multiple meanings or "senses". The Explosion AI team, which is also behind spaCy, created a technique called sense2vec, which builds word embeddings in a similar way to word2vec but also takes into account part-of-speech attributes of word tokens, allowing for meaning-aware vectors.
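The underlying idea, keying each vector on the token together with its part-of-speech tag so that different senses of the same surface form stay separate, can be sketched with a toy lookup table (the keys mirror a "token|POS" convention, but the vectors here are made up for illustration, not sense2vec's real data):

```python
# Toy embedding table: the same surface form "duck" gets distinct
# vectors depending on its part-of-speech tag.
embeddings = {
    "duck|NOUN": [0.9, 0.1, 0.0],   # the waterbird
    "duck|VERB": [0.0, 0.2, 0.9],   # to lower one's head
    "goose|NOUN": [0.8, 0.2, 0.1],
}

def sense_vector(token, pos):
    """Look up a meaning-aware vector by its 'token|POS' key."""
    return embeddings[f"{token}|{pos}"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

# The noun sense of "duck" is close to "goose"; the verb sense is not.
print(cosine(sense_vector("duck", "NOUN"), sense_vector("goose", "NOUN")))
print(cosine(sense_vector("duck", "VERB"), sense_vector("goose", "NOUN")))
```

In sense2vec the tagging is done automatically during preprocessing, so the vectors are trained directly on these disambiguated keys.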
 
 
 
Large companies need to monitor various metrics of their applications and services in real time. Microsoft has released a fascinating paper sharing some of their experience developing and maintaining a time-series anomaly detection service that helps customers monitor their time series continuously and alerts them of potential incidents on time. In the paper they introduce the pipeline and algorithm of their anomaly detection service, which is designed to be accurate, efficient and general. The pipeline consists of three major modules: data ingestion, experimentation platform and online compute. The team also proposes a novel algorithm based on Spectral Residual (SR) and Convolutional Neural Networks (CNN).
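As a rough illustration of the Spectral Residual step (a simplified sketch, not Microsoft's implementation), the saliency map is obtained by subtracting a smoothed log-amplitude spectrum from the original one and transforming back to the time domain, which makes sudden changes stand out:

```python
import numpy as np

def spectral_residual(x, window=3):
    """Saliency map via Spectral Residual: remove the smooth part of
    the log-amplitude spectrum, keep the phase, transform back."""
    fft = np.fft.fft(x)
    amp = np.abs(fft)
    log_amp = np.log(amp + 1e-8)
    # A moving average approximates the smooth part of the spectrum.
    kernel = np.ones(window) / window
    avg_log = np.convolve(log_amp, kernel, mode="same")
    residual = log_amp - avg_log
    # Recombine the residual amplitude with the original phase.
    phase = fft / (amp + 1e-8)
    return np.abs(np.fft.ifft(np.exp(residual) * phase))

series = np.array([1.0] * 20 + [8.0] + [1.0] * 20)  # spike at index 20
saliency = spectral_residual(series)
print(int(np.argmax(saliency)))  # the spike dominates the saliency map
```

In the paper's full pipeline, a CNN is then trained on such saliency maps (with injected synthetic anomalies) instead of on the raw series.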
 
 
 
Another great announcement from the LFAI this week - Pyro has released its 1.0 version! Pyro is a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the backend. Pyro enables flexible and expressive deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling. It is developed and maintained by Uber AI and community contributors.
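For readers new to the paradigm, the Bayesian updating that Pyro automates at scale can be illustrated with a hand-rolled grid approximation (plain Python for illustration, not Pyro's API): infer the posterior over a coin's bias after observing 8 heads in 10 flips, starting from a uniform prior.

```python
def binomial_likelihood(theta, heads, flips):
    """P(data | theta), up to the constant binomial coefficient."""
    return theta ** heads * (1 - theta) ** (flips - heads)

grid = [i / 100 for i in range(101)]   # candidate bias values
prior = [1.0] * len(grid)              # uniform prior
unnorm = [p * binomial_likelihood(t, 8, 10) for p, t in zip(prior, grid)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]

posterior_mean = sum(t * p for t, p in zip(grid, posterior))
print(round(posterior_mean, 3))  # close to the analytic (8+1)/(10+2) = 0.75
```

Grid approximation only works in low dimensions; PPLs like Pyro exist precisely because realistic models need scalable inference (e.g. stochastic variational inference) over high-dimensional parameter spaces.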
 
 
 
 
 
 
The theme for this week's featured ML libraries is Adversarial Robustness. The four featured libraries are:
 
  • Alibi Detect - Open source library with algorithms for outlier detection, concept drift and anomaly detection, optimised for massive scale machine learning deployments
  • CleverHans - Library for testing adversarial attacks / defenses, maintained by some of the most prominent names in adversarial ML, namely Ian Goodfellow (ex-Google Brain, now Apple) and Nicolas Papernot (Google Brain). Comes with some nice tutorials.
  • Foolbox - The second biggest adversarial library. Has an even longer list of attacks, but no defenses or evaluation metrics. Geared more towards computer vision. The code is easier to understand / modify than ART's, and it is also better suited to exploring black-box attacks on surrogate models.
  • IBM Adversarial Robustness 360 Toolbox (ART) - at the time of writing this is the most complete off-the-shelf resource for testing adversarial attacks and defenses. It includes a library of 15 attacks, 10 empirical defenses, and some nice evaluation metrics. Neural networks only.
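For intuition about what these libraries automate, here is the simplest attack of the family, the Fast Gradient Sign Method, hand-derived for a toy logistic model rather than a neural network (all weights and numbers are illustrative):

```python
import math

# Toy logistic classifier: p(y=1 | x) = sigmoid(w·x + b).
w = [2.0, -3.0]
b = 0.5

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def predict(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(x, y, eps=0.5):
    """FGSM: step the input in the direction that increases the loss.
    For logistic loss with label y, d(loss)/dx_i = (p - y) * w_i,
    so x_adv = x + eps * sign((p - y) * w)."""
    p = predict(x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * math.copysign(1.0, g) for xi, g in zip(x, grad)]

x = [1.0, 0.0]          # the model predicts class 1 with high confidence
x_adv = fgsm(x, y=1.0)  # small crafted perturbation of the input
print(predict(x), predict(x_adv))  # confidence drops after the attack
```

The featured libraries provide the same recipe for deep networks, where the gradient comes from autodiff rather than a hand-derived formula, plus stronger iterative attacks and the corresponding defenses.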
 
If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to open a pull request
 
 
 
 
As AI systems become more prevalent in society, we face bigger and tougher societal challenges. We have seen a large number of resources that aim to tackle these challenges in the form of AI guidelines, principles, ethics frameworks, etc. However, there are so many resources that they are hard to navigate. Because of this we started an open source initiative that aims to map the ecosystem and make it simpler to navigate. Every week we showcase a few resources from our list so you can check them out. This week's resources are:
 
  • Oxford's Recommendations for AI Governance - A set of recommendations from Oxford's Future of Humanity Institute which focuses on the infrastructure and attributes required for the efficient design, development and research of AI standards.
  • San Francisco City's Ethics & Algorithms Toolkit - A risk management framework for government leaders and staff who work with algorithms, providing a two part assessment process including an algorithmic assessment process, and a process to address the risks.
  • ISO/IEC's Standards for Artificial Intelligence - The ISO's initiative for Artificial Intelligence standards, which include a large set of subsequent standards ranging across Big Data, AI Terminology, Machine Learning frameworks, etc.
  • Linux Foundation AI Landscape - The official list of tools in the AI landscape curated by the Linux Foundation, which contains well maintained and used tools and frameworks.
 
If you know of any resources that are missing from the list, please do give us a heads up or feel free to open a pull request
 
 
 
We feature conferences that have core ML tracks (primarily in Europe for now) to help our community stay up to date with great upcoming events.
 
Technical & Scientific Conferences
 
 
 
  • Data Natives [21/11/2019] - Data conference in Berlin, Germany.
 
  • ODSC Europe [19/11/2019] - The Open Data Science Conference in London, UK.
 
 
 
Business Conferences
 
 
  • Big Data LDN 2019 [13/11/2019] - Conference for strategy and tech on big data in London, UK.
 
 
About us
 
The Institute for Ethical AI & Machine Learning is a UK-based research centre that carries out world-class research into responsible machine learning systems.
 
Check out our website
 
© 2018 The Institute for Ethical AI & Machine Learning