Subscribe to the Machine Learning Engineer Newsletter

Receive curated articles, tutorials and blog posts from experienced Machine Learning professionals.

Issue #41
If you would like to suggest articles, ideas, papers, libraries, jobs or events, or to provide feedback, just hit reply or send us an email. We have received a lot of great suggestions in the past, so thank you very much for everyone's support!
The world of data processing is full of engines with nuanced differences. Google already has numerous data processing engines, including Dremel, Mesa, Photon, F1, PowerDrill and Spanner, so why did they need yet another one? Apparently because they felt they had too many data processing systems and wanted to unify them. Google has released a fascinating new paper introducing their new SQL engine, Procella, which aims to unify serving and analytical data at YouTube. This blog post provides great insight into the architecture, objectives and use cases of the new engine.
GitHub and Weights & Biases have collaborated on a fantastic contribution to the machine learning ecosystem that could bring innovations to the software engineering industry itself and trigger more competitions of this kind. It consists of a large dataset containing 6 million functions, 2 million of them documented, from open source projects on GitHub in 6 languages (Go, Java, JavaScript, PHP, Python and Ruby), with the objective of improving semantic code search. They are also launching the CodeSearchNet challenge, a benchmark that will track and compare models trained on the CodeSearchNet dataset.
The Linux Foundation is leading yet another fantastic initiative: Facebook, Uber, Twitter and Alibaba have joined forces to form the "Presto Foundation" to tackle distributed data processing at scale. This is great news, as neutral governance will enable the members to contribute to the project and tackle some of the bigger challenges in massively distributed data processing.
As systems grow in complexity, debugging them also becomes more complex, as issues can lie not only in code workflows but also in data inconsistencies, network issues, infrastructure problems and beyond. GitLab has put together a fantastic deep dive on how they resolved one of their issues at massive scale, together with a set of lessons they learned from it.
As your systems and teams become larger and more complex, the need not only to experiment efficiently but also to track, share and reproduce experiments becomes more critical. Netflix has put together a great post outlining their approach to rethinking the way they track and manage experiments internally. Traditionally they have used ABlaze, their centralised A/B testing platform, but with their new platform they are now able to perfectly recreate analyses in notebooks.
The theme for this week's featured ML libraries is Privacy Preserving Machine Learning, and we're happy to share brand new libraries in that section. The four featured libraries this week are:
  • Intel Homomorphic Encryption Backend - The Intel HE transformer for nGraph is a Homomorphic Encryption (HE) backend to the Intel nGraph Compiler, Intel's graph compiler for Artificial Neural Networks
  • PySyft - A Python library for secure, private Deep Learning. PySyft decouples private data from model training, using Multi-Party Computation (MPC) within PyTorch
  • Microsoft SEAL - Microsoft SEAL is an easy-to-use open-source (MIT licensed) homomorphic encryption library developed by the Cryptography Research group at Microsoft
  • TensorFlow Privacy - A Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy
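For readers new to this week's theme, here is a minimal sketch of additive secret sharing, one common building block of Multi-Party Computation. It is not taken from PySyft or any of the libraries above: the function names (`share`, `reconstruct`, `add_shared`) and the modulus `PRIME` are assumptions made purely for illustration. It shows how parties can add two private values while each party only ever sees random-looking shares.

```python
import secrets

# Hypothetical modulus for this sketch; all arithmetic is done modulo PRIME.
PRIME = 2**61 - 1


def share(secret, n_parties=3):
    """Split an integer into n_parties shares that sum to it modulo PRIME."""
    # The first n_parties - 1 shares are uniformly random.
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the secret.
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recover the secret by summing all shares modulo PRIME."""
    return sum(shares) % PRIME


def add_shared(a_shares, b_shares):
    """Add two shared values: each party adds its own two shares locally."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]


if __name__ == "__main__":
    a_shares = share(20)
    b_shares = share(22)
    # No party ever sees 20 or 22, yet the sum can be reconstructed.
    print(reconstruct(add_shared(a_shares, b_shares)))
```

Real MPC frameworks layer much more on top of this idea (multiplication, fixed-point encoding for model weights, network communication), so treat the snippet as intuition rather than as how any particular library works.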
If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to open a pull request.
We feature conferences that have core ML tracks (primarily in Europe for now) to help our community stay up to date with great events coming up.
Technical & Scientific Conferences
  • Data Natives [21/11/2019] - Data conference in Berlin, Germany.
  • ODSC Europe [19/11/2019] - The Open Data Science Conference in London, UK.
Business Conferences
  • Big Data LDN 2019 [13/11/2019] - Conference for strategy and tech on big data in London, UK.
We showcase Machine Learning Engineering jobs (primarily in London for now) to help our community stay up to date with great opportunities that come up.
Leadership Opportunities
Mid-level Opportunities
Junior Opportunities
You received this email because you are registered with The Institute for Ethical AI & Machine Learning's newsletter "The Machine Learning Engineer".
© 2018 The Institute for Ethical AI & Machine Learning