The Institute for Ethical AI & Machine Learning

THE ML ENGINEER 🤖

Issue #41

This week in Issue #41:

One data engine to rule them all at Google
Tackling data processing at scale with the Presto Foundation
GitHub releases the ImageNet for Code
Six lessons learned debugging a scaling problem at GitLab
Reimagining Experimentation Analysis at Netflix
Open source differential privacy libraries
AI conferences
ML jobs
+ more 🚀

Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!

If you would like to suggest articles, ideas, papers, libraries, jobs or events, or provide feedback, just hit reply or send us an email at a@ethical.institute! We have received a lot of great suggestions in the past, so thank you very much to everyone for your support!

One data engine to rule them all

The world of data processing is full of engines with nuanced differences. Google already has numerous data processing engines, including Dremel, Mesa, Photon, F1, PowerDrill and Spanner, so why did they need yet another one? Apparently because they felt they had too many data processing systems and wanted to unify them. To that end, Google released a fascinating paper introducing their new SQL engine, Procella, which aims to unify serving and analytical data at YouTube. This blog post offers great insight into the architecture, objectives and use cases of the new engine.

The ImageNet for Code

GitHub and Weights & Biases have collaborated on a fantastic contribution to the machine learning ecosystem, one that could bring new innovations to the software engineering industry itself and trigger more competitions of this kind. It consists of a large dataset containing 6 million functions, 2 million of them documented, from open source projects on GitHub in 6 languages (Go, Java, JavaScript, PHP, Python and Ruby), with the objective of improving semantic code search. They are also launching the CodeSearchNet challenge, a benchmark that will track and compare models trained on the CodeSearchNet dataset.

Tackling data processing at scale

The Linux Foundation is leading yet another fantastic initiative: Facebook, Uber, Twitter and Alibaba have joined forces to form the "Presto Foundation" to tackle distributed data processing at scale. This is great news, as neutral governance will enable the members to contribute to the project and take on some of the bigger challenges in massively distributed data processing.

Wisdom from debugging at scale

As systems grow in complexity, debugging them also becomes more complex, as issues can lie not only in code workflows but also in data inconsistencies, network issues, infrastructure problems and beyond. GitLab has put together a fantastic deep dive on how they resolved one of their issues at massive scale, together with a set of lessons they learned from it.

Netflix reimagining experiments

As your systems and teams become larger and more complex, the need not only to experiment efficiently but also to track, share and reproduce experiments becomes more critical. Netflix has put together a great post outlining their approach to rethinking the way they track and manage experiments internally. Traditionally they have used ABlaze, their centralised A/B testing platform, but their new platform now lets them recreate analyses exactly in notebooks.

OSS: Privacy Preserving ML

The theme for this week's featured ML libraries is Privacy Preserving Machine Learning, and we're happy to share brand new additions to that section. The four featured libraries this week are:

Intel Homomorphic Encryption Backend - The Intel HE transformer for nGraph is a Homomorphic Encryption (HE) backend to the Intel nGraph Compiler, Intel's graph compiler for Artificial Neural Networks

PySyft - A Python library for secure, private Deep Learning. PySyft decouples private data from model training, using Multi-Party Computation (MPC) within PyTorch

Microsoft SEAL - Microsoft SEAL is an easy-to-use open-source (MIT licensed) homomorphic encryption library developed by the Cryptography Research group at Microsoft

TensorFlow Privacy - A Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy
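
For readers new to differential privacy, here is a minimal sketch of the underlying idea using the classic Laplace mechanism on a counting query. Note that this is not how TensorFlow Privacy works (its optimizers add noise to gradients during training) and it uses none of the APIs of the libraries above; the function names `laplace_noise` and `private_count` and the sample data are hypothetical, chosen purely for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution.

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records matching `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so the Laplace mechanism uses scale = 1 / epsilon.
    Smaller epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    ages = [23, 35, 41, 29, 52, 61, 38]  # made-up illustrative data
    print(private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng))
```

In practice you would rely on a vetted library rather than a hand-rolled sampler, since naive floating-point noise generation can itself leak information.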

If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to open a pull request!

MLConf = Conferences & Events

We feature conferences that have core ML tracks (primarily in Europe for now) to help our community stay up to date with great events coming up.

Technical & Scientific Conferences

EurNLP 2019 [11/10/2019] - European NLP Research summit in London, UK.

Data Natives [21/11/2019] - Data conference in Berlin, Germany.

ODSC Europe [19/11/2019] - The Open Data Science Conference in London, UK.

Khipu AI [11/11/2019] - Latin American Meeting in Artificial Intelligence in Montevideo, Uruguay.

Business Conferences

Predictive Analytics World [18/11/2019] - Conference for Business AI in Berlin, Germany.

Big Data LDN 2019 [13/11/2019] - Conference for strategy and tech on big data in London, UK.

MLJobs = Jobs & Careers

We showcase Machine Learning Engineering jobs (primarily in London for now) to help our community stay up to date with great opportunities that come up.

Leadership Opportunities

Algorithmia is hiring for a VP of Engineering in Seattle, USA
Fractal Labs is hiring for a VP of Engineering in London

Mid-level Opportunities

Seldon is hiring for a Senior Machine Learning Engineer in London
Proportunity is hiring for a Senior Machine Learning Engineer in London
Atlas ML is hiring for a Lead NLP Engineer in London
StreetBees is hiring for a Senior Data Scientist in London
Tractable is hiring for a Senior Deep Learning Engineer

Junior Opportunities

Migacore is hiring for a Machine Learning Engineer in London
Babylon Health is hiring for a Machine Learning Engineer in London

You received this email because you are registered with The Institute for Ethical AI & Machine Learning's newsletter "The Machine Learning Engineer"