THE ML ENGINEER 🤖
Issue #15
 
This week in Issue #15:
The NLP of human noises, the challenges of debiasing AI, feature visualisations, the GAN stroke of genius, essential NLP tools & tips, AI comedy generated by humans, privacy preserving ML libraries, upcoming AI conferences, new Machine Learning jobs and more 🚀.
 
Support the ML Engineer!
Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!
 
If you would like to suggest articles, ideas, papers, libraries, jobs or events, or to provide feedback, just hit reply or send us an email at a@ethical.institute! We have received a lot of great suggestions in the past. Thank you very much for everyone's support!
 
 
 
 
There has been a lot of very interesting research in the area of algorithmic bias, and this paper is no exception. It offers great insight into how some of the debiasing methods proposed in previous research only cover up the systematic gender biases identified, as opposed to actually removing them. A highly recommended (and relatively short) read: "Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them".
 
 
A very interesting piece of research that goes beyond words. This paper dives into the noises that humans vocalise during conversations, including all the "hmm"s, "umm"s, "ehem"s and beyond. Even better, the authors have made available an interactive visualisation that shows how these sounds cluster together. Try it out by hovering with your mouse (you may want to use headphones or lower the volume of your laptop).
 
 
This interesting post goes beyond feature importance and introduces the concept of "feature visualisation". As the authors put it, feature visualisation allows us to wear "lenses" to look through the eyes of the network. This is great, as it promises to help us answer the question "What have these networks learned that allows them to classify images so well?" This area could complement metrics such as feature importance and bring explainability to models, allowing domain experts to interpret the reasoning behind complex networks.
 
 
NVIDIA comes back with a very interesting piece of research. They use a deep neural network to convert doodles into photorealistic images. This could be a great tool for graphic designers (and other creative professionals), allowing experts to prototype and experiment faster before diving into their craft. Their video shows how the tool can be used. We may finally be able to keep up with the legendary tutorials of Bob Ross thanks to these types of tools.
 
 
A great post that outlines key fundamentals in NLP. It covers core methods such as tokenisation, bag of words, stop word removal, stemming, lemmatisation and topic modelling. For the curious ones keen to dive deeper, check out the PyTorch tutorial we shared a few weeks ago, which provides examples of Neural Machine Translation, Question-Answering Matching, News Category Classification and Movie Rating Classification.
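To make a few of these fundamentals concrete, here is a minimal sketch in plain Python of tokenisation, stop word removal, stemming and bag of words. It is not taken from the post: the tiny stop word list and the `naive_stem` suffix stripper are illustrative assumptions, standing in for the much larger lists and proper stemmers (such as Porter's) that real NLP libraries provide.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "on"}


def tokenise(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little meaning on their own."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bag_of_words(text):
    """Count stemmed, non-stop-word tokens, ignoring word order."""
    tokens = remove_stop_words(tokenise(text))
    return Counter(naive_stem(t) for t in tokens)


if __name__ == "__main__":
    print(bag_of_words("The cats are chasing the cat and jumped on the mats"))
```

Note how "cats" and "cat" collapse into the same key after stemming, which is exactly why stemming (or lemmatisation) is usually applied before counting.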
 
 
Edinburgh AI NLP researcher Naomi Saphra delivers a comedy sketch covering her research adventures, including funding, sources, projects and beyond. The internet has seen great comedy sketches diving into programming (such as this JavaScript talk), so perhaps comedy in technical fields will become a bigger thing. If anyone in our community knows of any events of this kind, we'd be happy to feature them in our upcoming events section below.
 
 
 
 
We are excited to see the Awesome MLOps list growing to over 300 stars now! Thanks to everyone for your support! This week's edition focuses on new libraries for Privacy Preserving Machine Learning, which fall under our Responsible ML Principle #6. The four featured libraries this week are:
 
  • TensorFlow Privacy - A Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy.
  • TF-Encrypted - A Python library built on top of TensorFlow for researchers and practitioners to experiment with privacy-preserving machine learning.
  • PySyft - A Python library for secure, private Deep Learning. PySyft decouples private data from model training, using Multi-Party Computation (MPC) within PyTorch.
  • Uber SQL Differential Privacy - Uber's open source framework that enforces differential privacy for general-purpose SQL queries.
 
If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to open a pull request.
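For readers new to differential privacy, the idea behind several of the libraries above can be sketched with the classic Laplace mechanism applied to a counting query. This is a minimal illustration in plain Python under stated assumptions: it does not use or mirror the API of any featured library, and the names `laplace_noise` and `private_count` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample zero-mean Laplace noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the values that satisfy the predicate.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace noise with scale
    1 / epsilon gives epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 29, 52, 38]  # made-up example data
    noisy = private_count(ages, lambda age: age > 30, epsilon=0.5)
    print(f"Noisy count of people over 30: {noisy:.2f}")
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon means answers closer to the true count. Production systems also need to track the privacy budget spent across repeated queries, which this sketch leaves out.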
 
 
 
We feature conferences that have core ML tracks (primarily in Europe for now) to help our community stay up to date with great events coming up.
 
Technical Conferences
 
  • DataFest19 [11/03/2019] - Two-week festival of Data Innovation hosted across Scotland, UK.
  • AI Conference Beijing [18/06/2019] - O'Reilly's signature applied AI conference in Asia, held in Beijing, China.
  • Data Natives [21/11/2019] - Data conference in Berlin, Germany.
  • ODSC Europe [19/11/2019] - The Open Data Science Conference in London, UK.
 
 
Business Conferences
 
 
 
  • Big Data LDN 2019 [13/11/2019] - Conference for strategy and tech on big data in London, UK.
 
 
 
We showcase Machine Learning Engineering jobs (primarily in London for now) to help our community stay up to date with great opportunities that come up. It seems that the demand for data scientists continues to rise!
 
Junior Opportunities
 
 
Mid-level Opportunities
 
Leadership Opportunities
 
 
 
© 2018 The Institute for Ethical AI & Machine Learning