THE ML ENGINEER 🤖
Issue #18
 
 
This week in Issue #18:
Reducing bias in bios, data augmentation for images, common statistical tests, AutoML with code gen, Strata San Francisco, Stack Overflow dev survey, AutoML featured libraries, upcoming AI conferences, new Machine Learning jobs and more 🚀.
 
Support the ML Engineer!
Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!
 
If you would like to suggest articles, ideas, papers, libraries, jobs or events, or to provide feedback, just hit reply or send us an email at a@ethical.institute! We have received a lot of great suggestions in the past. Thank you very much for everyone's support!
 
 
 
 
Current methods proposed to reduce bias require access to protected features, which may not always be available or may not be legal to use. This research paper proposes a method for discouraging correlation between the predicted probability of an individual's true occupation and a word embedding of their name. The method leverages the societal biases encoded in word embeddings, eliminating the need for access to protected features. The paper shows great results in reducing bias, and was given the "Best Paper Award" at NAACL-HLT 2019.
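
To make the idea concrete, here is a rough sketch of a penalty that grows with the covariance between the predicted probability of the true occupation and each dimension of a name embedding. The function names, the toy numbers and the weight `lam` are our own assumptions for illustration; this is not the paper's implementation.

```python
import math

def covariance_penalty(true_class_probs, name_embeddings):
    """L2 norm of the covariance between p(true class) and each embedding dimension."""
    n = len(true_class_probs)
    mean_p = sum(true_class_probs) / n
    covariances = []
    for d in range(len(name_embeddings[0])):
        mean_e = sum(e[d] for e in name_embeddings) / n
        covariances.append(
            sum((p - mean_p) * (e[d] - mean_e)
                for p, e in zip(true_class_probs, name_embeddings)) / n
        )
    return math.sqrt(sum(c * c for c in covariances))

def total_loss(true_class_probs, name_embeddings, lam=1.0):
    """Cross-entropy on the true class plus the weighted covariance penalty."""
    cross_entropy = -sum(math.log(p) for p in true_class_probs) / len(true_class_probs)
    return cross_entropy + lam * covariance_penalty(true_class_probs, name_embeddings)

# Toy batch: four predictions and hypothetical 2-dimensional name embeddings.
probs = [0.9, 0.8, 0.3, 0.2]
correlated = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]    # tracks the predictions
uncorrelated = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]  # barely tracks them

penalty_correlated = covariance_penalty(probs, correlated)
penalty_uncorrelated = covariance_penalty(probs, uncorrelated)
```

Confidence that moves with the name embedding is penalised more, so a model trained on this loss is nudged towards predictions that do not depend on the name.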
 
 
Fantastic blog post introducing multiple methods for data augmentation when working with images. Data augmentation is a process that generates more data for training purposes, and it's often used when facing imbalanced datasets, for example. The most common example is upsampling, which simply copies datapoints to increase the number of examples. This article, however, dives deeper, covering various techniques including image shifts, image flips, image rotations, brightness and zoom.
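
For a feel of what these transformations do, here is a minimal sketch in plain Python on a tiny grayscale image stored as a list of rows. The function names are our own and the article itself may use a dedicated image library; the zoom helper assumes an even height and width.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def shift_right(image, pixels, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    width = len(image[0])
    return [([fill] * pixels + row)[:width] for row in image]

def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    return [list(column) for column in zip(*image[::-1])]

def adjust_brightness(image, factor):
    """Scale pixel values by `factor`, clipping to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]

def zoom_in_2x(image):
    """Crop the central half of the image and upscale it with nearest-neighbour."""
    height, width = len(image), len(image[0])
    top, left = height // 4, width // 4
    crop = [row[left:left + width // 2] for row in image[top:top + height // 2]]
    return [[crop[r // 2][c // 2] for c in range(2 * len(crop[0]))]
            for r in range(2 * len(crop))]

image = [
    [0, 50, 100, 150],
    [10, 60, 110, 160],
    [20, 70, 120, 170],
    [30, 80, 130, 180],
]

# One original image becomes five extra training examples.
augmented = [
    flip_horizontal(image),
    shift_right(image, 1),
    rotate_90_clockwise(image),
    adjust_brightness(image, 1.5),
    zoom_in_2x(image),
]
```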
 
 
Teaching teachers to teach stats teachings in a teachable way. This post argues that common statistical tests are overcomplicated when they are taught, and could be made simpler if presented as simple linear models. The article then does just that, providing an introduction to key statistical tests including Pearson/Spearman correlation, one-sample t-test, independent t-test, one-way ANOVA and more! What is better is that it comes with an awesome cheatsheet.
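
Two quick checks of the "tests are linear models" idea, sketched in plain Python with made-up numbers (the helper names are ours, not the article's): the Pearson correlation is the slope of a least-squares line fitted to standardised variables, and regressing on a 0/1 group label gives the difference in group means that an independent t-test examines.

```python
import math
import statistics

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def standardize(values):
    """Rescale values to zero mean and unit sample standard deviation."""
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def pearson_r(x, y):
    """Textbook Pearson correlation coefficient."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Pearson correlation as a slope on standardised variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
_, slope_standardized = ols(standardize(x), standardize(y))
r = pearson_r(x, y)

# Independent t-test as a regression on a 0/1 group label:
# the intercept is the mean of group A, the slope is mean(B) - mean(A).
group_a = [5.0, 6.0, 7.0]
group_b = [8.0, 9.0, 10.0]
labels = [0.0] * len(group_a) + [1.0] * len(group_b)
intercept, mean_difference = ols(labels, group_a + group_b)
```

The t-test then simply asks whether that slope is distinguishable from zero.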
 
 
AutoMLGS, a new contender, enters the field of automated machine learning model search. AutoMLGS is an interesting open source library that serves as a high-level wrapper on top of several open source ML libraries such as TensorFlow and scikit-learn. The difference is that AutoMLGS generates the code required to build the model, which adds a level of interpretability to model generation: it's possible to see every data transformation applied during pre-processing and model creation. If you are interested in AutoML libraries, check out our featured open source AutoML libraries below.
 
 
As always, the Strata Data Conference in San Francisco hosted an incredibly exciting set of practical tech talks. These included "hacking the vote" through ML-led campaigns, cryptography in AI, the impact of social media, chatbot analysis, data warehousing criticisms, enterprise use of AI and beyond. Check out the rest of the highlights.
 
 
The 2019 Stack Overflow developer survey is out, bringing insights from nearly 90,000 developers from all around the world. Some of the key metrics they made available show Python's new position as the fastest-growing programming language, DevOps & SRE being the highest paid roles (yes, higher than data science), developer optimism around the world, productivity blockers and beyond.
 
 
 
 
MLOps = Featured OS Libraries
The theme for this week's featured ML libraries is AutoML frameworks, which fall under our Responsible ML Principle #4. The four featured AutoML libraries this week are:
 
  • auto-sklearn - Framework to automate algorithm and hyperparameter tuning for sklearn
  • TPOT - Automation of sklearn pipeline creation (including feature selection, pre-processor, etc)
  • Colombus - A scalable framework to perform exploratory feature selection implemented in R
  • automl - Automated feature engineering, feature/model selection, hyperparam. optimisation
 
If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to open a pull request.
 
 
 
We feature conferences that have core ML tracks (primarily in Europe for now) to help our community stay up to date with great events coming up.
 
Technical Conferences
 
  • DataFest19 [11/03/2019] - Two-week festival of Data Innovation hosted across Scotland, UK.
 
 
  • AI Conference Beijing [18/06/2019] - O'Reilly's signature applied AI conference in Asia in Beijing, China.
 
 
  • Data Natives [21/11/2019] - Data conference in Berlin, Germany.
 
  • ODSC Europe [19/11/2019] - The Open Data Science Conference in London, UK.
 
 
 
Business Conferences
 
  • World Summit AI Americas [10/04/2019] - Large scale AI summit in Montreal, Canada.
    • Come join our panel on AI Ethics and Tools.
 
 
 
  • Big Data LDN 2019 [13/11/2019] - Conference for strategy and tech on big data in London, UK.
 
 
 
We showcase Machine Learning Engineering jobs (primarily in London for now) to help our community stay up to date with great opportunities that come up. It seems that the demand for data scientists continues to rise!
 
Junior Opportunities
 
 
Mid-level Opportunities
 
Leadership Opportunities
 
 
 
© 2018 The Institute for Ethical AI & Machine Learning