The Institute for Ethical AI & Machine Learning

THE ML ENGINEER 🤖
Issue #18

This week in Issue #18:

Reducing bias in bios, data augmentation for images, common statistical tests, AutoML with code gen, Strata San Francisco, the Stack Overflow developer survey, AutoML featured libraries, upcoming AI conferences, new Machine Learning jobs and more 🚀.

Support the ML Engineer!

Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!

If you would like to suggest articles, ideas, papers, libraries, jobs, events or provide feedback just hit reply or send us an email to a@ethical.institute! We have received a lot of great suggestions in the past, thank you very much for everyone's support!

Reducing Bias in Bios

Most methods proposed so far for reducing bias require access to protected features, which may not always be available or may not be legal to use. This research paper proposes a method that discourages correlation between the predicted probability of an individual's true occupation and a word embedding of their name. The method leverages the societal biases encoded in word embeddings, eliminating the need for access to protected features. The paper shows great results in reducing bias, and was given the "Best Paper Award" at NAACL-HLT 2019.
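To make the idea a bit more concrete, here is a minimal sketch of what a penalty on that correlation could look like. It is not the paper's exact formulation: the function name `covariance_penalty`, the use of the vector norm, and the array shapes are our own assumptions for illustration. In training, a term like this would be added to the usual classification loss with some weight.

```python
import numpy as np

def covariance_penalty(true_class_prob, name_embeddings):
    """Size of the covariance between P(true occupation) and the name embedding.

    true_class_prob: shape (n,), predicted probability of each person's true label.
    name_embeddings: shape (n, d), one word embedding per person's name.
    Both names and shapes are hypothetical, chosen for this sketch.
    """
    p = true_class_prob - true_class_prob.mean()          # centre the probabilities
    e = name_embeddings - name_embeddings.mean(axis=0)    # centre each embedding dimension
    cov = p @ e / len(p)                                  # one covariance per dimension
    return float(np.linalg.norm(cov))                     # collapse to a single number
```

A model whose confidence does not vary with the name embedding gets a penalty close to zero, while a model whose confidence tracks one of the embedding dimensions is penalised.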

Data Augmentation for Images

Fantastic blog post introducing multiple methods for data augmentation when working with images. Data augmentation is a process that generates additional training data, and it's often used when facing imbalanced datasets, for example. The most common example is upsampling, which simply copies datapoints to increase the number of examples. This article, however, dives deeper, covering techniques including image shifts, image flips, image rotations, brightness changes and zoom.
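For a feel of what these transformations do, here is a small NumPy-only sketch. It is not the code from the blog post: the function names are our own, images are assumed to be `uint8` arrays of shape (height, width) or (height, width, channels), rotation is limited to 90-degree steps, and zoom uses a simple nearest-neighbour resize.

```python
import numpy as np

def flip_horizontal(image):
    """Mirror the image left to right."""
    return image[:, ::-1]

def rotate_90(image, turns=1):
    """Rotate counter-clockwise in steps of 90 degrees."""
    return np.rot90(image, k=turns)

def shift(image, dx, dy, fill=0):
    """Move the image dx pixels right and dy pixels down, padding with `fill`."""
    h, w = image.shape[:2]
    out = np.full_like(image, fill)
    if abs(dx) >= w or abs(dy) >= h:
        return out  # shifted entirely out of frame
    # Matching source and destination windows that stay inside the bounds.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out

def adjust_brightness(image, factor):
    """Scale pixel intensities by `factor`, clipping to the valid 0-255 range."""
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def zoom_in(image, factor):
    """Zoom in (factor >= 1) by cropping the centre and resizing back."""
    h, w = image.shape[:2]
    ch, cw = max(1, int(h / factor)), max(1, int(w / factor))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = image[top:top + ch, left:left + cw]
    # Nearest-neighbour resize: map each output pixel to a pixel in the crop.
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows][:, cols]
```

In practice each training image would be passed through a random combination of these, so the model rarely sees exactly the same pixels twice.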

Common statistical tests

Teaching teachers to teach stats teachings in a teachable way. This post argues that common statistical tests are overcomplicated when they are taught, and could be made simpler if presented as simple linear models. The article then does just that, providing an introduction to key statistical tests including Pearson/Spearman correlation, one-sample t-test, independent t-test, one-way ANOVA and more! What is better is that it comes with an awesome cheatsheet.
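As a quick illustration of the "tests are linear models" idea, the sketch below compares the classic independent t-test (equal variances) with the t statistic of the slope in the regression y = b0 + b1 * group, where group is 0 or 1. This is our own NumPy-only example rather than code from the post, and the function names are hypothetical.

```python
import numpy as np

def pooled_t_statistic(a, b):
    """Classic independent two-sample t statistic, assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (b.mean() - a.mean()) / np.sqrt(pooled_var * (1 / na + 1 / nb))

def slope_t_statistic(a, b):
    """t statistic of the slope in y = b0 + b1 * group, fitted by least squares."""
    y = np.concatenate([a, b])
    group = np.concatenate([np.zeros(len(a)), np.ones(len(b))])  # dummy-coded group
    X = np.column_stack([np.ones_like(group), group])            # intercept + slope
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    sigma2 = residuals @ residuals / (len(y) - 2)                # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                        # covariance of the estimates
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=30)
b = rng.normal(0.5, 1.0, size=40)
print(pooled_t_statistic(a, b), slope_t_statistic(a, b))
```

The two printed numbers agree up to floating point error: the slope is the difference in group means, and its standard error is the pooled standard error from the t-test.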

AutoML with code generation

AutoMLGS, a new contender, enters the field of automated machine learning model search. AutoMLGS is an interesting open source library that serves as a high-level wrapper on top of several open source ML libraries like TensorFlow, scikit-learn, etc. The difference is that AutoMLGS generates the code required to build the model, which adds a level of interpretability to model generation: it's possible to see all the data transformations applied during pre-processing and model creation. If you are interested in AutoML libraries, check out our featured open source AutoML libraries below.

Strata Data San Fran Highlights

As always, the Strata Data Conference in San Francisco hosted an incredibly exciting set of practical tech talks. Some of these talks included "hacking the vote" through ML-led campaigns, cryptography in AI, impact of social media, chatbot analysis, data warehousing criticisms, enterprise use of AI and beyond. Check out the rest of the highlights.

Stack Overflow Developer Survey

The 2019 Stack Overflow developer survey is out, bringing insights from nearly 90,000 developers from all around the world. Some of the key metrics they made available show Python's new position as the fastest-growing programming language, DevOps & SRE being the highest paid roles (yes, higher than data science), developer optimism around the world, productivity blockers and beyond.

MLOps = Featured OS Libraries

The theme for this week's featured ML libraries is AutoML frameworks, which fall under our Responsible ML Principle #4. The four featured AutoML libraries this week are:

auto-sklearn - Framework to automate algorithm selection and hyperparameter tuning for sklearn
TPOT - Automation of sklearn pipeline creation (including feature selection, pre-processor, etc)
Columbus - A scalable framework to perform exploratory feature selection implemented in R
automl - Automated feature engineering, feature/model selection, hyperparam. optimisation

If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to open a pull request!

MLConf = Conferences & Events

We feature conferences that have core ML tracks (primarily in Europe for now) to help our community stay up to date with great events coming up.

Technical Conferences

DataFest19 [11/03/2019] - Two-week festival of Data Innovation hosted across Scotland, UK.

PyCon + PyData Florence [02/05/2019] - Python X comes this year with a PyData focus in Florence, Italy.

AI Conference Beijing [18/06/2019] - O'Reilly's signature applied AI conference in Asia in Beijing, China.

RAAIS 2019 [28/06/2019] - The Research and Applied AI Summit in London, UK.

Data Natives [21/11/2019] - Data conference in Berlin, Germany.

ODSC Europe [19/11/2019] - The Open Data Science Conference in London, UK.

spaCy IRL [05/07/2019] - spaCy NLP's first F2F conference in Berlin, Germany.

Business Conferences

World Summit AI Americas [10/04/2019] - Large scale AI summit in Montreal, Canada.
- Come join our panel on AI Ethics and Tools.

AI Expo Global [19/04/2019] - Global conference on artificial intelligence in London, UK.
- Come join us at our talk on AI orchestration at scale.

Predictive Analytics World [18/11/2019] - Conference for Business AI in Berlin, Germany.

Big Data LDN 2019 [13/11/2019] - Conference for strategy and tech on big data in London, UK.

MLJobs = Jobs & Careers

We showcase Machine Learning Engineering jobs (primarily in London for now) to help our community stay up to date with great opportunities that come up. It seems that the demand for data scientists continues to rise!

Junior Opportunities

Seldon is hiring for a Machine Learning / Data Engineer in London
Migacore is hiring for a Machine Learning Engineer in London
CloudNC is hiring for a Machine Learning Engineer in London
Babylon Health is hiring for a Machine Learning Engineer in London
Chattermill is hiring for a Machine Learning Engineer in London

Mid-level Opportunities

Proportunity is hiring for a Senior Machine Learning Engineer in London
Twitter is hiring for a Senior Machine Learning Engineer in London
Atlas ML is hiring for a Lead NLP Engineer in London
StreetBees is hiring for a Senior Data Scientist in London
Expedia is hiring for a Principal Data Scientist in London
QuantumBlack is hiring for a Senior Machine Learning Engineer in London
Tractable is hiring for a Senior Deep Learning Engineer

Leadership Opportunities

Fractal Labs is hiring for a VP of Engineering in London
Distributed is hiring for a VP of Engineering in London
FactMata is hiring for a Head of Machine Learning in London
Brainpool.ai is hiring for a Head of Machine Learning in London, UK
Cytora is hiring for a Data Science Director in London