THE ML ENGINEER
Issue #1

 
 
This week in Issue #1:
Combating class imbalance, machine learning reproducibility, data science career transitions, empowering curiosity, automating boring tasks, Facebook's new library and some awesome featured open source ML libraries!
 
Support the ML Engineer!
Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!
 
 
 
 
8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset. A very readable article that covers best practices for handling class imbalance.
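
To give a flavour of what handling class imbalance can look like in practice, here is a minimal sketch of random oversampling, a commonly used resampling tactic. It is not taken from the article: the function name `oversample_minority` and the toy data are our own assumptions, and it uses only the Python standard library.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the resampling is repeatable
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw (with replacement) as many extra rows as the class is missing
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy dataset: five examples of class 0, one example of class 1
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
labels = [0, 0, 0, 0, 0, 1]

balanced_rows, balanced_labels = oversample_minority(rows, labels)
print(Counter(balanced_labels))
```

If you try something like this, resample only the training split, so that duplicated rows do not leak into your validation or test data.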
 
 
Great video that discusses the challenges of explainability and reproducibility in production machine learning. It abstracts production machine learning into a pipeline that stays compliant at each computational step, and walks through an example that uses Pachyderm for the "compliant" pipeline and Seldon to serve the production model.
 
 
"What got people hired back in 2017 doesn’t work today, and the disparity between data science hiring standards today and those that will apply one or two years from now will probably be even bigger". This article lays out an interesting path into a data science career from three starting points: complete beginners, software engineers, and recent STEM grads.
 
 
This article proposes that it's possible to introduce a culture of innovation by empowering data scientists through their innate curiosity. It makes three recommendations: 1) ensure data science is its own entity within an organisation, 2) equip data scientists with the right tools, and 3) build a culture that supports a process for learning and experimentation.
 
 
Awesome piece based on the "automate your boring stuff" book and Slacker, an open source Python library for interacting with Slack. The article provides examples of how the authors supercharged their ML workflows.
 
 
Facebook open sources "a modeling framework that blurs the boundaries between experimentation and large-scale deployment." It's based on PyTorch, and provides a set of classifiers, sequence taggers, etc. (GitHub Repository)
 
 
Our Awesome Machine Learning Operations list has seen some great contributions over the last few days. This edition's featured open source libraries focus mainly on reproducibility, and include:
  • MLflow - Open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment.
  • Sacred - Tool to help you configure, organize, log and reproduce machine learning experiments.
  • FGLab - Machine learning dashboard, designed to make prototyping experiments easier.
  • StudioML - Model management framework which minimizes the overhead involved with scheduling, running, monitoring and managing artifacts of your machine learning experiments.
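
As a rough illustration of the idea these tools share (record the configuration, then reproduce the run from it), here is a minimal standard-library sketch. It does not use the API of any library listed above; `run_experiment`, `log_run` and the config keys are hypothetical names chosen for the example, and the "experiment" is just seeded random numbers standing in for real training.

```python
import hashlib
import json
import random


def run_experiment(config):
    """Toy 'experiment': averages seeded random draws in place of real training."""
    rng = random.Random(config["seed"])  # all randomness flows from the logged seed
    draws = [rng.random() for _ in range(config["n_samples"])]
    return round(sum(draws) / config["n_samples"], 6)


def log_run(config):
    """Return a record with the config, an ID derived from it, and the result."""
    # Sorting keys makes the serialized config (and so the ID) stable
    payload = json.dumps(config, sort_keys=True)
    run_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return {"run_id": run_id, "config": config, "score": run_experiment(config)}


config = {"seed": 42, "n_samples": 100}
first = log_run(config)
second = log_run(dict(config))  # re-run from a copy of the logged config

# Same config gives the same ID and the same score
print(first["run_id"], first["score"] == second["score"])
```

The dedicated tools go much further (artifact storage, dashboards, scheduling), but the core habit is the same: never run an experiment whose configuration and seed you have not written down.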
 
If you know of any libraries that are not in the list, please give us a heads-up or feel free to open a pull request.