Subscribe to the Machine Learning Engineer Newsletter

Receive curated articles, tutorials and blog posts from experienced Machine Learning professionals.

Issue #19
This week we'll be speaking on ML across Data Streams and Data Pipelines at the AI, Big Data & IoT Expo London 2019. If you are around, come to our talk or just drop by to say hello!
This week in Issue #19:
Advanced NLP & ML with SpaCy, hyperparam search across models, a deep dive on ML version control, opportunities beyond black holes, scientific Python in the browser, XAI v0.0.5 released, reproducible ML, upcoming AI conferences, new Machine Learning jobs and more 🚀.
Support the ML Engineer!
Forward the email, or share the online version on 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!
If you would like to suggest articles, ideas, papers, libraries, jobs or events, or to provide feedback, just hit reply or send us an email! We have received a lot of great suggestions in the past, so thank you all very much for your support!
Advanced concepts in NLP and text machine learning have just become more accessible thanks to the (now) completely FREE course put together by Ines from the core SpaCy team. The course covers fundamentals such as tokens, lemmas and part-of-speech tagging, together with more advanced topics such as processing pipelines and even training neural networks. Even better, the SpaCy team released the course platform as fully open source, encouraging other developers to create their own courses with it. Thank you SpaCy team for being so awesome.
Scikit-learn's GridSearchCV is always a go-to tool when exploring multiple hyperparameter values for a model in your notebook. This short but insightful post takes the concept across multiple models with the same simplicity, by walking through a wrapper class that trains and evaluates multiple models, each with its own hyperparameters to search over. The results can easily be customized to output whichever metrics are relevant for your specific use case.
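To make the idea concrete, here is a minimal sketch of what such a wrapper could look like. It is not the code from the post: the class name MultiModelGridSearch, its summary() method, the toy dataset and the parameter grids are all our own assumptions, built on scikit-learn's standard GridSearchCV.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


class MultiModelGridSearch:
    """Hypothetical helper: runs one GridSearchCV per named model."""

    def __init__(self, models, param_grids):
        if set(models) != set(param_grids):
            raise ValueError("models and param_grids need the same keys")
        self.models = models
        self.param_grids = param_grids
        self.searches = {}

    def fit(self, X, y, cv=3, scoring="accuracy"):
        # Each model is searched only over its own hyperparameter grid.
        for name, model in self.models.items():
            search = GridSearchCV(model, self.param_grids[name], cv=cv, scoring=scoring)
            search.fit(X, y)
            self.searches[name] = search
        return self

    def summary(self):
        # One row per model, best cross-validated score first.
        rows = [
            {"model": name, "best_score": s.best_score_, "best_params": s.best_params_}
            for name, s in self.searches.items()
        ]
        return sorted(rows, key=lambda row: row["best_score"], reverse=True)


# Toy data and grids, chosen only for illustration.
X, y = load_iris(return_X_y=True)
models = {
    "tree": DecisionTreeClassifier(random_state=0),
    "logreg": LogisticRegression(max_iter=1000),
}
param_grids = {
    "tree": {"max_depth": [2, 3, 5]},
    "logreg": {"C": [0.1, 1.0, 10.0]},
}
results = MultiModelGridSearch(models, param_grids).fit(X, y).summary()
for row in results:
    print(row)
```

Changing the scoring argument, or adding fields to the rows built in summary(), is one way to surface other metrics for your own use case.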
Dmitry Petrov, founder at DVC (Data Version Control), joins the Python Podcast and dives into one of the most popular topics in the world of production machine learning: Version Control for Machine Learning. Version control in machine learning is often defined as the ability to maintain a (reproducible) trail for your data+code+model+config across the lifecycle of your data science project. Do check out this great resource to get both an intro and a deep dive into the topic from the author of one of the most exciting projects in the space.
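As a rough illustration of the "trail" idea, the sketch below derives a single version fingerprint from data, code and config using only the Python standard library. This is a simplified assumption of ours and not a description of how DVC works internally; the fingerprint function and the sample inputs are hypothetical.

```python
import hashlib
import json


def fingerprint(data: bytes, code: str, config: dict) -> str:
    """Return one SHA-256 hex digest covering data, code and config."""
    digest = hashlib.sha256()
    parts = (
        data,
        code.encode("utf-8"),
        # sort_keys makes the config hash independent of key order
        json.dumps(config, sort_keys=True).encode("utf-8"),
    )
    for part in parts:
        # Hash each part on its own first so part boundaries stay unambiguous.
        digest.update(hashlib.sha256(part).digest())
    return digest.hexdigest()


# Made-up inputs, for illustration only.
data = b"x,y\n1,2\n3,4\n"
code = "model = fit(data, lr=config['lr'])"

v1 = fingerprint(data, code, {"lr": 0.1, "epochs": 5})
v1_again = fingerprint(data, code, {"epochs": 5, "lr": 0.1})
v2 = fingerprint(data, code, {"lr": 0.01, "epochs": 5})

print(v1 == v1_again)  # same data, code and config
print(v1 == v2)        # config changed
```

Storing such a fingerprint next to each trained model is one simple way to tell whether a result can be traced back to exactly the same inputs.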
FiveThirtyEight editor Maggie Koerth-Baker argues for the awesome potential of the code developed to produce images of a supermassive black hole 55 million light years away. Indeed, several people behind NumFOCUS were extremely proud to see several of their sponsored open source projects used in the achievement of the "First M87 Event Horizon Telescope Results".
Incredibly interesting update on Pyodide, Mozilla's experimental project to create a full Python data science stack that runs entirely in the browser, and its progress since its inception. The update makes clear that the experiment is now more of a tangible alpha-level library that has started to show incredibly promising results. Bringing Python into the browser could have some really interesting consequences: one of these could even be bringing the massive JavaScript web community together with the huge data science and data engineering community for deeper collaboration on open source projects and frameworks.
We're extremely excited to announce this update for the XAI Explainable AI Toolbox. The 0.0.5 update adds some fundamental tools used in data science, including correlation matrix, confusion matrix, statistical metrics and more. Our next milestone will include NLP / Text ML evaluation support, so stay tuned. Feel free to submit feature requests, ideas or issues through our GitHub repo or our website.
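For readers new to these tools, here is a small standard-library sketch of a confusion matrix and the precision and recall derived from it. It does not use or mirror the XAI API; the function names and the toy labels are assumptions made purely for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def precision_recall(matrix, index):
    """Precision and recall for the label at the given row/column index."""
    true_positives = matrix[index][index]
    predicted = sum(row[index] for row in matrix)  # column total
    actual = sum(matrix[index])                    # row total
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Toy labels, for illustration only.
y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]
labels = ["ham", "spam"]

matrix = confusion_matrix(y_true, y_pred, labels)
precision, recall = precision_recall(matrix, labels.index("spam"))
print(matrix)
print(precision, recall)
```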
MLOps = Featured OS Libraries
The theme for this week's featured ML libraries is Model Versioning for Machine Learning, which falls under our Responsible ML Principle #4. This week we want to dive deeper and feature some smaller libraries in this space. The four featured libraries this week are:
  • FGLab - Machine learning dashboard, designed to make prototyping experiments easier.
  • Studio.ML - Model management framework which minimizes the overhead involved with scheduling, running, monitoring and managing artifacts of your machine learning experiments.
  • Flor - Easy-to-use logger and automatic version controller made for data scientists who write ML code.
  • D6tflow - A Python library for building complex data science workflows.
If you know of any libraries that are not in the "Awesome MLOps" list, please do give us a heads up or feel free to submit a pull request.
We feature conferences that have core ML tracks (primarily in Europe for now) to help our community stay up to date with great events coming up.
Technical Conferences
  • DataFest19 [11/03/2019] - Two-week festival of Data Innovation hosted across Scotland, UK.
  • AI Conference Beijing [18/06/2019] - O'Reilly's signature applied AI conference in Asia, held in Beijing, China.
  • Data Natives [21/11/2019] - Data conference in Berlin, Germany.
  • ODSC Europe [19/11/2019] - The Open Data Science Conference in London, UK.
Business Conferences
  • World Summit AI Americas [10/04/2019] - Large-scale AI summit in Montreal, Canada.
    • Come join our panel on AI Ethics and Tools.
  • Big Data LDN 2019 [13/11/2019] - Conference for strategy and tech on big data in London, UK.
We showcase Machine Learning Engineering jobs (primarily in London for now) to help our community stay up to date with great opportunities that come up. It seems that the demand for data scientists continues to rise!
Leadership Opportunities
Mid-level Opportunities
Junior Opportunities
© 2018 The Institute for Ethical AI & Machine Learning