Subscribe to the Machine Learning Engineer Newsletter

Receive curated articles, tutorials and blog posts from experienced Machine Learning professionals.

Issue #333 🤖

Thank you for being part of over 70,000+ ML professionals and enthusiasts who receive weekly articles & tutorials on Machine Learning & MLOps 🤖 You can join the newsletter for free at https://ethical.institute/mle.html ⭐

If you like the content please support the newsletter by sharing with your friends via ✉️ Email, 🐦 Twitter, 💼 Linkedin and 📕 Facebook!

This week in Machine Learning:

OpenAI's Rouge AI Model
The ML Leaderboard Illusion
When ChatGPT broke NLP
Alibaba's Qwen3 Released
Preparing for Engineering Leadership
Open Source ML Frameworks
Awesome AI Guidelines to check out this week
+ more 🚀

If you're looking for an interesting career opportunity, I'm hiring for a few roles including Data Science Manager (Forecasting), as well as Data Scientist (Forecasting) - check them out and please do share with your network!

OpenAI's Rouge AI Model

OpenAI just reminded all of us the importance of machine learning testing and monitoring; also this is the most we've ever heard the word "sycophanthy" (aka people pleasing AI): The April 2025 update to GPT‑4 was a disaster, unintentionally making this the most unhelpful model so far, due to the over-prioritisation of user approval instead of correctness, resulting in unhelpful behaviour across tasks. This is a particularly interesting situation as it makes it clear the type of challenges that as practitioners we face with the monitoring and evaluation of machine learning models; despite potentially positive results on unit tests, and A/B tests, models can face drift from training environments that requires not just single-inference testing but also "integration tests" that cover end-to-end flows. OpenAI released a really interesting (pseudo-)post-mortem for this incident providing more context on the context behind the issues, but certainly there's some clear opportunities for the MLOps community to take notes on the importance of end to end testing flows that reflect the real world, such that it is not only limited in units but that evaluate the end to end.

The ML Leaderboard Illusion

Every week we see a new model that breaks through the glass ceiling of "The Leaderboard", however Stanford, Princeton, MIT et al have released an insightful study that debunks the reliability of model rankings on public platforms: There has been a growing illusion on leaderboards in ML through ranking platforms such as Chatbot Arena which seem do not take key aspects into consideration such as model deprecation, sparse comparison graphs, and inconsistent task distributions lead, which has led to clear distorted leaderboard results. It is interesting to see this research initiative from some of the most renowned organisations which explain that the Bradley-Terry model's assumptions about consistent matchups are compromised when models are deprecated or removed, resulting in misleading rankings. This comes with several recommendations to improve fairness on these leaderboards, which seem quite reasonable and it is great to see a way forward to ensure we can re-focus the community in the right direction.

When ChatGPT broke NLP

Anyone working in NLP throughout the recent AI wave will be aware of the impact LLMs have had in the field: Large language models have had a transformative impact in the field of Natural Language Processing, which ironically surfaced with early skepticism surrounding early transformer models ahead of the disruption caused by GPT-3 and ChatGPT. This is a great deep dive into the rapid acceleration of research, shifting the focus from linguistic theory to scalable model development, resulting in a crisis of relevance faced by researchers as LLMs rendered many existing tasks less relevant. This is a great deep dive that delves into the divide within the NLP community between proponents of LLM capabilities and those concerned about their limitations, ethics, and societal consequences - both certainly debating relevant and important topics in the space.

Alibaba's Qwen3 Released

China comes back with another open source AI model release with Alibaba's Qwen3: This is another insightful release in the AI race, providing (the usual) blend of dense + MoE architectures, and with high performance in tasks like coding, math, and general problem-solving. The great trend in open models benefits the community as a whole as it supports the community to push towards the next level of efficiency and performance, as well as the techniques required for hybrid reasoning or fast responses depending on task complexity. It is interesting to see how this field is developing at pace with only further acceleration across the field.

Preparing for Engineering Leadership

It is hard to make the leap from individual contributor to people leader in the software space - here's a few tips to prepare: Leadership is about people, it's not just about technical skills, which means that it's important to focus on developing soft skills - some relevant steps you can take forward in your day-to-day as an IC are: 1. Start Mentoring: Offer guidance to junior developers, which builds leadership by improving your communication, feedback, and problem-solving abilities. 2. Run Team Meetings: Leading discussions or demos improves confidence and organizational skills, essential for tech leads and managers. 3. Track Impact: Shift focus from just coding to demonstrating how your work contributes to team success, showcasing leadership potential. This is a great article for more junior aspiring leaders that want to take the next step forward to transition into leadership roles.

Upcoming MLOps Events

The MLOps ecosystem continues to grow at break-neck speeds, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantsatic way to keep on-top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is the reason why we have started curating a list of upcoming events in the space, which are outlined below.

Upcoming conferences where we're speaking:

WeAreDevelopers 2025 - 9th July @ Berlin
SRE Con EMEA 2025 - 7th Oct @ Dublin

Other upcoming MLOps conferences in 2025:

ODSC East - May 13 @ Boston
Data & AI Summit - 9th June @ San Francisco
Data & AI Summit - 10th June @ USA
AI Summit London - 12th June @ UK
World Summit AI Europe - 08 Oct @ Amsterdam
MLOps World - Oct 8-9 @ Austin

Open Source MLOps Tools

Check out the fast-growing ecosystem of production ML tools & frameworks at the github repository which has reached over 10,000 ⭐ github stars. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to add a PR. Four featured libraries in the GPU acceleration space are outlined below.

Kompute - Blazing fast, lightweight and mobile phone-enabled GPU compute framework optimized for advanced data processing usecases.
CuPy - An implementation of NumPy-compatible multi-dimensional array on CUDA. CuPy consists of the core multi-dimensional array class, cupy.ndarray, and many functions on it.
Jax - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
CuDF - Built based on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.

If you know of any open source and open community events that are not listed do give us a heads up so we can add them!

OSS: Policy & Guidelines

As AI systems become more prevalent in society, we face bigger and tougher societal challenges. We have seen a large number of resources that aim to takle these challenges in the form of AI Guidelines, Principles, Ethics Frameworks, etc, however there are so many resources it is hard to navigate. Because of this we started an Open Source initiative that aims to map the ecosystem to make it simpler to navigate. You can find multiple principles in the repo - some examples include the following:

MLSecOps Top 10 Vulnerabilities - This is an initiative that aims to further the field of machine learning security by identifying the top 10 most common vulnerabiliites in the machine learning lifecycle as well as best practices.
AI & Machine Learning 8 principles for Responsible ML - The Institute for Ethical AI & Machine Learning has put together 8 principles for responsible machine learning that are to be adopted by individuals and delivery teams designing, building and operating machine learning systems.
An Evaluation of Guidelines - The Ethics of Ethics; A research paper that analyses multiple Ethics principles.
ACM's Code of Ethics and Professional Conduct - This is the code of ethics that has been put together in 1992 by the Association for Computer Machinery and updated in 2018.

If you know of any guidelines that are not in the "Awesome AI Guidelines" list, please do give us a heads up or feel free to add a pull request!

About us

The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning.

Check out our website

✉️ Email, 🐦 Twitter, 💼 Linkedin

This email was sent to You received this email because you are registered with The Institute for Ethical AI & Machine Learning's newsletter "The Machine Learning Engineer"

Unsubscribe here