The race to build reliable multimodal agents is ON, but is the future open? QWEN-3.5 gives us hope that it is... Sonnet-like performance in your local machine: Qwen3.5 was released a few weeks ago, however the community is now starting to report impressive performance comparable to some of the more popular proprietary models - this is really exciting. To dive into the internals briefly, Qwen3.5-397B-A17B basically combines sparse MoE and hybrid attention so only 17B parameters are active per token, but still able to show strong performance across reasoning, coding, tool use, vision, document understanding, video, and multilingual tasks. For ML practitioners the key takeaways are less about absolute benchmark wins and more about the system design: early text-vision fusion, extensive RL-based post-training for agent behavior, support for 201 languages and dialects, as well as infrastructure advances such as FP8 training, heterogeneous multimodal training, and asynchronous RL that aim to improve throughput, stability, and cost efficiency. It is really hard to keep up, but open source (open weight) releases like these really set the example and raise the bar for organisations trying to hold the moat with proprietary closed models. |
---
Simon Willison on Agentic Patterns Agentic engineering is quickly becoming one of the most important shifts in how production software and ML systems get built, but how do we build these right? Simon Willison shares a really great write-up on Agentic Engineering Patterns, which he is documenting live as a guide for devs using coding agents to accelerate real software work - note, not vibe coding but agentic engineering. It contains a lot of sound (and sometimes obvious) advice, such as ensuring supervision, secure code execution, and iterative testing. For production ML practitioners, the main point is that as code generation becomes cheap, the real challenge shifts to building workflows that preserve reliability, review quality, and team judgment - the team part is the hardest. This is definitely one of the many resources that will be important to keep a close eye on!
---
Attention has really been all you need so far... understanding the internals is a great way to boost your skills, and the core foundation is the Q, K, V matrices: The core implementation of transformer-based architectures relies on the attention mechanism, which is formulated in terms of the Q, K, V matrices, and this is a great deep dive to refresh or learn about it. It introduces the matrices end to end with a hands-on NumPy example, including the nuanced multiplications, and connects them to dot-product attention, where query-key similarity produces attention weights that are applied to the values. For production ML practitioners, the key value is in demystifying self-attention at the tensor level while also explaining why separate projections matter and how choices like per-head dimension affect the usual trade-offs between model capacity, memory use, and compute.
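The whole mechanism fits in a few lines of NumPy. This is a minimal single-head sketch in the spirit of the article's walkthrough (the shapes and weight initialisation here are illustrative, not taken from the article):

```python
import numpy as np

rng = np.random.default_rng(42)

def attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over token embeddings.

    Separate projections produce queries, keys, and values; query-key
    similarity is softmaxed into weights that mix the values.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # project into Q, K, V
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                 # scaled query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # attention-weighted values

seq_len, d_model, d_head = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Note the three separate projection matrices: collapsing them into one would force every token to "ask" and "answer" with the same representation, which is exactly the design choice the deep dive motivates.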
---
What is Entropy? Understanding entropy is foundational to how we reason about uncertainty, information flow, and system behavior - this becomes even more important with the growing popularity of probabilistic systems with multi-agent architectures: This is one of the best explainers / introductions to the concept of entropy, framed as a measure of uncertainty over possible underlying states rather than as vague "disorder" - and also one of the best visual explainers on the topic. It starts with Shannon entropy as expected surprise, then shows how the same idea carries into physics, where a macrostate can correspond to many microstates and entropy grows with the number and probability of those compatible configurations. For production ML practitioners, the useful takeaway is that entropy is not mystical; it is a practical concept for reasoning about uncertainty, compression, latent-state ambiguity, and how modeling choices about state abstraction can shape both system behavior and interpretation.
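"Expected surprise" is a one-liner in code. A quick sketch of Shannon entropy (the probability values below are just examples):

```python
import math

def shannon_entropy(probs):
    """Expected surprise in bits: H(p) = -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin is maximally uncertain for two outcomes: exactly 1 bit...
print(shannon_entropy([0.5, 0.5]))
# ...while a heavily biased coin carries far less surprise (~0.08 bits),
# and a certain outcome carries none at all.
print(shannon_entropy([0.99, 0.01]))
print(shannon_entropy([1.0]))
```

The same formula applies whether the distribution is over coin flips, compressed symbols, or the microstates compatible with a physical macrostate - which is the bridge the explainer builds.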
---
In production ML, efficiency increasingly determines success, and Meta published their approach to large-scale training on AMD clusters: Meta released RCCLX as an open-source framework to tackle one of the biggest practical bottlenecks in large-scale training and inference - GPU communication overhead. For production ML teams, the key point is that Meta is adding a new backend for shipping communication optimizations for AMD platforms through Torchcomms, which now uses Direct Data Access collectives to cut intra-node latency - particularly key for decode-heavy LLM inference. Meta reports meaningful internal gains on MI300-class hardware, including faster decode and prefill performance, lower time-to-incremental-token, reduced latency, and higher throughput, while keeping numerical accuracy within acceptable bounds for its workloads. It does seem like the sub-field of GPU engineering will only continue to explode in popularity, so if you are a practitioner who wants to stay relevant amid the rise of AI, this would definitely be a relevant domain!
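To see why communication libraries matter so much, it helps to look at the collective they optimize. Below is a toy, single-process simulation of a ring all-reduce - the classic pattern behind gradient synchronization. This is emphatically NOT RCCLX or Torchcomms code; it just illustrates why per-rank bandwidth stays constant as ranks are added (each rank only ever talks to its ring neighbor). It assumes each rank's tensor is split into exactly one chunk per rank.

```python
def ring_allreduce(tensors):
    """Toy simulation of a ring all-reduce across len(tensors) ranks.

    Phase 1 (reduce-scatter): partial sums circulate the ring until each
    rank owns one fully reduced chunk. Phase 2 (all-gather): the reduced
    chunks circulate until every rank holds the full summed tensor.
    """
    n = len(tensors)
    chunks = [list(t) for t in tensors]  # rank r's tensor, split into n chunks
    # reduce-scatter: at each step, rank r passes a partial sum to rank r+1
    for step in range(n - 1):
        sends = [(r, (r - step) % n, chunks[r][(r - step) % n]) for r in range(n)]
        for r, idx, val in sends:        # buffered so all sends happen "at once"
            chunks[(r + 1) % n][idx] += val
    # all-gather: circulate each fully reduced chunk around the ring
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, chunks[r][(r + 1 - step) % n]) for r in range(n)]
        for r, idx, val in sends:
            chunks[(r + 1) % n][idx] = val
    return chunks

result = ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)  # every rank ends with the elementwise sum [12, 15, 18]
```

Real backends like RCCLX fight for exactly the latencies this sketch hides: the per-step neighbor exchanges, which is why optimizations such as Direct Data Access collectives show up as lower intra-node latency in decode-heavy serving.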
---
Upcoming MLOps Events The MLOps ecosystem continues to grow at breakneck speed, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantastic way to keep on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is the reason why we have started curating a list of upcoming events in the space, which are outlined below.
Events we are speaking at this year:
Other relevant events:
In case you missed our talks, check our recordings below:
---
Check out the fast-growing ecosystem of production ML tools & frameworks at the GitHub repository, which has reached over 20,000 ⭐ stars. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to add a PR. Here are a few featured open source libraries that we maintain:
- KAOS - K8s Agent Orchestration Service for managing the KAOS in large-scale distributed agentic systems.
- Kompute - Blazing fast, lightweight and mobile-enabled GPU compute framework optimized for advanced data processing use cases.
- Production ML Tools - A curated list of tools to deploy, monitor and optimize machine learning systems at scale.
- AI Policy List - A mature list that maps the ecosystem of artificial intelligence guidelines, principles, codes of ethics, standards, regulation and beyond.
- Agentic Systems Tools - A new list that aims to map the emerging ecosystem of agentic systems with tools and frameworks for scaling this domain.
Please do support some of our open source projects by sharing, contributing or adding a star ⭐
---
The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning.
---
You received this email because you are registered with The Institute for Ethical AI & Machine Learning's newsletter "The Machine Learning Engineer".
---
© 2023 The Institute for Ethical AI & Machine Learning