CodeMode + Graph in KAOS

KAOS now shows multi-agent systems in visual graphs that allow you to manage advanced concepts like Anthropic's code mode. We have extended KAOS with an interactive graph-visualisation screen that lets you create and manage agentic resources intuitively. This enables advanced functionality such as code-mode sandboxes, which are growing in popularity as an effective way to improve agent-to-MCP performance by aggregating multiple MCP servers and exposing them as local functions. We have also extended support for Port-of-Context, one of the first code-mode implementations, which provides a TypeScript code engine in Rust and is now exposed in KAOS as a native runtime that can be spun up.

To provide some intuition on how code mode works, traditionally an agent would have to make multiple MCP round trips as follows:
LLM: "I'll call add(42, 8)" > Call -> Result
LLM: "Now I'll call multiply" > Call -> Result
LLM: "Now I'll call power" > Call -> Result
LLM: "Now I'll call uppercase" > Call -> Result
LLM: "Now I'll call word count" > Call -> Result
LLM: "Here's the final answer"
Now instead we can have a "code-mode" abstraction that converts all MCP servers into a unified set of functions, such that the agent makes just one call containing simple code - the same example above would look like this:
LLM: "I'll call the following set of MCP servers through the code-mode execution sandbox: '''
const sum = await calc.add(...);
const product = await calc.multiply(...);
const squared = await calc.power(...);
const wordCount = await text.word_count(...);
return await text.uppercase(...);
''' -> Call -> Result
LLM: "Here's the final answer"
This significantly reduces round trips and enables a much more controlled experience for the LLM itself. There are various exciting initiatives also exploring this, including the recent announcement from the Pydantic team releasing a new Python runtime in Rust that will serve as the first Python code-mode environment. Check out the article for more details.
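To make the aggregation idea concrete, here is a minimal, self-contained sketch of the pattern. All names (`calc`, `text`, `runInSandbox`) are illustrative assumptions, not the actual KAOS or Port-of-Context API: each MCP server is modelled as a map of local async functions, and the agent submits one composed script that runs in a single round trip.

```typescript
// Hypothetical code-mode sketch: MCP-style tools exposed as local functions.
type Tool = (...args: any[]) => Promise<any>;
type Server = Record<string, Tool>;

// Two toy "servers" whose tools are exposed as local async functions.
const calc: Server = {
  add: async (a: number, b: number) => a + b,
  multiply: async (a: number, b: number) => a * b,
  power: async (a: number, b: number) => a ** b,
};
const text: Server = {
  uppercase: async (s: string) => s.toUpperCase(),
  word_count: async (s: string) => s.trim().split(/\s+/).length,
};

// The sandbox executes the whole composed script as one "call".
// A real sandbox would also isolate, meter and time-limit execution.
async function runInSandbox<T>(script: () => Promise<T>): Promise<T> {
  return script();
}

async function main() {
  const result = await runInSandbox(async () => {
    const sum = await calc.add(42, 8);            // 50
    const product = await calc.multiply(sum, 2);  // 100
    const squared = await calc.power(product, 2); // 10000
    return text.uppercase(`result is ${squared}`);
  });
  console.log(result); // RESULT IS 10000
}

main();
```

The key property is that the composition happens inside the sandbox, so the model pays one round trip instead of five.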
Building TikTok RecSys from Scratch

Build TikTok's personalized real-time recommendation system from scratch. This is a fantastic talk from Jim Dowling's Hopsworks on how to build a TikTok-style personalized real-time recommender in Python, using Hopsworks as the ML platform layer (e.g. feature store, vector index, model registry, and serving). It is particularly interesting to see such a practical example demonstrating the theoretical foundations of one of the most impressive (and scariest) instant-learning systems out there. The real-time nature of retraining, capturing fresh features from user actions (e.g. views/likes/watch time), and making them available for inference is now a critical architectural requirement for any consumer-facing team.

This is a pretty meaty talk/workshop that walks through an end-to-end architecture split into a feature/training pipeline, which learns a two-tower embedding model from interaction data (e.g. user/query + video/candidate), and an online inference pipeline, which embeds the latest user context and retrieves similar candidate videos via ANN/vector search to serve results through a simple Streamlit UI. Definitely a great resource for any MLOps practitioner out there to catch up on!
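As a rough illustration of the serving-side retrieval step, here is a toy sketch (my own simplification, not the Hopsworks API): once the two towers have produced embeddings for the user context and for each candidate video, recommendation reduces to nearest-neighbour search in that shared space. A real deployment would query an ANN/vector index rather than run this exact scan.

```typescript
// Toy two-tower retrieval: rank candidate videos by cosine similarity
// between a user-context embedding and precomputed item embeddings.
type Embedding = number[];

function dot(a: Embedding, b: Embedding): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

function cosine(a: Embedding, b: Embedding): number {
  const norm = (v: Embedding) => Math.sqrt(dot(v, v));
  return dot(a, b) / (norm(a) * norm(b));
}

// Return the ids of the top-k candidates closest to the user embedding.
function retrieveTopK(
  user: Embedding,
  candidates: Map<string, Embedding>,
  k: number
): string[] {
  return [...candidates.entries()]
    .map(([id, emb]) => ({ id, score: cosine(user, emb) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((c) => c.id);
}

// Hypothetical example data standing in for tower outputs.
const userEmb: Embedding = [0.9, 0.1, 0.0];
const videos = new Map<string, Embedding>([
  ["cat-video", [0.8, 0.2, 0.1]],
  ["cooking", [0.1, 0.9, 0.3]],
  ["sports", [0.7, 0.0, 0.2]],
]);

console.log(retrieveTopK(userEmb, videos, 2)); // ["cat-video", "sports"]
```

In production the item tower's embeddings would be precomputed and indexed, while the user tower runs at request time on fresh features, which is exactly where the feature-freshness requirement above bites.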
RL-Aligned Ranking in RecSys

RecSys ranking is shaping what billions of people watch, buy, and believe, so small improvements (or mistakes) can have a huge impact on user experience - this is a great resource that explores LLM RLHF-style approaches in recommender systems. It is quite an interesting angle: while modern recommender ranking models are typically trained to predict click/engagement/return signals and then combined with a hand-coded value function, they rarely optimize that value end-to-end, creating a gap compared to LLM pipelines pre-RLHF. There seems to be an opportunity to treat the ranking model as a policy over items and add an off-policy, propensity-weighted policy-gradient-style loss that directly maximizes a scalar value/reward model (analogous to an LLM reward model), while using constraints (akin to KL regularization) to control drift. This means recsys optimization could be extended end-to-end, such that ranking becomes true reward optimization without changing inference latency. It is quite interesting to see RL approaches that have been revolutionising LLMs being brought into more traditional fields of ML, as there seems to be a lot of opportunity (and everyone loves a chance to play with RL).
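To sketch what such a loss could look like, here is a toy formulation (my own hedged illustration, not taken from the article): an inverse-propensity-scored (IPS) estimate of a candidate policy's reward-model value, computed from impressions logged under the production policy, minus a sampled KL-style penalty against drifting from that logging policy.

```typescript
// Toy off-policy objective: IPS reward estimate minus a KL-style penalty.
// All names are illustrative assumptions, not an established API.
interface LoggedImpression {
  item: string;
  reward: number;      // scalar from a value/reward model (like an LLM RM)
  loggingProb: number; // propensity: prob. the logging policy showed the item
}

// pi(item) returns the candidate policy's probability of showing that item.
function ipsObjective(
  logs: LoggedImpression[],
  pi: (item: string) => number,
  beta: number // strength of the KL-style regularizer
): number {
  let value = 0;
  let kl = 0;
  for (const log of logs) {
    const w = pi(log.item) / log.loggingProb; // importance weight
    value += w * log.reward;                  // IPS reward estimate
    kl += w * Math.log(w); // Monte Carlo estimate of KL(pi || logging)
  }
  const n = logs.length;
  return value / n - beta * (kl / n);
}

// Example: shifting probability toward the item the reward model likes
// scores higher than exactly matching the logging policy.
const logs: LoggedImpression[] = [
  { item: "a", reward: 1, loggingProb: 0.5 },
  { item: "b", reward: 0, loggingProb: 0.5 },
];
const uniform = (_: string) => 0.5;
const shifted = (item: string) => (item === "a" ? 0.9 : 0.1);
console.log(ipsObjective(logs, uniform, 0.1)); // 0.5 (w = 1, KL term = 0)
console.log(ipsObjective(logs, shifted, 0.1) > ipsObjective(logs, uniform, 0.1)); // true
```

Under this framing the ranking model is optimized against the reward model directly (like RLHF's policy step) with beta playing the role of the KL coefficient, and inference is unchanged because only the training loss differs.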
World Models at Waymo

"What should a self-driving car do if a hurricane appears in front of it?" These are the corner cases that are hard or impossible to train models on, which is why it's exciting to see World Models in practice already at Waymo being used for synthetic-data-like generation. Basically, Waymo's World Model is a generative simulation system adapted from Google DeepMind's Genie 3 (exciting to see applications already!) that creates hyper-realistic, interactive 3D driving environments and produces multi-sensor outputs including camera and lidar, letting Waymo run billions of virtual miles to validate safety beyond what fleet data alone can capture. This is pretty awesome; it can synthesize rare long-tail scenarios (e.g. extreme weather, unusual objects, safety-critical events) without the need for this data to exist. Apparently it is also designed to be highly controllable via driving-action, scene-layout, and language conditioning for counterfactual what-if testing. This is exactly the type of application that makes these models exciting, and the kind of thinking that should accompany them - not just "it will replace videogames"; if anything, they are already augmenting various fields.
Metaflow + Kubeflow Announcement

Workflow orchestration is the backbone of production ML, so it was quite interesting to see the recent integration between Kubeflow (yes, still alive!) and Metaflow. This primarily seems to be the ability to write native Metaflow workflows and execute them with a Kubeflow backend (i.e. in k8s) as Kubeflow Pipelines alongside current KFP workloads - which, to my memory, means they basically run as Argo Workflows / Tekton pipelines. Great to see some of the OGs of the MLOps space still driving quality-of-life integrations to continue evolving and improving the ecosystem. I believe Vertex AI still uses Kubeflow under the hood, so this sounds like it would bring some of the robust benefits that Metaflow offers (+ convenience).
Upcoming MLOps Events

The MLOps ecosystem continues to grow at break-neck speed, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantastic way to keep on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is why we have started curating a list of upcoming events in the space, outlined below.
Events we are speaking at this year:
Other relevant events:
In case you missed our talks, check our recordings below:
Check out the fast-growing ecosystem of production ML tools & frameworks at the GitHub repository, which has reached over 20,000 ⭐ GitHub stars. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to open a PR. Here are a few featured open source libraries that we maintain:
- KAOS - K8s Agent Orchestration Service for managing the KAOS in large-scale distributed agentic systems.
- Kompute - Blazing fast, lightweight and mobile phone-enabled GPU compute framework optimized for advanced data processing use cases.
- Production ML Tools - A curated list of tools to deploy, monitor and optimize machine learning systems at scale.
- AI Policy List - A mature list that maps the ecosystem of artificial intelligence guidelines, principles, codes of ethics, standards, regulation and beyond.
- Agentic Systems Tools - A new list that aims to map the emerging ecosystem of agentic systems with tools and frameworks for scaling this domain.
Please do support some of our open source projects by sharing, contributing or adding a star ⭐
The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning.
You received this email because you are registered with The Institute for Ethical AI & Machine Learning's newsletter "The Machine Learning Engineer".
© 2023 The Institute for Ethical AI & Machine Learning