Excited to release initial insights for our Survey on Production MLOps!! The survey would still benefit from your contribution and it's OPEN FOR RESPONSES 🚀🚀🚀

If you have a few minutes, your contribution will make a significant difference to the whole production ML ecosystem 🥳 The results will be shared as open source like last year!! You can add your response directly at:

https://forms.gle/KF16EckuxNUKDtDK8 🔥

Issue #362 🤖

Thank you for being part of over 70,000 ML professionals and enthusiasts who receive weekly articles & tutorials on Machine Learning & MLOps 🤖 You can join the newsletter https://bit.ly/state-of-ml-2025 ⭐

If you like the content, please support the newsletter by sharing it with your friends via ✉️ Email, 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!

This week in ML Engineering:

Google Weather Forecast 2.0
META on GenAI World Models
The State of MLOps 2025 Survey 🔥
MCP Protocol UI Extension
DuckDB's Take on the Lakehouse
Open Source ML Frameworks
Awesome AI Guidelines to check out this week
+ more 🚀

Google Weather Forecast 2.0

In recent years there have been exciting innovations in weather (+ climate) forecasting with foundation / deep learning models; Google just released WeatherNext 2.0, which introduces a probabilistic forecaster that generates hundreds of physically consistent scenarios for weather prediction. The new model uses a Functional Generative Network architecture that injects noise in function space, so the model is trained only on local "marginals" (per-variable, per-location distributions); this is interesting as it's quite different from how current models are generally trained. This allows the model to learn coherent "joint" structure across variables, space and time, which is particularly important in physics-related modelling (traditionally done with simulations). This seems to enable better prediction of complex regional events such as heat waves or wind farm output. It's quite interesting to see that many major tech companies are racing to build these types of models, which combine the complexity of ML with real-world physics simulations.
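To make the "noise in function space, trained on marginals" idea a bit more concrete, here is a minimal toy sketch in Python/NumPy. Two assumptions to flag loudly: the generator is a hypothetical stand-in (one small noise vector per ensemble member perturbs the whole output field, rather than each grid point independently), and the per-point loss is the standard ensemble CRPS estimator, a common choice for scoring marginals that the announcement itself does not mention. Neither is Google's actual implementation.

```python
import numpy as np


def sample_fields(grid: np.ndarray, n_members: int, noise_scale: float,
                  rng: np.random.Generator) -> np.ndarray:
    """Hypothetical toy generator: returns an ensemble of shape (n_members, n_points).

    Each member draws ONE small noise vector that perturbs the whole function,
    so every grid point in that member moves together (spatially coherent).
    """
    z = rng.normal(scale=noise_scale, size=(n_members, 2))  # (members, noise_dim)
    base = np.sin(grid)                                       # deterministic "forecast"
    smooth_mode = np.cos(grid)                                # a smooth spatial pattern
    return base[None, :] + z[:, :1] * smooth_mode[None, :] + 0.1 * z[:, 1:]


def ensemble_crps(members: np.ndarray, obs: np.ndarray) -> np.ndarray:
    """Per-point (marginal) CRPS: mean|X - y| - 0.5 * mean|X - X'| over the ensemble."""
    skill = np.mean(np.abs(members - obs[None, :]), axis=0)
    spread = np.mean(np.abs(members[:, None, :] - members[None, :, :]), axis=(0, 1))
    return skill - 0.5 * spread


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 2.0 * np.pi, 50)
    ensemble = sample_fields(grid, n_members=8, noise_scale=0.5, rng=rng)
    obs = np.sin(grid) + 0.2 * np.cos(grid)  # a made-up "observed" field
    # The loss only ever scores one grid point at a time...
    loss = ensemble_crps(ensemble, obs).mean()
    # ...yet neighbouring points are strongly correlated across members,
    # because each member's noise acts on the whole function.
    corr = np.corrcoef(ensemble[:, 0], ensemble[:, 1])[0, 1]
    print(f"mean marginal CRPS={loss:.3f}, neighbour correlation={corr:.2f}")
```

In this toy, no term of the loss ever compares two grid points, but the samples still come out spatially coherent because of where the noise enters; that is the intuition behind learning joint structure from marginals.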

META on GenAI World Models

We continue to see exciting waves in the field of GenAI World Models (aka text to 3D navigable environments); this week META releases a new system for AI generation of 3D worlds. This follows similar World Models (like Marble, announced last week) which output 3D worlds using diffusion-based image-to-3D reconstruction models, together with object-level scene decomposition and mesh/texture refinement, which make these worlds game-engine ready. These systems basically work as a pipeline: first, procedural reasoning and a global reference image are used to produce a coherent blockout plus navmesh; then a large textured scene is reconstructed while enforcing constraints; finally, the scene is decomposed into individual assets with refined geometry and textures per object. This is what's interesting about innovative ML systems like these: the AI is a critical component, but it is the robust engineering and experience around it that make everything fit together into a cohesive, usable workflow.
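As a rough illustration of what "enforcing constraints" between pipeline stages could mean, here is a minimal Python sketch under loudly stated assumptions: the blockout is modelled as a hypothetical 2D grid of walkable (".") and blocked ("#") cells, and the only constraint checked is that a reconstructed scene keeps every walkable cell of the blockout and that its navmesh stays connected (verified with a breadth-first search). All names are illustrative; the actual constraints these systems enforce are not described in the announcement.

```python
from collections import deque
from typing import List, Set, Tuple

Grid = List[str]  # hypothetical blockout/navmesh: "." walkable, "#" blocked


def walkable_cells(grid: Grid) -> Set[Tuple[int, int]]:
    """All (row, col) positions an agent could stand on."""
    return {(r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch == "."}


def navmesh_is_connected(grid: Grid) -> bool:
    """Breadth-first search from one walkable cell; connected if it reaches all of them."""
    cells = walkable_cells(grid)
    if not cells:
        return False
    start = min(cells)
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in cells and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == cells


def accept_reconstruction(blockout: Grid, candidate: Grid) -> bool:
    """Reject a reconstructed scene that blocks floor the blockout marked as walkable
    or that splits the navmesh into unreachable islands."""
    return walkable_cells(blockout) <= walkable_cells(candidate) and navmesh_is_connected(candidate)


blockout = ["....", ".##.", "...."]
print(accept_reconstruction(blockout, ["....", ".##.", "...."]))  # True
print(accept_reconstruction(blockout, ["..#.", ".##.", "..#."]))  # False: floor was blocked
```

The design choice here is that the cheap, symbolic blockout acts as a contract: the expensive generative reconstruction step can be retried or repaired whenever a candidate violates it.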

The State of MLOps 2025 Survey 🔥

MLflow is dominating the experiment tracking space in ML in 2025 with 57% adoption (vs 42% in 2024)! Additionally, it makes me very happy to say that this year Spreadsheets are down to 3% (vs 10% in 2024) - great work everyone 😂!! Next up we have W&B with 8% (vs 7% in 2024) and DVC with 5% (unchanged from 2024). Experiment tracking seems to be by far the most consolidated space across the MLOps lifecycle! We still need your support to keep collecting diverse perspectives to map the ecosystem, so please help us with your response as well as by sharing with your colleagues 🚀🚀🚀 You can add your response directly at: https://forms.gle/KF16EckuxNUKDtDK8 🔥

MCP Protocol UI Extension

The MCP protocol is a living standard that continuously evolves through SEPs (Specification Enhancement Proposals, similar to PEPs in Python); this week a new SEP introduces standardised UIs for MCP servers, which could be a game-changer for usability. The SEP is pretty simple: it standardises how MCP servers can expose interactive UIs to hosts instead of just text/JSON, using pre-declared ui:// resources whose HTML templates are linked to tools via metadata, with all communication going over the existing MCP JSON-RPC transport. This seems quite interesting for AI systems developers as it allows for easier interaction / debugging, but it could also create an ecosystem that is easily usable by both agents (e.g. browser agents) and humans. As the saying goes, at this stage a bad standard is better than no standard - but this is very much a living project, so we can certainly expect a lot of changes (+ breaking changes) in the near future.
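To illustrate the shape of the idea, here is a minimal Python sketch of a server answering two standard MCP JSON-RPC methods (`tools/list` and `resources/read`), where a tool points at a `ui://` HTML template through its metadata. Assumptions, flagged loudly: the metadata key `ui/resourceUri`, the `ui://forecast/card` URI, the `get_forecast` tool and the `text/html` MIME type are hypothetical placeholders, not necessarily the names the SEP settles on, and there is no real transport (messages are passed around as plain JSON strings).

```python
import json

# Hypothetical placeholders: check the SEP for the exact metadata key and MIME type.
UI_META_KEY = "ui/resourceUri"
UI_TEMPLATES = {
    "ui://forecast/card": "<div class='card'><h2>Forecast</h2><p id='out'></p></div>",
}
TOOLS = [
    {
        "name": "get_forecast",  # hypothetical tool
        "description": "Return a short weather forecast for a city.",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "_meta": {UI_META_KEY: "ui://forecast/card"},  # links the tool to its UI template
    }
]


def _error(req_id, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def handle_message(raw: str) -> str:
    """Server side: answer one JSON-RPC 2.0 request received as a JSON string."""
    request = json.loads(raw)
    method, req_id = request.get("method"), request.get("id")
    if method == "tools/list":
        return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": {"tools": TOOLS}})
    if method == "resources/read":
        uri = request.get("params", {}).get("uri")
        if uri not in UI_TEMPLATES:
            return json.dumps(_error(req_id, -32602, f"Unknown resource: {uri}"))
        content = {"uri": uri, "mimeType": "text/html", "text": UI_TEMPLATES[uri]}
        return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": {"contents": [content]}})
    return json.dumps(_error(req_id, -32601, "Method not found"))


def host_fetch_ui(tool_name: str) -> str:
    """Host side: discover the tool, follow its ui:// link, fetch the HTML to render."""
    listing = json.loads(handle_message(json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})))
    tool = next(t for t in listing["result"]["tools"] if t["name"] == tool_name)
    uri = tool["_meta"][UI_META_KEY]
    reply = json.loads(handle_message(json.dumps(
        {"jsonrpc": "2.0", "id": 2, "method": "resources/read", "params": {"uri": uri}})))
    return reply["result"]["contents"][0]["text"]


print(host_fetch_ui("get_forecast"))
```

In this sketch the UI travels as just another resource over the same JSON-RPC channel, so a host that does not render UIs can ignore the metadata and keep using the tool as before.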

DuckDB's Take on the Lakehouse

DuckDB is great for local development, but what if you want to take DuckDB to distributed data-lake scale? In this great video, the DuckDB author dives into the current challenges (+ tech/architecture debt) in lakehouse approaches, as well as how DuckLake tackles them. DuckLake is a new open lakehouse table format from the DuckDB team that replaces the Iceberg/Delta-style metadata stack with a simpler architecture, arguing that the current one is unnecessarily complex. It consists of a transactional SQL database for metadata, plus Parquet files on any storage (S3, NFS, local disk) and stateless compute. This could be quite interesting for ML practitioners, as connecting local development to a production-scale DuckDB environment could become super easy and lightweight. This is definitely a space to watch; personally I see what Iceberg has been doing in the space as incredibly innovative, but any further breakthroughs from tooling competition that make the ecosystem better are always welcome!
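To make the "SQL database for metadata plus plain data files" architecture concrete, here is a toy Python sketch using only the standard library. It illustrates the idea and is not DuckLake's real catalog schema or API: SQLite stands in for the transactional metadata database, CSV files stand in for Parquet, and the `ToyLake` class with its `snapshots` / `data_files` tables is hypothetical. Each append writes an immutable file and then registers it together with a new snapshot in a single catalog transaction, which also gives simple time travel for free.

```python
import csv
import os
import sqlite3
import tempfile
from typing import List, Optional, Sequence


class ToyLake:
    """Toy lakehouse: a transactional SQL catalog tracks immutable data files.

    Hypothetical schema, NOT DuckLake's: SQLite plays the metadata database and
    CSV files play the role of Parquet files on object storage.
    """

    def __init__(self, catalog_path: str, data_dir: str) -> None:
        self.data_dir = data_dir
        os.makedirs(data_dir, exist_ok=True)
        self.db = sqlite3.connect(catalog_path)
        with self.db:  # commits on success, rolls back on error
            self.db.execute("CREATE TABLE IF NOT EXISTS snapshots (id INTEGER PRIMARY KEY)")
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS data_files "
                "(path TEXT NOT NULL, snapshot_id INTEGER NOT NULL)"
            )

    def append(self, rows: Sequence[Sequence[object]]) -> int:
        """Write one immutable data file, then publish it atomically as a new snapshot."""
        fd, path = tempfile.mkstemp(suffix=".csv", dir=self.data_dir)
        with os.fdopen(fd, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        with self.db:  # snapshot + file registration succeed or fail together
            snapshot_id = self.db.execute("INSERT INTO snapshots DEFAULT VALUES").lastrowid
            self.db.execute("INSERT INTO data_files VALUES (?, ?)", (path, snapshot_id))
        return snapshot_id

    def scan(self, snapshot_id: Optional[int] = None) -> List[List[str]]:
        """Read the table as of a snapshot (latest by default) by asking the catalog."""
        if snapshot_id is None:
            snapshot_id = self.db.execute(
                "SELECT COALESCE(MAX(id), 0) FROM snapshots").fetchone()[0]
        paths = self.db.execute(
            "SELECT path FROM data_files WHERE snapshot_id <= ? ORDER BY snapshot_id",
            (snapshot_id,),
        ).fetchall()
        rows: List[List[str]] = []
        for (path,) in paths:
            with open(path, newline="") as f:
                rows.extend(csv.reader(f))
        return rows


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    lake = ToyLake(os.path.join(workdir, "catalog.sqlite"), os.path.join(workdir, "data"))
    first = lake.append([(1, "alpha")])
    lake.append([(2, "beta")])
    print(lake.scan())       # rows from both files (latest snapshot)
    print(lake.scan(first))  # time travel: only rows from the first file
```

For the real thing, DuckDB ships a `ducklake` extension that is attached via an `ATTACH 'ducklake:...'` statement; the DuckLake documentation covers the exact options.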

Upcoming MLOps Events

The MLOps ecosystem continues to grow at breakneck speed, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantastic way to keep on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is why we have started curating a list of upcoming events in the space, which are outlined below.

Upcoming conferences where we're speaking:

WeAreDevelopers 2025 - 9th July @ Berlin
SRE Con EMEA 2025 - 7th Oct @ Dublin
World Summit AI Europe - 8th Oct @ Amsterdam
Code.Talks 2025 - 5th Nov @ Hamburg

Other upcoming MLOps conferences in 2025:

ODSC East - 13th May @ Boston
Data & AI Summit - 9th June @ San Francisco
AI Summit London - 12th June @ UK
MLOps World - 8th-9th Oct @ Austin

Open Source MLOps Tools

Check out the fast-growing ecosystem of production ML tools & frameworks at the GitHub repository, which has reached over 10,000 ⭐ GitHub stars. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to add a PR. Four featured libraries in the GPU acceleration space are outlined below.

Kompute - Blazing fast, lightweight and mobile phone-enabled GPU compute framework optimized for advanced data processing use cases.
CuPy - An implementation of NumPy-compatible multi-dimensional array on CUDA. CuPy consists of the core multi-dimensional array class, cupy.ndarray, and many functions on it.
JAX - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more.
cuDF - Built on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
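Of the four, JAX's "composable transformations" are the easiest to show in a few lines. Below is a small sketch on a made-up least-squares loss (the data and the `loss` function are illustrative only) using `jax.grad`, `jax.vmap` and `jax.jit` from JAX's public API; the same code runs on CPU, and JAX uses a GPU/TPU when one is available.

```python
import jax
import jax.numpy as jnp


def loss(w: jnp.ndarray, x: jnp.ndarray, y: jnp.ndarray) -> jnp.ndarray:
    """Mean squared error of a linear model (illustrative example)."""
    return jnp.mean((x @ w - y) ** 2)


# Compose transformations: differentiate w.r.t. the first argument, then JIT-compile.
grad_loss = jax.jit(jax.grad(loss))

# Vectorize the loss over examples: w is shared (None), x and y are mapped along axis 0.
per_example_loss = jax.vmap(loss, in_axes=(None, 0, 0))

x = jnp.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = jnp.array([1.0, 2.0, 3.0])
w = jnp.zeros(2)

print(grad_loss(w, x, y))         # gradient of the mean loss, shape (2,)
print(per_example_loss(w, x, y))  # one loss per row of x, shape (3,)
```

Because each transformation returns an ordinary function, they stack in any order; for instance `jax.vmap(jax.grad(loss), in_axes=(None, 0, 0))` would give per-example gradients.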

If you know of any open source libraries that are not listed, do give us a heads up so we can add them!

OSS: Policy & Guidelines

As AI systems become more prevalent in society, we face bigger and tougher societal challenges. We have seen a large number of resources that aim to tackle these challenges in the form of AI Guidelines, Principles, Ethics Frameworks, etc.; however, there are so many resources that it is hard to navigate them. Because of this, we started an Open Source initiative that aims to map the ecosystem to make it simpler to navigate. You can find multiple principles in the repo - some examples include the following:

MLSecOps Top 10 Vulnerabilities - This is an initiative that aims to further the field of machine learning security by identifying the top 10 most common vulnerabilities in the machine learning lifecycle, as well as best practices.
AI & Machine Learning 8 principles for Responsible ML - The Institute for Ethical AI & Machine Learning has put together 8 principles for responsible machine learning that are to be adopted by individuals and delivery teams designing, building and operating machine learning systems.
The Ethics of AI Ethics: An Evaluation of Guidelines - A research paper that analyses multiple ethics principles.
ACM's Code of Ethics and Professional Conduct - This is the code of ethics put together in 1992 by the Association for Computing Machinery and updated in 2018.

If you know of any guidelines that are not in the "Awesome AI Guidelines" list, please do give us a heads up or feel free to add a pull request!

About us

The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning.

Check out our website

✉️ Email, 🐦 Twitter, 💼 Linkedin

You received this email because you are registered with The Institute for Ethical AI & Machine Learning's newsletter "The Machine Learning Engineer"

Unsubscribe here