LLMs are Databases, Really
Are LLMs databases? Yes. Can we treat them like a graph database? Also yes. What does this look like? It's actually quite interesting: recently there have been projects that treat LLMs as indexable, write-enabled databases, loading the models as graph-database structures queryable through a SQL-style language. One such project is LARQL, a query language that turns any transformer into a queryable graph-like knowledge store, where internal features can be inspected, traversed, and even edited through a SQL-style interface. A great way to show this is by loading Gemma 3 as the demo target, where we can query entities, relations, and nearest-neighbor feature clusters from model internals. This enables not just inference and path-tracing, but also raises questions about how our thinking on LLMs as optimizable systems will evolve; it is an exciting glimpse into what could be a large field of research in the making. It is still clearly an early-stage prototype rather than production-ready serving infrastructure, but it is a compelling direction for anyone thinking about controllable model editing, efficient inference, and new abstractions for working with model knowledge.
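To make the idea concrete, here is a minimal Python sketch of the underlying concept: treating internal model features as vectors in a small in-memory store and answering nearest-neighbor queries over them. The feature names, vectors, and functions here are illustrative assumptions; LARQL's actual syntax and feature-extraction pipeline are not shown.

```python
import math

# Toy stand-in for a model's internal feature store: each named feature
# maps to an activation-space vector. A real system would extract these
# from transformer layers; these values are made up for illustration.
features = {
    "paris":  [0.9, 0.1, 0.0],
    "london": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.9, 0.4],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nearest_neighbors(name, k=2):
    """Rank the other features by cosine similarity to the query feature."""
    q = features[name]
    ranked = sorted(
        ((other, cosine(q, v)) for other, v in features.items() if other != name),
        key=lambda t: t[1],
        reverse=True,
    )
    return [other for other, _ in ranked[:k]]

print(nearest_neighbors("paris", k=1))  # capital-city features cluster together
```

A graph-style query language over this would then layer traversal and editing operations on top of exactly this kind of similarity lookup.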
NVIDIA wants to provide an operating system for multi-model agentic orchestration, and this brings quite a few interesting architectural learnings: Once coding agents and multi-agent swarms start making hundreds of sequential calls with shared history, the dominant bottleneck becomes keeping the KV-cache warm, models routable, and paths reusable across workers. NVIDIA's core argument is that self-hosted agent stacks need tighter coordination across frontend APIs, routing, and cache lifecycle management. This includes supporting modern agent protocols (responses / messages), exposing harness-side metadata like priority and expected output length through agent_hints, routing by KV overlap instead of round-robin, and treating cache blocks differently depending on whether they hold persistent context or ephemeral reasoning/subagent state. The practical takeaway is that if you are running open models for agentic workloads, you likely need to start thinking beyond “throughput per GPU” and toward session-aware inference infrastructure with cache-aware routing, selective retention, multi-tier KV storage, and eventually prefetching.
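As a rough illustration of the routing idea (not NVIDIA's actual scheduler, and the worker layout below is an assumption), a cache-aware router can score each worker by how much of the request's token prefix is already in that worker's KV cache, instead of rotating round-robin:

```python
# Minimal sketch of KV-overlap routing: send each request to the worker
# whose cached token prefix shares the most tokens with the request.

def prefix_overlap(a, b):
    """Length of the shared leading-token prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(request_tokens, workers):
    """workers: worker_id -> list of cached token prefixes."""
    best_worker, best_overlap = None, -1
    for worker_id, prefixes in workers.items():
        overlap = max(
            (prefix_overlap(request_tokens, p) for p in prefixes), default=0
        )
        if overlap > best_overlap:
            best_worker, best_overlap = worker_id, overlap
    return best_worker

workers = {
    "gpu-0": [[1, 2, 3, 4]],  # warm cache for a shared system prompt
    "gpu-1": [[9, 9]],
}
print(route([1, 2, 3, 99], workers))  # gpu-0: 3 tokens of prefix are reusable
```

A production router would also weigh load and the agent_hints metadata (priority, expected output length) alongside the overlap score.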
CanIRun.ai locally? Yes. This is a great resource for exploring local inference tooling: We've all come to the question of "what model can this machine actually run?"; this resource provides a fast answer, using hardware detection to estimate tokens per second for each model. This is of course not perfect benchmarking, but it at least helps us quickly narrow model choices for local copilots, offline workflows, and edge deployments without manually piecing together VRAM charts and quantization assumptions. It's also good to see that the methodology is transparent, making clear that the estimates are heuristics since actual performance still depends on runtimes, drivers, thermal limits, and so on. Check it out!
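The back-of-the-envelope arithmetic behind such estimates looks roughly like this; the overhead multiplier below is an assumed placeholder for KV cache and activations, not CanIRun.ai's actual formula:

```python
def estimated_vram_gb(params_billions, bits_per_weight, overhead=1.2):
    """Rough VRAM estimate: weights at the quantized precision, plus a
    fixed multiplier for KV cache / activations (assumed, not exact)."""
    weight_gb = params_billions * bits_per_weight / 8  # 1B params ~ 1 GB at 8-bit
    return weight_gb * overhead

# A 7B model at 4-bit quantization: ~3.5 GB of weights, ~4.2 GB total.
print(round(estimated_vram_gb(7, 4), 1))  # 4.2
```

This is exactly why such tools stay honest about being heuristics: the same model can land on either side of a VRAM budget depending on context length and runtime overheads.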
Kafka Guide to Distributed Messaging
Event-driven systems are one of the quiet foundations of production ML and modern software; they shape how reliably data moves, how quickly systems react, and how well platforms scale under real-world load. Here's a great guide around distributed messaging: Service-to-service communication is evolving across many use cases from synchronous calls into durable event streams, with the backbone often provided by Kafka through topics, partitions, offsets, and consumer groups. The key production takeaway for ML practitioners is that Kafka can be the backbone for decoupled feature pipelines, inference events, retraining triggers, and replayable state changes. This is particularly relevant where partitioning defines the scalability and ordering trade-offs (i.e. delivery guarantees), and where consumer-group design defines parallelism and fault recovery. This is one of the best guides providing an end-to-end overview, covering even the move from ZooKeeper to KRaft along with a lot of great fundamentals.
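The per-key ordering guarantee can be sketched in a few lines. Kafka's default partitioner actually uses a murmur2 hash of the key bytes; the crc32 below is a stand-in assumption, but the principle is the same: every event for one key hashes to one partition, which preserves per-key order while partitions scale out in parallel.

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    # Stand-in for Kafka's murmur2-based default partitioner.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

events = [("user-42", "login"), ("user-7", "click"), ("user-42", "logout")]
partitions = {}
for key, value in events:
    partitions.setdefault(partition_for(key), []).append((key, value))

# All of user-42's events sit on one partition, in produce order.
target = partition_for("user-42")
print([v for k, v in partitions[target] if k == "user-42"])  # ['login', 'logout']
```

This is also why choosing the partition key is an architectural decision: it fixes both your ordering unit and your maximum consumer parallelism.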
What do 81,000 people want from AI? Anthropic has taken the question and provided a thorough report with great key insights: Anthropic gathered 81,000 open-ended interviews across 159 countries and 70 languages, finding that people mostly want AI to improve professional effectiveness, personal transformation, life management, and time freedom. In their report they outline that 81% say AI has already delivered some value, especially through productivity, cognitive partnership, learning, accessibility, and research synthesis. However, the most common concerns were unreliability, economic displacement, loss of autonomy, and cognitive atrophy, which makes it clear that a large percentage are still critical. For ML practitioners, the key takeaway is that successful AI products will be won not just through better models, but through key principles that are implemented in practice, such as reliability, verification, and human-in-the-loop design; basically, the fundamentals of software design are still relevant today!
Upcoming MLOps Events
The MLOps ecosystem continues to grow at break-neck speed, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantastic way to keep on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is the reason why we have started curating a list of upcoming events in the space, which are outlined below.
Events we are speaking at this year:
Other relevant events:
In case you missed our talks, check our recordings below:
Check out the fast-growing ecosystem of production ML tools & frameworks at the GitHub repository, which has reached over 20,000 GitHub stars ⭐. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to add a PR. Here are a few featured open source libraries that we maintain: - KAOS - K8s Agent Orchestration Service for managing the KAOS in large-scale distributed agentic systems.
- Kompute - Blazing fast, lightweight and mobile-enabled GPU compute framework optimized for advanced data processing use cases.
- Production ML Tools - A curated list of tools to deploy, monitor and optimize machine learning systems at scale.
- AI Policy List - A mature list that maps the ecosystem of artificial intelligence guidelines, principles, codes of ethics, standards, regulation and beyond.
- Agentic Systems Tools - A new list that aims to map the emerging ecosystem of agentic systems with tools and frameworks for scaling this domain.
Please do support some of our open source projects by sharing, contributing or adding a star ⭐
The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning.
You received this email because you are registered with The Institute for Ethical AI & Machine Learning's newsletter "The Machine Learning Engineer".
© 2023 The Institute for Ethical AI & Machine Learning |