Every organisation is trying to build its own data agents; but what does this even mean? This great, comprehensive paper tackles the question by standardising terminology and challenges across industry. It argues that "data agents" need a clearer framework to define the various flavours in which they appear before the term becomes meaningless, and proposes a six-level taxonomy from L0 (manual workflows) to L5 (fully autonomous); it is interesting to see terminology emerging even at the fringes of agentic tooling. For production ML practitioners, the useful takeaway is that most real systems today are still better thought of as assisted tooling or partial-autonomy operators rather than truly autonomous end-to-end agents: they can help with tuning, cleaning, querying, retrieval, reporting, and multi-step analysis, but they still need human-designed workflows, guardrails, and supervision. Like everything else in the agentic field, this is a rapidly evolving domain, so it will be interesting to see how it changes even within this year.
|
|
|---|
|
Agentic Engineering Quality As of today, many of the PRs that coding agents write would not be merged; this is a really interesting finding from a recent METR study. The research argues that SWE-bench pass rates materially overstate real-world coding usefulness: across 296 AI-generated PRs reviewed by actual maintainers from scikit-learn, Sphinx, and pytest, roughly half of benchmark-passing patches would still not be merged, even after normalizing against human-written “golden” patches to account for reviewer noise. For production ML practitioners, the key takeaway is a reminder that optimizing for a metric may not improve real performance if the metric is not directly aligned with the actual objective. The paper does not claim that current agents fundamentally cannot improve with better prompting or iteration, but it does show that benchmark scores alone can mislead teams evaluating coding agents for real software workflows.
|
|
|---|
|
"Python's performance sucks" - Yes, but... that's not the end of the story. Can Python be fast? Yes: performance engineering in Python is not a niche concern, so it's important to be aware of the "optimization ladder" available to us, and which rungs we can climb to gain real performance. These are some great options you can use to drive performance gains:
1) Upgrade CPython: newer releases alone deliver non-trivial performance gains.
2) Compile with mypyc: compiling your typed Python can deliver strong wins if your code is already typed.
3) Leverage NumPy/JAX: vectorizable array math can drive massive performance gains.
4) Use Numba: JIT compilation helps particularly for numeric loops over arrays.
5) Go low level: if none of these work, rebuild core components with Cython/Rust/etc.
The most practically useful insight is that realistic pipelines often bottleneck on Python object creation and parsing, not just raw compute, so the biggest gains can come from changing data representations or moving parsing and hot paths out of Python objects entirely. This is a great article on practical Python performance optimizations; it's often best to go back to the foundations to drive the most value.
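As a minimal sketch of the NumPy rung on the ladder above (the function names here are illustrative), the win comes from replacing a per-element Python loop, where every value is boxed as a Python object, with an operation that runs in C over a contiguous buffer:

```python
import numpy as np

def py_sum_of_squares(values):
    # Pure-Python loop: each element is handled as a boxed Python object,
    # so most of the time goes to interpreter and object overhead.
    total = 0.0
    for v in values:
        total += v * v
    return total

def np_sum_of_squares(arr):
    # Vectorized: the whole reduction runs in compiled code over the
    # underlying float64 buffer, with no per-element Python objects.
    return float(np.dot(arr, arr))

data = np.arange(1_000_000, dtype=np.float64)
slow = py_sum_of_squares(data)
fast = np_sum_of_squares(data)
```

Both compute the same quantity (up to floating-point summation order), but the vectorized version is typically one to two orders of magnitude faster on arrays of this size, which is exactly the "change the data representation" insight in practice.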
|
|
|---|
|
Hacking McKinsey's Platform AI agents are making SQL injections ubiquitous again! McKinsey seems to be the latest victim of agentic vulnerabilities: CodeWall claims its autonomous agent found an unauthenticated SQL injection in McKinsey’s internal AI platform (aka Lilli), and chained it with other weaknesses to gain read/write access to production data, including chat logs, files, user accounts, system prompts, and RAG metadata. This is brutal: a stark reminder that AI platforms inherit classic application security risks while adding new, higher-impact failure modes around prompts, RAG data, and agent workflows. For production ML practitioners, the main takeaway is that securing the model is not enough: the real attack surface spans APIs, document pipelines, vector stores, prompt/config storage, and authorization boundaries. Indeed, let's not make SQL injections ubiquitous again!
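The classic mitigation still applies when agents are the ones assembling queries: never interpolate untrusted input into SQL strings, and always bind it as a parameter. A minimal sketch with Python's stdlib sqlite3 (the table and values are illustrative, not from the McKinsey report):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO users (name) VALUES ('alice')")
con.execute("INSERT INTO users (name) VALUES ('bob')")

# Attacker-controlled input, e.g. relayed through an agent or chat UI
user_input = "alice' OR '1'='1"

# UNSAFE: string interpolation lets the quote break out of the literal,
# turning the WHERE clause into a tautology that matches every row.
unsafe_query = f"SELECT id FROM users WHERE name = '{user_input}'"
rows_unsafe = con.execute(unsafe_query).fetchall()

# SAFE: a bound parameter is treated strictly as data, never as SQL
# syntax, so no user literally named this string is matched.
rows_safe = con.execute(
    "SELECT id FROM users WHERE name = ?", (user_input,)
).fetchall()
```

The same discipline extends to the rest of the attack surface the blurb lists: treat vector-store filters, RAG metadata lookups, and agent tool arguments as untrusted input, not as trusted query fragments.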
|
|
|---|
|
We all saw the launch of the MacBook Neo last week; but the real question is: how much can it DuckDB? The answer is of course "Yes": the DuckDB team ran a small benchmark on the entry-level MacBook Neo as a useful reminder for ML practitioners that local analytics performance is increasingly good enough for nontrivial data work, even on constrained hardware. It was surprising to see that DuckDB, with tuned memory limits and out-of-core execution, performed pretty competitively despite the machine having the same chip as the iPhone (and maybe even less RAM than some models?). Of course, if your objective is local data computation, then don't get this hardware; it's more interesting to think about what will be unlocked as more and more capability lands on edge devices. DuckDB really can make low-cost laptops viable for occasional large-scale local analysis, prototyping, and client-side data exploration - and as we know, agents love these.
|
|
|---|
|
Upcoming MLOps Events The MLOps ecosystem continues to grow at break-neck speed, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantastic way to keep on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is the reason why we have started curating a list of upcoming events in the space, which are outlined below.
Events we are speaking at this year:
Other relevant events:
In case you missed our talks, check our recordings below:
|
|
|---|
Check out the fast-growing ecosystem of production ML tools & frameworks at the GitHub repository, which has reached over 20,000 ⭐ GitHub stars. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to add a PR. Here are a few featured open source libraries that we maintain:
- KAOS - K8s Agent Orchestration Service for managing the KAOS in large-scale distributed agentic systems.
- Kompute - Blazing fast, lightweight and mobile-enabled GPU compute framework optimized for advanced data processing use cases.
- Production ML Tools - A curated list of tools to deploy, monitor and optimize machine learning systems at scale.
- AI Policy List - A mature list that maps the ecosystem of artificial intelligence guidelines, principles, codes of ethics, standards, regulation and beyond.
- Agentic Systems Tools - A new list that aims to map the emerging ecosystem of agentic systems with tools and frameworks for scaling this domain.
Please do support some of our open source projects by sharing, contributing or adding a star ⭐
|
|---|
The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning.
|
|
|---|
|
|
You received this email because you are registered with The Institute for Ethical AI & Machine Learning's newsletter "The Machine Learning Engineer".
|
|
|---|
|
© 2023 The Institute for Ethical AI & Machine Learning |
|
|---|
|
|
|