AI coding agents are transforming software faster than we can establish best practices, which is why it's super refreshing to see Simon Willison's latest takes: there is consensus that there is a world before November and a world after; coding agents are now good enough to write substantial amounts of production-relevant code. Simon describes how his workflow leverages red-green TDD, reusable project templates, and executable validation like running servers and probing APIs, which lets him trust agents more while reducing the need to manually review every line. For production ML practitioners, the important lesson is that agents should be treated as powerful but untrusted collaborators whose output quality depends heavily on the scaffolding (and instructions) around them, especially tests, clear constraints, and consistent codebase patterns. We certainly cannot ignore the security risks, which grow quickly when agents have access to sensitive data, external inputs, and ways to exfiltrate information. It will become more and more important to use sandboxing properly, leverage synthetic data, and minimize permissions that are not necessary.
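The red-green loop mentioned above can be sketched in a few lines. This is a minimal illustration, not Simon's actual code; the `slugify` function and its test are hypothetical names invented for the example:

```python
# Red-green TDD sketch: the agent writes the failing test FIRST (red),
# then implements just enough code to make it pass (green).
# The function below is a hypothetical example, not from the article.

def slugify(title: str) -> str:
    """Lowercase, trim, and join words with hyphens."""
    return "-".join(title.lower().split())

def test_slugify():
    # RED: this assertion is written before slugify exists, so it fails first.
    # GREEN: the implementation above now makes it pass.
    assert slugify("  Hello World ") == "hello-world"

test_slugify()
```

The point is that the executable test, not a human eyeballing the diff, is what grants trust in the agent's output.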
|
|
|---|
|
Recommendation System at Bluesky
Super interesting to see the recommendation system design at Bluesky, which appears to be based on Pinterest's architecture with a few tweaks: Bluesky chose a recommender architecture for their Discover-feed personalization under constraints on data, costs, and ML engineering resources. It's interesting to see that they attempted a two-tower retrieval model but it failed to converge, so they fell back to content-based post embeddings using BLIP2 plus topic models and HDBSCAN clusters to build a basic personalization layer. They are now exploring Pinterest's PinnerSage recsys architecture, which promises to be a better candidate-generation approach because it keeps item embeddings fixed, avoids heavy fine-tuning, and models users as multiple interest vectors rather than a single embedding. For production ML practitioners, the core takeaway is that these architectures come with real tradeoffs: PinnerSage offers an operationally attractive way to capture both long-term and short-term user intent by clustering recent interactions, but it shifts complexity downstream, because multi-interest user representations are straightforward for ANN retrieval yet awkward and expensive to use in ranking.
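The multi-interest idea can be sketched as follows. This is a toy illustration of the pattern only: PinnerSage itself uses Ward clustering over recent pins and serves medoids via ANN indexes, whereas here we use a greedy cosine-threshold clustering and brute-force retrieval, and all names and thresholds are assumptions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def interest_medoids(item_embs, threshold=0.8):
    """Cluster a user's recent item embeddings; one medoid per cluster acts
    as one 'interest vector' (stand-in for PinnerSage's Ward clustering)."""
    clusters = []
    for emb in item_embs:
        for cluster in clusters:
            if cosine(emb, cluster[0]) >= threshold:
                cluster.append(emb)
                break
        else:
            clusters.append([emb])
    # Medoid = the member maximizing total similarity to its own cluster;
    # crucially, it is an EXISTING item embedding, so item embeddings stay fixed.
    return [max(c, key=lambda m: sum(cosine(m, o) for o in c)) for c in clusters]

def retrieve(medoids, corpus, k=1):
    """Candidate generation: one nearest-neighbour lookup per interest vector
    (brute force here; an ANN index in production)."""
    candidates = set()
    for m in medoids:
        ranked = sorted(range(len(corpus)), key=lambda i: -cosine(m, corpus[i]))
        candidates.update(ranked[:k])
    return candidates
```

A user who engaged with two distinct topics yields two medoids, and retrieval unions the neighbours of each, which is exactly where the downstream ranking complexity comes from: the ranker now receives candidates justified by different user intents.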
|
|
|---|
|
Sebastian Raschka has dropped another masterclass on open-weight LLM architectures, this time sharing key insights from the Jan-Feb 2026 launches: no single architecture has emerged as dominant, but the field is clearly converging on a shared set of trends and best practices. Some of these global trends include better long-context efficiency, lower KV-cache / latency costs, stronger coding / agentic performance, and more practical quality-per-token tradeoffs. Across the models from this year, the main pattern is the rise of increasingly specialized efficiency techniques such as hybrid attention, sliding-window attention, MLA, sparse attention, and multi-token prediction - especially in large MoE systems like GLM-5, Kimi K2.5, Qwen3.5, and Ling 2.5. The key takeaway for production ML teams is that architecture still matters, but less as a search for one universally best design and more as a way to optimize for serving constraints, context length, throughput, and workload fit.
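To make one of these efficiency techniques concrete, here is a minimal sketch of a sliding-window (local causal) attention mask, the mechanism several of these models use to cap KV-cache growth. This is an illustrative toy, not any specific model's implementation:

```python
def sliding_window_mask(seq_len, window):
    """Causal sliding-window attention mask: token i may attend only to
    tokens in [i - window + 1, i]. With full causal attention the KV cache
    grows O(seq_len); with a sliding window it is capped at O(window)."""
    return [
        [1 if 0 <= i - j < window else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

For `seq_len=5, window=3`, the last token attends only to positions 2-4 instead of all five, which is where the long-context and latency savings come from.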
|
|
|---|
|
Building an MCP Ecosystem at Pinterest
The race to make AI agents actually useful in production will be won or lost on platform design, not model hype, and Pinterest's MCP ecosystem learnings show how much leverage this can have: Pinterest is building an internal platform for production MCP services that feed agent workflows. They established an ecosystem of cloud-hosted MCP servers with a central discovery and governance layer, added a shared deployment path so teams can publish tools without owning all the infrastructure, and integrated these servers into the IDE, chat, and internal AI surfaces engineers already use. The main lesson for production ML practitioners is that MCP servers only become operationally useful when paired with strong platform controls like registry-based approval and layered authn/authz with user JWTs / service identities.
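The layered-controls idea can be sketched as a simple gate: a tool call must pass the central registry check AND the caller's own JWT scopes. All names here (the registry contents, scope strings) are hypothetical, not Pinterest's actual implementation:

```python
# Hypothetical central registry: only approved MCP servers appear here,
# each with the scopes the platform allows it to expose.
APPROVED_SERVERS = {
    "search-tools": {"scopes": {"read"}},
}

def authorize_tool_call(server, scope, user_jwt_claims):
    """Layered check: (1) server approved in the registry,
    (2) scope allowed for that server, (3) scope present in the user's JWT."""
    entry = APPROVED_SERVERS.get(server)
    if entry is None:
        return False  # not in the registry -> blocked regardless of the user
    if scope not in entry["scopes"]:
        return False  # server never approved for this capability
    return scope in user_jwt_claims.get("scopes", [])
```

The design point is that each layer fails closed: a rogue server, an over-broad tool, or an under-privileged user each independently blocks the call.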
|
|
|---|
|
Forecasting is where machine learning stops being interesting and starts being operationally decisive: better predictions directly shape revenue, inventory, risk, capacity, and planning, and even modest accuracy gains can compound into major business impact at scale. Migas 1.5 presents a pragmatic multimodal forecasting architecture for production settings: instead of training a single end-to-end model over text and time series, it keeps a standard time-series foundation model as the forecasting backbone, uses language models to extract structured contextual signals from text, and then applies a learned correction model to adjust the baseline forecast. The reported results across 86 real-world multimodal datasets suggest that this setup materially improves accuracy over unimodal baselines, especially in short-history or regime-shift scenarios where historical values alone are insufficient, with gains of up to 14.2% MAE reduction. For ML practitioners, the most notable contribution is less the benchmark win itself than the systems pattern it implies: event-aware forecasting can be added modularly to existing pipelines, and scarce aligned text-plus-time-series supervision can be bootstrapped with synthetic annotations generated by LLMs, though teams should still validate carefully for leakage, annotation quality, and robustness to noisy context. |
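The modular pattern described above, keep the time-series backbone, bolt a correction on top, can be sketched in a few lines. This is a deliberately naive stand-in under assumed names: the real system uses a time-series foundation model as the backbone and a learned correction model over LLM-extracted text signals, whereas here both are trivial placeholders:

```python
def baseline_forecast(history, horizon):
    """Stand-in for a time-series foundation model backbone:
    a naive last-value forecast (illustrative only)."""
    return [history[-1]] * horizon

def correct(base, event_signal, sensitivity=0.5):
    """Stand-in for the learned correction model: scale the baseline by a
    text-derived event signal (e.g. +1.0 for a promotion announcement,
    -1.0 for a reported outage). 'sensitivity' would be learned, not fixed."""
    return [y * (1 + sensitivity * event_signal) for y in base]

# Short history + an upcoming promotion extracted from text by an LLM:
history = [100, 102, 101]
adjusted = correct(baseline_forecast(history, horizon=2), event_signal=1.0)
```

Because the correction is a separate module, it can be attached to an existing forecasting pipeline without retraining the backbone, which is exactly the systems appeal noted above; the leakage risk is equally visible here, since a careless event signal could encode the future values it is meant to predict.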
|
|
|---|
|
Upcoming MLOps Events
The MLOps ecosystem continues to grow at break-neck speed, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantastic way to keep on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is the reason why we have started curating a list of upcoming events in the space, which are outlined below.
Events we are speaking at this year:
Other relevant events:
In case you missed our talks, check our recordings below:
|
|
|---|
Check out the fast-growing ecosystem of production ML tools & frameworks at the GitHub repository, which has reached over 20,000 ⭐ GitHub stars. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to add a PR. Here are a few featured open source libraries that we maintain:
- KAOS - K8s Agent Orchestration Service for managing the KAOS in large-scale distributed agentic systems.
- Kompute - Blazing fast, lightweight and mobile phone-enabled GPU compute framework optimized for advanced data processing usecases.
- Production ML Tools - A curated list of tools to deploy, monitor and optimize machine learning systems at scale.
- AI Policy List - A mature list that maps the ecosystem of artificial intelligence guidelines, principles, codes of ethics, standards, regulation and beyond.
- Agentic Systems Tools - A new list that aims to map the emerging ecosystem of agentic systems with tools and frameworks for scaling this domain.
Please do support some of our open source projects by sharing, contributing or adding a star ⭐
|
|---|
The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning.
|
|
|---|
|
|
You received this email because you are registered with The Institute for Ethical AI & Machine Learning's newsletter "The Machine Learning Engineer"
|
|
|---|
|
© 2023 The Institute for Ethical AI & Machine Learning
|
|---|
|
|
|