Making DL Go Brr w First Principles
A classic, "Making Deep Learning Go Brrrr From First Principles", still offers highly relevant advice to AI teams today: instead of throwing random PyTorch tricks at slow models, it is important to have a clean mental model for diagnosing performance across three regimes: 1) compute-bound, 2) memory-bandwidth-bound, and 3) overhead-bound. Each regime calls for a different optimization path. Compute-bound workloads need better Tensor Core usage or more hardware. Bandwidth-bound workloads benefit most from operator fusion and avoiding unnecessary global memory reads/writes. Overhead-bound workloads usually need tracing, compilation, CUDA Graphs, or lower Python/framework dispatch costs. For us production ML practitioners it's a good reminder that GPU efficiency is not just about bigger accelerators, but about understanding where time is actually spent.
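To make the three regimes concrete, here is a minimal back-of-the-envelope sketch that estimates which cost dominates a single kernel. The `Device` class, the `diagnose` helper, and the hardware numbers are illustrative assumptions rather than anything taken from the article or a real datasheet, and the model ignores overlap between compute and memory traffic.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Illustrative accelerator specs (assumed values, not a real datasheet)."""
    peak_flops: float       # floating-point operations per second
    mem_bandwidth: float    # bytes per second to/from global memory
    launch_overhead: float  # seconds of fixed host-side cost per kernel


def diagnose(flops: float, bytes_moved: float, device: Device) -> str:
    """Return which regime dominates the estimated runtime of one kernel."""
    times = {
        "compute-bound": flops / device.peak_flops,
        "bandwidth-bound": bytes_moved / device.mem_bandwidth,
        "overhead-bound": device.launch_overhead,
    }
    return max(times, key=times.get)


# Hypothetical device: 100 TFLOP/s, 1 TB/s, 10 microseconds per launch.
gpu = Device(peak_flops=100e12, mem_bandwidth=1e12, launch_overhead=10e-6)

n = 4096
# Square fp16 matmul: 2*n^3 FLOPs, three n*n matrices of 2 bytes each.
matmul = diagnose(2 * n**3, 3 * n * n * 2, gpu)
# Elementwise add on n*n fp16 values: one FLOP per element, two reads, one write.
add = diagnose(n * n, 3 * n * n * 2, gpu)
# Tiny elementwise add on 1,000 values: the fixed launch cost dominates.
tiny = diagnose(1_000, 3 * 1_000 * 2, gpu)

print(matmul, add, tiny)
```

Under these assumed numbers the large matmul lands in the compute-bound regime, the large elementwise add in the bandwidth-bound regime, and the tiny add in the overhead-bound regime, which is why each one calls for a different fix.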
Benedict Evans has dropped the 2026 "AI Eats the World" deck, and here are the main highlights: GenAI so far has meant huge capex first, unclear value capture, lots of hype, and only later the boring-but-transformational deployment layer. The model race is still going, but models are converging, infrastructure is getting brutally expensive, and the real leverage is likely to come from teams that turn LLMs into reliable workflow automation. There is still a lot of value to come from new aggregation/discovery layers and from domain-specific products that change how work is done. The most important takeaway is that AI adoption will probably look slow and underwhelming inside enterprises until it suddenly becomes standard and expected.
Google DeepMind has just released Gemini 3.5 Flash! It is positioned as a faster agentic execution model across coding, tool use, multimodal understanding and long-horizon workflows. The interesting bit for production ML practitioners is that Google is pitching Flash as the high-throughput model for real-world agents, and claims strong results for it. For ML teams, the takeaway is the infrastructure pattern: faster frontier models combined with agent harnesses are becoming the default competitive advantage.
(Sk)Forecast Foundation Models
Skforecast is making time-series foundation models much easier to test in real production forecasting workflows: its new release wires in Chronos, TimesFM, Moirai, and TabICL! Really great to see Skforecast leading the charge on making foundation models accessible through new classes (e.g. FoundationModel + ForecasterFoundation) that wrap foundation models behind sklearn-style interfaces. Zero-shot forecasting is becoming something you can benchmark inside existing forecasting pipelines rather than treat as a separate research experiment (and it works surprisingly well). There are still challenges around context length and feature parity: longer windows help models see seasonality and regime patterns, but they also increase inference cost and latency, so teams still need proper backtesting rather than assuming bigger context is better. The examples are also refreshingly honest about production caveats - definitely worth checking out.
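The point about backtesting context length can be illustrated with a small rolling-origin backtest. This sketch deliberately does not use the Skforecast API: `mean_forecaster` is a hypothetical stand-in for a zero-shot model, `backtest` is an illustrative helper, and the series is synthetic with a level shift so that older history is misleading.

```python
def mean_forecaster(context, horizon):
    """Stand-in model: predict the mean of the context window for every step."""
    level = sum(context) / len(context)
    return [level] * horizon


def backtest(series, forecaster, context_length, horizon, n_folds):
    """Rolling-origin backtest; returns mean absolute error across folds."""
    errors = []
    for fold in range(n_folds):
        # Each fold's cutoff sits `horizon` steps before the next fold's cutoff.
        cutoff = len(series) - horizon * (n_folds - fold)
        context = series[max(0, cutoff - context_length):cutoff]
        actual = series[cutoff:cutoff + horizon]
        predicted = forecaster(context, horizon)
        errors.extend(abs(a - p) for a, p in zip(actual, predicted))
    return sum(errors) / len(errors)


# Synthetic series with a regime change: level 10 for 60 steps, then level 20.
series = [10.0] * 60 + [20.0] * 40

# Compare a short and a long context window on the same folds.
results = {
    context_length: backtest(series, mean_forecaster, context_length,
                             horizon=5, n_folds=4)
    for context_length in (10, 50)
}
print(results)
```

On this synthetic series the shorter window scores better because the longer one still averages in the old regime. With a real foundation model the outcome could go either way, which is the reason to measure it on your own data instead of assuming.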
NVIDIA just dropped a high-efficiency open-source stack for high-resolution image, video, and world-model generation! This is an exciting addition for production ML teams because it focuses on the deployment constraints that usually decide whether generative media systems are practical: latency, VRAM, training cost, quantization, and serving integration. NVIDIA positions it as a complete training and inference codebase, with techniques such as linear attention, 32× DC-AE latent compression, Flow-DPM-Solver sampling, few-step sCM distillation, and block causal linear attention for long video generation.
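A rough piece of arithmetic shows why latent compression and linear attention matter for latency and VRAM. The sketch below assumes that "32×" means per-side spatial downsampling and compares it with an assumed baseline of 8× downsampling plus 2×2 patching; the helper names are hypothetical and nothing here is taken from NVIDIA's code.

```python
def latent_tokens(height, width, downsample, patch_size=1):
    """Number of tokens after per-side autoencoder downsampling and patching."""
    side_factor = downsample * patch_size
    return (height // side_factor) * (width // side_factor)


def attention_cost_ratio(tokens_a, tokens_b, quadratic=True):
    """How many times more attention work tokens_a needs than tokens_b."""
    ratio = tokens_a / tokens_b
    return ratio ** 2 if quadratic else ratio


# Assumed baseline: 8x autoencoder with 2x2 patches on a 1024x1024 image.
tokens_8x = latent_tokens(1024, 1024, downsample=8, patch_size=2)
# Assumed reading of "32x": 32x per-side downsampling with no extra patching.
tokens_32x = latent_tokens(1024, 1024, downsample=32, patch_size=1)

# Softmax attention scales with tokens squared, linear attention with tokens.
softmax_ratio = attention_cost_ratio(tokens_8x, tokens_32x, quadratic=True)
linear_ratio = attention_cost_ratio(tokens_8x, tokens_32x, quadratic=False)

print(tokens_8x, tokens_32x, softmax_ratio, linear_ratio)
```

Under these assumptions the higher compression cuts the sequence from 4096 to 1024 tokens, which is 16 times less work for quadratic attention and 4 times less for linear attention.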
Upcoming MLOps Events
The MLOps ecosystem continues to grow at breakneck speed, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantastic way to stay on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is why we have started curating a list of upcoming events in the space, which are outlined below.
Events we are speaking at this year:
Other relevant events:
In case you missed our talks, check our recordings below:
Check out the fast-growing ecosystem of production ML tools & frameworks at the GitHub repository, which has reached over 20,000 ⭐ GitHub stars. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to add a PR. Here are a few featured open source libraries that we maintain:
- SARC - Provides wrappers for popular agentic frameworks to enable guardrails and constraints that are enforced through the flow.
- KAOS - K8s Agent Orchestration Service for managing the KAOS in large-scale distributed agentic systems.
- Kompute - Blazing fast, lightweight and mobile phone-enabled GPU compute framework optimized for advanced data processing use cases.
- Production ML Tools - A curated list of tools to deploy, monitor and optimize machine learning systems at scale.
- AI Policy List - A mature list that maps the ecosystem of artificial intelligence guidelines, principles, codes of ethics, standards, regulation and beyond.
- Agentic Systems Tools - A new list that aims to map the emerging ecosystem of agentic systems with tools and frameworks for scaling this domain.
Please do support some of our open source projects by sharing, contributing or adding a star ⭐
The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning.
You received this email because you are registered with The Institute for Ethical AI & Machine Learning's newsletter "The Machine Learning Engineer".
© 2023 The Institute for Ethical AI & Machine Learning