| |
China's Alibaba continues to challenge the AI status quo, now with the release of QWEN-3.6-plus which brings some really interesting performance innovations: Qwen3.6-Plus is Alibaba's latest hosted frontier model for real-world agent workflows with the biggest gains in agentic coding + tool use + multimodal reasoning. For production ML practitioners, the practical takeaway is that Qwen is optimizing more for end-to-end task completion across repository-level coding, terminal operations, web/UI generation, document and video understanding, and multimodal agent loops. The model exposes a 1M-token context window which supports OpenAI- and Anthropic-compatible APIs via Model Studio and adds a preserve_thinking option whcih improves multi-step agent consistency as well as reducing redundant reasoning. From the benchmarks we can see that (allegedly) it is especially strong on coding-agent, planning, multilingual, OCR/document, and visual grounding tasks; so far performance remains mixed versus top competitors on some general reasoning and long-context evaluations. However overall this seems quite interesting to see the aggressive and close competition on a space that we were considering unbeatable only a year ago. |
|
|
|---|
|
| | Google DeepMind redefines AI efficiency with extreme compression via TurboQuant: TurboQuant removes much of the usual metadata overhead in vector quantization, letting KV caches and vector indices run at much lower bitwidths without the usual quality penalty. The method combines compression with a 1-bit quantized residual correction step, which preserves attention accuracy and keeps memory overhead near zero. In Google's reported experiments on long-context benchmarks and vector search, TurboQuant compressed KV cache representations down to 3 bits without training or fine-tuning, delivered at least 6x KV memory reduction on needle-in-a-haystack tasks, and showed up to 8x faster attention-logit computation at 4-bit versus 32-bit keys on H100s. For production ML practitioners, the takeaway is that this type of optimization creates clear opportunities: lower memory bandwidth pressure, cheaper long-context serving, and faster high-dimensional retrieval with minimal accuracy tradeoff. |
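As a toy illustration of the idea described above (coarse low-bit codes plus a 1-bit residual correction), here is a minimal sketch. This is NOT the actual TurboQuant algorithm, just the general quantize-then-correct pattern it builds on:

```python
import numpy as np

def quantize_with_residual(x, bits=4):
    """Toy sketch: uniform low-bit quantization, then nudge each value by a
    quarter step in the direction of its residual (1 extra bit per element).
    Illustrative only -- not the TurboQuant method itself."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels
    q = np.round((x - lo) / scale)         # integer codes in [0, levels]
    dequant = q * scale + lo               # coarse reconstruction
    residual_sign = np.sign(x - dequant)   # the 1-bit residual
    return dequant + residual_sign * scale / 4

rng = np.random.default_rng(0)
x = rng.standard_normal(1024).astype(np.float32)
x4 = quantize_with_residual(x, bits=4)
err = np.abs(x - x4).mean()
print(f"mean abs error at 4 bits + 1-bit residual: {err:.4f}")
```

The 1-bit correction halves the worst-case rounding error for one extra bit of storage, which is the kind of accuracy-per-bit tradeoff the paper pushes much further.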
|
|
|---|
|
MIT Flow Matching & Diffusion Models
Super excited to see a brand new 2026 course from MIT on the same class of generative models used at OpenAI, Anthropic and other AI giants: MIT is providing its 2026 course 6.S184 on diffusion and flow matching models for FREE, for practitioners who want hands-on experience. The course is a comprehensive introduction covering the math behind modern generative models, across ODEs, SDEs, the Fokker–Planck equation, score matching, classifier-free guidance, latent diffusion, and discrete diffusion. It also includes comprehensive hands-on labs that walk learners through building key components and ultimately a latent diffusion model from scratch. For production ML practitioners this brings a lot of value: not just shipping a model, but gaining the conceptual and hands-on foundation needed to understand how today’s image and video generators work. |
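For a taste of the kind of material such a course covers, the standard flow matching setup can be illustrated numerically: along the linear interpolant x_t = (1 - t)·x0 + t·x1, the conditional velocity target a model regresses onto is simply x1 - x0, and following that velocity from any x_t lands exactly on the data point. A minimal sketch (not course code):

```python
import numpy as np

# Linear interpolant between a noise sample x0 and a data sample x1.
def interpolant(x0, x1, t):
    return (1.0 - t) * x0 + t * x1

# Conditional velocity target for the straight-line path: constant in t.
def conditional_velocity(x0, x1):
    return x1 - x0

x0 = np.array([0.0, 0.0])    # noise sample
x1 = np.array([2.0, -1.0])   # data sample
t = 0.25
xt = interpolant(x0, x1, t)

# Integrating dx/dt = v from time t to 1 recovers x1 exactly,
# since the velocity is constant along this path.
x_end = xt + (1.0 - t) * conditional_velocity(x0, x1)
print(xt, x_end)
```

In training, a neural network approximates this velocity averaged over all (x0, x1) pairs; the labs build that up step by step.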
|
|
|---|
|
ngrok on Quantization from Scratch: Quantization is now one of the most important methods for driving efficiency in real-world AI at scale, and this is a great from-scratch deep dive: For production ML practitioners, quantization matters because model size is dominated by weights, many of which carry little information (near zero-valued), and LLMs are surprisingly tolerant of storing parameters in lower-precision formats or compact integer representations instead of full-precision floats, resulting in major savings. The key takeaway, demonstrated on Qwen3.5 9B, is that 8-bit quantization can preserve quality almost entirely; 4-bit quantization shows modest degradation, and 2-bit quantization starts to collapse. This is a great post from ngrok, do check it out for the full deep dive into ML quantization. |
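As a minimal from-scratch illustration of the kind of technique the post covers (this is a generic sketch, independent of ngrok's own code), here is symmetric absmax quantization of a weight vector to int8 and back:

```python
import numpy as np

# Symmetric absmax quantization: map floats in [-absmax, absmax]
# onto int8 codes in [-127, 127] with a single shared scale.
def quantize_int8(w):
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)   # stored weights: 1 byte each
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
rel_err = np.abs(w - w_hat).mean() / np.abs(w).mean()
print(f"int8 storage: {q.nbytes} bytes vs fp32 {w.nbytes} bytes, "
      f"mean relative error: {rel_err:.4%}")
```

This shows the core tradeoff the post explores: a 4x storage reduction for a relative error of around one percent, which is why 8-bit is usually the "free lunch" end of the quantization spectrum.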
|
|
|---|
|
| |
Stanford is releasing its new 2026 course on Transformer models for FREE! CS25 Transformers United V6 has fantastic, updated content for practitioners diving into the field: The course offers broad coverage of the evolving frontier, going beyond vanilla transformer models into the architectures powering the field today. Rather than teaching one deployment recipe, it curates talks from leading researchers and practitioners across core model architectures and adjacent paradigms, covering transformers, JEPA, state space models, and real-world perspectives from companies like Hugging Face, Anthropic, DeepMind, and Modal. This may end up being one of the most relevant courses on the topic once all the lectures are updated, so keep an eye out as the course material becomes available! |
|
|
|---|
|
Upcoming MLOps Events The MLOps ecosystem continues to grow at break-neck speed, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantastic way to keep on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is why we have started curating a list of upcoming events in the space, outlined below.
Events we are speaking at this year:
Other relevant events:
In case you missed our talks, check our recordings below:
|
|
|---|
| | |
Check out the fast-growing ecosystem of production ML tools & frameworks at the github repository which has reached over 20,000 ⭐ github stars. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to add a PR. Here's a few featured open source libraries that we maintain: - KAOS - K8s Agent Orchestration Service for managing the KAOS in large-scale distributed agentic systems.
- Kompute - Blazing fast, lightweight and mobile-enabled GPU compute framework optimized for advanced data processing use cases.
- Production ML Tools - A curated list of tools to deploy, monitor and optimize machine learning systems at scale.
- AI Policy List - A mature list that maps the ecosystem of artificial intelligence guidelines, principles, codes of ethics, standards, regulation and beyond.
- Agentic Systems Tools - A new list that aims to map the emerging ecosystem of agentic systems, with tools and frameworks for scaling this domain.
Please do support some of our open source projects by sharing, contributing or adding a star ⭐ |
|
|---|
| | |
| | | | The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning. | | | | |
|
|
|---|
|
|
This email was sent to You received this email because you are registered with The Institute for Ethical AI & Machine Learning's newsletter "The Machine Learning Engineer" |
| | | | |
|
|
|---|
|
© 2023 The Institute for Ethical AI & Machine Learning |
|
|---|
|
|
|