The Sequence Radar #739: Last Week in AI: From Vibes to Verbs: Agent Skills, Haiku 4.5, Veo 3.1, and nanochat
Lots of fun developments for practical AI applications.

Next Week in The Sequence:
A few fun things: we will continue our series about AI interpretability, release a long piece on fine-tuning vs. reinforcement learning that you cannot miss, and dive into Anthropic’s new Agent Skills.

📝 Editorial: Last Week in AI: From Vibes to Verbs: Agent Skills, Haiku 4.5, Veo 3.1, and nanochat

This week in AI was just a lot of fun: the frontier is racing, but the tooling is finally congealing into something you can depend on. Fewer magic tricks, more scaffolding. You can feel the distance compress between an idea, a script, and a shipped product.

Anthropic’s Agent Skills shift agents from “one giant brain” to a set of precisely scoped capabilities you can load on demand. Instead of a universal assistant improvising across everything, Claude can snap into a well-defined mode (say, Excel analyst, RFP writer, or procurement agent), each packaged with instructions, tools, and resources. That sounds mundane, but real enterprises run on checklists, templates, and compliance. By turning those artifacts into first-class skills, you get repeatability, auditability, and fewer accidental side quests. In practice this looks like clean interfaces: a skill declares what it can do, which APIs it can call, and how outputs are formatted. This also reduces context bloat: you don’t stuff the model with the whole company; you mount the one binder that matters and detach it when you’re done.

Alongside that procedural upgrade, Claude Haiku 4.5 leans into the small-but-capable regime. The appeal is not just latency or price; it’s the idea that most work doesn’t need Olympian IQ, it needs a fast, reliable contributor who shows up instantly and follows the playbook. Haiku 4.5 claims near-Sonnet coding quality at a fraction of the cost with materially lower time-to-first-token. When you pair Haiku with Agent Skills, you start designing systems around time-to-useful: a lightweight model spins up, mounts two or three skills (style guide, spreadsheet ops, vendor database), executes within crisp boundaries, then gets out of the way. This is how you scale to thousands of concurrent, low-variance tasks without melting your budget.
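To make the time-to-useful pattern concrete, here is a minimal sketch assuming the Anthropic Python SDK, with a placeholder Haiku model id and two hypothetical skill snippets mounted as scoped instructions. It illustrates the shape of the pattern, not Anthropic’s actual Agent Skills packaging.

```python
# A sketch of "time-to-useful": a small model mounted with only the skills a task needs.
# Assumes the Anthropic Python SDK (pip install anthropic) and ANTHROPIC_API_KEY in the env.
# The skill texts and the model id below are illustrative placeholders, not Anthropic's
# actual Agent Skills format.
from anthropic import Anthropic

SKILLS = {
    "style_guide": "Write in plain English. Bullet points over paragraphs. No jargon.",
    "spreadsheet_ops": "When asked about tabular data, state assumptions, show formulas, "
                       "and return results as a markdown table.",
}

def run_task(task: str, skill_names: list[str]) -> str:
    """Mount only the requested skills as scoped instructions, then call a small model."""
    client = Anthropic()
    mounted = "\n\n".join(f"## Skill: {name}\n{SKILLS[name]}" for name in skill_names)
    response = client.messages.create(
        model="claude-haiku-4-5",   # placeholder id; check Anthropic's current model list
        max_tokens=1024,
        system=f"You operate only within the skills mounted below.\n\n{mounted}",
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text

# Example: run_task("Summarize Q3 vendor spend by category.", ["style_guide", "spreadsheet_ops"])
```

The point is the interface rather than the prompt text: each task mounts only the binders it needs, so the context stays small and the behavior stays auditable.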
On the creative side, Google DeepMind’s Veo 3.1 nudges video generation from “cool clips” toward directable sequences. The headline is control: you can specify characters, locations, objects, and transitions, and iterate toward the kind of continuity normally earned in an editor. Audio gets cleaner, motion is more stable, and the model is less surprised by your intentions. The important mental shift is to treat video synthesis as a programmable pipeline, not prompt roulette. The more granular the handles (shot duration, camera intent, scene constraints), the more you can unit-test narrative structure the same way you test code paths. For teams building ads, explainers, or product demos, this moves generative video from whimsical novelty into an iterative craft.

Finally, Andrej Karpathy’s “nanochat” is this week’s best educational artifact. It’s an end-to-end ChatGPT-style system distilled to the essentials: tokenizer, pretraining, SFT, RL, eval, inference, and a minimal web UI. The superpower here is line-of-sight: every stage is short enough to read and cheap enough to run, so the path from blank GPU to functional chat agent is hours, not weeks. That lowers the barrier for students and teams alike: clone, run, modify, measure. Want to experiment with a custom reward model? Swap a few lines. Curious about inference quirks? Tweak the sampler and observe. In a field that often hides complexity behind opaque stacks, nanochat is a public service: an opinionated baseline you can reason about and extend.

If there’s a theme, it’s specialization with handles: scoped agency that loads the right binder, compact models that cut latency, video systems that expose practical levers, and a reference stack you can actually read. Less spectacle, more engineering. That’s progress.

🔎 AI Research

DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
AI Lab: Apple, Johns Hopkins University
Summary: DeepMMSearch-R1 is a multimodal LLM that performs on-demand, multi-turn web searches with dynamic query generation and cropped-image search, trained via SFT followed by online RL. It also introduces the DeepMMSearchVQA dataset and a three-tool pipeline (text search, grounding/cropping, image search) to enable self-reflection and state-of-the-art results on knowledge-intensive VQA.

Robot Learning: A Tutorial
AI Lab: University of Oxford & Hugging Face
Summary: This tutorial surveys the shift from classical, model-based control to data-driven robot learning, and walks through RL, behavioral cloning, and emerging generalist, language-conditioned robot policies. It also presents the open-source lerobot stack and LeRobotDataset with practical, ready-to-run examples across the robotics pipeline.

Tensor Logic: The Language of AI
AI Lab: Pedro Domingos, University of Washington
Summary: The paper proposes “tensor logic,” a programming model that unifies neural and symbolic AI by expressing rules as tensor equations (einsum) equivalent to Datalog operations, enabling learning and inference in a single framework. It demonstrates how to implement neural nets, symbolic reasoning, kernel machines, and graphical models, and discusses scaling via Tucker decompositions and GPU-centric execution. (A minimal einsum sketch of the Datalog correspondence follows this section.)

Agent Learning via Early Experience
AI Lab: Meta Superintelligence Labs, FAIR at Meta, The Ohio State University
Summary: The authors introduce “early experience,” a reward-free training paradigm where agents use the consequences of their own exploratory actions as supervision, with two concrete strategies: implicit world modeling and self-reflection. Across eight environments, these methods improve effectiveness and OOD generalization and provide strong initializations for downstream RL.

Qwen3Guard Technical Report
AI Lab: Qwen
Summary: Qwen3Guard introduces multilingual safety guardrail models in two variants: Generative (instruction-following tri-class judgments: safe/controversial/unsafe) and Stream (token-level, real-time moderation for streaming). Both are released in 0.6B/4B/8B sizes with support for 119 languages. The report claims state-of-the-art prompt/response safety classification across English, Chinese, and multilingual benchmarks, and the models are released under Apache 2.0.
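To make the Datalog-to-einsum correspondence concrete, here is a minimal NumPy illustration (a sketch of the idea, not code from the paper): the rule grandparent(X, Z) :- parent(X, Y), parent(Y, Z) becomes a single einsum over a Boolean parent matrix, followed by a step back to {0, 1}.

```python
# Illustration (not from the paper): a Datalog rule as a tensor equation.
# The relation parent(X, Y) is a Boolean matrix P; the join-and-project in
#   grandparent(X, Z) :- parent(X, Y), parent(Y, Z)
# is an einsum over the shared variable Y, thresholded back to {0, 1}.
import numpy as np

people = ["ada", "bob", "cara", "dan"]          # indices 0..3
P = np.zeros((4, 4), dtype=np.int64)
P[0, 1] = 1   # parent(ada, bob)
P[1, 2] = 1   # parent(bob, cara)
P[2, 3] = 1   # parent(cara, dan)

# Sum over Y, then step(): any positive count means the fact is derivable.
G = (np.einsum("xy,yz->xz", P, P) > 0).astype(np.int64)

for x in range(4):
    for z in range(4):
        if G[x, z]:
            print(f"grandparent({people[x]}, {people[z]})")
# -> grandparent(ada, cara) and grandparent(bob, dan)
```

Relaxing the Boolean entries to real values is where the neural side of the framework enters; the shape of the equation stays the same.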
🤖 AI Tech Releases

Claude Haiku 4.5
Anthropic released Claude Haiku 4.5, its latest small model, which showcases performance comparable to Sonnet 4.

Veo 3.1
Google DeepMind released the new version of its marquee video generation model.

Qwen3-VL
Alibaba Qwen released Qwen3-VL 4B and 8B, two small models optimized for reasoning and instruction following.

nanochat
Andrej Karpathy released nanochat, a minimal open-source training and inference pipeline for building a ChatGPT-style model. (See the sampling sketch after this list.)

Agent Skills
Anthropic released Agent Skills, a way to specialize Claude for specific tasks with scripts, resources, and instructions.
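The nanochat entry above mentions tweaking the sampler; the snippet below is a generic temperature-plus-top-k sampling step in NumPy, not code from the nanochat repository, just the kind of few-line knob the project makes easy to locate and change.

```python
# Generic temperature + top-k sampling over next-token logits (illustrative;
# not taken from the nanochat repository).
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0, top_k: int = 50,
                      rng: np.random.Generator | None = None) -> int:
    """Pick a token id from raw logits with temperature scaling and top-k filtering."""
    rng = rng or np.random.default_rng()
    scaled = logits / max(temperature, 1e-6)           # temperature -> 0 approaches argmax
    if top_k and top_k < scaled.size:
        cutoff = np.partition(scaled, -top_k)[-top_k]  # k-th largest logit
        scaled = np.where(scaled >= cutoff, scaled, -np.inf)
    probs = np.exp(scaled - scaled.max())              # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(scaled.size, p=probs))

# Example: a sharper distribution (low temperature, small k) collapses toward the top logit.
logits = np.array([2.0, 1.0, 0.5, -1.0])
print(sample_next_token(logits, temperature=0.7, top_k=2))
```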
📡 AI Radar