Agent Evaluation: A Detailed Guide (53 minute read)
LLM evaluation has shifted from static benchmarks toward dynamic evaluation of agent systems in realistic settings. Effective evaluation now requires realistic harnesses that test agents over long time horizons in complex environments. This matters because agents are increasingly taking on high-stakes work in areas such as coding and medicine, which calls for rigorous, outcome-oriented performance measurement.
Generalization Dynamics of LM Pre-training (17 minute read)
Language models (LMs) undergo unpredictable switches between parroting patterns and exhibiting adaptive intelligence during pre-training, a phenomenon termed "mode-hopping." This behavior cannot be corrected by standard optimization techniques and presents as a competition for model capacity, influenced by data from each training window. Researchers propose leveraging these dynamics to better select pre-training checkpoints, curate data for stable generalization, and evaluate metrics predicting LM behavior.
Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation (9 minute read)
NVIDIA Cosmos Predict 2.5 generates videos from text and can be adapted to specific tasks such as robot manipulation with LoRA or DoRA, which inject small trainable adapters and keep memory use low. Both methods allow efficient fine-tuning on a single GPU, help prevent catastrophic forgetting, and generate synthetic trajectories quickly. Fine-tuning with either method significantly improves video quality. LoRA is better suited to tight memory budgets, while DoRA is preferred when training is unstable.
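To make the memory argument concrete, here is a minimal sketch of why low-rank adapters are cheap to train. It assumes the standard LoRA formulation, where a frozen weight matrix of shape d_out x d_in gets two trainable factors of shape d_out x r and r x d_in. The layer size and rank below are hypothetical values chosen for illustration, not the actual dimensions of Cosmos Predict 2.5.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by a rank-`rank` LoRA adapter.

    The adapter is two factors, B (d_out x rank) and A (rank x d_in),
    while the original weight matrix stays frozen.
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical layer size and rank, for illustration only.
    d_in, d_out, rank = 4096, 4096, 16
    full = full_params(d_in, d_out)
    lora = lora_params(d_in, d_out, rank)
    print(f"full fine-tuning: {full:,} trainable parameters")
    print(f"LoRA (r={rank}): {lora:,} trainable parameters")
    print(f"fraction trained: {lora / full:.4%}")
```

DoRA additionally learns a per-column magnitude vector on top of the low-rank update, so its trainable parameter count is slightly higher than LoRA's at the same rank.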
HRM-Text (GitHub Repo)
HRM-Text is a 1B text generation model based on the HRM architecture. It can be trained with 130-600x less compute and 150-900x less data than foundation models, making foundation model pretraining accessible. The 0.6B parameter version of the model can be trained on 8 H100s on a single node in about 50 hours for around $800. The 1B parameter model can be trained on 16 H100s on two nodes in about 46 hours for around $1,472.
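As a quick sanity check on those figures, the sketch below converts each quoted run into GPU-hours and the hourly rate it implies. The function names are made up for this example, and the per-GPU-hour rate is derived from the quoted totals rather than stated in the repo summary.

```python
def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total GPU-hours consumed by a training run."""
    return num_gpus * wall_clock_hours


def implied_rate(total_cost_usd: float, num_gpus: int, wall_clock_hours: float) -> float:
    """Cost per GPU-hour implied by a quoted total cost."""
    return total_cost_usd / gpu_hours(num_gpus, wall_clock_hours)


if __name__ == "__main__":
    # (GPUs, hours, quoted cost in USD) as given in the summary above.
    runs = {
        "0.6B": (8, 50, 800),
        "1B": (16, 46, 1472),
    }
    for name, (gpus, hours, cost) in runs.items():
        total = gpu_hours(gpus, hours)
        rate = implied_rate(cost, gpus, hours)
        print(f"{name}: {total:.0f} GPU-hours, about ${rate:.2f} per GPU-hour")
```

Both runs work out to the same implied price per H100-hour, so the two cost estimates are internally consistent.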
Vera Arrives: NVIDIA's First CPU Built for Agents Lands at Top AI Labs (4 minute read)
The first NVIDIA Vera CPUs recently arrived at Anthropic, OpenAI, SpaceXAI, and Oracle. They were hand-delivered by Ian Buck, NVIDIA's Vice President of Hyperscale and High-Performance Computing. Vera features 88 custom NVIDIA-designed Olympus cores, 1.2 TB/s of memory bandwidth, and 50% faster per-core performance. It is the host processor for Vera Rubin NVL72, where each Vera CPU connects to a pair of Rubin GPUs over second-generation NVIDIA NVLink-C2C.