Can large language models collaborate without sending a single token of text? a team of researchers from Tsinghua University, Infinigence AI, The Chinese University of Hong Kong, Shanghai AI ...
Optical character recognition has moved from plain text extraction to document intelligence. Modern systems must read scanned and digital PDFs in one pass, preserve layout, detect tables, extract key ...
In this article we will analyze how Google, OpenAI, and Anthropic are productizing ‘agentic’ capabilities across computer-use control, tool/function calling, orchestration, governance, and enterprise ...
AI companies use model specifications to define target behaviors during training and evaluation. Do current specs state the intended behaviors with enough precision, and do frontier models exhibit ...
In this tutorial, we build an advanced computer-use agent from scratch that can reason, plan, and perform virtual actions using a local open-weight model. We create a miniature simulated desktop, ...
Web agents often fail when layouts shift or when tasks require long sequences. WALT targets this failure mode by mining site functionality offline, then exposing it as tools that encapsulate ...
Liquid AI released LFM2-VL-3B, a 3B parameter vision language model for image text to text tasks. It extends the LFM2-VL family beyond the 450M and 1.6B variants. The model targets higher accuracy ...
In this tutorial, we explore LitServe, a lightweight and powerful serving framework that allows us to deploy machine learning models as APIs with minimal effort. We build and test multiple endpoints ...
Computer-use agents have been limited to primitives. They click, they type, they scroll. Long action chains amplify grounding errors and waste steps. Apple Researchers introduce UltraCUA, a foundation ...
In this tutorial, we explore how to build a small yet functional multi-agent system using the uAgents framework. We set up three agents — Directory, Seller, and Buyer — that communicate via ...
Open vocabulary detectors such as OWL ViT v2 are trained on web scale image text pairs. They generalize well on natural images, yet they struggle when categories are subtle, for example chimney versus ...
What problem does Odyssey target? Protein design couples amino acid sequence with 3D structure and with functional context. Many prior models adopt self attention, which mixes information across the ...