subscribe to be an AI insider

Computer Use in Anthropic Claude API

Definition and Purpose

The Computer Use API is not a standalone API but a feature within Anthropic’s Messages API, specifically designed for Claude 3.5 Sonnet and Claude 3.7 Sonnet models, which are currently in beta as of recent updates. It allows the AI to interact with a computer’s desktop environment, emulating human interactions to automate tasks. This capability is particularly valuable for developers looking to build applications that can handle complex, multi-step processes on a computer, such as data entry, file management, or software testing.

For instance, a developer might instruct Claude to “save a picture of a cat to my desktop,” and the AI would use the Computer Use feature to navigate the file system, open a browser, download the image, and save it, all through simulated user actions. This functionality unlocks a wide range of applications, from automating repetitive office tasks to enhancing software development workflows.

Technical Implementation

To utilize the Computer Use feature, developers must establish a sandboxed computing environment, which includes:

  • A virtual display (using X11 with Xvfb on Linux).
  • A lightweight desktop environment (e.g., Mutter and Tint2).
  • Pre-installed applications like Firefox, LibreOffice, and file managers.
  • Tool implementations for actions such as mouse and keyboard control, as well as screenshot capture.
  • An agent loop that facilitates communication between Claude and the environment.

The process involves the following steps:

  1. Provide Claude with computer use tools and a user prompt via the Messages API, such as “Save a picture of a cat to my desktop.”
  2. Claude assesses the prompt and, if necessary, constructs a tool use request, indicated by a stop_reason of tool_use in the API response.
  3. Developers extract the tool input, evaluate it on a virtual machine or container, and return results via a tool_result content block.
  4. The agent loop continues until the task is complete or reaches a maximum of 10 iterations by default.

Anthropic provides a reference implementation on GitHub (Anthropic’s Computer Use Demo), which includes a Docker-based setup for quick testing. This demo, updated for Claude 3.7 Sonnet, allows developers to interact with the feature via VNC or a web interface, offering a practical starting point.

Recent Updates and Tool Enhancements

As of February 24, 2025, Anthropic released updated versions of tools, decoupling text edit and bash tools from computer use, and introducing new command options for the computer use tool. These updates include commands like “hold_key,” “left_mouse_down,” “left_mouse_up,” “scroll,” “triple_click,” and “wait,” requiring the “computer-use-2025-01-24” anthropic-beta header in API requests. This evolution enhances the feature’s flexibility, particularly for tasks involving spreadsheets and scrolling, with improvements in reliability noted for Claude 3.7 Sonnet.

The token costs and system prompt tokens for these tools vary by model, as shown in the following table:

ModelTool ChoiceSystem Prompt Tokens
Claude 3.5 Sonnet (new)auto, any, tool466, 499
Claude 3.7 Sonnetauto, any, tool466, 499
ToolVersionAdditional Input Tokens
computer20241022 (3.5)683
computer20250124 (3.7)735
text_editor20241022 (3.5)700
text_editor20250124 (3.7)700
bash20241022 (3.5)245
bash20250124 (3.7)245

Pricing details are available in Anthropic’s tool use documentation (Anthropic’s Tool Use Pricing).

Use Cases and Examples

Several companies, including Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company, have explored this feature for tasks requiring dozens or hundreds of steps. For example, Replit uses Claude 3.5 Sonnet’s computer use capabilities to develop an autonomous verifier for their Replit Agent product, evaluating apps during development. Canva explores its potential for supporting design and editing processes, highlighting its versatility in creative and technical workflows.

Risk Mitigation and Safety Considerations

Given the nature of controlling a computer, the Computer Use API poses unique risks, particularly around prompt injection and unauthorized access. Anthropic recommends several mitigation strategies:

  • Operate in a dedicated virtual machine or container with minimal privileges to prevent system attacks.
  • Avoid giving the API access to sensitive accounts or data, such as login credentials, which should be provided in <robot_credentials> tags only when necessary, with careful review for prompt injection risks.
  • Limit internet access to allowlist domains to prevent unintended interactions.
  • Require human confirmation for decisions with real-world consequences, such as financial transactions or cookie acceptance.
  • Use automatic classifiers to flag potential prompt injections, with an opt-out option via Anthropic Support.

These measures ensure that the feature remains safe for use, especially in environments where security is paramount. The documentation also notes that latency may be slow for human-AI interactions, making it more suitable for background tasks rather than real-time user assistance.

Prompting Tips and Limitations

To maximize effectiveness, developers should specify simple, well-defined tasks and prompt for screenshots after each step to evaluate outcomes. For instance, prompts like “After each step, take a screenshot and carefully evaluate if you have achieved the right outcome. Explicitly show your thinking: ‘I have evaluated step X…’” can improve accuracy. Keyboard shortcuts and example screenshots are recommended for repeatable tasks, and login credentials should be handled with caution, following guidelines at Anthropic’s Prompt Injection Mitigation.

Limitations include potential hallucinations in computer vision and tool selection, though improved with Claude 3.7 Sonnet’s thinking capability, enabled with “thinking”: {“type”: “enabled”, “budget_tokens”: 1024}. Scrolling reliability has been enhanced, and spreadsheet interactions improved with new commands, but account creation and content generation on social platforms remain limited. Vulnerabilities like prompt injection persist, necessitating trusted environments and oversight.

Conclusion

Anthropic’s Computer Use API offers a powerful tool for AI-driven automation, with ongoing updates enhancing its capabilities and safety. As of March 12, 2025, it remains in beta, with active development and adoption across various industries. Developers are encouraged to explore its potential while adhering to recommended safety practices, leveraging the provided documentation and resources for implementation.

Compare OpenAI and Anthropic for Agentic Tasks

One response to “Computer Use in Anthropic Claude API”

  1. Compare OpenAI and Anthropic for Agentic Tasks – AIAgentOut Avatar
    Compare OpenAI and Anthropic for Agentic Tasks – AIAgentOut

    […] Computer Use in Anthropic Claude APIIs OpenAI Deepresearch an AI Agent Product? […]

    Like

Leave a reply to Compare OpenAI and Anthropic for Agentic Tasks – AIAgentOut Cancel reply