Anthropic Computer-Use Explained

Anthropic "computer use" is a feature that allows manipulation of a computer desktop.

It involves five steps. Let's illustrate using a simple request.


The five steps

1) Preparing a list of tools that can be used by Anthropic. Such as:

  • A tool that takes a screenshot of your desktop and scales it accordingly.
  • A tool that can control your mouse (move, click, drag).
  • A tool that can run bash commands.

2) Provide Anthropic with a prompt and a list of tools. For example:

  • Can you recognize what OS is this?

3) Anthropic understands the request and tries to pick a tool it needs to complete this.

  • I'll take a screenshot of the screen
  • Tool use: computer
  • Action: screenshot

4) Since the screenshot tool is selected, the client should now take a screenshot of the desktop and submit it as an image vision.

5) Anthropic receives what it asks for and tries to produce the result based on the original prompt.

If the input is enough, you should see the result. Otherwise, Anthropic might request additional inputs/tools by repeating steps 3 and 4. In our case, no further input is required, and we got the following response.

Based on the screenshot, this is macOS (Mac OS X). This can be clearly identified by several characteristics:

1. The Apple logo menu in the top-left corner
2. The characteristic macOS menu bar at the top with items like Finder, File, Edit, View, etc.
3. The familiar macOS traffic light window controls (red, yellow, green buttons) in the active Finder window
4. The Dock at the bottom of the screen with macOS-style application icons
5. The overall UI design and styling which is distinctively Apple/macOS

The interface appears to be a relatively recent version of macOS, given its modern design elements and styling. The computer also has typical macOS applications and features visible like Finder, AirDrop, and iCloud integration.

More Demo

 


AI Summary
gpt-4o-2024-08-06 2024-10-27 17:52:35
The blog post explains the Anthropic "computer use" feature, which allows manipulation of a computer desktop through a series of five steps. This process involves preparing and using tools like screenshot capture and mouse control, and providing prompts to generate specific desktop outputs, such as identifying the operating system. Anthropic uses these inputs to process and deliver results based on the original request.
Chrome On-device AI 2024-11-09 06:06:40

Share Article