Prompt engineering has become a popular job in the AI industry over the last year, and Anthropic now appears to be developing tools to at least partially automate it.
Anthropic released several new features on Tuesday to help developers create more useful applications with the company’s language model, Claude. According to a company blog post, developers can now generate, test and evaluate prompts using Claude 3.5 Sonnet, applying prompt engineering techniques to create better inputs and improve Claude’s responses on specialized tasks.
Language models are pretty forgiving when you ask them to perform some tasks, but sometimes a small change in the wording of your prompt can dramatically improve your results. Typically, you’d have to come up with that wording yourself or hire a prompt engineer to do it for you, but this new feature gives you immediate feedback that helps you find areas for improvement.
These features are part of Anthropic Console, housed in a new Evaluate tab. Console is the startup’s test kitchen for developers, created to attract companies that want to build products with Claude. One of the features, announced in May, is Anthropic’s built-in prompt generator, which takes a short description of a task and uses Anthropic’s proprietary prompt engineering techniques to create a longer, fuller prompt. While Anthropic’s tool can’t completely replace prompt engineers, the company says it can help new users and save time for experienced prompt engineers.
Within Evaluate, developers can test how effective their AI application’s prompts are across different scenarios. They can upload real-world examples to a test suite or ask Claude to generate an array of test cases. They can then compare the effectiveness of different prompts side by side and rate the sample answers on a five-point scale.
Prompts are run against generated data to find good and bad responses. Image credit: Anthropic
In the example from Anthropic’s blog post, a developer found that their application was giving answers that were too short across multiple test cases. The developer was able to tweak a line of the prompt to make the answers longer and apply the change to all test cases simultaneously. This can save a lot of time and effort, especially for developers with little or no prompt engineering experience.
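To make that workflow concrete, here is a minimal Python sketch of the general idea: run two prompt variants over the same test suite and compare their ratings side by side. It is not Anthropic’s implementation and does not call any Anthropic API. The names `fake_model`, `run_suite`, `grade_length` and `compare` are hypothetical, the model is a stub, and the five-point rating is a toy word-count rule standing in for the scores a developer would assign by hand.

```python
from statistics import mean
from typing import Callable, Dict, List


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a language model call (assumption, not a real API)."""
    # The stub answers at greater length only when the prompt asks for detail.
    if "detailed" in prompt:
        return ("A longer answer with several sentences of explanation "
                "and supporting detail.")
    return "Short answer."


def run_suite(template: str, cases: List[str],
              model: Callable[[str], str]) -> List[str]:
    """Fill the prompt template with each test case and collect the responses."""
    return [model(template.format(input=case)) for case in cases]


def grade_length(response: str) -> int:
    """Toy 1-to-5 rating based only on word count (illustrative assumption)."""
    words = len(response.split())
    return max(1, min(5, 1 + words // 3))


def compare(templates: Dict[str, str], cases: List[str],
            model: Callable[[str], str]) -> Dict[str, float]:
    """Return the average rating of each prompt variant over the whole suite."""
    return {
        name: mean(grade_length(r) for r in run_suite(template, cases, model))
        for name, template in templates.items()
    }


# A small test suite; in practice these would be uploaded or generated examples.
cases = [
    "What is a test suite?",
    "Why compare prompts?",
    "How are answers rated?",
]

# Two prompt variants that differ by a single edited line.
templates = {
    "original": "Answer the question: {input}",
    "revised": "Answer the question with a detailed explanation: {input}",
}

scores = compare(templates, cases, fake_model)
for name, score in scores.items():
    print(f"{name}: average rating {score}")
```

Because the revised template is applied to every case in one pass, a single edit to the prompt is reflected across the whole suite, which is the time saving the example describes.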
In an interview at Google Cloud Next earlier this year, Dario Amodei, CEO and co-founder of Anthropic, said that prompt engineering is one of the most important factors in widespread enterprise adoption of generative AI. “It sounds simple, but oftentimes it takes just 30 minutes of working with a prompt engineer to get an application working that didn’t work before,” Amodei said.