Fastly releases global cloud AI accelerator to help developers reduce costs and boost performance
Global edge cloud platform Fastly Inc. today announced the launch of its artificial intelligence accelerator, a solution that will help developers build AI apps with lower latency to users, higher performance and reduced costs.
The AI Accelerator reduces application programming interface calls, and the associated bills, through intelligent semantic caching. It sits between a developer's AI application and a large language model, intercepting prompts and serving cached responses for questions that match previous ones, which cuts the number of requests reaching the model and improves performance and speed.
“Right now, one of the biggest weaknesses of the experiences people have online with AI is they’re sitting watching a spinner instead of getting answers and results,” Anil Dash, vice president of developer experience at Fastly, told SiliconANGLE in an interview.
AI applications can process hundreds of thousands of prompts through API calls daily, and many of them are identical or extremely similar. For example, if a developer builds an ordering application for a café, chances are many returning customers will ask about cappuccinos and other coffees straight from the menu.
The menu has a set number of obvious items that many customers will want again and again, so their queries will be very similar. As a result, the prompts they send to the AI engine will often produce the same response, so there is no need to forward the query to the AI model for every customer asking the same question.
For example, “How much would a hot chocolate with mint and whipped cream cost?” and “How much would you charge for hot chocolate with whipped cream and a splash of mint?” are worded differently but are semantically identical. As a result, these two questions would receive the same answer from the AI model.
Doing this not only speeds up the overall performance of an AI app greatly, because the underlying AI model doesn’t need to process and respond to the second question, but it also saves the developer money on tokens.
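The idea can be sketched in a few lines. The toy cache below matches prompts by bag-of-words cosine similarity; this is purely illustrative, since Fastly has not disclosed its matching technique, and real semantic caches typically compare learned sentence embeddings rather than word counts. The `SemanticCache` class and the 0.75 threshold are assumptions for the sketch, not Fastly's API.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count. A production semantic cache
    # would use a learned sentence-embedding model instead.
    return Counter(text.lower().replace("?", "").replace(",", "").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.75):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached response)

    def lookup(self, prompt: str):
        # Return a cached answer if any stored prompt is similar enough;
        # a hit means the LLM is never called, saving time and tokens.
        e = embed(prompt)
        for cached_e, response in self.entries:
            if cosine(e, cached_e) >= self.threshold:
                return response
        return None

    def store(self, prompt: str, response: str):
        self.entries.append((embed(prompt), response))

cache = SemanticCache()
cache.store("How much would a hot chocolate with mint and whipped cream cost?", "$4.50")
```

A rephrased version of the stored question (“How much would you charge for hot chocolate with whipped cream and a splash of mint?”) scores above the threshold and returns the cached price, while an unrelated question falls through to the model.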
Dash said those examples, as well as knowledge bases and chatbots, are exactly where semantic caching gets the biggest performance and cost gains. “In those scenarios, we have seen a two-order magnitude improvement in results,” Dash explained. “It’ll go from a few seconds to milliseconds, and that’s conventional caching behavior.”
What surprised the team at Fastly, however, is that the AI Accelerator does more than improve speed and performance: it also improves the reliability of API calls. The first model supported by Fastly's AI Accelerator technology will be OpenAI's ChatGPT, but Dash said Fastly intends to expand support to include additional models.
“OpenAI has an enviable problem,” Dash said. “They’re scaling to handle unprecedented amounts of adoption… and their API is not super reliable. It has a lot of intermittent issues and challenges. That’s to be expected for the level of maturity and scaling being done.”
Fastly, on the other hand, has a highly mature platform. By caching answers from the OpenAI model, it can increase the reliability of API calls: even if the underlying engine is not responding, the user is shielded from latency and network issues. The result is not only better reliability but higher customer satisfaction with the AI application the developer has built.
To use the Fastly AI Accelerator, developers only need to update their app to use a new API endpoint. The accelerator then transparently implements its semantic caching for any OpenAI-compatible app.
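In practice that means the request payload stays in the OpenAI-compatible format and only the URL changes. The sketch below illustrates the pattern using only the standard library; the accelerator URL shown is a placeholder, since Fastly's actual endpoint is not given in this article, and the model name and API key are likewise stand-ins.

```python
import json
import urllib.request

OPENAI_URL = "https://api.openai.com/v1/chat/completions"
# Placeholder: substitute the real endpoint from Fastly's documentation.
ACCELERATOR_URL = "https://ai-accelerator.example.com/v1/chat/completions"

def build_request(url: str, api_key: str, prompt: str) -> urllib.request.Request:
    # The same OpenAI-style request body works against either endpoint;
    # pointing it at the accelerator is the only change the app makes.
    body = json.dumps({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request(ACCELERATOR_URL, "sk-...", "How much is a cappuccino?")
```

Because the caching layer speaks the same protocol as the model behind it, no other application code needs to change.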
In addition to the release of the AI Accelerator, Fastly announced a new free account tier that will allow coders to set up a new site, create an app or launch a service in a few minutes. Free accounts also have access to the company’s global cloud-based content delivery network, memory and storage, uncapped redirects, page rules and regular expressions.