Kinara Edge AI processor tackles computation demands of generative AI and transformer-based models

05 January 2024

Kinara, Inc., has launched the Kinara Ara-2 Edge AI processor, powering edge servers and laptops with high-performance, cost-effective, and energy-efficient inference to run applications such as video analytics, Large Language Models (LLMs), and other generative AI models.

The Ara-2 is also suitable for edge applications running traditional AI models and state-of-the-art AI models with transformer-based architectures. With an enhanced feature set and more than five to eight times the performance of the first-generation Ara-1 processor, Kinara’s Ara-2 combines real-time responsiveness with high throughput. It pairs the company's proven latency-optimised design with balanced on-chip memories and high off-chip bandwidth to execute very large models with extremely low latency.

LLMs and generative AI in general have become incredibly popular, but most of the associated applications run on GPUs in data centres and are burdened with high latency, high cost, and questionable privacy. To overcome these limitations and put the computation literally in the hands of the user, Kinara’s Ara-2 simplifies the transition to the edge with its support for the tens of billions of parameters used by these generative AI models. Furthermore, to ease migration from expensive GPUs for a wide variety of AI models, the computation engines in Ara-2 and the associated software development kit (SDK) are specifically designed to support high-accuracy quantisation, a dynamically moderated host runtime, and direct FP32 support.
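To illustrate why quantisation matters when moving large models to the edge, the following is a minimal sketch of generic symmetric INT8 quantisation together with a weight-storage estimate. It is not Kinara's quantiser or SDK API: the function names, the example weights, and the choice of INT8 are assumptions made for illustration, and the 7-billion-parameter count simply mirrors the LLaMA-7B model cited in this article.

```python
# Illustrative only: generic symmetric per-tensor INT8 quantisation.
# This is NOT Kinara's quantiser; all names and values here are hypothetical.

def quantise_int8(weights):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantise(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


def weight_footprint_gb(num_params, bits_per_weight):
    """Weight storage in GB (10**9 bytes); ignores activations and overhead."""
    return num_params * bits_per_weight / 8 / 1e9


weights = [0.42, -1.27, 0.05, 0.90]  # made-up example values
codes, scale = quantise_int8(weights)
restored = dequantise(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

fp32_gb = weight_footprint_gb(7_000_000_000, 32)  # 7B parameters at FP32
int8_gb = weight_footprint_gb(7_000_000_000, 8)   # same model at INT8

print(codes, f"max error {max_error:.4f}")
print(f"FP32 weights: {fp32_gb} GB, INT8 weights: {int8_gb} GB")
```

Under these assumptions, the weights of a 7-billion-parameter model shrink from 28 GB at FP32 to 7 GB at INT8, while the rounding error per weight stays within half of one quantisation step.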

“With Ara-2 added to our family of processors, we can better provide customers with performance and cost options to meet their requirements. For example, Ara-1 is the right solution for smart cameras as well as edge AI appliances with 2-8 video streams, whereas Ara-2 is strongly suited for handling 16-32+ video streams fed into edge servers, as well as laptops, and even high-end cameras,” said Ravi Annavajjhala, Kinara’s CEO. “The Ara-2 enables better object detection, recognition, and tracking by using its advanced computation engines to process higher resolution images more quickly and with significantly higher accuracy. And as an example of its capabilities for processing generative AI models, Ara-2 can hit roughly 0.5s per iteration for Stable Diffusion and tens of tokens/sec for LLaMA-7B.”
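As a rough guide to what the quoted figures could mean for a user, the sketch below converts them into end-to-end latencies. Only the 0.5 s per iteration figure comes from the quote. The step count, the token rate of 20 tokens/sec (one possible reading of "tens of tokens/sec"), and the reply length are hypothetical assumptions, not Kinara measurements.

```python
# Back-of-envelope latency estimate; every ASSUMED_* value is hypothetical.

SD_SECONDS_PER_ITERATION = 0.5   # "roughly 0.5s per iteration" (from the quote)
ASSUMED_DENOISING_STEPS = 20     # assumption: steps per generated image
ASSUMED_TOKENS_PER_SECOND = 20   # assumption: one reading of "tens of tokens/sec"
ASSUMED_REPLY_TOKENS = 200       # assumption: length of one LLM reply


def image_latency_s(seconds_per_iteration, steps):
    """Time to generate one image if every denoising step takes the same time."""
    return seconds_per_iteration * steps


def reply_latency_s(num_tokens, tokens_per_second):
    """Time to generate a reply at a constant token rate."""
    return num_tokens / tokens_per_second


image_s = image_latency_s(SD_SECONDS_PER_ITERATION, ASSUMED_DENOISING_STEPS)
reply_s = reply_latency_s(ASSUMED_REPLY_TOKENS, ASSUMED_TOKENS_PER_SECOND)

print(f"Estimated time per image: {image_s} s")
print(f"Estimated time per reply: {reply_s} s")
```

With these assumed workload sizes, both estimates come to 10 seconds. Different step counts or token rates scale the result proportionally.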

In October 2023, Ampere welcomed Kinara into the AI Platform Alliance, whose primary goal is to reduce system complexity, promote better collaboration and openness with AI solutions, and ultimately deliver better total performance and greater power and cost efficiency than GPUs. Ampere’s Chief Evangelist Sean Varley said, “The performance and feature set of Kinara’s Ara-2 is a step in the right direction to help us bring better AI alternatives to the industry than the GPU-based status quo.”

The Ara-2 also offers secure boot, encrypted memory access, and a secure host interface to enable enterprise AI deployments with even greater security. Kinara supports Ara-2 with a comprehensive SDK that includes a model compiler and compute-unit scheduler; flexible quantisation options, including the integrated Kinara quantiser and support for pre-quantised PyTorch and TFLite models; a load balancer for multi-chip systems; and a dynamically moderated host runtime.

Ara-2 is available as a stand-alone device, a USB module, an M.2 module, and a PCIe card featuring multiple Ara-2s. Kinara will show a live demo with Ara-2 at CES. Contact Kinara to arrange an appointment in its hospitality suite at the Venetian Hotel on January 9, 10, or 11, 2024.
