New Oct 16, 2024

Llamafile v0.8.14: a new UI, performance gains, and more

Browsers, engines, etc. All from Mozilla Hacks – the Web developer blog View Llamafile v0.8.14: a new UI, performance gains, and more on hacks.mozilla.org

We’ve just released Llamafile 0.8.14, the latest version of our popular open source AI tool. A Mozilla Builders project, Llamafile turns model weights into fast, convenient executables that run on most computers, making it easy for anyone to get the most out of open LLMs using the hardware they already have.

New chat interface

The key feature of this new release is our colorful new command line chat interface. When you launch a Llamafile we now automatically open this new chat UI for you, right there in the terminal. This new interface is fast, easy to use, and an all around simpler experience than the Web-based interface we previously launched by default. (That interface, which our project inherits from the upstream llama.cpp project, is still available and supports a range of features, including image uploads. Simply point your browser at port 8080 on localhost).

llamafile

Other recent improvements

This new chat UI is just the tip of the iceberg. In the months since our last blog post here, lead developer Justine Tunney has been busy shipping a slew of new releases, each of which have moved the project forward in important ways. Here are just a few of the highlights:

Llamafiler: We’re building our own clean sheet OpenAI-compatible API server, called Llamafiler. This new server will be more reliable, stable, and most of all faster than the one it replaces. We’ve already shipped the embeddings endpoint, which runs three times as fast as the one in llama.cpp. Justine is currently working on the completions endpoint, at which point Llamafiler will become the default API server for Llamafile.

Performance improvements: With the help of open source contributors like k-quant inventor @Kawrakow Llamafile has enjoyed a series of dramatic speed boosts over the last few months. In particular, pre-fill (prompt evaluation) speed has improved dramatically on a variety of architectures:

When combined with the new high-speed embedding server described above, Llamafile has become one of the fastest ways to run complex local AI applications that use methods like retrieval augmented generation (RAG).

Support for powerful new models: Llamafile continues to keep pace with progress in open LLMs, adding support for dozens of new models and architectures, ranging in size from 405 billion parameters all the way down to 1 billion. Here are just a few of the new Llamafiles available for download on Hugging Face:

Whisperfile, speech-to-text in a single file: Thanks to contributions from community member @cjpais, we’ve created Whisperfile, which does for whisper.cpp what Llamafile did for llama.cpp: that is, turns it into a multi-platform executable that runs nearly everywhere. Whisperfile thus makes it easy to use OpenAI’s Whisper technology to efficiently convert speech into text, no matter which kind of hardware you have.

Get involved

Our goal is for Llamafile to become a rock-solid foundation for building sophisticated locally-running AI applications. Justine’s work on the new Llamafiler server is a big part of that equation, but so is the ongoing work of supporting new models and optimizing inference performance for as many users as possible. We’re proud and grateful that some of the project’s biggest breakthroughs in these areas, and others, have come from the community, with contributors like @Kawrakow, @cjpais, @mofosyne, and @Djip007 routinely leaving their mark.

We invite you to join them, and us. We welcome issues and PRs in our GitHub repo. And we welcome you to become a member of Mozilla’s AI Discord server, which has a dedicated channel just for Llamafile where you can get direct access to the project team. Hope to see you there!

 

The post Llamafile v0.8.14: a new UI, performance gains, and more appeared first on Mozilla Hacks - the Web developer blog.

Scroll to top