Celebrating an important step forward for open source AI

TL;DR: Mozilla is excited about today’s new definition of open source AI, and we endorse it as an important step forward.

This past year has been marked by more and more people recognizing the societal benefits of open source AI. In October, a large coalition of people signed onto our statement emphasizing how openness and transparency are critical ingredients to safety and security in AI. In February, Mozilla and the Columbia Institute of Global Politics convened AI experts, who emphasized how openness in AI could help advance key societal goals. Policymakers have also been embracing open source AI. The U.S. National Telecommunications and Information Administration (NTIA) recently issued a seminal report embracing openness in AI. Even companies like Google, Microsoft, Apple, and Meta are beginning to open certain aspects of their AI systems.

The growing focus on open source AI makes it all the more important that we establish a shared understanding of what open source AI is. A definition should outline what must be shared and under what terms or conditions. Without this clarity, we risk a fragmented approach, where companies label their products as “open source” even when they aren’t, where civil society doesn’t have access to the AI components they need for testing and accountability, and policymakers create regulations that fail to address the complexities of the issue.

The Open Source Initiative (OSI) has recently released a new draft definition of open source AI, marking a critical juncture in the evolution of the internet. This moment comes after two years of conversations, debates, engagements, and late-night conversations across the technical and open source communities. It is critical not just for redefining what “open source” means in the context of AI; it’s about shaping the future of the technology and its impact on society.

The original Open Source Definition, introduced by the OSI in 1998, was more than just a set of guidelines; it was a manifesto for a new way of building software. This definition laid the foundation for open systems that have since become the backbone of the modern internet. From Linux to Apache, open source projects have driven innovation, collaboration and competition, enabling the internet to grow into a diverse and dynamic ecosystem. By ensuring that software could be freely used, modified, and shared, the original open source movement helped to expand access to technology, breaking down barriers to entry and fostering a culture of innovation and transparency, while making software safer and less vulnerable to cyberattacks.

This is a significant step toward bringing clarity and rigor to the open source AI discussion. It introduces a binary definition of “open source,” akin to the existing definition. While this is just one of several approaches to defining open source AI, it offers precision for developers, advocates and regulators who benefit from clear definitions in various working contexts. Specifically, it outlines that open source AI revolves around the ability to freely use, study, modify and share an AI system. And it also promotes the importance of access to key components needed to recreate substantially equivalent AI systems, like information on data used for training, the source code for AI development and the AI model itself.

And, this definition also offers an initial attempt to wrestle with the complex issue of whether and how training data for AI models should be shared as part of open source AI. The definition acknowledges that sharing full training datasets can be challenging in practice, and, therefore, avoids disqualifying a significant amount of otherwise open source AI development from being considered “open source.” We are working to change this state of play by making open datasets a more commonplace part of the AI ecosystem. Mozilla and Eleuther AI recently brought together experts to outline best practices for open datasets to support AI training, and we intend to publish a paper soon that promotes norms that support AI training data being more widely available.

We acknowledge that some may disagree with aspects of OSI’s definition, such as its treatment of training data, and that the definition will need refinement over time. However, we believe that the OSI’s community-driven process — which involved over a year of stakeholder engagement — has established a crucial reference point for discussions on open source AI. For instance, this definition will become a valuable resource to combat the widespread practice of “openwashing” that is becoming quite rampant, where non-open models (or even open-ish models like Meta’s Llama 3) are promoted as leading “open source” options without contributing to the commons. Researchers have shown that “the consequences of open-washing are considerable” and affect innovation, research and the public understanding of AI.

At its core, this effort embodies the open source community at its best — engaging in open discussions, addressing differences, acknowledging shortcomings and refining this definition together, to build something better. It effectively incorporates many key aspects of openness that the open source community has been grappling with, such as going beyond just considering openness in model weights and including broader model components, documentation, and licensing approaches as outlined in the Columbia Convening. In contrast, the closed source ecosystem operates in secrecy, with limited access and behind-the-scenes deals where large tech companies exchange compute power and talent. We prefer our sometimes imperfect but consistently transparent approach any day.

We, and many others, are eager to continue collaborating with OSI and the broader open source community to bring greater clarity to the open source AI discussion and continue unlocking the potential of open source AI for the benefit of society.

Get Firefox

Get the browser that protects what’s important

The post Celebrating an important step forward for open source AI appeared first on The Mozilla Blog.