The Open Source Initiative is preparing to finalize what they call "The Open Source Artificial Intelligence Definition" -- a set of rules which A.I. systems must adhere to in order to be considered, officially, "Open Source".
And everything about it is truly peculiar.
From the fact that it considers "No Data" to be "Open Data" (yeah, try to wrap your brain around that little nugget) to the corporate sponsorship (from corporations in the "Closed Source A.I." business)... to the "anti-racist, decolonizing" consultant they hired to put the whole thing together.
Yeah. "Decolonizing". The whole thing is just plain weird.
A Little Background
The Open Source Initiative's claim to fame is that they are the steward of what is known as the "Open Source Definition" (aka "the OSD"): a set of rules which any software license must adhere to in order to be considered, officially, "Open Source".
The "OSD" began life back in 1997 as the "Debian Free Software Guidelines", written by Bruce Perens. Later, with the help of Eric Raymond, that document morphed into the "Open Source Definition"... at which point the two men created the "Open Source Initiative" to act as a certification body for the OSD.
Fun Historical Tidbit: The Open Source Initiative likes to tell a long-debunked story about the creation of the term "Open Source" which they know is historically incorrect. That little tidbit isn't critical to what we're talking about today... but it's just plain weird, right?
Flash forward to today, and both of the founders -- Perens and Raymond -- have been forced out or banned from the Open Source Initiative entirely. Now the organization, free from the influence of the founders, is looking to expand into the newly exciting field of "Artificial Intelligence".
Thus: The creation of "The Open Source A.I. Definition"... or the OSAID.
The Anti-Racist Leadership
To create this new "OSAID", the Open Source Initiative hired Mer Joyce from the consulting agency known as "Do Big Good".
Why, specifically, was Mer Joyce hired to lead the effort to create a brand new "Open Source" definition, specifically focused on Artificial Intelligence?
- Was it her extensive background in Open Source?
- Or her expertise in A.I. related topics?
- Perhaps it was simply her many years of work in software, in general?
Nope. It was none of those things. Because, in fact, Mer Joyce appears to have approximately zero experience in any of those areas.
In fact, the stated reason that Mer Joyce was chosen to create this Open Source definition is, and I quote:
"[Mer Joyce] has worked for over a decade at the intersection of research, policy, innovation and social change."
Her work experience appears to be mostly focused on Leftist political activism and working on Democrat political campaigns.
As for the consulting agency, Do Big Good, their focus appears to be equally... non-technical. With a focus on "creating an equitable and sustainable world" and "inclusion".
When "Do Big Good" talks about what skills and expertise they bring to a project, they mention things such as:
- Center marginalized and excluded voices.
- Embody anti-racist, feminist, and decolonizing values.
- Practice Cultural humility.
Note: Yes. They actually wrote "decolonalizing". Which is not a real word. We're going to give them the benefit of the doubt and assume they meant "decolonizing" (which is how it's spelled in the list above). Spelling errors happen.
Now, how does "Embodying decolonizing values" help to draft a definition of Open Source Artificial Intelligence licensing?
No clue. But, apparently, "decolonizing" and being "anti-racist" is important to the Open Source Definition and software licensing.
You'll note that the only software-related skill this "Do Big Good" company appears to have is that they can "work virtually or in-person". In other words: They know how to use Zoom.
In fact, this consulting firm only gives three examples of client projects they've worked on. And the other two are non-technical policy documents for the government of Washington State.
Why this agency, and this individual, were hired to lead the work on the OSAID is beyond baffling. Just the same, this appears to be part of a larger pattern within Open Source and Big Tech: Hiring non-technical, political activist types to lead highly technical projects. It doesn't usually go well.
The Diverse Working Groups
Considering that the leadership hired to oversee the OSAID's creation is extremely non-technical -- and almost 100% focused on "anti-racist" and "decolonizing" activism -- it's no surprise that one of the first steps taken was to create "working groups" based entirely on skin color and gender identity.
"The next step was the formation of four working groups to initially analyze four different AI systems and their components. To achieve better representation, special attention was given to diversity, equity and inclusion. Over 50% of the working group participants are people of color, 30% are black, 75% were born outside the US, and 25% are women, trans or nonbinary."
What does having 25% of the participants be "women, trans or nonbinary" have to do with creating a rule-set for software licensing?
Your guess is as good as mine.
But, from the very start of the OSAID's drafting, the focus was not on "creating the best Open Source AI Definition possible"... it was on, and I quote, "diversity, equity and inclusion".
The best and brightest? Not important. Meritocracy? Thrown out the window.
Implement highly racist "skin color quotas" in the name of "DEI"? You bet! Lots of that!
"No Data" = "Open Data"
With that in mind, perhaps it is no surprise that the OSAID is turning out... rather bizarre.
Case in point: The OSAID declares that the complete absence of the data used to train an A.I. system... does, in fact, qualify as "Open". No data... is considered... open data.
If that sounds a bit weird to you, you're not alone.
Let's back up for a moment to give a higher level understanding of the components of an A.I. system:
- The Source Code
- The Training Data
- The Model Parameters
If you have access to all three of those items, you can re-create an A.I. system.
Now, we already have the OSD (the Open Source Definition) which covers the source code part. Which means the whole purpose of having the OSAID (the Open Source AI Definition) is to cover the other two components: The Training Data and the Model Parameters.
Without an exact copy of the Training Data used in an A.I. system, it becomes impossible to re-create that A.I. system. It's simply how the current generation of A.I. works.
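To make that a bit more concrete, here's a tiny sketch of the idea. Everything in it is a made-up assumption for illustration: the `train` function, the toy "model" (a single number fit by gradient descent), and both data sets are hypothetical, and none of it comes from any real A.I. system or from the OSAID itself. It only shows that the same training code produces the same parameters from the same data, and different parameters from merely "similar" data.

```python
def train(data, steps=200, lr=0.05):
    """Hypothetical toy trainer: fit y = w * x with plain gradient descent.

    There is no randomness here, so the result depends only on the
    code (this function) and the training data passed in.
    """
    w = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Made-up "original" training data, and a "similar" stand-in for it.
original_data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
similar_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

w_original = train(original_data)  # the "released" model parameter
w_same = train(original_data)      # retrained from the exact same data
w_similar = train(similar_data)    # retrained from similar data

print("same data    ->", w_same, "(identical:", w_same == w_original, ")")
print("similar data ->", w_similar, "(identical:", w_similar == w_original, ")")
```

Same code, same data: you get the same parameter back. Same code, similar data: you get something close, but not the same system. A real A.I. system is vastly bigger and messier than this toy, so treat it as an illustration of the point and nothing more.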
However, the OSAID does not require that the Training Data be made available at all. The definition simply requires:
"Sufficiently detailed information about the data used to train the system, so that a skilled person can recreate a substantially equivalent system using the same or similar data."
At first that sounds pretty reasonable... until you really think about what it means.
This means that an A.I. system would be considered "Open Source A.I." even if it provided zero data used to train it -- it simply must be possible for someone to use the closed, proprietary data... if they should happen to obtain it.
That's like saying "My software is open source. But I'm not going to let you have the source code. But, if you did get the source code -- like through espionage or something -- you'd be able to use it. Which means it's open source. But you can't distribute or modify that source. Because it's mine."
Now, an argument could be made that the source code for an AI system could be open even if the data is all closed... and, therefore, it would be "Open Source" under the old OSD. Which is absolutely true. But, in that case, why have an "OSAID" at all? Why not simply keep the existing OSD and focus on that?
Well... I think we have a simple answer to why this OSAID is so utterly strange...
The Corporate Sponsors
The Open Source Initiative is not a huge foundation, especially when compared to some. But its revenue is not insignificant. And it's growing.
In 2023, the Open Source Initiative brought in revenue of $786,000 -- up roughly $200,000 from the year prior.
And who sponsors the Open Source Initiative?
Google. Amazon. Meta. Microsoft (and GitHub). Red Hat. And many other corporations.
Many of these companies have some noteworthy things in common:
- They are in the A.I. business in some way.
- They make use of "Open Source" in their A.I. products.
- They use "Open Source" as a promotional and public relations tool.
- They, in one way or another, work with a closed, proprietary set of A.I. training data.
- They have significant "Diversity, Equity, and Inclusion" efforts.
When you add that all together, this "Open Source AI Definition" begins to make a lot more sense.
It is, in short:
An effort to create a "Certification" which will declare all of their A.I. systems (no matter how closed their data is) as "Open Source"... while simultaneously being run by a DEI activist organization with a focus on racial and gender identity quotas.
It checks a whole lot of check boxes. All at once.
What Impact Will This Have?
While many may argue that this "OSAID" is simply irrelevant -- and can be ignored by the broader "Free and Open Source Software" industry -- that misses a key impact that is worth noting.
That being: The continued corruption of both the ideas and the organizations of Open Source.
Not only has the Open Source Initiative banned their founding members (and re-written their own history)... they are now seeking to create a new "Open Source Definition" which will allow for systems consisting primarily of closed, proprietary data to be considered "Open Source". Thus making their Big Tech financiers happy.
The meaning of the term "Open Source" is being actively modified to mean "A little open, and a lot closed". And many of the same corporations which are funding this effort are also funding things like... The Linux Foundation.
Which means this corruption and dilution of the concept of "Open Source" is likely to spread far beyond the reaches of one, small (but growing) licensing certification foundation.
Also, apparently, decolonizing values... or something.