I'd recommend this book to anyone who wants to understand the emergence of AI, and OpenAI in particular.
I do struggle reading non-fiction, and this book was no different. It took me nearly 2 months to finish (in my evening snippets) - but it was definitely worthwhile. It was this 90-minute interview by Novara Media with Karen Hao that made me immediately purchase the book.
I don't think I can fully review this book and do it justice, but I can share what I learnt from it - the places where my original assumptions about AI and AI companies were wrong.
The book is focused primarily on OpenAI and Sam Altman, and in particular how his ousting in November 2023 came to be. Through hundreds of interviews and documents, it paints a very insightful picture of the messiah complex (though not in her words) that Altman has.
Here's a short summary of things I didn't know but learnt through reading this:
- Altman and friends started conversations (and a business, albeit initially a non-profit, though that didn't last) in 2015 because they were going to build AGI and believed it was inevitable.
- The company was originally formed with scientific researchers, and as an open company with the intent of sharing - though spoiler alert: this all changed.
- I had always assumed it was cowboy bro developers working on the code, but originally it was academically grounded engineering.
- AI safety was a constant presence, and initially a significant one - but as we all know now, it eventually lost the battle to make an impact, left "hobbled" and cast to the wayside.
- After GPT-2, the training data was ingested wholesale, and any attempt to clean/sanitise happened on the results coming from prompts - i.e. the inputs were not cleaned, which meant applying a dizzying array of filters to the output to catch edge cases.
- Common Crawl was introduced with GPT-3 - which is also when input filtering stopped happening.
- Western AI companies, including OpenAI (but also Google and Microsoft), put their data centres in the Global South, and additionally source their data annotators from the poorest countries, allowing them to pay (via third parties) literal pennies per hour for the work - work that could also come with terrible mental health side effects, as workers would read and view the generative content AI could come up with based on the unfiltered dark corners of the web.
- Sam Altman lies. Little lies, but - judging from a great deal of documentation - a lot and often, telling people what they want to hear whilst (we guess?!) harbouring some ulterior motive.
- The path OpenAI decided to take towards what they believe will be AGI effectively requires unlimited compute power, when in reality there are lots of different applications of AI that don't need that level of power - Stable Diffusion being one example, trained using 256 GPUs (still not a desktop computer, but not hundreds of thousands of GPUs either).
- OpenAI's approach - closing off its scientific findings, closing its source and refusing to share methods - means there's no way to verify any of its progress, but more importantly it is stripping the academic scientific community of its researchers (as someone who has visited CERN on two occasions, seeing science being shared is incredible, and incredible for society).
My only complaint about the book (and it's likely my own fault) is that I had trouble with the jumping backwards and forwards in time - I'd often be unsure where we were in the timeline.
If you work in tech, I'd absolutely recommend this book. If that's not possible, then definitely watch the interview I linked above.
Originally published on Remy Sharp's b:log