AI Insiders ($9!): https://www.patreon.com/AIExplained
FrontierMath: https://epoch.ai/frontiermath
https://arxiv.org/pdf/2411.04872
Chollet Statement: https://arcprize.org/blog/oai-o3-pub-breakthrough
MLC Paper: https://www.scientificamerican.com/article/new-training-method-helps-ai-generalize-like-people-do/?utm_campaign=socialflow&utm_source=twitter&utm_medium=social
AlphaCode 2: https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf
Human Performance on ARC-AGI: https://arxiv.org/pdf/2409.01374v1
Wei Tweet ‘3 months’: https://x.com/_jasonwei/status/1870184982007644614
Deliberative Alignment Paper: https://openai.com/index/deliberative-alignment/
Brown Safety Tweet: https://x.com/polynoamial/status/1870196476908834893
SWE-bench Verified: https://openai.com/index/introducing-swe-bench-verified/
Amodei Prediction: https://x.com/OfirPress/status/1858567863788769518
David Dohan: 16 hours https://x.com/dmdohan/status/1870171404093796638
OpenAI Personal Writing: https://openai.com/index/learning-to-reason-with-llms/
SimpleBench: https://simple-bench.com/
John Hallman Tweet: https://x.com/johnohallman/status/1870233375681945725
00:00 – Introduction
01:19 – What is o3?
03:18 – FrontierMath
05:15 – o4, o5
06:03 – GPQA
06:24 – Coding, Codeforces + SWE-bench Verified, AlphaCode 2
08:13 – 1st Caveat
09:03 – Compositionality?
10:16 – SimpleBench?
13:11 – ARC-AGI, Chollet
20:25 – Safety Implications
AI Insiders: https://www.patreon.com/AIExplained
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/
