
The Power of Questions: Why Curiosity Outshines Knowledge in the Age of AI

March 29, 2025 · 02:43

🎙 Podcast Summary: "Asking Smart Questions Beats Answering Hard Ones"

What if the real test of intelligence isn’t giving perfect answers but coming up with the right questions in the first place? In his thought-provoking newsletter piece, historian Dan Cohen examines the limits of how we currently assess artificial intelligence. Cohen humorously recounts taking the notoriously difficult "Humanity’s Last Exam," a 3,000-question mega-test designed for AI, and flunking it spectacularly despite holding a PhD in history. That humbling experience leads him to a deeper reflection: large language models are improving rapidly and even outperform humans at tasks like translation and handwriting recognition, yet they fall short in one major area, curiosity. He argues that history, and perhaps human progress itself, is driven less by having answers than by asking bold, unexpected questions, which AI still struggles to do. As Cohen notes, “PhD-level work is not just about correct answers. It is more about asking distinctive, uncommon questions.”

🧠 Key Points:

Cohen took the AI-targeted Humanity’s Last Exam (HLE) and scored almost nothing. His critique: it’s heavily biased toward STEM — only 16 out of 3,000 questions were on history, and four of those were about naval battles.
HLE and similar tests define intelligence as the ability to answer complex questions correctly. But Cohen argues this is a narrow definition that misses the essence of scholarly thinking.
AI is undeniably improving in certain research tasks. Recent large language models can now:
- Pass PhD-level history exams with high accuracy
- Translate languages
- Transcribe texts
- Interpret complex documents and historical data
Historians like Benjamin Breen and Cameron Blevins have shown AI’s rapid gains in archiving, research assistance, and even deciphering handwritten text — long a major challenge in digital scholarship.
However, AI’s focus on right answers sidelines a key part of human intelligence: generating insightful questions that start entirely new fields of inquiry.
Good historical work often starts with strange, novel questions. Examples include:
- “Why did audiences at orchestral concerts become silent?”
- “Why did Isaac Newton write more on alchemy than physics?”
- “How did firsthand experiences of war reshape entire cultures?”
Cohen ends on a critical note: AI may be able to beat first-year PhD students at fact retention or translation, but can it ever ask an original, paradigm-shifting question?

🎧 Notable Quote:
“Ultimately, we may want answers, but we must begin with new queries, new areas of interest... This is a much bigger challenge.”

🧠 Extra Context:

"Humanity’s Last Exam" is available on GitHub and Hugging Face, developed to measure AI’s general intelligence across disciplines.
The article touches on ongoing debates about AI in education and scholarship, echoing concerns that AI may accelerate learning while diminishing intellectual depth.

📚 Related Reading:

“Listening in Paris” by James Johnson (on audience behavior in music history)
“The Metaphysical World of Isaac Newton” (on Newton’s occult studies)

🎙 Curious about AI, scholarship, and human creativity? Subscribe to Dan Cohen’s newsletter “Humane Ingenuity” to follow the conversation.

Listen to jawbreaker.io using one of many popular podcasting apps or directories.
