A new benchmark pitting AI against previously unseen maths problems shows systems still fall short of top human expertise.