Artificial Intelligence (Previously "Chat GPT")
-
The main reason this graph is important is that the questions are private and formulated so they don't appear on the internet. The models can't be trained to the benchmark, and they haven't seen the questions before.
That's why all the models were previously doing so poorly. The guy who runs this describes anything under 10% as noise.
This is the first step towards a model working out answers from the much-overused "first principles". It's a huge achievement, and the first step to these things creating new knowledge (and yes, I know Google's model has created a new algorithm, but this is a different approach from what they did).
The other test was "Humanity's Last Exam", where the models were previously tapping out at 25%. These are all PhD-level-and-above questions across many different domains; no single human could possibly answer them all, it would take a team of experts.
Grok Heavy got 50.7% correct.
-
@Kirwan said in Artificial Intelligence (Previously "Chat GPT"):
WTAF
Hard to know what to make of this. Is it legit in its accuracy? The comments seem to sway between believers and the rest.
If real, man the implications for energy usage are just nuts.
-
A bit of a Doomer take. That's all implementation details, and in some part it's open slather already, even before you consider AI. That's why Apple use privacy as a marketing ploy against the trend.
We are in the goldrush stage, and once actual products come out of these things (still smoke and mirrors, really), people will start taking security seriously.
I did have to laugh at OpenAI saying very clearly that everything you say to a model could be turned over to the authorities. Helpfully, in testing, the models want to call the authorities themselves.