Artificial Intelligence (Previously "Chat GPT")
-
@grok explain this to me as though I'm an 8 year old
*Okay! Imagine you have a big toy puzzle that’s super tricky, and it’s called ARC-AGI-2. This puzzle is special because it’s designed to test how smart something is at figuring out new things, kind of like how you learn to solve a new game without anyone telling you all the rules first. Grown-ups use it to see how smart their robot helpers, like AI (which is like a super smart computer brain), can be.
Now, there’s a robot brain called Grok 4, made by a team called xAI. It’s like the smartest kid in class who just figured out how to solve 15.9% of that tricky puzzle! That’s a big deal because the other robot brains could only solve about half that much before. It’s like going from solving 8 pieces to solving 16 pieces of a 100-piece puzzle!
The people at ARC Prize, who made the puzzle, checked Grok 4’s work and said, “Wow, you’re really good!” They even showed it on a big chart (like a scoreboard) to prove it. This makes Grok 4 the new champion, and it’s exciting because it means robots are getting better at thinking like us.
But here’s a little grown-up stuff: some people are worried because making Grok 4 super smart used a lot of energy, and that’s not always good for the planet. It’s like using extra batteries that might hurt the Earth. So, while it’s cool that Grok 4 is so smart, grown-ups are also thinking about how to keep things safe and happy for everyone!*
-
The main reason this graph matters is that the questions are private and are written so that they don't appear anywhere on the internet. So the models can't be trained to the benchmark, and they haven't seen the questions before.
That's why all the previous models were doing so poorly. The guy who runs this describes anything under 10% as noise.
This is the first step towards a model working out answers from "first principles", a much overused term. It's a huge achievement. It's the first step to these things creating new knowledge (and yes, I know Google's model has created a new algorithm, but this is a different approach from what they did).
The other test was "Humanity's Last Exam", and the models were previously tapping out on that at 25%. These are all PhD-level and above questions in many different domains. No single human could possibly answer all the questions; it would take a team of experts.
Grok Heavy got 50.7% correct.
-
@Kirwan said in Artificial Intelligence (Previously "Chat GPT"):
WTAF
Hard to know what to make of this. Is it legit in its accuracy? The comments seem to sway between believers and the rest.
If real, man, the implications for energy usage are just nuts.