Artificial Intelligence (Previously "Chat GPT")

Kirwan

The graph is crazy. This is a third-party benchmark; Grok 4 is the first to break 10% and is twice as good as second place.

Rembrandt

@grok explain this to me as though I'm an 8-year-old

*Okay! Imagine you have a big toy puzzle that’s super tricky, and it’s called ARC-AGI-2. This puzzle is special because it’s designed to test how smart something is at figuring out new things, kind of like how you learn to solve a new game without anyone telling you all the rules first. Grown-ups use it to see how smart their robot helpers, like AI (which is like a super smart computer brain), can be.

Now, there’s a robot brain called Grok 4, made by a team called xAI. It’s like the smartest kid in class who just figured out how to solve 15.9% of that tricky puzzle! That’s a big deal because the other robot brains could only solve about half that much before. It’s like going from solving 8 pieces to solving 15 pieces of a 100-piece puzzle!

The people at ARC Prize, who made the puzzle, checked Grok 4’s work and said, “Wow, you’re really good!” They even showed it on a big chart (like a scoreboard) to prove it. This makes Grok 4 the new champion, and it’s exciting because it means robots are getting better at thinking like us.

But here’s a little grown-up stuff: some people are worried because making Grok 4 super smart used a lot of energy, and that’s not always good for the planet. It’s like using extra batteries that might hurt the Earth. So, while it’s cool that Grok 4 is so smart, grown-ups are also thinking about how to keep things safe and happy for everyone!*

Kirwan

The main reason this graph is important is that the questions are private, and they are formulated so that they are not questions found on the internet. So the models can't be trained to the benchmark, and they haven't seen the questions before.

That's why all the models previously were doing so poorly. The guy who runs this describes anything under 10% as noise.

This is the first step towards a model working out answers from the much overused term "first principles". It's a huge achievement. It's the first step to these things creating new knowledge (and yes, I know Google's model has created a new algorithm, but this is a different approach from what they did).

The other test was "Humanity's Last Exam", and the models were previously tapping out on that at 25%. These are all PhD-and-above questions in many different domains. No single human could possibly answer all the questions; it would take a team of experts.

Grok Heavy got 50.7% correct.

Kirwan

Rembrandt

@Kirwan The Sycophancy chapter is fascinating. AI with morals.

Kirwan

WTAF

voodoo

@Kirwan said in Artificial Intelligence (Previously "Chat GPT"):

WTAF

Hard to know what to make of this. Is it legit in its accuracy? The comments seem to sway between believers and the rest.

If real, man, the implications for energy usage are just nuts.

Kirwan

Well, it’s one way to force the atomic age.

I was reading about some plans for data centers in space as well. World is changing.

voodoo

Behind-the-meter atomic age. Kind of scary.

Kirwan

If they don't start building out power stations at a similar rate to China, electricity is going to get expensive over there.

NTA

Ignore the headline - read instead about how very few people are employing technology correctly but the ones who are can benefit.

Sheryl Estrada / Aug 17 / Newsletters

MIT report: 95% of generative AI pilots at companies are failing | Fortune

There’s a stark difference in success rates between companies that purchase AI tools from vendors and those that build them internally.

Tim

@Kirwan A lot of nuclear power plants under construction in China at the moment (~ 30).

Kirwan

Yeah, the bottleneck to win the AI race is power. US might have already lost.

voodoo

antipodean

Kirwan

A bit of a Doomer take. That's all implementation details, and in some part that's open slather already, before you consider AI. That's why Apple use privacy as a marketing ploy against that trend.

We are in the goldrush stage, and once actual products come out of these things (still smoke and mirrors really) then people will start taking security seriously.

I did have to laugh at OpenAI saying very clearly that everything you say to a model could be turned over to the authorities. Helpfully, in testing, the models want to call the authorities themselves.

Kirwan

One of the more impressive uses of various new video tools. Nano Banana;

Tim

Discovering new solutions to century-old problems in fluid dynamics

Our new method could help mathematicians leverage AI techniques to tackle long-standing challenges in mathematics, physics and engineering.

Kirwan

Using AI to understand;

Google DeepMind, with Brown, NYU, and Stanford collaborators, developed enhanced Physics-Informed Neural Networks (PINNs) using second-order optimizers to uncover unstable singularities ("blow-ups") in fluid equations like Euler, Navier-Stokes, IPM, and Boussinesq. Discoveries include novel singularity families, a λ-instability order pattern in IPM/Boussinesq, and ultra-precise vorticity visuals (Earth-diameter accuracy).

These singularities probe fluid limits, tying directly to the unsolved $1M Millennium Navier-Stokes problem on turbulence and smoothness. The AI-math hybrid enables rigorous computer-assisted proofs, accelerating breakthroughs in physics (e.g., turbulence modeling) and engineering (e.g., aerodynamics).

Real-world benefits include optimized aircraft/car designs for fuel efficiency, improved weather forecasting via better atmospheric models, enhanced biomedical simulations of blood flow to aid cardiovascular treatments, more accurate ocean current predictions for climate and shipping, efficient petroleum extraction, and pollution dispersion modeling for environmental protection—ultimately enabling safer, greener technologies where traditional math falls short.

voodoo

@Kirwan said in Artificial Intelligence (Previously "Chat GPT"):

Using AI to understand;

Yeah, I've been spending a lot of time thinking on this recently also...

The Silver Fern
