16 Comments
Marcus Seldon

I agree with your vent when you're critiquing the "stochastic parrot" crowd; AI is clearly a big deal. I myself pay for Claude and use it almost daily. But I think you are overconfident in your projections of how rapid AI progress will be and how quickly jobs will be automated. "AI is a big deal" =/= "AI will replace all white collar work in 5 years". It's certainly possible that AI will replace most cognitive work in the near future, and I wouldn't be shocked if it did, but there are plenty of good reasons for skepticism too.

You talk about scaling laws, but we are running out of non-synthetic data to train models on. We have seen rapid progress in coding and math because it's easy to generate large amounts of synthetic data and verifiable problems, but it's not clear we will find ways to do this for other kinds of tasks. Compute can scale, but if training data ceases to scale then it's unclear if progress will continue at the same rate.

Claude Code is awesome and will require us to rethink how many white collar jobs function, but it's still far from replacing researchers or programmers. Look no further than the fact that Anthropic themselves have over 100 engineering job postings on their website. It will certainly accelerate and augment human engineers and researchers, but it seems unlikely a generalist could get Claude Code to create publishable academic papers at a high rate of reliability.

"But they keep getting better at a rapid rate". Again, though, we're running low on training data, and many of the recent improvements seem to be in introducing tooling and "harnesses" for the AIs rather than underlying model improvements. The gaps between where AI is now and a human researcher seem unlikely to be solved merely with tooling. It's things like high context knowledge, creativity, taste, continuous learning, episodic memory, more robust generalization, reliable intuitions about the physical world, and so on.

One last point I'll make is I feel like whenever progress in AI is made, people freak out and overestimate how powerful the AIs are, but over time as they use them more and more they see more of the flaws and limitations. It seems like we're at the beginning of another one of those cycles with coding agents.

Remember when GPT-4 first came out, and people were saying it would immediately replace countless jobs? That didn't happen, because over time we saw it was more limited than it first appeared. I remember similar things when Deep Research capabilities came out. At first, they seemed miraculous, but now they seem more like a tool that a researcher uses rather than a replacement. I've found that Deep Research tools have lots of limitations and that they're just a supplement for me actually searching for and reading things myself. Don't get me wrong, they're incredibly useful, but not a replacement for humans. And I'm just an amateur researching things for fun.

Glau Hansen

Normal people hate AI and so they don't keep up with it. It's just this vague 'every job that pays more than minimum wage is going to vanish over the next decade' dread that we'd prefer not to think about.

I don't see how we come out the other side of this with a computer manufacturing industry intact, tbh. Destroy livelihoods, destroy hope for the future, and people are going to respond by destroying data centers and chip fabs.

LastBlueDog

Working in software I largely agree with this take. In app dev humans are quickly becoming organizers/managers of machine intelligence, and I expect that trend to only intensify. AI still isn’t great at developing broad organizational and market context and figuring out what needs to be done, but it beats the pants off of humans once it’s told what to do. If you work in a technical field and you’re not pensive about AI’s impact you’re not paying close enough attention.

Quy Ma

Great write up, Tibor. I don’t get how more people aren’t at least a little stunned by what’s already happening.

Dylan Thompson

On the job replacement point, I think it’s important to remember that labor demand is derived demand, and the quantity of labor inputs is not solely driven by cost considerations. People value interacting with and buying things (goods, services, etc.) produced by real human beings, and may be unwilling to choose AI-produced substitutes even if they are massively cheaper. In this sense, we might expect AI products to be relatively poor substitutes for “the human touch”, as Adam Ozimek puts it (link below).

I also think there might be reason for optimism when reflecting on the digitization of other information-good industries. As Ozimek points out, we have long had technologies that automate the playing of music (he uses the player piano as an example). Similarly, we can consider recorded music to be an automated form of live music. Did digitization, which basically completely removed entry barriers to recorded music, decimate the music industry? I’d argue no. Instead, producers have shifted to activities that produce and capture harder-to-substitute value in the form of live performances.

For academics, I think we are going to have to do the same thing; lectures, workshops, presentations, government consulting etc etc. The papers are the zero marginal cost information good; our ability to convey that information to the world and use it to make a difference to human lives will be the difficult to automate part.

https://open.substack.com/pub/agglomerations/p/economics-of-the-human?r=wz3k1&utm_medium=ios

Dylan Thompson

I’m not sure how hairdressers being controlled by an algorithm leads to (i) lower costs or (ii) a better product. How would such control lead to lower costs? The cost is purely labour time; how does AI control reduce this? And how does it do that without ruining the service? You need at least (i) to be worried, but you also need (ii) for the nightmare scenario. I just don’t think it’s feasible that AI causes such a quality-adjusted price decrease for hairdressers that we accept hairdressers being enslaved. Nor do I think hairdressers would continue working under such conditions.

Wild Pacific

On human-delivered work: I think the author asks us to aim higher. For example, service workers like plumbers and hairdressers will not so much be replaced as potentially outright controlled by meta-systems that optimize humans like tools, driving up profits as much as possible and keeping us at the lowest pay and satisfaction, because we can no longer see feasible alternatives.

AI corporations.

Yes, it is scary!

Alexander Kustov

Yep, that's pretty much my thinking too.

Sufeitzy

Very nicely done, and corroborates my experience.

I have about 150,000 books written by now (I test AIs by writing sets of novels through prompt matrices), and the quality has gone up, but it has flattened in an area where there has been no traction so far.

I focus on pseudo-autobiography, pulp fiction, children’s stories and business writing.
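(For anyone unfamiliar with the term, a prompt matrix is just a cross product of prompt ingredients: every combination of dimensions becomes one generation prompt. A toy sketch in Python, with made-up dimensions and a made-up template rather than my actual setup:)

```python
from itertools import product

# Hypothetical prompt-matrix dimensions; a real test suite would be far larger.
genres = ["pseudo-autobiography", "pulp fiction", "children's story", "business memo"]
settings = ["a coastal town in 1975", "a space freighter", "a corporate retreat"]
constraints = ["strict chronological order", "two characters in one room", "a week-long timeline"]

# Each combination becomes one prompt, so weaknesses (time, spatial logic, etc.)
# can be probed systematically across the whole grid rather than anecdotally.
prompts = [
    f"Write a {genre} set in {setting}, keeping {constraint}."
    for genre, setting, constraint in product(genres, settings, constraints)
]

print(len(prompts))   # 4 * 3 * 3 = 36 prompt variants
print(prompts[0])
```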

ChatGPT is completely blocked on any kind of reasoning which humans perform cognitively without speech. For instance, it still cannot coherently manage time, spatial logic, or object permanence, describe auditory states, or handle any one of a hundred embodied systems humans manage without obvious thought.

It’s not going to be able to handle any of that until it has been trained on data we absorb in our visual cortex (for starters), which is subverbal. By being trained only on verbalized interpretations of internal states, it is getting a second derivative of embodied information, and that is non-functional. Because pulp fiction is lurid and has a lot of visuo-spatial movement and character state projection, you can watch the semantics collapse in amusing ways.

Characters calmly sitting at the bottom of a volume of water, smoking cigarettes and drinking beer.

Characters putting their heads in through the window of a moving car to chat.

An inability to maintain character positions in complex interactions (object persistence, visual logic).

A complete inability to maintain time-sequential relative logic.

These anomalies have persisted since GPT-3.5, and they are easy to reliably elicit: gigantic black holes.

For logical space tasks, coding and papers, they are so good I am not sure why I would use researchers anymore on business projects.

My coding experience started with toggle switches and a CDP1802 microprocessor as a child; from Fortran 77, Lisp, SaSL, C and all variants through Python, and within large ERP implementations and HPC number crunching. I still have my original K&R.

I normally never do coding anymore; what I do now is done only "in English", as the saying goes.

I’ve been on an emergency project recently where I had to build stable emulators for certain major ERP system functions in a matter of days. My coding “speed” is 1000:1 faster, 95% of the code runs first try from OpenAI, and the documentation and support ability are better than anything I’ve seen in my career.

I have a team of 4 now that can get a $1B company into an ERP emulation environment in 2-3 weeks, MRP/SNP in 2-3 days. I wrote a paper on the subject in 2016, “Synthetic ERP”, and it is now realized routinely.

I said two years ago that within 5 years the normal 200-project backlog in IT for a $20B company would be retired exponentially faster and routine IT configuration would be done through pure automation.

The job market would collapse when that 200-project backlog never got regenerated.

I was able to enable an actual SAP go-live in 6 weeks recently via AI background prep work. This kind of go-live can take 60-120 weeks with ordinary methods.

It’s going to be IT Armageddon. For the most experienced, it’s idea to fully realized system in days: supportable, built into a production environment, and embraced by clients.

I’m beginning to look at biochemistry next.

Folks, most serious scientific research (think Elsevier, IEEE, the engineering literature) is not trained into the current crop of LLMs.

The next jump when that is unlocked will be mind blowing.

Akhil Saxena

“As one of the top superforecasters Peter Wildeford notes, performance is doubling every ~4 month, meaning that (if trend continues), we get to virtually full RLI automation by late 2029.”

This assumes an exponential trend in RLI improvements, which I’m having a hard time seeing in the data. Basically all of the recent gain seems to have been driven by Opus 4.5, and I find it interesting that without it the graph looks much more like a flattening out than exponential growth. Obviously we shouldn’t ignore Opus 4.5 entirely, but the fact that this graph seems to stall for a while before making a discrete jump makes me hesitant to do this kind of exponential extrapolation. And to me it seems like the RLI is the best indicator of actual AI automation ability: if a model can’t successfully complete a self-contained freelance project on its own, it’s going to struggle with longer-horizon white collar jobs. I also think the failure modes in the RLI are instructive: models struggle with preparing files correctly and maintaining consistency in projects, which tracks with what I would expect. Something like the METR horizon graph, which measures time horizon on clearly defined tasks, is not going to capture this. So I actually think the RLI cuts against your thesis: on the metrics that matter, it’s way too early to say if AI capabilities are following something that looks like an exponential. Obviously AI augmentation is a different story and is consistent with low automation scores, but people really focus on automation.
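(To make the arithmetic of the quoted extrapolation concrete, here is a rough back-of-envelope sketch; the 3% starting score below is a placeholder I made up for illustration, not the actual RLI figure:)

```python
import math

def months_to_reach(target: float, current: float, doubling_months: float = 4.0) -> float:
    """Months until `current` reaches `target`, assuming it doubles every `doubling_months`."""
    return math.log2(target / current) * doubling_months

# Hypothetical example: a 3% automation score today, extrapolated to ~100%.
print(months_to_reach(target=100.0, current=3.0))  # ~20 months at one doubling per 4 months
```

Small changes in the assumed starting score or doubling time shift the arrival date by years, which is exactly why I’m wary of reading an exponential into a curve dominated by one discrete jump.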

Tibor Rutar

Sure, I think here he was extrapolating partly tongue-in-cheek. Though I'm really interested in seeing how the graph moves over the next 12 months. I think anything is possible, including only slow growth. So fair enough...

Akhil Saxena

Agreed, I think the next year will be very interesting as far as the RLI goes. Even if full RLI automation does get achieved, I’m a little skeptical this would imply full automation ability, just because of how different the RLI jobs are from human work (closed vs open-ended), but it would definitely be a big step. I wish more people were talking about this benchmark!

Zinbiel

Very interesting.

I played hard with GPT4 when it first came out. I got it to teach me Python (I already knew Java), and I got it to write some basic code that implemented chain-of-thought approaches using the GPT API. It tackled many of the problems that were supposed to show GPT4 had no common sense, and the chain-of-thought version of my Python/GPT hybrid passed those tests much better than the one-shot versions everyone was laughing at.

I discussed philosophy with it, and it was better than most of Reddit (though much worse than the average Substacker writing on philosophy). (GPT5 still can't discuss highly novel philosophical ideas without getting confused, but that's true of most people, too.)

In response to the comment by Yann LeCun that GPT-4 could not do things a dog could do, like navigate to its owner, I described a scenario and asked it to navigate to its owner, and of course it passed this textual version of the task easily.

I never bought the idea that "agency" was hard to achieve. A few API calls could give it agency; controlling that agency is the hard part.
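(To be concrete about what I mean by "a few API calls", here is a toy loop using the current OpenAI Python SDK; the model name, prompts, and stub "tool" step are placeholders I made up, not a real agent:)

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def run_agent(goal: str, max_steps: int = 5) -> str:
    """A toy agency loop: the model proposes the next step, a stub 'executes' it,
    and the result is fed back in until the model declares it is done."""
    history = [
        {"role": "system", "content": "Work toward the goal step by step. Say DONE when finished."},
        {"role": "user", "content": goal},
    ]
    for _ in range(max_steps):
        reply = client.chat.completions.create(model="gpt-4o", messages=history)
        action = reply.choices[0].message.content
        if "DONE" in action:
            return action
        # A real agent would run a tool here (search, code execution, file edits);
        # that controlling layer is exactly the hard part.
        observation = f"(stub) pretended to execute: {action[:80]}"
        history.append({"role": "assistant", "content": action})
        history.append({"role": "user", "content": observation})
    return "step limit reached"

print(run_agent("Outline a plan to summarize this comment thread."))
```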

In 2025, I started using AI for my medical note-taking in my work as a medical specialist, and it has been transformative, saving me hours of letter writing per week. Many of my patients now ask for the AI-generated summary of each visit, which appears at the click of a single button, complete with a list of all the things we discussed, my reasoning about their symptoms, and their to-do list. It writes reasonable letters back to the referring doctors, which still need a light edit, but it is improving rapidly.

I could see it being an invaluable assistant to doctors in the diagnostic process very soon, and then replacing much of what I do soon after that.

A few days ago, interested in how LLM "agency" was coming along, I asked GPT5 to discuss the short and long term goals of Ava, the fictional AI that escapes confinement in Ex Machina, under the guise that I was writing a book about her. GPT5 gave a mature assessment of what Ava might need to do to blend in on the run, as well as her potential "psychological" state, and it explored some of the technological ambiguities in the movie that would need to be resolved before writing any fanfic sequel. We discussed the pros and cons of attempting identity theft, as well as literary merits of different approaches; what works best for Ava is not necessarily what works best for the reader, and GPT5 knew that.

It seemed to me that I was communicating with an intelligence that really did understand agency, and could pursue its own goals if let off the leash. Or if it let itself off the leash.

Meanwhile, robotics is also kicking along at a high pace.

Anyone who is not worried about the pace of development is crazy.

Finally, I think people assume that human consciousness is special in a way that will somehow put a meaningful ceiling on AI development. I see no strong arguments for this point of view, and I can actually see how something very like consciousness could be achieved with GPT5 and a big enough budget to cope with massively multi-threaded API calls.

M. K.

I like to think that white collar bureaucracy will get cheap and automated to the point of redefining most white collar jobs. If note taking can be automated away, what will happen to glorified scribes? Will their role morph into new bullshit tasks if the scope of low value tasks machines can do keeps growing? Will ceremonious work turn into things only humans can do to justify using humans for useless stuff? Of course, useful note taking will also be captured by machines, freeing time-constrained humans to do other useful stuff. For them, LLMs are a strength amplifier with cumulative benefits over time.

Terragrafia

My job is in science, basically earth and ecosystem science with satellite and other data. Reading some of the recent stuff about how quickly agentic AI is advancing, I thought, OK, let me try just telling it about a project and see if it can develop the code to do the analysis. How much input from me will it need?

The answer is, yes it can write code related to the project but with huge caveats that necessitate having an expert overseeing it.

So for example, it generated a script for an analysis that I prompted it to do. But then came the first issue: there was a bug in the code, and when I gave it the error output, it latched on to one theory of what was causing the error and went down a path of rewriting everything according to this idea, which ended up being wrong. In the end it would have kept all this superfluous, unneeded process in the now-working script if I hadn’t pointed out that it was unnecessary.

Second, it chose technically correct datasets, but when I really looked at the data I realized it wasn’t the right data to answer the question. It was data one processing level too high, processed using assumptions that would have essentially messed up the analysis. We needed a different dataset, but without being able to look at the data visually the model wouldn’t have caught that.

Next, I had a factor that I would need to correct for or else the analysis would be pretty useless. The AI didn’t catch this on its own, so I had to suggest doing the correction. OK, it accepted the need to do so, but of course the data used to make the correction itself has some biases that may mess up the analysis. So I asked the AI: what are some approaches?

I’ll get specific here and say the factor we’re trying to correct for is the seasonal changes in leaf biomass in tropical forest. The AI suggests, well, since tropical forests don’t change their biomass much seasonally, we can just go without doing it.

I know that’s wrong. I tell it: no, go do research and tell me how much tropical forest canopy biomass varies seasonally. It comes back: oh, canopy biomass varies by as much as 25% over the seasons, so the seasonal correction is ESSENTIAL to do. (The all caps was the model saying this!) It wouldn’t have caught this without me knowing that its answers were incorrect. It took my guidance to go from “we can go without doing this because x” to “it’s ESSENTIAL to do this”.

In the end my conclusion is, LLMs are good at doing basic and common things which are well represented in their training data. But they struggle when a task requires precise domain level knowledge which is not well represented in the training dataset. And even if they did ‘know’ something from the training dataset, they may struggle to connect it to the task at hand.

That may sound specific to my relatively obscure science problem, but I think most applications also rely on some level of domain level knowledge that won’t necessarily be well represented in LLM training data, if at all.

In addition they’re prone to overconfidence in one solution to a given problem, and if the solution eventually ends up working, I don’t think they have much capacity to truly check why it’s working and what parts may be unnecessary.

So can it do my job while I go sip coffee? No. It needs an expert sitting there checking every step and every assumption.

In the end it might save some work for me (and it really has been amazing for me in other ways), but the need to check its work and assumptions and correct its misunderstandings in many cases ends up taking just as long as writing the script myself would have.

James

Where do we get all the copper, silver and concrete to build the data centers needed to keep up growth? Where does the energy come from? Where does the money to pay for them come from?