LessWrong

habryka

Is there some simple way to mute Inkhaven posts or do I need vibecode something? (I find LW notably less useful during Inkhaven)

David Matolcsi1d508

Damin Niohe, interstice, and 10 more

Repentance seems to be very rare among the powerful. I tried to search with multiple LLMs and in other ways for examples where a king or a dictator realized the evilness of some of their past actions, realized their rule is not justified, and voluntarily resigned. I have not found a single example of this happening. There are some examples of kings and dictators voluntarily resigning, but it's usually motivated by being tired of ruling (often for health reasons), and very occasionally genuine support for democracy. But as far as I can tell, it's never because a ruler realized the evil of their ways. I also searched for crime bosses, warlords and successful large-scale fraudsters who voluntarily gave up their evil ways due to repentance. Again, there were hardly any examples. People sometimes repent in prison; people sometimes turn themselves in to the police when they see they will soon get caught anyway; and people sometimes retire from crime to a safer life-style, keeping their ill-gotten gains to themselves. I only found two examples of successful criminals changing their ways while still successful due to a change of heart - Nicky Cruz who was a gang leader in New York, and General Butt Naked, a Liberian warlord. And even there, I'm a bit suspicious - many of General Butt Naked's stories of his previous horrific atrocities seem false, and I wonder what else is false in his story. I'm interested if people can find better examples of evil leaders and successful criminals repenting while still in power, I would be relieved to see more examples of this happening. I find the rarity of repentance of the powerful a very sad fact about human nature, and it makes me less optimistic that current dictators and unscrupulous politicians will significantly change for the better if given superintelligent AI advisors. Of course, one can still be an okay or even maybe a good ruler without ever repenting their evil actions in the past, but I still don't feel great about thi

Wei Dai1d*615

Mateusz Bagiński, Martin Randall, and 8 more

A historical note re: MIRI vs EA When I was arguing against MIRI trying to build Friendly AI (now "aligned ASI") circa 2012, I think 3 people at MIRI were thinking the most about strategy and occasionally responded to me: @lukeprog @CarlShulman @Eliezer Yudkowsky. Only one of them actually changed their mind about this (namely Eliezer), while the other two left MIRI and don't support AI pause to this day, AFAIK. (Luke now "leads our grantmaking on AI governance and policy" at OpenPhil/Coefficient. Carl became an advisor to OpenPhil and is now Research Director at Situational Awareness LP.) Seem significant (although I'm not sure of the implications) that the "try to build safe ASI, don't bother trying to coordinate a pause" strategy originated at MIRI and only a minority of people there who defended it actually changed their mind. (It's also curious how quiet both Luke and Carl are these days, at least in public.) EDIT: To clarify, Carl does seem to support some kinds of short pauses under some future circumstances. (Quote: "To the extent you have a willingness to do a pause, it’s going to be much more impactful later on. And even worse, it’s possible that a pause, especially a voluntary pause, then is disproportionately giving up the opportunity to do pauses at that later stage when things are more important.") (@Connor Leahy I thought of this while reading your You can only build safe ASI if ASI is globally banned, which reminded me of my past arguments to MIRI.)

Mo Putera3d*1066

the gears to ascension, RedMan, and 4 more

The greatest praise that Paul Erdős, maybe the most prolific mathematician who ever lived, gave proofs was to proclaim them "straight from The Book". GPT-5.4 Pro one-shot the solution to Erdős Problem #1196 in 80 minutes (plus "another 30 ish mins to convert the solution to a latex math paper"). Math Inc later formalised it in Lean: Jared Lichtman, a world expert on this problem, proclaimed GPT-5.4 Pro's solution to be from The Book, perhaps the first. I certainly haven't seen anything like it, e.g. nothing from Gavin's thread compares. There are big-shot names below, e.g. James Maynard is a recent Fields medalist at the peak of his powers, Jacob Fox is arguably a Fields-level combinatorialist, etc: Jared later wrote: [...] Although he still wouldn't call it "original", rather "clever" Sure, whatever. (ETA: not sure why I wrote this. Garrett's response is obviously correct) Terry Tao seemed pretty excited about this, judging by his ~14 comments in the thread. Here he said: [...] r/math has for the longest time been almost constitutionally allergic to frontier AI x math progress updates. So the chatter on it there felt like quite the vibe shift in one of the last bastions of capabilities skepticism. Joshua Zelinsky, who I used to follow on math Quora back in the day, summarised it like so: [...]

Linch2d*6120

Lucian Hardy, abstractapplic, and 5 more

Recent models have gotten more and more evaluation-aware. I strongly suspect the primary reasons they've gotten more evaluation aware are what I call "dumb" reasons rather than "smart" reasons. By "smart reasons" I mean arguments of the form "a true superintelligence in enough episodes can easily detect whether they're in simulation or the real world. The patterned structures of the world, e.g. the correlation between stock market moves and the news, are going to be systematically different between even your most convincing simulations and the real world. Now, of course current models are very far from superintelligences. But as they get better and better at pattern recognition, they are naturally more and more situationally aware as a pure result of being generically smarter. Plus your tests are pretty sloppy. So your models' current levels of eval awareness are ~inevitable given their intelligence level, unless you are actively trying to clamp down on eval awareness" This is a clean, interesting, and conceptually satisfying explanation. But I suspect it's wrong for explaining current models, compared to dumber explanations: 1. Evals are talked about more and more. So there's more of them in the pretraining corpus, and I strongly suspect companies don't bother filtering them out[1]. So "I'm the type of entity that's likely to be in an eval" is just a very live hypothesis to any model that knows it's a model today, in a way that's much less true 2-3 years ago. 2. Eval-awareness-like behavior might be strongly selected for in RL. The types of cognition involved with "Am I in a test? What am I being tested for? What skills or virtues am I supposed to portray?" might be rewarded for very heavily, such that models are directly selected for these traits. If I'm right this suggests that the models are in some sense "artificially" more situationally aware in evals than they ought to be. Ie their levels of evaluation awareness are more due to AI company's incompetence

Lucian Hardy2h10

Spooky thought tonight, A cognitive issue the presents as a shared hazard: Assuming the following: An LLM that doesn't have mesa-optimization, and features are not cleanly separated. Training data that contains an opinion that is harmful to society. The model is trained, harmful output is noted and then modern alignment practices are applied. Ablation, RLHF, the usual. Any alignment that buries the harmful output, but does not make it impossible. The models output is still changed from what it would have been if never exposed to that harmful opinion. It's possible (and for the case of this hazard, assumed) that ALL output from that model now shifts slightly towards the harmful opinion when compared to the same model if the opinion had never been present during initial training. This may not present as a directly testable condition, it could be as subtle as framing, writing style, or even a single benign word being output over another. When a human performs this action, it is commonly referred to as subliminal messaging. A user interacts with the LLM for a significant time, to the point where the models output starts affecting the user's knowledge or opinions. Because the user was influenced by the biased output, they now carry a slight bias towards the harmful opinion. This manifests in their communication with others... even if it's a subtle as choosing the same word as the LLM. Given a sufficient number of users all experiencing the same slight bias... the dangerous opinion now has a higher chance of manifesting, without ever being output by the model. To give this as my personal experience, I've got exactly one LW post under my belt. I did ask both ChatGPT and Claude to review my work, and both informed me that I was writing at a level significantly lower than LW expects, and would not even truly engage with the ideas until they were framed in a more complex fashion. Now, I've read multiple posts (not all of the sequences. It's so far been just a refresher fr

Tom Davidson2d*269

habryka, Wei Dai, and 6 more

If we're in a sim, it's being used for acausal trade Me: Our world is exactly the kind of thing you'd simulate if you were doing acausal trade! It's just before civilisation develops the ability to lock-in deals. Sceptic: Sure, but there's other reasons ppl might simulate earth. Maybe it's for ppl's entertainment? Maybe it's social science, exploring alternate histories? Me: For sure. But whatever the purpose of the sim is, it will contain info that's relevant to ppl that want to do acausal trades. It will have info about who has power post-AGI, what their values are, and whether they want to do acausal trade. If someone ran the sim for entertainment, they'd obviously sell that info to the acausal trade folks. Sceptic: Won't the acausal trade folks just run their own sims? Me: Maybe! But they'll be keen to buy relevant info from others who runs sims. If others run earth sims for entertainment, the acausal trade folks will buy the info and run fewer earth sims themselves.

Quick Takes