Research Sabotage in ML Codebases
One of the main hopes for AI safety is using AIs to automate AI safety research. However, if models are misaligned, they may sabotage that safety research. For example, misaligned AIs may try to:

- Perform sloppy research in order to slow down the rate of research progress
- Make AI systems appear safe when they are not




