Events2Join

Robustness to Scale — AI Alignment Forum

I want to quickly draw attention to a concept in AI alignment: Robustness to Scale. Briefly, you want your proposal for an AI to be robust ...

Robustness to Scaling Down: More Important Than I Thought

Robustness to scaling down, meaning that a solution to alignment keeps working if the AI is not optimal or perfect. Robustness to relative scale ...

Does robustness improve with scale? - AI Alignment Forum

Frontier LLMs like ChatGPT are powerful but not always robust. Scale helps with many things. We wanted to see if scaling up the model size ...

AI Robustness - AI Alignment Forum

... and its capabilities when exposed to different data distributions or environments.

What is the difference between robustness and inner alignment?

Inner alignment refers to the following problem: how can we ensure that the policy an AI agent ends up with robustly pursues the objective ...

Scalable Oversight and Weak-to-Strong Generalization

... as the AI systems being supervised ... This form of evaluation is probably overly conservative ...

Beyond the Board: Exploring AI Robustness Through Go

... AI systems are integrated in critical infrastructure or deployed in ...

Robust Agents - AI Alignment Forum

Robust Agents are decision-makers who can perform well in a variety of situations. Whereas some humans rely on folk wisdom or instinct, and some AIs might ...

Solving adversarial attacks in computer vision as a baby version of ...

... scale aggregation for adversarial robustness (explainer on X/Twitter).

AI alignment with humans... but with which humans? — EA Forum

When AI alignment researchers talk about 'alignment ... scale AI systems up to try to reflect the ... robust methods for quantifying pain intensity.

Adversarial Robustness Could Help Prevent Catastrophic Misuse


Worst-case thinking in AI alignment

... robust alignment,” they aren't saying that you ... A smaller-scale example: When trying to ...

Managing catastrophic misuse without robust AIs - AI Alignment Forum

Mitigations for large-scale cybercrime. The threat model for LLM cybercrime has pretty different properties than the bioterrorism threat model.

Takeaways from a survey on AI alignment resources — EA Forum

... scale from 0 to 4 how useful they'd found it ... robust methods for ...

Disentangling inner alignment failures - AI Alignment Forum

... robust than alignment". Here are three different threat models for how an AI system could very suddenly do catastrophic things: Deception ...

Robust Agents - History - AI Alignment Forum

Applied to Robustness to Scale by RaemonTest2 1y ago • Applied to A multi-disciplinary view on AI safety research by Roman Leventov 2y ago • Applied to ...

Alignment-as-a-Service: Scale AI vs. the new guys - Interconnects

We don't have good ways to get expert data for the domains we care about, or the added data to get the robustness we need, and Scale has been the place ...

What Is AI Alignment? - IBM

Artificial intelligence (AI) alignment is the process of encoding human values and goals into AI models to make them as helpful, ...

Paradigms of AI alignment: components and enablers

(Robustness problems also belong in the design-revealed gap, but we expect them to be less of an issue for advanced AI systems, while inner ...

Arguments for Robustness in AI Alignment - LessWrong

Most popular robustness papers address short-term problems with robustness failures in neural networks, e.g., in the context of autonomous ...