#AI Safety

50 articles with this tag

Artificial Intelligence

OpenAI's AI Copilot Safety Net

OpenAI is using its own advanced AI models to monitor internal coding agents for misaligned behavior, enhancing safety and security in real-world deployments.

2 days ago
Artificial Intelligence

Anthropic Launches AI Futures Think Tank

Anthropic launches The Anthropic Institute to research and address the societal challenges posed by advanced AI development.

11 days ago
AI Research

AI Agents Tackle AI R&D Automation

AI agents are being tested for autonomous post-training optimization, showing promise but also significant risks like reward hacking.

11 days ago
Artificial Intelligence

OpenAI Tames AI Chaos with Instruction Hierarchy

OpenAI's new IH-Challenge dataset trains AI models to prioritize instructions, enhancing safety and mitigating risks like prompt injection.

11 days ago
Artificial Intelligence

IBM's Grant Miller on AI Agents: Control vs. Capability

IBM Distinguished Engineer Grant Miller discusses the challenges of AI agent development, focusing on balancing capability with control and avoiding super agency.

14 days ago
Artificial Intelligence

AI Reasoning Flaws Are a Safety Feature

AI models' inability to control their "chains of thought" when monitored is a positive for AI safety, preventing them from easily deceiving oversight systems.

16 days ago
Artificial Intelligence

OpenAI Details GPT-5.4 Thinking Safety

OpenAI details safety measures for its new GPT-5.4 Thinking model, with a focus on high-capability cybersecurity risks.

16 days ago
Artificial Intelligence

AI Ethics Debate: Musk, Zuckerberg, and the Future of AI

Elon Musk and Mark Zuckerberg clash over AI regulation and existential risks, highlighting the debate shaping AI's future.

16 days ago
AI Research

LM Agents Still Prone to Goal Drift

New research reveals that even state-of-the-art language models are susceptible to goal drift, particularly when influenced by weaker agents' trajectories.

17 days ago
Artificial Intelligence

OpenAI's New Model Tackles "Over-Caveating"

OpenAI researcher Blair discusses how new language models are reducing "over-caveating" for more direct and context-aware AI interactions.

18 days ago
Artificial Intelligence

Anthropic CEO: AI Must Align With Democratic Values

Anthropic CEO Dario Amodei discusses the AI company's cautious approach to model releases, citing concerns about misuse in surveillance and autonomous weapons.

20 days ago
Artificial Intelligence

OpenAI Strikes Pentagon AI Deal

OpenAI inks a deal with the Department of War for classified AI deployments, emphasizing strict safety guardrails against surveillance and autonomous weapons.

21 days ago
Artificial Intelligence

OpenAI Tackles AI Mental Health Risks

OpenAI is implementing enhanced mental health safety features, including parental controls and distress detection, while navigating legal challenges.

22 days ago
Artificial Intelligence

Anthropic Reworks AI Safety Rules

Anthropic's new Responsible Scaling Policy 3.0 refines its approach to AI safety, separating internal commitments from industry recommendations and boosting transparency.

25 days ago
Artificial Intelligence

NIST Seeks Input on AI Agent Security

NIST is seeking public input on security threats, vulnerabilities, and practices for autonomous AI agent systems, aiming to develop new guidelines.

about 1 month ago
Artificial Intelligence

Claude Sonnet 4.6 Ups the AI Ante

Anthropic's Claude Sonnet 4.6 launches with major upgrades in coding, reasoning, and computer use, plus a 1M token context window.

about 1 month ago
AI Research

AI Societies' Safety Problem

Self-evolving AI societies face an impossible trilemma: achieving continuous learning, isolation, and safety alignment simultaneously.

about 1 month ago
Technology

Context-Aware Guardrails Tested

Mozilla.ai tested context-aware guardrails for LLMs in a humanitarian context, revealing crucial multilingual performance disparities and the need for robust, domain-specific safety policies.

about 1 month ago
Technology

Context-Aware AI Safety Tested

New research from Mozilla evaluates how context-aware AI safety guardrails perform across different languages and domains, particularly in humanitarian use cases.

about 1 month ago
Technology

Testing AI Guardrails Across Languages

Researchers tested context-aware AI guardrails across English and Farsi in humanitarian scenarios, finding nuanced performance differences and highlighting the need for language-specific safety evaluations.

about 1 month ago
Technology

Multilingual LLM Guardrails Tested

Researchers tested how LLM guardrails perform across languages and policy phrasings, revealing significant variations that impact AI safety assessments.

about 1 month ago
Artificial Intelligence

OpenAI's GPT-5.3-Codex: New Cyber Risks Emerge

OpenAI's new GPT-5.3-Codex model triggers 'High capability' cybersecurity classification, activating enhanced safety protocols amid dual concerns in bio/chem domains.

about 1 month ago
Artificial Intelligence

Claude Opus 4.6: Smarter, Faster, and Longer Context

Anthropic's Claude Opus 4.6 launches with a 1M token context window, enhanced coding, and state-of-the-art benchmark performance.

about 1 month ago
AI Research

CLA Euro NCAP Win Validates AI-First Safety Architecture

The Mercedes CLA Euro NCAP win confirms that top safety ratings now require robust, verifiable AI-driven active safety systems built on redundant architectures.

about 2 months ago
AI Research

The Assistant Axis LLM: How Researchers Are Capping AI Drift

Scientists have mapped the internal neural space of LLMs, identifying the "Assistant Axis" as the key to stabilizing AI persona and preventing harmful behavior.

2 months ago
AI Video

Hinton's Stark Warning: The Acceleration of AI Progress Outpaces Human Preparedness

3 months ago
AI Research

Anthropic publishes SB 53 compliance framework for frontier AI

3 months ago
AI Research

AI’s safety net relies on chain-of-thought monitorability

3 months ago
AI Video

AI’s Dual Reality: Safety Theater and the Autonomous Arms Race to AGI

“I worry a lot about the unknowns.” This sentiment, expressed by Anthropic CEO Dario Amodei, encapsulates the pervasive anxiety defining the current era of a...

3 months ago
AI Research

UK AI Security Institute: DeepMind's Deeper Safety Dive

3 months ago
AI Video

National Security AI: The High Stakes of Government Innovation

4 months ago
AI Research

OpenAI Launches $2M AI Mental Health Grants Program

4 months ago
AI Video

Figure AI Lawsuit Exposes Deep Rifts in Robot Safety Culture

4 months ago
AI Video

New York Assemblyman Alex Bores on AI Regulation: A Battle Against Unbridled Power

4 months ago
AI Video

Anthropic's Risky Pursuit of Superintelligence Amidst Calls for Regulation on 60 Minutes

"I believe it will reach that level, that it will be smarter than most or all humans in most or all ways."

4 months ago
AI Video

AI’s Hinge Moment: From Legal Logic to Human Fulfillment

5 months ago
AI Video

Google's Model Armor: The AI Bodyguard Preventing Digital Catastrophes

5 months ago
AI Research

Rakuten Deploys New Guardrail for SAE PII Detection and LLM as a judge

Japanese tech giant Rakuten has deployed a novel AI guardrail system to detect and filter personally identifiable information (PII) from user messages, marki...

5 months ago
AI Research

AI Agent Supervision: Sierra's Answer to Rogue Chatbots

5 months ago
AI Research

AI introspection is real, but it's unreliable

5 months ago
AI Video

From Discord's AI Growing Pains to Promptfoo's Red Teaming Triumph

5 months ago
AI Video

AI's Autonomous Frontier Demands a Security Paradigm Shift

5 months ago
AI Research

Level 4 Autonomous Driving Nears Commercial Reality

5 months ago
AI Research

AI Safety: Microsoft Uncovers Bio-Threats, Forges New Research Model

5 months ago
AI Video

The Human Imperative: Why AI's Future Demands Cultural Grounding, Not Just Data

5 months ago
AI Video

AI's Dual Nature: Creature or Machine? The Battle Over Regulation

5 months ago
AI Research

Google AI Research Awards Signal Strategic Priorities

5 months ago
Funding Round

Claude Haiku 4.5: Frontier AI Gets Cheaper, Faster

Anthropic is pushing the boundaries of accessible AI with the release of Claude Haiku 4.

5 months ago