Japanese tech giant Rakuten has deployed a novel AI guardrail system to detect and filter personally identifiable information (PII) from user messages, marking what is reportedly the first enterprise use of sparse autoencoders (SAEs) in a production safety system. The system also proves more efficient and cost-effective on the task than LLM-as-a-judge baselines, while matching their performance.
The work, detailed in a new paper from AI firm Goodfire and Rakuten, tackles one of the biggest challenges in enterprise AI: protecting user privacy without being able to train models on real, sensitive user data.