Talk at Utrecht University U-NLP Seminar

Presenting "Hidden in plain sight: injecting LLM memories for evaluation and safeguarding" at Utrecht University, U-NLP seminar.

In this talk, I focus on a memory injection paradigm and present two use cases. First, I introduce our new unlearning testbed, LACUNA, which injects synthetic personally identifiable information (PII) into known parameters, making it possible to evaluate whether unlearning methods actually target the parameters that store the memories they aim to unlearn. Using LACUNA, we demonstrate that state-of-the-art unlearning methods do not target the correct parameters, which raises the question of whether they truly unlearn PII or merely obfuscate it. Second, I present Tiered Language Models, in which memory injection is used to safeguard certain information and capabilities. These can be unlocked only with the right key, enabling controlled access in open-weight models.

Posted on: May 7, 2026

Length: 1 minute read, 127 words

Categories: talks
