News

2026

Talk at IVADO's AI Safety and Alignment Research Cluster (May 2026)

Presenting "Hidden in plain sight: injecting LLM memories for evaluation and safeguarding" at IVADO’s AI Safety and Alignment Research Cluster.

In this talk, I focus on a memory injection paradigm and present two use cases. First, I introduce our new unlearning testbed, LACUNA, which injects synthetic personally identifiable information (PII) into known parameters to evaluate whether unlearning methods actually target the parameters that store the memories they aim to unlearn. We use LACUNA to demonstrate that state-of-the-art unlearning methods do not target the correct parameters, raising questions about whether they truly unlearn or merely obfuscate PII. Second, I present Tiered Language Models, in which memory injection is used to safeguard certain information and capabilities: these can only be unlocked with the right key, enabling controlled access in open-weight models.

Talk at Utrecht University U-NLP Seminar (May 2026)

Presenting "Hidden in plain sight: injecting LLM memories for evaluation and safeguarding" at the U-NLP seminar at Utrecht University.

In this talk, I focus on a memory injection paradigm and present two use cases. First, I introduce our new unlearning testbed, LACUNA, which injects synthetic personally identifiable information (PII) into known parameters to evaluate whether unlearning methods actually target the parameters that store the memories they aim to unlearn. We use LACUNA to demonstrate that state-of-the-art unlearning methods do not target the correct parameters, raising questions about whether they truly unlearn or merely obfuscate PII. Second, I present Tiered Language Models, in which memory injection is used to safeguard certain information and capabilities: these can only be unlocked with the right key, enabling controlled access in open-weight models.

Awarded the IVADO Postdoctoral Fellowship (March 2026)

Honored to receive the IVADO postdoctoral fellowship!


2025

Video available for "Memorisation: myth or mystery?" talk (December 2025)

Video available for "Memorisation: myth or mystery?", presented at the IVADO workshop on Deploying Autonomous Agents. Watch on YouTube.

PhD Thesis Now Available (November 2025)

My PhD thesis is now available here.

Keynote at BlackboxNLP 2025 (November 2025)

Attending BlackboxNLP and honored to give a keynote: "Memorization: myth or mystery?".

Abstract: "In deep learning, the perspective on memorization of training examples is undergoing a paradigm shift. Previously linked to overfitting and poor generalization, memorization is now seen as benign, beneficial or concerning, depending on the data involved. This shift raises questions about the mystery that is memorization: what should and shouldn’t models memorize, how is memorization implemented internally, and, more fundamentally, can we talk about memorization as a phenomenon that is separate from generalization or is this a myth? In this talk, I will provide you with the lay of the land on memorization analyses from a behavioural and model-internal perspective, reflecting on the pressing challenges it poses for interpretability research and why I think we should not shy away from them."

IJCAI 2025 JAIR Award for Compositionality Decomposed (August 2025)

Received the IJCAI 2025 JAIR award for Compositionality Decomposed!

PhD Defense & Joining Mila and McGillNLP (July 2025)

I defended my PhD and am joining Prof. Siva Reddy as a postdoc at Mila and McGillNLP.