Research Areas

We're experimenting with solutions in these spaces, and we are always looking for research collaborators and partners to help push these frontiers.

Model Distillation

Distilling the capabilities of larger models into smaller, cheaper, domain-specific ones.

Alternatives to Gradient Descent

Local losses, feedback alignment, forward-forward, and other unconventional deep learning approaches.

Mechanistic Interpretability

Interpretability and formal verification of deep learning models.

Publications

Provably Overwhelming Transformer Models with Designed Inputs

We develop an algorithm which, given a trained transformer model M as input, as well as a string of tokens s of length n_fix and an integer n_free, can generate a mathematical proof that M is "overwhelmed" by s, in time and space Õ(n_fix² + n_free³). We say that M is "overwhelmed" by s when the output of the model evaluated on this string plus any additional string t, M(s + t), is completely insensitive to the value of the string t whenever length(t) ≤ n_free. Along the way, we prove a particularly strong worst-case form of "over-squashing", which we use to bound the model's behavior. Our technique uses computer-aided proofs to establish this type of operationally relevant guarantee about transformer models. We empirically test our algorithm on a single-layer transformer complete with an attention head, layer-norm, MLP/ReLU layers, and RoPE positional encoding. We believe that this work is a stepping stone towards the difficult task of obtaining useful guarantees for trained transformer models.

Lev Stambler, Sajjad Nezhadi, Matthew Coudron
#transformers #mech_interp
View on arXiv
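The "overwhelmed" property from the abstract can be illustrated with a toy sketch: sample suffixes t and check that the model's output on s + t never changes. Everything below (the toy model, function names, vocabulary) is hypothetical and only demonstrates the definition; a sampling check is far weaker than the computer-aided proof the paper constructs.

```python
import random

# Toy stand-in for a trained model M: it only attends to the first 3 tokens,
# so it is trivially "overwhelmed" by any prefix of length >= 3.
# (Hypothetical example; the paper proves this property for real transformers.)
def toy_model(tokens):
    return sum(tokens[:3]) % 17

def empirically_overwhelmed(model, s, n_free, vocab, trials=1000, seed=0):
    """Sample suffixes t with length(t) <= n_free and check that
    model(s + t) always equals model(s). This is only an empirical
    check, not a mathematical proof."""
    rng = random.Random(seed)
    baseline = model(s)
    for _ in range(trials):
        t = [rng.choice(vocab) for _ in range(rng.randint(0, n_free))]
        if model(s + t) != baseline:
            return False
    return True

s = [5, 9, 2]  # fixed prefix of length n_fix = 3
print(empirically_overwhelmed(toy_model, s, n_free=4, vocab=list(range(17))))  # → True
```

The paper's contribution is replacing this kind of sampling with a proof: a guarantee that holds for every suffix t up to length n_free, not just the ones sampled.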