Bitune: Bidirectional Instruction-Tuning

Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano

We introduce Bitune, a method that improves instruction-tuning of pretrained decoder-only large language models, leading to consistent gains on downstream tasks. Bitune applies both causal and bidirectional attention to the prompt to obtain a richer representation of the query or instruction. We realize this by introducing two sets of parameters, each updated with parameter-efficient finetuning (PEFT) techniques. The resulting causal and bidirectional features are combined into a weighted average with trainable coefficients, which is then used to generate new tokens. We demonstrate significant improvements in zero-shot performance on commonsense reasoning, arithmetic, and language understanding tasks, while extensive ablation studies validate the role of each component and show that the method is agnostic to the choice of PEFT technique.
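
To make the mixing step concrete, the sketch below shows one plausible way to combine causal and bidirectional prompt features with trainable coefficients, as the abstract describes. It is a minimal illustration, not the paper's actual implementation: the names (BituneMixer, mix_logits), the per-layer parameterization, and the sigmoid gating are all simplifying assumptions.

import torch
import torch.nn as nn

class BituneMixer(nn.Module):
    """Illustrative sketch of Bitune's feature mixing.

    The prompt is assumed to be encoded twice: once with the usual causal
    mask and once with a bidirectional (full) mask, each pass using its own
    set of PEFT-adapted parameters. The two feature sets are then combined
    with trainable coefficients before answer tokens are generated.
    """

    def __init__(self, num_layers: int):
        super().__init__()
        # One trainable mixing coefficient per layer (a simplifying
        # assumption; the paper's exact parameterization may differ).
        self.mix_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, causal_feats: list[torch.Tensor],
                bidir_feats: list[torch.Tensor]) -> list[torch.Tensor]:
        mixed = []
        for layer, (c, b) in enumerate(zip(causal_feats, bidir_feats)):
            # Sigmoid keeps the weight in (0, 1), so each layer's output is
            # a convex combination of the causal and bidirectional features.
            alpha = torch.sigmoid(self.mix_logits[layer])
            mixed.append(alpha * c + (1.0 - alpha) * b)
        return mixed

Initializing mix_logits to zero starts each coefficient at 0.5, an even blend of both feature sets, letting training shift the balance per layer.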

BibTeX

@inproceedings{kopiczko2025bitune,
  title     = {Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs},
  author    = {Kopiczko, Dawid J. and Blankevoort, Tijmen and Asano, Yuki M.},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  month     = nov,
  year      = {2025},
  publisher = {Association for Computational Linguistics}
}