Bitune: Bidirectional Instruction-Tuning
Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano
We introduce Bitune, a method that improves instruction-tuning of pretrained decoder-only large language models, leading to consistent gains on downstream tasks. Bitune applies both causal and bidirectional attention to the prompt to obtain a better representation of the query or instruction. We realize this by introducing two sets of parameters, to which we apply parameter-efficient finetuning (PEFT) techniques. The causal and bidirectional features are then combined into a weighted average with trainable coefficients, which is subsequently used to generate new tokens. We demonstrate significant improvements in zero-shot performance on commonsense reasoning, arithmetic, and language understanding tasks, while extensive ablation studies validate the role of each component and show that the method is agnostic to the choice of PEFT technique.
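
To make the mixing step concrete, here is a minimal PyTorch sketch written from the abstract alone. The module name (BitunePromptMixer), the scalar mixing parameter (mix_logit), and the use of two separate nn.MultiheadAttention modules standing in for the two PEFT-adapted parameter sets are our assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn

    class BitunePromptMixer(nn.Module):
        """Sketch: combine causal and bidirectional prompt features
        with a trainable mixing coefficient (shapes are illustrative)."""

        def __init__(self, d_model: int, n_heads: int):
            super().__init__()
            # Two attention modules stand in for the two sets of
            # (PEFT-adapted) parameters: one causal, one bidirectional.
            self.causal_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.bidir_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            # Trainable coefficient; sigmoid keeps the average convex.
            self.mix_logit = nn.Parameter(torch.zeros(1))

        def forward(self, prompt_h: torch.Tensor) -> torch.Tensor:
            # prompt_h: (batch, prompt_len, d_model) prompt hidden states.
            L = prompt_h.size(1)
            # Causal mask: True blocks attention to future positions.
            causal_mask = torch.triu(
                torch.ones(L, L, dtype=torch.bool, device=prompt_h.device), diagonal=1
            )
            causal_out, _ = self.causal_attn(
                prompt_h, prompt_h, prompt_h, attn_mask=causal_mask
            )
            # Bidirectional pass: no mask, each prompt token sees the full prompt.
            bidir_out, _ = self.bidir_attn(prompt_h, prompt_h, prompt_h)
            # Weighted average with the trainable coefficient.
            alpha = torch.sigmoid(self.mix_logit)
            return alpha * bidir_out + (1.0 - alpha) * causal_out

    # Usage: mix features for a 10-token prompt; the result would then
    # serve as the prompt representation when generating answer tokens.
    mixer = BitunePromptMixer(d_model=64, n_heads=4)
    mixed = mixer(torch.randn(2, 10, 64))  # (2, 10, 64)

Note that the bidirectional pass is applied only to the prompt; generation of new tokens remains autoregressive, consuming the mixed prompt features.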





Bibtex
@inproceedings{kopiczko2025bitune,
  title={Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs},
  author={Kopiczko, Dawid J. and Blankevoort, Tijmen and Asano, Yuki M.},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  month={nov},
  year={2025},
  publisher={Association for Computational Linguistics}
}