Bitune: Bidirectional Instruction-Tuning
Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano
We introduce Bitune, a method that improves instruction-tuning of pretrained decoder-only large language models, leading to consistent gains on downstream tasks. Bitune applies both causal and bidirectional attention to the prompt to obtain a better representation of the query or instruction. We realize this by introducing two sets of parameters, to which we apply parameter-efficient finetuning (PEFT) techniques. The causal and bidirectional features are then combined into a weighted average with trainable coefficients, which is subsequently used to generate new tokens. We demonstrate significant improvements in zero-shot performance on commonsense reasoning, arithmetic, and language understanding tasks, while extensive ablation studies validate the role of each component and show that the method is agnostic to the choice of PEFT technique.
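To make the weighted-combination step concrete, here is a minimal PyTorch sketch of mixing causal and bidirectional prompt features with a trainable coefficient. This is not the authors' implementation; the module, tensor names, and the sigmoid parameterization of the coefficient are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeatureMix(nn.Module):
    """Illustrative sketch (hypothetical names): combine prompt features from a
    causal pass and a bidirectional pass with trainable mixing coefficients."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # Per-dimension mixing logits, squashed to (0, 1) via sigmoid.
        self.mix_logit = nn.Parameter(torch.zeros(hidden_dim))

    def forward(self, causal_feats: torch.Tensor, bidir_feats: torch.Tensor) -> torch.Tensor:
        # Both inputs: (batch, prompt_len, hidden_dim).
        alpha = torch.sigmoid(self.mix_logit)
        # Weighted average of the two prompt representations.
        return alpha * bidir_feats + (1.0 - alpha) * causal_feats

# Usage: run the prompt through the model twice (once with a causal mask,
# once with bidirectional attention), mix the resulting features, and use
# the mixed representation when generating new tokens.
mix = FeatureMix(hidden_dim=768)
causal = torch.randn(2, 16, 768)   # prompt features from the causal pass
bidir = torch.randn(2, 16, 768)    # prompt features from the bidirectional pass
combined = mix(causal, bidir)
print(combined.shape)  # torch.Size([2, 16, 768])
```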
BibTeX
@misc{kopiczko2024bitune,
  title={Bitune: Bidirectional Instruction-Tuning},
  author={Dawid J. Kopiczko and Tijmen Blankevoort and Yuki M. Asano},
  year={2024},
  eprint={2405.14862},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}