BERT-ranker baselines for CRR
The experimental results in this report were generated using transformer-rankers. The task is to rank responses for a given conversational context. We use three benchmarks for this task: MANTiS, MSDialog and Ubuntu DSTC8.
Conversation Response Ranking
The task of conversation response ranking concerns retrieving the best response given the dialogue context. Formally, let $D=\{(U_i, R_i, Y_i)\}_{i=1}^{N}$ be a dataset consisting of $N$ triplets: dialogue context, response candidates and response relevance labels. The dialogue context $U_i$ is composed of the previous utterances $\{u^1, u^2, ..., u^{\tau}\}$ at turn $\tau$ of the dialogue. The candidate responses $R_i = \{r^1, r^2, ..., r^k\}$ are either ground-truth responses or negative sampled candidates (using BM25 as the negative sampler), indicated by the relevance labels $Y_i = \{y^1, y^2, ..., y^k\}$. The task is then to learn a ranking function $f(\cdot)$ that is able to generate a ranked list for the set of candidate responses $R_i$ based on their predicted relevance scores $f(U_i, r)$.
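As a concrete illustration (not part of transformer-rankers; the class and field names below are hypothetical), a single instance and the ranking induced by a scoring function can be sketched as:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CRRInstance:
    """One triplet (U_i, R_i, Y_i) from the dataset D."""
    context: List[str]      # previous utterances u^1 ... u^tau
    candidates: List[str]   # k candidate responses: ground truth + BM25-sampled negatives
    labels: List[int]       # relevance labels y^1 ... y^k (1 = relevant, 0 = non-relevant)

def rank(instance: CRRInstance, f: Callable[[List[str], str], float]) -> List[str]:
    """Order the candidate responses R_i by their predicted relevance scores f(U_i, r)."""
    return sorted(instance.candidates, key=lambda r: f(instance.context, r), reverse=True)
```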
BERT ranker
BERT learns the function $f(U_i, r)$ based on the representation of the [CLS] token. The input to BERT is the concatenation of the context $U_i$ and the response $r$, separated by [SEP] tokens. This is the equivalent of early adaptations of BERT for ad-hoc retrieval, transported to conversation response ranking. Formally, the input sentence to BERT is

$$concat(U_i, r) = u^1 \; | \; [UTTERANCE\_SEP] \; | \; u^2 \; | \; [TURN\_SEP] \; | \; ... \; | \; u^{\tau} \; | \; [SEP] \; | \; r,$$

where $|$ indicates the concatenation operation. The utterances from the context $U_i$ are concatenated with the special separator tokens $[UTTERANCE\_SEP]$ and $[TURN\_SEP]$, which indicate the end of an utterance and of a turn, respectively. The response $r$ is concatenated with the context using BERT's standard sentence separator $[SEP]$. We fine-tune BERT on the target conversational corpus and make predictions as follows:
$$f(U_i, r) = \sigma(FFN(BERT_{CLS}(concat(U_i, r)))),$$
where $BERT_{CLS}$ is the pooling operation that extracts the representation of the [CLS] token from the last layer and $FFN$ is a feed-forward network that outputs logits for two classes (relevant and non-relevant). We pass the logits through a softmax transformation $\sigma$, which gives us a probability of relevance. We use the cross-entropy loss for training. The learned function $f(U_i, r)$ outputs a point estimate that is used to rank all the responses in $R_i$.
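Below is a minimal sketch of such a pointwise ranker using Hugging Face transformers, assuming BERT-base and treating the extra separators as added special tokens; the class and function names are hypothetical and the actual implementation in examples/pointwise_bert_ranker.py (used in the script below) may differ in its details.

```python
import torch
from transformers import BertModel, BertTokenizerFast

# Assumption: the utterance/turn separators are registered as extra special tokens.
UTTERANCE_SEP, TURN_SEP = "[UTTERANCE_SEP]", "[TURN_SEP]"

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
tokenizer.add_special_tokens({"additional_special_tokens": [UTTERANCE_SEP, TURN_SEP]})


class PointwiseBertRanker(torch.nn.Module):
    """Scores a (context, response) pair: BERT_CLS pooling -> FFN -> two-class logits."""

    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.bert.resize_token_embeddings(len(tokenizer))  # account for the added separators
        self.ffn = torch.nn.Linear(self.bert.config.hidden_size, 2)  # non-relevant / relevant

    def forward(self, input_ids, attention_mask, token_type_ids):
        out = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask,
                        token_type_ids=token_type_ids)
        cls = out.last_hidden_state[:, 0]  # BERT_CLS: [CLS] representation from the last layer
        return self.ffn(cls)               # logits; training uses torch.nn.CrossEntropyLoss


def score(model, context_utterances, response, max_seq_len=512):
    """f(U_i, r): probability that response r is relevant to context U_i (simplified: no TURN_SEP)."""
    context = f" {UTTERANCE_SEP} ".join(context_utterances)
    enc = tokenizer(context, response, truncation=True,
                    max_length=max_seq_len, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc)
    return torch.softmax(logits, dim=-1)[0, 1].item()  # sigma over the logits, take P(relevant)
```

At inference time, the candidate responses in $R_i$ are simply sorted by this score.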
Reproduce Results
The script we use is the following:
export CUDA_VISIBLE_DEVICES=3,4,5,6,7
source /ssd/gustavo/transformer_rankers/env/bin/activate
REPO_DIR=/ssd/gustavo/transformer_rankers
ANSERINI_FOLDER=/ssd/gustavo/anserini/
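# Validation frequency (in training steps) and the number of training instances used per task.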
VALIDATE_EVERY_X_STEPS=100
TRAIN_INSTANCES=300000
WANDB_PROJECT='library-crr-bert-baseline'
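# Train and evaluate the pointwise BERT ranker on each benchmark with 5 different random seeds.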
for SEED in 1 2 3 4 5
do 
    for TASK in 'mantis' 'msdialog' 'ubuntu_dstc8'
    do
        python ../examples/pointwise_bert_ranker.py \
            --task $TASK \
            --data_folder $REPO_DIR/data/ \
            --output_dir $REPO_DIR/data/output_data/ \
            --sample_data -1 \
            --max_seq_len 512 \
            --num_validation_batches 500 \
            --validate_every_epochs -1 \
            --validate_every_steps $VALIDATE_EVERY_X_STEPS \
            --train_negative_sampler bm25 \
            --test_negative_sampler bm25 \
            --num_epochs 1 \
            --num_training_instances $TRAIN_INSTANCES \
            --train_batch_size 8 \
            --val_batch_size 8 \
            --num_ns_train 9 \
            --num_ns_eval 9 \
            --seed $SEED \
            --anserini_folder $ANSERINI_FOLDER \
            --wandb_project $WANDB_PROJECT        
    done
done