Persuasion - Ontonotes
Created on August 16|Last edited on August 16
Here we describe what happens when we train on Persuasion and Ontonotes together, trying to benefit the former via the latter.
Let's start with a baseline run: just Persuasion data (21 train instances), optimised for the Coref + Pruner task.
For contrast, we also include a run that mixes in 221 instances from Ontonotes. TODO: when run is done, add it here.
Run set (1 run)
Conclusions to Draw
It's clear that there are some benefits. That's not that great of a find, really (right?). Let's make it a bit more nuanced.
Effect of Different Tasks
Now let's compare four runs:
- Persuasion: COR PRU | Ontonotes -
- Persuasion: COR PRU | Ontonotes COR PRU
- Persuasion: COR PRU | Ontonotes NER
- Persuasion: COR PRU | Ontonotes COR PRU NER
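The four runs above can be written out as a small config sketch (run and task names here are illustrative, not the actual run configs):

```python
# Hypothetical sketch of the four run configurations compared below.
# Every run trains Coref + Pruner on Persuasion; only the Ontonotes
# task mix differs between runs.
runs = {
    "p_cor_pru__o_none":        {"persuasion": ["coref", "pruner"], "ontonotes": []},
    "p_cor_pru__o_cor_pru":     {"persuasion": ["coref", "pruner"], "ontonotes": ["coref", "pruner"]},
    "p_cor_pru__o_ner":         {"persuasion": ["coref", "pruner"], "ontonotes": ["ner"]},
    "p_cor_pru__o_cor_pru_ner": {"persuasion": ["coref", "pruner"], "ontonotes": ["coref", "pruner", "ner"]},
}

for name, cfg in runs.items():
    print(name, cfg)
```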
Run set (4 runs)
We ignore MUC and CEAFe here, but they show similar results.
Conclusions to Draw
All three task combinations benefit CCPER.
The hierarchy of effect seems to be: COR + PRU ≅ COR + PRU + NER > NER > No task
This begs the question: is NER actually helpful at all, or are we just benefiting from the model being trained on more data? As we see in the other report, CEAFe still sucks.
Effect of Dense Layers
Dense layers are a set of two feed-forward layers that we add between the Transformer encoder and the task decoders. The layers consist of Linear + ReLU + Dropout. See this Notion page: https://geraltofrivia783.notion.site/Report-01-08-2022-09-08-2022-Frozen-BERT-Dense-Layers-53783b1e758744a99d387385b6d4a9a0
# ... encoder stuff
self.dense = nn.Sequential(
    nn.Linear(768, 768),
    nn.Dropout(0.2),
    nn.BatchNorm1d(32),
    nn.ReLU(),
)
# ... decoder stuff
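As a minimal sketch of where this block sits, between the frozen Transformer encoder's output and the task decoders (all names and shapes below are illustrative, not taken from the actual codebase, and this mirrors the Linear + ReLU + Dropout description rather than any particular run config):

```python
import torch
import torch.nn as nn

# Illustrative dense block: Linear + ReLU + Dropout, as described above.
dense = nn.Sequential(
    nn.Linear(768, 768),  # BERT-base hidden size
    nn.ReLU(),
    nn.Dropout(0.2),
)

encoded = torch.randn(32, 768)  # stand-in for encoder output (32 spans)
hidden = dense(encoded)         # fed to the Coref / Pruner / NER decoders
print(hidden.shape)  # torch.Size([32, 768])
```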
We start by comparing each task combination against its dense-layer-enabled counterpart. (PS: see the boxes below to switch between different groups.)
COR + PRU
Across task combinations, we find that, surprisingly, dense layers do more harm than good. This needs further investigation.
Effect of Ontonotes size
So far we've been working with a small subset of Ontonotes. What happens if we use the entire dataset?
NER task
Clearly, adding more Ontonotes data via the NER task does not seem to have much of an effect on Persuasion performance.
COR + PRU
The Coref and Pruner tasks seem to have a sizeable effect.
COR + PRU + NER
The effect here seems to lie somewhere between those of the two task combinations above.