Persuasion - Ontonotes
Created on August 16|Last edited on August 16
Here we describe what happens when we train on Persuasion and Ontonotes together, trying to benefit the former via the latter.
Let's start with a baseline run: just Persuasion data (21 train instances), optimised for the Coref + Pruner task.
For contrast, we also include a run that mixes in 221 instances from Ontonotes. TODO: when run is done, add it here.
Run set (1 run)
Conclusions to Draw
It's clear that there are some benefits. That's not that great of a find, really (right?). Let's make it a bit more nuanced.
Effect of Different Tasks
Now let's compare four runs:
- Persuasion: COR PRU | Ontonotes -
- Persuasion: COR PRU | Ontonotes COR PRU
- Persuasion: COR PRU | Ontonotes NER
- Persuasion: COR PRU | Ontonotes COR PRU NER
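The four runs above can be written out as a small config sketch (run and task names here are illustrative, not the actual run configs):

```python
# Hypothetical sketch of the four run configurations compared below.
# Every run trains Coref + Pruner on Persuasion; only the Ontonotes
# task mix differs between runs.
runs = {
    "p_cor_pru__o_none":        {"persuasion": ["coref", "pruner"], "ontonotes": []},
    "p_cor_pru__o_cor_pru":     {"persuasion": ["coref", "pruner"], "ontonotes": ["coref", "pruner"]},
    "p_cor_pru__o_ner":         {"persuasion": ["coref", "pruner"], "ontonotes": ["ner"]},
    "p_cor_pru__o_cor_pru_ner": {"persuasion": ["coref", "pruner"], "ontonotes": ["coref", "pruner", "ner"]},
}

for name, cfg in runs.items():
    print(name, cfg)
```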
Run set (4 runs)
We ignore MUC and CEAFe here, but they show similar results.
Conclusions to Draw
All three task combinations benefit CCPER.
The hierarchy of effect seems to be: COR + PRU ≅ COR + PRU + NER > NER > No task
This begs the question: is NER actually helpful at all, or are we just benefiting from the model being trained on more data? As we see in the other report, CEAFe still sucks.
Effect of Dense Layers
Dense layers are a set of two feed-forward layers that we add between the Transformer encoder and the task decoders. The layers consist of Linear + ReLU + Dropout. See this Notion page: https://geraltofrivia783.notion.site/Report-01-08-2022-09-08-2022-Frozen-BERT-Dense-Layers-53783b1e758744a99d387385b6d4a9a0
# ... encoder stuff
self.dense = nn.Sequential(
    nn.Linear(768, 768),
    nn.Dropout(0.2),
    nn.BatchNorm1d(32),
    nn.ReLU(),
)
# ... decoder stuff
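As a minimal sketch of where this block sits, between the frozen Transformer encoder's output and the task decoders (all names and shapes below are illustrative, not taken from the actual codebase, and this mirrors the Linear + ReLU + Dropout description rather than any particular run config):

```python
import torch
import torch.nn as nn

# Illustrative dense block: Linear + ReLU + Dropout, as described above.
dense = nn.Sequential(
    nn.Linear(768, 768),  # BERT-base hidden size
    nn.ReLU(),
    nn.Dropout(0.2),
)

encoded = torch.randn(32, 768)  # stand-in for encoder output (32 spans)
hidden = dense(encoded)         # fed to the Coref / Pruner / NER decoders
print(hidden.shape)  # torch.Size([32, 768])
```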
We start by comparing each task combination against its dense-layer-enabled counterpart. (PS: see the boxes below to switch between different groups.)
COR + PRU
Across task combinations, we find that, surprisingly, dense layers do more harm than good. This needs further investigation.
Effect of Ontonotes size
So far we've been working with a small subset of Ontonotes. What happens if we use the entire dataset?
NER task
Clearly, adding more Ontonotes data via the NER task does not seem to have much of an effect on Persuasion performance.
COR + PRU
The Coref and Pruner tasks seem to have a sizeable effect.
COR + PRU + NER
The effect here seems to lie somewhere between those of the two task combinations above.