Leaner Models
Using dense layers to clamp down the number of parameters seems like a good idea.
BERT (as well as SpanBERT) token embeddings are 768-dimensional. Since a span embedding is built by concatenating a few of these token vectors, each span ends up around 2,400 dimensions.
If we instead use dense layers to bring this down, we not only avoid some memory constraints but also end up with a leaner model. So let's see what the downside is.
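As a rough illustration, here is a minimal sketch of such a projection, assuming the usual span construction (concatenating the start token, end token, and attention-weighted head vectors) and an illustrative projection size of 256. The class and dimension names are hypothetical, not this report's actual code.

```python
import torch
import torch.nn as nn

class SpanProjector(nn.Module):
    """Shrink 768-d token embeddings with a dense layer before building spans."""

    def __init__(self, token_dim: int = 768, proj_dim: int = 256):
        super().__init__()
        # Dense layer applied to every token embedding that feeds a span.
        self.proj = nn.Linear(token_dim, proj_dim)

    def forward(self, start_emb, end_emb, head_emb):
        # Each input: (num_spans, token_dim)
        start = self.proj(start_emb)
        end = self.proj(end_emb)
        head = self.proj(head_emb)
        # Span embedding = concatenation: 3 * proj_dim instead of 3 * 768.
        return torch.cat([start, end, head], dim=-1)


projector = SpanProjector()
x = torch.randn(10, 768)
print(projector(x, x, x).shape)  # torch.Size([10, 768]) instead of 2304 without projection
```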
[Run set: 2 runs]
The results are discouraging, but only slightly so. There is a clear performance difference (these are B-Cubed scores, by the way): from 0.68 down to 0.64 at the respective bests of the two runs.
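For readers unfamiliar with the metric, the sketch below shows how B-Cubed precision and recall are defined over mention clusterings. The report's scores presumably come from a standard coreference scorer; this simplified function only illustrates the definition.

```python
def b_cubed(gold_clusters, pred_clusters):
    """B-Cubed precision/recall/F1; each argument is a list of sets of mention ids."""
    gold_of = {m: c for c in gold_clusters for m in c}
    pred_of = {m: c for c in pred_clusters for m in c}

    # Precision: per predicted mention, fraction of its predicted cluster that is correct.
    precision = sum(
        len(pred_of[m] & gold_of.get(m, set())) / len(pred_of[m]) for m in pred_of
    ) / len(pred_of)
    # Recall: per gold mention, fraction of its gold cluster that was recovered.
    recall = sum(
        len(gold_of[m] & pred_of.get(m, set())) / len(gold_of[m]) for m in gold_of
    ) / len(gold_of)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [{"a", "b"}, {"c"}]
pred = [{"a", "b", "c"}]
print(b_cubed(gold, pred))  # precision ~= 0.56, recall = 1.0
```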
But in terms of parameters?
- Full Model: 142,079,881
- Lean Model: 112,714,257
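Counts like these can be reproduced by summing over `model.parameters()`. This is a generic PyTorch sketch, not the script that produced the numbers above.

```python
import torch.nn as nn

def count_trainable_parameters(model: nn.Module) -> int:
    """Sum the element counts of all trainable tensors in the model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example with a stand-in module; the real models are the full and lean
# coreference models compared above.
print(count_trainable_parameters(nn.Linear(768, 256)))  # 196864
```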
The final result seems a bit discouraging: not a lot of parameter (and efficiency) improvement, but certainly a noticeable drop in performance.