Leaner Models
Using dense layers to clamp down the number of parameters seems like a good idea.
BERT (as well as SpanBERT) token embeddings are 768-dimensional. Since a span embedding is built by concatenating a few of these token vectors, each span ends up around 2,400 dimensions.
If we instead use dense layers to bring this down, we not only avoid some memory constraints but also end up with a leaner model. So let's see what the downside is.
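As a rough illustration, here is a minimal sketch of such a projection, assuming the usual span construction (concatenating the start token, end token, and attention-weighted head vectors) and an illustrative projection size of 256. The class and dimension names are hypothetical, not this report's actual code.

```python
import torch
import torch.nn as nn

class SpanProjector(nn.Module):
    """Shrink 768-d token embeddings with a dense layer before building spans."""

    def __init__(self, token_dim: int = 768, proj_dim: int = 256):
        super().__init__()
        # Dense layer applied to every token embedding that feeds a span.
        self.proj = nn.Linear(token_dim, proj_dim)

    def forward(self, start_emb, end_emb, head_emb):
        # Each input: (num_spans, token_dim)
        start = self.proj(start_emb)
        end = self.proj(end_emb)
        head = self.proj(head_emb)
        # Span embedding = concatenation: 3 * proj_dim instead of 3 * 768.
        return torch.cat([start, end, head], dim=-1)


projector = SpanProjector()
x = torch.randn(10, 768)
print(projector(x, x, x).shape)  # torch.Size([10, 768]) instead of 2304 without projection
```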
[Run set: 2 runs]
The results are discouraging, but only slightly so. There is a clear performance difference (these are B-Cubed scores, by the way): from 0.68 down to 0.64 at the respective bests of the two runs.
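For readers unfamiliar with the metric, the sketch below shows how B-Cubed precision and recall are defined over mention clusterings. The report's scores presumably come from a standard coreference scorer; this simplified function only illustrates the definition.

```python
def b_cubed(gold_clusters, pred_clusters):
    """B-Cubed precision/recall/F1; each argument is a list of sets of mention ids."""
    gold_of = {m: c for c in gold_clusters for m in c}
    pred_of = {m: c for c in pred_clusters for m in c}

    # Precision: per predicted mention, fraction of its predicted cluster that is correct.
    precision = sum(
        len(pred_of[m] & gold_of.get(m, set())) / len(pred_of[m]) for m in pred_of
    ) / len(pred_of)
    # Recall: per gold mention, fraction of its gold cluster that was recovered.
    recall = sum(
        len(gold_of[m] & pred_of.get(m, set())) / len(gold_of[m]) for m in gold_of
    ) / len(gold_of)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [{"a", "b"}, {"c"}]
pred = [{"a", "b", "c"}]
print(b_cubed(gold, pred))  # precision ~= 0.56, recall = 1.0
```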
But in terms of parameters?
- Full Model: 142,079,881
- Lean Model: 112,714,257
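Counts like these can be reproduced by summing over `model.parameters()`. This is a generic PyTorch sketch, not the script that produced the numbers above.

```python
import torch.nn as nn

def count_trainable_parameters(model: nn.Module) -> int:
    """Sum the element counts of all trainable tensors in the model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example with a stand-in module; the real models are the full and lean
# coreference models compared above.
print(count_trainable_parameters(nn.Linear(768, 256)))  # 196864
```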
The final result seems a bit discouraging: not a lot of parameter (and efficiency) improvement, but certainly a noticeable drop in performance.