Reporting on first findings
Solid performance that we'll seek to improve by running some sweeps
Section 1
This set of panels contains runs from a private project, which cannot be shown in this report
model: "model"_________________________________________________________________Layer (type) Output Shape Param #=================================================================input_1 (InputLayer) [(None, 32, 32, 3)] 0conv2d (Conv2D) (None, 32, 32, 16) 448tf.nn.relu (TFOpLambda) (None, 32, 32, 16) 0max_pooling2d (MaxPooling2D (None, 16, 16, 16) 0)conv2d_1 (Conv2D) (None, 16, 16, 32) 4640tf.nn.relu_1 (TFOpLambda) (None, 16, 16, 32) 0max_pooling2d_1 (MaxPooling (None, 8, 8, 32) 02D)conv2d_2 (Conv2D) (None, 8, 8, 32) 9248tf.nn.relu_2 (TFOpLambda) (None, 8, 8, 32) 0max_pooling2d_2 (MaxPooling (None, 4, 4, 32) 02D)global_average_pooling2d (G (None, 32) 0lobalAveragePooling2D)dense (Dense) (None, 32) 1056dropout (Dropout) (None, 32) 0last (Dense) (None, 2) 66=================================================================Total params: 15,458Trainable params: 15,458Non-trainable params: 0_________________________________________________________________
As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5%.
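To make "just one additional output layer" concrete, here is a hedged sketch of a BERT fine-tuning setup. It assumes TensorFlow 2.x with the Hugging Face transformers package and the bert-base-uncased checkpoint, none of which are referenced in this report; the sequence length, label count, and learning rate are likewise illustrative assumptions, not values from these runs.

import tensorflow as tf
from transformers import TFBertModel

# Pre-trained encoder (assumption: bert-base-uncased checkpoint)
bert = TFBertModel.from_pretrained("bert-base-uncased")

# Token inputs; sequence length 128 is an assumption
input_ids = tf.keras.Input(shape=(128,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(128,), dtype=tf.int32, name="attention_mask")

# Pooled [CLS] representation from the pre-trained encoder
pooled = bert(input_ids, attention_mask=attention_mask).pooler_output

# The single task-specific output layer (here: 2-way classification, an assumption)
logits = tf.keras.layers.Dense(2, name="classifier")(pooled)

model = tf.keras.Model([input_ids, attention_mask], logits)
model.compile(
    optimizer=tf.keras.optimizers.Adam(2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

Every pre-trained weight plus the one new layer is updated during fine-tuning, which is what lets the same encoder serve many tasks without task-specific architecture changes.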