Reporting on first findings

Solid performance that we'll seek to improve by running some sweeps
Created on November 15 | Last edited on November 15

Section 1


This set of panels contains runs from a private project, which cannot be shown in this report
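
The subtitle notes that we plan to improve on these results by running sweeps. As a rough illustration, a minimal W&B sweep over a few plausible hyperparameters of the model summarized below might look like the sketch that follows; the parameter names, ranges, metric name, project name, and the train entry point are all illustrative assumptions, not settings taken from the hidden panels.

```python
import wandb

# Hypothetical sweep configuration: Bayesian search over a few
# hyperparameters of the small CNN summarized below.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "val_accuracy", "goal": "maximize"},  # metric name is an assumption
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-2},
        "dropout": {"values": [0.1, 0.2, 0.3, 0.5]},
        "batch_size": {"values": [32, 64, 128]},
    },
}

def train():
    # Hypothetical entry point: build and fit the model below using the
    # hyperparameters sampled into wandb.config for this run.
    run = wandb.init()
    config = run.config
    ...  # build model with config.dropout, compile with config.learning_rate, fit, log val_accuracy

sweep_id = wandb.sweep(sweep_config, project="first-findings")  # project name is a placeholder
wandb.agent(sweep_id, function=train, count=20)
```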

model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 32, 32, 3)] 0
conv2d (Conv2D) (None, 32, 32, 16) 448
tf.nn.relu (TFOpLambda) (None, 32, 32, 16) 0
max_pooling2d (MaxPooling2D (None, 16, 16, 16) 0
)
conv2d_1 (Conv2D) (None, 16, 16, 32) 4640
tf.nn.relu_1 (TFOpLambda) (None, 16, 16, 32) 0
max_pooling2d_1 (MaxPooling (None, 8, 8, 32) 0
2D)
conv2d_2 (Conv2D) (None, 8, 8, 32) 9248
tf.nn.relu_2 (TFOpLambda) (None, 8, 8, 32) 0
max_pooling2d_2 (MaxPooling (None, 4, 4, 32) 0
2D)
global_average_pooling2d (G (None, 32) 0
lobalAveragePooling2D)
dense (Dense) (None, 32) 1056
dropout (Dropout) (None, 32) 0
last (Dense) (None, 2) 66
=================================================================
Total params: 15,458
Trainable params: 15,458
Non-trainable params: 0
_________________________________________________________________
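
For reference, this summary corresponds to a small convolutional classifier that can be rebuilt with the Keras functional API as sketched below. The 3x3 kernels, "same" padding, and dropout rate are assumptions chosen to match the parameter counts above; they are not stated explicitly in the report.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_model(num_classes: int = 2) -> keras.Model:
    """Rebuild the CNN whose summary is shown above (sketch, with assumed
    kernel sizes, padding, and dropout rate)."""
    inputs = keras.Input(shape=(32, 32, 3))

    # Three conv blocks: Conv2D -> ReLU (applied as a TF op, hence TFOpLambda) -> MaxPooling2D
    x = layers.Conv2D(16, 3, padding="same")(inputs)   # 448 params
    x = tf.nn.relu(x)
    x = layers.MaxPooling2D()(x)                       # 32x32 -> 16x16

    x = layers.Conv2D(32, 3, padding="same")(x)        # 4,640 params
    x = tf.nn.relu(x)
    x = layers.MaxPooling2D()(x)                       # 16x16 -> 8x8

    x = layers.Conv2D(32, 3, padding="same")(x)        # 9,248 params
    x = tf.nn.relu(x)
    x = layers.MaxPooling2D()(x)                       # 8x8 -> 4x4

    # Global average pooling collapses each 4x4 feature map to a scalar
    x = layers.GlobalAveragePooling2D()(x)             # (None, 32)
    x = layers.Dense(32)(x)                            # 1,056 params
    x = layers.Dropout(0.2)(x)                         # rate is an assumption
    outputs = layers.Dense(num_classes, name="last")(x)  # 66 params

    return keras.Model(inputs, outputs, name="model")

model = build_model()
model.summary()  # should report 15,458 trainable params, matching the table above
```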

The pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful: it obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5%.
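
A minimal sketch of that fine-tuning recipe, using the Hugging Face transformers library with TensorFlow: a pre-trained BERT encoder with a single task-specific dense layer on top. The checkpoint name, sequence length, two-class head, and optimizer settings are illustrative assumptions, not details taken from this report.

```python
import tensorflow as tf
from transformers import TFBertModel

# Pre-trained encoder (checkpoint name is an assumption)
encoder = TFBertModel.from_pretrained("bert-base-uncased")

# Token inputs; sequence length of 128 is an assumption
input_ids = tf.keras.Input(shape=(128,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(128,), dtype=tf.int32, name="attention_mask")

# Pooled [CLS] representation from the pre-trained encoder
pooled = encoder(input_ids=input_ids, attention_mask=attention_mask).pooler_output

# The "one additional output layer": a dense classifier for a hypothetical 2-class task
logits = tf.keras.layers.Dense(2, name="classifier")(pooled)

model = tf.keras.Model([input_ids, attention_mask], logits)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```

During fine-tuning, the encoder weights are updated together with the new head, which is why no task-specific architecture changes are needed.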