Introducing Quickvision with Wide Residual Networks

An end-to-end reproduction of the paper "Wide Residual Networks" using Quickvision and the CIFAR-10 dataset. Made by Saurav Maheshkar using W&B

Prior to the introduction of Wide Residual Networks (WRNs) by Sergey Zagoruyko and Nikos Komodakis, deep residual networks had been shown to improve accuracy only fractionally, with each small gain coming at the cost of nearly doubling the number of layers. This led to the problem of diminishing feature reuse and made the models slow to train. WRNs showed that widening residual networks leads to better performance, improving the then-SOTA results on CIFAR, SVHN, and COCO.

👁 Introducing Quickvision

Quickvision is a computer vision library built on top of Torchvision, PyTorch, and Lightning.

It provides:

  1. An easy-to-use, PyTorch-native API with fit(), train_step(), and val_step() for models.
  2. Easily customizable and configurable models with various backbones.
  3. A complete PyTorch-native interface. All models are nn.Module, and all training APIs are optional and not bound to models.
  4. A Lightning API that helps accelerate training over multiple GPUs and TPUs.
  5. A datasets API to convert common data formats to PyTorch formats quickly and easily.
  6. A minimal package with very few dependencies.

Quickvision is just PyTorch!!

Do you want just a model with some backbone configuration?

Do you want to train your model but not write lengthy loops?

Do you want multi-GPU training but are worried about model configuration?

We'll show you how:

  1. Quickvision allows you to bring your own dataset, model, or code recipe.

  2. You may use models, training functions, or dataset-loading utilities from Quickvision.

  3. It offers a seamless API to connect with Lightning as well.

  4. You get faster experimentation with the same control as in PyTorch or Lightning.

  5. You can use the wandb.log API to log metrics.

Visit us on GitHub!

We welcome new contributions and improvements to our package.

Quickvision is a library built for faster training without compromising on PyTorch!

‼️ Issues with Traditional Residual Networks

Diminishing Feature Reuse

The identity mapping in a residual block, which is what allows us to train very deep networks, is also a weakness. As the gradient flows through the network, nothing forces it to go through the residual block's weights, so a block can avoid learning anything useful during training. As a result, only a few blocks may learn valuable representations, or many blocks may share very little information and contribute little to the final goal. Earlier work tried to address this problem with a special case of dropout applied to residual blocks, in which a scalar weight is added to each residual block and dropout is applied to it.

Since widening the residual blocks increases the number of parameters, the authors studied the effect of dropout to regularize training and prevent overfitting. They argued that dropout should be inserted between the convolutional layers rather than in the identity part of the block, and showed that this yields consistent gains and new SOTA results.
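
To make this concrete, here is a minimal sketch of a pre-activation wide residual block with dropout placed between the two convolutions. The class name, shapes, and dropout rate are illustrative choices for this post, not Quickvision's internals:

import torch
import torch.nn as nn

class WideBasicBlock(nn.Module):
    # A pre-activation residual block: BN -> ReLU -> conv -> dropout -> BN -> ReLU -> conv
    def __init__(self, in_planes, out_planes, stride=1, dropout_rate=0.3):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes)
        # Dropout sits between the convolutions, not on the identity path
        self.dropout = nn.Dropout(p=dropout_rate)
        self.conv2 = nn.Conv2d(out_planes, out_planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        # A 1x1 convolution matches shapes on the shortcut when width or stride changes
        self.shortcut = nn.Identity()
        if stride != 1 or in_planes != out_planes:
            self.shortcut = nn.Conv2d(in_planes, out_planes, kernel_size=1,
                                      stride=stride, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.dropout(out)
        out = self.conv2(torch.relu(self.bn2(out)))
        return out + self.shortcut(x)

Widening here simply means scaling out_planes by a factor k relative to the original ResNet widths; the dropout between convolutions is what the authors found regularizes these wider blocks.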

The paper Wide Residual Networks attempts to answer the question of how wide deep residual networks should be, and to address the problem of training them.

📚 Key Takeaways

The paper highlights a method giving a total improvement of 4.4% over ResNet-1001 and shows that:

  1. Widening consistently improves performance across residual networks of different depths.
  2. Increasing both depth and width helps until the number of parameters becomes too high and stronger regularization is needed.
  3. Wide networks can be trained several times faster than thin, very deep ones: even a simple 16-layer wide residual network can match the accuracy of thousand-layer-deep networks while being far more efficient to train.

💪🏻Training

For this tutorial, we'll use the WideResnet model (included in the 0.2.0rc1 release). You can install the stable version of the library using:

pip install quickvision

The current stable release, 0.1, requires PyTorch 1.7 and torchvision 0.8.1.
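
Since the WideResnet model ships with the 0.2.0rc1 release candidate rather than the stable release, you may need to pin that version explicitly; assuming the pre-release is published on PyPI, standard pip version pinning would look like:

pip install quickvision==0.2.0rc1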

Quickvision provides simple functions to create models with pretrained weights.

from quickvision.models.classification import cnn

# To create a model with ImageNet pretrained weights
model = cnn.create_vision_cnn("wide_resnet101_2", num_classes=10, pretrained="imagenet")

# Alternatively, if you don't need pretrained weights
model_bare = cnn.create_vision_cnn("resnet50", num_classes=10, pretrained=None)

# Other weights are supported too; check the docs for the full list!
model_ssl = cnn.create_vision_cnn("resnet50", num_classes=10, pretrained="ssl")

Just as in plain PyTorch, we define the criterion and the optimizer:

import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
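
The snippets below also assume that train_loader, valid_loader, and device are already defined. Here is a minimal torchvision setup for CIFAR-10; the transforms and batch size are illustrative choices, not prescribed by Quickvision:

import torch
import torchvision
import torchvision.transforms as T

# Standard CIFAR-10 normalization statistics
transform = T.Compose([T.ToTensor(),
                       T.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
valid_set = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
valid_loader = torch.utils.data.DataLoader(valid_set, batch_size=64, shuffle=False)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")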

Instead of writing something like this yourself:

model = model.to(device)
for epoch in range(2):
    for batch_idx, (inputs, target) in enumerate(train_loader):
        optimizer.zero_grad()
        # Move the batch to the same device as the model
        inputs = inputs.to(device)
        target = target.to(device)
        # Forward pass, loss computation, backward pass, and weight update
        out = model(inputs)
        loss = criterion(out, target)
        loss.backward()
        optimizer.step()
Quickvision already implements these boring procedures for you to speed up training!

You can use the .fit() method, as shown below, to train the model with a single line of code!

history = cnn.fit(model=model, epochs=2, train_loader=train_loader,
                  val_loader=valid_loader, criterion=criterion, device=device, optimizer=optimizer)

If you prefer more granular control, you can use the train_step() and val_step() methods, which compute commonly used metrics such as accuracy for you.

import wandb
from tqdm import tqdm

wandb.init(project="intro-to-quickvision")

for epoch in tqdm(range(5)):
    print(f"Training Epoch = {epoch}")
    train_metrics = cnn.train_step(model, train_loader, criterion, device, optimizer)
    wandb.log({"Training Top1 acc": train_metrics["top1"],
               "Training Top5 acc": train_metrics["top5"],
               "Training loss": train_metrics["loss"]})

    print(f"Validating Epoch = {epoch}")
    valid_metrics = cnn.val_step(model, valid_loader, criterion, device)
    wandb.log({"Validation Top1 acc": valid_metrics["top1"],
               "Validation Top5 acc": valid_metrics["top5"],
               "Validation loss": valid_metrics["loss"]})


You can also train with Lightning!

Quickly prototype with Torch, then transfer it to Lightning!

import torch
import pytorch_lightning as pl

model_imagenet = cnn.lit_cnn("resnet18", num_classes=10, pretrained="imagenet")

# Use one GPU if available, otherwise train on CPU
gpus = 1 if torch.cuda.is_available() else 0

# Again, you can use all the Trainer params from Lightning here!
trainer = pl.Trainer(gpus=gpus, max_epochs=2)
trainer.fit(model_imagenet, train_loader, valid_loader)
