Since the paper Attention is all you need, published in 2017, the landscape of NLP completely shifts towards transformers based architecture.
In 2020, most computer vision models still rely solely on convolutional neural networks to detect and segments images. We predict that 2021 will be an important milestone for detection and segmentation algorithms. Convolution mixed with transformers will become the default choice for most practitioners.
Waiting for this new year, we decided to open-source a DETR (Object Detection with Transformers) Tensorflow implementation, including code for inference, finetuning, and training!