At Blue River Technology, we are building the next generation of smart machines. Farmers use our tools to control weeds and reduce costs in a way that promotes agricultural sustainability. Our weeding robot integrates cameras, computer vision, machine learning and robotics to make an intelligent sprayer that drives through fields (using AutoTrac to minimize the load on the driver) and quickly targets and sprays weeds, leaving the crops intact.
I experimented with different PyTorch solvers, Adam and SGD, and you can see my results tracked here in W&B. With SGD and momentum, the question is this: can I find a momentum setting for SGD that beats Adam?
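Switching between the two solvers is a one-line change in PyTorch. Here's a minimal sketch; the linear layer is just a placeholder standing in for the real segmentation network:

```python
import torch

model = torch.nn.Linear(4, 2)  # placeholder for the actual model

# Adam with its default parameters (lr=1e-3, betas=(0.9, 0.999))
adam = torch.optim.Adam(model.parameters())

# SGD needs an explicit learning rate; momentum is the knob we sweep
sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```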
I used the same training and test data and compared the F1 score for plants between runs. I set up a series of runs with SGD as the solver, sweeping through momentum values from 0 to 0.99 (when using momentum, anything greater than 1.0 causes the solver to diverge). First, I ran 10 runs with momentum values from 0 to 0.9 in increments of 0.1. Following that, I performed another set of 10 runs, this time with momentum values between 0.90 and 0.99 in increments of 0.01. After looking at these results, I also ran experiments at momentum values of 0.999 and 0.9999. Each run used a different random seed and was given the tag “SGD Sweep” in W&B. The results are shown below.
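A sketch of how a sweep like this could be set up: the momentum values follow the schedule above, the model is a hypothetical stand-in for the real network, and the W&B logging call is shown only as a comment.

```python
import torch

# Momentum values from the sweep: 0–0.9 in steps of 0.1,
# then 0.90–0.99 in steps of 0.01, plus the two extra points.
coarse = [round(0.1 * i, 1) for i in range(10)]        # 0.0, 0.1, ..., 0.9
fine = [round(0.90 + 0.01 * i, 2) for i in range(10)]  # 0.90, 0.91, ..., 0.99
momenta = coarse + fine + [0.999, 0.9999]

model = torch.nn.Linear(4, 2)  # stand-in for the real segmentation network

for m in momenta:
    # wandb.init(tags=["SGD Sweep"], config={"momentum": m})  # tag each run
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=m)
    # ... training loop with a fresh random seed per run ...
```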
In the bar chart on the left, the x-axis is the F1 score and the y-axis is the experiment name. On the right, the scatter plot shows the F1 score as a function of momentum.
It's clear that larger values of momentum increase the F1 score. The best value, 0.9447, occurs at a momentum of 0.999, and the score drops off to 0.9394 at a momentum of 0.9999. You can see the table of experiments in the charts below.
How do these results compare to Adam? To test this, I ran 10 identical runs using torch.optim.Adam with default parameters, tagged “Adam runs” in W&B to identify them. I also tagged each set of SGD runs for comparison. Since each run uses a different random seed, the solver initializes differently each time and ends up with different weights at the last epoch, giving slightly different results on the test set for each run. To compare the solvers, I need to measure the spread of values for the Adam and SGD runs. This is easy to do with a box plot grouped by tag in W&B.
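What the box plot summarizes visually can also be computed directly. Here's a sketch using Python's standard `statistics` module; the score lists are purely illustrative placeholders, not the actual run results (those live in the W&B runs):

```python
import statistics

# Placeholder per-run F1 scores, only to illustrate the summary computation.
adam_f1 = [0.946, 0.945, 0.947]  # hypothetical Adam runs
sgd_f1 = [0.944, 0.940, 0.945]   # hypothetical SGD (momentum=0.999) runs

def summarize(scores):
    """Return the mean and sample standard deviation of a list of F1 scores."""
    return statistics.mean(scores), statistics.stdev(scores)

adam_mean, adam_std = summarize(adam_f1)
sgd_mean, sgd_std = summarize(sgd_f1)
```

A tighter standard deviation with a higher mean is exactly the pattern the box plot makes visible at a glance.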
Here's a box plot with the spread of values for Adam and SGD. The Adam runs are shown on the left of the graph in green. The SGD runs are shown as brown (0.999), teal (0–0.99), blue (0.9999), and yellow (0.95).
You can see that I wasn't able to beat the Adam results just by adjusting the momentum value for SGD. The momentum setting of 0.999 gives very comparable results, but the variance of the Adam runs is tighter and the average value is higher as well. So Adam appears to be a good choice of solver for our plant segmentation problem!