Visual Acuity Progress Report
June 15, 2021
Update on the state of the machine learning experiments for the Visual Acuity project.
This report is an update for the Browne Lab and Pierre on the current state of the project, open questions, and potential avenues for future exploration. It also contains some background on the deep learning techniques and methodologies used, to provide context for Andrew and Anderson if needed.
Image Data
Labels
Images in the original dataset were rich with information, and I wanted to preserve that information while conducting experiments, even though not all of it is directly used in training.
Currently, the model is trained to predict only the Optotype label, but I also attached the other labels below during preprocessing, both to preserve the rich information in the original dataset and to make them available for later experimentation or graph creation if desired.
Optotypes
59 categories
2, 3, 5, 6, 8, 9, C, D, V, K, P, E, H, O, T, R, N, S, Z, F, LregularCircle, LgradientCircle, SregularCircle, MgradientCircle, SgradientCircle, MregularCircle, grayCircle, apple, flat-square, +diamond, cake, +circle, x-diamond, x-circle, horse, circle, car, phone, house, frown-square, +blank, cup, bird, panda, flat-line, cow, x-blank, square, frown-line, +square, x-square, duck, train, tree, smile-line, smile-square, star, hand
Note on differences from original categorization
Angles
Some Optotypes from the original dataset, such as C-0, C-45, ..., C-315 and E-0, E-90, ..., E-270, were categorized as C and E respectively, and the angles associated with each optotype were pulled out into a separate Angles label.
The goal of this separation was to make the machine categorization more similar to what I imagine the human categorization is. For example, a human would be unlikely to misclassify a '3' as 'E-270.' However, if it makes comparison with the human trials more cumbersome or is otherwise undesired, I can switch back to the original labels provided.
Teller
LregularCircle, LgradientCircle, SregularCircle, MgradientCircle, SgradientCircle, MregularCircle, grayCircle
Teller optotypes keep their size prefix in the Optotype label because that size actually refers to the thickness of the lines, which is a feature of the optotype itself. The image Size label for all Teller images was set to Small.
Angles
The angle extracted from the attached Optotype label. If the Optotype provides no angle, the angle is set to 0. (A small sketch illustrating this split follows the examples below.)
12 categories
0, 22.5, 45, 67.5, 90, 112.5, 135, 157.5, 180, 225, 270, 315
Non-Zero Angle Examples
- C-45, ..., C-315
- C-45 becomes Optotype:C, Angle:45
- E-90, ..., E-270
- (SML)-grad-22.5, ..., (SML)-grad-157.5
- (SML)-reg-22.5, ..., (SML)-reg-157.5
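To make the relabelling rule concrete, here is a minimal illustrative sketch of the split; the function name is hypothetical and this is not the actual preprocessing code.

```python
def split_optotype(raw_label: str):
    """Split a raw label such as 'C-45' into (optotype, angle).

    Illustrative sketch only: labels without a trailing numeric angle
    (e.g. 'flat-square', 'grayCircle') keep their full name and get angle 0.
    """
    head, sep, tail = raw_label.rpartition("-")
    if sep:  # label contains a '-'; check whether the last piece is an angle
        try:
            return head, float(tail)      # 'C-45' -> ('C', 45.0)
        except ValueError:
            pass                          # e.g. 'flat-square': last piece is not an angle
    return raw_label, 0.0                 # no angle present -> angle 0
```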
Acuities
The typeset of the acuity test, extracted from the original MultipleChoice dataset.
15 categories
A, C, E, ETDRS, ETL-face, ETL-x, HOTV, L, NL, NPV, P, SSa, SSl, Teller, W
Augmentation
Image augmentation technique applied on top of original image, if any.
6 categories
None, grayscale, 120% brightness, 120% contrast, 80% brightness, 80% contrast
Character
Character type that the given image's optotype belongs to.
4 categories
numeric, alpha, teller, wingding
Mapping
numeric: 2, 3, 5, 6, 8, 9
alpha: C, D, V, K, P, E, H, O, T, R, N, S, Z, F
teller: LregularCircle, LgradientCircle, SregularCircle, MgradientCircle, SgradientCircle, MregularCircle, grayCircle
wingding: apple, flat-square, +diamond, cake, +circle, x-diamond, x-circle, horse, circle, car, phone, house, frown-square, +blank, cup, bird, panda, flat-line, cow, x-blank, square, frown-line, +square, x-square, duck, train, tree, smile-line, smile-square, star, hand
Notes
Though currently unused, this label could provide additional data on how well certain character types performed relative to one another. For example, if we wanted to test hypotheses related to literacy, we could use the "alpha" label. Since the Teller images are unique, they are also given their own character category.
Distortion
Level of distortion of the given image.
2 categories
low, high
Mapping
low: 0-0-0, 0-1-0, 0-1-45, 0-1-90, 0-1-135, 0-3-0, 0-3-45, 0-3-90, 0-3-135, 0-6-0, 0-6-45, 0-6-90, 0-6-135, 2-0-0, 2-1-0, 2-1-45, 2-1-90, 2-1-135, 2-3-0, 2-3-45, 2-3-90, 2-3-135, 4-0-0, 4-1-0, 4-1-45, 4-1-90, 4-1-135
high: 2-6-0, 2-6-45, 2-6-90, 2-6-135, 4-3-0, 4-3-45, 4-3-90, 4-3-135, 4-6-0, 4-6-45, 4-6-90, 4-6-135, 6-0-0, 6-1-0, 6-1-45, 6-1-90, 6-1-135, 6-3-0, 6-3-45, 6-3-90, 6-3-135, 6-6-0, 6-6-45, 6-6-90, 6-6-135
Size
Size of the image relative to the center of the frame, taken from the folder structure of the original dataset.
3 categories
S, M, L
Examples of Labels
Training/ Validation Set
Example set of images from the training/validation set with their labels.

Testing Image Set

Training/ Testing/ Validation Sets
Testing
- Size: 12,834
- S, M, and L images with high distortion.
- S and M images with low distortion.
Training
- Size:
- Original: 3,753
- With augmentation: 13,608
- L images with low distortion.
- S Teller images with low distortion.
- Special case: the Teller images appear here because all "Teller" images were categorized as size S. See "Labels: Optotypes" for more info.
Training/ Validation Split
20% of the training set above was reserved for validation of the model. The remaining 80% was used for training.
Training Size
- With augmentation: 10,886
Validation Size
- With augmentation: 2,722
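For reference, here is a minimal sketch of one way such an 80/20 split can be produced with scikit-learn. The variable names are hypothetical, and the `stratify` argument is just one option; the report does not state whether the actual split was stratified by class.

```python
from sklearn.model_selection import train_test_split

def split_train_val(image_paths, optotype_labels, val_fraction=0.2, seed=42):
    """Hold out val_fraction of the (augmented) training images for validation.

    image_paths and optotype_labels are hypothetical parallel lists built
    during preprocessing; stratifying keeps per-class proportions similar
    in both splits (optional -- the actual split may not have been stratified).
    """
    return train_test_split(
        image_paths,
        optotype_labels,
        test_size=val_fraction,
        stratify=optotype_labels,
        random_state=seed,
    )
```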
Data Augmentation
Data augmentation was performed on the training set in order to combat the relatively low ratio of training to testing data.
Models are currently being trained with the augmented training images, but they can be trained without the augmented images if desired.

Split between training and testing data before and after data augmentation.
Augmentation Techniques
Augmentation techniques were chosen carefully in order to preserve the original distortions in the dataset.
- Convert to grayscale
- Brightness increased 20%
- Brightness decreased 20%
- Contrast increased 20%
- Contrast decreased 20%
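For context, a minimal sketch of how these five variants could be generated with Pillow; the function name is hypothetical, and details such as converting the grayscale copy back to RGB are assumptions rather than the exact augmentation code used.

```python
from PIL import Image, ImageEnhance, ImageOps

def augment_variants(img: Image.Image):
    """Produce the five augmented copies listed above for a single source image."""
    return {
        "grayscale": ImageOps.grayscale(img).convert("RGB"),          # back to 3 channels
        "brightness_120": ImageEnhance.Brightness(img).enhance(1.2),  # +20% brightness
        "brightness_80": ImageEnhance.Brightness(img).enhance(0.8),   # -20% brightness
        "contrast_120": ImageEnhance.Contrast(img).enhance(1.2),      # +20% contrast
        "contrast_80": ImageEnhance.Contrast(img).enhance(0.8),       # -20% contrast
    }
```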
Class Distribution
Before augmentation, the distribution of Optotypes is as follows:
Training Images

Training Images (with augmentation)
To provide more training data and to fix the class imbalance, augmentation was performed on targeted classes.
However, as can be seen from the graph below, the augmentation was not performed correctly: the classes are still unevenly distributed, and this needs to be re-done.
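One possible way to plan the re-done targeted augmentation is to compute, per class, how many extra copies of each original image are needed to roughly match the largest class. This is only a sketch with hypothetical names, not the current pipeline.

```python
from collections import Counter
import math

def copies_needed_per_class(optotype_labels):
    """For each optotype class, the number of augmented copies per original image
    needed to bring that class roughly up to the largest class's count."""
    counts = Counter(optotype_labels)      # optotype_labels: list of Optotype strings
    target = max(counts.values())
    return {
        cls: math.ceil((target - n) / n)   # extra copies per existing image
        for cls, n in counts.items()
    }
```

Note that with only five augmentation variants available per image, a class more than six times smaller than the largest cannot be fully balanced this way.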

Testing Images

Experiments
Transfer Learning
The primary method used in these experiments was transfer learning.
All models were pre-trained using ImageNet, a dataset of nearly 14 million images commonly used as a benchmark in machine learning.
In this initial experiment, the weights of the pre-trained base model were kept "untrainable" (frozen), meaning the pre-trained weights themselves were not updated; only the newly added classification layers were trained.
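As background for Andrew and Anderson, here is a minimal sketch of this kind of frozen-base transfer-learning setup in Keras, using Xception as the example. The input size, head layers, optimizer, and loss are illustrative assumptions, not the exact configuration used in these runs.

```python
from tensorflow import keras

NUM_OPTOTYPES = 59  # number of Optotype categories described above

# Pre-trained base; include_top=False drops ImageNet's own classifier.
base = keras.applications.Xception(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3)
)
base.trainable = False  # freeze ("untrainable") the pre-trained weights

# New classification head -- the only part whose weights are updated.
inputs = keras.Input(shape=(299, 299, 3))
x = keras.applications.xception.preprocess_input(inputs)
x = base(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(NUM_OPTOTYPES, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```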
Testing
Best Model Predictions
This table shows all of the predictions on the Test image dataset given by the Xception model, which quantitatively achieved the best results.
Legend
output:max_class.label : Optotype that the model predicted with the highest probability
output:score.OPTOTYPE : Probability the model predicted that the image is OPTOTYPE
Note: this table is quite large; it contains all 12,834 images in the test set. I'm working on an easier-to-digest summary table, but for now you can look through this one. By default, the images in the table are grouped by their optotype.
Analysis
I'm currently working on a confusion matrix and other more useful graphs to analyze the large number of predictions, but after going through the table, here are some patterns I found.
Quality of Predictions
Though roughly 20% overall accuracy does not sound great, many of these images are difficult to discern.
Unsurprisingly, the model tends to do best on the less distorted images.

Does very well on low-distortion images and wingdings.
Predicted wingdings more often than humans would
The model tended to predict characters from the Teller or wingding sets quite a bit. I believe this is different from what we'd see in the human results, because "apple" usually conjures up a 3-D image of a red, "real" apple, not a wingding. Similarly, for the Teller optotypes, most people would not intuitively know what those mean and wouldn't offer them up as a guess.

The model confused "apple" with "C"; this is not likely to happen in human predictions, though it is understandable when purely comparing the characters provided.

Predicting "C" when the image was an "apple."

Similar phenomenon between K and star.
Did well on highly distorted wingdings
For highly distorted images, such as this train, the model did perhaps a superhuman job. This could also be because, as humans, we do not necessarily associate a word like "train" with the flat black-and-white wingding character, so we would be unlikely to offer it as a guess.

Superhuman Performance on Teller optotypes
This is an area that will almost certainly differ from the human trials: the model did an exceptional job of distinguishing Teller optotypes, even highly distorted ones.

Model excelled on the Teller images such as this.
Sub-human performance on distinguishing ETL-x

Even for some of the lower-distortion S/M images, the model had difficulty distinguishing between +circle and +blank, for example.

Incorrectly predicted +blank when the image clearly was not blank.
For this particular example, the other "+" optotypes were in the top 4 predictions.
Top 4 Probabilities
- +blank: 0.5819
- +circle: 0.2405
- +square: 0.1387
- +diamond: 0.03841
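For reference, a small sketch (with hypothetical variable names) of how a top-k list like this can be pulled from the per-class probability vector the model outputs for an image:

```python
import numpy as np

def top_k(probs, class_names, k=4):
    """Return the k most probable (class, probability) pairs, highest first.

    probs: 1-D array of softmax scores for one image (hypothetical name).
    class_names: list of Optotype labels in the same order as probs.
    """
    order = np.argsort(probs)[::-1][:k]   # indices of the k largest scores
    return [(class_names[i], float(probs[i])) for i in order]
```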
Human-Like Model Predictions
Although Xception produced the best results quantitatively, one of the goals of the project is to analyze these results in conjunction with the human results.
Generally, I would describe this model's behavior as that of a highly illiterate person who is good at multiple choice. I hypothesize that a literate individual would more frequently predict optotypes from the "alpha" or "numeric" categories than from the wingdings, because when we as humans think of "apple" or "tree," we almost certainly do not conjure up images of the wingding characters presented, but of real-life 3-D objects. The machine, not incorrectly, but certainly not in a human-like manner, notices common traits of the wingding/Teller optotypes and tends to over-predict them when presented with a distorted image. I think a literate human would tend to guess "O" or "T" in these cases, for instance, because the mental image associated with alphanumeric characters is inherently similar to the data presented.
Someone who is not literate might make some similar predictions if those wingdings are equally unfamiliar to them; however, this person would still most likely face the same obstacle described above: "tree" probably doesn't equate to a black-and-white text image in their mind.
Interestingly, the model did better than average on the Teller images, even on images with relatively high distortion. I find this interesting because I read that this test is designed for infants or people with low cognitive abilities (please correct me if I'm wrong!). If so, perhaps the model "predicts" similarly to someone cognitively similar to a person who would take a Teller test.
If the model were a person, the results of this experiment paint a picture of someone who doesn't inherently recognize alphanumeric characters, any more than they might recognize the Teller or Wingding characters, as if this test were the first time seeing all of them.
Questions
Questions for Browne Lab Experiments
Which optotypes were commonly "confused" in the human trials?
In our trials, the machine commonly predicted characters from the "wingdings" or "teller" sets. Do you have any data on which characters have been confused most frequently in the human trials?
Can literacy affect the results of the test?
Ideas for Follow-Up Experiments
- Need to re-do this experiment with data augmentation performed to balance the classes
- This experiment only used transfer learning with no unfrozen weights; need to try fine-tuning to see if that improves performance.
- Include Optotype angle in a separate, multi-class prediction.
- Test whether prior recognition of characters leads to better performance. Could run the transfer models with weights pre-trained on MNIST, for instance.
- Other??
Further Agenda/ Followup Questions
These are some things I need to fix or keep working on; I'm keeping track of them here.
Visualizations and Graphs
- Confusion Matrix (a rough sketch of one approach follows this list)
- More interpretable graph of the final predictions -- find out what they are envisioning for the final report.
- Optotypes and their most commonly predicted optotypes.
- Visualization with the other characters.
- What else would be useful for them?
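A rough sketch of the confusion matrix and the "most commonly predicted optotype" summary, assuming the ground-truth and predicted labels are available as parallel lists; the names here are hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def summarize_confusions(y_true, y_pred):
    """Confusion matrix plus, for each true optotype, its most frequent prediction.

    y_true, y_pred: parallel lists of Optotype label strings for the test set
    (ground truth and the model's output:max_class.label, respectively).
    """
    labels = sorted(set(y_true) | set(y_pred))        # fixed class ordering
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    most_common = {
        labels[i]: labels[int(np.argmax(cm[i]))]      # largest count in row i
        for i in range(len(labels))
    }
    return cm, most_common
```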
Experiments
- Unfreeze some of the weights for the models
- Need to look into why ResNet performed so poorly -- might need to reconfigure the TF-preprocessed images
- Try running without data augmentation and change the weights as Pierre suggested