
ChatGPT vs Grammarly: Who Can Fix Grammar Better?

Created on March 27|Last edited on March 29
This paper tests ChatGPT on grammatical error correction (GEC) and compares its performance to Grammarly.

Grammar Error Correction

GEC is the task of correcting errors in text, including spelling, grammar, punctuation, and word choice.
The table below shows different types of GEC errors.

Several benchmark datasets have been created to evaluate GEC techniques:
CoNLL-2014: short English texts written by non-native English speakers, which GEC systems are asked to correct
BEA-2019: similar to CoNLL-2014 but with a new corpus called Write&Improve+LOCNESS, which contains more diverse English text
JFLEG: covers a broad range of English proficiency levels; its edits are holistic, aiming not just to correct the text but to make it sound native

Method

The authors evaluated on the CoNLL-2014 dataset with three metrics: precision, recall, and the $F_{0.5}$ score (which weights precision more heavily than recall). They compared ChatGPT against two other systems across varying sentence lengths. They found that the prompt "Do grammatical error correction on all the following sentences I type in the conversation." worked best for ChatGPT.
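To make the three metrics concrete, here is a minimal sketch of how precision, recall, and $F_{0.5}$ can be computed from a set of proposed edits and a set of reference edits. The function names (`f_beta`, `score_edits`) and the `(start, end, replacement)` edit representation are assumptions for illustration only, not the paper's scorer; the official CoNLL-2014 evaluation uses the M2 scorer, which matches edits more carefully than plain set intersection.

```python
from typing import Set, Tuple

# Hypothetical edit representation: (start_token, end_token, replacement_text)
Edit = Tuple[int, int, str]


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, F_beta) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    # beta < 1 shrinks the weight on recall, so precision dominates the score
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


def score_edits(hyp: Set[Edit], gold: Set[Edit], beta: float = 0.5) -> Tuple[float, float, float]:
    """Score system edits against reference edits using exact set matching (a simplification)."""
    tp = len(hyp & gold)   # edits the system proposed that are in the reference
    fp = len(hyp - gold)   # proposed edits not in the reference
    fn = len(gold - hyp)   # reference edits the system missed
    return f_beta(tp, fp, fn, beta)


if __name__ == "__main__":
    # Toy example: one correct edit, one spurious edit, two missed edits
    hyp = {(1, 2, "went"), (4, 5, "the")}
    gold = {(1, 2, "went"), (6, 7, "apples"), (8, 9, "")}
    p, r, f = score_edits(hyp, gold)
    print(f"precision={p:.3f} recall={r:.3f} F0.5={f:.3f}")
```

In this toy case precision (0.5) is higher than recall (about 0.333), so $F_{0.5}$ comes out higher than $F_1$ would on the same counts. Conversely, a system that makes many spurious edits is penalized more under $F_{0.5}$ than under $F_1$.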

Results



They also compared GECToR, ChatGPT, and Grammarly on the number of over-corrections, under-corrections, and mis-corrections each system made.

In short, they found the following:
  • ChatGPT is comparable to existing GEC methods
  • it tends to over-correct more than under-correct or mis-correct
  • it struggles more with longer sentences

Conclusion

In the future, grammar correction systems like Grammarly will most likely adopt and leverage LLMs more, if they haven't already. Instead of building narrow AI applications that each fix a specific aspect such as grammar, punctuation, or word choice, companies like Grammarly may rely on a single large AI backbone like ChatGPT that can handle all of these tasks at once.
