New Study Challenges Hype Around KANs
Are KANs actually better than MLP layers?
Kolmogorov–Arnold Networks (KANs) have garnered significant attention in the AI community for their innovative use of learnable B-spline activation functions. However, a new study from the National University of Singapore titled "KAN or MLP: A Fairer Comparison" offers a more nuanced perspective, suggesting that the hype may be somewhat misplaced. The researchers, Runpeng Yu, Weihao Yu, and Xinchao Wang, provide a comprehensive and fair comparison between KANs and traditional Multi-Layer Perceptrons (MLPs), challenging the prevailing enthusiasm.
Methodology
To ensure a fair comparison, the researchers controlled the number of parameters and floating-point operations (FLOPs) between KANs and MLPs. They conducted experiments across various domains, including machine learning, computer vision, NLP, audio processing, and symbolic formula representation. The models were trained using the Adam optimizer with specific batch sizes and learning rates on an RTX3090 GPU. Ablation studies were performed to dissect the impact of different architectural components and activation functions.
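Although the paper's exact configurations aren't reproduced here, the parameter-matching step of such a protocol is easy to sketch. The snippet below is a minimal PyTorch illustration with placeholder layer widths, not the authors' actual models, showing how one might count trainable parameters so that a KAN and an MLP can be sized to comparable budgets before training:

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# A plain MLP baseline; the widths here are placeholders, not the paper's configs.
mlp = nn.Sequential(
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
print(f"MLP parameters: {count_params(mlp):,}")

# To mirror the paper's protocol, a KAN of matching size would then be built
# (for example with the pykan package), its width and grid size adjusted until
# its parameter count and FLOPs roughly match the MLP's, and both models
# trained with Adam under the same batch size and learning rate.
```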
Key Findings
Symbolic Formula Representation
KANs outperform standard MLPs in symbolic formula representation, achieving lower root mean square error (RMSE), and the researchers attribute this edge to the learnable B-spline activation functions. Crucially, the study also shows that when MLPs are equipped with the same B-spline activations, their performance on symbolic tasks improves significantly, often matching or surpassing that of KANs.
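To make the B-spline mechanism concrete, here is a minimal sketch of a learnable spline activation that could be dropped into an ordinary MLP, in the spirit of the paper's ablation. This is an illustrative PyTorch implementation, not the authors' code: the grid range, spline order, and initialization are assumptions chosen for readability.

```python
import torch
import torch.nn as nn

class BSplineActivation(nn.Module):
    """Learnable activation phi(x) = sum_k c_k * B_k(x) over a fixed knot grid."""

    def __init__(self, num_features: int, grid_size: int = 8, order: int = 3,
                 x_min: float = -2.0, x_max: float = 2.0):
        super().__init__()
        self.order = order
        # Uniform knot vector, padded with `order` extra knots on each side.
        h = (x_max - x_min) / grid_size
        self.register_buffer("knots",
                             torch.arange(-order, grid_size + order + 1) * h + x_min)
        # One learnable coefficient vector per feature (per-channel activation).
        self.coeffs = nn.Parameter(0.1 * torch.randn(num_features, grid_size + order))

    def _basis(self, x: torch.Tensor) -> torch.Tensor:
        # Cox-de Boor recursion, vectorized over the batch and feature dims.
        t = self.knots
        x = x.unsqueeze(-1)                              # (batch, features, 1)
        b = ((x >= t[:-1]) & (x < t[1:])).float()        # order-0 (indicator) basis
        for k in range(1, self.order + 1):
            left = (x - t[:-(k + 1)]) / (t[k:-1] - t[:-(k + 1)]) * b[..., :-1]
            right = (t[k + 1:] - x) / (t[k + 1:] - t[1:-k]) * b[..., 1:]
            b = left + right
        return b                                         # (batch, features, grid+order)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (self._basis(x) * self.coeffs).sum(-1)

# Drop-in replacement for a fixed nonlinearity inside an ordinary MLP.
mlp_with_splines = nn.Sequential(
    nn.Linear(2, 64),
    BSplineActivation(64),
    nn.Linear(64, 1),
)
```

Inputs falling outside the knot grid map to zero under this simple parameterization; practical KAN implementations typically add a residual base activation (e.g. SiLU) and update the grid during training.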
Machine Learning, Computer Vision, NLP, and Audio Processing
Contrary to the hype around KANs, the study reveals that MLPs consistently outperform KANs in machine learning, computer vision, natural language processing (NLP), and audio processing tasks. In controlled experiments across multiple datasets, MLPs achieved higher accuracy and more robust performance. In computer vision, for instance, the extra computational cost of KANs' spline activations did not translate into better results under matched parameter and FLOP budgets, making MLPs the more practical choice.
Activation Functions and Architectural Differences
The primary advantage of KANs lies in their use of B-spline activation functions. The researchers hypothesized and confirmed that this difference is the main factor behind KANs' performance in symbolic tasks. When MLPs used B-spline activation functions, they performed equally well in symbolic tasks but did not see the same benefits in other domains. This indicates that the structure and order of operations in MLPs are better suited for general machine learning tasks.
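Schematically, the contrast the authors isolate is where the learnable part sits relative to the nonlinearity. In an MLP, a learnable linear map is followed by a fixed activation; in a KAN, every input-output edge carries its own learnable univariate function, parameterized by spline coefficients, and the results are summed. Omitting the residual base activation that KAN implementations usually include, the two layer types can be written as:

```latex
\text{MLP layer:}\quad y = \sigma\!\left(Wx + b\right)
\qquad\qquad
\text{KAN layer:}\quad y_j = \sum_{i} \phi_{j,i}(x_i),
\quad \phi_{j,i}(t) = \sum_{k} c_{j,i,k}\, B_k(t)
```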
Continual Learning
The study also explored the performance of KANs and MLPs in continual learning scenarios. In class-incremental setups using the MNIST dataset, KANs exhibited more severe forgetting issues compared to MLPs. After training on subsequent tasks, KANs' accuracy on earlier tasks dropped significantly, whereas MLPs retained better performance across all tasks. This suggests that MLPs are more reliable for applications requiring continual learning and memory retention.
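The class-incremental protocol itself is straightforward to outline. The sketch below assumes torchvision's MNIST and a five-task split of two digits per task, which is illustrative rather than necessarily the paper's exact configuration:

```python
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())

# Split the ten digit classes into five sequential two-class tasks.
tasks = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
task_subsets = []
for classes in tasks:
    mask = torch.isin(mnist.targets, torch.tensor(classes))
    task_subsets.append(Subset(mnist, mask.nonzero(as_tuple=True)[0].tolist()))

# Training loop (omitted): fit the model on task_subsets[0], then [1], and so on,
# evaluating accuracy on all previously seen tasks after each stage to measure
# how much earlier knowledge is forgotten.
```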
Implications for Future Research
The findings of this study provide critical insights for future research on neural network architectures. The performance differences highlight the importance of activation functions and suggest that further exploration into different activation functions could yield significant improvements in various tasks. Additionally, addressing the forgetting issues in KANs could make them more viable for continual learning applications.
Conclusion
While KANs have shown promise in specific areas, this study suggests that the traditional MLP remains a more versatile and effective choice for a broad range of machine learning tasks. The hype surrounding KANs may need to be tempered with these new insights, guiding future research towards optimizing neural network architectures for diverse applications.