
Challenge

Optimiser for Neural Network Training

Implement a performant optimizer that accelerates training while meeting rigorous test-loss thresholds.

Overview

The recent surge in Artificial Intelligence (AI) has been largely driven by deep learning—an approach made possible by vast datasets, highly parallel computing power, and neural network frameworks that support automatic differentiation. Neural networks serve as the fundamental building blocks for these complex AI systems.

The training process of a neural network is often incredibly resource-intensive, requiring massive amounts of data and computational power, which translates to significant financial cost. At the heart of this training process lie optimization algorithms based on gradient descent. These algorithms systematically adjust the network's parameters to minimize a "loss function," effectively teaching the model to perform its designated task accurately.
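The gradient-descent loop described above can be sketched in a few lines. This is an illustrative toy only (a linear model with a mean-squared-error loss on made-up data), not the challenge's actual training setup: the algorithm repeatedly nudges the parameters in the direction that decreases the loss.

```python
import numpy as np

def loss(w, X, y):
    """Mean squared error of a linear model with weights w on data (X, y)."""
    return np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    """Gradient of the mean squared error with respect to w."""
    return 2 * X.T @ (X @ w - y) / len(y)

# Toy data: y is roughly 3*x, so the optimal weight is close to 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3 * X[:, 0] + 0.01 * rng.normal(size=100)

w = np.zeros(1)
lr = 0.1  # learning rate (step size)
for _ in range(200):
    w -= lr * grad(w, X, y)  # move the parameters against the gradient
```

Real neural-network training follows the same pattern, just with millions or billions of parameters, stochastic mini-batch gradients, and far more sophisticated update rules.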

Even small gains in optimization efficiency translate into shorter training times, lower energy usage, and significant cost savings, underscoring the value of continued research into better optimizers.
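As one illustration of what a "better optimizer" can look like, here is a minimal sketch of Adam, one of the most widely used gradient-descent variants; it adapts the step size per parameter using running estimates of the gradient's first and second moments. The hyperparameter values shown are the common defaults, not anything prescribed by this challenge, and the quadratic objective is a made-up example.

```python
import numpy as np

def adam_step(w, g, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected moment estimates scale each step."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g      # first moment (running mean of gradients)
    state["v"] = b2 * state["v"] + (1 - b2) * g * g  # second moment (running mean of squared gradients)
    m_hat = state["m"] / (1 - b1 ** state["t"])      # bias correction for the zero initialization
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Minimize f(w) = (w - 5)^2, whose gradient is 2*(w - 5).
w = np.array([0.0])
state = {"t": 0, "m": np.zeros_like(w), "v": np.zeros_like(w)}
for _ in range(5000):
    g = 2 * (w - 5)
    w = adam_step(w, g, state, lr=0.05)
```

The per-parameter scaling is what lets Adam make steady progress even when gradient magnitudes vary wildly across parameters, a common situation in deep networks.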

Applications

Neural networks now underpin everything from chatbots to self-driving cars, and their training efficiency dictates cost, speed, and energy use. Since nearly all of that training hinges on gradient-descent methods, even small optimizer improvements ripple across AI—and into other fields.

Below are some of the highest-impact domains where faster, more reliable training already yields real-world gains:

  • Large language and multimodal models – Massive chatbots and image generators with hundreds of billions of parameters can shave weeks and millions of dollars off training runs when optimizers become just a few percent more efficient [1].
  • Protein structure prediction & drug discovery – Leaner training pipelines let researchers fold larger protein databases and explore more drug candidates under tight compute budgets [2].
  • Autonomous driving & robotics – Rapidly retraining perception and planning nets on fleets’ weekly data drops delivers safer software to vehicles and robots sooner [3].
  • Real-time recommendation engines – Sharper optimizers cut data-center power, hardware spend, and user-facing latency for the personalized feeds that dominate the web [4].
  • Global weather and climate forecasting – Neural surrogates now rival traditional supercomputer models; better training enables higher resolution and faster refresh cycles [5].

References

  1. OpenAI. “GPT-4 Technical Report.” arXiv (2023).
  2. Jumper, J. et al. “Highly accurate protein structure prediction with AlphaFold.” Nature 596 (2021): 583-589.
  3. Tesla AI Day 2022. “Training FSD on 10 000 GPUs.”
  4. Naumov, M. et al. “Deep Learning Recommendation Model for Personalization and Recommendation Systems.” arXiv (2019).
  5. Lam, R. et al. “Learning skillful medium-range global weather forecasting with GraphCast.” Science 382 (2023): 109-115.