Challenge

Optimizer for Neural Network Training

Implement a performant optimizer that accelerates training while meeting rigorous test-loss thresholds.

Overview

The recent surge in Artificial Intelligence (AI) has been largely driven by deep learning—an approach made possible by vast datasets, highly parallel computing power, and neural network frameworks that support automatic differentiation. Neural networks serve as the fundamental building blocks for these complex AI systems.

The training process of a neural network is often incredibly resource-intensive, requiring massive amounts of data and computational power, which translates to significant financial cost. At the heart of this training process lie optimization algorithms based on gradient descent. These algorithms systematically adjust the network's parameters to minimize a "loss function," effectively teaching the model to perform its designated task accurately.
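In its simplest form, gradient descent updates each parameter by stepping against its gradient: θ ← θ − η ∂L/∂θ, where η is the learning rate. The sketch below illustrates that update as a CUDA kernel, in the spirit of the framework this challenge targets; the kernel name, argument names, and launch configuration are assumptions for illustration, not part of any particular codebase.

```cuda
// Illustrative plain-SGD update: each thread nudges one parameter against
// its gradient. All names here (sgd_step, learning_rate) are hypothetical.
__global__ void sgd_step(float* params, const float* grads,
                         float learning_rate, int n_params)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_params) {
        // theta_i <- theta_i - eta * dL/dtheta_i
        params[i] -= learning_rate * grads[i];
    }
}

// Example launch for n_params parameters, 256 threads per block:
//   int blocks = (n_params + 255) / 256;
//   sgd_step<<<blocks, 256>>>(d_params, d_grads, 0.01f, n_params);
```

More sophisticated optimizers keep the same per-parameter structure but add state such as momentum or adaptive step sizes on top of this basic rule.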

Even small gains in optimization efficiency translate into shorter training times, lower energy usage, and significant cost savings, underscoring the value of continued research into better optimizers.

Our Challenge

TIG’s neural network optimizer challenge asks innovators to implement an optimizer that plugs into a fixed CUDA-based training framework and trains a multi-layer perceptron (MLP) on a synthetic regression task. The goal is to obtain a low test loss.

  • Task: Minimize mean-squared error (MSE) on a held-out validation set during training; final acceptance is based on test loss. The data is split into 1000 training, 200 validation, and 250 test samples.
  • Model: MLP with batch normalization. Hidden layer width is fixed at 256; the number of hidden layers, num_hidden_layers, is configurable and determines the size of the problem.
  • Training budget: Batch size 128, up to 1000 epochs, early stopping with patience 50 (a hedged sketch of a candidate optimizer update follows this list).
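As a concrete starting point, the sketch below shows an Adam-style update kernel of the kind an innovator might plug into such a framework. The function name, moment buffers, and hyperparameters are assumptions for illustration and do not reflect the actual TIG interface.

```cuda
// Hypothetical Adam-style parameter update. The training framework is
// assumed to supply gradients (grads) and persistent per-parameter moment
// buffers (m, v); none of these names come from the TIG codebase.
__global__ void adam_step(float* params, const float* grads,
                          float* m, float* v,
                          float lr, float beta1, float beta2,
                          float eps, int step, int n_params)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_params) return;

    float g = grads[i];
    m[i] = beta1 * m[i] + (1.0f - beta1) * g;       // biased first moment
    v[i] = beta2 * v[i] + (1.0f - beta2) * g * g;   // biased second moment

    // Bias-corrected estimates for the current optimizer step (1-indexed).
    float m_hat = m[i] / (1.0f - powf(beta1, (float)step));
    float v_hat = v[i] / (1.0f - powf(beta2, (float)step));

    params[i] -= lr * m_hat / (sqrtf(v_hat) + eps);
}
```

Whatever the update rule, the fixed training loop applies it once per mini-batch (batch size 128) for up to 1000 epochs and stops once the validation MSE has failed to improve for 50 consecutive epochs, so an optimizer that converges in fewer epochs is rewarded directly.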

Applications

Neural networks now underpin everything from chatbots to self-driving cars, and their training efficiency dictates cost, speed, and energy use. Since nearly all of that training hinges on gradient-descent methods, even small optimizer improvements ripple across AI—and into other fields.

Below are some of the highest-impact domains where faster, more reliable training already yields real-world gains:

  • Large language and multimodal models – Massive chatbots and image generators with hundreds of billions of parameters can shave weeks and millions of dollars off training runs when optimizers become just a few percent more efficient [1].
  • Protein structure prediction & drug discovery – Leaner training pipelines let researchers fold larger protein databases and explore more drug candidates under tight compute budgets [2].
  • Autonomous driving & robotics – Rapidly retraining perception and planning nets on fleets’ weekly data drops delivers safer software to vehicles and robots sooner [3].
  • Real-time recommendation engines – Sharper optimizers cut data-centre power, hardware spend, and user-facing latency for the personalised feeds that dominate the web [4].
  • Global weather and climate forecasting – Neural surrogates now rival traditional supercomputer models; better training enables higher resolution and faster refresh cycles [5].

References

  1. OpenAI. “GPT-4 Technical Report.” arXiv (2023).
  2. Jumper, J. et al. “Highly accurate protein structure prediction with AlphaFold.” Nature 596 (2021): 583-589.
  3. Tesla AI Day 2022. “Training FSD on 10 000 GPUs.”
  4. Naumov, M. et al. “Deep Learning Recommendation Model for Personalization and Recommendation Systems.” arXiv (2019).
  5. Lam, R. et al. “Learning skillful medium-range global weather forecasting with GraphCast.” Science 382 (2023): 109-115.