Show HN: Train CIFAR10 to 94% in under 10 seconds on a single A100, world record https://ift.tt/NfkJziQ
Hi,

My career is currently in this field, and I created this project as (effectively, among other things) a living resume, and also as a really great workbench for hacking and experimenting on different methods. Testing and getting a feel for how different methods work within this framework is truly a delight, and it is quite simple and fast. Additionally, many of the mathematical concepts should transfer, so for me this has been a really great proving ground for testing how something might work in a different place in the real world.

We hope to get under 2 seconds of training time (for 94%) within about two years or so, so stay tuned for updates as we continue to push changes that take us faster and faster than our starting point of ~18.1 seconds.

By the way, this architecture and these training hyperparameters do indeed scale well: just increase epochs from 10->80 and base_depth from 64->128 and you'll have about 95.77% accuracy in about 188 seconds (just over 3 minutes :D). That alone is a huge boon! It's great to see scaling laws working well within this very, very tight hyperparameter resolution.

Feel free to let me know if you have any questions; Hacker News always seems to get me the most traffic. I really love talking about this project, and can't really seem to find anyone to nerd out about it with. This is very, very cool stuff! So feel free to leave a comment, and I'd love to jump in and chat about it! :D <3

https://github.com/tysam-code/hlb-CIFAR10

January 30, 2023 at 02:58AM
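For a sense of scale, the timing figures quoted in the post can be put side by side with a few lines of arithmetic. This is only a rough sketch: the variable names are made up for illustration and are not taken from the repository, the 10.0 s value is just the upper bound implied by "under 10 seconds", and the two-year horizon is treated as exactly 2.0 years.

```python
# Figures quoted in the post. Variable names are hypothetical, chosen for
# this illustration only; they do not come from the hlb-CIFAR10 code.
start_s = 18.1     # starting point, ~18.1 seconds
current_s = 10.0   # "under 10 seconds", so this is an upper bound
target_s = 2.0     # stated goal for 94% accuracy
years = 2.0        # "within about two years or so", assumed exact here

# Speedup already achieved relative to the starting point.
speedup_so_far = start_s / current_s

# Further speedup needed to reach the goal, and the constant per-year
# factor that would deliver it over the assumed horizon.
remaining_speedup = current_s / target_s
per_year_factor = remaining_speedup ** (1.0 / years)

# The scaled-up run from the post: epochs 10->80, base_depth 64->128.
scaled_s = 188.0
epoch_ratio = 80 / 10
depth_ratio = 128 / 64
time_ratio = scaled_s / current_s
accuracy_gain = 95.77 - 94.0  # percentage points

print(f"speedup so far:      {speedup_so_far:.2f}x")
print(f"remaining speedup:   {remaining_speedup:.2f}x")
print(f"needed per year:     {per_year_factor:.2f}x")
print(f"scaled run cost:     {time_ratio:.1f}x the time for "
      f"+{accuracy_gain:.2f} points "
      f"({epoch_ratio:.0f}x epochs, {depth_ratio:.0f}x base_depth)")
```

Under these assumptions, reaching 2 seconds means a further 5x speedup, or a bit more than 2x per year, while the scaled-up run spends roughly 19x the training time for under two points of extra accuracy.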