Despite its exceptional performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised …