Training vs Inference: Two Phases of AI Emissions
AI emissions fall into two phases: training and inference. Training is the process of building a model by processing vast datasets on thousands of specialised GPU chips over weeks or months. Training GPT-3 is estimated to have consumed 1,287 MWh of electricity and produced roughly 552 tonnes of CO2 (Patterson et al., 2021). GPT-4 and newer models are estimated to have required 5-10 times more compute, which would place their training emissions in the range of 3,000-10,000 tonnes of CO2.

Inference is when the trained model responds to user queries. Each individual query uses far less energy than training, but the cumulative impact of inference is enormous because of scale: ChatGPT alone handles hundreds of millions of queries daily. Industry estimates suggest that inference accounts for 60-90% of total AI compute over a model's lifetime, meaning the ongoing cost of serving AI to users far exceeds the one-off cost of training it.
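To make the comparison concrete, here is a rough back-of-envelope sketch in Python. The training figures are the ones cited above; the per-query energy, query volume, and serving lifetime are illustrative assumptions (published per-query energy estimates vary widely), so treat the output as an order-of-magnitude comparison, not a measurement.

```python
# Back-of-envelope comparison of training vs lifetime inference emissions.
# Training inputs are the estimates cited above; inference inputs are
# illustrative assumptions, not measured values.

TRAINING_ENERGY_MWH = 1_287   # GPT-3 training energy estimate (Patterson et al., 2021)
TRAINING_EMISSIONS_T = 552    # tonnes CO2, same source

# Carbon intensity implied by the training figures:
# 552 t CO2 / 1,287 MWh ~= 0.429 t CO2/MWh (~429 g CO2/kWh)
carbon_intensity_t_per_mwh = TRAINING_EMISSIONS_T / TRAINING_ENERGY_MWH

# Hypothetical inference assumptions (not from the source):
QUERIES_PER_DAY = 200_000_000  # "hundreds of millions of queries daily"
WH_PER_QUERY = 0.3             # assumed energy per query, in watt-hours
LIFETIME_DAYS = 365 * 2        # assume the model is served for two years

# Total inference energy in MWh (Wh -> MWh is a factor of 1e6).
inference_energy_mwh = QUERIES_PER_DAY * WH_PER_QUERY * LIFETIME_DAYS / 1e6
inference_emissions_t = inference_energy_mwh * carbon_intensity_t_per_mwh

print(f"Implied carbon intensity: {carbon_intensity_t_per_mwh * 1000:.0f} g CO2/kWh")
print(f"Lifetime inference energy: {inference_energy_mwh:,.0f} MWh")
print(f"Lifetime inference emissions: {inference_emissions_t:,.0f} t CO2")
print(f"Inference share of combined emissions: "
      f"{inference_emissions_t / (inference_emissions_t + TRAINING_EMISSIONS_T):.0%}")
```

Even with a modest assumed 0.3 Wh per query, lifetime inference emissions under these assumptions come out at tens of thousands of tonnes of CO2, an order of magnitude above the one-off training cost, consistent with the claim that inference dominates a model's lifetime footprint.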