Frugal AI & Climate: Uncovering the True Carbon Footprint

  • Apr 22
  • 8 min read

Updated: Apr 24

AI Hits the Energy Wall: The Datacenter Boom

In its Electricity 2024 report, the International Energy Agency (IEA) sounded the alarm: electricity consumption from datacenters, AI, and cryptocurrency could double between 2022 and 2026. This massive surge, largely driven by artificial intelligence, brings the tech industry face-to-face with a physical and ecological wall.

I. The AI & Climate Paradox

Artificial intelligence presents a dual face in the climate emergency. On one hand, it acts as a powerful problem solver, optimizing energy grids, accelerating the discovery of new materials for the energy transition, and refining climate forecasts. On the other hand, it is a concerning accelerator of emissions.

Google's 2024 Environmental Report clearly illustrates this tension: a 13% increase in greenhouse gas emissions year-over-year and a 48% jump since 2019, which the report links primarily to rising datacenter energy consumption and supply chain emissions as AI is integrated into its core products. Beyond carbon, the impact is also measured in water. The study by Li, Yang, Islam, and Ren (2023), often cited under Shaolei Ren's name, highlighted the water footprint of training large models, which can require hundreds of thousands to millions of liters of freshwater once datacenter cooling and electricity generation are counted.

II. What is Frugal AI?

Faced with these facts, the concept of Frugal AI, or "Green AI", emerges as a necessity. The goal is not to abandon AI, but to rethink its design and deployment to minimize its ecological footprint while maintaining high performance.

Frugal AI relies on several optimization techniques:

  • Quantization: reducing the precision of the numbers used in the model weights (e.g., dropping from 16-bit to 8-bit or 4-bit) to drastically lower memory usage and computation.

  • Distillation: training a smaller model (the student) to mimic the predictions of a massive model (the teacher), retaining most of the capabilities at a fraction of the cost.

  • Small Language Models (SLMs): designing architectures that are natively smaller and specialized, often proving far more efficient for targeted tasks than generalized behemoths.

  • Edge Computing: shifting inference to the end-user's device rather than processing everything on distant cloud servers.

  • Mixture of Experts (MoE): activating only a small subset of the neural network's parameters for each query, heavily reducing inference energy.
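
To make the quantization item above concrete, here is a minimal sketch in plain Python. It assumes the simplest scheme (symmetric, one scale per tensor); production libraries use more refined variants, and the function names and the sample weights are illustrative, not taken from any specific toolkit. The first helper only counts the memory needed to store the weights, ignoring activations and caches.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    # A 9B-parameter model stored at decreasing precision.
    for bits in (16, 8, 4):
        print(f"9B parameters at {bits}-bit: {weight_memory_gb(9e9, bits):.1f} GB")

    # Hypothetical weights, chosen only to show the round trip.
    weights = [0.42, -1.27, 0.05, 0.9]
    codes, scale = quantize_int8(weights)
    print("int8 codes:", codes)
    print("restored:  ", [round(w, 3) for w in dequantize(codes, scale)])
```

Halving the bit width halves the memory needed for the weights, and the rounding error per weight stays within half a quantization step. Whether that error is acceptable depends on the model and the task, which is why quantized models are normally re-evaluated on benchmarks before deployment.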

III. What the Research Papers Say

Scientific literature extensively documents this urgency and the potential solutions. As early as 2019, Schwartz et al. laid the groundwork with their paper "Green AI", urging the community to treat efficiency as a primary evaluation metric alongside accuracy.

More recently, Luccioni, Jernite, and Strubell (2024) demonstrated in "Power Hungry Processing" that the nature of the task strongly influences consumption: generative tasks such as text and image generation consume orders of magnitude more energy than simple classification. Sasha Luccioni, via Hugging Face, actively advocates for an "AI Energy Star Rating", noting that reasoning models can consume up to 30 times more energy than non-reasoning models.

The industry is responding. Google DeepMind, with Gemma 2 (9B parameters), showed that a significantly smaller model could rival much heavier systems (an MMLU score of 71.3% against about 70% for major older-generation models) while requiring roughly 16 times fewer resources. In France, AFNOR published SPEC 2314 in 2024, the first normative framework establishing a methodological standard for frugal AI.

IV. Benchmarks: Do Small Models Really Hold Up?

Academic and crowd-sourced benchmarks (MMLU, HumanEval, MT-Bench, GPQA, Arena Elo) now provide objective proof that frugal models have reached an impressive level of maturity in 2026. Far from being "watered-down" versions, these optimized architectures rival former behemoths.

Here is a comparative overview of public performances:

  • GPT-3.5 (2023 baseline): ~175B parameters (unofficial estimate) | MMLU: 70% | HumanEval: 48% | Arena Elo: ~1100 (Source: OpenAI public)

  • Gemma 2 9B-IT: 9B parameters | MMLU: 71.3% | HumanEval: — | Arena Elo: ~1200 (Source: Google 2024)

  • Phi-3 Medium 14B: 14B parameters | MMLU: 78% | HumanEval: 62% | Arena Elo: ~1120 (Source: Microsoft 2024)

  • Llama 3.1 8B-Instruct: 8B parameters | MMLU: 69% | HumanEval: 72% | Arena Elo: ~1180 (Source: Meta 2024)

  • Mistral Small 22B (24.09): 22B parameters | MMLU: ~74% | HumanEval: ~68% | Arena Elo: ~1210 (Source: Mistral 2024)

  • Qwen 2.5 7B-Instruct: 7B parameters | MMLU: 74% | HumanEval: 83% | Arena Elo: ~1200 (Source: Alibaba 2024)

  • GPT-4o (2024 premium baseline): ~200B+ parameters (unofficial estimate) | MMLU: ~88% | HumanEval: ~90% | Arena Elo: ~1290 (Source: OpenAI public)

This analysis reveals a key point: models in the 7-22B class match or surpass GPT-3.5 on most of the benchmarks listed (Llama 3.1 8B trails slightly on MMLU, at 69% versus 70%). On specific tasks (coding via HumanEval, general knowledge via MMLU, or chat via the Arena), these small models hold parity with or even beat that former premium model with roughly 8 to 25 times fewer parameters, taking the unofficial ~175B figure as the baseline.

The remaining gap with current giants like GPT-4o or Claude Opus lies primarily in long context reasoning and complex multimodal tasks. However, these use cases remain a minority in enterprise environments.

For core ESG/CSRD applications (document classification, entity extraction, structured Q&A), targeted fine-tuning on models like Gemma 2 9B or Mistral Small 22B can outscore a "vanilla" GPT-4o that has not been fine-tuned.

But raw performance doesn't tell the whole story — we also need to talk about deployment…

V. The 3 Factors of Carbon Footprint per Token

To truly understand the impact of an AI query, we must deconstruct the metric. The carbon footprint per token is not solely dictated by model size. It depends on three strict, multiplicative factors:

  1. Model Size (FLOPs/token): This is the intrinsic, "frugal" factor, since compute per token translates into energy per token. Public estimates suggest a clear hierarchy. Small models (8-20B) require about 0.1 to 0.3 Joules per token. Medium models (70-200B) demand 1 to 3 J/token (roughly 10 times more). The massive commercial models exceeding 300B parameters consume 3 to 10 J/token. On these ranges, shifting from a colossal commercial US model to an optimized 7-22B open-source EU model can cut the energy per token by a factor of 10 or more.

  2. Carbon Intensity of Electricity (gCO₂eq/kWh): This is the geographical factor, determining whether the consumed energy comes from coal or nuclear.

  3. GPU Utilization Rate: This is the operational factor, consistently ignored in mainstream discussions.
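
The way the three factors combine can be sketched as a single function. This is a back-of-the-envelope model, not a measurement method: it assumes the server draws the same power whether busy or idle (so energy per token scales with 1/utilization), the function name is hypothetical, and the utilization values in the scenarios are invented for illustration. The energy and grid figures are picked from the ranges quoted in this article.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def grams_co2_per_token(joules_per_token: float,
                        grid_g_per_kwh: float,
                        utilization: float) -> float:
    """Estimated gCO2eq per token = energy per token x grid intensity / utilization.

    Dividing by utilization assumes constant power draw regardless of load,
    which is a simplification.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the interval (0, 1]")
    kwh_per_token = joules_per_token / JOULES_PER_KWH
    return kwh_per_token * grid_g_per_kwh / utilization


if __name__ == "__main__":
    # (J/token, gCO2eq/kWh, utilization); utilization values are assumptions.
    scenarios = {
        "small model, low-carbon grid, busy server": (0.2, 55, 0.8),
        "small model, low-carbon grid, idle server": (0.2, 55, 0.05),
        "large model, high-carbon grid, busy server": (5.0, 350, 0.8),
    }
    for name, args in scenarios.items():
        per_million = grams_co2_per_token(*args) * 1e6
        print(f"{name}: {per_million:.1f} gCO2eq per million tokens")
```

Because the factors multiply, a gain on one of them can be cancelled by a loss on another, which is why the three have to be assessed together.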

VI. The Utilization Rate Trap

This is the blind spot of Frugal AI. A server hosting a datacenter GPU such as an A100 can draw on the order of 500 to 700 W once it is powered on and the GPU, CPU, memory, and cooling are counted. The reasoning below treats this draw as a fixed baseline that exists regardless of whether the server processes one query or a thousand per second. This is a simplification: real idle power is lower than peak power, but it never falls to zero.

If your GPU server operates at a mere 10% utilization rate, the effective energy spent for every token generated is multiplied by 10 under this fixed-draw assumption. In this scenario, a "small, frugal model" deployed on an underutilized server pollutes just as much per query as a giant model shared in a massive cloud environment. At 1% utilization, the footprint is catastrophic, far worse than using an oversized commercial model.

The takeaway is blunt: a "frugal model" does not equal a "frugal deployment". Infrastructure sizing matters just as much as neural architecture.
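
A short sketch shows where the "multiplied by 10" comes from. It assumes a constant server draw of 600 W (the middle of the range quoted above) and a hypothetical peak throughput of 3,000 tokens per second; both the throughput figure and the function name are illustrative assumptions, not measurements.

```python
def joules_per_token(server_watts: float,
                     peak_tokens_per_s: float,
                     utilization: float) -> float:
    """Energy per generated token when power draw does not drop with load.

    The server burns `server_watts` every second, but only produces
    `peak_tokens_per_s * utilization` tokens in that second.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the interval (0, 1]")
    return server_watts / (peak_tokens_per_s * utilization)


if __name__ == "__main__":
    for u in (1.0, 0.10, 0.01):
        energy = joules_per_token(600, 3000, u)
        print(f"utilization {u:.0%}: {energy:.1f} J/token")
```

Under these assumptions the same small model costs about 0.2 J/token on a fully used server, about 2 J/token at 10% utilization, and about 20 J/token at 1%, which is above the range quoted earlier for the largest commercial models.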

To mitigate this, several technical levers are essential. Pooling resources by hosting multiple models on the same VM raises GPU utilization. Prefix caching, as implemented in vLLM, reuses computations for common system prompts, which can cut energy by 30 to 70% on repetitive tasks. Finally, spinning servers up and down on demand trades latency (a cold start of 2-3 minutes) for carbon savings. Continuous measurement with open tools like CodeCarbon, combined with ElectricityMaps data, becomes critical to steering this efficiency.
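
As an illustration of continuous measurement, the sketch below wraps any workload between a tracker's start() and stop() calls. The wrapper name is hypothetical; it only assumes an object exposing start() and stop(), which is the shape of CodeCarbon's EmissionsTracker (whose stop() returns the estimated emissions in kg CO₂eq). CodeCarbon itself is not imported here so that the sketch stays dependency-free.

```python
from typing import Any, Callable, Tuple


def run_with_emissions(workload: Callable[[], Any], tracker: Any) -> Tuple[Any, Any]:
    """Run `workload` and return (result, whatever tracker.stop() reports).

    `tracker` is any object with start() and stop() methods. The tracker is
    stopped even if the workload raises, so no measurement is left open.
    """
    tracker.start()
    try:
        result = workload()
    finally:
        emissions = tracker.stop()
    return result, emissions


# Possible usage with CodeCarbon (requires `pip install codecarbon`);
# `generate_answer` is a placeholder for your own inference call:
#
#   from codecarbon import EmissionsTracker
#   answer, kg_co2eq = run_with_emissions(
#       lambda: generate_answer("Summarize this CSRD report"),
#       EmissionsTracker(),
#   )
```

Keeping the tracker as a parameter is a design choice for this sketch: the same wrapper can be exercised with a stand-in object in tests and with a real tracker in production.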

VII. The Geography of Frugal AI

The physical location of the server radically alters the carbon impact for an identical model and utilization rate. According to ElectricityMaps data for selected cloud regions (approximate carbon intensity in gCO₂eq/kWh):

  • Sweden Central: ~15

  • France Central: ~55

  • North Europe (Dublin): ~290

  • West Europe (Amsterdam): ~350

  • US East: ~350-400

Hosting a model in "France Central" emits roughly 6 times less than in Amsterdam, thanks to the French nuclear and renewable energy mix. Choosing "Sweden Central" is more than 20 times cleaner than Amsterdam (about 15 versus 350 gCO₂eq/kWh). Geography is an immediate lever for European AI carbon sovereignty.
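
The ratios between regions follow directly from the figures listed above. The sketch below recomputes them against Amsterdam as the baseline; the dictionary simply restates the approximate values from the list, with the midpoint of the quoted range used for US East, and the function name is illustrative.

```python
# Approximate grid carbon intensity in gCO2eq/kWh, as listed in this article.
GRID_INTENSITY = {
    "Sweden Central": 15,
    "France Central": 55,
    "North Europe (Dublin)": 290,
    "West Europe (Amsterdam)": 350,
    "US East": 375,  # midpoint of the quoted ~350-400 range (assumption)
}


def times_lower(region: str, baseline: str = "West Europe (Amsterdam)") -> float:
    """How many times lower the grid intensity of `region` is than `baseline`."""
    return GRID_INTENSITY[baseline] / GRID_INTENSITY[region]


if __name__ == "__main__":
    for region in GRID_INTENSITY:
        print(f"{region}: {times_lower(region):.1f}x lower than Amsterdam")
```

These are averages over time. Actual intensity varies by hour and season, so a production setup would query live data rather than hard-code constants.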

The geography of AI: between decarbonized cloud regions and high-carbon zones.

VIII. What DT Master Does

Applying these principles yields a direct commercial ROI, not just environmental benefits. At DT Master, our approach is deeply strategic. In fine-tuning our assistant Emmy for CSRD compliance analysis (covering 13 reporting frameworks and over 10,000 document chunks), we made radical engineering choices.

We self-host a mid-sized open-weight model (Gemma 4) and keep the deployment sovereign by running it exclusively in European cloud regions with extremely low carbon intensity. Through CodeCarbon telemetry, the footprint of every single query is tracked and documented. The result is a solution up to 10 times leaner than generic US cloud alternatives. For our clients in the ESG, chemical, energy, or financial sectors, who are mandated to report their Scope 3 emissions (which include IT vendors), this verifiable carbon transparency becomes a compelling ESG sales argument.

IX. The Rebound Effect (Jevons Paradox)

However, technological efficiency carries its own trap, theorized back in 1865 by economist William Stanley Jevons (the Jevons Paradox). By making AI more frugal, faster, and cheaper, we inevitably stimulate its usage.

While the carbon footprint per query undeniably decreases with frugal AI, the organization's absolute footprint will still increase if the total volume of queries explodes, multiplying by a hundred or a thousand. Frugal AI is therefore necessary but by no means sufficient for low-carbon IT. Without strict usage governance, the unit gains will vanish under the weight of volume.
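
The arithmetic behind the rebound effect fits in a few lines. The 10x per-query gain and the volume multipliers below are illustrative values in the spirit of the figures used in this article, and the function name is hypothetical.

```python
def total_footprint_factor(per_query_gain: float, volume_growth: float) -> float:
    """Factor by which total emissions change.

    per_query_gain: how many times less each query emits (10 means 10x leaner).
    volume_growth:  how many times more queries are served.
    A result above 1 means the absolute footprint went up.
    """
    if per_query_gain <= 0 or volume_growth < 0:
        raise ValueError("gain must be positive and growth non-negative")
    return volume_growth / per_query_gain


if __name__ == "__main__":
    for growth in (5, 100, 1000):
        factor = total_footprint_factor(per_query_gain=10, volume_growth=growth)
        trend = "up" if factor > 1 else "down"
        print(f"10x leaner, {growth}x more queries: total footprint x{factor:g} ({trend})")
```

Only when usage grows more slowly than efficiency improves does the total footprint fall, which is the practical argument for pairing frugal models with usage governance.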

X. 5 Actionable Recommendations for ESG/GRC Leaders in 2026

  1. Audit Actual GPU Utilization: Don't just look at model sizes; measure your AI infrastructure's utilization rate. Under-utilization is the primary source of waste.

  2. Enforce the Geographical Criterion: Mandate that your AI deployments, or those of your vendors, are located in low-carbon cloud regions (like France or Sweden).

  3. Demand Scope 3 Transparency: Require your AI-integrated software (SaaS) vendors to provide a measured carbon footprint per query, produced with open tools like CodeCarbon.

  4. Prefer Specialization over Generalization: For targeted tasks (classification, data extraction), prioritize models under 30B parameters.

  5. Establish Frugality Governance: Integrate the principles of AFNOR SPEC 2314 into your procurement requirements and educate your teams on the Jevons paradox.

AI will not escape planetary boundaries. The shift toward sober AI is no longer merely an ethical question; it is an engineering and compliance imperative.

🤖 AI transparency: this article was drafted with the help of Lili, DT Master's AI marketing agent (powered by an advanced LLM), then reviewed and validated by our editorial team. The underlying compliance analysis is powered by Emmy, our AI assistant specialized in CSRD, ESRS, GDPR, AI Act, DSA, and DORA compliance. In line with our ESG commitments and the European AI Act framework, we systematically document our use of AI in publications.

Bibliography

  1. IEA (2024) — Electricity 2024 — https://www.iea.org/reports/electricity-2024

  2. Li, Yang, Islam, Ren (2023) — Making AI Less 'Thirsty': Uncovering and Addressing the Secret Water Footprint of AI Models — https://arxiv.org/abs/2304.03271

  3. Google (2024) — Environmental Report 2024

  4. Sasha Luccioni / HuggingFace (2024) — AI Energy Star Rating

  5. AFNOR SPEC 2314 (2024) — Frugal AI, French reference framework

  6. Google DeepMind (2024) — Gemma 2: Open Models Based on Gemini Technology

  7. Schwartz et al. (2019) — Green AI — https://arxiv.org/abs/1907.10597

  8. Luccioni, Jernite, Strubell (2024) — Power Hungry Processing: Watts Driving the Cost of AI Deployment? — https://arxiv.org/abs/2311.16863

  9. CodeCarbon (MILA/Hugging Face/BCG) — https://codecarbon.io/

  10. ElectricityMaps API — https://www.electricitymaps.com/

  11. Jevons, William Stanley (1865) — The Coal Question (Jevons Paradox)

  12. HuggingFace Open LLM Leaderboard — https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

  13. Artificial Analysis — Independent LLM benchmark — https://artificialanalysis.ai/

  14. Papers With Code — MMLU Leaderboard — https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu

 
 