GPTFootprint: Increasing Consumer Awareness of the Environmental Impacts of LLMs

Nora Graves (eg5817@princeton.edu), Vitus Larrieu (vl7131@princeton.edu), Y. Trace Zhang (yingyue@princeton.edu), Joanne Peng (jzp@princeton.edu), Varun Nagaraj Rao (varunrao@princeton.edu), Yuhan Liu (yuhanl@princeton.edu), and Andrés Monroy-Hernández (andresmh@princeton.edu)

Princeton University, Princeton, New Jersey, USA 08544

(2025)

Abstract.

With the growth of AI, researchers are studying how to mitigate its environmental impact, primarily by proposing policy changes and increasing awareness among developers. However, research on AI end users is limited. Therefore, we introduce GPTFootprint, a browser extension that aims to increase consumer awareness of the significant water and energy consumption of LLMs, and reduce unnecessary LLM usage. GPTFootprint displays a dynamically updating visualization of the resources individual users consume through their ChatGPT queries. After a user reaches a set query limit, a popup prompts them to take a break from ChatGPT. In a week-long user study, we found that GPTFootprint increases people’s awareness of environmental impact, but has limited success in decreasing ChatGPT usage. This research demonstrates the potential for individual-level interventions to contribute to the broader goal of sustainable AI usage, and provides insights into the effectiveness of awareness-based behavior modification strategies in the context of LLMs.

Keywords: Eco-feedback systems, environmental awareness, large language models, behavior change

Publication year: 2025. Copyright: rights retained. Conference: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA ’25), April 26-May 1, 2025, Yokohama, Japan. DOI: 10.1145/3706599.3719708. ISBN: 979-8-4007-1395-8/2025/04. CCS concepts: Human-centered computing; Human-centered computing → Empirical studies in interaction design; Human-centered computing → Empirical studies in visualization.

Figure 1. GPTFootprint displays the amount of energy and water consumed in user-friendly terms and with pictogram icons (hours powering a light bulb and cups of water) in a side panel on top of the ChatGPT interface. In addition, there is an Eco Score, which can range from 0 to 100.

1. Introduction

The rapid advancement and widespread adoption of large language models (LLMs) have transformed how the public interacts with artificial intelligence. Tools like ChatGPT handle millions of queries daily (Singh, 2025), assisting with tasks ranging from coding (Lin et al., 2024) to creative writing (Gómez-Rodríguez and Williams, 2023). However, this convenience comes at a significant environmental cost that remains largely invisible to end users. A single trained LLM has a carbon footprint equivalent to hundreds of households’ annual emissions (Ren and Wierman, 2024), processing a single query consumes ten times more energy than a standard web search (Iea, [n. d.]), and cooling data centers consumes substantial amounts of water (Li et al., 2023). As LLMs become increasingly integrated into daily workflows, establishing sustainable usage patterns early could significantly reduce their long-term environmental footprint. Current solutions for quantifying LLM environmental impacts focus on industry-level interventions or expert-oriented tools (Lacoste et al., 2019; Schidt et al., [n. d.]), leaving a gap for consumer awareness and engagement. While several tools track query counts (Staiger, 2024; Eduardo, 2024), none provide accessible environmental impact metrics to end users.

In this paper, we present GPTFootprint, a novel browser extension that addresses this gap by displaying dynamic environmental impact metrics during ChatGPT usage. Our approach innovates in three key ways: dynamic feedback that converts technical measures into human-scale units, privacy-preserving tracking that does not access query content, and a visually engaging interface that makes a user’s impact simple to understand at a glance.

This research addresses three questions:

(1) How can we effectively communicate the environmental impact of individual LLM usage to end users?
(2) What metrics and visualizations most effectively help users understand their environmental impact?
(3) To what extent does dynamic environmental impact feedback influence people’s use of LLMs?

We evaluated GPTFootprint with 9 participants who used it for 7 days. Participants reported a greater awareness of the environmental impact of LLMs, and expressed appreciation for the new understanding they gained. Participants also agreed that human-scale metrics and visualizations significantly improved understanding of their impact. Although awareness increased, behavioral change was limited by the utility of ChatGPT and sentiment about the limits of personal responsibility. These findings contribute to both environmental computing and human-computer interaction fields by demonstrating effective strategies for communicating AI environmental impact to end users and identifying barriers to behavior change.

2. Related Work

2.1. Environmental Impacts of LLMs

In the past few years, research on the flaws and dangers of LLMs has grown, warning of broad societal and environmental impacts (Bender et al., 2021). Although certain machine learning applications can help mitigate climate change (Kaack et al., 2022; Rolnick et al., 2019; Tousignant, 2021), they often also exacerbate climate change through emissions-intensive training and usage (Cottier et al., 2024; Bender et al., 2021; Li et al., 2023). Current proposed solutions (Kaack et al., 2022; Strubell et al., 2020; Bender et al., 2021; Stojkovic et al., 2024) focus on governmental policies or industry-level energy conservation techniques instead of user behavior changes. Some researchers have attempted to engage individuals’ awareness of carbon emissions from large scale computing (Lacoste et al., 2019; Schidt et al., [n. d.]), but these programs require detailed knowledge of the hardware, hours used, compute providers, geographical regions, and more. Furthermore, they exclude other crucial environmental impacts like the water data centers consume (Li et al., 2023). Although several ChatGPT extensions can count queries (Staiger, 2024; Eduardo, 2024), none of these display an estimate of the resulting environmental impact. We saw a need for consumer access to dynamic, updating statistics about the environmental impact of their personal LLM usage.

2.2. Technology for Behavior Change

Habit adjustment technology aims to increase positive habits and decrease negative ones. Awareness alone can influence actions, as with behavior-informing technologies that display health statistics like step count in order to increase activity levels (Bravata et al., 2007; Consolvo et al., 2009). Because studies of physical fitness focus on promoting positive behaviors, like exercise, rather than reducing negative ones, like excessive LLM usage, we also draw from research on screen time reduction, which relies on strategies like gamification and goal setting that can effectively reduce both device and app-specific usage over time (Rahmillah et al., 2023; Rooksby et al., 2016). However, complete app lock-outs can cause frustration, which sometimes results in an ultimate failure to reduce screen time (Kim et al., 2019a, b). It may be more effective to allow users to continue using the software, but make it increasingly challenging for them to do so (Lu et al., 2024).

2.3. Eco-Feedback Technology

Eco-feedback systems often incorporate user awareness by presenting information about personal environmental impact (Froehlich et al., 2010), which can increase environmentally friendly behaviors in transportation (Tulusan et al., 2011; Froehlich et al., 2009), water usage (Arroyo et al., 2005; Froehlich et al., 2012), and more (Froehlich et al., 2010). Eco-feedback systems that contextualize energy consumption perform better than those that do not (Jain et al., 2013). Although eco-feedback has been successful in various applications, it can lack significant long-term effects (Hargreaves, 2017). Researchers have suggested that speculative design (Hargreaves, 2017) and social comparison components (Stefano De Dominicis and Schultz, 2019) are potential solutions for the lack of long-term effects.

3. GPTFootprint

We implemented GPTFootprint as a Chrome Extension that layers on top of the ChatGPT website for ease of use. It consists of two main features: an Environmental Impact side panel (Figure 2(a)) and a Limit Reached popup (Figure 2(b)). Both link to the same Read More document, which provides further information about the environmental impact of LLMs. During user studies, the extension also included a server component, which saved information about ChatGPT and Extension usage for each participant. No version of GPTFootprint accesses or stores the contents of queries or responses. All components were developed using JavaScript and CSS.

3.1. Unit Conversion Techniques

GPTFootprint calculates energy and water consumption based on an average per-query value of 2.9 Wh of energy (Iea, [n. d.]) and 16.9 mL of water (Li et al., 2023). Although the size of queries and responses affects energy and water consumption (Schidt et al., [n. d.]; Lacoste et al., 2019), determining these variables would require accessing user queries. To mitigate security and privacy concerns, we chose to use per-query average values instead. Similar per-query average approaches to evaluating carbon emission costs from LLMs are standard for assessing model impacts (Luccioni et al., 2022; Gol, [n. d.]; Semianalysis, 2023), and even research that distinguishes between various query types finds that text generation and summarization, the two types relevant to GPTFootprint, have a similar per-query average energy consumption (Luccioni and Strubell, 2023).
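The per-query averaging described above can be illustrated with a short sketch. It uses only the two constants quoted in the text; the function name `estimateFootprint` and the shape of its return value are hypothetical and are not taken from the GPTFootprint source.

```javascript
// Per-query averages quoted in the text above.
const ENERGY_WH_PER_QUERY = 2.9; // watt-hours per query
const WATER_ML_PER_QUERY = 16.9; // milliliters per query

// Hypothetical helper: estimates cumulative consumption from a query count
// alone, so no query or response content is ever inspected.
function estimateFootprint(queryCount) {
  if (!Number.isInteger(queryCount) || queryCount < 0) {
    throw new RangeError("queryCount must be a non-negative integer");
  }
  const energyWh = queryCount * ENERGY_WH_PER_QUERY;
  const waterMl = queryCount * WATER_ML_PER_QUERY;
  return {
    energyWh,
    waterMl,
    energyKWh: energyWh / 1000, // standard units
    waterL: waterMl / 1000,
  };
}

// Illustrative input: 6 queries per day for 7 days.
const week = estimateFootprint(42);
console.log(week.energyWh.toFixed(1), "Wh,", week.waterMl.toFixed(1), "mL");
```

Because the estimate depends only on a counter, it is compatible with the privacy constraint stated above, at the cost of ignoring variation in query and response length.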

3.2. Eco Score

Pilot studies on a previous implementation of GPTFootprint without Eco Score suggested that although participants appreciated seeing usage metrics, they typically did not enjoy using GPTFootprint, due to feelings like guilt and stress, and it elicited little to no behavioral change. Therefore, we decided to implement Eco Score as a form of gamification, which can affect behavior (Rahmillah et al., 2023) by inspiring intrinsic motivation (A and Joy, 2024; Luarn et al., 2023) and increasing positive emotions like achievement (Blohm and Leimeister, 2013). We based the Eco Score on U.S. university grading systems, with which all participants would be familiar (in this system, 90-100 is excellent, 80-89 is good, and lower scores are average, poor, or a failure (USG, [n. d.])). Such grading systems can induce motivation, due to the subconscious association with schooling (McMillan, 2007). Other sustainability measurements like Energy Star Scores (Agency, 2025) and household carbon calculators (Schidt et al., [n. d.]; Trust, 2025) use similar systems. We designed the scoring algorithm with several goals in mind, as follows (the full algorithm appears in the appendix).

First, an average user should see an Eco Score low enough to encourage behavior change, but not low enough to cause frustration, which can result in failure to change behavior (Kim et al., 2019a, b). Difficult but achievable goals lead to more motivation than easy or impossible goals (A and Joy, 2024), so we aimed to place an average user just below a ‘good’ score. In our pilot studies, participants queried 6 times per day on average. If these queries are each an hour apart, they will reach a score of 76, just below a ‘good’ score.

Second, the Eco Score should encourage efficient ChatGPT usage. We quantify efficient usage by the pauses in between queries, where longer pauses suggest that participants seek other resources when ChatGPT is unhelpful, and shorter pauses suggest that users rely solely on ChatGPT, even for tasks it may not be suited for. Therefore, queries in quick succession have a greater negative impact on the Eco Score than queries with longer pauses in between. Specifically, each query made more than an hour after the previous one costs 7 points, and each query made less than a minute after the previous one costs 13 points, with several tiers in between.

Third, an efficient day of ChatGPT usage should have little to no effect on the Eco Score the next day, while a particularly inefficient day of ChatGPT querying should have a noticeable impact the following day. We accomplished this with our score increase rate of 1 point every 20 minutes. Assuming an 8 hour night (hum, 2022), the Eco Score will increase by 24 points overnight. Therefore, an average efficient user, with 6 queries each an hour apart, as described above, will start the next day with a score of 100 again, while a user with a lower Eco Score will not.

3.3. Chrome Extension Design

As soon as a user opens the ChatGPT website (https://chatgpt.com/), the extension requests current usage statistics from local Chrome storage and creates the Environmental Impact side panel. GPTFootprint uses status codes as a proxy for queries: the extension listens for a successful POST request (statusCode === 200) made to the ChatGPT API (https://chatgpt.com/backend-api/conversation), ignoring requests containing ‘init’ or ‘implicit’, which represent tasks other than queries. Each query detection updates the locally stored query count, side panel, and popup. During the user studies, the extension also externally saved information about each participant’s usage in a private Google Sheets spreadsheet. Each time the extension detected an event, such as a query or a popup opening, it logged the user ID, the date and time, and the event type (i.e., query, popup_opening, popup_closed, readmore_clicked). This data was de-identified with a random user ID, and Google Chrome encrypted all information transmitted between users and the server.
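A minimal sketch of this detection rule follows. The endpoint, the status check, the ‘init’ and ‘implicit’ exclusions, and the logged fields come from the text. The use of `chrome.webRequest.onCompleted` is an assumption about which browser API the listener relies on, as is matching the excluded markers against the request URL; the helper names `isQueryRequest` and `makeLogRecord` are hypothetical.

```javascript
const CONVERSATION_ENDPOINT = "https://chatgpt.com/backend-api/conversation";
const IGNORED_MARKERS = ["init", "implicit"]; // non-query tasks, per the text

// `details` mirrors fields that chrome.webRequest reports for a completed
// request: url, method, statusCode. No request body is read.
function isQueryRequest(details) {
  if (details.method !== "POST" || details.statusCode !== 200) return false;
  if (!details.url.startsWith(CONVERSATION_ENDPOINT)) return false;
  // ASSUMPTION: the excluded markers appear in the request URL.
  return !IGNORED_MARKERS.some((marker) => details.url.includes(marker));
}

// Builds the de-identified record described in the text:
// random user ID, date and time, and event type.
function makeLogRecord(userId, eventType, now = new Date()) {
  return { userId, timestamp: now.toISOString(), eventType };
}

// Registration is guarded so the helpers can also run outside a browser.
// In an extension this would additionally need the "webRequest" permission
// and host permissions for chatgpt.com in the manifest.
if (typeof chrome !== "undefined" && chrome.webRequest) {
  chrome.webRequest.onCompleted.addListener(
    (details) => {
      if (isQueryRequest(details)) {
        // Update the stored query count, side panel, and popup here.
      }
    },
    { urls: [CONVERSATION_ENDPOINT + "*"] }
  );
}
```

Keeping the filter as a pure function of request metadata makes the privacy property easy to audit: nothing in the check depends on what the user typed.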

3.3.1. Environmental Impact Side Panel

The side panel (Figure 2(a)) initially appears on the top right of the screen, but can be dragged to another area to avoid obscuring ChatGPT’s interface. The side panel is always present, so the user cannot minimize it or drag it offscreen. With each query, the side panel updates its display to reflect the total energy and water used in two display types. First, the Eco Score graphic provides users with a glanceable metric of their environmental impact. The score is displayed on top of an image, which changes dynamically with the score, depicting a more polluted, bleak environment when the score drops and a cleaner, brighter environment when the score increases. Second, the display of human-scale metrics and corresponding pictogram charts (Art, 2025; Visions, 2025; Davis, 2025; Nguyen, 2025; Greenhill, 2025) contextualizes impact in glanceable and easily understandable terms for a typical end user of ChatGPT. The energy metric dynamically shifts from hours powering a light bulb to miles driving a Tesla as energy consumption increases, while the water metric shifts from cups to bathtubs to hot tubs. At the bottom of the side panel is a button labeled Read More, which links to information about LLMs and their environmental impact.
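The shift between human-scale units can be sketched as a tiered lookup. The unit names come from the text, but every conversion factor below (bulb wattage, energy per mile, bathtub and hot tub volumes) and the switching rule (use the largest unit whose value is at least 1) are assumptions chosen for illustration, not the values used in GPTFootprint.

```javascript
// ASSUMED conversion factors; the text names the units but not their sizes.
const ENERGY_UNITS = [
  { label: "hours powering a light bulb", size: 10 }, // Wh; assumes a 10 W bulb
  { label: "miles driving a Tesla", size: 250 },      // Wh; assumes 250 Wh per mile
];
const WATER_UNITS = [
  { label: "cups", size: 236.6 },       // mL; one US cup
  { label: "bathtubs", size: 150000 },  // mL; assumes a 150 L bathtub
  { label: "hot tubs", size: 1500000 }, // mL; assumes a 1,500 L hot tub
];

// ASSUMED rule: pick the largest unit whose value is at least 1,
// falling back to the smallest unit for very small amounts.
// `units` must be ordered from smallest to largest.
function pickUnit(amount, units) {
  let chosen = units[0];
  for (const unit of units) {
    if (amount / unit.size >= 1) chosen = unit;
  }
  return { value: amount / chosen.size, label: chosen.label };
}

const energy = pickUnit(10 * 2.9, ENERGY_UNITS); // ten queries at 2.9 Wh each
console.log(energy.value.toFixed(1), energy.label);
```

A design like this keeps the displayed number small and readable as totals grow, which is the stated purpose of switching units.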

3.3.2. Limit Reached Popup

The limit popup (Figure 2(b)) automatically appears each time the user reaches a certain energy or water limit. It cannot be moved, but will disappear once the user clicks the “Continue Using Anyway” button. The popup displays the total energy and water consumed in both user-friendly metrics and standard metrics (kWh and liters), and it includes the same Read More button. Originally, the limit was three queries, but pilot participants thought this was too frequent. We increased the limit to seven queries, which was successful during a second pilot study.
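The trigger condition for the popup can be sketched as follows. The limit of seven queries and the per-query averages come from the text; the assumption that the popup recurs at every multiple of the limit, and the function names, are illustrative only.

```javascript
const QUERY_LIMIT = 7; // limit used after the first pilot, per the text
const ENERGY_WH_PER_QUERY = 2.9;
const WATER_ML_PER_QUERY = 16.9;

// ASSUMPTION: the popup recurs at every multiple of the limit
// (7, 14, 21, ... queries) rather than only once.
function shouldShowLimitPopup(queryCount, limit = QUERY_LIMIT) {
  return queryCount > 0 && queryCount % limit === 0;
}

// Totals in the standard units the popup shows alongside the
// user-friendly metrics (kWh and liters).
function popupTotals(queryCount) {
  return {
    energyKWh: (queryCount * ENERGY_WH_PER_QUERY) / 1000,
    waterL: (queryCount * WATER_ML_PER_QUERY) / 1000,
  };
}

if (shouldShowLimitPopup(7)) {
  const totals = popupTotals(7);
  console.log(totals.energyKWh.toFixed(4), "kWh,", totals.waterL.toFixed(4), "L");
}
```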

4. Methods

We evaluated GPTFootprint with a week-long user study with nine participants, all college students between ages 18-24. We recruited current ChatGPT users from university courses that allow LLM usage, and through snowball sampling. The study received IRB approval. We informed all subjects of the study and they each consented to our procedures before participating.

First, we sent users a pre-survey (Appendix D.1), asking about their current concern for the environment, along with a link and instructions for downloading and setting up GPTFootprint. For the next week, we asked them to only use ChatGPT in Chrome with the extension enabled. After the trial period ended, we sent participants an exit survey (Appendix D.2), in which they answered multiple-choice questions (using a Likert scale from 1 to 5) and wrote open-ended responses about whether they enjoyed GPTFootprint, how it made them feel, and whether it impacted their behavior. We coded each response, then grouped codes into five themes (Appendix C), finding key quotes for each theme. In addition, the exit survey contained instructions for determining ChatGPT usage before and during the trial using a Colab Notebook (Appendix E). Participants uploaded their ChatGPT history locally to the Notebook, then reported the outputted query counts to our exit survey. Thus, we systematically collected usage statistics from before and during the study period without accessing participants’ private conversation records.
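The counting step performed by the notebook might look like the sketch below. It is written in JavaScript for consistency with the extension rather than in the notebook's own language, and it assumes a particular export shape: an array of conversations, each with a `mapping` object whose nodes may carry a `message` with `author.role` and a `create_time` in Unix seconds. That shape, the function name, and the sample data are assumptions, not details from the study materials.

```javascript
// Counts user-authored messages whose timestamp falls in [startMs, endMs).
// Only roles and timestamps are read; message text is never touched.
function countUserQueries(conversations, startMs, endMs) {
  let count = 0;
  for (const conversation of conversations) {
    for (const node of Object.values(conversation.mapping ?? {})) {
      const message = node.message;
      if (!message || message.author?.role !== "user") continue;
      if (typeof message.create_time !== "number") continue;
      const sentMs = message.create_time * 1000;
      if (sentMs >= startMs && sentMs < endMs) count += 1;
    }
  }
  return count;
}

// Hypothetical trial start date and synthetic history, for illustration only.
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;
const trialStartMs = Date.UTC(2025, 0, 13);
const trialStartSec = trialStartMs / 1000;
const sampleHistory = [
  {
    mapping: {
      a: { message: { author: { role: "user" }, create_time: trialStartSec - 3600 } },
      b: { message: { author: { role: "assistant" }, create_time: trialStartSec - 3590 } },
      c: { message: { author: { role: "user" }, create_time: trialStartSec + 3600 } },
      d: { message: { author: { role: "user" }, create_time: trialStartSec + 7200 } },
      e: { message: null },
    },
  },
];

const before = countUserQueries(sampleHistory, trialStartMs - WEEK_MS, trialStartMs);
const during = countUserQueries(sampleHistory, trialStartMs, trialStartMs + WEEK_MS);
console.log({ before, during });
```

Running the count on the participant's own machine and reporting only the two totals is what allows usage to be compared without sharing conversation records.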

5. Results

We evaluated GPTFootprint on its ability to increase user awareness of environmental impact and to decrease ChatGPT usage. In our analysis, three themes emerged: the importance of personalized information, the emotional impact of GPTFootprint, and the difficulty of decreasing the usage of a valuable tool like ChatGPT.

5.1. Participants Appreciate New Awareness of Their Personal Environmental Impact

Participants uniformly agreed that GPTFootprint was very educational, with an average response of 4.11 (SD = 0.782, median = 4) on a Likert Scale ranging from 1 (learned nothing from GPTFootprint) to 5 (learned a lot). All participants noted a new awareness of their impact, which led many to some personal reflection: “This really put my usage into perspective and made me think about whether the use of ChatGPT was worth its cost or not” [P4], “Frankly, it was more water than I drink in a regular basis, so it was eye opening” [P2]. Additionally, almost half of the participants (4 out of 9) reported caring more about the impact of their ChatGPT usage after the study than before, while the rest reported no change; the average score increased from 3.0 out of 5 (SD = 1, median = 3) before the trial to 3.55 (SD = 0.882, median = 4) afterwards.

In pilot studies, participants appreciated the real-world, contextualized metrics, but found the side panel wordy, and therefore confusing, with many requests for more visuals. In this iteration of GPTFootprint, therefore, we condensed the text and added pictograms. These updates were successful. No participants reported confusion about the metrics, and many praised their real-world applicability: “things I was familiar with (cups of water and hours for a lightbulb), and not just random numbers…had a more significant effect” [P3]. The visuals, too, impacted participants: “having the visual indicator for the cost of my queries would reduce my usage” [P8].

Notably, participants discussed not only their new awareness itself, but also their appreciation for that awareness. In fact, three different participants wrote about how seeing “reminders” of ChatGPT’s environmental impact is “important” [P2, P5, P7]. Most users were likely to continue using GPTFootprint after the trial ended (often citing the importance of awareness as a reason), rating the likelihood of future use 3.44 out of 5 on average, with only one participant responding with a value below 3. This desire to understand the hidden environmental impacts of ChatGPT emphasizes the importance of GPTFootprint, and similar programs, as LLMs like ChatGPT continue to grow in popularity.

5.2. GPTFootprint Elicits Strong Emotional Responses

Another theme among participants was that GPTFootprint induced emotions such as shock, sadness, and especially guilt [P2, P3, P4, P7, P8]. Participants “felt worse” [P8] after seeing their personal ChatGPT usage connected to environmental effects, and their usage patterns “really shocked [them]” [P9]. Participants experienced heightened emotional responses for information and queries that could be accessed through Google or other means: “This extension made me feel a little guilty for turning to ChatGPT… I felt like the energy I could put into doing the research or work myself would be better spent rather than using ChatGPT.” [P2]. Participants noted connections to current environmental concerns and personal sustainability goals: “Recent events like the LA wildfires has also made me more conscious of my environmental impact.” [P5], “It made me realize that these things [ChatGPT] are not free, they must be paid for somewhere and this is paid in resources” [P6]. In our design, these negative emotions are the catalyst for inducing behavioral change.

Despite the negative emotions, participants enjoyed using GPTFootprint, and rated it an average of 4.22 out of 5 (SD = 0.667, median = 4) on the Likert scale, which was an increase from 3.78 (SD = 1.084, median = 3.5) during pilot studies without visuals. With the improved user interface, fewer participants responded with themes of annoyance, and they felt more inclined to continue using the extension.

5.3. Difficult to Overcome Utility of LLMs

Four participants had a decrease in ChatGPT usage during the trial period [P2, P4, P7, P9], but the rest did not: one participant experienced no change [P3], and the remaining four participants actually increased their ChatGPT usage during the trial period [P1, P5, P6, P8]. In fact, the total queries across all participants increased by 18.584% during the trial, though this is likely due to the small sample size and short trial period.

Many participants reported that the utility of ChatGPT outweighed their desire to reduce their environmental impact [P1, P3, P4, P5, P7]. Although GPTFootprint made one participant “more motivated to minimize [their] usage,” their “desperation and need will still make [them] continue using it” [P5]. The two participants with the largest increases in usage during the study (21.429% [P1] and 1,500% [P5]) both acknowledged this increase in the exit survey, citing outside factors like “apply[ing] to jobs” [P1] and “necessary writing” [P5].

Even when users did experience a decrease in ChatGPT use, they still valued the utility of ChatGPT: using GPTFootprint was “mildly upsetting,” but “Not upsetting enough to swear off ChatGPT altogether” [P7]. Participants also cited a perceived limitation of individual actions to conserve resources, pointing out that their individual use is not a significant contributor to the overall impact of ChatGPT [P2, P4, P5], and expressing a desire for companies to take responsibility for the environmental impacts of LLMs, rather than users [P7]. One participant asked “how are we supposed to mitigate its energy consumption” from an “existential” perspective rather than a personal one [P4], though their usage did decline by 28.571% over the course of the study. Taken together, these user reactions indicate a key limitation of our intervention style. ChatGPT can be very useful, and appears to have no direct negative impact on its users. Our intervention relies on user awareness of the negative impacts ChatGPT can have on the environment, but the user makes the final decision on whether or not to continue using ChatGPT. For many participants, the utility of ChatGPT appeared to outweigh its detrimental effects on the environment. However, in the week-long trial period, the total resource consumption lacked the chance to increase significantly; perhaps, over a longer period, the increased resource consumption would impact users more.

Interestingly, the limit popup did occasionally seem to affect behavior, whether or not participants realized it. In total, popups appeared 10 times, and participants always closed them within a minute of opening. For 7 of those popups, another query occurred less than 10 minutes after the popup opening. However, the other 3 popups led to a longer delay, ranging from 24 minutes to over 24 hours. This pattern was even more pronounced during pilot studies, when 21.429% of popups preceded a 60+ minute query delay, and an additional 14.288% preceded a 10+ minute delay. Thus, the popup may serve as a final push when users are already considering taking a break from ChatGPT. Allowing users to personalize their limits could increase this effect in future versions of GPTFootprint.
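This kind of delay analysis can be sketched from an event log with the event types described in Section 3.3. The record shape (an `eventType` string and a millisecond `timestamp`), the function names, and the sample log are hypothetical; the 10-minute and 60-minute boundaries follow the thresholds discussed above.

```javascript
// For each popup_opening event, finds the next query event and returns
// the delay in minutes. `events` must be sorted by ascending timestamp.
function delaysAfterPopups(events) {
  const delays = [];
  events.forEach((event, i) => {
    if (event.eventType !== "popup_opening") return;
    const nextQuery = events.slice(i + 1).find((e) => e.eventType === "query");
    if (nextQuery) delays.push((nextQuery.timestamp - event.timestamp) / 60000);
  });
  return delays;
}

// Buckets a delay using the thresholds discussed in the text.
function bucketDelay(minutes) {
  if (minutes < 10) return "under 10 min";
  if (minutes < 60) return "10 to 59 min";
  return "60+ min";
}

// Synthetic log for illustration only; not study data.
const MINUTE = 60000;
const sampleLog = [
  { eventType: "query", timestamp: 0 },
  { eventType: "popup_opening", timestamp: 1 * MINUTE },
  { eventType: "popup_closed", timestamp: 1.5 * MINUTE },
  { eventType: "query", timestamp: 4 * MINUTE },
  { eventType: "popup_opening", timestamp: 5 * MINUTE },
  { eventType: "popup_closed", timestamp: 5.2 * MINUTE },
  { eventType: "query", timestamp: 95 * MINUTE },
];

console.log(delaysAfterPopups(sampleLog).map(bucketDelay));
```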

6. Discussion

Contrary to our expectations, ChatGPT usage did not uniformly decrease during the study. One key reason we discovered in our analysis is that the utility of ChatGPT outweighs participants’ desire to reduce personal environmental impacts. Eco-feedback systems tend to discourage negative behavior that has either minimal benefit or a negative impact on participants, like using more gas when driving a car (Tulusan et al., 2011). ChatGPT, in contrast, offers significant benefits, with little direct negative impact on the user, another limitation of metric-based systems (Sanguinetti et al., 2018). Additionally, participants felt desensitized to the information displayed and the reminders, a common issue in habit-reduction systems (Hargreaves, 2017; Kim et al., 2019a, b). Although these issues limited our system’s effectiveness in changing behavior, methods besides a limit popup and sidebar might improve results, and warrant future research. However, as participants indicate that they are inclined to keep using GPTFootprint beyond the study period, it does increase user awareness even when behavior is dictated by external circumstances such as deadlines.

7. Limitations and Future Work

We explicitly designed GPTFootprint to have minimal privacy concerns, but the change in behavior it aims to elicit may have negative consequences for individuals, such as generalized skepticism and negative sentiments about emerging technology. Going forward, we could mitigate this potential concern by increasing user agency, focusing on flexible personal regulation of resource consumption, and avoiding criticism of emerging technology at large. Furthermore, increased awareness can place public pressure on companies to be more environmentally conscious, but GPTFootprint could seem to blame individuals over large corporations. We believe that despite this risk, addressing the lack of public awareness about the environmental impact of LLMs is necessary to encourage meaningful change in the industry.

We note that our sample consists entirely of university students who are regular users of ChatGPT, and are therefore inclined to query when faced with certain academic tasks and deadlines. In choosing active users, we wanted to ensure adequate testing of GPTFootprint’s features in the short study time. However, user behavior would likely be very different for those less familiar with ChatGPT, or those who use it for other tasks, such as personal or job-related matters. Furthermore, university students represent a highly educated subset of the population, but an ideal deployment of the system would occur at a broader scale. The effects of GPTFootprint on other populations, who may have less pre-existing knowledge about LLMs or differing awareness of environmental threats, warrant further research.

Future work can continue exploring what factors encourage behavioral change. While some users with high pause times reported turning to alternative sources like “Google” [P4] or “Google Scholar” [P3] instead of ChatGPT, supporting our use of pop-up persistence time as a proxy for seeking alternate resources, others with similar pause patterns did not report platform-switching. Without better metrics on search product usage during pop-up periods, measuring its impact on behavioral change remains difficult. Although participants in the full study expressed far less annoyance than during pilot studies without visuals, they did request size-adjustment capabilities for the side panel, which sometimes blocked parts of the ChatGPT interface, even after moving it around the screen. Interestingly, one participant commented “sometimes [the side panel] can get in the way…unless of course that is the point”, and another said “sometimes, the extension blocked parts of my screen, but this also made me lessen my usage of ChatGPT” [P2]. These reflect research on input manipulation—decreasing a given behavior by making that behavior more difficult (Lu et al., 2024). One method of input manipulation in GPTFootprint could be modifying the size of the side panel. Participants also suggested incorporating animations and more images [P3, P7], as they thought visuals were the most easily understandable representation of their environmental impact. Additional visuals could also encourage GPTFootprint users to compare their Eco Score and resource consumption with friends, facilitating a social comparison component that has been successful for other eco-feedback systems (Stefano De Dominicis and Schultz, 2019).

Future work must also consider a wider variety of LLMs, particularly new chain-of-thought models. Notably, these frontier models advertise more efficient training methods, but preliminary experiments suggest the inference cost per query is significantly higher (O’Donnell, 2025). Currently, per-query cost estimates remain unavailable for newer models, so determining these values is a key first step towards increasing consumer awareness.

8. Conclusion

The environmental impacts of LLMs remain largely unknown to their users. In an effort to increase user awareness and help mitigate the misuse of environmental resources in LLM use, we built GPTFootprint, a novel Chrome extension that integrates into the ChatGPT website to provide live updates on personal environmental impact from querying. The extension provides live, anonymized tracking of user query activity and presents users with their individual environmental impact through dynamic, contextualized displays. We conducted a full study with nine participants over 7 days to evaluate our system. We found that the updating and contextualized statistics and visuals were valuable for user awareness. GPTFootprint elicited strong emotional reactions from users, but often failed to cause behavior changes, due to the utility of ChatGPT. Already, our system demonstrates promising increases in user awareness, and we believe that further versions, with new features like social comparison and personalized goal setting, have high potential to improve user behavior and mitigate the environmental impacts of LLMs.

References

Gol ([n. d.]) [n. d.]. Generational Growth: AI, Data Centers and the Coming US Power Demand Surge. https://www.goldmansachs.com/pdfs/insights/pages/generational-growth-ai-data-centers-and-the-coming-us-power-surge/report.pdf
USG ([n. d.]) [n. d.]. Understanding Your Grades and Transcript. https://baruch-undergraduate.catalog.cuny.edu/policies-and-procedures/understanding-your-grades-and-transcript
hum (2022) 2022. How Much Sleep Is Enough? https://www.nhlbi.nih.gov/health/sleep/how-much-sleep#:~:text=Experts%20recommend%20that%20adults%20sleep,or%20more%20hours%20a%20night.
A and Joy (2024) Ebina Justin M A and Manu Melwin Joy. 2024. Gamification, intrinsic motivation, and task performance of employees: the moderating role of goal difficulty. Behaviour & Information Technology 43, 16 (2024), 3993–4015. https://doi.org/10.1080/0144929X.2023.2297280
Agency (2025) U.S. Environmental Protection Agency. 2025. How the Energy Star Score is Calculated. https://www.energystar.gov/buildings/benchmark/understand-metrics/how-score-calculated Accessed: 2025-01-21.
Arroyo et al. (2005) Ernesto Arroyo, Leonardo Bonanni, and Ted Selker. 2005. Waterbot: exploring feedback and persuasive techniques at the sink. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Portland, Oregon, USA) (CHI ’05). Association for Computing Machinery, New York, NY, USA, 631–639. https://doi.org/10.1145/1054972.1055059
Art (2025) Angels Art. 2025. Lightbulb Icon. https://thenounproject.com/icon/lightbulb-3194358/. Accessed: 2025-01-23.
Bender et al. (2021) Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (Virtual Event, Canada) (FAccT ’21). Association for Computing Machinery, New York, NY, USA, 610–623. https://doi.org/10.1145/3442188.3445922
Blohm and Leimeister (2013) Ivo Blohm and Jan Marco Leimeister. 2013. Gamification. Bus. Inf. Syst. Eng. 5, 4 (Aug. 2013), 275–278.
Bravata et al. (2007) Dena Bravata, Crystal Smith-Spangler, Vandana Sundaram, Allison Gienger, Nancy Lin, Robyn Lewis, Christopher Stave, Ingram Olkin, and John Sirard. 2007. Using Pedometers to Increase Physical Activity and Improve Health: A Systematic Review. JAMA : the journal of the American Medical Association 298 (11 2007), 2296–304. https://doi.org/10.1001/jama.298.19.2296
Consolvo et al. (2009) Sunny Consolvo, David W. McDonald, and James A. Landay. 2009. Theory-driven design strategies for technologies that support behavior change in everyday life. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Boston, MA, USA) (CHI ’09). Association for Computing Machinery, New York, NY, USA, 405–414. https://doi.org/10.1145/1518701.1518766
Cottier et al. (2024) Ben Cottier, Robi Rahman, Loredana Fattorini, Nestor Maslej, and David Owen. 2024. The rising costs of training frontier AI models. arXiv:2405.21015 [cs.CY] https://arxiv.org/abs/2405.21015
Davis (2025) Ben Davis. 2025. Water Glass Icon. https://thenounproject.com/icon/water-glass-1190124/. Accessed: 2025-01-23.
Eduardo (2024) Matheus Eduardo. 2024. ChatGPT Question Count. Chrome Extension Store (2024). https://chromewebstore.google.com/detail/chatgpt-question-count/naokkoogmjjhnehoadkmpicliffbjllc
Froehlich et al. (2009) Jon Froehlich, Tawanna Dillahunt, Predrag Klasnja, Jennifer Mankoff, Sunny Consolvo, Beverly Harrison, and James Landay. 2009. UbiGreen: Investigating a Mobile Tool for Tracking and Supporting Green Transportation Habits. Proc. CHI 2009, 1043–1052. https://doi.org/10.1145/1518701.1518861
Froehlich et al. (2010) Jon Froehlich, Leah Findlater, and James Landay. 2010. The design of eco-feedback technology. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Atlanta, Georgia, USA) (CHI ’10). Association for Computing Machinery, New York, NY, USA, 1999–2008. https://doi.org/10.1145/1753326.1753629
Froehlich et al. (2012) Jon Froehlich, Leah Findlater, Marilyn Ostergren, Solai Ramanathan, Josh Peterson, Inness Wragg, Eric Larson, Fabia Fu, Mazhengmin Bai, Shwetak Patel, and James A. Landay. 2012. The design and evaluation of prototype eco-feedback displays for fixture-level water usage data. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Austin, Texas, USA) (CHI ’12). Association for Computing Machinery, New York, NY, USA, 2367–2376. https://doi.org/10.1145/2207676.2208397
Greenhill (2025) Greenhill. 2025. Hot Tub Icon. https://thenounproject.com/icon/hot-tub-6976573/. Accessed: 2025-01-23.
Gómez-Rodríguez and Williams (2023) Carlos Gómez-Rodríguez and Paul Williams. 2023. A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing. https://doi.org/10.48550/arXiv.2310.08433 arXiv:2310.08433 [cs].
Hargreaves (2017) Tom Hargreaves. 2017. Beyond Energy Feedback. In Building Research & Information, Vol. 46. 332–342. https://doi.org/10.1080/09613218.2017.1356140
Iea ([n. d.]) Iea. [n. d.]. Electricity 2024 – analysis. https://www.iea.org/reports/electricity-2024
Jain et al. (2013) Rishee Jain, John Taylor, and Patricia Culligan. 2013. Investigating the impact eco-feedback information representation has on building occupant energy consumption behavior and savings. Energy and Buildings 64 (09 2013), 408–414. https://doi.org/10.1016/j.enbuild.2013.05.011
Kaack et al. (2022) Lynn H. Kaack, Priya L. Donti, Emma Strubell, George Kamiya, Felix Creutzig, and David Rolnick. 2022. Aligning artificial intelligence with climate change mitigation. Nature Climate Change 12, 6 (June 2022), 518–527. https://doi.org/10.1038/s41558-022-01377-7
Kim et al. (2019a) Jaejeung Kim, Hayoung Jung, Minsam Ko, and Uichin Lee. 2019a. GoalKeeper: Exploring Interaction Lockout Mechanisms for Regulating Smartphone Use. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 3, 1, Article 16 (March 2019), 29 pages. https://doi.org/10.1145/3314403
Kim et al. (2019b) Jaejeung Kim, Joonyoung Park, Hyunsoo Lee, Minsam Ko, and Uichin Lee. 2019b. LocknType: Lockout Task Intervention for Discouraging Smartphone App Use. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3290605.3300927
Lacoste et al. (2019) Alexandre Lacoste, Alexandra Luccioni, Victor Schmidt, and Thomas Dandres. 2019. Quantifying the Carbon Emissions of Machine Learning. arXiv:1910.09700 [cs.CY] https://arxiv.org/abs/1910.09700
Li et al. (2023) Pengfei Li, Jianyi Yang, Mohammad A. Islam, and Shaolei Ren. 2023. Making AI Less ”Thirsty”: Uncovering and Addressing the Secret Water Footprint of AI Models. arXiv:2304.03271 [cs.LG] https://arxiv.org/abs/2304.03271
Lin et al. (2024) Feng Lin, Dong Jae Kim, and Tse-Hsun Chen. 2024. SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents. https://doi.org/10.48550/arXiv.2403.15852 arXiv:2403.15852 [cs].
Lu et al. (2024) Tao Lu, Hongxiao Zheng, Tianying Zhang, Xuhai “Orson” Xu, and Anhong Guo. 2024. InteractOut: Leveraging Interaction Proxies as Input Manipulation Strategies for Reducing Smartphone Overuse. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’24). Association for Computing Machinery, New York, NY, USA, Article 245, 19 pages. https://doi.org/10.1145/3613904.3642317
Luarn et al. (2023) Pin Luarn, Chiao-Chieh Chen, and Chiu Yu-Ping. 2023. Enhancing intrinsic learning motivation through gamification: a self-determination theory perspective. The International Journal of Information and Learning Technology 40, 5 (2023), 413–424. https://login.ezproxy.princeton.edu/login?url=https://www.proquest.com/scholarly-journals/enhancing-intrinsic-learning-motivation-through/docview/2879858432/se-2
Luccioni et al. (2022) Alexandra Sasha Luccioni, Sylvain Viguier, and Anne-Laure Ligozat. 2022. Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model. arXiv:2211.02001 [cs.LG] https://arxiv.org/abs/2211.02001
Luccioni and Strubell (2023) Sasha Luccioni and Emma Strubell. 2023. Title of the paper as per the source (replace this placeholder). In Proceedings of the ACM Conference/Journal Name (replace this placeholder). https://doi.org/10.1145/3630106.3658542
McMillan (2007) James H. McMillan. 2007. Classroom assessment: Principles and practice that enhance student motivation and achievement. Journal of Scholarship of Teaching and Learning 7, 1 (2007), 22–33. https://files.eric.ed.gov/fulltext/EJ854925.pdf
Nguyen (2025) Richard Nguyen. 2025. Bath Tub Icon. https://thenounproject.com/icon/bath-tub-7345415/. Accessed: 2025-01-23.
O’Donnell (2025) James O’Donnell. 2025. DeepSeek might not be such good news for energy after all. https://www.technologyreview.com/2025/01/31/1110776/deepseek-might-not-be-such-good-news-for-energy-after-all/.
Rahmillah et al. (2023) Fety Rahmillah, Amina Tariq, Mark King, and Oscar Oviedo-Trespalacios. 2023. Evaluating the Effectiveness of Apps Designed to Reduce Mobile Phone Use and Prevent Maladaptive Mobile Phone Use: Multimethod Study. Journal of medical Internet research 25 (08 2023), e42541. https://doi.org/10.2196/42541
Ren and Wierman (2024) Shaolei Ren and Adam Wierman. 2024. The uneven distribution of AI’s environmental impacts. https://hbr.org/2024/07/the-uneven-distribution-of-ais-environmental-impacts
Rolnick et al. (2019) David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, and Yoshua Bengio. 2019. Tackling Climate Change with Machine Learning. arXiv:1906.05433 [cs.CY] https://arxiv.org/abs/1906.05433
Rooksby et al. (2016) John Rooksby, Parvin Asadzadeh, Mattias Rost, Alistair Morrison, and Matthew Chalmers. 2016. Personal Tracking of Screen Time on Digital Devices. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’16). Association for Computing Machinery, New York, NY, USA, 284–296. https://doi.org/10.1145/2858036.2858055
Sanguinetti et al. (2018) Angela Sanguinetti, Kelsea Dombrovski, and Suhaila Sikand. 2018. Information, timing, and display: A design-behavior framework for improving the effectiveness of eco-feedback. Energy Research & Social Science 39 (2018), 55–68. https://doi.org/10.1016/j.erss.2017.10.001
Schidt et al. ([n. d.]) Victor Schidt, Kamal Goyal, Aditya Joshi, Boris Feld, Liam Conell, Nikolas Laskaris, Doug Blank, Jonathan Wilson, Sorelle Friedler, and Sasha Luccioni. [n. d.]. CodeCarbon: estimate and track carbon emissions from machine learning computing. https://github.com/mlco2/codecarbon
Semianalysis (2023) Semianalysis. 2023. Peeling the Onion’s Layers: Large Language Models. https://semianalysis.com/2023/02/13/peeling-the-onions-layers-large-language/ Accessed: 2025-01-22.
Singh (2025) Shubham Singh. 2025. Number Of ChatGPT Users (January 2025). https://www.demandsage.com/chatgpt-statistics/
Staiger (2024) Josh Staiger. 2024. Chatterclock — a ChatGPT message tracking extension. Chrome Extension Store (2024). https://chromewebstore.google.com/detail/chatterclock-%E2%80%94-a-chatgpt/mepflplnjbngmgakdefimlgbfpmhonoj
Stefano De Dominicis and Schultz (2019) Stefano De Dominicis, Rebecca Sokoloski, Christine M. Jaeger, and P. Wesley Schultz. 2019. Making the Smart Meter Social Promotes Long-Term Energy Conservation. In Palgrave Commun, Vol. 5. https://doi.org/10.1057/s41599-019-0254-5
Stojkovic et al. (2024) Jovan Stojkovic, Esha Choukse, Chaojie Zhang, Inigo Goiri, and Josep Torrellas. 2024. Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference. arXiv:2403.20306 [cs.AI] https://arxiv.org/abs/2403.20306
Strubell et al. (2020) Emma Strubell, Ananya Ganesh, and Andrew McCallum. 2020. Energy and Policy Considerations for Modern Deep Learning Research. Proceedings of the AAAI Conference on Artificial Intelligence 34, 09 (Apr. 2020), 13693–13696. https://doi.org/10.1609/aaai.v34i09.7123
Tousignant (2021) Brigitte Tousignant. 2021. This Climate Does Not Exist: Picturing impacts of the climate crisis with AI, one address at a time. Mila (October 2021).
Trust (2025) Carbon Trust. 2025. SME Carbon Footprint Calculator. https://www.carbontrust.com/our-work-and-impact/guides-reports-and-tools/sme-carbon-footprint-calculator Accessed: 2025-01-21.
Tulusan et al. (2011) Johannes Tulusan, Lito Soi, Johannes Paefgen, Marc Brogle, and Thorsten Staake. 2011. Eco-efficient feedback technologies: Which eco-feedback types prefer drivers most?. In 2011 IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks. 1–8. https://doi.org/10.1109/WoWMoM.2011.5986187
Visions (2025) Iconic Visions. 2025. Cybertruck Icon. https://thenounproject.com/icon/cybertruck-7345261/. Accessed: 2025-01-23.

Appendix A Tabular Data

Participant	No. of Queries Per Week Before Trial	No. of Queries Per Week During Trial
P1	42	51
P2	34	29
P3	7	7
P4	14	10
P5	1	16
P6	0	5
P7	7	6
P8	0	3
P9	8	7

Table 1. Summary of ChatGPT usage. Red cells represent more queries during the trial period than before, and green cells represent fewer queries during the trial than before.
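Because the red and green cell colors described in the caption do not survive in plain text, the direction of change for each participant can be recomputed from the counts in Table 1. The following is a minimal sketch of that arithmetic; the names QUERIES and direction are illustrative and do not come from the study materials.

```python
from collections import Counter

# Weekly query counts copied from Table 1: (before trial, during trial).
QUERIES = {
    "P1": (42, 51),
    "P2": (34, 29),
    "P3": (7, 7),
    "P4": (14, 10),
    "P5": (1, 16),
    "P6": (0, 5),
    "P7": (7, 6),
    "P8": (0, 3),
    "P9": (8, 7),
}


def direction(before: int, during: int) -> str:
    """Label a participant by how their weekly query count changed."""
    if during > before:
        return "more"   # a red cell in the original table
    if during < before:
        return "fewer"  # a green cell in the original table
    return "same"


directions = {pid: direction(b, d) for pid, (b, d) in QUERIES.items()}
direction_counts = Counter(directions.values())
total_before = sum(b for b, _ in QUERIES.values())
total_during = sum(d for _, d in QUERIES.values())

for pid, label in directions.items():
    print(pid, label)
print(dict(direction_counts), total_before, total_during)
```

On the Table 1 data, this labels four participants as "more", four as "fewer", and one as unchanged.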

Participant	Care for the Environment Pre Trial	Care for the Environment Post Trial	Care about the Environmental Impact of Queries Pre Trial	Care about the Environmental Impact of Queries Post Trial	Enjoyment	Likelihood of Future Use	Learned More
P1	4	5	3	3	4	4	3
P2	4	4	2	3	4	3	4
P3	5	4	2	4	4	3	4
P4	4	4	4	4	5	5	4
P5	4	4	3	4	4	5	4
P6	3	3	2	2	5	4	5
P7	5	5	5	5	5	3	5
P8	3	4	3	4	3	1	5
P9	4	4	3	3	4	3	3

Table 2. Responses to multiple choice survey questions on a Likert scale of 1 to 5.

Participant	Gender	Age
P1	Female	18
P2	Female	21
P3	Non Binary	20
P4	Female	19
P5	Female	21
P6	Male	24
P7	Female	21
P8	Male	20
P9	Female	21

Table 3. Demographic data for participants.

Appendix B Eco Score Algorithm

Algorithm 1 Eco Score Logic for Queries. This algorithm runs each time a new query is detected. In addition to this algorithm, Eco Score will also automatically increase by 1 point every 20 minutes.

pauseLength ← currentQueryTime − previousQueryTime
if pauseLength ≥ 60 then
    ecoScore ← ecoScore − 7
else if pauseLength ≥ 30 then
    ecoScore ← ecoScore − 8
else if pauseLength ≥ 15 then
    ecoScore ← ecoScore − 9
else if pauseLength ≥ 7 then
    ecoScore ← ecoScore − 10
else if pauseLength ≥ 3 then
    ecoScore ← ecoScore − 11
else if pauseLength ≥ 1 then
    ecoScore ← ecoScore − 12
else
    ecoScore ← ecoScore − 13
end if
if ecoScore < 0 then
    ecoScore ← 0
end if
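The logic of Algorithm 1 can be sketched in Python as follows. This is an illustrative reading, not the extension's actual code: the function and constant names are hypothetical, the pause length is assumed to be measured in minutes (the algorithm does not state a unit), and the cap of 100 on recovery is an assumption based on the highest Eco Score bracket mentioned in Appendix F.

```python
# Sketch of the Eco Score update in Algorithm 1.
# Assumptions (not stated in the paper): pause lengths are in minutes,
# and the score is capped at 100 when it recovers over time.

# (minimum pause, penalty) pairs, checked from the longest pause down.
PENALTY_TIERS = [(60, 7), (30, 8), (15, 9), (7, 10), (3, 11), (1, 12)]
DEFAULT_PENALTY = 13  # applied when the pause is shorter than 1
MAX_SCORE = 100       # assumed upper bound


def penalty_for_pause(pause_length: float) -> int:
    """Return the points deducted for a query made after the given pause."""
    for min_pause, penalty in PENALTY_TIERS:
        if pause_length >= min_pause:
            return penalty
    return DEFAULT_PENALTY


def score_after_query(eco_score: int, pause_length: float) -> int:
    """Deduct the penalty for a new query, never going below 0."""
    return max(0, eco_score - penalty_for_pause(pause_length))


def score_after_idle(eco_score: int, idle_minutes: float) -> int:
    """Add 1 point per full 20 minutes elapsed, capped at MAX_SCORE."""
    return min(MAX_SCORE, eco_score + int(idle_minutes // 20))


if __name__ == "__main__":
    score = 100
    for pause in (120, 0.5, 0.5, 10):  # pauses before four successive queries
        score = score_after_query(score, pause)
        print(pause, score)
```

Shorter pauses between queries cost more points, so rapid-fire querying lowers the score faster than occasional use, while idle time slowly restores it.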

Appendix C Codebook During Qualitative Analysis

[Image of all codes and their meanings, as used by researchers during qualitative analysis coding.]

Section 1: Positive Feedback. Int: system was interesting. VP: system was visually pleasing. RM: liked the Read More document. Aw: increased awareness. HV: helpful visuals. CU: continued use of system.

Section 2: Negative Feedback. Ann: annoyance during use. Fx: add compatibility with Firefox. AV: add visuals. NW: not working properly. IM: suggested improvements. Sz: desire size adjustment options.

Section 3: Behavioral Impact. NBC: no behavior change. BC: behavior change. OF: outside factors. R>I: benefits of query results matter more than their environmental impact. I>R: environmental impact matters more than the benefits of query results.

Section 4: Emotional Impact. S: sadness and guilt. P: appreciated seeing personalized impact. Con: felt more conscious of consequences. Ann: annoyance during use.

Section 5: Metrics. SW: surprised by water usage. SE: surprised by energy usage. NSB: in broader context, impact is not so bad. Des: desensitized. PKn: had previous knowledge. NI: not intuitive metrics. HM: helpful metrics. HTK: hard to know how impact compares to other apps and websites.

Appendix D Surveys

D.1. Pre-Survey

D.2. Post-Survey

Appendix E Query Counting Colab Notebook

As part of the post-trial survey, participants were prompted to upload their conversation data to this notebook and run it to count their queries. Participants were asked to report only the final numerical counts, so no conversation data was ever seen by anyone other than the user. The instructions participants received follow:

In this step, you will upload your personal ChatGPT history to this notebook. It will be saved locally and temporarily, and no one else will ever have access to it.

(1) Open ChatGPT.com and navigate to Settings by clicking on your profile image.
(2) Select Data Controls on the left sidebar of Settings, then click Export Data. This will send an export link to your email. Click that link, then unzip the downloaded folder.
(3) In this Google Colab Notebook, open Files on the left sidebar. Click the Upload button, and upload the file named conversations.json from the folder you just exported.
(4) Run the following code block. If there is an error, check that you successfully uploaded conversations.json. If you need any further assistance, contact one of the experimenters.

Checking Uploaded Data

import pandas as pd
import json

# check that the data was uploaded successfully
# this code block should run without errors
df = pd.read_json('conversations.json')
# convert the DataFrame to a list of dicts, one per conversation
data = df.to_dict(orient='records')

Calculate Weekly Queries

Input the date you downloaded the Chrome Extension into the following code block, then run it.

Setting the Download Date

download_date_str = '2025-01-19' # @param {type:"date"}

Analyzing Query Counts

import datetime
from datetime import timedelta

# convert date string to midnight UTC on the download date
year, month, day = (int(part) for part in download_date_str.split('-'))
download_date = datetime.datetime(year, month, day, tzinfo=datetime.timezone.utc)

# counters and the 7-day windows before and after the download date
study_query_count = 0
historical_query_count = 0
pre_trial = download_date - timedelta(days=7)
during_trial = download_date + timedelta(days=7)

# loop through every message of every conversation in the json
for conversation in data:
    for message_id, message_details in conversation.get('mapping', {}).items():
        # check if the message is authored by the user
        message = message_details.get('message')
        if message and message.get('author', {}).get('role') == 'user':
            query_time = message.get('create_time')
            # skip messages with no timestamp; fromtimestamp(None) would raise
            if query_time is None:
                continue
            # convert query_time from float to datetime
            query_time = datetime.datetime.fromtimestamp(query_time, tz=datetime.timezone.utc)
            # check which time frame query was in
            if pre_trial <= query_time < download_date:
                historical_query_count += 1
            elif download_date <= query_time <= during_trial:
                study_query_count += 1

print(f"Number of queries within study period: {study_query_count}")
print(f"Number of queries before study period: {historical_query_count}")

Appendix F Side Panel Graphics

The picture on the side panel cycles through five different images that reflect the Eco Score. For each 20-point bracket of the Eco Score (100-80, 79-60, etc.), the image changes to show a progressively more deteriorated environment.
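The bracket-to-image mapping described above could be sketched as follows. This is an illustration under stated assumptions rather than the extension's code: the function name image_index is hypothetical, the remaining brackets are assumed to continue as 59-40, 39-20, and 19-0, and images are assumed to be numbered 0 (healthiest) through 4 (most deteriorated).

```python
# Lower bounds of the four upper brackets; scores below 20 fall in the last one.
# The brackets beyond "100-80, 79-60" are an assumed continuation of the pattern.
BRACKET_FLOORS = (80, 60, 40, 20)


def image_index(eco_score: int) -> int:
    """Map an Eco Score in [0, 100] to an image index from 0 to 4."""
    if not 0 <= eco_score <= 100:
        raise ValueError("eco_score must be between 0 and 100")
    # Each bracket floor the score falls below moves one image further
    # toward the most deteriorated environment.
    return sum(eco_score < floor for floor in BRACKET_FLOORS)


if __name__ == "__main__":
    for score in (100, 80, 79, 60, 59, 20, 19, 0):
        print(score, image_index(score))
```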

Appendix G Installation Instructions

Participants were prompted to install, from a ZIP file, a custom build of the extension linked to their study user ID.

[Email instructions sent to participants] This screenshot contains the following text:

1. Fill out the following consent form.
1a. Copy this file: (link).
1b. Sign your name as “subject”.
1c. Respond to this email with a copy of that file.
2. Confirm that you are willing to be paid through Zelle. When you email back with the consent form, also send the email or phone number linked to your Zelle account, so that we can pay you upon completion of the study. Note: if this email or phone number is incorrect, and does not link to your Zelle account, you will not receive payment.
3. Complete this pre-survey (link).
4. Install your Chrome Extension.
4a. Download the file in the following link, and unzip it on your computer: (link).
4b. Open Chrome and open Extensions.
4c. Enable developer mode.
4d. Click “load unpacked” and select the folder you just unzipped.
4e. Make sure this new extension appears in your Chrome Extensions, and that it is turned on.
4f. Open ChatGPT (you may or may not see anything popup right away).
5. For the next 7 days after installing our Extension, please only use ChatGPT on Chrome, with the extension turned on.

Appendix H Read More Document

The Read More button in GPTFootprint links to the following document of additional information.

[Screenshot of Read More Document] This partial screenshot of the Read More document shows a public Google Doc of Commonly Asked Questions, including “How much does it cost to train frontier models?” and “What amount of training cost is actually energy cost?” There are links to sources, as well as brief answers with graphs.

Appendix I Server Participant Tracking

The following is a segment of the data logged by the extension during user studies. The full spreadsheet is private, and accessible only to researchers.

[Screenshot of spreadsheet of logged queries and popups] This is a sample screenshot of the log of usage available to researchers during the trial period. One column lists participant IDs (e.g., user_09, user_05). The second column lists the date and time of the logged event. The third column lists the label of the logged event (i.e., query, popup_opening, popup_closed, or readmore_clicked).