On Video Game Balancing: Joining Player- and Data-Driven Analytics

Johannes Pfau (jopfau@ucsc.edu, ORCID 0000-0002-8760-5023) and Magy Seif El-Nasr (mseifeln@ucsc.edu, ORCID 0000-0002-7808-1686), University of California, Santa Cruz, 1156 High St, Santa Cruz, California, USA 95064

Abstract.

Balancing is, especially among players, a highly debated topic of video games. Whether a game is sufficiently balanced greatly influences its reception, player satisfaction, churn rates and success. Yet, conceptions about the definition of balance diverge across industry, academia and players, and different understandings of designing balance can lead to worse player experiences than actual imbalances. This work accumulates concepts of balancing video games from industry and academia and introduces a player-driven approach to optimize player experience and satisfaction. Using survey data from 680 participants and empirically recorded data of over 4 million in-game fights of Guild Wars 2, we aggregate player opinions and requirements, contrast them to the status quo and approach a democratized quantitative technique to approximate closer configurations of balance. We contribute a strategy of refining balancing notions, a methodology of tailoring balance to the actual player base and point to an exemplary artifact that realizes this process.

Keywords: video game balancing, survey

CCS Concepts: Information systems → Massively multiplayer online games; General and reference → Surveys and overviews; Human-centered computing → User centered design

1. Introduction & Background

Video game balancing is one of the most controversial topics of modern video games, especially in the context of (online) multiplayer games that undergo regular updates, both for competitive Player-versus-Player (PvP) as well as for collaborative Player-versus-Environment (PvE) games and modes. Besides the introduction of new content and bug fixes, balancing adjustments are among the driving factors of game update patches: the often exploding complexity of choices, the diversity of player expectations and the hardly predictable impact of changes on the actual adaptation and reception of the player base render the problem of finding a well-balanced game state almost impossible. Games of all popular genres employ never-ending balance patch paradigms even years after launch. These can operate on various layers of in-game choices, such as the playable Champions (footnote 1: https://www.leagueoflegends.com/en-us/news/tags/patch-notes/) of the Multiplayer Online Battle Arena (MOBA) League of Legends (Games, 2009), equippable weapons (footnote 2: https://liquipedia.net/counterstrike/Patches) of the First-Person Shooter (FPS) Counter-Strike: Global Offensive (Valve, 2012) or active and passive changes to classes, traits and skills (footnote 3: https://en-forum.guildwars2.com/forum/6-game-update-notes/) of the Massively Multiplayer Online Role-Playing Game (MMORPG) Guild Wars 2 (ArenaNet, 2012).

In contrast to ordinary bug fixes that most of the time have a well-defined optimal solution, working on balancing issues is a steady and repetitive task suffering from the unpredictability and inertia of the player base accommodating to the new game configuration (“a lack of balance […] only becomes apparent after many months of play”) (Hullett et al., 2012), as well as from the diverging and controversial opinions and perceptions from both player base and developers (“after each patch, often the discussion begins again, factoring in new balancing or abilities for each class”) (Lewis and Wardrip-Fruin, 2010).

What makes this process even harder to solve is that there is no clear or accepted definition of what balancing means in the context of video games, or even what a well-balanced state or configuration of a game constitutes. Becker and Görlich recently stressed this incongruity, emphasized that this controversial but highly important topic is barely investigated by academia and contrasted various definitions from designers, developers and other practitioners (Becker and Görlich, 2020). They conclude that no two of these authors share identical understandings of game balancing and even if some concepts overlap, no undisputed fundamental concepts of game balancing protrude.

From an industrial perspective, a game might be balanced “if a reasonably large number of options available to the player are viable — especially, but not limited to, during high level play by expert players” according to Sirlin (Sirlin, 2001). This however necessitates a finer distinction on when reasonably large begins and how sharp viable can be defined. While the former is arguably highly game- and context-related, he refers to viable options as offering “meaningful decisions between promising alternatives”. A similar, yet slightly toned-down definition comes from Burgun, who asserts that game balancing is the craft of “keeping game elements relevant” - indicating that dominating strategies or choices (i.e., decisions that are by all means better than their alternatives) are the most harmful factors for the state of balance in a game (Burgun, 2011). Felder approves this take by claiming that these dominating choices express “broken gameplay, which consists of strategies or even singular actions rendering a lot of decisions meaningless”, which becomes detrimental for the overall player experience (Felder, 2015).

An intuitive implementation of viable options across the board would thus be symmetry, i.e. the identical power or expected performance of choices. As such, players are exposed to equal starting conditions, a game becomes fair by definition and only (or predominantly) individual skill matters. While this might be desired in some contexts (and design aspects such as flair or variation can still enhance symmetrical game setups), a perfectly symmetrical game bears the risk that there are no interesting choices beyond proven strategies anymore and that the player’s decision actually does not matter (Brown, 2019b; Felder, 2015). This lack of impact on one’s own performance could again be harmful for a player’s intrinsic motivation. In that sense, even if a certain deviation from perfect symmetry creates a dominating strategy, this could be countered by implementing intransitivity between options in competitive situations, so that single choices display strengths and weaknesses when opposed to different other choices (“while a balanced game can easily lead to a stagnation of strategy discovery, […] slight differences in power between game elements encourage players to constantly search for new solutions against currently popular strategies”) (Becker and Görlich, 2020; Portnow, 2012). Provided that these can be approached with counter-strategies, Felder regards smaller power gaps as a “critical part of game balancing” (Felder, 2015).

From another viewpoint, the balance of options could be assessed by their probability to lead to victory in competitive play (Sylvester, 2013). Especially in these PvP scenarios, objective as well as subjective fairness can be a strong indicator for the perceived balance of a game (Adams, 2014). Industrial approaches of these do not always draw on parameter balancing for the underlying options, but especially in competitive games, matchmaking between players of similar skill is a promising strategy to lead to 50/50 win rates (DeCoster and Rubin, 2019). Yet, Claypool et al. discovered that even if winning chances can be closely approximated to 50%, such as in the highly populated matchmaking of League of Legends, players still subjectively perceive these as unbalanced (Claypool et al., 2015).

In contrast, ample scientific work highlights that difficulty can be the driving factor for balancing PvE scenarios and/or single-player games (Adams, 2014). The field of dynamic difficulty adjustment (DDA) – sometimes also referred to as dynamic difficulty balancing – mainly pushes the understanding of balancing as the adequate (automatic) regulation of difficulty (parameters) in order to keep players within the desirable flow state between mental under- and overload (Csikszentmihalyi, 1990; Hunicke, 2005), or ideally between “too hard” and “too boring” (Lomas et al., 2017). In this respect, if perfect matches are not attainable, mental overload is still seen as producing higher enjoyment than boredom (Klarkowski et al., 2016). Andrade et al. claim that “game balancing aims at providing a good level of challenge for the user” (Andrade et al., 2006), while Volz et al. keep it more general by defining the goal of the adjustment to make sure that “the resulting gameplay is as entertaining as possible” (Volz et al., 2016). While this “modification of parameters of the constitutive and operational rules of a game” (Schreiber, 2010) follows technical procedures similar to the adjustment of parameters for adjusting the viability, symmetry or fairness of in-game choices or strategies, the underlying agenda and purpose considerably differs.

Apart from these dimensions, other perspectives look into the balance between “skill and luck” (Schell, 2008), where skill is the main desired factor for outcome, but chance still makes up for an important mechanic in many games (Adams, 2014).

Conclusively, different fields, communities and purposes announce and use different definitions of balance, the act of balancing and what it means for a game to be (well-) balanced. With this amount of discordance, open issues and even conflicting definitions and understandings, polishing video games towards maximized player satisfaction becomes apparently difficult. Even more, Brown stresses that balance has to take part on every conceivable level of a game, for “all options a game offers, including singular actions and strategies” and maybe even more importantly, “the players’ perception of balance is just as important as the actual balance” (Brown, 2019a). This importance is elevated by Schell who claims that “to identify the right middle ground [of balancing], one has to take the audience into account” (Schell, 2008), Schreiber stating that “players have differences regarding their […] expectations” (Schreiber, 2010) and DeCoster and Rubin proclaiming that developers should be “open about balancing discussions to their players so they can expect solutions to existing problems” (Becker and Görlich, 2020; DeCoster and Rubin, 2019).

The divisiveness of defining balancing notions, paired with the reliance on the satisfaction of the player base for the success of the title and the convinced mindset of experts regarding player requirements, strongly calls for an integrated player-driven approach of elevating game balancing, which is largely under-investigated in academia. Even though many developers listen to professional players or communities and/or make use of recorded play data in some sense, player expectations are often not matched to balancing decisions, and badly implemented update patches can lead to involuntary shifts of player behavior and/or eventual churn (Wang et al., 2020; Tyack et al., 2016; Hyeong et al., 2020). Prominent industrial examples of these can be found in the 1.05 patch of Uncharted 2: Among Thieves (Naughty Dog, 2009) that spoiled multiplayer balance for a major part of the player base as most weapons’ stats became too homogeneous (mismatching symmetry); Tribes: Ascend (Hi-Rez Studios, 2012) (patch 1.0.1103.1) that slowed down (and therefore eased) gameplay while the community was explicitly craving the high speed factor of the game (thus mismatching difficulty); or Street Fighter V (Capcom, 2016) significantly losing players after a balance update (3.5) that buffed already well-performing characters and even nerfed niche fighters (diverging from viability). This is only emphasized by the highly negative perceptions ($83\%$) of the Guild Wars 2 balance update in this work that raised global discontent among the community and led to a decrease of average daily playing time by $13.4\%$ when compared to the previous patch era.

In sum, balancing (online) video games aims at improving the viability of different playable elements (among other factors). This often backfires when balance perceptions mismatch between developers and players, which can happen due to conflicting requirements, unawareness about how players experience game dynamics or because of the many substantially different definitions of balancing. These disagreements can lead to dissatisfaction, impaired player experience and churn, rendering the proper handling and optimization of balance a problem for the industry, games user research and human-computer interaction in general. To counteract this and fortify the understanding between developers (or researchers) and players, we propose to push player-driven game balancing to tightly couple requirements and opinions of existing player communities into the balancing process of a game. While this player-driven approach aggregates the subjective view of a player base (with all of its advantages and drawbacks), we argue that joining it with the similarly under-investigated objective assessment of balance states in video games through data-driven techniques can overcome its limitations by grounding opinions in empirical foundations. This leads to the examination of the following research questions:

• (RQ1): With so many conflicting theoretical definitions of balancing, how can a game understand and cater to the requirements of its players?

• (RQ2): How can data-driven analytics help ground the objectivity of this pool of opinions?

To answer these questions and assess the feasibility of this procedure, we conducted a case study with ($n=680$) novice up to professional players of the MMORPG Guild Wars 2, captured their mindset towards Becker and Görlich’s balance criteria of viability, symmetry, fairness and difficulty, and contrasted this to the direction of actual balancing implementations. To validate the appropriateness of these opinions, we draw on data-driven analytics encompassing ($n_{a}=154,145$) unique player accounts and ($n_{l}=4,318,009$) atomic in-game combat logs of the game.

Eventually, by delivering a grounded, systematic approach of determining detailed balance requirements of players - exemplified through an ambitious community case study that is likely to generalize to other games within and without the genre - we contribute to the fields of player-centric design and development, games user research, game evaluation and strive to closer connect academia, industry and the very players.

2. Related Work

For other application cases, player-driven approaches have already delivered promising solutions that exceeded the capabilities of complete in-house implementations. One example would be Da Silva et al.’s utilization of the collective power of a community to retrieve a widely nested narrative and story background merely from player input (Da Silva and Tomimatsu, 2013). Concerning procedural content generation, Shaker et al. highlighted the capabilities of player-centric approaches, including personalization of in-game maps or experiences (Shaker et al., 2012). Partlan et al. utilized participatory design in order to assess requirements and develop design-driven features for co-creative game AI design tools (Partlan et al., 2021). Even completely player-driven game development cycles have been shown to result in novel experiences, dynamic design procedures and central game features that are inherently tailored to the actual target audience (Lessel et al., 2019). Canossa and Drachen argue that increasing the players’ agency and influence on the development process can lead to enhanced experiences and immersion when introducing play-personas for customized gameplay (Canossa and Drachen, 2009) – which arguably extends similarly to continual balancing updates. Eventually, player-driven paradigms might even scale to the large magnitudes of communities that popular modern games accumulate, as Ma et al. indicate in their work on user innovation evaluation strategies, incorporating over 21,000 players that produced novel and sufficiently complex ideas and suggestions (Ma et al., 2019).

However, purely player-driven balancing decisions might still be warped by diverging opinions, missing empirical knowledge about the actual state of the game and the gap of experience (and requirements) between novice and expert players. The arguably most objective measures to counter lacks of knowledge and to condense how games actually play out are empirical data-driven methods (El-Nasr et al., 2013; Wallner, 2019). The majority of these data-driven approaches are stemming from and/or focusing on delivering insights for the game industry (e.g. data mining, classification or prediction) (Drachen and Canossa, 2009), targeting measures against churn (Hadiji et al., 2014), facilitating content generation (Risi and Togelius, 2019), or easing the burden of testing (Albaghajati and Ahmed, 2020) (among other areas). Most of the remaining approaches follow academic interests concerning similar topics or fundamental (psychological or technical) regularities, structures and concepts (Drachen et al., 2018; Yee, 2006), often employing visualizations to gather insights for researchers or analysts (Bowman et al., 2012). Prominent implementations target spatio-temporal movement (Moura et al., 2011; Wallner et al., 2019; Wallner and Kriglstein, 2012; Ahmad et al., 2019), decision making of individual or aggregate players (Loh et al., 2016; Nguyen et al., 2015) or higher-level metrics and statistics (Drachen et al., 2012). Effectively, certain academic approaches already addressed the balancing of viable game options, such as automatic symmetric and intransitive player modeling approaches from Pfau et al. (Pfau et al., 2020), asymmetric Monte-Carlo balancing from Beau and Bakkes (Beau and Bakkes, 2016), Jaffe et al.’s maximization of fair and useful card game cards (Jaffe et al., 2012) or Leigh et al.’s reduction of dominant strategies through coevolution (Leigh et al., 2008). 
Even if most of these draw on simulation or calculation towards well-balanced game states and (to the best of our knowledge) no academic work included the players’ perspective so far, it is reasonable to hypothesize that similar balancing solutions can follow or implement opinionative inputs.

Ultimately, we want to empower and harness player-driven balancing conceptions informed by data-driven methods. Making video game data transparent, explainable and applicable to its players is already one of the driving topics within the areas of game-related explainable AI and player modeling (Zhu and El-Nasr, 2021; Lucero et al., 2020; Wells and Bednarz, 2021), and comparably holds in the context of balancing. For this reason, we developed, published and populated the player- and data-driven Guild Wars 2 analytics platform Guild Wars 2: Wingman (footnote 4: https://gw2wingman.nevermindcreations.de/), constructed and refined over an 18-month participatory development cycle (Pfau and Seif El-Nasr, 2023b).

In the following sections, we briefly introduce the environment of Guild Wars 2, report on the acquisition and interpretation of the opinions and requirements of its community, ground these conceptions with empirical data, and extract quantifiable factors that led to the development of a democratic player-driven balancing instrument.

3. Game Environment

The game used for the subsequent case study (Guild Wars 2) is a prototypical MMORPG featuring single- and multiplayer content in storylines, open world events, various PvP modes and endgame encounters such as raid bosses, fractals or strike missions. The latter make for a large share of players’ time spent in game and as they are set in fixed scenarios with only few probabilistic factors and established group compositions, strategies and roles, they enable very comparable benchmarks. Based on these, large communities formed around discussions and optimizations to overcome these challenges from which balancing discrepancies can become apparent quickly.

As Guild Wars 2 is specifically designed to not feature power creep mechanics such as increasing level caps or item qualities over time, the vast majority of players of this endgame content participates on an identical or very similar character and equipment attribute level, and performance is mainly influenced by in-game proficiency, mastery of the classes, encounter knowledge, strategy and group composition, which further adds to the comparability of the data with regards to balancing. Still, when accepting certain degrees of noise or acknowledging these factors of variance by clustering players into equipment tiers or approximately subtracting out these confounding variables, the balancing assessment as presented here is likely to produce similarly powerful insights for instanced (group) PvE content in general, such as raids, dungeons, trials or ultimate encounters in World of Warcraft (Blizzard Entertainment, 2004), The Elder Scrolls Online (ZeniMax Online Studios, 2014) or Final Fantasy XIV (Square Enix, 2013).

Guild Wars 2 features a variety of playable class options (from now on referred to as professions), where each of the nine core professions can be expanded by particular specializations that can add further capabilities or open up new roles for this profession (e.g. the Ranger profession can be augmented with the Druid specialization to add a support dimension to the character or with the Soulbeast specialization to increase the damage potential). In theory, the professions of Guild Wars 2 allow abundant combinations of character constellations, such as individually distributed equipment attributes, chosen passive character traits and active weapon type and skill choices. In that way, players can represent and play out different roles within their party, such as dealing direct damage, damage over time, offering support or different degrees of mixtures of these factors. However, for the sake of optimization and role compression within the group, most of the time, these builds are min-maxed towards the roles of maximal damage per second (dps), full support (heal and buff application) or offensive support (dps and buff application). The majority of players use builds and equipment that maximize their functionality in one of these roles with only situational variation, following community guides and recommendations. Thus, the data-driven procedure in this paper automatically classifies recorded players into buckets of full support, offensive support, direct damage and damage over time. All professions are (in theory) capable of fulfilling all of these roles, yet their viability and efficiency greatly diverge (and differ with respect to the combated encounters), which constantly raises balancing gaps between builds.
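The text does not describe how recorded players are assigned to the four role buckets. As a purely illustrative sketch of what such a rule-based assignment could look like, the following assumes hypothetical per-fight summary fields (`power_dps`, `condition_dps`, `healing_per_s`, `buff_uptime`) and made-up thresholds; none of these names or values are taken from the actual platform.

```python
from dataclasses import dataclass


@dataclass
class CombatLog:
    # All fields are hypothetical per-player summaries of a single fight.
    power_dps: float      # direct (strike) damage per second
    condition_dps: float  # damage-over-time per second
    healing_per_s: float  # outgoing healing per second
    buff_uptime: float    # share of the fight (0..1) with group buffs applied


# Illustrative thresholds only; real cut-offs would have to be tuned on data.
BUFF_THRESHOLD = 0.5
HEAL_THRESHOLD = 1000.0


def classify_role(log: CombatLog) -> str:
    """Assign one of the four role buckets named in the text."""
    provides_buffs = log.buff_uptime >= BUFF_THRESHOLD
    if provides_buffs and log.healing_per_s >= HEAL_THRESHOLD:
        return "full support"        # heal and buff application
    if provides_buffs:
        return "offensive support"   # dps and buff application
    if log.condition_dps > log.power_dps:
        return "damage over time"
    return "direct damage"
```

The ordering of the checks mirrors the role descriptions above: support roles are identified first by their buff contribution, and the remaining damage dealers are split by whichever damage type dominates.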

For further terminology, the appendix lists explanations for all game-specific terms used across this work.

4. Survey

To sufficiently understand and represent the mindset of a player community regarding balancing, we distributed a mixed-methods survey across all major communication channels of the affiliated game. These included the official Guild Wars 2 forums (footnote 5: https://en-forum.guildwars2.com/), the /r/GuildWars2 subreddit (footnote 6: https://www.reddit.com/r/Guildwars2/) and several Discords and community portals of beginner/training as well as speedrun groups, such as Snow Crows (footnote 7: https://snowcrows.com/), Lucky Noobs (footnote 8: https://lucky-noobs.com/), Discretize (footnote 9: https://discretize.eu/), Hardstuck (footnote 10: https://hardstuck.gg/), The Crossroads Inn and other interested communities. In order to capture the essence of players’ stances on balance and balancing discussions, the survey followed a major game update patch (June 28, 2022; footnote 11: https://wiki.guildwars2.com/wiki/Game_updates/2022-06-28) that impacted the viability and performance of almost all available classes and builds. We specifically focused on the requirements to balance professions for the endgame PvE modes of the game (raids, fractals and strike missions), as these are highly comparable between players and groups, and we do not want to conflate potentially differing notions of balancing between PvP and PvE in this article. These instanced endgame modes are executed on the maximal level of the game, require comparable equipment and are designed on an either 5- or 10-player basis. To keep the questionnaire as understandable for the player base as possible, it frequently uses in-game terminology and game-specific notions that are explained in the appendix (A) of this work for comprehension.

4.1. Construction

To transfer Becker and Görlich’s concepts of balancing (Becker and Görlich, 2020) to actual in-game metrics and mechanisms of Guild Wars 2, we composed a questionnaire of items that unravel these concepts into particular statements. Combining the authors’ expertise with the help of the community around the player-driven analytics tool presented in previous work (Pfau and Seif El-Nasr, 2023b), these statements could be refined and successfully harmonized with in-game observations and metrics, as listed in the following section. Summarizing the findings of this upfront discussion, a viable option in the challenging endgame content of Guild Wars 2 is a choice (of a profession or build) that increases the chances of a group’s success and/or increases a group’s efficiency (items V1-3, cf. Table 1). Increasing success is also referred to as providing utility (e.g. heals, buffs, debuffs) for the group, while the efficiency of a profession (or player) can be most closely measured by their contribution of damage per second (dps) against the boss or enemies. Thus, items of the questionnaire frequently refer to the theoretical or realistic contribution of dps or utility of a player choosing a specific profession (V4-7). Symmetry would be given if all professions dealt equal amounts of dps and/or utility (S1). Opposed to that, unique buffs, i.e. utility skills or traits that are exclusively provided by specific professions, diverge from symmetry, which carried a special meaning to players in this case, as they were removed with said update patch (S2,3). Fairness means that no matter which option a player chooses, they have a fair chance of achieving success. This can imply that dps values of that profession are competitive with the performances of other players, or that the chance of getting accepted in a group (and not getting kicked) might rather depend on a player’s personal skill than on the profession they choose to play (F1,2). 
As common in online games, more advanced players and groups publish the most efficient strategies, builds and rotations to play professions and/or bosses - which might impact how players look upon these optimized builds and alternatives to these (F3,4). Eventually, the difficulty of a playable profession or build depends on a variety of design decisions, game mechanics and dynamics bound to that class, which can unfold in the survivability of a class, the speed and complexity of the ideal rotation (i.e. the most efficient skill sequence regarding individual dps; D1-3) or the difficulty of maintaining constant damage uptime throughout the various strategies against the bosses (e.g. the ability of attacking from range versus melee combat, the freedom of being able to move while executing skills versus being animation-locked, or being dependent on movement or action patterns of the bosses; D4). Besides other questions of interest, Section 4.2 outlines the specific items that were used in the actual following survey.

4.2. Measures

We recorded prior experience in the game as well as its main endgame content modes in years, and their satisfaction with the recent update and their agreement with the detailed balancing statements on 7-point Likert scales. To support the validity of our measures, we calculated the discriminant validity between those scales, which resulted in very distinct scales (cf. Table 1; overall $r_{d\_overall}=0.06$ ). We omitted convergent validity in this case, as we did not add similar assessment scales for the sake of survey brevity. Survey responses that consisted of only repeating (or empty) quantitative answers or showed obviously non-serious qualitative replies were excluded from the analysis.
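The exact computation behind the reported $r_{d}$ values is not spelled out here. One common way to probe discriminant validity, shown below as a sketch and not necessarily the authors' procedure, is to correlate each item with the mean of all items belonging to the other scales; small absolute correlations then suggest that the scales measure distinct concepts. The helper names and the data layout (one dictionary of item scores per respondent) are assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def discriminant_r(responses, scales, item):
    """|r| between `item` and the mean of all items from the *other* scales.

    responses: list of dicts mapping item id -> Likert score (hypothetical layout)
    scales:    dict mapping scale name -> list of item ids
    """
    own_scale = next(name for name, items in scales.items() if item in items)
    other_items = [i for name, items in scales.items()
                   if name != own_scale for i in items]
    other_means = [sum(row[i] for i in other_items) / len(other_items)
                   for row in responses]
    item_scores = [row[item] for row in responses]
    return abs(pearson(item_scores, other_means))
```

Under this reading, values close to zero (such as the overall $0.06$ reported above) would indicate that an item shares little variance with the remaining scales.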

In accordance with the game’s design, the items explicitly concerned builds fully focusing on dps, as support builds are principally dps builds that only sacrifice some damage potential for utility. Above that, participants could indicate which professions needed a buff (viability increase) or nerf (viability decrease) in their opinion and were able to comment on their requirements and perceptions qualitatively. Table 1 outlines all items regarding balancing opinions, attributed to the formerly introduced concepts, whereas Table 2 lists the set of qualitative questions.

Concept	#	Item	$|r_{d}|$
Viability	(V1)	Every profession should be viable in endgame PvE (with at least one specialization).	$0.02$
	(V2)	Every specialization should be viable in endgame PvE.	$0.14$
	(V3)	Every profession should have a viable option for power, condition and support.	$0.08$
	(V4)	Power and condition builds should be equally important for endgame PvE.	$0.07$
	(V5)	Selfish dps builds (that do not support the squad otherwise) should exist in the game.	$0.12$
	(V6)	If selfish dps builds exist, they should reach higher dps than builds that contribute elsewise to the squad.	$0.07$
	(V7)	Low-opportunity-cost utility and support skills should lower the overall dps output of a build.	$0.09$
Symmetry	(S1)	Every profession build should do equally high dps on average.	$0.07$
	(S2)	Unique buffs construct class identity and lead to more diversity in squad building.	$0.01$
	(S3)	Unique buffs limit the freedom of squad formation and thus are detrimental for class diversity.	$0.01$
Fairness	(F1)	The raid community rejects players playing classes with lower dps benchmarks.	$0.06$
	(F2)	The fractal community rejects players playing classes with lower dps benchmarks.	$0.08$
	(F3)	Speedrun guilds unnecessarily raise the expectations of the endgame PvE community with optimized rotations and guides.	$0.12$
	(F4)	Speedrun guilds help defining standards that makes raiding easier for new players in the first place.	$0.1$
Difficulty	(D1)	Less complex builds and rotations should be able to perform decently (to enable beginner-friendly entry to endgame PvE).	$0.02$
	(D2)	Ideal execution of more complex rotations/builds should be rewarded with higher outgoing dps (than easy rotations/builds).	$0.00$
	(D3)	Poor execution of more complex rotations/builds should be punished with less outgoing dps (than easy rotations/builds).	$0.03$
	(D4)	Easier damage uptime (e.g. by attacking from range) should be less dps-rewarding than harder-to-optimize damage uptime (e.g. by relying on melee attacks, ground target skills or bigger hitboxes).	$0.04$

Table 1. Questionnaire items of the players’ understanding of balance, tailored to the use case of Guild Wars 2 PvE. All statements were answered on 7-point Likert scales from ”Strongly Disagree” to ”Strongly Agree” and show small discriminant validity ($|r_{d}|$).

(Q1)	What do you think are the most important factors to consider for balancing Guild Wars 2’s endgame PvE?
(Q2)	How would your ideal configuration of balance look like?
(Q3)	Can you think of balancing decisions that were well-intentioned but unwelcome in the player base?
(Q4)	What do you like about the Summer Balance Patch?
(Q5)	What do you dislike about the Summer Balance Patch?
(Q6)	Do you have any additional remarks or opinions?

Table 2. Qualitative open-ended questions asked subsequently to the questionnaire.

4.3. Procedure

The survey was published on the same day of the formerly mentioned balance update patch and kept open for four weeks to allow a broad range of recruitment and players to get used to the shift in in-game balance, in case they would develop (positive or negative) opinions on implemented changes that correspond to the statements of the questionnaire. It was released over the most popular communication channels and platforms as mentioned in Section 4 and participants were asked to share it with peers and communities in order to reach as much of the audience as possible, while covering the largest parts of the player expertise spectrum. After completing the quantitative parts about their requirements for and understanding of balance (cf. Table 1), they had the opportunity to mention particular builds and professions that needed balancing in either direction, and could comment on their decisions, opinions and mindset through open-ended questions (cf. Table 2). After the collection of survey responses and the compilation of data-driven evidence through in-game combat logs, the outcomes were communicated back to the community in a conservative and as objective as possible manner, which again encouraged discussions and reflections on the notions of balancing.

4.4. Participants

Guild Wars 2’s player base is estimated to exceed 18 million users, of which approximately 350,000 players log in and play on a daily basis (footnote 12: https://mmo-population.com/r/guildwars2). Even if not all of them participate in the modes of challenging endgame PvE, calculating the minimum sample size for this population yields at least 385 responses (assuming a confidence level of 95% and a margin of error of 5%) (Cochran, 1977). After the recruitment period of four weeks, we had collected responses from ( $n=680$ ) players in total. As initially anticipated, these stem from the complete spectrum of experience from 0.5 to 10 years in-game ( $M{=}6.6,SD{=}3$ ), 0 to 10 years in 5-player endgame PvE content ( $M{=}3.9,SD{=}2.7$ ) and 0 to 7 years in 10-player raids ( $M{=}2.9,SD{=}2$ ), where the latter had only existed for 7 years at that point in time.
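The minimum sample size quoted above is consistent with Cochran's formula for estimating a proportion, $n_0 = z^2\,p(1-p)/e^2$. The following is a minimal sketch, assuming the conventional worst-case proportion $p=0.5$ (not stated in the text) and no finite-population correction; the function name is illustrative.

```python
import math
from statistics import NormalDist


def cochran_sample_size(confidence: float = 0.95,
                        margin_of_error: float = 0.05,
                        proportion: float = 0.5) -> int:
    """Minimum sample size for estimating a proportion in a large population.

    proportion=0.5 is an assumption: it maximizes p * (1 - p) and therefore
    yields the most conservative (largest) sample size.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    return math.ceil(n0)


print(cochran_sample_size())  # 385
```

With 680 collected responses, the sample exceeds this threshold by a comfortable margin.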

5. Survey Results

The following section reports quantitative outcomes of the balance survey and qualitative comments on the participants’ decisions and requirements. Subsequently, specific professions and builds that the community deemed imbalanced are highlighted, before data-driven imbalance metrics are adduced for empirical comparison in the next section.

Figure 1. Distribution of survey responses regarding the participants’ stances on the balancing concepts viability, symmetry, fairness and difficulty. Answers to the questions of Table 1 ranged from 1 (“Strongly Disagree”) to 7 (“Strongly Agree”); boxplots indicate means (–), standard deviations (boxes), range (whiskers) and frequencies of an answer (dots).

5.1. Balance Survey Items

Figure 1 visualizes the outcome of the community’s opinion across the game-specific factors of the different balance concepts (cf. Table 1). Regarding viability, players strongly agree that each of the nine professions should be viable in endgame PvE ( $V1:M{=}6.7,SD{=}0.8$ ), but only slightly when it comes to the 27 specializations ( $V2:M{=}4.9,SD{=}1.8$ ). It would be desirable if every profession had a viable build for the main roles of the game, i.e. power damage, condition damage and support ( $V3:M{=}5.1,SD{=}1.9$ ), but even more importantly, the roles of power and condition builds should be balanced with respect to their viability across the different encounters ( $V4:M{=}6.2,SD{=}1.2$ ). Extreme builds without any convenient utility should be part of the game ( $V5:M{=}5.8,SD{=}1.4$ ) and accepting these drawbacks should definitely be rewarded in the dps outcome ( $V6:M{=}6.1,SD{=}1.3$ ), while bringing utility to the group should considerably lower the dps potential of a build ( $V7:M{=}5.4,SD{=}1.4$ ).

On the other hand, players rather disagree on desiring a perfect symmetry among the playable options when it comes to dps ( $S1:M{=}3.4,SD{=}1.8$ ). This similarly holds for support builds, as having unique professions to provide certain boons is rather desired for class diversity ( $S2:M{=}5.1,SD{=}1.9$ ), which inherently means the opposite of symmetry. Yet, there are mixed opinions on whether this uniqueness actually makes the formation of groups more limited, which would be unfavorable for balancing ( $S3:M{=}3.8,SD{=}1.9$ ).

When asked about the fairness of choice in playing builds and professions at one’s pleasure, players might potentially be concerned about whether other players would allow their character in a group, even if it provides less dps or utility than its alternatives. Respondents slightly disagree with this for both 10-player content ( $F1:M{=}2.9,SD{=}1.8$ ) and 5-player content ( $F2:M{=}3.4,SD{=}1.8$ ), the latter being a bit more restrictive on the role coverage. Most build options (the choice of equipment, active skills and passive traits) are heavily influenced by optimized guides written mostly by the experienced speedrun community. This might impact how fair players perceive their own decision making and preferences to be, yet the community slightly disagrees that this unnecessarily influences fairness of choice ( $F3:M{=}3.3,SD{=}2.2$ ), and rather judges that these guides help new players in the first place ( $F4:M{=}5.3,SD{=}1.8$ ).

Finally, the community has a strong opinion towards difficulty and its role in the balance between professions and builds. They welcome that easier options should at least be viable in endgame PvE ( $D1:M{=}5.4,SD{=}1.5$ ), but that higher difficulty should absolutely come along with higher performance ( $D2:M{=}6.5,SD{=}1.1$ ), while failing to execute a more difficult build should equally be punished harder in dps outcome than for easier alternatives ( $D3:M{=}5.3,SD{=}1.6$ ). The same holds for rotations where the damage uptime is harder to realize ( $D4:M{=}5.4,SD{=}1.6$ ).

With respect to the associated game update patch, the community reported being dissatisfied with the balancing changes ( $M{=}2.2,SD{=}1.3$ ). There was no significant correlation between overall in-game experience (or experience in any game mode) and this satisfaction ( $p>0.05$ ), nor did the previously surveyed balancing opinion scales differ between players who focused on raids, fractals or strike missions.

5.2. Qualitative Analysis

Following structured content analysis (Mayring et al., 2004), we categorized qualitative responses deductively (with labels mainly derived from the previously introduced balance concepts viability, symmetry, fairness and difficulty) (Becker and Görlich, 2020) and reflect on the most prevalent opinions across the open-ended questions (cf. Table 2). Two members of the research team carried out this labeling independently, before comparing and discussing their outcomes. In cases of conflict, the underlying literature was consulted, which in most cases eventually produced consensus between the annotators (reflected in the high inter-rater agreement of Fleiss’ $\kappa=0.92$ (Fleiss, 1971)).
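For readers unfamiliar with the agreement statistic, the sketch below computes Fleiss' $\kappa$ from per-item category labels. It is an illustration only: the labels are hypothetical toy data, not the study's annotations, and the function name is an assumption.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for items that were each labeled by the same number of raters.

    ratings[i] holds the category labels assigned to item i, one per rater.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    per_item_agreement = []
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of agreeing rater pairs for this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        per_item_agreement.append(agreeing / (n_raters * (n_raters - 1)))

    observed = sum(per_item_agreement) / n_items
    # Agreement expected by chance from the overall category proportions.
    expected = sum((total / (n_items * n_raters)) ** 2
                   for total in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators for five comments.
toy = [
    ["difficulty", "difficulty"],
    ["viability", "viability"],
    ["symmetry", "symmetry"],
    ["fairness", "viability"],
    ["difficulty", "difficulty"],
]
print(round(fleiss_kappa(toy), 3))
```

Values close to 1 indicate agreement well beyond what the overall label frequencies would produce by chance.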

Regarding viability, players highlight that they want all classes to be viable (36 mentions), “not every profession has to perform equally but every profession should be useful” (P461) – as long as this is not “homogenizing all professions” (P57), i.e. choices do not become symmetrical. This could be realized by e.g. “aiming for an average bench of 37k [dps] for most classes […] with a 2-3k margin in either direction depending on difficulty of class or on the amount of utility they provide […] This would leave almost everything viable for the majority of groups” (P66). Apart from this, they state that replacing viable options with stronger new options (“power creep”) is a dangerous step as “we want horizontal not vertical movement with elite specs. This is exceedingly important to keep content relevant” (P440).

Perfect symmetry was strongly opposed (45 mentions) and reportedly “destroys the reason why you play a certain spec or even profession. I do not think that a homogeneous cluster of 9 different classes is what GW2 should look forward to” (P57). The surveyed community rather opposes “creating a uniformity” (P75) which would “lead to a very stale raiding environment” (P163) as “making everyone do the same thing, […] removing the flavour from the game is not going to be the solution for making the end game more accessible, it’s only going to remove the uniqueness and the feeling of every class” (P300). This especially holds (but is not constrained to) profession-specific special buffs that “provided diversity and that will be sorely missed” (P582), “[were] more senseful […] to make professions and specs more unique” (P57) and removing this factor “goes against the core of a roleplaying game” (P486). It has to be remarked though that players do not want these factors of a class seen as mandatory, which would restrict the choice of options again, but “unique buffs are good as long as they are situational and not required” (P122).

Participants did not comment extensively on criteria of fairness (17 mentions), but occasionally felt treated unfairly when balance decisions went against their own perceptions, e.g. “I was shocked to see mechanist receiving buffs after it was already one of the best specs in endgame PvE. I would love to see more variety in the game” (P142). On the contrary, certain voices express that ideal and optimal viabilities are not the primary factor of playing, as “most of the community prefer playing the class they love and getting the reward, rather than doing a very fast kill with a class they don’t enjoy” (P300) – which is yet challenged by the emergence of a multitude of players who do not feel rewarded and state that “there would be no reason whatsoever” (P65) to play a build whose performance is overshadowed by most alternatives.

From the extent of univocal opinions and convictions, difficulty turned out to be the most important factor in estimating the quality of balance in a build, profession or the entire game (83 mentions). Participants agree that a “low barrier to entry is good [and] after entering, skill expression should be rewarded well” (P483), “decently difficult rotations [should lead to] increased DPS, [provided] some low effort average dps classes for newer players [exist]” (P276) and “easier builds/rotations for beginners [are] viable, but high end builds should have more complicated rotations that are rewarded with higher damage or utility” (P297). Participants made sure that this should not primarily preserve a gap between low- and high-skilled players, but the incentive to self-improve and develop should be significant, as games should “teach players […] how to use the combat at higher skill levels” (P184) and “bring back the fun in having something to break your mind into a better player optimization world” (P191). “It should actually be worth it to learn more complicated builds/rotations and performing them well should have a noticable impact” (P297) and “high end difficulty should be kept high to encourage players to learn and reward those who put in the time and effort to understand the game design” (P321). This can also be expressed as bringing ”the skill floor down and not the skill ceiling” (P132), where the skill floor would denote entry levels of performance (“easy-to-learn”) and the skill ceiling stands for ideally “hard-to-master” gameplay that produces formidable and rewarding outcomes. 
Balancing that fails to acknowledge this requirement “gives people no reason to learn difficult classes but instead to faceroll with easier classes” (P184), as “simplifying all the specs just makes this game boring and unenjoyable” (P457) and even more critically, player experience and perceived fairness would again be impaired (“players not wishing to understand the game or work at improving should not be catered to at the expense of players who want to engage with game mechanics”) (P321). Eventually, players summarize their requirements on the role of difficulty within balancing (and specifically regarding performance) as “classes that can only deal dps [should] deal max possible dps, any boon/heal/support/range/cleave/mobility/etc it has reduces this max dps” (P31) – or, to quantify this statement – “the DPS output of a class should be a function of rotational complexity, boons it can give, self-sustain/squishiness, range/safety, CC contribution, and so on” (P540).

Apart from the already discussed balancing concepts, players certainly acknowledge that “only changing values won’t cut it” (P232), so that flair, in-game dynamics and play styles have an equally important impact on players’ choices of professions; they desire “design notes that clearly state reasoning behind changes” (P393) to follow the intentions of developers and are convinced that player-driven input can positively impact balancing decisions: “Please […] consider the opinions of the community before making balancing decisions” (P383), and “bring the community back into game” (P674).

5.3. Particular Perceptions of Imbalance

To go deeper into the community’s perception of which professions or builds needed a nerf or a buff, or were already quite balanced (before the patch), we additionally asked for their opinions so that these could later be compared to the actually implemented changes. Figure 2 displays the most desired changes of specific builds indicated by the survey participants. Most notably, the general direction suggested that condition classes be nerfed and power classes be buffed, even before the update patch hit. The only exceptions were the Power Mechanist, which already outshone supporters of a similar role, and the Power Catalyst, which, although rarely played, had a drastically high performance potential on the upper end of the skill ceiling. To give some context, this Catalyst and the Condition Mirage build are the only exceptions among the mentions in Figure 2(a) that demand highly difficult and complex gameplay in order to produce decent performances, while all remaining professions in that column had high dps potentials while only requiring minimal effort and player skill. On the other hand, players desired buffs for professions that are rarely played and/or heavily underperform in comparison to their complexity (cf. Figure 2(b)). For the sake of brevity, we exclude players’ perceptions of well-balanced classes, as only mechanisms to detect imbalances are discussed in the later stages of the paper.

6. Data-Driven Foundation

The preceding quantitative and qualitative analyses of the opinions of the player community revealed a series of insights that mostly pointed in a similar direction. While these could already be utilized to inform balance decisions, opinion-based findings could be warped or biased by the reached sample, diverging experiences and preferences or the influence the recent game update patch might have had on their perceptions of balance. While we claim that a major shift in game balance does not necessarily change the players’ perceptions of balance itself, but rather points out and highlights one’s own requirements, beliefs and understandings of balance, we still hold that empirical foundations should be consulted to evaluate and ideally solidify the community’s views. To approach this endeavor, we adduce data and tools from the largest analytics platform for Guild Wars 2 endgame PvE (Pfau and Seif El-Nasr, 2023b). At the time of evaluation, it comprised ( $n_{l}=4,318,009$ ) recorded atomic boss logs from ( $n_{a}=154,145$ ) unique players and has kept track of the state of balancing and the game in general for the last five years. We deploy measures of the overall usage of different professions throughout the game modes, as well as empirical (dps) performances on actual boss encounters across all available builds.

6.1. Profession Popularity

While not producing detailed assertions on the efficiency, utility, performance or viability of a profession or build, the popularity or usage of a class over longer periods of time can already indicate and immediately highlight trends in balance shifts. Figure 3 visualizes the popularity of each profession and their respective specializations in 10-player raids over significant balance patch updates from 2017 to 2022. The patch that accompanied the survey of this work is highlighted in yellow.

At the time of closing the survey (four weeks after the launch of the June 2022 balance patch), we calculated the change in popularity of the specific specializations. The biggest shifts could be found in the increasing usage of the Mechanist ( $+7.1\%$ ), Virtuoso ( $+2.6\%$ ) and Soulbeast ( $+1.5\%$ ), as well as the decreasing usage of Chronomancers ( $-1.5\%$ ), Renegades ( $-2.5\%$ ) and Berserkers ( $-6\%$ ). Similar trends with the same affiliated professions could be found for the content of 10-player strike missions and 5-player fractals. The following update, implemented at the end of August 2022, further steepened this development, with almost every third player ( $32.7\%$ ) playing Mechanist since then.
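A popularity shift of this kind can be derived from the specialization recorded per player and log. The sketch below assumes that the reported changes are differences in usage share (percentage points), which the text does not state explicitly; the data and function names are hypothetical.

```python
from collections import Counter


def usage_shares(picks: list[str]) -> dict[str, float]:
    """Share of each specialization among all recorded picks, in percent."""
    counts = Counter(picks)
    return {spec: 100 * n / len(picks) for spec, n in counts.items()}


def popularity_shift(before: list[str], after: list[str]) -> dict[str, float]:
    """Change in usage share per specialization, in percentage points."""
    old, new = usage_shares(before), usage_shares(after)
    return {spec: new.get(spec, 0.0) - old.get(spec, 0.0)
            for spec in sorted(set(old) | set(new))}


# Hypothetical picks, one entry per player and log.
before = ["Mechanist"] + ["Berserker"] * 3
after = ["Mechanist"] * 2 + ["Berserker"] * 2
print(popularity_shift(before, after))  # {'Berserker': -25.0, 'Mechanist': 25.0}
```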

6.2. Data-Driven Balance Concept Assessment

While the former section already indicated a dominant strategy that threatens to overrule the viability of alternative choices, it lacks expressive power to estimate or explain changes in efficiency or utility. To assess shifts in the performance of particular (damage) builds, we thus add empirical data from the official API of the accompanying analytics platform (footnote 13: https://gw2wingman.nevermindcreations.de/api). Judging from the qualitative statements about their requirements, players (among other things) expressed that balancing should not only revolve around the tip-of-the-iceberg expert players, but incorporate the proficiency spectrum of the whole player community. For this reason, we delve deeper into the potential of empirically data-driven analyses and consider performance distributions instead of mere top performances. Returning to the previously discussed balance concepts, we hypothesize that at least symmetry, difficulty and viability of professions or builds can be measured based on empirically recorded evidence, entailing their respective understandings from the community. This section briefly outlines the theoretical quantification of these balance concepts from damage performance distributions, before we apply them to the community’s logs during the study period.

6.2.1. Theoretical Assessment

Figure 4 visualizes the theoretically quantifiable realizations of these concepts in the data of performance distributions. Following the initial definition of symmetry (i.e. identical expected performance of choices), Figure 4(a) depicts perfectly symmetrical performance distributions between the four example options A, B, C and D. Deviations from this symmetry can be quantified by distance metrics as basic as mean squared errors (from assumed equal performance) – and plotted over time (or balance patch eras) to track the impact of these patches onto the empirical performance symmetry. This similarly holds not only for performance distributions, but also formerly mentioned popularity proportions.
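As an illustration of such a distance metric, the following sketch computes the mean squared error of per-option mean performances from a common reference. Taking the mean of the option means as the "assumed equal performance" is an assumption made here for illustration, as are the function name and the toy values.

```python
from statistics import fmean


def symmetry_deviation(performances: dict[str, list[float]]) -> float:
    """Mean squared error of the per-option mean performance from a shared
    reference value; 0.0 means perfectly symmetrical expected performance."""
    option_means = [fmean(values) for values in performances.values()]
    # Assumption: the reference for "equal performance" is the mean of means.
    reference = fmean(option_means)
    return fmean([(m - reference) ** 2 for m in option_means])


# Hypothetical dps samples (in thousands) for two options.
print(symmetry_deviation({"A": [30, 32], "B": [31, 31]}))  # 0.0
print(symmetry_deviation({"A": [10, 10], "B": [20, 20]}))  # 25.0
```

Evaluating this value once per balance patch era yields the kind of time series described above.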

With respect to the difficulty of a specific profession or build, we previously identified that, most importantly, the proficiency gap in mastering its complexity, as well as the challenge of realistically executing it at an efficient level (influenced by factors of damage uptime and boss-related mechanics), determine its resulting damage performance. This entails that builds with low complexity pose lower risks of producing worse performances, and also that builds with easy damage uptime produce better performances with a higher probability across the distinct boss encounters and player proficiencies. As exemplified in Figure 4(b), builds with higher difficulty would thus lead to an increased variance within performance distributions, i.e. option A depicts example outcomes of a complex profession with hard-to-master damage uptime, while option D represents a simplistic build that is able to consistently deliver its damage across encounters. Even if all of the example distributions are theoretically able to reach the same top values, the impact of difficulty on their realistic performances cannot be neglected.
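Read as a measurement, this suggests ranking options by the spread of their performance distributions. The sketch below uses the sample variance as that spread; the function name and the toy values are hypothetical, and raw variance is scale-dependent, so a normalized measure may be preferable when median performances differ strongly.

```python
from statistics import variance


def rank_by_spread(performances: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Options sorted from widest to narrowest performance distribution,
    using the sample variance as a proxy for difficulty."""
    spreads = {name: variance(values) for name, values in performances.items()}
    return sorted(spreads.items(), key=lambda item: item[1], reverse=True)


# Hypothetical dps samples (in thousands); both options reach the same top value.
samples = {
    "A": [12, 21, 30, 40],  # complex build with hard-to-maintain uptime
    "D": [35, 37, 39, 40],  # simple build with consistent output
}
for name, spread in rank_by_spread(samples):
    print(name, round(spread, 2))
```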

Assuming that difficulty varies and different builds fluctuate around different median performances, Figure 4(c) showcases differences in viability that can be intuitively visualized and detected. Even if option B is showing significantly lower performances than A, it still should not be considered necessarily unviable, as either some good players of B are still producing better outcomes than some of A, or some encounters just favor the usage of B over A, which would open a niche that renders B viable in that case. In contrast, option D strongly dominates B, as all of its performances are strictly higher than those of B, no matter the player proficiency, boss encounter, group constellation or other factors. This essentially prevents B from being a viable choice, as there would always be an alternative that makes B a meaningless decision (assuming it does not support the combat in any other way). Despite C having the lowest median and minima, it would not be considered unviable under this conception, as (depending on player skill or situational usage) there are situations in which it can shine.
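The dominance criterion described here can be written down directly: one option strictly dominates another if its lowest recorded performance still exceeds the other's highest. The sketch below uses hypothetical values chosen to mirror the four example options; the function names are illustrative.

```python
def strictly_dominates(a: list[float], b: list[float]) -> bool:
    """True if every performance in a is higher than every performance in b."""
    return min(a) > max(b)


def dominated_by(performances: dict[str, list[float]]) -> dict[str, list[str]]:
    """Map each strictly dominated option to the options that dominate it."""
    result = {}
    for name, values in performances.items():
        dominators = [other for other, other_values in performances.items()
                      if other != name and strictly_dominates(other_values, values)]
        if dominators:
            result[name] = dominators
    return result


# Hypothetical dps samples (in thousands) mirroring options A to D.
samples = {
    "A": [30, 34, 38],
    "B": [24, 27, 31],  # lower than A on average, but overlapping with it
    "C": [15, 22, 37],  # lowest median and minimum, yet with high peaks
    "D": [32, 35, 39],  # every value exceeds every value of B
}
print(dominated_by(samples))  # {'B': ['D']}
```

On real logs, a quantile-based variant (for example comparing a low percentile against a high one) may be more robust to outliers than strict minima and maxima.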

There is no quantifiable measure of fairness in the performance data we consulted yet, as players’ conceptions of fairness rather involve subjective perceptions not covered in these logs. Yet, if following the initial notions of Becker and Görlich (Becker and Görlich, 2020), fairness can manifest in the probability of succeeding with the option a player prefers, which can be empirically validated by measuring the impact and contribution a build makes on the success rate (i.e. proportion of boss kills versus failed attempts) and compared across alternatives. As this work mainly focuses on complying with player-driven understandings, we omit the corresponding analysis for the sake of brevity and focus.

6.2.2. Empirical Assessment

Bringing these assessments together, we tackle the measurement of the balancing concepts on actually recorded data. Bosses in Guild Wars 2 are fundamentally different and afford the usage and preference of various builds and professions, so performance distributions are inherently disparate. These assessments are therefore executed on a per-boss basis and aggregated eventually. Figure 5 displays the performance distributions of the most prominent builds for one boss during the weeks the survey was active. With respect to this encounter, we can, among other things, confirm that the high complexity of the Condition Untamed is expressed in the comparably high variance of its performances, indicating difficulty differences. While the best players of this class achieve almost unparalleled values, these records are also reached by a much larger share of the Condition Daredevil players, suggesting a lower difficulty. Barring outliers, the missing viability of Chronomancer or Tempest also becomes apparent, as more optimal choices such as Condition Daredevil or Condition Mechanist completely dominate these in this context.

When deployed on the full set of the game’s 24 raid, 9 fractal and 10 strike mission bosses tracked by our platform (through $n_{l}=4,318,009$ combat logs), this approach ranks Condition Untamed, Condition Mirage and Power Catalyst as the currently most difficult (damage) builds of the game, while the Condition Firebrand, Condition Virtuoso and Power Herald are located on the lower end of the difficulty spectrum. This aligns with subjective difficulty ratings of the particular builds composed by experienced professional players of the game (footnote 14: https://snowcrows.com/builds; footnote 15: https://lucky-noobs.com/builds/condition-daredevil-dps).

Additionally, we accumulated the occurrences of being dominated by another option across all bosses (within the study period, $n_{l}=146,230$ logs). Unsurprisingly, this viability ranking showed most of the game’s core professions (without any beneficial specialization) as least viable. After these, Power Druid, Power Scourge and Condition Chronomancer proved to be the least viable in the recent endgame PvE of Guild Wars 2, which is attributable to the opposite damage type design of these classes.
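One way to accumulate such occurrences is to count, per build, the number of bosses on which some other build strictly dominates it. This is a sketch under that assumption rather than the platform's actual implementation; the boss names, build names and values are hypothetical.

```python
from collections import Counter


def domination_counts(logs_by_boss: dict[str, dict[str, list[float]]]) -> Counter:
    """Count, per build, on how many bosses it is strictly dominated, i.e.
    another build's lowest performance exceeds its highest performance."""
    counts = Counter()
    for performances in logs_by_boss.values():
        for name, values in performances.items():
            if any(min(other_values) > max(values)
                   for other, other_values in performances.items()
                   if other != name):
                counts[name] += 1
    return counts


# Hypothetical per-boss dps samples (in thousands).
logs = {
    "boss_1": {"X": [10, 12], "Y": [20, 25]},  # Y dominates X here
    "boss_2": {"X": [10, 30], "Y": [20, 25]},  # overlapping, no domination
}
print(domination_counts(logs).most_common())  # [('X', 1)]
```

Builds with the highest counts would appear at the bottom of a viability ranking of this kind.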

7. Discussion

Balancing video games can follow a multitude of different definitions, approaches and takes. To understand why players are satisfied or frustrated with current states of balance and balancing decisions, we collected notions and understandings of academia and industry, carried out a player-centric survey to assess the game-specific nuances and community-driven requirements based on the former categories, and deployed empirical assessments grounded on an abundance of atomic in-game data logs. Bringing these worlds together, the player base of Guild Wars 2 wants all of its nine professions to be viable, with at least some of their currently three choosable specializations, ideally being able to take the role of a (power or condition) damage dealer or support (questions V1-4, cf. Figure 1). This viability can partially be reflected in the popularity, or usage, these professions show in comparison to their alternatives (cf. Section 6.1). Seeing little to no usage of a profession suggests that their viability is threatened, which most commonly correlates with it being not efficient enough, implying a play style that is too complex or unrealistic to execute in actual boss scenarios, or being dominated by alternative choices that are superior in all or most respects. Some voices expressed that players often rather choose their class in endgame PvE on a preference, flair and style basis, but the significant shifts in profession popularity clearly indicate that balancing decisions and the resulting viability of builds are the driving force of what players bring into raids (and other content) how frequently. This can partly be explained by the optimal performance these professions can achieve under ideal circumstances, but balancing after only top players and performances neglects the vast majority of the player base which should be considered to retain satisfaction throughout the community. 
Thus, we deployed data-driven assessments of which classes are really lacking viability across the board, based on the performances of hundreds of thousands of players within the weeks of the study period (cf. Section 6.2). Admittedly, viability does not stop at inflicting as much damage as possible on the target encounter, but is highly influenced by the amount of utility a build can provide that potentially increases the offensive or defensive capabilities of other players and/or leads to increases in the success rate of the combat. Nevertheless, when striving for boss kills that are as fast and clean as possible (which is the most common goal players pursue), this utility can be either theoretically calculated (by its contribution to the overall group efficiency/dps) or empirically measured (contrasting it to logs without this utility). Thus, the discussed viability assessment strategies (based on dps) do not differ systematically from viability based on dps plus the player’s contribution to group performance. This then reflects the community’s requirements on the equilibrium between “selfish” dps builds and those providing extra utility (questions V5-7).

While players have a high demand for viability across options, they clearly disapprove of the idea of symmetry (all professions being equally efficient) (question S1), as this could lead to meaningless decisions, boring and stale gameplay, no room for optimization and the loss of identity. This holds for the damage potentials of the various builds, as they rather like to see spectra of difficulty demands, use cases and applications, but also for the utility builds can provide. Referring to the latter, having unique utility effects is a desirable way of adding identity to classes and diversity to groups, as long as it does not limit or constrain the group formation (questions S2,3).

The perception of fairness within the surveyed community does not completely align with the understandings from literature. In summary, players want to be able to choose builds and professions according to personal preference and not necessarily according to ideal or optimized performances of other players, even if the former is likely to be impacted by the latter. In a sense, the fair probability of succeeding in the game (Becker and Görlich, 2020) can be loosely transferred to having a just chance to be accepted in other players’ groups, and to being able to defeat a boss encounter with the option of one’s preference. However, this rather subjective topic presumably rests on the perception of viability within the community and for the case of Guild Wars 2, players rather find that the community allows non-optimal builds (questions F1,2) and that expectations on single players usually do not have to compete with high-end professional players (questions F3,4).

Judging from both quantitative as well as qualitative responses, difficulty should be one of the most important, if not the major impact factors for balancing. We note that the community’s understanding of this connection slightly differs from academic perspectives, as for most related work, difficulty is a parameter altered by (automatic) balancing, most often to maximize the player’s perception and duration of flow. In this context, difficulty emerges from the complexity of the play style of the profession or build, its (ideal) rotation, the hardness of maintaining damage uptime and similar factors. In that respect, the output to balance is rather the outgoing damage performance of a player, which preferably should be consonant with this difficulty to incentivize rewarding experiences. To a certain extent, both definitions still align if a game provides enough choosable options with different degrees of complexity and accordingly compensating damage performances, so that by choosing the most suitable option, players balance themselves into the proper sweet spot between boredom and frustration (flow). The community desires accessible gameplay for players throughout the entire proficiency spectrum (question D1), but easier builds should be outperformed by more complex ones (question D2) if players executing those are actually capable of fulfilling the higher demands of difficulty and realizability (questions D3,4). Detecting differences in this conception of difficulty of a class turned out to be suitable when quantifying the variance across performance distributions of single builds (cf. Section 6.2). This assessment is based on the observation and statements about inefficient players producing lower performances when playing more complex builds, compared to executing more forgiving and easy rotations.

Ultimately, the major balance patch released in June 2022 was primarily perceived as a deterioration of the state of balancing, judging from the majority (83%) of negative satisfaction responses to this survey as well as from the overall reception displayed in the previously mentioned communication channels. Interpreting the qualitative and quantitative answers from this community sample and contextualizing these with the introduced changes, the balance patch contradicted the requirements of the player base in multiple ways. This can be attributed to a shift in symmetry, as effects that made a number of classes unique were removed, leading to less diversity in group formation, which is not what players were looking forward to. The same change also impacted a series of power classes more strongly than condition builds, which significantly decreased their viability and performance outcome distributions, up to the point where some (both damage and support builds) were completely dominated by other choices. On top of that, opinions on builds that should have been nerfed were either not fulfilled or even reversed, e.g. the unwelcome buff of the Power Mechanist or the Power Chronomancer nerf that went against their expectations (cf. Section 5.3). The biggest perception of unfairness in balancing was attributed to the mismatch of difficulty and outcome. This becomes apparent when observing the popularity across professions (cf. Figure 3), in which the Mechanist, Firebrand and Virtuoso alone make up the absolute majority of the used options: consistently those builds that are easy to play while inflicting decent damage with high uptime and/or providing massive group utility. This is not only reflected in their popularity, but also in the empirical performance assessment. In the meantime, more difficult professions lost usage because, under realistic circumstances, they could no longer compete with the performances of simplistic builds, and rewarding playing experiences diminished.

Nevertheless, the abundance of qualitative and quantitative feedback from the player community motivated the developers to increase the viability of a series of neglected or negatively affected builds in promptly following balancing updates. This again emphasized the vital importance, as well as the validity and appropriateness, of the opinions and requirements of a game’s community. Throughout this work, we compiled the advantages and possibilities of player-driven game balancing from related academic work and qualitative as well as quantitative responses of a player base, as long as they are backed up by empirical evidence. In immediately succeeding work, we utilized the now extracted insights and measures for the construction of a tool that seeks to overcome occurrences of imbalance (Pfau and Seif El-Nasr, 2023a).

8. Synopsis & Generalization

In the following, we reiterate the methodology of connecting player-driven surveys with data-driven assessments and of deriving an instrument for balancing video game elements, while answering the previously posed research questions with the help of a case study targeting balance in Guild Wars 2.

Balancing (online) video games adjusts different playable elements for equality, viability, popularity or other factors. In related scientific literature, myriad (even conflicting) definitions and conceptions of balance and balancing exist (Becker and Görlich, 2020), every game company decides on its own strategy (Scacchi, 2017) and, beyond that, balance perceptions of the eventual player bases can deviate from these as well (Schreiber, 2010). This disagreement frequently backfires when balance perceptions of developers do not align with opinions and requirements of the players, which harms player experience, satisfaction, retention and the critical as well as economic success of games. Thus, we approach (RQ1): “With so many conflicting theoretical definitions of balancing, how can a game understand and cater to the requirements of its players?” with a mixed-methods survey grounded in concepts from the scientific literature. As the game elements, mechanics and dynamics essentially predetermine which balance concepts are important for a game, we first identified these crucial features with the help of a subset of experienced players (in the case of Guild Wars 2, this resulted in viability, symmetry, fairness and difficulty, cf. Section 4.1). The next step connected these higher-level concepts to in-game terms and ideas that players can understand (cf. Table 1). Together with qualitative assessments, surveys of these can give in-depth insights into how and why a target player base demands, rejects or is indifferent to specified balance concepts (cf. Sections 5.1, 5.2). Regarding our case study, a representative sample of the Guild Wars 2 community deemed multiple viability criteria important, but strongly opposed the idea that every playable class should produce symmetrical performance outputs or follow similar designs.
This equally bears implications for recent scientific work that explicitly balanced towards symmetrical outcomes in the same or similar game genres (Pfau et al., 2022). Beyond that, in contrast to academic literature on difficulty balancing (Andrade et al., 2006; Tijs et al., 2008; Volz et al., 2016), we observed a clear trend in the community not to see difficulty as the variable to adjust. Rather, players desire a variety of playable options that differ in difficulty, while the metric to adjust is the damage potential (and variance) an option can offer, based on that difficulty. Details of these insights, paired with specific perceptions of imbalanced elements (cf. Section 5.3), thus reveal requirements that do not implicitly follow from the literature (or even contradict it), which can help developers to understand the needs of their player base and researchers to extend existing notions of balance.

Nevertheless, one of the biggest drawbacks of subjective or opinion-based assessments lies in the fact that player perceptions can easily deviate from actual circumstances in the game. This can happen due to unawareness of the big picture, inexperience with the game, echo-chamber effects when steadily playing with the same set of people and, not least, because player-accessible higher- and lower-level analytics are not the standard, but rather uncommon for most games (Wallner, 2018). With this in mind, we want to answer (RQ2): “How can data-driven analytics help grounding the objectivity of this pool of opinions?” in two ways. First, we present how (subjective) player-driven insights produced by the previous step can be consolidated by means of (objective) data-driven techniques. This can be seen in the changes in popularity of in-game elements (cf. Section 6.1) that go along with players’ perceptions of reduced viability. Moreover, when locating dominant strategies, relying on theoretical (or ideal) peak performances is a popular method that can give rough first insights, yet it does not always translate to realistic performances in actual combat. More importantly, empirical balance assessments should go even beyond that and incorporate distributions of performances, which can reflect measures of difficulty and symmetry from the data (cf. Section 6.2). If these align with the former subjective outcomes, they arguably support prior insights with empirical evidence, fortifying the conclusions for suitable balancing decisions. Second, we aim to support answering this research question by reducing the gap between subjectivity and objectivity of players, so that requirements and opinions follow informed understandings instead of instinctive estimates.
To assess how to educate players or entire communities, we introduce an empirical analytics tool for Guild Wars 2 that follows player-driven design and embed the balancing assessments of this work into it (Pfau and Seif El-Nasr, 2023b). This follows up on (position) papers that call for further research on approaches to player-centric analytics for the benefit of both players and science (Kleinman and El-Nasr, 2021; Wallner et al., 2021).
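The contrast between theoretical peak performance and empirical performance distributions can be made concrete with a small sketch. This is an illustration under stated assumptions, not the assessment used in this work: the function name, the build labels and all dps values are invented, and the interquartile range is only one possible proxy for how strongly outcomes vary with player proficiency.

```python
import statistics


def summarize_build(dps_samples, theoretical_peak):
    """Summarize one build's logged dps values relative to its theoretical peak."""
    q1, _, q3 = statistics.quantiles(dps_samples, n=4)
    median = statistics.median(dps_samples)
    return {
        "median": median,
        "iqr": q3 - q1,  # spread of outcomes across players
        "peak_realization": median / theoretical_peak,  # share of the peak typically reached
    }


# Invented toy values (NOT the study's data); build names are placeholders.
easy_build = summarize_build([27000, 28000, 29000, 30000, 31000], theoretical_peak=33000)
hard_build = summarize_build([18000, 24000, 30000, 34000, 38000], theoretical_peak=40000)

for name, summary in (("easy_build", easy_build), ("hard_build", hard_build)):
    print(name, summary)
```

In this toy example, the harder build has the higher theoretical peak, but its typical player reaches a smaller share of it and its outcomes spread far more widely. A ranking by peak values alone would hide both effects.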

Having subjective as well as objective measures of in-game imbalances and well-balanced target states, we evaluate the integration of both into an interactive instrument in parallel work (Pfau and Seif El-Nasr, 2023a). While the final approach of assessing a player’s opinion of balance (based on only one visualization of performance distributions) is still shallow, and an educated perception of balance arguably requires experiencing the state of the game by playing it for a fair amount of time, we still claim to support the assessment and regulation of balance, and strive to evaluate the impact of player-driven balance implementations in live patches. In order to showcase the applicability of this endeavor and to give a hands-on example that player communities are interested in and capable of being involved in balancing, we utilized Guild Wars 2 as a fitting, popular and contemporary use case. Yet, the proposed methodology should also work for instanced (group) PvE content in general, such as raids, dungeons, trials or ultimate encounters in World of Warcraft (Blizzard Entertainment, 2004), The Elder Scrolls Online (ZeniMax Online Studios, 2014) or Final Fantasy XIV (Square Enix, 2013). The underlying balance concepts (not limited to viability, symmetry, difficulty and fairness) and the aggregation of a community’s performance requirements, however, are arguably generalizable and similarly applicable to single-player or competitive PvP settings, such as balancing champions of Multiplayer Online Battle Arena (MOBA) games (e.g. League of Legends (Games, 2009) or Dota 2 (Valve, 2013)) or class-based first-person shooters (such as Valorant (Riot Games, 2020) or Borderlands 3 (Gearbox Software, 2019)), which yields great potential for player-driven balancing.

9. Limitations & Future Work

The endeavor of balancing, especially with the multitude of playable options, classes and builds of modern online games, is as complicated as finding a convincing definition of balance in the first place. Thus, the presented work is subject to a number of limitations, partly caused by the disparity of the conceptions and partly because of practical restrictions within the methodology. Finding balanced configurations in a popular world-class video game that would suit the requirements of hundreds of thousands to millions of active players without disappointing anyone is arguably impossible. In this respect, we could only reach a comparably minor part of the entire audience of players, even if the sample size is large enough to enable drawing conclusions. In order to accumulate this sample, we broadcast the survey through the most popular communication channels (outside the game) and made sure to reach players on the full spectrum of game experience, but admit that players interested in balancing (and in expressing their opinion) were more likely to respond and might have skewed the results. The timing of the survey, directly accompanying a major balance patch, might have activated more players who were unsatisfied with the changes and thus introduced a further bias. However, the gist of this work focuses on the underlying and basic understandings of balancing itself, which should not be systematically changed by the patch, but rather clarified for players through contemporary examples of what should or should not happen in proper balancing procedures. Findings related to this particular patch are discussed to some extent, but take a back seat to implications for (player-driven) balancing in general.
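As a rough orientation for the sample size argument, the standard formulas for estimating a proportion (Cochran, 1977) can be evaluated for $n=680$. This is an illustrative calculation and not necessarily the one performed for this work: it assumes simple random sampling, a 95% confidence level (z = 1.96) and the most conservative proportion p = 0.5. Since the survey was self-selected, as acknowledged above, the result should be read as an optimistic bound on the actual uncertainty.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error of an estimated proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)


def cochran_sample_size(e, p=0.5, z=1.96):
    """Cochran's sample size for a large population and desired margin of error e."""
    return math.ceil(z**2 * p * (1 - p) / e**2)


moe = margin_of_error(680)           # roughly 0.038, i.e. about 3.8 percentage points
required = cochran_sample_size(0.05)  # respondents needed for a 5-point margin
print(round(moe, 4), required)
```

Under these assumptions, 680 responses exceed what a five-percentage-point margin would require. The self-selection and timing biases discussed above are not captured by this figure.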

Developers are moreover responsible for pushing novelty, enjoyment and shifts out of rigidly stuck constellations, in order to keep their game innovative, interesting and economically competitive. We neglected factors of flair, intrinsic motivation of play styles and further variables not inherently related to performance for now, but acknowledge that player choices and preferences are not completely rational (with respect to efficiency and strategy optimization). The significant changes in profession popularity nevertheless indicate that power and viability of choices do have a drastic impact on what players play in endgame PvE.

In their meta-review on the definitions of balancing, Becker and Görlich (2020) list even further possible concepts appearing in related work, such as chance, (in)transitivity, positive/negative feedback, economies, costs, rewards or static versus dynamic balancing. While some of them turned out to be inapplicable to the domain of endgame PvE in Guild Wars 2, certain factors such as positive and negative feedback can be identified when attributed to the success of playing out an ideal rotation. Future work will look into the intricacies of single builds and rotations to pinpoint the impact of these factors on balancing perceptions instead of simply subsuming them under difficulty.

The biggest critique against utilizing player-driven balancing opinions came from some players themselves, who proclaimed that “players do not know what is good for them anyway”. While this might hold to a certain extent and the driving mechanical, dynamical and design decisions should undoubtedly stem from the mindsets of developers, this work counted on the wisdom of the crowd of a community, which (when validated with data-driven evidence) produces reasonable and thoughtful insights, often in accord with parallel scientific research. Eventually, balance decisions impact player experience, satisfaction and the critical and economic success of a game over longer terms, so tailoring them to the needs of the actual audience cannot be valued enough.

For future work, we mainly seek to extend and unravel the nuances of play styles, builds and implications for balancing. These will be incorporated into the previously introduced player-driven balancing tool (Pfau and Seif El-Nasr, 2023a) to portray ideal perceptions of balance versus current empirical constellations down to the lowest possible detail. Even though we evaluated this tool in terms of balance perceptions regarding theoretical aggregated configurations, implementing the resulting balancing decisions in actual gameplay and evaluating the subsequent player experiences remains an important open endeavor. In pursuing this path, we strive to tighten the connection between players and developers, as well as between industry and academia.

10. Conclusion

Balancing choosable or even customizable options is an interminable, controversial and considerable process for a game, its developers and players alike. Along with the introduction of new content and fixes for bugs, balancing changes are one of the major causes of update patches for online games, pose never-ending experience optimization problems and highly impact how players play a game altogether. Scientific efforts to study balance are divided into several understandings of the term, some of them conflicting. When it comes to the understanding of balancing as adjusting in-game options towards equated or appropriate outcomes, the topic is largely under-investigated, and related work mostly regards simulation- or computation-based approaches to even out viability across choices. The role, perception and requirements of the player or even the actual player community have not been considered in academia so far, despite bearing considerable implications for games user research, game design and human-computer interaction in general. For these reasons, we aggregated notions of balance from adjacent research and industrial perceptions, recruited $n=680$ players of the MMORPG Guild Wars 2 (as a popular representative of online games undergoing constant rebalancing) and refined a game-specific understanding through quantitative as well as qualitative survey items. Analyzing and interpreting the players’ requirements, opinions and mindsets explained their reactions to recently implemented balance changes, enabled accompanying data-driven assessments, paved the way for finding community agreements on balancing through an interactive democratic tool and entails (or confirms) the following implications that likely hold for comparable games and larger concepts of balancing:

• Players crave diversity of in-game choices and a high viability across these choices – balancing should identify and address dominating choices that render alternatives irrelevant.

• Perfect symmetry between the outcome of choices would implicitly entail balanced viability, but this risks leaving decisions meaningless – so different options should display different strengths, weaknesses, use cases and challenges.

• Improper balance can evoke feelings of unfairness when common factors such as flair or preference influence players’ decisions – this might lead to decreasing satisfaction or potentially even churn.

• While most academic approaches manipulate difficulty to balance a player’s flow state, players’ perceptions of difficult choices in online games strongly presuppose rewarding incentives – this does not have to be extrinsic, but should lead to an increased (perception of) efficiency and/or competence.

• In order to optimize satisfaction and experience, players desire to be part of the balancing process – investigating and quantifying balance requirements in this player-driven process elevates the very player’s role and trailblazes novel approaches of engaging, binding and tailored games.

Acknowledgments

We thank all participants of the survey for their contribution to player-driven balancing analytics, as well as all users of our platform for sharing their extensive playing histories, constant feedback and ongoing discussions. We are moreover grateful to ArenaNet for developing Guild Wars 2. Guild Wars 2 and all associated logos, designs, and composite marks are trademarks or registered trademarks of NCSOFT Corporation or ArenaNet, LLC, respectively. ©2021 NCSOFT Corporation, ©2021 ArenaNet, LLC. All rights reserved.

References

Adams (2014) Ernest Adams. 2014. Fundamentals of game design. Pearson Education.
Ahmad et al. (2019) Sabbir Ahmad, Andy Bryant, Erica Kleinman, Zhaoqing Teng, Truong-Huy D Nguyen, and Magy Seif El-Nasr. 2019. Modeling individual and team behavior through spatio-temporal analysis. In Proceedings of the Annual Symposium on Computer-Human Interaction in Play. 601–612.
Albaghajati and Ahmed (2020) Aghyad Mohammad Albaghajati and Moataz Aly Kamaleldin Ahmed. 2020. Video game automated testing approaches: An assessment framework. IEEE Transactions on Games (2020).
Andrade et al. (2006) Gustavo Andrade, Geber Ramalho, Alex Gomes, and Vincent Corruble. 2006. Dynamic game balancing: An evaluation of user satisfaction. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Vol. 2. 3–8.
ArenaNet (2012) ArenaNet. 2012. Guild Wars 2. Game [PC]. ArenaNet, Bellevue, Washington, USA.
Beau and Bakkes (2016) Philipp Beau and Sander Bakkes. 2016. Automated game balancing of asymmetric video games. In 2016 IEEE conference on computational intelligence and games (CIG). IEEE, 1–8.
Becker and Görlich (2020) Alexander Becker and Daniel Görlich. 2020. What is Game Balancing? - An Examination of Concepts. ParadigmPlus 1, 1 (2020), 22–41.
Blizzard Entertainment (2004) Blizzard Entertainment. 2004. World of Warcraft. Game [PC]. Blizzard Entertainment, Irvine, California, USA.
Bowman et al. (2012) Brian Bowman, Niklas Elmqvist, and TJ Jankun-Kelly. 2012. Toward visualization for games: Theory, design space, and patterns. IEEE transactions on visualization and computer graphics 18, 11 (2012), 1956–1968.
Brown (2019a) Marc Brown. 2019a. How games get balanced. https://www.youtube.com/watch?v=WXQzdXPTb2A
Brown (2019b) Marc Brown. 2019b. Why are games so hard to balance? https://www.youtube.com/watch?v=K3n-Sy2Ko4I
Burgun (2011) Keith Burgun. 2011. Understanding balance in video games. Gamasutra. Available online at: https://www.gamasutra.com/view/feature/134768/understanding_balance_in_video_.php (accessed April 7, 2020).
Canossa and Drachen (2009) Alessandro Canossa and Anders Drachen. 2009. Patterns of Play: Play-Personas in User-Centred Game Development. In DiGRA Conference.
Capcom (2016) Capcom. 2016. Street Fighter V. Game [PC,PS4]. Capcom, Osaka, Japan.
Claypool et al. (2015) Mark Claypool, Jonathan Decelle, Gabriel Hall, and Lindsay O’Donnell. 2015. Surrender at 20? Matchmaking in league of legends. In 2015 IEEE Games Entertainment Media Conference (GEM). 1–4. https://doi.org/10.1109/GEM.2015.7377234
Cochran (1977) William G Cochran. 1977. Sampling techniques. John Wiley & Sons.
Csikszentmihalyi (1990) Mihaly Csikszentmihalyi. 1990. Flow: The psychology of optimal experience. Vol. 1990. Harper & Row New York.
Da Silva and Tomimatsu (2013) Sylker Teles Da Silva and Kiyoshi Tomimatsu. 2013. Game prototyping with community-driven narrative: Actor-network theory applied for Massively Multiplayer Online Games development. In 2013 IEEE 2nd Global Conference on Consumer Electronics (GCCE). IEEE, 376–378.
DeCoster and Rubin (2019) Rym DeCoster and Scott Rubin. 2019. PAX South 2018 – Balance in Game Design. https://www.youtube.com/watch?v=NXD8YQ7j_Qk
Drachen and Canossa (2009) Anders Drachen and Alessandro Canossa. 2009. Towards gameplay analysis via gameplay metrics. In Proceedings of the 13th international MindTrek conference: Everyday life in the ubiquitous era. 202–209.
Drachen et al. (2018) Anders Drachen, Pejman Mirza-Babaei, and Lennart E Nacke. 2018. Games user research. Oxford University Press.
Drachen et al. (2012) Anders Drachen, Rafet Sifa, Christian Bauckhage, and Christian Thurau. 2012. Guns, swords and data: Clustering of player behavior in computer games in the wild. In 2012 IEEE conference on Computational Intelligence and Games (CIG). IEEE, 163–170.
El-Nasr et al. (2013) Magy Seif El-Nasr, Anders Drachen, and Alessandro Canossa. 2013. Game analytics. Springer.
Felder (2015) Dan Felder. 2015. Design 101: Balancing Games. https://www.gamasutra.com/blogs/DanFelder/20151012/251443/Design_101_Balancing_Games.php
Fleiss (1971) Joseph L Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological bulletin 76, 5 (1971), 378.
Games (2009) Riot Games. 2009. League of Legends. Game [PC]. Riot Games, Los Angeles, California, USA.
Gearbox Software (2019) Gearbox Software. 2019. Borderlands 3. Game [PC,PS4,XBoxOne]. Gearbox Software, Frisco, Texas, USA.
Hadiji et al. (2014) Fabian Hadiji, Rafet Sifa, Anders Drachen, Christian Thurau, Kristian Kersting, and Christian Bauckhage. 2014. Predicting player churn in the wild. In 2014 IEEE Conference on Computational Intelligence and Games. IEEE, 1–8.
Hi-Rez Studios (2012) Hi-Rez Studios. 2012. Tribes: Ascend. Game [PC]. Hi-Rez Studios, Alpharetta, Georgia, USA.
Hullett et al. (2012) Kenneth Hullett, Nachiappan Nagappan, Eric Schuh, and John Hopson. 2012. Empirical analysis of user data in game software development. In Proceedings of the 2012 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement. 89–98. https://doi.org/10.1145/2372251.2372265
Hunicke (2005) Robin Hunicke. 2005. The case for dynamic difficulty adjustment in games. In Proceedings of the 2005 ACM SIGCHI International Conference on Advances in computer entertainment technology. 429–433.
Hyeong et al. (2020) Ji Hyeon Hyeong, Kang Jun Choi, Jae Young Lee, and Tae-Hyung Pyo. 2020. For whom does a game update? Players’ status-contingent gameplay on online games before and after an update. Decision Support Systems 139 (2020), 113423.
Jaffe et al. (2012) Alexander Jaffe, Alex Miller, Erik Andersen, Yun-En Liu, Anna Karlin, and Zoran Popovic. 2012. Evaluating competitive game balance with restricted play. In Eighth Artificial Intelligence and Interactive Digital Entertainment Conference.
Klarkowski et al. (2016) Madison Klarkowski, Daniel Johnson, Peta Wyeth, Mitchell McEwan, Cody Phillips, and Simon Smith. 2016. Operationalising and evaluating sub-optimal and optimal play experiences through challenge-skill manipulation. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. 5583–5594.
Kleinman and El-Nasr (2021) Erica Kleinman and Magy Seif El-Nasr. 2021. Using Data to “Git Gud”: A Push for a Player-Centric Approach to the Use of Data in Esports. (2021).
Leigh et al. (2008) Ryan Leigh, Justin Schonfeld, and Sushil J Louis. 2008. Using coevolution to understand and validate game balance in continuous games. In Proceedings of the 10th annual conference on Genetic and evolutionary computation. 1563–1570.
Lessel et al. (2019) Pascal Lessel, Maximilian Altmeyer, and Nicolas Brauner. 2019. Crowdjump: Investigating a Player-Driven Platform Game. In Proceedings of the Annual Symposium on Computer-Human Interaction in Play. 149–159.
Lewis and Wardrip-Fruin (2010) Chris Lewis and Noah Wardrip-Fruin. 2010. Mining game statistics from web services: a World of Warcraft armory case study. In Proceedings of the Fifth International Conference on the Foundations of Digital Games. 100–107.
Loh et al. (2016) Christian Sebastian Loh, I-Hung Li, and Yanyan Sheng. 2016. Comparison of similarity measures to differentiate players’ actions and decision-making profiles in serious games analytics. Computers in Human Behavior 64 (2016), 562–574.
Lomas et al. (2017) J Derek Lomas, Kenneth Koedinger, Nirmal Patel, Sharan Shodhan, Nikhil Poonwala, and Jodi L Forlizzi. 2017. Is difficulty overrated? The effects of choice, novelty and suspense on intrinsic motivation in educational games. In Proceedings of the 2017 CHI conference on human factors in computing systems. 1028–1039.
Lucero et al. (2020) Crisrael Lucero, Christianne Izumigawa, Kurt Frederiksen, Lena Nans, Rebecca Iden, and Douglas S Lange. 2020. Human-Autonomy Teaming and Explainable AI Capabilities in RTS Games. In International Conference on Human-Computer Interaction. Springer, 161–171.
Ma et al. (2019) Jifeng Ma, Yaobin Lu, and Sumeet Gupta. 2019. User innovation evaluation: Empirical evidence from an online game community. Decision Support Systems 117 (2019), 113–123. https://doi.org/10.1016/j.dss.2018.11.003
Mayring et al. (2004) Philipp Mayring et al. 2004. Qualitative content analysis. A companion to qualitative research 1, 2 (2004), 159–176.
Moura et al. (2011) Dinara Moura, Magy Seif El-Nasr, and Christopher D Shaw. 2011. Visualizing and understanding players’ behavior in video games: discovering patterns and supporting aggregation and comparison. In ACM SIGGRAPH 2011 game papers. 1–6.
Naughty Dog (2009) Naughty Dog. 2009. Uncharted 2: Among Thieves. Game [PS3]. Naughty Dog, Santa Monica, California, USA.
Nguyen et al. (2015) Truong-Huy D Nguyen, Magy Seif El-Nasr, and Alessandro Canossa. 2015. Glyph: Visualization Tool for Understanding Problem Solving Strategies in Puzzle Games. In Proceedings of the 10th International Conference on the Foundations of Digital Games (FDG 2015).
Partlan et al. (2021) Nathan Partlan, Erica Kleinman, Jim Howe, Sabbir Ahmad, Stacy Marsella, and Magy Seif El-Nasr. 2021. Design-Driven Requirements for Computationally Co-Creative Game AI Design Tools. In The 16th International Conference on the Foundations of Digital Games (FDG) 2021. 1–12.
Pfau et al. (2020) Johannes Pfau, Antonios Liapis, Georg Volkmar, Georgios N Yannakakis, and Rainer Malaka. 2020. Dungeons & replicants: automated game balancing via deep player behavior modeling. In 2020 IEEE Conference on Games (CoG). IEEE, 431–438.
Pfau et al. (2022) Johannes Pfau, Antonios Liapis, Georgios N Yannakakis, and Rainer Malaka. 2022. Dungeons & Replicants II: Automated Game Balancing Across Multiple Difficulty Dimensions via Deep Player Behavior Modeling. IEEE Transactions on Games (2022).
Pfau and Seif El-Nasr (2023a) Johannes Pfau and Magy Seif El-Nasr. 2023a. Balancing Video Games: A Player-Driven Instrument. In Companion Proceedings of the Annual Symposium on Computer-Human Interaction in Play (CHI PLAY ’23 Companion). 1–8.
Pfau and Seif El-Nasr (2023b) Johannes Pfau and Magy Seif El-Nasr. 2023b. Player-Driven Game Analytics: The Case of Guild Wars 2. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–14.
Portnow (2012) James Portnow. 2012. Perfect imbalance–why unbalanced design creates balanced play. https://www.youtube.com/watch?v=e31OSVZF77w
Riot Games (2020) Riot Games. 2020. Valorant. Game [PC]. Riot Games, Los Angeles, California, USA.
Risi and Togelius (2019) Sebastian Risi and Julian Togelius. 2019. Procedural content generation: from automatically generating game levels to increasing generality in machine learning. arXiv preprint arXiv:1911.13071 (2019).
Scacchi (2017) Walt Scacchi. 2017. Practices and technologies in computer game software engineering. IEEE Software 34, 1 (2017), 110–116.
Schell (2008) Jesse Schell. 2008. The Art of Game Design: A book of lenses. CRC press.
Schreiber (2010) Ian Schreiber. 2010. Game Balance Concepts. A continued experiment in game design and teaching. https://gamebalanceconcepts.wordpress.com/2010/07/07/level-1-intro-to-game-balance/
Shaker et al. (2012) Noor Shaker, Georgios N Yannakakis, and Julian Togelius. 2012. Towards player-driven procedural content generation. In Proceedings of the 9th conference on Computing Frontiers. 237–240.
Sirlin (2001) David Sirlin. 2001. Balancing Multiplayer Games. https://www.sirlin.net. https://www.sirlin.net/articles/balancing-multiplayer-games-part-1-definitions
Square Enix (2013) Square Enix. 2013. Final Fantasy XIV. Game [PC,OSX,PS3]. Square Enix, Tokyo, Japan.
Sylvester (2013) Tynan Sylvester. 2013. Designing games: A guide to engineering experiences. O’Reilly Media, Inc.
Tijs et al. (2008) Tim JW Tijs, Dirk Brokken, and Wijnand A IJsselsteijn. 2008. Dynamic game balancing by recognizing affect. In International Conference on Fun and Games. Springer, 88–93.
Tyack et al. (2016) April Tyack, Peta Wyeth, and Daniel Johnson. 2016. The appeal of moba games: What makes people start, stay, and stop. In Proceedings of the 2016 annual symposium on computer-human interaction in play. 313–325.
Valve (2012) Valve. 2012. Counter-Strike: Global Offensive. Game [PC,PS,XBox]. Valve, Bellevue, Washington, USA.
Valve (2013) Valve. 2013. Dota 2. Game [PC]. Valve, Bellevue, Washington, USA.
Volz et al. (2016) Vanessa Volz, Günter Rudolph, and Boris Naujoks. 2016. Demonstrating the feasibility of automatic game balancing. In Proceedings of the Genetic and Evolutionary Computation Conference 2016. 269–276.
Wallner (2018) Günter Wallner. 2018. Automatic generation of battle maps from replay data. Information Visualization 17, 3 (2018), 239–256.
Wallner (2019) Günter Wallner. 2019. A brief overview of data mining and analytics in games. Data analytics applications in gaming and entertainment (2019), 1–14.
Wallner et al. (2019) Günter Wallner, Nour Halabi, and Pejman Mirza-Babaei. 2019. Aggregated visualization of playtesting data. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–12.
Wallner and Kriglstein (2012) Günter Wallner and Simone Kriglstein. 2012. A spatiotemporal visualization approach for the analysis of gameplay data. In Proceedings of the SIGCHI conference on human factors in computing systems. 1115–1124.
Wallner et al. (2021) Günter Wallner, Marnix Van Wijland, Regina Bernhaupt, and Simone Kriglstein. 2021. What Players Want: Information Needs of Players on Post-Game Visualizations. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–13.
Wang et al. (2020) Qi Wang, Yi Yang, Zhengren Li, Na Liu, and Xiaohang Zhang. 2020. Research on the influence of balance patch on players’ character preference. Internet Research (2020).
Wells and Bednarz (2021) Lindsay Wells and Tomasz Bednarz. 2021. Explainable ai and reinforcement learning—a systematic review of current approaches and trends. Frontiers in artificial intelligence 4 (2021), 550030.
Yee (2006) Nick Yee. 2006. The psychology of massively multi-user online role-playing games: Motivations, emotional investment, relationships and problematic usage. In Avatars at work and play. Springer, 187–207.
ZeniMax Online Studios (2014) ZeniMax Online Studios. 2014. The Elder Scrolls Online. Game [PC,OSX,PS4,XboxOne]. ZeniMax Online Studios, Hunt Valley, Maryland, USA.
Zhu and El-Nasr (2021) Jichen Zhu and Magy Seif El-Nasr. 2021. Open Player Modeling: Empowering Players through Data Transparency. In Proceedings of AIIDE Workshop on Experimental AI in Games (ExAG ’21).

Appendix A Game Terminology

Table 3. Terms especially used in Guild Wars 2, the genre of MMORPGs or their analytics

atomic	(as in atomic actions): Logs of the game utilized in the data-driven evaluation part are recorded on the
atomic	lowest-level possible, i.e. down to skill usage and character movement on a frame-by-frame logging basis.
boons	Temporary positive effects that increase character stats or yield utility, most often provided by support builds.
buff	Increase in damage, utility or general viability of a profession or build caused by balancing adjustments.
buff	Opposed to nerf.
build	The customizable configuration of equipment, skills and traits of a profession. Most builds target the optimization
build	of power or condition damage output, maximizing utility or hybrid versions of these.
class	The overarching archetype for each character, referred to as profession in Guild Wars 2.
condition	Temporary negative effects that deal damage over time on an enemy or weaken their stats.
condition	Some builds are optimized to deal condition damage in contrast to direct power damage.
damage uptime	The ability of a build (or player) to consistently deliver damage, mainly influenced by factors such as survivability,
	range of skills, freedom of movement while executing skills, adaptability of the ideal rotation to live combat, and
	dependence on other factors such as the size of the enemy’s hitbox, attack delay or movement patterns.
dps	(damage per second): The theoretical or empirical damage a build or player afflicts onto their target(s).
endgame	In Guild Wars 2, PvE endgame content is mainly carried out in instanced dungeons for five players (fractals of the
	mists) or ten players (raids and strike missions). As it features no power creep or item spiral and most players
	follow builds and rotations from community guides, combat logs for single bosses are highly comparable between
	groups and players and differences in efficiency are mainly attributed to the proficiency of players.
log	For the platform used in this work, single bosses or encounters are recorded in atomic detail, capturing the
	full combat replay as well as dps, healing, boon and condition statistics, among others, at every point in time
	for up to ten players (Pfau and Seif El-Nasr, 2023b).
nerf	Decrease in damage, utility or general viability of a profession or build caused by balancing adjustments.
	Opposed to buff.
performance	The quantified outcome of a player at a given situation, e.g. for one boss fight. Mostly expressed as dps values.
power	(as in power damage): Direct damage as opposed to condition damage (damage over time). Some builds are
	optimized to deal power damage.
profession	Guild Wars 2’s notion of character classes. It features nine core professions that can be extended with one of
	three specializations each for more build diversity.
PvE	(Player versus Environment): Guild Wars 2 features single- and collaborative multi-player modes.
	This work focuses on the group-based endgame content.
PvP	(Player versus Player): Guild Wars 2 features small- and large-scale competitive PvP modes. To keep the
	assessment as concise as possible, however, this work focuses on balancing PvE.
rotation	The sequence of skills players execute, often looping for ideal rotations (optimizing dps) within a build.
skill	(as in executable skills): Single actions players activate to deal damage and/or provide utility by pressing
	the corresponding button.
skill	(as in player skill): The proficiency of a player (on a specific build or profession), quantifiable by the amount
	of dps or utility they can provide.
stats	In-game character attributes that influence damage, utility or survivability potentials of a build.
specialization	Each profession can be extended with one of three specializations (27 in total) that affect the mechanics,
	damage and/or utility potentials of a build.
support	As opposed to power or condition damage builds, support roles mainly provide utility.
trait	Customizable passive perks of a build that mostly increase damage or utility potentials, as opposed to
	active customizations (skills).
utility	Beneficial value a build can provide for itself and/or other players in the group (apart from dps), such as healing,
	buffs, movement skills, conditions, crowd-control or the ability to resurrect fallen players.