©2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Modeling the Energy Consumption of the HEVC Software Encoding Process Using Processor Events
Abstract
Developing energy-efficient video encoding algorithms is highly important due to the high processing complexity and, consequently, the high energy demand of the encoding process. To accomplish this, the energy consumption of video encoders must be studied, which is only possible with a complex and dedicated energy measurement setup. This emphasizes the need for simple estimation models that predict the energy required for encoding. Our paper investigates the possibility of estimating the energy demand of an HEVC software CPU-encoding process using processor events. First, we perform energy measurements and obtain processor events using dedicated profiling software. Then, using the measured energy demand of the encoding process and the profiling data, we build an encoding energy estimation model that uses the processor events of the ultrafast encoding preset to estimate the energy of more complex encoding presets with a mean absolute percentage error of 5.36%, averaged over all presets. Additionally, we present an energy model that makes it possible to obtain the energy distribution among various encoding sub-processes.
Index Terms— video coding, energy efficiency, HEVC, x265 presets, energy estimation.
1 Introduction
The convenience of the Internet and mobile devices has increased Internet video traffic lately [1]. In addition, video-focused social networking services are growing, accounting for further video traffic increases, and [1] emphasizes the enormous storage costs, space needs, and increased server-side energy consumption for video content creation. Furthermore, the compression methods used for encoding have evolved immensely in recent years. Modern codecs provide many compression methods, increasing the encoders’ processing complexity [2], resulting in a considerable increase in the energy demand.
For several reasons, research on the energy consumption of video encoders is helpful. Firstly, portable devices are limited in terms of battery capacity. The energy requirements for video encoding are significant, which is a problem for battery-powered devices, whose batteries drain quickly under such loads [3]. Secondly, the total energy consumption of current encoding and decoding systems is globally significant, as online video contributes a notable share of global emissions [4]. According to [4], there is a considerable energy demand both for operating the various digital equipment in the video processing pipeline and for producing such devices. Thirdly, most video-on-demand services use massive server farms in data centers for encoding [5]. A recent study states that such data centers, which predominantly perform CPU encoding, currently account for about 3% of global power consumption, which is expected to exceed 1,000 TWh by 2025 [6]. To enable energy-aware video-based services in modern video communication applications, we need robust and energy-efficient video codecs.
Few works have explicitly addressed the encoder’s processing energy, such as [7], which establishes a relationship between the quantization, the spatial information, and the coding energy of the intra-only High Efficiency Video Coding (HEVC) encoder. However, it does not consider the presets that trade off encoding speed against compression performance. A study on the detailed energy consumption of the x265 encoder is presented in [8]. In [9], energy-rate-quality trade-offs of state-of-the-art video codecs were studied. Furthermore, Mercat et al. measured the energy of a software encoder for different sequences and encoding configurations and presented various energy reduction opportunities [10]. In a carbon-footprint-limited future, the energy efficiency of video coding becomes even more relevant. The complex and laborious nature of energy measurements is a limitation in the search for energy-efficient algorithms. Therefore, we need simple energy estimation models to overcome the drawback of time-consuming measurements. There is extensive literature on estimating the energy demand of the decoding process, such as [11, 12, 13]. However, only a few recent works explicitly address the estimation of the energy demand of the encoding process. For instance, [7] performs encoding energy estimation using the quantization parameter (QP), albeit restricted to all-intra encoding.
Furthermore, [14] introduces a model that uses the encoder processing time to estimate the energy consumption of the x265 encoder. However, the encoding process has to be executed on the target device before the energy can be estimated. Moreover, [14] also introduces a practical and lightweight encoding-time-based energy estimator, which uses the ultrafast (UF) preset’s encoding time to enable prior estimation of the encoding energy demand of the other presets. However, the estimation error is more than 15%, especially for complex presets. Lastly, [15] introduces a bit stream feature-based energy estimator that estimates the encoding energy from compressed video bit stream features (BSFs) obtained with a bit stream analyzer after encoding. Even though this model makes accurate estimations, it is limited to specific applications, such as estimating the energy of crowd-sourced video data, since the bit stream is required for the estimation. To this end, this work explores the feasibility of accurately estimating the HEVC software encoder’s energy demand using processor events (PEs).
To achieve this, we perform energy measurements and obtain the PEs using dedicated open-source profiling software [16], study the relationship between the PEs and the encoding energy, and then assess the feasibility of estimating the HEVC software encoder’s energy demand. Ultimately, we propose two simple models that efficiently estimate the encoding energy, either after (posterior) or before (prior) the encoding process, from certain PEs obtained with existing open-source profiling tools, without requiring direct energy measurements. The models proposed in this work have two practical applications: estimating the energy of CPU encoding, for example, in data centers, and studying the energy distribution of the encoding process, i.e., obtaining the energy of its sub-processes.
The rest of this paper is structured as follows: Section 2 presents the experimental setup used to measure the energy demand of the encoding process, along with the sequences used and the encoding configurations, explains the profiling, and examines the relation between the PEs and the energy consumed during the encoding process. Section 3 introduces the proposed encoding energy estimation models. Section 4 introduces the evaluation method, discusses the results, and presents an exemplary application of the proposed models. Lastly, Section 5 concludes this work.
2 Experimental Setup and Analysis
Our work uses the x265 encoder implementation [17] on an Intel Xeon processor to perform multi-core encoding. We encode the first eight frames of each sequence at the x265 presets ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, and veryslow, and at the Constant Rate Factor (CRF) values 18, 23, 28, and 33. In contrast to the previous works [14] and [15], the x265 encoder used in this work is built with the Netwide Assembler (NASM), i.e., it uses Single Instruction Multiple Data (SIMD) instructions.
We consider 22 sequences from the JVET common test conditions [18] with various characteristics such as frame rate, resolution, and content. Ultimately, we generated 792 bit streams and used them to evaluate the models. In addition, we record the QP, the encoding time, and the bit stream features to enable a comparison with the estimation models from the literature [7], [14], and [15]. As in [13], we describe the energy demand of the encoding process using two consecutive measurements. First, the total energy consumed during the encoding process is described as follows:
$$E_{\mathrm{total}} = \int_{0}^{T_{\mathrm{enc}}} P_{\mathrm{total}}(t)\,\mathrm{d}t, \quad (1)$$

where $T_{\mathrm{enc}}$ is the duration of the encoding process and $P_{\mathrm{total}}$ is the total power consumed while encoding. Then, the energy consumed in idle mode over the same encoding duration is described as follows:

$$E_{\mathrm{idle}} = \int_{0}^{T_{\mathrm{enc}}} P_{\mathrm{idle}}(t)\,\mathrm{d}t, \quad (2)$$

where $P_{\mathrm{idle}}$ is the power consumed by the device in idle mode. Ultimately, the encoding energy is the difference between these two measurements, $E_{\mathrm{enc}} = E_{\mathrm{total}} - E_{\mathrm{idle}}$. In this work, we used a desktop PC with an Intel Xeon CPU with 16 cores and CentOS 8 as the operating system (OS). We employed the power meter integrated in Intel CPUs, the Running Average Power Limit (RAPL) interface [19], which directly returns the aggregated energy values $E_{\mathrm{total}}$ and $E_{\mathrm{idle}}$. Additionally, we perform the confidence interval test proposed in [20] to ensure the statistical significance of the measured encoding energies, as in [14].
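To make the two-step measurement concrete, the following minimal Python sketch reads the package energy counter via the Linux powercap interface to RAPL. The paper only states that RAPL is used; the sysfs path, the helper names, and the x265 command line are illustrative assumptions, and counter wrap-around is ignored for brevity.

```python
# Minimal sketch of the measurements in (1) and (2), assuming the Linux
# powercap sysfs interface to RAPL (an assumption; the paper does not state
# how RAPL is accessed). Paths and the encoder command line are illustrative.
import subprocess
import time

RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package-0 domain

def read_rapl_uj() -> int:
    """Read the cumulative package energy counter in microjoules."""
    with open(RAPL_ENERGY) as f:
        return int(f.read())

def measure_encoding_energy(encode_cmd) -> float:
    """Return the net encoding energy E_enc = E_total - E_idle in joules."""
    # First measurement: energy consumed while the encoder runs (Eq. 1).
    t0, e0 = time.time(), read_rapl_uj()
    subprocess.run(encode_cmd, check=True)
    t1, e1 = time.time(), read_rapl_uj()
    e_total = (e1 - e0) * 1e-6           # microjoules -> joules
    duration = t1 - t0                    # T_enc

    # Second measurement: idle energy over the same duration (Eq. 2).
    e2 = read_rapl_uj()
    time.sleep(duration)
    e3 = read_rapl_uj()
    e_idle = (e3 - e2) * 1e-6

    return e_total - e_idle               # E_enc

# Hypothetical call (sequence name and parameters are placeholders):
# measure_encoding_energy(["x265", "--input", "seq.yuv", "--input-res",
#                          "1920x1080", "--fps", "50", "--frames", "8",
#                          "--preset", "medium", "--crf", "23", "-o", "out.265"])
```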
Table 1: Recorded processor events (PEs) and their Pearson correlation coefficient (PCC) with the measured encoding energy.

ID | PEs | PCC |
---|---|---|
1 | I cache reads, Ir | 0.996 |
2 | D cache reads, Dr | 0.995 |
3 | D cache writes, Dw | 0.994 |
4 | I1 cache read misses, I1mr | 0.985 |
5 | D1 cache read misses, D1mr | 0.977 |
6 | D1 cache write misses, D1mw | 0.970 |
7 | LL cache instruction read misses, ILmr | 0.710 |
8 | LL cache data read misses, DLmr | 0.664 |
9 | LL cache data write misses, DLmw | 0.702 |
10 | Conditional branches executed, Bc | 0.994 |
11 | Conditional branches mispredicted, Bcm | 0.982 |
12 | Indirect branches executed, Bi | 0.986 |
13 | Indirect branches mispredicted, Bim | 0.980 |
Furthermore, we employed the dedicated open-source profiling software Valgrind to obtain the PEs [16]. Valgrind analyzes any desired process for instructions, memory accesses, memory leaks, cache misses, and other processor events. In this work, we profile the HEVC encoding process with the Cachegrind tool of the Valgrind framework [21]. Cachegrind simulates the encoding process’s interaction with the machine’s cache hierarchy: independent first-level instruction and data caches (I1 and D1) backed by a second-level cache (L2) [21]. Some modern machines have three or four levels of cache, and Cachegrind can auto-detect the cache configuration for such machines. Hence, Cachegrind simulates the first-level (I1 and D1) and last-level (LL) caches. This is because the last-level cache influences the runtime, as it masks accesses to the main memory. Also, the L1 caches often have low associativity, so simulating them can detect cases where a process interacts badly with the cache. Even though the application of this profiling tool is straightforward, it slows down the original process by a factor of 5, depending on the encoding preset. Therefore, for the energy measurements, we used the encoding process without profiling. All the recorded PEs are tabulated in Table 1.
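As an illustration of the profiling step, the sketch below runs the encoder under Cachegrind and reads the global event totals from the "events:" and "summary:" lines of the output file. The Valgrind options shown are documented Cachegrind flags; the x265 command line, file names, and helper names are illustrative assumptions rather than the exact configuration used in this work.

```python
# Sketch: collect the PEs of Table 1 with Cachegrind. Cache and branch
# simulation are enabled explicitly; the command line is an assumption.
import subprocess

def profile_encoding(encode_cmd, out_file="cachegrind.out.x265"):
    """Run the encoder under Cachegrind and return a dict of event totals."""
    subprocess.run(
        ["valgrind", "--tool=cachegrind",
         "--cache-sim=yes",       # I1/D1/LL cache events
         "--branch-sim=yes",      # Bc/Bcm/Bi/Bim branch events
         f"--cachegrind-out-file={out_file}"] + encode_cmd,
        check=True)

    events, totals = [], []
    with open(out_file) as f:
        for line in f:
            if line.startswith("events:"):
                events = line.split()[1:]                 # event names
            elif line.startswith("summary:"):
                totals = [int(x) for x in line.split()[1:]]  # global totals
    return dict(zip(events, totals))    # e.g. {"Ir": ..., "Dr": ..., "Bc": ...}

# pe_counts = profile_encoding(["x265", "--preset", "ultrafast", "--crf", "23",
#                               "--frames", "8", "--input", "seq.yuv",
#                               "--input-res", "1920x1080", "--fps", "50",
#                               "-o", "out.265"])
```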
To quantify the relationship between the PEs and the encoding energy, we calculate the Pearson correlation coefficient (PCC) as defined in [20]. The correlations between the encoding energy and all considered PEs are listed in Table 1. We observe that the encoding energy correlates strongly with most of the PEs, except for the last-level cache misses, which show a low correlation. In addition, Table 1 shows that the highest correlation is obtained for the instruction reads (Ir), followed by the data reads (Dr), data writes (Dw), conditional branches executed (Bc), indirect branches executed (Bi), conditional branches mispredicted (Bcm), and indirect branches mispredicted (Bim).
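The correlation analysis can be reproduced along the following lines, assuming the measured encoding energies and the PE counts of all bit streams are available as arrays; the names and shapes are illustrative placeholders.

```python
# Sketch: PCC between each processor event and the measured encoding energy.
import numpy as np
from scipy.stats import pearsonr

def pe_energy_correlations(pe_counts, energies, pe_names):
    """pe_counts: (num_bitstreams, num_PEs) array; energies: (num_bitstreams,)."""
    return {name: pearsonr(pe_counts[:, i], energies)[0]   # PCC per event
            for i, name in enumerate(pe_names)}
```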
3 Proposed Encoding Energy Estimation Models
Based on the above observations, we propose a PE-based posterior estimation model, i.e., we estimate the encoder processing energy after the encoding process. The model exploits the linear relationship between encoding energy and PEs, such that the energy can be estimated by
$$\hat{E}_{\mathrm{enc}} = \sum_{i} c_i \cdot n_i, \quad (3)$$

where $i$ denotes the processor event ID as listed in Table 1, and $n_i$ is the number of occurrences of the respective processor event (PE). The parameter $c_i$ can be interpreted as a constant mean energy required to execute the respective PE. It must be noted that these estimates can only be obtained after the encoding process has been executed once, as the encoding process needs to be profiled by Valgrind. Therefore, we adjust model (3) to allow prior estimation, i.e., energy estimation without executing the encoder at the target preset. In this context, we interpret encoding at the ultrafast preset as a preprocessing step for the slower presets and investigate modeling the encoding energies of all presets from the ultrafast preset’s PEs, as the ultrafast PEs are considerably less costly to obtain than those of the other presets. When we replace the PEs of the respective presets with the PEs of the ultrafast preset, we observe a similar linear behavior. Therefore, we adapt the model in (3) to enable prior estimation of the energy consumption of the encoding process using the PEs of the ultrafast preset, yielding the UF PE-based prior estimation model:

$$\hat{E}_{\mathrm{enc},p} = \sum_{i} c_{p,i} \cdot n_{\mathrm{UF},i}, \quad (4)$$

where $i$ denotes the processor event ID as listed in Table 1, and $n_{\mathrm{UF},i}$ is the number of occurrences of the respective ultrafast PE. The $c_{p,i}$ are the preset-dependent model parameters, which are later obtained by training as explained in Section 4.
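Both models reduce to a weighted sum of event counts. A minimal sketch follows, with NumPy arrays as an assumed data layout; the function name is illustrative.

```python
# Sketch of models (3) and (4): estimated energy as the dot product of the
# PE counts n_i and the trained per-event energies c_i. For model (4), the
# counts are the ultrafast PEs and the parameters are the preset-specific c_{p,i}.
import numpy as np

def estimate_energy(event_counts: np.ndarray, params: np.ndarray) -> float:
    """Return the estimated encoding energy in joules."""
    return float(np.dot(params, event_counts))
```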
Depending on the operating system, the CPU architecture, and the encoding algorithm, the energy consumed during the encoding process and the PEs may vary. However, we have a strong indication that the linear relationship between the energy demand of the encoding process and the PEs remains valid. Therefore, the proposed models can be extended to different operating systems, CPU architectures, and encoding algorithms with differing model parameters. This claim is supported by related work on decoding energy [22], [23], which showed that this kind of model is valid for different operating systems, CPU architectures, and codecs.
4 Evaluation
We evaluate the proposed models using the mean absolute percentage error (MAPE), as we strive to estimate the encoding energy accurately independent of the absolute energy, which varies by several orders of magnitude. Thus, we calculate the absolute percentage error between the measured and the estimated encoding energy for each single bit stream $b$, i.e., each bit stream corresponds to a single input sequence coded at a specific CRF, and each preset $p$. Then, we calculate the mean absolute percentage error for each preset over the $B$ bit streams to obtain the overall estimation error per preset as follows:
$$\mathrm{MAPE}_p = \frac{1}{B}\sum_{b=1}^{B} \left|\varepsilon_{p,b}\right|, \quad (5)$$

where $\varepsilon_{p,b}$ is the percentage error for a given preset $p$ and bit stream $b$, given as

$$\varepsilon_{p,b} = \frac{\hat{E}_{p,b} - E_{p,b}}{E_{p,b}} \cdot 100\,\%, \quad (6)$$

where $\hat{E}_{p,b}$ is the estimated and $E_{p,b}$ the measured encoding energy. To determine the model parameters for each preset, we perform a least-squares fit using a trust-region-reflective algorithm as presented in [24]. We use the measured energies of a subset of the sequences, referred to as the training set, and their corresponding input variables, which are the PEs. As a result, we obtain the least-squares optimal parameters for the training set, where we train the parameters such that the mean relative error is minimized. Ultimately, these model parameters are used to validate the model’s accuracy on the remaining validation sequences. The training and validation sets are determined using ten-fold cross-validation as proposed in [25].
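A sketch of this training procedure is given below: a trust-region-reflective least-squares fit that minimizes the relative error, wrapped in ten-fold cross-validation. The use of SciPy and scikit-learn here is an assumption for illustration; the paper only specifies the algorithm of [24] and the cross-validation of [25].

```python
# Sketch: fit the per-event energies c_i by minimizing relative errors with a
# trust-region-reflective solver, then cross-validate. Library choices and the
# initial guess are assumptions.
import numpy as np
from scipy.optimize import least_squares
from sklearn.model_selection import KFold

def fit_parameters(pe_counts: np.ndarray, energies: np.ndarray) -> np.ndarray:
    """pe_counts: (num_bitstreams, num_PEs); energies: measured E_enc in joules."""
    def relative_residuals(c):
        return (pe_counts @ c - energies) / energies
    x0 = np.full(pe_counts.shape[1], 1e-9)          # tiny positive J/event start
    res = least_squares(relative_residuals, x0,
                        bounds=(0, np.inf), method="trf")  # trust-region-reflective
    return res.x

def cross_validate(pe_counts, energies, folds=10):
    """Average MAPE over the validation folds of a ten-fold cross-validation."""
    errors = []
    for train, val in KFold(n_splits=folds, shuffle=True).split(pe_counts):
        c = fit_parameters(pe_counts[train], energies[train])
        eps = 100.0 * np.abs(pe_counts[val] @ c - energies[val]) / energies[val]
        errors.append(eps.mean())
    return float(np.mean(errors))
```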
Table 2: MAPE of the posterior estimation models and 95% confidence intervals (CI) of the percentage errors of the proposed PE-based model.

Encoder preset | Time-based [14], MAPE (%) | BSF-based [15], MAPE (%) | Proposed PE-based, MAPE (%) | CI, proposed (%) |
---|---|---|---|---|
ultrafast | 5.96 | 5.05 | 4.38 | [-1.51, 0.84] |
superfast | 4.23 | 5.03 | 4.65 | [-1.57, 1.11] |
veryfast | 5.38 | 6.06 | 4.40 | [-1.28, 1.16] |
faster | 5.19 | 6.37 | 4.47 | [-1.25, 1.22] |
fast | 5.16 | 6.54 | 4.54 | [-1.25, 1.21] |
medium | 4.62 | 7.19 | 4.66 | [-1.46, 1.29] |
slow | 3.97 | 7.70 | 6.07 | [-1.99, 1.43] |
slower | 4.35 | 7.75 | 7.31 | [-2.33, 1.59] |
veryslow | 5.14 | 5.68 | 7.90 | [-2.13, 2.13] |
average | 4.89 | 6.37 | 5.37 | - |
Furthermore, to assess the performance of the proposed energy estimation models, we use a confidence interval analysis on the estimation errors, i.e., the percentage errors. To this end, we obtain the confidence interval for a 95% confidence level, with $\mu_p$ being the mean and $\sigma_p$ the standard deviation of the percentage errors of all $B$ bit streams associated with preset $p$. Then, for each preset $p$ and its $B$ bit streams, the confidence interval is defined as follows:

$$\mathrm{CI}_p = \left[\mu_p - z\,\frac{\sigma_p}{\sqrt{B}},\;\; \mu_p + z\,\frac{\sigma_p}{\sqrt{B}}\right], \quad (7)$$

where $z$ is the z-score corresponding to the chosen confidence level, which is 1.96 for a 95% confidence level. In Table 2 and Table 3, we report the MAPE for each preset, the average MAPE over all presets, and the confidence intervals of the percentage errors for all presets of the proposed models. In the subsequent subsections, we evaluate the posterior and prior estimation models, followed by a discussion of their practical applications.
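For reference, the evaluation metrics (5)–(7) can be computed for one preset as follows; the array names are placeholders, and the sign convention follows the definition of (6) above.

```python
# Sketch: per-preset MAPE (5) and 95% confidence interval (7) of the
# percentage errors (6), given measured and estimated energies.
import numpy as np

def mape_and_ci(e_est: np.ndarray, e_meas: np.ndarray, z: float = 1.96):
    """e_est, e_meas: energies of the B bit streams of one preset."""
    eps = 100.0 * (e_est - e_meas) / e_meas          # percentage errors (6)
    mape = np.mean(np.abs(eps))                      # MAPE_p (5)
    mu, sigma = eps.mean(), eps.std(ddof=1)
    half = z * sigma / np.sqrt(len(eps))             # CI half-width (7)
    return mape, (mu - half, mu + half)
```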
4.1 Posterior Estimation Models
Firstly, we evaluate the proposed PE-based posterior estimation model (3) in comparison with the models from the literature [14] and [15]. The PE-based posterior estimation model achieves a MAPE of less than 10% for all presets. In addition, the confidence intervals of the percentage errors include zero, suggesting that the proposed model’s estimations are unbiased, i.e., the model does not systematically underestimate or overestimate the encoding energy. The MAPE values in Table 2 show that the proposed PE-based posterior estimation model outperforms the BSF model, but outperforms the time-based posterior estimation model only for the faster presets. Overall, the PE-based and time-based posterior estimation models perform best. However, the time-based posterior estimation model is generally preferable because of its lower complexity, as the profiling makes the runtime of the PE-based posterior estimation model longer than that of the time-based model. Nevertheless, the PE-based posterior estimation model can be used to investigate the energy consumption of the various encoding sub-processes in detail, which yields function-specific energy demands that are practically impossible to measure directly.
The proposed PE-based posterior estimation model has a significant application in analyzing the distribution of the energy demand among different encoding sub-processes. By employing the proposed PE-based posterior estimation model (3), we can obtain the PE-specific energies $c_i$ of the encoding process from the corresponding processor events and the encoding energy. Furthermore, Valgrind not only provides global profiling results but also profiles individual encoding sub-processes. Consequently, by combining these sub-process-specific PEs with the PE-specific energies $c_i$, we can achieve a comprehensive understanding of the energy requirements across the different encoding sub-processes. Figure 1 illustrates the energy distribution of encoding a single frame of the class B sequence "BasketballDrive" using the x265 encoder at the medium preset and CRF values of 18, 23, 28, and 33.
Further analysis of the energy distribution of intra-coding, illustrated in Figure 1, reveals that the intra-mode search constitutes the most energy-intensive aspect, contributing between 28.5% and 33.19% of the total energy demand across the CRF range from 18 to 33. Following closely, intra-prediction accounts for 23.86% to 30.42% of the total energy demand. Subsequently, entropy coding contributes 20.6% of the total energy demand at CRF 18. In addition, its contribution decreases with increased CRF values, as the number of coefficients to be entropy-coded decreases with higher CRF values. Then, the quantization and transform coding consume 15% and 10% of total encoding energy for CRF 18 and 33, respectively. Energy consumption by in-loop filters ranges from 4.62% to 6.11% of the total energy demand across the CRF range of 18 to 33. Additionally, global initialization, as well as CTU and CU level preprocessing, on average, account for approximately 4%, 3%, and 1.3% of the total energy consumption, respectively.
In summary, using (3) to obtain the energy distribution of the encoding process offers a versatile set of applications. It can help identify the major energy contributors in the encoding process, compare the energy distribution of the encoding process across different encoding presets, and compare the differences in energy distribution across different sequences.
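The sub-process analysis described above amounts to combining the trained per-event energies $c_i$ with sub-process-specific event counts. A minimal sketch follows, assuming the per-sub-process PE counts have already been extracted from Cachegrind's function-level output (e.g., with cg_annotate); the data layout and names are illustrative.

```python
# Sketch: energy share of each encoding sub-process, obtained by applying the
# per-event energies c_i of model (3) to sub-process-specific PE counts.
import numpy as np

def subprocess_energy_shares(subprocess_pes, params: np.ndarray):
    """subprocess_pes: {name: PE-count vector}; params: trained c_i (J/event)."""
    energies = {name: float(np.dot(params, counts))
                for name, counts in subprocess_pes.items()}
    total = sum(energies.values())
    return {name: 100.0 * e / total for name, e in energies.items()}  # percent
```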
Table 3: MAPE of the prior estimation models and 95% confidence intervals (CI) of the percentage errors of the proposed UF PE-based model.

Encoder preset | QP-based [7], MAPE (%) | UF time-based [14], MAPE (%) | Proposed UF PE-based, MAPE (%) | CI, proposed (%) |
---|---|---|---|---|
ultrafast | 17.98 | - | - | - |
superfast | 22.30 | 5.68 | 3.89 | [-1.30, 0.97] |
veryfast | 21.00 | 6.90 | 4.70 | [-1.51, 1.06] |
faster | 21.54 | 6.82 | 4.64 | [-1.50, 1.05] |
fast | 19.34 | 5.83 | 4.61 | [-1.49, 1.08] |
medium | 23.67 | 10.14 | 5.34 | [-1.71, 1.12] |
slow | 24.02 | 13.35 | 6.44 | [-2.07, 1.28] |
slower | 25.81 | 24.96 | 6.94 | [-2.62, 0.96] |
veryslow | 29.17 | 28.82 | 7.33 | [-2.71, 1.02] |
average | 22.65 | 12.05 | 5.36 | - |
4.2 Prior Estimation Models
Secondly, we evaluate the proposed UF PE-based prior estimation model, which allows estimating the encoding energy without performing the actual encoding for presets slower than ultrafast. The proposed UF PE-based estimation model (4) yields an average MAPE of 5.36% and, when the preset is known, outperforms the QP-based model and the UF time-based model from the literature, which yield MAPEs of 22.65% and 12.05%, respectively. Additionally, the confidence intervals of the percentage errors of the proposed model, as shown in Table 3, indicate that its estimations are unbiased, similar to the observation from Table 2. For practical estimation, the proposed UF PE-based prior estimation model is preferable to the PE-based posterior estimation model: unlike the posterior model, this prior estimator only requires the PEs of the lightweight UF preset, which can be obtained with considerably less preprocessing overhead than the PEs of the other presets. Therefore, such a model could be used for practical energy estimations. While the PE-based prior estimation model yields accurate estimations, the time-based prior estimation model offers lower computational complexity. Specifically, the runtime of the UF PE-based prior estimation model is five times longer than that of the UF time-based counterpart. Nonetheless, for a precise estimation of the encoding energy consumption, the UF PE-based prior estimation model proves most beneficial, particularly for slower presets such as medium, slow, slower, and veryslow.
The proposed UF PE-based prior estimation model has a significant application in estimating the energy demand of the encoding process more accurately than the existing models from the literature. In Figure 2, the encoding energy is illustrated: blue bars represent measured values, orange bars represent estimates obtained with the UF time-based prior estimation model [14], and yellow bars represent estimates obtained with the UF PE-based prior estimation model. The measured energies and estimates are provided for the x265 presets ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, and veryslow, numbered 1 through 9, respectively. The analysis of the measured encoding energy reveals the relative differences in energy demand between presets: superfast consumes 75% more energy than ultrafast; veryfast consumes 27% more energy than superfast; faster requires 0.1% more energy than veryfast; fast demands 7% more energy than faster; medium requires 15% more energy than fast; slow consumes 102% more energy than medium; slower consumes 213% more energy than slow; and veryslow requires 52% more energy than slower. The data depicted in Figure 2 illustrate that both the UF time-based prior estimation model [14] and the UF PE-based prior estimation model offer precise estimates for the presets ranging from ultrafast to fast. However, for slower presets such as medium, slow, slower, and veryslow, the UF PE-based prior estimation model outperforms the UF time-based model, providing more accurate estimations. Consequently, for encoding energy estimates, the UF time-based estimation model suffices for faster presets such as ultrafast, superfast, veryfast, faster, and fast, whereas the proposed prior estimation model proves beneficial for slower presets.
5 Conclusion
Energy measurements are pivotal in developing energy-efficient video coding algorithms. Nonetheless, such measurements are complex and costly. Thus, we need valid and simple energy estimation models. This work demonstrates that, for the HEVC software encoding process, a lightweight model based on the processor events of the ultrafast preset provides an accurate prior estimation of the encoding energy, exhibiting a mean absolute percentage error of 5.36%. Moreover, this work also presents a PE-based posterior modeling approach, which yields the sub-process-specific energies and thereby aids in analyzing the energy demand of the encoding sub-processes. In addition, we presented an exemplary energy consumption analysis of the HEVC encoding process on a functional level using the PE-based posterior model. In the future, we plan to extend the proposed models to other encoders and to deepen the functional-level energy consumption analysis of the encoding process, e.g., for motion estimation and compensation.
References
- [1] Cisco Systems, Inc., “Cisco Annual Internet Report (2018-2023),” Tech. Rep., Cisco Systems, Inc., 2020.
- [2] F. Bossen, K. Sühring, A. Wieckowski, and S. Liu, “VVC complexity and software implementation analysis,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3765–3778, 2021.
- [3] Y. O. Sharrab and N. J. Sarhan, “Aggregate power consumption modeling of live video streaming systems,” in ACM Multimedia Systems Conference, New York, NY, USA, 2013, p. 60–71.
- [4] The Shift Project, “Climate crisis: The unsustainable use of online video,” Tech. Rep., (The Shift Project), 2019.
- [5] C. Herglotz, M. Kränzler, R. Schober, and A. Kaup, “Sweet streams are made of this: The system engineer’s view on energy efficiency in video communications,” 2022.
- [6] Huawei Technologies Co., Ltd., “Huawei Releases Top 10 Trends of Data Center Facility in 2025,” Tech. Rep., 2020.
- [7] R. Rodríguez-Sánchez, M. T. Alonso, J. L. Martínez, R. Mayo, and E. S. Quintana-Ortí, “Time and energy modeling of an intra-only HEVC encoder,” in Proc. Visual Communications and Image Processing (VCIP), Singapore, Dec. 2015, pp. 1–4.
- [8] D. Silveira, M. Porto, and S. Bampi, “Performance and energy consumption analysis of the x265 video encoder,” in Proc. European Signal Processing Conference (EUSIPCO), 2017.
- [9] A. Katsenou, J. Mao, and I. Mavromatis, “Energy-rate-quality tradeoffs of state-of-the-art video codecs,” in Proc. Picture Coding Symposium (PCS), 2022, pp. 265–269.
- [10] A. Mercat, F. Arrestier, W. Hamidouche, M. Pelcat, and D. Menard, “Energy reduction opportunities in an HEVC real-time encoder,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, USA, March 2017, pp. 1158–1162.
- [11] X. Li, Z. Ma, and F. C. A. Fernandes, “Modeling power consumption for video decoding on mobile platform and its application to power-rate constrained streaming,” in Proc. Visual Communications and Image Processing (VCIP), San Diego, USA, Nov. 2012.
- [12] P. Raoufi and J. Peters, “Energy-efficient wireless video streaming with H.264 coding,” in Proc. IEEE International Conference on Multimedia and Expo Workshops (ICMEW), San Jose, USA, July 2013, pp. 1–6.
- [13] C. Herglotz, D. Springer, M. Reichenbach, B. Stabernack, and A. Kaup, “Modeling the energy consumption of the HEVC decoding process,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 1, pp. 217–229, Jan. 2018.
- [14] G. Ramasubbu, A. Kaup, and C. Herglotz, “Modeling the HEVC encoding energy using the encoder processing time,” in Proc. IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 2022.
- [15] G. Ramasubbu, A. Kaup, and C. Herglotz, “A bit stream feature-based energy estimator for HEVC software encoding,” in Proc. Picture Coding Symposium (PCS), San Jose, USA, 2022.
- [16] “Valgrind instrumentation framework,” http://valgrind.org/.
- [17] “x265: H.265 / HEVC video encoder application library,” https://www.videolan.org/developers/x265.html.
- [18] F. Bossen, J. Boyce, X. Li, V. Seregin, and K. Sühring, “JVET common test conditions and software reference configurations for SDR video,” AHG Report, JVET-N1010, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Jan 2017.
- [19] K. N. Khan, M. Hirki, T. Niemi, J. K. Nurminen, and Z. Ou, “RAPL in action: Experiences in using RAPL for power measurements,” ACM Trans. Model. Perform. Eval. Comput. Syst., vol. 3, no. 2, Mar. 2018.
- [20] J.S. Bendat and A.G. Piersol, Random Data: Analysis and Measurement Procedures, John Wiley & Sons, Inc., 1971.
- [21] J. Weidendorfer, M. Kowarschik, and C. Trinitis, “A tool suite for simulation based analysis of memory access behavior,” in Computational Science - ICCS 2004, Berlin, Heidelberg, 2004, pp. 440–447, Springer Berlin Heidelberg.
- [22] C. Herglotz and A. Kaup, “Video decoding energy estimation using processor events,” in Proc. IEEE International Conference on Image Processing (ICIP), Beijing, China, Sep. 2017.
- [23] M. Kränzler, A. Kaup, and C. Herglotz, “Estimating software and hardware video decoder energy using software decoder profiling,” in 2023 36th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI), 2023.
- [24] T.F. Coleman and Y. Li, “An interior trust region approach for nonlinear minimization subject to bounds,” SIAM Journal on optimization, vol. 6, no. 2, pp. 418–445, 1996.
- [25] M. Zaki and W. Meira, Data Mining and Analysis, Cambridge Univ. Press, 1st edition, 2014.