
A Novel Reversible Data Hiding Scheme Based on Asymmetric Numeral Systems

Na Wang, Chuan Qin, Sian-Jheng Lin

Na Wang and Chuan Qin are with the School of Optoelectronic Information and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China. E-mail: wna@usst.edu.cn; qin@usst.edu.cn. Sian-Jheng Lin is with the Theory Laboratory, Huawei Technologies Company Ltd., Hong Kong, China. E-mail: lin.sian.jheng1@huawei.com.
Abstract

Reversible data hiding (RDH) has been extensively studied in the field of information security. In our previous work [1], an explicit implementation approaching the rate-distortion bound of RDH was proposed. However, two challenges remain in that method. First, it suffers from a computing precision problem due to the use of arithmetic coding, which may make further embedding impossible. Second, it has to transmit the probability distribution of the host signals during the embedding/extraction process, incurring considerable additional overhead and limiting its applications. In this paper, we first propose an RDH scheme that employs our recent asymmetric numeral systems (ANS) variant as the underlying coding framework to avoid the computing precision problem. Then, we give a dynamic implementation that does not require transmitting the host distribution in advance. The simulation results show that the proposed static method provides slightly higher peak signal-to-noise ratio (PSNR) values than our previous work, and larger embedding capacity than some state-of-the-art methods on gray-scale images. In addition, the proposed dynamic method entirely saves the explicit transmission of the host distribution and achieves data embedding at the cost of a small image quality loss.

Index Terms:
Reversible data hiding, Gray-scale signals, Embedding capacity, PSNR, Asymmetric numeral systems.

I Introduction

Reversible data hiding (RDH) aims at the secure transmission of secret data through communication media. This is achieved by embedding the secret data into the host media, with the property that the receiver can losslessly reconstruct both the host media and the secret data from the received steganographic (abbreviated as stego) media. The media used to embed data is defined as the host media, including image, text file, audio, video and multimedia, and the media that has the secret data embedded is called the stego media. RDH is widely adopted in medical imagery [2], image authentication [3] and law forensics, where the host media is so precious that it must not be damaged.

In the past two decades, many RDH works have emerged, and they are mainly categorized into the plain domain and the encrypted domain. Plain-domain RDH approaches are designed to achieve high embedding capacity and low distortion without any protection of the host media, where embedding capacity is defined as the ratio between the number of secret data bits and the number of host signals. The purpose of RDH in the encrypted domain is to embed additional data into the encrypted media without revealing the plaintext data. This paper focuses on RDH methods in the plain domain. Popular plain-domain works are roughly categorized as i) difference expansion (DE) [4, 5, 6, 7], ii) histogram shifting (HS) [8, 9, 10, 11] and iii) lossless compression [12, 13, 14, 15]. In the DE-based RDH methods, the differences of each pixel group are expanded, e.g., multiplied by 2, so that the least significant bits (LSBs) of the differences are all zero and can be used to embed data. In [5], Tian introduced the first difference expansion method based on an integer transform, but it suffers from low embedding capacity, and its overhead cost was almost double the amount of hidden data. To remedy this problem, A.M. Alattar [16] gave a generalized integer transform that considers pixel blocks of arbitrary size instead of pixel pairs to improve the embedding capacity. However, this method is applicable only to color images.

A histogram is a graphical representation of data points. The elements with the maximum and minimum numbers of occurrences in the set of data points are called the peak point and the zero point, respectively. Histogram shifting (HS) is an effective method for RDH; it improves the embedding capacity by exploiting the peak pixels. When images are selected as the host media, the peak signal-to-noise ratio (PSNR) is often used as a metric to measure the performance of an RDH scheme: the higher the PSNR, the better the stego image quality. In 2006, Ni et al. [17] chose the bins with the highest frequencies in the histogram to embed the secret bits. In this method, each pixel value is modified by at most 1, so the PSNR of the stego image is guaranteed. Yet, this approach suffers from the issue of multiple zero points, which requires storage for the coordinates of the zero points. In [18], Fallahpour et al. improved the embedding capacity of Ni et al.'s method by applying HS to blocks instead of the whole image. However, this technique bears zero- and peak-point information that needs to be transmitted to the receiver for data retrieval. Tai et al. [19] presented an RDH scheme based on histogram modification. In this method, a binary tree structure was exploited to solve the issue of communicating multiple pairs of peak points. But its pure embedding capacity unexpectedly decreases with increasing tree level, and the overhead information increases as well. In [20], Ou et al. proposed an RDH method based on multiple histogram modification to achieve high embedding capacity, using multiple pairs of bins for every prediction-error histogram. However, a location map is necessary as overhead information to preserve reversibility, and it can be large even after compression.

The lossless compression algorithm is also extensively explored in RDH techniques. The main idea is to create space for hiding additional data by performing lossless compression on the host media. Celik [12] introduced a generalized least significant bit (G-LSB) method that enhances the compression effectiveness by using arithmetic coding. In this method, the embedding capacity is enhanced as higher-level pixels are considered; however, using a higher level also increases the distortion of the host media. In [21], Manikandan employed run-length encoding and a modified Elias gamma encoding to embed additional bits during the encoding process itself. A natural question is the upper limit of the embedding capacity for a given host media and distortion constraint. This problem was solved by Kalker and Willems [22] for an independent and identically distributed (i.i.d.) host sequence. In [22], they formulated the rate-distortion function, i.e., the upper bound $R_{max}$ of the embedding rate under a given distortion constraint $\Delta$, as follows:

$R_{max}=\text{maximize}\left\{H(Y)\right\}-H(X),$ (1)

where $X$ and $Y$ denote the random variables of the host media and the stego media, respectively, and $H(\cdot)$ is the Shannon entropy of a discrete random variable. The entropy is maximized over all transition probabilities $P_{Y|X}(y|s)$ satisfying the distortion constraint

$\sum_{s\in X,\,y\in Y}P_{X}(s)P_{Y|X}(y|s)D(s,y)\leq\Delta,$ (2)

where $P_X(s)$ is the probability mass function (pmf) of the host sequence $X$, and the distortion metric $D(s,y)$ is usually defined as the square error distortion, i.e., $D(s,y)=(s-y)^2$. Therefore, to evaluate the embedding capacity of an RDH scheme, the optimal transition probability (OTP) $P_{Y|X}(y|s)$ needs to be calculated. For some specific distortion metrics $D(s,y)$, such as the square error distortion $D(s,y)=(s-y)^2$ or the L1-norm $D(s,y)=|s-y|$, it has been proved that the OTP has the Non-Crossing-Edges (NCE) property [23]. Based on the NCE property, our previous work [1] found that the optimal solution for $P_{Y|X}(y|s)$ can be derived from the analysis of the pmfs $P_X(s)$ and $P_Y(y)$. Because $P_X(s)$ is available from the given host sequence $X$, the encoder and decoder only need to calculate $P_Y(y)$ for a distortion metric that satisfies the NCE property.
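As a quick sanity check on (1) and (2), consider the extreme case where the constraint $\Delta$ is loose enough to be inactive (this matches the $\alpha=1$ case of the BFI algorithm in Section II-B1): $H(Y)$ is then maximized by the uniform pmf $P_Y(y)=1/B$, and the bound reduces to

$R_{max}=\log_2 B-H(X).$

For instance, for gray-scale signals with $B=256$ and a host entropy of $H(X)=6$ bits per signal, at most $8-6=2$ bits per signal can be embedded.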

In our previous work [1], we proposed a backward and forward iterative (BFI) algorithm to estimate the optimal pmf $P_Y(y)$. We also gave an arithmetic coding [24] based scheme that can approach the rate-distortion bound (1). Later, Hu et al. [25] introduced a fast algorithm to estimate the $P_Y(y)$ of our previous work. However, two challenges remain unresolved in [1]. First, we adopted arithmetic coding as the underlying coding framework in [1] to embed a message into the i.i.d. host sequence. In arithmetic coding, there is a limit to the precision of the numbers that can be encoded, which constrains the number of symbols encodable within a codeword. As a result, our previous work [1] suffers from a computing precision problem in some special cases: if the coding interval in the arithmetic coding narrows to zero, further data embedding becomes impossible. Second, both the embedding and extraction processes of [1], [25] require transmitting the host pmf $P_X(s)$ as a parameter (for the purpose of estimating $P_Y(y)$), which incurs considerable additional overhead proportional to the alphabet size. This is usually unacceptable in extremely memory-constrained scenarios and imposes certain limitations in practical applications.

To fully address the above two challenges left in our previous work [1], we first propose a code construction scheme that employs our recent asymmetric numeral systems (ANS) variant [26]. ANS [27, 28, 29, 30] is a family of entropy coders introduced by Jarek Duda in 2009. It has become popular and has been incorporated into many compressors, including Facebook Zstandard, Apple LZFSE and Google Draco 3D, because it provides the compression ratio of arithmetic coding at a processing speed similar to that of Huffman coding [31]. To the best of our knowledge, this paper is the first to employ ANS coding for the RDH technique. More importantly, unlike arithmetic coding, ANS coding is not subject to the computing precision problem. Therefore, by using ANS coding as the underlying coding framework, our proposal avoids the precision problem that may occur in our previous work [1]. Second, we give a dynamic embedding/extraction implementation that does not require explicit transmission of the host pmf, thus saving the additional overhead costs.

This paper considers the host media composed of i.i.d. gray-scale samples. The contributions of this paper are summarized as follows.

1. A code construction scheme that employs our recent ANS variant is proposed to address the computing precision problem in our previous work [1]. To the best of our knowledge, this paper is the first to employ ANS coding for the RDH technique.

2. A dynamic implementation is given that does not require prior transmission of the host pmf during embedding and extraction.

3. Simulation results are given.

    (a) For host sequences drawn from a discrete normal distribution: the proposed static method provides embedding performance similar to our previous work; the proposed dynamic implementation saves the additional overhead of explicitly transmitting the host pmf at the cost of a slight embedding distortion.

    (b) For gray-scale images: the proposed static method provides slightly higher PSNR values than our previous work and larger embedding rates than some state-of-the-art RDH methods; the proposed dynamic method employs a predicted pmf instead of the true pmf to save the additional overhead, producing a small image quality loss.

The rest of this paper is organized as follows. Section II lists the notations and related works. Section III introduces the proposed method to achieve the rate-distortion bound. In Section IV, we give a dynamic embedding/extraction implementation without the need to transmit the host pmf. Section V shows the simulation results and Section VI concludes this work.

II Preliminaries

II-A Notations

Throughout this paper, the host sequence $X=(x_1,x_2,\cdots,x_N)$ is composed of i.i.d. samples drawn from the probability mass function (pmf) $P_X=\{P_X(s)\,|\,s\in\Sigma_B\}$, where the set $\Sigma_B=\{0,1,\cdots,B-1\}$ is an alphabet of size $B$. Let $P_{CX}=\{P_{CX}(s)=\sum_{i=0}^{s}P_X(i)\,|\,s\in\Sigma_B\}$ denote the cumulative pmf of $X$. It is noted that we have $P_{CX}(-1)=0$ and $P_{CX}(B-1)=1$. Let $f_X(s)$ denote the number of occurrences of signal $s$ in the sequence $X$, and $c_X(s)\triangleq\sum_{i=-1}^{s-1}f_X(i)$ is the cumulative frequency of signal $s$, where $f_X(-1)=0$ by default. The entropy function $H(X)$ of a discrete random variable $X$ is defined as $H(X)=-\sum_{i=0}^{B-1}P_X(i)\log_2 P_X(i)$. The random variable $W$ represents a message uniformly distributed in $\Sigma_M=\{0,1,\cdots,M-1\}$. This paper considers embedding binary secret data into gray-scale signals, so we have $B=256$ and $M=2$.
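To make the notation concrete, the following is a minimal C++ sketch (our illustration, not code from the paper) that builds $f_X(s)$, $c_X(s)$ and $H(X)$ from a host sequence; the names `host`, `Stats` and `collect` are ours.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Sketch: frequency table f_X, cumulative frequency c_X and entropy H(X)
// for a host sequence over the alphabet {0, ..., B-1} (Section II-A).
struct Stats {
    std::vector<uint64_t> f;  // f[s] = f_X(s)
    std::vector<uint64_t> c;  // c[s] = c_X(s), with c[B] = N
    double H;                 // Shannon entropy H(X) in bits
};

Stats collect(const std::vector<int>& host, int B) {
    Stats st{std::vector<uint64_t>(B, 0), std::vector<uint64_t>(B + 1, 0), 0.0};
    for (int s : host) ++st.f[s];
    for (int s = 0; s < B; ++s) st.c[s + 1] = st.c[s] + st.f[s];
    const double N = (double)host.size();
    for (int s = 0; s < B; ++s)
        if (st.f[s] > 0) {
            double p = st.f[s] / N;   // P_X(s)
            st.H -= p * std::log2(p);
        }
    return st;
}
```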

Given a binary stream $S$, we construct a LIFO (Last In First Out) stack accordingly. The operation

$Q\leftarrow Read_S(i)$

pops $i$ bits $\{S_j\}_{j=0}^{i-1}$ from the bottommost position in the stack that contains $S$, and these $i$ bits form an integer $Q=S_0\times 2^{i-1}+\cdots+S_{i-2}\times 2+S_{i-1}$. The operation $Write_S(Q,i)$ pushes the least significant $i$ bits of an integer $Q$ to the stack.
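Under the assumption that each stack entry is a single bit, the two operations can be sketched in C++ as below; for a self-contained round trip we pop from the top of the stack (the paper reads the message bits from the bottommost position, which only changes where the bits are taken from, not how $Q$ is assembled).

```cpp
#include <cstdint>
#include <vector>

using BitStack = std::vector<uint8_t>;  // one bit per entry

// Write_S(Q, i): push the least significant i bits of Q, LSB first,
// so that the most significant of the i bits ends up on top.
void WriteS(BitStack& S, uint64_t Q, int i) {
    for (int k = 0; k < i; ++k) S.push_back((Q >> k) & 1u);
}

// Read_S(i): pop i bits; the first popped bit is the most significant,
// matching Q = S_0*2^{i-1} + ... + S_{i-2}*2 + S_{i-1}.
uint64_t ReadS(BitStack& S, int i) {
    uint64_t Q = 0;
    for (int k = 0; k < i; ++k) { Q = (Q << 1) | S.back(); S.pop_back(); }
    return Q;
}
```

With these conventions, writing $Q$ with $Write_S(Q,i)$ and then reading with $Read_S(i)$ returns the original $Q$, which is the property the coding framework below relies on.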

In RDH, $L$ bits of a message $W=(w_1,w_2,\cdots,w_L)$ are embedded by the encoder into the host sequence $X=(x_1,x_2,\cdots,x_N)$ by slightly modifying its elements to produce a stego sequence $Y=(y_1,y_2,\cdots,y_N)$, where $w_i\in\Sigma_M$, $x_j\in\Sigma_B$ and $y_j\in\Sigma_Z$ for $1\leq i\leq L$, $1\leq j\leq N$. This paper considers the convention that host signals and stego signals draw from the same set, i.e., $\Sigma_Z=\Sigma_B$. We denote the embedding capacity, also called the embedding rate, by $R=L/N$. Schemes are usually constructed to minimize some distortion measure $D(s,y)$ between the host sequence $X$ and the stego sequence $Y$ for a given embedding rate $R$. The distortion metric $D(s,y)$ in this paper is defined as the square error distortion, i.e., $D(s,y)=(s-y)^2$. The decoder can losslessly reconstruct the message $W$ and the host sequence $X$ from the stego sequence $Y$. Figure 1 shows the general framework of RDH on a gray-scale image $X$, where $Y$ is the stego image with some distortion after embedding $W$.

Figure 1: An example of RDH on a gray-scale image.

II-B Related works

II-B1 A Near Optimal Coding Method using Arithmetic Coding

In our previous work [1], we proposed a backward and forward iterative (BFI) algorithm to estimate the optimal distribution $P_Y$ of the stego signals. The BFI algorithm assumes that the host and stego signals draw from the same set, i.e., $\Sigma_Z=\Sigma_B$. Algorithm 1 gives the details. Notably, the BFI method uses an input parameter $\alpha$ to control the various embedding rate and distortion pairs. The case $\alpha=1$ admits the uniform distribution $P_Y=\{P_Y(y)=1/B\,|\,y\in\Sigma_B\}$, which attains the maximal embedding rate, and the other extreme, $\alpha=+\infty$, generates an unchanged distribution $P_Y=P_X$ with zero embedding rate.

Input: The cumulative pmf $P_{CX}$ of the host sequence $X$, a real number $\alpha$ and the tolerance $\varepsilon$.
Output: The cumulative pmf $P_{CY}$ of the stego sequence $Y$.
1 Given an initial set $E=\{e_y=(y+1)/B,\ y=-1\ \text{to}\ B-1\}$, or the user can design an arbitrary initialization as long as $0=e_{-1}\leq e_0\leq\cdots\leq e_{B-1}=1$. Declare the variable $var=0$.
2 For $y$ from $0$ to $B-2$, update each $e_y$ through

$e_y^{new}=\begin{cases}\dfrac{e_{y+1}-e_{y-1}}{1+\alpha^{D(s,y)-D(s,y+1)}}+e_{y-1}, & \text{if there exists } e_y\in\left(P_{CX}(s-1),P_{CX}(s)\right);\\[2mm] P_{CX}(s), & \text{if there exists } \alpha^{D(s,y)-D(s,y+1)}\leq\dfrac{e_{y+1}-P_{CX}(s)}{P_{CX}(s)-e_{y-1}}\leq\alpha^{D(s+1,y)-D(s+1,y+1)};\end{cases}$

3 After finding the new value $e_y^{new}$, record the maximal offset by $var=\max\left\{var,\,|e_y^{new}-e_y^{old}|\right\}$.
4 For $y$ from $B-2$ to $0$, update each $e_y$ and the maximal offset $var$ through the same criterion as in steps 2-3.
5 If $var\geq\varepsilon$, set $var=0$ and then go to step 2; otherwise, output $P_{CY}=\{P_{CY}(y)=e_y\}$.
Algorithm 1: The BFI algorithm [1] to estimate the optimal cumulative pmf $P_{CY}$
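The following C++ sketch illustrates one way to implement Algorithm 1 for the square-error metric $D(s,y)=(s-y)^2$. The index shift (arrays of size $B+1$ holding the values for $y=-1,\cdots,B-1$), the linear search over $s$ and the zero-denominator guard are our implementation choices, not part of the paper.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// BFI sketch: PCX has size B+1 with PCX[0] = P_CX(-1) = 0 and
// PCX[s+1] = P_CX(s); the returned e has the same shift, e[y+1] = P_CY(y).
std::vector<double> bfi(const std::vector<double>& PCX, double alpha,
                        double eps, int B) {
    auto D = [](int s, int y) { return (double)((s - y) * (s - y)); };
    std::vector<double> e(B + 1);
    for (int y = 0; y <= B; ++y) e[y] = (double)y / B;  // uniform start
    double var = eps;                                   // force one sweep
    while (var >= eps) {
        var = 0.0;
        auto update = [&](int y) {          // update e_y (stored at e[y+1])
            for (int s = 0; s < B; ++s) {
                double cand;
                double t = (e[y + 2] - e[y]) /
                           (1.0 + std::pow(alpha, D(s, y) - D(s, y + 1))) + e[y];
                if (PCX[s] < t && t < PCX[s + 1]) {
                    cand = t;                                        // case 1
                } else if (PCX[s + 1] > e[y]) {                      // guard
                    double r = (e[y + 2] - PCX[s + 1]) / (PCX[s + 1] - e[y]);
                    if (std::pow(alpha, D(s, y) - D(s, y + 1)) <= r &&
                        r <= std::pow(alpha, D(s + 1, y) - D(s + 1, y + 1)))
                        cand = PCX[s + 1];                           // case 2
                    else continue;
                } else continue;
                var = std::max(var, std::fabs(cand - e[y + 1]));
                e[y + 1] = cand;
                break;
            }
        };
        for (int y = 0; y <= B - 2; ++y) update(y);  // forward sweep (step 2)
        for (int y = B - 2; y >= 0; --y) update(y);  // backward sweep (step 4)
    }
    return e;
}
```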

Based on the result of the BFI algorithm, we proposed a coding scheme in [1] to embed a message $W$ into the host sequence $X=(x_1,x_2,\cdots,x_N)$ given the cumulative pmfs $P_{CX}$ and $P_{CY}$. The coding framework is similar to arithmetic coding. The encoder maintains two values $(l^{(i)},u^{(i)})\subseteq[0,1]$, $i=1,2,\cdots,N$, to interpret the joint information of the host signal $x_i$ and the temporal information produced at the last step. At the initialization, the vector $(l^{(1)},u^{(1)})=(P_X(x_1-1),P_X(x_1))$ represents the information of the first host signal. At the $i$-th step, the encoder transforms the vector $(l^{(i)},u^{(i)})$ and the next host signal $x_{i+1}$ into the stego signal $y_i$ and the next vector $(l^{(i+1)},u^{(i+1)})$, expressed as $(y_i,l^{(i+1)},u^{(i+1)})=Encode(x_{i+1},l^{(i)},u^{(i)})$. First, we need to find the possible $y_i$ within the interval $(l^{(i)},u^{(i)}]$; then $y_i\in[y_1^{(i)},y_2^{(i)}]$ if and only if $[P_Y(y_1^{(i)}),P_Y(y_2^{(i)}-1))\subset[l^{(i)},u^{(i)})\subseteq[P_Y(y_1^{(i)}-1),P_Y(y_2^{(i)}))$. Second, for the case $y_1^{(i)}=y_2^{(i)}$, the stego signal $y_i=y_1^{(i)}$ is determined. The $y_1^{(i)}$ determines the interval $[P_Y(y_1^{(i)}-1),P_Y(y_1^{(i)}))$, which is larger than $(l^{(i)},u^{(i)}]$, so the decoder needs more information for lossless recovery, expressed as the information vector $(P_Y(y_1^{(i)}-1),l^{(i)},u^{(i)},P_Y(y_1^{(i)}))$. One can see that in order to determine $y_i$, the pmf $P_Y$ needs to be known first, and it can be derived from the cumulative pmf $P_{CY}$, i.e., $P_Y=\{P_Y(y)=P_{CY}(y)-P_{CY}(y-1)\,|\,y\in\Sigma_B\}$. The next host signal $x_{i+1}$ determines the interval $[P_X(x_{i+1}-1),P_X(x_{i+1}))$, and we scale the information vector to $[P_X(x_{i+1}-1),P_X(x_{i+1}))$ to obtain $(l^{(i+1)},u^{(i+1)})=(F(l^{(i)},x_{i+1},y_i),F(u^{(i)},x_{i+1},y_i))$, where

$F(k,s,y)=\frac{P_X(s)-P_X(s-1)}{P_Y(y)-P_Y(y-1)}\left(k-P_Y(y-1)\right)+P_X(s-1),$ (3)

for $k\in\mathbb{R}$ and $s,y\in\Sigma_B$. Then, we proceed to process the next host signal.

For the other case, $y_1^{(i)}<y_2^{(i)}$, the $y_i$ has several possible values corresponding to each interval of

$\left(l^{(i)},P_Y(y_1^{(i)}),\cdots,P_Y(y_2^{(i)}-1),u^{(i)}\right).$

The adaptive arithmetic decoding is applied on the binary representation of the message $W$ to determine the value $y_i$. To fit the usage requirement of arithmetic coding, the two ends of the vector are scaled to $0$ and $1$, resulting in $\left(0,G(P_Y(y_1^{(i)}),l^{(i)},u^{(i)}),\cdots,G(P_Y(y_2^{(i)}-1),l^{(i)},u^{(i)}),1\right)$, where

$G(k,l,u)=\frac{k-l}{u-l},\qquad\text{for}\;k,l,u\in\mathbb{R}.$

The arithmetic decoder determines the value $y_i$, and the corresponding residual information, which depends on $y_i$, is discussed below. For the case $y_1^{(i)}+1\leq y_i\leq y_2^{(i)}-1$, the interval $[P_Y(y_i-1),P_Y(y_i))$ is within $(l^{(i)},u^{(i)}]$, so there is no residual information, and the updated vector is $(l^{(i+1)},u^{(i+1)})=(P_X(x_{i+1}-1),P_X(x_{i+1}))$, representing the next host signal. For the case $y_i=y_1^{(i)}$, the residual information is interpreted as the vector $(P_Y(y_i-1),l^{(i)},P_Y(y_i),P_Y(y_i))$, which is scaled into the interval $[P_X(x_{i+1}-1),P_X(x_{i+1}))$ to obtain the updated vector $(l^{(i+1)},u^{(i+1)})=(F(l^{(i)},x_{i+1},y_i),P_X(x_{i+1}))$. For the case $y_i=y_2^{(i)}$, the residual information is interpreted as the vector $(P_Y(y_i-1),P_Y(y_i-1),u^{(i)},P_Y(y_i))$, which is scaled into the interval $[P_X(x_{i+1}-1),P_X(x_{i+1}))$ to obtain the updated vector $(l^{(i+1)},u^{(i+1)})=(P_X(x_{i+1}-1),F(u^{(i)},x_{i+1},y_i))$. In summary, the updated vector $(l^{(i+1)},u^{(i+1)})$ is as follows:

$l^{(i+1)}=\max\left\{P_X(x_{i+1}-1),\,F(l^{(i)},x_{i+1},y_i)\right\}$

and

$u^{(i+1)}=\min\left\{P_X(x_{i+1}),\,F(u^{(i)},x_{i+1},y_i)\right\}.$

We continue the above steps until there is no next host signal. When $y_N$ is designated, the encoder sends out a prefix-free code value $v\in[\tilde{N}(l^{(N)},y_N),\tilde{N}(u^{(N)},y_N))$ with minimal code length, where

$\tilde{N}(s,y)=\frac{s-P_Y(y-1)}{P_Y(y)-P_Y(y-1)}.$

More details can be found in [1].

There are two observations from the above description. First, a practical implementation cannot store the real interval $(l^{(i)},u^{(i)})$ exactly due to machine precision. Therefore, our previous work maintained the $b$-bit integer variables $\overline{L}=\lfloor l^{(i)}\times 2^b\rfloor+1$ and $\overline{H}=\lfloor u^{(i)}\times 2^b\rfloor$, where $\lfloor\cdot\rfloor$ returns the greatest integer less than or equal to the input number, and $b\in\mathbb{N}^+$. As the arithmetic coding proceeds, $\overline{L}$ gradually approaches $\overline{H}$, and the two values may coincide in certain cases. When $\overline{L}$ and $\overline{H}$ coincide, it is impossible to embed any data. That is, the use of arithmetic coding here may suffer from a computing precision problem. Second, it is necessary to transmit the cumulative pmfs $P_{CX}$ and $P_{CY}$, or to transmit only $P_{CX}$ and use the BFI algorithm to generate $P_{CY}$. Whichever strategy is chosen, it costs an additional overhead proportional to the alphabet size.
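The collapse can be reproduced with a few lines of C++. The snippet below is only a toy illustration of the failure mode (it is not the coder of [1] and omits the usual renormalization): with $b$-bit integer interval bounds, repeatedly scaling into a small sub-interval drives the width to zero, after which nothing more can be embedded.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const int b = 16;
    uint32_t L = 0, H = 1u << b;          // integer interval [L, H)
    const double p = 0.01;                // a low-probability symbol
    for (int i = 1; i <= 32; ++i) {
        H = L + (uint32_t)((H - L) * p);  // scale into the sub-interval
        if (H <= L) {                     // width reached zero
            std::printf("interval collapsed after %d symbols\n", i);
            return 0;
        }
    }
    return 0;
}
```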

II-B2 A Variant of ANS Coding (ANS-Variant) [26]

The key idea of the pure form of ANS coding is to encode a sequence of input symbols into a single natural number (also called the state), unlike arithmetic coding, which operates on a fraction. Before encoding, ANS holds a start state $x$. On a high level, the ANS encoder always maintains a state that can be modified by the encoding function $x'=\mathcal{C}(s,x)\in\mathbb{N}$: $\mathcal{C}(s,x)$ takes a symbol $s$ and the current state $x$ and produces a new state $x'$ that encodes both the symbol $s$ and the current state $x$. Correspondingly, the decoding function $\mathcal{D}(x')$ decodes the state $x'$ into a symbol $s$ and the previous state $x$, i.e., $\mathcal{D}(x')=(s,x)$. It is noted that the encoder and decoder in ANS coding go through the same sequence of states, just in opposite order. This paper uses the convention that the encoder processes symbols forward, from the first to the last, and the decoder recovers symbols backwards, from the last to the first.

Consider an input sequence $X=(x_1,x_2,\cdots,x_N)$; it is assumed that $N=\sum_{s\in\Sigma_B}f_X(s)$ is a power of two, i.e., $N=2^n$ for an integer $n\geq 0$. This is a standard assumption, as in arithmetic coding, that simplifies the arithmetic operations. In our recent work [26], we proposed a new ANS variant that forces the state $x$ to always lie in a fixed interval $I:=[2^{T-v\times n},2^T)$ by renormalization, where $T,v\in\mathbb{N}^+$ as long as $T-v\times n\geq 0$. Let $I_s:=[f_X(s)\times 2^{T-v\times n},2^T)$ denote the interval corresponding to the symbol $s$, which ensures that the new state after encoding symbol $s$ is within the interval $I$. That is, if $x\in I_s$ we have $\mathcal{C}(x,s)\in I$; otherwise, we have $\mathcal{C}(x,s)\notin I$. The encoding function is defined as $\mathcal{C}(x,s)=\lfloor x/f_X(s)\rfloor$. For ease of understanding, the encoding procedure is assumed to be implemented using a stack, where each entry in the stack is a bit. The encoding details are as follows. To encode each symbol $s$, if the current state $x$ is within $I_s$, we first push the $n$-bit value $c_X(s)+(x\bmod f_X(s))$ to the stack and produce a new state by $x\leftarrow\mathcal{C}(x,s)=\lfloor x/f_X(s)\rfloor$, where $\bmod$ denotes the remainder operation. Otherwise, we first pop $v\times n$ bits from the stack and append them as the least significant $v\times n$ bits of $x$ to increase it. That is, if the decimal representation of the popped $v\times n$ bits is $D$, we have $x\leftarrow(x\ll v\times n)+D$. After that, we push the $n$-bit value $c_X(s)+(x\bmod f_X(s))$ to the stack and perform $\mathcal{C}(x,s)$ with the increased state $x$. Notably, to ensure that there are enough bits in the stack before the first popping, we initialize the start state to be large enough, e.g., $x=2^T-1$. We follow the above steps until all symbols are encoded. Finally, it is necessary to transmit the final state of the encoding (which is also the start state of decoding) and the remaining bits in the stack to the decoder for lossless recovery.

Figure 2: Encoding example: coding a sequence $X=baabc$ with $P_X(a)=P_X(b)=\frac{96}{256}$, $P_X(c)=\frac{64}{256}$. The encoder proceeds from bottom to top, and the decoder proceeds from top to bottom (as indicated). Both go through the exact same sequence of states and perform I/O in the same places.

Next, we discuss the corresponding decoding details of our recent ANS-Variant coding. The decoding procedure can also be implemented using a stack, where each entry in the stack is also a bit. First, in order to completely reconstruct the input sequence $X$, the decoder needs to know the statistics of the input sequence, including the total number $N$ of symbols, the frequency $f_X(s)$ of each symbol $s$ and the selected fixed interval $I$. Let $\tilde{c}_X(i)\triangleq s$ if $c_X(s)\leq i<c_X(s+1)$ for $i\in[0,N)$ and $s\in\Sigma_B$. Given a start state $x$ for decoding (i.e., the final state of the encoding) and the compressed bits stored in the stack, we pop an $n$-bit digit $d$ from the stack and decode it into a symbol $s$, i.e., $s\leftarrow\tilde{c}_X(d)$. Then, we update the state by $x\leftarrow f_X(s)\times x+r$, where $r=d-c_X(s)$. It can be seen that after a symbol is decoded, the value of the state becomes larger. As more and more symbols get decoded, its value will eventually grow beyond a bound. When the state $x$ exceeds the upper bound $2^T$ chosen during the encoding, i.e., $x\geq 2^T$, we push the least significant $v\times n$ bits of $x$ to the stack to make $x$ smaller, i.e., $x\leftarrow x\gg v\times n$. We repeat the above steps until the input sequence $X$ is reconstructed. Figure 2 shows a worked-through example for a partial sequence "baabc" with parameters $n=8$, $v=2$ and $T=24$. The start state for encoding is chosen as $x=2^T-1=16777215$. It can be seen that the encoder and decoder go through the same states, just in opposite order. Also, the order of the symbols reconstructed by the decoder is opposite to that received by the encoder, e.g., the last encoded symbol is decoded first.
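The encode/decode cycle just described can be summarized in the following C++ sketch, reusing the `BitStack`/`ReadS`/`WriteS` helpers sketched in Section II-A. The struct layout and names are our assumptions; `f` and `c` are the tables $f_X(s)$ and $c_X(s)$ with $c[B]=N=2^n$.

```cpp
#include <cstdint>
#include <vector>

// Sketch of the ANS-variant of [26]: state kept in I = [2^{T-v*n}, 2^T).
struct AnsVariant {
    int n, v, T;
    std::vector<uint64_t> f, c;  // f[s] = f_X(s); c[s] = c_X(s), c[B] = N

    // Encode symbol s into state x; emits one n-bit digit onto the stack.
    void encode(uint64_t& x, int s, BitStack& st) const {
        if (x < (f[s] << (T - v * n)))          // x below I_s: refill
            x = (x << (v * n)) + ReadS(st, v * n);
        WriteS(st, c[s] + (x % f[s]), n);       // digit c_X(s)+(x mod f_X(s))
        x /= f[s];                              // x <- C(x, s)
    }

    // Decode one symbol from state x (exact inverse, run in reverse order).
    int decode(uint64_t& x, BitStack& st) const {
        uint64_t d = ReadS(st, n);              // pop the n-bit digit
        int s = 0;
        while (d >= c[s + 1]) ++s;              // s = c~_X(d)
        x = f[s] * x + (d - c[s]);
        if (x >= (1ull << T)) {                 // x above I: spill bits
            WriteS(st, x, v * n);               // least significant v*n bits
            x >>= v * n;
        }
        return s;
    }
};
```

Encoding starts from $x=2^T-1$ and calls `encode` once per symbol; decoding starts from the transmitted final state and calls `decode` the same number of times, yielding the symbols in reverse order.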

III Proposed Coding Scheme

The first limitation of our previous work [1] is that it uses arithmetic coding as the underlying coding framework. Arithmetic coding assigns each symbol a sub-interval $[\overline{L},\overline{H})\subseteq[0.0,1.0)$ according to the probability of that symbol. When a symbol is encoded, the sub-interval narrows, and $\overline{L}$ and $\overline{H}$ move closer together. Because a machine cannot represent numbers with infinite precision, as more and more symbols are encoded, $\overline{L}$ gradually approaches $\overline{H}$, the two values may eventually coincide, and further data embedding then becomes impossible. In this section, we first propose a coding method that employs our recent ANS variant [26] to embed a message $W$ into the host sequence $X$. Then, the extraction of the host sequence and the embedded message is given. Notably, this section considers the case where the statistics of the input sequence are known to the encoder and decoder, and refers to the procedures as static data embedding and static data extraction. In the following, $x_i$ and $y_i$ denote the values of the host signal and the stego signal at position $i$ in the sequence, respectively.

III-A Static Data Embedding

Our recent ANS-Variant coding (see Section II-B2 for details) provides a compression ratio close to that of arithmetic coding. More importantly, it does not suffer from the computing precision problem that may occur in arithmetic coding. As a result, we adopt ANS-Variant as the coding framework to embed a message $W$ into the host sequence $X=(x_1,x_2,\cdots,x_N)$. The framework consists of a combination of an encoder and a decoder of ANS-Variant coding, where the encoder is used to embed the data, and the decoder returns the corresponding stego signals. From the description of ANS-Variant coding, one can see that in order to encode a host signal $x_i$ into the current state $x$, the ANS-Variant encoder outputs the $n$-bit value $c_X(x_i)+(x\bmod f_X(x_i))$, and if the state $x$ is less than the lower bound of the selected fixed interval, the encoder reads bits to increase it. This observation can be exploited to realize data embedding, i.e., one can choose to read bits from the message $W$ to increase the value of the state when necessary. In addition, the BFI algorithm (described in Section II-B1) can estimate the optimal pmf of the stego sequence for a given host sequence. Therefore, we can apply the ANS-Variant decoder, driven by the statistics of the stego sequence, to the $n$-bit value $c_X(x_i)+(x\bmod f_X(x_i))$ to decode the corresponding stego signal $y_i$. That is, the $n$-bit value $c_X(x_i)+(x\bmod f_X(x_i))$ produced by the ANS-Variant encoder serves as the temporal information from which the corresponding stego signal $y_i$ is decoded.

Input: A host sequence $X=(x_1,x_2,\cdots,x_N)$, the message $W$ to embed, the alphabet size $B$, the cumulative pmf $P_{CX}$.
Output: A stego sequence $Y=(y_1,y_2,\cdots,y_N)$, the final state $x$.
1 Get the host statistics $F_X=\{f_X(s)|s\in\Sigma_B\}$ and $C_X=\{c_X(s)|s\in\Sigma_B\}$ from the $P_{CX}$;
2 Use the BFI algorithm to estimate the optimal $P_{CY}$;
3 Get the stego statistics $F_Y=\{f_Y(y)|y\in\Sigma_B\}$ and $C_Y=\{c_Y(y)|y\in\Sigma_B\}$ from the $P_{CY}$;
4 Set a start state $x\leftarrow 1$;
5 for $i=1$ to $N$ do
6     if $x<f_X(x_i)\times 2^{T-v\times n}$ then
7         $D\leftarrow Read_W(v\times n)$;
8         $x\leftarrow(x\ll v\times n)+D$;
9     $\eta_X=c_X(x_i)+(x\bmod f_X(x_i))$;
10    $x\leftarrow\lfloor x/f_X(x_i)\rfloor$;
11    $y_i\leftarrow\tilde{c}_Y(\eta_X)$;
12    $x\leftarrow f_Y(y_i)\times x+\eta_X-c_Y(y_i)$;
13    if $x\geq 2^T$ then
14        call $Write_W(x,v\times n)$;
15        $x\leftarrow x\gg v\times n$;
Algorithm 2: Proposed static embedding algorithm

The details of the embedding procedure are as follows. First, we require knowledge of the host statistics $F_X=\{f_X(s)|s\in\Sigma_B\}$ and $C_X=\{c_X(s)|s\in\Sigma_B\}$, and get the cumulative pmf $P_{CX}=\{P_{CX}(x)|x\in\Sigma_B\}$, where $P_{CX}(x)=\sum_{i=0}^{x}P_X(i)$ and $P_X(i)=f_X(i)/N$. Then, we apply the BFI algorithm on the $P_{CX}$ to estimate the optimal cumulative pmf $P_{CY}$ of the stego sequence $Y$. Once the distribution $P_{CY}=\{P_{CY}(y)|y\in\Sigma_B\}$ is known, we can further get the stego statistics $F_Y=\{f_Y(y)|y\in\Sigma_B\}$ and $C_Y=\{c_Y(y)|y\in\Sigma_B\}$, where $c_Y(y)=P_{CY}(y)\times N$ and $f_Y(y)=c_Y(y+1)-c_Y(y)$. During the embedding, we process the host signals forward, i.e., from the first to the last signal. Given a start state $x$, for each host signal $x_i$, we first use an ANS-Variant encoder to encode it. Let $I:=[2^{T-v\times n},2^T)$ and $I_{x_i}:=[f_X(x_i)\times 2^{T-v\times n},2^T)$. During the encoding, there are two cases. In the first, if the current state $x<f_X(x_i)\times 2^{T-v\times n}$, we read $v\times n$ bits from the message $W$ to increase $x$. That is, if the decimal representation of the $v\times n$ bits read is $D$, we have $x\leftarrow(x\ll v\times n)+D$. After that, we maintain the $n$-bit temporal information $\eta_X=c_X(x_i)+(x\bmod f_X(x_i))$ and update the state by $x\leftarrow\mathcal{C}(x,x_i)=\lfloor x/f_X(x_i)\rfloor$. It is noted that the $n$-bit temporal information produced by the host signal $x_i$ will be used in the following steps to decode the stego signal $y_i$ at the corresponding position. In the second, if the current state $x$ is within $I_{x_i}$, we directly maintain the $n$-bit temporal information $\eta_X=c_X(x_i)+(x\bmod f_X(x_i))$ and produce a new state by $x\leftarrow\mathcal{C}(x,x_i)$. Next, we utilize the temporal information $\eta_X$ maintained in the previous step and the stego statistics $F_Y=\{f_Y(y)|y\in\Sigma_B\}$ and $C_Y=\{c_Y(y)|y\in\Sigma_B\}$ to decode the corresponding stego signal $y_i$. Precisely, we pass the temporal information $\eta_X$ to an ANS-Variant decoder to decode the stego signal $y_i$, i.e., $y_i\leftarrow\tilde{c}_Y(\eta_X)$. Then, we update the state by $x\leftarrow f_Y(y_i)\times x+r_{y_i}$, where $r_{y_i}=\eta_X-c_Y(y_i)$. From the description of ANS-Variant decoding in Section II-B2, the updated state $x$ falls into two cases. In the first, if the value of the state $x$ exceeds the upper bound $2^T$ chosen during the encoding, we write the least significant $v\times n$ bits of $x$ to the message $W$ to make $x$ smaller, i.e., $x\leftarrow x\gg v\times n$. In the second, if the updated state is already within the interval $I$, no further changes to the state are needed.

We repeat the above steps for the next host signal $x_{i+1}$ until there is no next host signal. As can be seen, an embedding cycle consists of a combination of an encoder and a decoder of ANS-Variant coding. Among them, the encoder embeds the data, and the decoder produces the corresponding stego signals. Algorithm 2 presents the details. In Algorithm 2, Line 1 gets the host statistics. Lines 2-3 first apply the BFI algorithm to estimate the optimal cumulative pmf $P_{CY}$ and then get the stego statistics from the $P_{CY}$. In Line 4, the start state $x$ is chosen to be small enough, e.g., set to 1, in order to make the encoder read bits from the message $W$ as early as possible, thus increasing the amount of embedded data. Lines 6-10 perform the encoding of ANS-Variant coding. In particular, Lines 7-8 read bits from the message $W$ to enlarge the state if the current state is too small. Lines 11-15 perform the decoding of ANS-Variant coding to generate the corresponding stego signal $y_i$. Finally, we get a stego sequence $Y=(y_1,y_2,\cdots,y_N)$ and a final state $x$. The final state $x$ needs to be transmitted along with the stego sequence $Y$, so that the extractor knows which state to start with. In addition, the shortened length (in bits) of the message $W$ is the amount of data embedded.
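For concreteness, the loop of Algorithm 2 can be written as the following C++ sketch, reusing the `AnsVariant` and bit-stack helpers sketched earlier; `hostAns` carries $F_X/C_X$, `stegoAns` carries $F_Y/C_Y$, and all names are ours.

```cpp
#include <cstdint>
#include <vector>

// Static embedding sketch (Lines 4-15 of Algorithm 2): encode x_i with the
// host statistics, then decode the digit with the stego statistics.
std::vector<int> embed(const std::vector<int>& X, const AnsVariant& hostAns,
                       const AnsVariant& stegoAns, BitStack& W, uint64_t& x) {
    std::vector<int> Y(X.size());
    x = 1;                                                // small start state
    const int vn = hostAns.v * hostAns.n;
    for (size_t i = 0; i < X.size(); ++i) {
        const int s = X[i];
        if (x < (hostAns.f[s] << (hostAns.T - vn)))
            x = (x << vn) + ReadS(W, vn);                 // Read_W: embed bits
        uint64_t eta = hostAns.c[s] + (x % hostAns.f[s]); // temporal info
        x /= hostAns.f[s];
        int yi = 0;
        while (eta >= stegoAns.c[yi + 1]) ++yi;           // y_i = c~_Y(eta)
        Y[i] = yi;
        x = stegoAns.f[yi] * x + (eta - stegoAns.c[yi]);
        if (x >= (1ull << stegoAns.T)) {                  // Write_W(x, v*n)
            WriteS(W, x, vn);
            x >>= vn;
        }
    }
    return Y;                                             // final state in x
}
```

The number of bits by which the stack $W$ has shrunk is exactly the amount of embedded data.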

III-B Static Data Extraction

In RDH techniques, data extraction includes recovering the host sequence and the embedded message from the received stego sequence. In the proposed static data embedding (see Section III-A for details), we first perform ANS-Variant encoding based on the known host statistics, and then perform ANS-Variant decoding under the premise that the optimal cumulative pmf of the stego sequence has been estimated. The data extraction is simply the inverse of the data embedding. Therefore, in order to achieve lossless data extraction, we first perform the ANS-Variant encoding under the premise that the optimal cumulative pmf of the stego sequence is known. Then, we perform the ANS-Variant decoding based on the host statistics to decode the host signal. It is worth noting that the data extraction requires the cumulative pmfs $P_{CX}$ and $P_{CY}$ as parameters. Another strategy is to transmit only $P_{CX}$ and let the extractor generate $P_{CY}$ with the BFI algorithm. The latter saves space but requires the computational cost of running the BFI algorithm. This section takes the latter as an example to introduce the process of static data extraction.

Given a stego sequence $Y=(y_1,y_2,\cdots,y_N)$, a start state $x$, and the cumulative pmf $P_{CX}=\{P_{CX}(x)|x\in\Sigma_B\}$, the details of data extraction are as follows. First, we apply the BFI algorithm on the host $P_{CX}$ to estimate the optimal cumulative pmf $P_{CY}=\{P_{CY}(y)|y\in\Sigma_B\}$. According to the $P_{CX}$ and $P_{CY}$, we calculate the stego statistics $F_Y=\{f_Y(y)|y\in\Sigma_B\}$ and $C_Y=\{c_Y(y)|y\in\Sigma_B\}$, and the host statistics $F_X=\{f_X(s)|s\in\Sigma_B\}$ and $C_X=\{c_X(s)|s\in\Sigma_B\}$, where $c_Y(y)=P_{CY}(y)\times N$, $f_Y(y)=c_Y(y+1)-c_Y(y)$, $c_X(s)=P_{CX}(s)\times N$ and $f_X(s)=c_X(s+1)-c_X(s)$. These statistics are used in the subsequent encoding and decoding of ANS-Variant coding. As the encoder and decoder of ANS-Variant coding run in reverse order, we process the stego signals backwards, i.e., from the last to the first signal. For each stego signal $y_i$, we first perform the ANS-Variant encoding based on the obtained stego statistics $F_Y$ and $C_Y$. Specifically, let $I:=[2^{T-v\times n},2^T)$ and $I_{y_i}:=[f_Y(y_i)\times 2^{T-v\times n},2^T)$. In the encoding, there are two cases. In the first, if the current state $x<f_Y(y_i)\times 2^{T-v\times n}$, we read $v\times n$ bits from the message $W$ to increase $x$. Then, we hold the $n$-bit temporal information $\eta_Y=c_Y(y_i)+(x\bmod f_Y(y_i))$ and update the state by $x\leftarrow\mathcal{C}(x,y_i)=\lfloor x/f_Y(y_i)\rfloor$. In the second, if the current state $x$ is within $I_{y_i}$, we directly maintain the $n$-bit temporal information $\eta_Y=c_Y(y_i)+(x\bmod f_Y(y_i))$ and produce a new state by $x\leftarrow\mathcal{C}(x,y_i)$. The above $n$-bit temporal information $\eta_Y$ produced by encoding the stego signal $y_i$ is used in the following steps to decode the corresponding host signal $x_i$.

Input: A stego sequence $Y=(y_1,y_2,\cdots,y_N)$, a start state $x$, the alphabet size $B$, the cumulative pmf $P_{CX}$.
Output: A host sequence $X=(x_1,x_2,\cdots,x_N)$, the embedded message $W$.
1 Use the BFI algorithm to estimate the optimal $P_{CY}$;
2 Get the host statistics $F_X=\{f_X(s)|s\in\Sigma_B\}$ and $C_X=\{c_X(s)|s\in\Sigma_B\}$ from the $P_{CX}$;
3 Get the stego statistics $F_Y=\{f_Y(y)|y\in\Sigma_B\}$ and $C_Y=\{c_Y(y)|y\in\Sigma_B\}$ from the $P_{CY}$;
4 for $i=N$ to $1$ do
5     if $x<f_Y(y_i)\times 2^{T-v\times n}$ then
6         $D\leftarrow Read_W(v\times n)$;
7         $x\leftarrow(x\ll v\times n)+D$;
8     $\eta_Y=c_Y(y_i)+(x\bmod f_Y(y_i))$;
9     $x\leftarrow\lfloor x/f_Y(y_i)\rfloor$;
10    $x_i\leftarrow\tilde{c}_X(\eta_Y)$;
11    $x\leftarrow f_X(x_i)\times x+\eta_Y-c_X(x_i)$;
12    if $x\geq 2^T$ then
13        call $Write_W(x,v\times n)$;
14        $x\leftarrow x\gg v\times n$;
Algorithm 3: Proposed static extraction algorithm

Next, we utilize the temporal information $\eta_Y$ maintained in the previous step and the host statistics $F_X=\{f_X(s)|s\in\Sigma_B\}$ and $C_X=\{c_X(s)|s\in\Sigma_B\}$ to decode the host signal $x_i$. Precisely, we pass the temporal information $\eta_Y$ to the ANS-Variant decoder to decode the host signal $x_i$, i.e., $x_i\leftarrow\tilde{c}_X(\eta_Y)$. Then, we update the state by $x\leftarrow f_X(x_i)\times x+r_{x_i}$, where $r_{x_i}=\eta_Y-c_X(x_i)$. From the description of ANS-Variant decoding in Section II-B2, the updated state $x$ falls into two cases. In the first, if the value of the state $x$ exceeds the upper bound $2^T$ chosen during the data embedding, we write the least significant $v\times n$ bits of $x$ to the message $W$ to make $x$ smaller, i.e., $x\leftarrow x\gg v\times n$. In the second, if the updated state is already within the interval $I$, no further changes to the state are needed. We repeat the above steps for the next stego signal $y_{i-1}$ until there is no next stego signal. Algorithm 3 gives the details. In Algorithm 3, Line 1 uses the BFI algorithm to estimate the optimal cumulative pmf $P_{CY}$. Lines 2-3 derive the host and stego statistics from the known $P_{CX}$ and $P_{CY}$, respectively. Lines 5-9 perform the encoding of ANS-Variant coding on the stego signal and maintain the temporal information $\eta_Y$. Lines 10-14 utilize the temporal information $\eta_Y$ produced in the previous step to decode the host signal $x_i$. In particular, Lines 12-14 write $v\times n$ bits to the message $W$ when the state exceeds the given upper bound $2^T$. Finally, we can reconstruct the host sequence $X=(x_1,x_2,\cdots,x_N)$ and the embedded message $W$ without loss.
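A matching C++ sketch of the extraction loop of Algorithm 3 is given below (again reusing the helpers sketched earlier, with the same caveat that names and layout are ours): it is the mirror image of the embedding loop, run backwards over the stego sequence.

```cpp
#include <cstdint>
#include <vector>

// Static extraction sketch (Lines 4-14 of Algorithm 3): encode y_i with the
// stego statistics, then decode the digit with the host statistics.
std::vector<int> extract(const std::vector<int>& Y, const AnsVariant& hostAns,
                         const AnsVariant& stegoAns, BitStack& W, uint64_t& x) {
    std::vector<int> X(Y.size());
    const int vn = stegoAns.v * stegoAns.n;
    for (int i = (int)Y.size() - 1; i >= 0; --i) {
        const int yi = Y[i];
        if (x < (stegoAns.f[yi] << (stegoAns.T - vn)))
            x = (x << vn) + ReadS(W, vn);
        uint64_t eta = stegoAns.c[yi] + (x % stegoAns.f[yi]);
        x /= stegoAns.f[yi];
        int s = 0;
        while (eta >= hostAns.c[s + 1]) ++s;        // x_i = c~_X(eta)
        X[i] = s;
        x = hostAns.f[s] * x + (eta - hostAns.c[s]);
        if (x >= (1ull << hostAns.T)) {             // recovered message bits
            WriteS(W, x, vn);
            x >>= vn;
        }
    }
    return X;
}
```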

IV Dynamic Implementation

Section III presents the embedding/extraction procedure to embed a message in an i.i.d. host sequence. One can see that the proposed static RDH scheme, like our previous work [1], requires the cumulative pmf $P_{CX}$ of the host sequence as a parameter in both the embedding and extraction processes. Namely, the proposal uses preset probabilities of the signals that remain static during the whole process. However, transmitting the host pmf costs an additional overhead proportional to the alphabet size $B$, which is usually unacceptable in extremely space-constrained scenarios. Moreover, the usability of the proposal rests on the premise that the pmf of the host sequence is known, imposing certain limitations on practical applications. In this section, we present a dynamic implementation that needs no prior knowledge of the host pmf. The implementation updates the probabilities after each host signal is encountered, thus saving the prior transmission of $P_{CX}$. First, we describe the dynamic data embedding algorithm. Second, the corresponding dynamic data extraction algorithm is given.

IV-A Dynamic Data Embedding

When no prior knowledge of the host pmf is available, the proposed dynamic embedding method becomes a two-pass procedure. The ANS-Variant decoder reconstructs all symbols in the reverse order of those processed by its encoder, so to keep the encoder and the decoder synchronized, the statistics of the host signal at each location must be collected in the order in which the decoder runs. Therefore, we first pass over the host sequence $X=(x_1,x_2,\cdots,x_N)$ once in the decoder's running order. Precisely, we start with a frequency distribution $F_X$, e.g., $F_X=\{f_X(s)=1,s\in\Sigma_B\}$, where $f_X(s)$ denotes the number of occurrences of the signal $s$. As the ANS-Variant decoder runs backwards, i.e., from the last to the first signal, we update the frequency distribution $F_X$ backwards based on the host signals encountered. The details of the update are as follows. We start with the last host signal $x_N$ and perform a backward pass over the host sequence. During the backward pass, when a host signal $x_i$ is encountered, the frequency corresponding to the signal $x_i$ is updated via $f_X(x_i)\leftarrow f_X(x_i)+1$. We then continue backwards with the next signal and update its frequency in the same manner. When the frequency of the first host signal $x_1$ has been updated and there is no next host signal, we obtain the newest frequency distribution $F_X$, which counts the frequency of each signal in the host sequence $X$ on top of the initial frequency distribution.
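A short C++ sketch of this first pass (function name and layout are ours):

```cpp
#include <cstdint>
#include <vector>

// First (backward) pass of Section IV-A: start from an all-ones table so
// every symbol keeps a nonzero frequency, then count from x_N down to x_1,
// mirroring the decoder's running order.
std::vector<uint64_t> backwardPass(const std::vector<int>& X, int B) {
    std::vector<uint64_t> fX(B, 1);
    for (int i = (int)X.size() - 1; i >= 0; --i) ++fX[X[i]];
    return fX;
}
// During the second (forward) pass, fX[x_i] is decremented before x_i is
// encoded, so the encoder and decoder see identical tables at every position.
```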

Input: A host sequence $X=(x_1,x_2,\cdots,x_N)$, a message $W$ to embed, the alphabet size $B$.
Output: A stego sequence $Y=(y_1,y_2,\cdots,y_N)$, the final state $x$.
1 Initialize $F_X=\{f_X(s)=1|s\in\Sigma_B\}$;
2 for $i=N$ to $1$ do
3     $f_X(x_i)\leftarrow f_X(x_i)+1$;
4 Set a start state $x\leftarrow 1$;
5 for $i=1$ to $N$ do
6     $f_X(x_i)\leftarrow f_X(x_i)-1$;
7     Get the host statistic $C_X=\{c_X(s)|s\in\Sigma_B\}$ and the cumulative pmf $P_{CX}$ from the newest $F_X=\{f_X(s)|s\in\Sigma_B\}$;
8     Use the BFI algorithm to estimate the optimal cumulative pmf $P_{CY}$;
9     Get the stego statistics $F_Y=\{f_Y(y)|y\in\Sigma_B\}$ and $C_Y=\{c_Y(y)|y\in\Sigma_B\}$ from the $P_{CY}$;
10    if $x<f_X(x_i)\times 2^{T-v\times n}$ then
11        $D\leftarrow Read_W(v\times n)$;
12        $x\leftarrow(x\ll v\times n)+D$;
13    $\eta_X=c_X(x_i)+(x\bmod f_X(x_i))$;
14    $x\leftarrow\lfloor x/f_X(x_i)\rfloor$;
15    $y_i\leftarrow\tilde{c}_Y(\eta_X)$;
16    $x\leftarrow f_Y(y_i)\times x+\eta_X-c_Y(y_i)$;
17    if $x\geq 2^T$ then
18        call $Write_W(x,v\times n)$;
19        $x\leftarrow x\gg v\times n$;
Algorithm 4: Proposed dynamic embedding algorithm

When the first backward pass is completed, with the help of the newest statistics $F_X=\{f_X(s)|s\in\Sigma_B\}$, we then use an ANS-Variant encoder followed by an ANS-Variant decoder to perform data embedding. The dynamic embedding procedure is similar to the static data embedding described in Section III-A, except that the dynamic version employs a changing pmf. Given a host sequence $X=(x_1,x_2,\cdots,x_N)$, a message $W$ to embed and the newest frequency distribution $F_X=\{f_X(s)|s\in\Sigma_B\}$ obtained in the first backward pass, the embedding details are described below. Considering that the encoder and decoder of ANS-Variant coding run in opposite directions, and that the first pass (in the same order as the decoder runs) is backwards, we perform a second, forward pass (i.e., from the first to the last signal) over the host sequence. For each host signal $x_i$ encountered, we first update the corresponding frequency of the signal $x_i$ by $f_X(x_i)\leftarrow f_X(x_i)-1$. Then, we compute the host statistics $C_X=\{c_X(s)=\sum_{i=-1}^{s-1}f_X(i)|s\in\Sigma_B\}$ and the cumulative pmf $P_{CX}=\{c_X(s)/c_X(B-1)|s\in\Sigma_B\}$ from the newest $F_X=\{f_X(s)|s\in\Sigma_B\}$. Next, we apply the BFI algorithm on the $P_{CX}$ to estimate the optimal cumulative pmf $P_{CY}$ of the stego sequence $Y$. Once the $P_{CY}=\{P_{CY}(y)|y\in\Sigma_B\}$ is known, we can also get the stego statistics $F_Y=\{f_Y(y)|y\in\Sigma_B\}$ and $C_Y=\{c_Y(y)|y\in\Sigma_B\}$, where $c_Y(y)=P_{CY}(y)\times N$ and $f_Y(y)=c_Y(y)-c_Y(y-1)$. Once the host and stego statistics (i.e., $F_X$, $C_X$, $F_Y$ and $C_Y$) are both available, we implement data embedding using the same steps as in the static data embedding (see Lines 6-15 of Algorithm 2 for details). We repeat the above steps for the next host signal until there is no next host signal. Finally, we get a stego sequence $Y=(y_1,y_2,\cdots,y_N)$ and a final frequency distribution $F_X$. From the update rule of the $F_X$, it can be seen that the final $F_X$ is exactly the same as the start $F_X$ before the first backward pass. Algorithm 4 gives the details. In Algorithm 4, Lines 1-3 initialize the frequency distribution $F_X$ to all ones and perform a backward pass. Line 6 updates the frequency corresponding to the signal $x_i$. Lines 7-8 first calculate the host statistics $C_X$ and $P_{CX}$, and then use the BFI algorithm to estimate the optimal $P_{CY}$. Line 9 gets the stego statistics $F_Y$ and $C_Y$ from the $P_{CY}$. Lines 10-14 perform the encoding of ANS-Variant coding. In particular, Lines 11-12 read bits from the message $W$ to enlarge the state if it is too small. Lines 15-19 perform the decoding of ANS-Variant coding to obtain the corresponding stego signal $y_i$. Finally, we get a stego sequence $Y=(y_1,y_2,\cdots,y_N)$ and a final state $x$.

According to the above description, it is clear that the host pmf is not needed in the proposed dynamic data embedding. Therefore, unlike our previous work [1], the proposed dynamic method overcomes the limitation of explicitly transmitting the host pmf $P_{CX}$, and brings flexibility to practical applications.

IV-B Dynamic Data Extraction

In order to reconstruct the host sequence and the embedded message from the stego sequence without any prior knowledge of the host pmf, the proposed dynamic data extraction also initializes a frequency distribution $F_{X'}$ and changes it after each host signal is decoded. The start $F_{X'}$ is exactly the same as the start $F_X$ before the first backward pass in the dynamic data embedding, i.e., we have $F_{X'}=\{f_{X'}(s)=1,s\in\Sigma_B\}$, where $f_{X'}(s)$ denotes the number of occurrences of the signal $s$. The dynamic extraction procedure is similar to the static data extraction described in Section III-B, except that the dynamic version employs a changing pmf to save the prior transmission of the host pmf.

Given a stego sequence $Y=(y_1,y_2,\cdots,y_N)$ and the final state $x$ of the dynamic embedding, the details of the dynamic data extraction are as follows. First, we initialize the frequency distribution $F_{X'}$ as $F_{X'}=\{f_{X'}(s)=1,s\in\Sigma_B\}$. Then, considering that the encoder and decoder of ANS-Variant coding run in opposite directions, we process the stego signals backwards, i.e., from the last to the first signal. For each stego signal $y_i$, we first get the statistics $C_X=\{c_X(s)|s\in\Sigma_B\}$ and the cumulative pmf $P_{CX}=\{P_{CX}(s)|s\in\Sigma_B\}$ from the newest $F_{X'}=\{f_{X'}(s)|s\in\Sigma_B\}$, where $c_X(s)=\sum_{i=-1}^{s-1}f_{X'}(i)$ and $P_{CX}(s)=c_X(s)/c_X(B-1)$. Then, we apply the BFI algorithm on the $P_{CX}$ to estimate the optimal cumulative pmf $P_{CY}=\{P_{CY}(y)|y\in\Sigma_B\}$. Once the $P_{CY}$ is available, the corresponding stego statistics $F_Y=\{f_Y(y)|y\in\Sigma_B\}$ and $C_Y=\{c_Y(y)|y\in\Sigma_B\}$ can also be calculated, where $c_Y(y)=P_{CY}(y)\times N$ and $f_Y(y)=c_Y(y+1)-c_Y(y)$. Next, we implement data extraction using the same steps as in the static data extraction (see Lines 5-14 of Algorithm 3 for details). Notably, when a host signal $x_i$ is decoded, we need to update its frequency by $f_{X'}(x_i)\leftarrow f_{X'}(x_i)+1$. We repeat the above steps for the next stego signal $y_{i-1}$ until there is no next stego signal. Finally, we can reconstruct the host sequence and the embedded message. Algorithm 5 gives the details (a C++ skeleton follows the algorithm). In Algorithm 5, Line 1 initializes $F_{X'}$ to all ones. Line 3 gets the host statistics $C_X$ and $P_{CX}$ from the newest $F_{X'}$. Lines 4-5 first use the BFI algorithm to estimate the optimal $P_{CY}$ and then get the stego statistics $F_Y$ and $C_Y$ from the $P_{CY}$. Lines 6-10 perform ANS-Variant encoding on the stego signals and maintain the temporal information $\eta_Y$. Lines 11-12 utilize the temporal information $\eta_Y$ to decode the host signal $x_i$ and update the state $x$. If the updated state exceeds the selected upper bound $2^T$, Lines 14-15 write the least significant $v\times n$ bits of $x$ to the message $W$ to decrease $x$. Finally, Line 16 updates the frequency of the corresponding host signal $x_i$. From the above description, one can see that the dynamic data extraction procedure does not require any prior knowledge of the host pmf. It dynamically adjusts the pmf based on the decoded host signals to save the additional overhead.

Input: A stego sequence $Y=(y_1,y_2,\cdots,y_N)$, a final state $x$, the alphabet size $B$.
Output: A host sequence $X=(x_1,x_2,\cdots,x_N)$, the embedded message $W$.
1   Initialize $F_{X'}=\{f_{X'}(s)=1\,|\,s\in\Sigma_B\}$;
2   for $i=N$ to $1$ do
3       Get the host statistics $C_X=\{c_X(s)\,|\,s\in\Sigma_B\}$ and the cumulative pmf $P_{CX}$ from $F_{X'}=\{f_{X'}(s)\,|\,s\in\Sigma_B\}$;
4       Use the BFI algorithm to estimate the optimal $P_{CY}$;
5       Get the stego statistics $F_Y=\{f_Y(y)\,|\,y\in\Sigma_B\}$ and $C_Y=\{c_Y(y)\,|\,y\in\Sigma_B\}$ from $P_{CY}$;
6       if $x<f_Y(y_i)\times 2^{T-v\times n}$ then
7           $D\leftarrow Read_W(v\times n)$;
8           $x\leftarrow(x\ll v\times n)+D$;
9       $\eta_Y=c_Y(y_i)+(x\bmod f_Y(y_i))$;
10      $x\leftarrow\lfloor x/f_Y(y_i)\rfloor$;
11      $x_i\leftarrow\tilde{c}_X(\eta_Y)$;
12      $x\leftarrow f_X(x_i)\times x+\eta_Y-c_X(x_i)$;
13      if $x\geq 2^T$ then
14          call $Write_W(x,v\times n)$;
15          $x\leftarrow x\gg v\times n$;
16      $f_{X'}(x_i)\leftarrow f_{X'}(x_i)+1$;
Algorithm 5 Proposed dynamic extraction algorithm
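To make the control flow concrete, the following C++ sketch transcribes one loop iteration of Algorithm 5 (Lines 3–16). It is a minimal sketch, not the authors' implementation: the BFI estimator is stubbed with the identity map (the real estimator of [25] would replace it), the message $W$ is modeled as a bit stack so that $Read_W$ and $Write_W$ are mutual inverses, the stego counts are normalized to the current host total rather than by $N$ so that the inverse lookup stays in range, and all names are our own illustrative choices.

#include <algorithm>
#include <cstdint>
#include <vector>

struct BitStream {                                  // stand-in for the message W
    std::vector<int> bits;
    uint64_t read(int n) {                          // Read_W: inverse of write
        uint64_t d = 0;
        for (int i = 0; i < n; ++i) { d |= uint64_t(bits.back()) << i; bits.pop_back(); }
        return d;
    }
    void write(uint64_t x, int n) {                 // Write_W: push n low bits of x
        for (int i = n - 1; i >= 0; --i) bits.push_back(int((x >> i) & 1));
    }
};

// Placeholder for the BFI algorithm of [25]; the identity keeps the sketch runnable.
static std::vector<double> bfiEstimate(const std::vector<double>& pcx) { return pcx; }

// One iteration of Lines 3-16 of Algorithm 5 for stego signal y (assumes f_Y(y) > 0).
void extractOne(std::vector<uint64_t>& freqX,       // F_X': adaptive host counts, size B
                uint64_t& x,                        // ANS state
                int y, int B, int T, int vn,        // vn = v * n
                BitStream& W, std::vector<int>& host) {
    // Line 3: cumulative host statistics C_X and pmf P_CX from the newest F_X'.
    std::vector<uint64_t> cX(B + 1, 0);
    for (int s = 0; s < B; ++s) cX[s + 1] = cX[s] + freqX[s];
    std::vector<double> pCX(B + 1);
    for (int s = 0; s <= B; ++s) pCX[s] = double(cX[s]) / double(cX[B]);

    // Lines 4-5: estimate P_CY, then integer stego statistics C_Y and F_Y.
    // (The paper scales by N; we scale by the current total c_X(B) for consistency.)
    std::vector<double> pCY = bfiEstimate(pCX);
    std::vector<uint64_t> cY(B + 1), fY(B);
    for (int s = 0; s <= B; ++s) cY[s] = uint64_t(pCY[s] * double(cX[B]) + 0.5);
    for (int s = 0; s < B; ++s)  fY[s] = cY[s + 1] - cY[s];

    // Lines 6-8: refill the state from the message when it is too small.
    if (x < (fY[y] << (T - vn))) x = (x << vn) + W.read(vn);

    // Lines 9-10: ANS-Variant encoding step for the stego signal y.
    uint64_t eta = cY[y] + (x % fY[y]);
    x /= fY[y];

    // Lines 11-12: inverse cumulative lookup ~c_X(eta) decodes the host signal.
    int xi = int(std::upper_bound(cX.begin(), cX.end(), eta) - cX.begin()) - 1;
    x = freqX[xi] * x + eta - cX[xi];

    // Lines 13-15: spill the low bits back to the message if the state exceeds 2^T.
    if (x >= (uint64_t(1) << T)) { W.write(x, vn); x >>= vn; }

    // Line 16: adaptive update of F_X'; the host sequence is produced in reverse.
    freqX[xi] += 1;
    host.push_back(xi);
}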

V Simulations

In this section, we evaluate the embedding performance of the proposed static/dynamic methods in terms of the mean squared error (MSE), the embedding rate (in bits per pixel, bpp) and the PSNR value (in dB). The proposed methods are compared with three methods: our previous work [1] and two high-capacity RDH methods [20], [32]. The proposed methods are implemented in Visual C++ with the OpenCV image development kit. We first show experimental results on i.i.d. sequences, and then conduct experiments on gray-scale images (two common standard images of size $512\times512$).

Figure 3: The pmf of host signals drawn from a discrete normal distribution versus the pmf of the corresponding stego signals for various $\alpha$.

V-A Experiments for i.i.d. Sequence

In this subsection, we test the embedding performance of the proposed static/dynamic methods and our previous work [1] on an i.i.d. sequence. We consider a host sequence drawn from a discrete normal distribution. The host signals are 8-bit gray-scale with $B=256$, and the mean of the normal distribution is at $127.5$. Figure 3 shows the pmf of the host signals for the standard deviation $\sigma=256$, and the pmfs of the stego signals obtained by applying the BFI algorithm for $\alpha=1.01$, $1.001$ and $1.0001$. It can be seen that the pmfs of the stego signals are flatter than that of the host signals, so the stego sequence has a larger Shannon entropy. That is, more information is required on average to represent it, and the increased volume can be utilized to embed secret data.

We adopt two common metrics, the embedding rate (bpp) and the MSE, to evaluate the embedding performance. For a host sequence $X=(x_1,x_2,\cdots,x_N)$ and a stego sequence $Y=(y_1,y_2,\cdots,y_N)$, the MSE is defined as

\text{MSE}=\frac{1}{N}\sum_{i=1}^{N}(x_{i}-y_{i})^{2}.

The lower the MSE value, the better the quality. The secret message $W$ is generated by multiple calls to the operation $RAND()\times 65536$, which returns a pseudo-random number between $0$ (inclusive) and $65536$ (exclusive). Figures 4–6 show the rate-distortion curves for the pmf $P_{CX}$ with various $\sigma$. The blue, yellow and gray lines depict the expected rate-distortion curves for $\sigma=128$, $256$ and $512$, respectively. We run our previous work [1] (Code [1]), the proposed static method (Ours_s) and the proposed dynamic method (Ours_d) to embed the message into $65536$ host signals for each corresponding pair of pmfs $P_{CX}$ and $P_{CY}$, and the resulting rate-distortion values are marked as circles, rhombuses and triangles, respectively, in Figures 4–6. In the proposed static/dynamic implementations, we use $32$ bits to represent the state and choose the parameters $T=16$, $n=16$ and $v=1$. From Figures 4 and 5, one can see that the performance of the proposed static method is similar to that of our previous work [1], and both approach the expected rate-distortion bound. More importantly, the proposed method uses ANS coding instead of arithmetic coding for RDH, and thus does not suffer from computing precision problems. Figure 6 gives the performance of the proposed dynamic method. It shows that the rate-distortion values of the dynamic version keep a small distance from the expected rate-distortion bound. This is because we use a predicted pmf instead of the true pmf. Notably, our proposal does not require transmitting the host pmf during data embedding and extraction, which saves additional overhead and enhances practicality.
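For concreteness, the following minimal C++ sketch shows one way to set up this experiment: drawing $65536$ host signals from a discretized normal distribution and generating 16-bit message chunks. The RNG, the fixed seed and the clamp-and-round discretization are our own illustrative assumptions, not the authors' exact procedure.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng(12345);                              // illustrative fixed seed
    std::normal_distribution<double> gauss(127.5, 256.0); // mean 127.5, sigma = 256

    // Host sequence: round the samples and clamp them to the 8-bit alphabet
    // (an approximation of the paper's discrete normal distribution).
    std::vector<int> host(65536);
    for (auto& h : host)
        h = int(std::clamp(std::round(gauss(rng)), 0.0, 255.0));

    // Secret message: chunks analogous to RAND()*65536, i.e., uniform in [0, 65536).
    std::uniform_int_distribution<uint32_t> chunk(0, 65535);
    uint32_t w = chunk(rng);                              // one 16-bit message chunk
    (void)w; (void)host;
    return 0;
}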

Figure 4: Rate-distortion curves of a given normal distribution for various $\sigma$, and rate-distortion values of our previous code [1].
Figure 5: Rate-distortion curves of a given normal distribution for various $\sigma$, and rate-distortion values of the proposed static method.
Figure 6: Rate-distortion curves of a given normal distribution for various $\sigma$, and rate-distortion values of the proposed dynamic method.
Figure 7: (a) Host image Lena. (b) The four neighboring pixels adopted in the predictor.

V-B Experiments for Gray-Scale Images

In this subsection, the embedding performance of the proposed static/dynamic methods on gray-scale images is compared with three state-of-the-art methods: our previous work [1] and two high-capacity RDH methods [20] and [32]. To reduce the correlation between neighboring image pixels, we preprocess the host image with predictive coding, which processes the host pixels from left to right and from top to bottom. Figure 7 shows a $512\times512$ host image Lena and depicts the four neighboring pixels used for predicting the host pixel $h_i$. The predicted value $\hat{h}_i$ is defined as

\hat{h}_{i}=\frac{3}{8}\times a_{1}+\frac{3}{8}\times a_{2}+\frac{1}{8}\times a_{3}+\frac{1}{8}\times a_{4}. \qquad (4)

For the pixels in the leftmost column, e.g., $a_3$, we pick the nearby right pixel $a_2$ as the predicted value. For the pixels in the rightmost column, e.g., $a_1$, we pick the nearby lower pixel $a_4$ as the predicted value. For the pixels in the bottom row, e.g., $a_2$, we pick the nearby right pixel $a_4$ as the predicted value. We omit the bottom-right corner pixel and default its predicted value to $128$.
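As an illustration, the following C++ sketch implements this predictor under our reading of Figure 7b, namely that $a_1$, $a_2$, $a_3$ and $a_4$ are the right, lower, lower-left and lower-right neighbors of $h_i$. This layout is an assumption (it is consistent with the boundary rules above), and the integer rounding is our choice.

#include <cstdint>
#include <vector>

// img is a rows x cols 8-bit gray-scale image in row-major order.
// Assumed layout: a1 = right, a2 = below, a3 = below-left, a4 = below-right.
int predict(const std::vector<uint8_t>& img, int rows, int cols, int r, int c) {
    auto at = [&](int rr, int cc) { return int(img[rr * cols + cc]); };
    if (r == rows - 1 && c == cols - 1) return 128;  // bottom-right corner: default 128
    if (r == rows - 1) return at(r, c + 1);          // bottom row: right neighbor
    if (c == cols - 1) return at(r + 1, c);          // rightmost column: lower neighbor
    if (c == 0)        return at(r, c + 1);          // leftmost column: right neighbor
    int a1 = at(r, c + 1), a2 = at(r + 1, c);        // interior: weighted average (4)
    int a3 = at(r + 1, c - 1), a4 = at(r + 1, c + 1);
    return (3 * a1 + 3 * a2 + a3 + a4 + 4) / 8;      // +4 rounds to the nearest integer
}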

In order to embed a message into the host image, existing RDH methods usually utilize the differences between the host and predicted pixels. However, these methods use a modulo-$256$ operation to avoid the overflow/underflow problem, i.e., the values of some pixels in the stego image may exceed the upper or lower bound, e.g., $255$ or $0$ for an 8-bit gray-scale image. This may cause the stego image to suffer from salt-and-pepper noise. To overcome this issue, we develop a two-dimensional lookup table K of size $B\times B$. The row index $i$ of the table denotes the predicted pixel and the column index $j$ denotes the true host pixel, where $0\leq i<B$ and $0\leq j<B$. The entry $\textit{K}[i][j]$ at the $i$-th row and the $j$-th column counts the host pixels of value $j$ whose predicted value is $i$.

Next, we give the details of static data embedding/extraction on gray-scale images. First, we initialize the two-dimensional table K to all ones, i.e., $\textit{K}[i][j]=1$ for $i,j\in\Sigma_B$. Then, for each host pixel $h_i$, we use the predictor to get its predicted value $\hat{h}_i$, and update the corresponding entry by $\textit{K}[\hat{h}_i][h_i]\leftarrow\textit{K}[\hat{h}_i][h_i]+1$. If the predictor works accurately enough, i.e., $\hat{h}_i\approx h_i$, the larger values in the table K will be clustered on the diagonal. Once the table K has been built, we embed the message into the host image by using the corresponding row as the host pmf. We process the host pixels from left to right and from top to bottom. Specifically, for each host pixel $h_i$, we first use the predictor to get its predicted value $\hat{h}_i$. Then, we take $\hat{h}_i$ as the row index to look up the table K, which returns a one-dimensional statistic $\textit{K}[\hat{h}_i]$ of size $B$. Afterwards, $\textit{K}[\hat{h}_i]$ is used as the frequency distribution to calculate $P_{CX}$ in Algorithm 2. Once $P_{CX}$ is available, we can embed the message by applying Algorithm 2. We repeat the above steps until there is no next host pixel, and finally obtain the stego image.

In the static data extraction, we reconstruct the host image in the reverse order, from right to left and from bottom to top. Consider the step of decoding $h_i$ shown in Figure 7b: the four neighboring host pixels $a_1$, $a_2$, $a_3$ and $a_4$ have already been decoded by the previous decoding steps. Therefore, we obtain the predicted value $\hat{h}_i$ through (4). Then, we take $\hat{h}_i$ as the row index to look up the table K, which returns a one-dimensional statistic $\textit{K}[\hat{h}_i]$ of size $B$. Again, $\textit{K}[\hat{h}_i]$ is used as the frequency distribution to calculate $P_{CX}$ in Algorithm 3. Once $P_{CX}$ is available, we can perform the data extraction via Algorithm 3. We repeat the above steps until there is no next stego pixel, and finally reconstruct the host image and the embedded message.
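A compact C++ sketch of building this table follows; it reuses the predict() helper sketched above, and the container choices are illustrative.

#include <array>
#include <cstdint>
#include <vector>

constexpr int B = 256;                                   // alphabet size

int predict(const std::vector<uint8_t>& img, int rows, int cols, int r, int c); // sketched above

// Build K with K[i][j] = 1 + number of host pixels of value j predicted as i.
std::vector<std::array<uint32_t, B>> buildTable(const std::vector<uint8_t>& img,
                                                int rows, int cols) {
    std::vector<std::array<uint32_t, B>> K(B);
    for (auto& row : K) row.fill(1);                     // initialize to all ones
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            K[predict(img, rows, cols, r, c)][img[r * cols + c]] += 1;
    return K;                // row K[h_hat_i] then serves as the frequency distribution
}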

Below, we discuss the details of dynamic data embedding/extraction on gray-scale images. First, we initialize a two-dimensional table K to all ones, i.e., $\textit{K}[i][j]=1$ for $i,j\in\Sigma_B$. Then, we perform a backward pass over the host image: for each host pixel $h_i$, we use the predictor to obtain its predicted value $\hat{h}_i$ and update the corresponding entry by $\textit{K}[\hat{h}_i][h_i]\leftarrow\textit{K}[\hat{h}_i][h_i]+1$. We repeat these steps until there is no next host pixel. Next, the dynamic data embedding is implemented on the basis of the newest table K. We process the host pixels from left to right and from top to bottom. Specifically, for each host pixel $h_i$, we first use the predictor to get its predicted value $\hat{h}_i$. Notably, once $\hat{h}_i$ is obtained, we update the corresponding entry by $\textit{K}[\hat{h}_i][h_i]\leftarrow\textit{K}[\hat{h}_i][h_i]-1$. Then, we take $\hat{h}_i$ as the row index to look up the table K, which returns a one-dimensional statistic $\textit{K}[\hat{h}_i]$ of size $B$. Afterwards, $\textit{K}[\hat{h}_i]$ is used as the frequency distribution to calculate the host statistics $C_X$ and $P_{CX}$ in Line 7 of Algorithm 4. Then, we can embed the message via Lines 8–19 of Algorithm 4. We repeat the above steps until there is no next host pixel, and finally obtain the stego image.

In the corresponding dynamic data extraction, we first initialize a two-dimensional lookup table K to all ones and predict the pixel value $\hat{h}_i$ in the reverse order, from right to left and from bottom to top. Similarly, we take $\hat{h}_i$ as the row index to look up the table K, which returns a statistic $\textit{K}[\hat{h}_i]$ of size $B$. The $\textit{K}[\hat{h}_i]$ is used to calculate the host statistics $C_X$ and $P_{CX}$ in Line 3 of Algorithm 5. Then, we can decode the host pixel $h_i$ via Lines 4–15 of Algorithm 5. Notably, when a host pixel $h_i$ is decoded, we update the corresponding entry by $\textit{K}[\hat{h}_i][h_i]\leftarrow\textit{K}[\hat{h}_i][h_i]+1$. We repeat the above steps until all host pixels are decoded. Finally, the host image and the embedded message are reconstructed.
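The crucial point is that the encoder's decrement-before-coding and the decoder's increment-after-decoding keep both tables in lockstep: when pixel $h_i$ is processed on either side, the table holds the all-ones base plus exactly the pixels after $h_i$ in raster order. The following C++ fragment sketches this invariant; the names follow the sketches above and are our own.

#include <array>
#include <cstdint>
#include <vector>

using Row   = std::array<uint32_t, 256>;
using Table = std::vector<Row>;                 // the 256 x 256 lookup table K

// Encoder, before coding host pixel h with predicted value p: remove the
// pixel's own count, so row K[p] matches what the decoder will hold when it
// reaches this pixel (having decoded only the pixels after it).
inline const Row& rowForEmbedding(Table& K, int p, int h) {
    K[p][h] -= 1;
    return K[p];                                // used as the frequency distribution
}

// Decoder, after decoding host pixel h with predicted value p: add the count
// back, restoring exactly the table state the encoder saw before this pixel.
inline void afterExtraction(Table& K, int p, int h) {
    K[p][h] += 1;
}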

The PSNR (in dB) is often used as an objective measure of image quality, reflecting the degree of similarity between the stego image and the host image; the larger the PSNR value, the better the image quality. The PSNR is calculated as

\text{PSNR}=10\times\log_{10}\left(MAX^{2}/\text{MSE}\right),

where $MAX$ is the maximum possible pixel value of the image (e.g., $255$ for 8-bit gray-scale images), and MSE is the mean squared error between the host and stego images. Given a noise-free $L_1\times L_2$ host image $X$ and its stego image $Y$, the MSE is defined as

\text{MSE}=\frac{1}{L_{1}\times L_{2}}\sum_{i=0}^{L_{1}-1}\sum_{j=0}^{L_{2}-1}(X(i,j)-Y(i,j))^{2},

where $X(i,j)$ and $Y(i,j)$ denote the pixels at the $i$-th row and the $j$-th column of $X$ and $Y$, respectively.
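Since the implementation uses OpenCV, the two metrics can be computed along the following lines. This is a minimal sketch assuming single-channel 8-bit images; the function names are ours.

#include <opencv2/core.hpp>
#include <cmath>

// MSE between two single-channel 8-bit images of equal size.
double mse(const cv::Mat& host, const cv::Mat& stego) {
    cv::Mat diff;
    cv::absdiff(host, stego, diff);            // |X(i,j) - Y(i,j)| per pixel
    diff.convertTo(diff, CV_64F);
    diff = diff.mul(diff);                     // squared differences
    return cv::sum(diff)[0] / (double(host.rows) * host.cols);
}

// PSNR in dB with MAX = 255 for 8-bit gray-scale images.
double psnr(const cv::Mat& host, const cv::Mat& stego) {
    return 10.0 * std::log10(255.0 * 255.0 / mse(host, stego));
}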

Figure 8: (a) Host image Lena. (b) Stego image, R = 1.83 bpp, PSNR = 24.2 dB. (c) Stego image, R = 1.07 bpp, PSNR = 34.4 dB. (d) Stego image, R = 0.11 bpp, PSNR = 56.5 dB.
Figure 9: (a) Host image Airplane. (b) Stego image, R = 2.65 bpp, PSNR = 22.68 dB. (c) Stego image, R = 1.46 bpp, PSNR = 33.7 dB. (d) Stego image, R = 0.3 bpp, PSNR = 53.0 dB.
Figure 10: Comparisons in terms of capacity-distortion performance.

Next, we give the experimental results on gray-scale images. First, Figures 8 and 9 show the embedding rate (bpp) and PSNR (dB) of the proposed static method on two $512\times512$ standard gray-scale images, Lena and Airplane, from the USC-SIPI image database (https://sipi.usc.edu/database/), where the embedding rate $R$ is defined as the number of embedded bits divided by the number of pixels in the host image. The larger $R$, the higher the embedding capacity. It can be observed that as the embedding rate decreases, the PSNR increases steadily.

Second, we show the embedding performance of the proposed static/dynamic methods, our previous work [1] and two high-capacity RDH methods [20] and [32] on the test image Lena. We use $32$ bits to represent the state and choose the parameters $T=16$, $n=16$ and $v=1$. The embedding rate versus PSNR curves are plotted in Figure 10. For the same embedding capacity, the proposed static method provides slightly higher PSNR values than our previous work [1]. Compared with the method [20], our static method achieves similar performance at low embedding rates. However, the method [20] states that it can only produce $0.612$ bpp on average, while the proposed method is capable of achieving larger embedding rates. Further, a location map is required in [20] to prevent the underflow/overflow problem, which brings additional storage consumption. Compared with the method [32], our static proposal obtains a slight PSNR gain at low embedding rates. However, the embedding capacity of the method [32] is relatively low, e.g., for the test image Lena, it achieves an embedding rate of at most $0.29$ bpp. In addition, the method [32] also requires a location map to avoid the underflow/overflow problem.

For the proposed dynamic implementation, we initialize the two-dimensional lookup table K as follows: $\textit{K}[\hat{h}_i][h_i]=32$ if $\hat{h}_i=h_i$; otherwise, $\textit{K}[\hat{h}_i][h_i]=1$. As can be seen in Figure 10, the proposed dynamic method provides slightly lower PSNR values than the static method under the same embedding capacity, mainly because it uses the predicted pmf rather than the true pmf. More importantly, unlike our previous work [1], the proposed dynamic method does not require prior transmission of the host pmf during embedding and extraction. In addition, compared with the existing methods [20], [32], our dynamic proposal also allows larger embedding rates to meet high-capacity requirements.

VI Conclusion

In this paper, we proposed a novel RDH scheme based on our recent asymmetric numeral systems (ANS) variant [26]. To the best of our knowledge, this paper is the first to employ ANS coding for the RDH technique. Unlike our previous work [1], the proposed method completely avoids the computing precision problem. In addition, a dynamic implementation is introduced to save the additional overhead of explicitly transmitting the host pmf during embedding and extraction. The experimental results show that the proposed static method provides slightly higher PSNR values than our previous work [1] and larger embedding rates than some state-of-the-art methods [20], [32] on gray-scale images. Besides, the proposed dynamic method saves the additional overhead at the cost of a small image quality loss.

References

  • [1] S.-J. Lin and W.-H. Chung, “The scalar scheme for reversible information-embedding in gray-scale signals: Capacity evaluation and code constructions,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 4, pp. 1155–1167, 2012.
  • [2] G. Gao, S. Tong, Z. Xia, B. Wu, L. Xu, and Z. Zhao, “Reversible data hiding with automatic contrast enhancement for medical images,” Signal Processing, vol. 178, pp. 1–14, 2021.
  • [3] Z. Ni, Y. Q. Shi, N. Ansari, W. Su, Q. Sun, and X. Lin, “Robust lossless image data hiding designed for semi-fragile image authentication,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 4, pp. 497–509, 2008.
  • [4] X. Li, B. Yang, and T. Zeng, “Efficient reversible watermarking based on adaptive prediction-error expansion and pixel selection,” IEEE Transactions on Image Processing, vol. 20, no. 12, pp. 3524–3533, 2011.
  • [5] J. Tian, “Reversible data embedding using a difference expansion,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 8, pp. 890–896, 2003.
  • [6] Y. Hu, H.-K. Lee, and J. Li, “DE-based reversible data hiding with improved overflow location map,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 2, pp. 250–260, 2009.
  • [7] W. He and Z. Cai, “Reversible data hiding based on dual pairwise prediction-error expansion,” IEEE Transactions on Image Processing, vol. 30, pp. 5045–5055, 2021.
  • [8] S.-k. Lee, Y.-h. Suh, and Y.-s. Ho, “Reversible image authentication based on watermarking,” 2006 IEEE International Conference on Multimedia and Expo, pp. 1321–1324, 2006.
  • [9] J. Wang, J. Ni, X. Zhang, and Y.-Q. Shi, “Rate and distortion optimization for reversible data hiding using multiple histogram shifting,” IEEE Transactions on Cybernetics, vol. 47, no. 2, pp. 315–326, 2017.
  • [10] X. Li, B. Li, B. Yang, and T. Zeng, “General framework to histogram-shifting-based reversible data hiding,” IEEE Transactions on Image Processing, vol. 22, no. 6, pp. 2181–2191, 2013.
  • [11] J. Wang, X. Chen, J. Ni, N. Mao, and Y. Shi, “Multiple histograms-based reversible data hiding: Framework and realization,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 8, pp. 2313–2328, 2020.
  • [12] M. Celik, G. Sharma, A. Tekalp, and E. Saber, “Lossless generalized-LSB data embedding,” IEEE Transactions on Image Processing, vol. 14, no. 2, pp. 253–266, 2005.
  • [13] W. Zhang, X. Hu, X. Li, and N. Yu, “Recursive histogram modification: Establishing equivalency between reversible data hiding and lossless data compression,” IEEE Transactions on Image Processing, vol. 22, no. 7, pp. 2775–2785, 2013.
  • [14] C.-C. Lin, X.-L. Liu, W.-L. Tai, and S.-M. Yuan, “A novel reversible data hiding scheme based on AMBTC compression technique,” Multimedia Tools and Applications, vol. 74, pp. 3823–3842, 2015.
  • [15] W. Qi, S. Guo, and W. Hu, “Generic reversible visible watermarking via regularized graph Fourier transform coding,” IEEE Transactions on Image Processing, vol. 31, pp. 691–705, 2022.
  • [16] A. Alattar, “Reversible watermark using the difference expansion of a generalized integer transform,” IEEE Transactions on Image Processing, vol. 13, no. 8, pp. 1147–1156, 2004.
  • [17] Z. Ni, Y.-Q. Shi, N. Ansari, and W. Su, “Reversible data hiding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 3, pp. 354–362, 2006.
  • [18] M. Fallahpour, “High capacity lossless data hiding based on histogram modification,” IEICE Electronics Express, vol. 4, no. 7, pp. 205–210, 2007.
  • [19] W.-L. Tai, C.-M. Yeh, and C.-C. Chang, “Reversible data hiding based on histogram modification of pixel differences,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 6, pp. 906–910, 2009.
  • [20] B. Ou and Y. Zhao, “High capacity reversible data hiding based on multiple histograms modification,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 8, pp. 2329–2342, 2020.
  • [21] V. Manikandan, K. S. R. Murthy, B. Siddineni, N. Victor, P. K. R. Maddikunta, and S. Hakak, “A high-capacity reversible data-hiding scheme for medical image transmission using modified Elias gamma encoding,” Electronics, vol. 11, no. 19, pp. 3101–3121, 2022.
  • [22] T. Kalker and F. Willems, “Capacity bounds and constructions for reversible data-hiding,” 2002 14th International Conference on Digital Signal Processing Proceedings, vol. 1, pp. 71–76, 2002.
  • [23] F. Willems, D. Maas, and T. Kalker, “Semantic lossless source coding,” 42nd Annual Allerton Conference on Communication, Control and Computing, 2004.
  • [24] J. J. Rissanen, “Generalized Kraft inequality and arithmetic coding,” IBM Journal of Research and Development, vol. 20, no. 3, pp. 198–203, 1976.
  • [25] X. Hu, W. Zhang, X. Hu, N. Yu, X. Zhao, and F. Li, “Fast estimation of optimal marked-signal distribution for reversible data hiding,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 5, pp. 779–788, 2013.
  • [26] N. Wang, W. Yan, S.-J. Lin, and Y. Huang, “An entropy coding based on binary encoding for mixed-radix digits,” 2022 Data Compression Conference (DCC), pp. 488–488, 2022.
  • [27] J. Duda, “Asymmetric numeral systems,” arXiv preprint arXiv:0902.0271, 2009.
  • [28] J. Duda, K. Tahboub, N. J. Gadgil, and E. J. Delp, “The use of asymmetric numeral systems as an accurate replacement for Huffman coding,” 2015 Picture Coding Symposium (PCS), pp. 65–69, 2015.
  • [29] S. Camtepe, J. Duda, A. Mahboubi, P. Morawiecki, S. Nepal, M. Pawłowski, and J. Pieprzyk, “CompCrypt: lightweight ANS-based compression and encryption,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 3859–3873, 2021.
  • [30] D. Dubé and H. Yokoo, “Fast construction of almost optimal symbol distributions for asymmetric numeral systems,” 2019 IEEE International Symposium on Information Theory (ISIT), pp. 1682–1686, 2019.
  • [31] D. A. Huffman, “A method for the construction of minimum-redundancy codes,” Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, 1952.
  • [32] Y. Jia, Z. Yin, X. Zhang, and Y. Luo, “Reversible data hiding based on reducing invalid shifting of pixels in histogram shifting,” Signal Processing, vol. 163, pp. 238–246, 2019.