
Windowed total variation denoising and noise variance monitoring

Zhanhao Liu (Université de Lorraine, CNRS, Inria, IECL, F-54000 Nancy, France; Saint-Gobain Research Paris, 39 Quai Lucien Lefranc, F-93300 Aubervilliers, France), Marion Perrodin (Saint-Gobain Research Paris, 39 Quai Lucien Lefranc, F-93300 Aubervilliers, France), Thomas Chambrion (Institut de Mathématiques de Bourgogne, UMR 5584, CNRS, UBFC, F-21000 Dijon, France), Radu S. Stoica (Université de Lorraine, CNRS, Inria, IECL, F-54000 Nancy, France)
Abstract

We have proposed a real-time Total Variation denoising method with an automatic choice of the hyper-parameter $\lambda$; its good performance opens a large field of applications. In this article, we adapt this method to non-stationary signals by using a sliding window, and we propose a noise variance monitoring method. The simulation results show that our estimate follows well the variation of the noise variance.

I Introduction

The signal $y=(y_{1},\cdots,y_{n})\in\mathbb{R}^{n}$ collected by a sensor can be modeled as $y=u+\epsilon$: a random noise $\epsilon$, with $\mathbb{E}(\epsilon)=0$ and $\mathbb{V}(\epsilon)=\sigma^{2}$, is added to the useful physical quantity $u$.

We aim to recover the unknown vector $u=(u_{1},\cdots,u_{n})\in\mathbb{R}^{n}$ from the noisy sample vector $y=(y_{1},\cdots,y_{n})$, with $y_{i}$ the sample at time $t_{i}$, by minimizing the Total Variation (TV) restoration functional:

$F(u,y,\tau,\lambda)=\sum_{i=1}^{n}\tau_{i}(y_{i}-u_{i})^{2}+\lambda\sum_{i=2}^{n}|u_{i}-u_{i-1}|$   (1)

with the sampling period vector $\tau=(\tau_{1},\cdots,\tau_{n})$, where $\tau_{i}=t_{i}-t_{i-1}$ for $i=2,\ldots,n$ and $\tau_{1}=\tau_{2}$. The restored signal is given by:

$u^{*}(\lambda)=(u^{*}_{1}(\lambda),\cdots,u^{*}_{n}(\lambda))=\arg\min_{u} F(u,y,\tau,\lambda)$   (2)
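As an illustration, the functional (1) can be minimized numerically with a generic solver. The sketch below (Python with NumPy/SciPy; the function names are ours) is only a brute-force stand-in for the fast dedicated algorithms discussed in the rest of the paper:

```python
import numpy as np
from scipy.optimize import minimize

def tv_functional(u, y, tau, lam):
    """Weighted TV functional F(u, y, tau, lambda) of Eq. (1)."""
    fidelity = np.sum(tau * (y - u) ** 2)
    penalty = lam * np.sum(np.abs(np.diff(u)))
    return fidelity + penalty

def tv_restore(y, tau, lam):
    """Numerical minimizer of Eq. (1); a generic derivative-free solver,
    not the fast online algorithm of [3]."""
    res = minimize(tv_functional, x0=y, args=(y, tau, lam), method="Powell")
    return res.x

# toy piece-wise constant signal plus Gaussian noise
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(20), 2.0 * np.ones(20)]) + 0.1 * rng.standard_normal(40)
tau = np.ones_like(y)        # unit sampling period
u_star = tv_restore(y, tau, lam=1.0)
```

The restored `u_star` attains a lower value of the functional than the noisy input itself, and for a large `lam` it flattens toward the weighted mean of `y`, as stated in Section II.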

The Total Variation based restoration method was first proposed in [1]. The authors of [2] show a one-to-one correspondence between the noise variance $\sigma^{2}$ and $\lambda$ under the hypothesis of a constant $\sigma^{2}$.

Motivated by the local influence of a new sample on the current restoration, we proposed in [3] a new online TV-restoration algorithm with an automatic choice of $\lambda$ for stationary 1D signals, following the work of [4] and [5]. The simulations show that our choice of $\lambda$ performs similarly to the existing methods (SURE [6] and cross-validation). Furthermore, our method is well suited for real-time applications, especially for monitoring a large number of sensors in plants.

In this paper, we adapt our online TV restoration to non-stationary signals and propose a noise variance monitoring method. The key idea is to run our method on sliding windows, assuming a local stationarity hypothesis for the signal. This hypothesis is strong but already commonly used (e.g. Fourier analysis, wavelet transforms). In order to apply our method to non-stationary signals, two problems are to be solved:

  • Adaptation of the online method to the sliding window

  • Choice of the window length (denoted $m$)

The first point ensures the efficiency of the real-time estimation, while the second point guarantees the good performance of the method. Indeed, the choice of $m$ requires a compromise between the local stationarity assumption and the performance of the restoration: the local influence of a new sample vanishes inside a large window, but local stationarity may no longer hold. In this article, we deal with the first point by assuming that the influence of a new sample stays inside the current window. The choice of the window size is the immediate perspective of this work.

An increase of the ground noise is one of the indicators of failures of sensors or production lines. This motivates us to build a real-time application for detecting changes of the variance by tracking the windowed restoration residuals $y-u^{*}(\lambda_{ours})$, based on the automatic determination of the hyper-parameter $\lambda_{ours}$.

Some approaches for the estimation of the noise variance $\sigma^{2}$ are listed here:

  • Most of the existing methods are based on the separation of signal and noise by digital filters [7] or restoration methods (e.g. wavelet transforms) [8] [9] [10]. See [11] for more details.

  • The median absolute deviation (MAD) proposed in [12] is a heuristic method widely used in many applications, especially for choosing the threshold in wavelet restoration. $\sigma$ can be estimated by the median absolute deviation of the finest-scale wavelet coefficients $By$ of the signal.

With respect to the previously cited methods, our approach, based on the separation of signal and noise, provides a procedure for simultaneous signal denoising and variance tracking, while being implementable as an online application. Thanks to its low time and space complexity, our method is well suited to cases with limited computation resources.

The paper continues as follows: in Section II we first present our TV-based denoising method with an automatic choice of the parameter. Then, in Section III, we adapt our online implementation to the sliding window. After that, in Section IV, we present an application of our algorithms, noise variance monitoring, and compare it with an existing method. Finally, conclusions and perspectives are given.

II Automatic total variation denoising method

In this section, we present the automatic TV-denoising method proposed in [3]. We aim to recover the unknown vector $u$ from the noisy samples $y$ by minimizing (1). The restored signal is given by (2).

Since we work on a finite sample of the signal, the solution $u^{*}(\lambda)$ can be seen as piece-wise constant. We use the segment representation [5] for the constant pieces: a set of indices $\{j,j+1,\cdots,k\}$ of consecutive points whose restored values satisfy $u^{*}_{j}(\lambda)=\cdots=u^{*}_{k}(\lambda)$ is called a segment if it cannot be enlarged, i.e. if $u^{*}_{j-1}(\lambda)\neq u^{*}_{j}(\lambda)$ (or $j=1$) and $u^{*}_{k}(\lambda)\neq u^{*}_{k+1}(\lambda)$ (or $k=n$). The number of segments of $u^{*}(\lambda)$ is denoted $K(\lambda)$. The following notations are introduced for the $j^{th}$ segment:

  • Index set $\mathcal{N}_{j}(\lambda)=\{i^{j}_{1},\cdots,i^{j}_{n_{j}}\}$ with $n_{j}(\lambda)=\text{Card}(\mathcal{N}_{j}(\lambda))$, containing the points inside the $j^{th}$ segment.

  • Segment level $v^{*}_{j}(\lambda)=u^{*}_{i}(\lambda),\ \forall i\in\mathcal{N}_{j}(\lambda)$

An equivalent representation of $u^{*}(\lambda)$ is provided by the couple $\{v^{*}(\lambda),\mathcal{N}(\lambda)\}$, with the set of segment levels $v^{*}(\lambda)=(v^{*}_{1}(\lambda),\cdots,v^{*}_{K}(\lambda))$ and the cutting set $\mathcal{N}(\lambda)=\{\mathcal{N}_{1}(\lambda),\cdots,\mathcal{N}_{K}(\lambda)\}$. Knowing the cutting set $\mathcal{N}(\lambda)$, let $s(v)=\{s_{0}(v),s_{1}(v),\cdots,s_{K}(v)\}$ with $s_{i}(v)=\text{sign}(v_{i+1}-v_{i})$ for $i=1,\cdots,K-1$, $s_{i}(v)=0$ for $i\in\{0,K\}$ and $s^{*}=s(v^{*}(\lambda))$; the level of the $j^{th}$ segment is then given by:

$v^{*}_{j}(\lambda)=\overline{y}^{*}_{j}+\frac{\lambda}{2\mathcal{T}_{j}}(s^{*}_{j}-s^{*}_{j-1})$   (3)

with $j=1,\cdots,K(\lambda)$, the segment length $\mathcal{T}_{j}=\sum_{i\in\mathcal{N}_{j}(\lambda)}\tau_{i}$ and the mean value inside the $j^{th}$ segment $\overline{y}^{*}_{j}=\frac{\sum_{i\in\mathcal{N}_{j}(\lambda)}\tau_{i}y_{i}}{\mathcal{T}_{j}}$.
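The segment representation $\{v^{*},\mathcal{N}\}$ above can be extracted mechanically from any piece-wise constant vector. A minimal Python sketch (our own naming; indices are 0-based, unlike the paper):

```python
import numpy as np

def segments(u_star, tol=1e-12):
    """Extract the segment representation {v, N} of a piece-wise constant
    restoration: v holds the segment levels, N the index set of each segment."""
    breaks = np.where(np.abs(np.diff(u_star)) > tol)[0] + 1   # first index of each new segment
    starts = np.concatenate([[0], breaks]).astype(int)
    ends = np.concatenate([breaks, [len(u_star)]]).astype(int)
    N = [list(range(s, e)) for s, e in zip(starts, ends)]     # cutting set (0-based)
    v = [float(u_star[s]) for s in starts]                    # segment levels
    return v, N

v, N = segments(np.array([1.0, 1.0, 3.0, 3.0, 3.0, 2.0]))
# v == [1.0, 3.0, 2.0]; N == [[0, 1], [2, 3, 4], [5]]
```

The number of segments `len(v)` corresponds to $K(\lambda)$ in the notation above.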

Different values of $\lambda$ may provide solutions with distinct cutting sets: $\lambda=0$ gives $u^{*}=y$, while $\lambda=\infty$ implies the most parsimonious solution $u^{*}=\text{mean}(y)$. The authors of [4] and [5] show that there exists a sequence $\Lambda=(\lambda_{1},\cdots)$ such that for every $\lambda\in\Lambda$, two segments are merged together when moving from $\lambda-\eta$ to $\lambda$ with $\eta\to 0$ and $\eta>0$. In [3], we proposed a fast algorithm to estimate the dynamics of the restoration as a function of $\lambda$; the dynamics are stored in $\Lambda^{\circ}=(\lambda^{\circ}_{1},\lambda^{\circ}_{2},\cdots,\lambda^{\circ}_{n-1})$, with $\lambda^{\circ}_{i}$ the value of $\lambda$ for which the points $i$ and $i+1$ are merged into the same segment. $\Lambda^{\circ}$ allows the computation of the cutting set, and hence of the restoration (3), for every $\lambda$.

Based on the variation of the number of extrema (denoted $g(\lambda)$) of the restoration $u^{*}(\lambda)$ as a function of $\lambda$, an adaptive choice of the hyper-parameter $\lambda$ is proposed. The simulations show that our estimate $\lambda_{ours}$ performs similarly to the state of the art, all close to the optimal choice $\lambda_{op}=\arg\min|u^{*}(\lambda)-u_{net}|^{2}$, with $u_{net}$ the original signal. The variation $\Delta g(\lambda)=g(\lambda-\eta)-g(\lambda)$ for every $\lambda\in\Lambda$, with $\eta\to 0$ and $\eta>0$, can be estimated simultaneously with $\Lambda^{\circ}$.

Besides, we analysed the local influence on $\Lambda^{\circ}$ of introducing a new sample at the end of the sequence: only a small part of $\Lambda^{\circ}$ changes. Based on this local property, we proposed an online algorithm (cf. Algorithm 2 in [3]) for updating the changed part of $\Lambda^{\circ}$, which is more efficient than the offline estimation. The locality of the TV-denoising method motivates the adaptation to sliding windows in order to deal with non-stationary signals, by assuming local stationarity inside the window.

III Adaptation to sliding windows

In this section, we present some theoretical elements about the local behaviour of the restoration and adapt the online algorithm to the sliding window.

Consider two successive sliding windows $w_{i}^{m}=[i,i+m-1]$ and $w_{i+1}^{m}=[i+1,i+m]$. $w_{i+1}^{m}$ is indeed $w_{i}^{m}$ from which we remove the first point $(y_{i},\tau_{i})$ and add a new point $(y_{i+m},\tau_{i+m})$. For the sake of simplicity, $F(u_{\{i,\cdots,i+m-1\}},y_{\{i,\cdots,i+m-1\}},\tau_{\{i,\cdots,i+m-1\}},\lambda)$ is denoted $F^{m}_{i}(\lambda)$ for the window $w^{m}_{i}$.

We denote by $u^{*}=(u^{*}_{1},\cdots,u^{*}_{m})=\{v^{*},\mathcal{N}^{*}\}=\arg\min F^{m}_{i}$ the restoration of the window $w^{m}_{i}$ for a given $\lambda$, and by $\hat{u}=(\hat{u}_{1},\cdots,\hat{u}_{m})=\{\hat{v},\hat{\mathcal{N}}\}=\arg\min F^{m}_{i+1}$ that of $w^{m}_{i+1}$. To simplify the presentation, the quantities related to $w^{m}_{i}$ carry the symbol $*$, and those related to $w^{m}_{i+1}$ carry the hat symbol. For example, for a given $\lambda$, the segment length set of $u^{*}(\lambda)$ is $\mathcal{T}^{*}=\{\mathcal{T}^{*}_{1},\cdots,\mathcal{T}^{*}_{K^{*}}\}$ with $\mathcal{T}^{*}_{j}=\sum_{i\in\mathcal{N}^{*}_{j}}\tau_{i}$, and that of $\hat{u}(\lambda)$ is $\hat{\mathcal{T}}=\{\hat{\mathcal{T}}_{1},\cdots,\hat{\mathcal{T}}_{\hat{K}}\}$ with $\hat{\mathcal{T}}_{j}=\sum_{i\in\hat{\mathcal{N}}_{j}}\tau_{i}$.

III-A Influence of slide movement

The update of the restored signal and of $\Lambda^{\circ}$ needs to take into account the index shift from $w^{m}_{i}$ to $w^{m}_{i+1}$: a restored signal point is unchanged under the slide movement if $\hat{u}_{i-1}=u^{*}_{i}$. In [3], we have shown the following theorem:

Theorem 1.

If there exists an index $j\in\{2,\cdots,K^{*}(\lambda)\}$ such that $\text{sign}(v^{*}_{j-1}(\lambda)-v^{*}_{j}(\lambda))=\text{sign}(v^{*}_{K^{*}(\lambda)}(\lambda)-y_{i+m})$, then the new restoration $\hat{u}$ satisfies $\hat{u}_{i-1}(\lambda)=u^{*}_{i}(\lambda)$ for all $i<i^{j,*}_{1}$.

The diffusion from the “new” sample $y_{i+m}$ changes only the last part of the restoration of $w^{m}_{i}$, up to the junction of two segments for which the sign of the variation $v^{*}_{j-1}(\lambda)-v^{*}_{j}(\lambda)$ corresponds to that of $v^{*}_{K^{*}}(\lambda)-y_{i+m}$. We can establish a similar theorem (cf. Theorem 2) about the influence of the first sample $y_{i}$ of the sliding window $w^{m}_{i}$: removing $y_{i}$ changes only the first part of the restoration.

Theorem 2.

If there exists an index $j\in\{2,\cdots,K^{*}(\lambda)\}$ such that $\text{sign}(v^{*}_{j-1}(\lambda)-v^{*}_{j}(\lambda))=\text{sign}(v^{*}_{1}(\lambda)-y_{i})$, then the new restoration $\hat{u}$ satisfies $\hat{u}_{i-1}(\lambda)=u^{*}_{i}(\lambda)$ for all $i\geq i^{j+1,*}_{1}$.

To sum up, to update $u^{*}(\lambda)$ into $\hat{u}(\lambda)$, only the first part and the last part of $w^{m}_{i}$ are changed. We introduce the following definitions:

Definition 1.

With $l=i^{j}_{1}$ and $p=i^{k+1}_{1}-1$, where $j$ is the last segment which satisfies $\text{sign}(v^{*}_{j-1}(\lambda)-v^{*}_{j}(\lambda))=\text{sign}(v^{*}_{K^{*}}(\lambda)-y_{i+m})$ and $k$ is the first segment which satisfies $\text{sign}(v^{*}_{k-1}(\lambda)-v^{*}_{k}(\lambda))=\text{sign}(v^{*}_{1}(\lambda)-y_{i})$,

  • $(u^{*}_{l},\cdots,u^{*}_{m})$ is called the non right-isolated sequence.

  • $(u^{*}_{2},\cdots,u^{*}_{p})$ is called the non left-isolated sequence.

  • $(u^{*}_{p+1},\cdots,u^{*}_{l-1})$ is called the isolated sequence.

III-B Independence between segments

For a given $\lambda$ (denoted $\hat{\lambda}$), the cutting set $\mathcal{N}(\hat{\lambda})$ is given by $\{\lambda^{\circ}>\hat{\lambda}\}$. In [3], we showed that the points inside a segment of $u^{*}(\hat{\lambda})$ are merged for $\lambda\leq\hat{\lambda}$, and that the value of $\lambda$ provoking each merge is independent of the points outside the segment. By introducing the virtual segment (cf. Definition 2), the estimation of $\Lambda^{\circ}$ can be broken into sub-problems, one for each virtual segment, as shown in Proposition 1. Combined with Theorems 1 and 2, the elements of $\{\lambda^{\circ}\leq\hat{\lambda}\}$ inside the isolated sequence are not influenced by the sliding movement.

Definition 2.

Let $\hat{\lambda}$ and $\epsilon_{\lambda}>0$; we have $v^{*}(\hat{\lambda})=\{v_{1}^{*}(\hat{\lambda}),\cdots,v_{l}^{*}(\hat{\lambda})\}$, $\mathcal{N}^{*}(\hat{\lambda})=\{\mathcal{N}_{1},\cdots,\mathcal{N}_{l}\}$, $l=K(\hat{\lambda})$ and $s_{i}=\text{sign}(v_{i+1}^{*}(\hat{\lambda})-v_{i}^{*}(\hat{\lambda}))$. For each segment $\{y_{\mathcal{N}_{j}},\tau_{\mathcal{N}_{j}}\}$ with $j=1,\cdots,l$, let $c_{i}=\frac{\hat{\lambda}+\epsilon_{\lambda}}{2s_{i}}$; we introduce the virtual segment $\{y^{+}_{\mathcal{N}_{j}},\tau^{+}_{\mathcal{N}_{j}}\}$ where:

  • $y^{+}_{\mathcal{N}_{1}}=\{y_{\mathcal{N}_{1}},v_{1}^{*}(\hat{\lambda})+c_{1}\}$, $y^{+}_{\mathcal{N}_{i}}=\{v_{i}^{*}(\hat{\lambda})-c_{i-1},y_{\mathcal{N}_{i}},v_{i}^{*}(\hat{\lambda})+c_{i}\}$ for $i=2,\ldots,l-1$, and $y^{+}_{\mathcal{N}_{l}}=\{v_{l}^{*}(\hat{\lambda})-c_{l-1},y_{\mathcal{N}_{l}}\}$.

  • $\tau^{+}_{\mathcal{N}_{1}}=\{\tau_{\mathcal{N}_{1}},1\}$, $\tau^{+}_{\mathcal{N}_{i}}=\{1,\tau_{\mathcal{N}_{i}},1\}$ for $i=2,\ldots,l-1$, and $\tau^{+}_{\mathcal{N}_{l}}=\{1,\tau_{\mathcal{N}_{l}}\}$.
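Definition 2 is a purely combinatorial construction and can be sketched directly. The Python sketch below (our own naming; `eps` plays the role of $\epsilon_{\lambda}$, and the padding sample written here is $v_{i}\pm c_{i}$ with $c_{i}=(\hat{\lambda}+\epsilon_{\lambda})/(2s_{i})$):

```python
import numpy as np

def virtual_segments(y_segs, tau_segs, v, lam_hat, eps=1e-6):
    """Build the virtual segments of Definition 2.
    y_segs/tau_segs: per-segment lists of samples and sampling periods;
    v: segment levels; lam_hat: the cutting point lambda-hat."""
    l = len(v)
    s = [float(np.sign(v[i + 1] - v[i])) for i in range(l - 1)]
    c = [(lam_hat + eps) / (2.0 * s[i]) for i in range(l - 1)]
    out = []
    for i in range(l):
        y_plus, t_plus = list(y_segs[i]), list(tau_segs[i])
        if i > 0:                    # pad on the left towards the previous segment
            y_plus = [v[i] - c[i - 1]] + y_plus
            t_plus = [1.0] + t_plus
        if i < l - 1:                # pad on the right towards the next segment
            y_plus = y_plus + [v[i] + c[i]]
            t_plus = t_plus + [1.0]
        out.append((y_plus, t_plus))
    return out

# two-segment example: levels 0 and 1, lambda-hat = 1
out = virtual_segments([[0.0, 0.0], [1.0, 1.0]],
                       [[1.0, 1.0], [1.0, 1.0]],
                       [0.0, 1.0], lam_hat=1.0)
```

Each virtual segment can then be fed independently to the path estimation, as stated in Proposition 1 below.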

Proposition 1.

Let $\Lambda^{\circ}$ be the estimation with all the samples $\{y_{i},t_{i}\}_{1,\cdots,n}$, $\Lambda^{\circ}_{i}=\{\lambda_{i,1}^{\circ},\cdots\}$ the estimation with the $i^{th}$ virtual segment $(y^{+}_{\mathcal{N}_{i}},\tau_{\mathcal{N}_{i}}^{+})$, $\Lambda^{*}_{1}=\{\lambda_{1,1}^{\circ},\cdots,\lambda_{1,n_{1}-1}^{\circ}\}$ and $\Lambda^{*}_{i}=\{\lambda_{i,2}^{\circ},\cdots,\lambda_{i,n_{i}}^{\circ}\}$ for $i=2,\ldots,l$; then $\cup_{i=1,\cdots,l}\Lambda^{*}_{i}=\{\lambda\,|\,\lambda\leq\hat{\lambda}\ \text{and}\ \lambda\in\Lambda^{\circ}\}$.

III-C Proposed algorithms

Let $\Delta g^{\circ}=(\Delta g(\lambda^{\circ}_{1}),\Delta g(\lambda^{\circ}_{2}),\cdots,\Delta g(\lambda^{\circ}_{n-1}))$; we denote by $\Lambda^{\circ,i}$ and $\Delta g^{\circ,i}$, two vectors of size $m-1$, the result for $w^{m}_{i}$. The results after sliding ($\Lambda^{\circ,i+1}$ and $\Delta g^{\circ,i+1}$), based on $w^{m}_{i+1}$, can be obtained by updating $\Lambda^{\circ,i}$ and $\Delta g^{\circ,i}$. In this section, we propose an adaptation of the online algorithm of [3] to the sliding windows.

We only detail the online estimation of $\Lambda^{\circ,i+1}$ from $\Lambda^{\circ,i}$. In the following, we denote by $\Lambda^{\circ}=\text{DP-TV}(\{y\},\{\tau\})$ the application of Algorithm 1 in [3] to given sequences $\{y\}$ and $\{\tau\}$.

After choosing $\hat{\lambda}\geq 0$, called the cutting point, the restoration for $w^{m}_{i}$ is $u^{*}(\hat{\lambda})$. We denote by $p$ and $j^{*}_{p}$ respectively the last point and the last segment of the non left-isolated sequence of $u^{*}(\hat{\lambda})$, and by $l$ and $j^{*}_{l}$ respectively the first point and the first segment of the non right-isolated sequence of $u^{*}(\hat{\lambda})$. $\Lambda^{\circ,i+1}$ can be split into four parts: (1) some elements of $\{\lambda^{\circ,i+1}>\hat{\lambda}\}$ are changed following Proposition 1; (2) the non left-isolated sequence $\{\lambda^{\circ,i}_{j}\leq\hat{\lambda}\}$ for $j\leq p$ is influenced by the removal of the first point $y_{i}$; (3) the non right-isolated sequence $\{\lambda^{\circ,i}_{j}\leq\hat{\lambda}\}$ for $j\geq l$ is influenced by the new point $y_{i+m}$; (4) every $\lambda^{\circ,i}_{j}\leq\hat{\lambda}$ remains the same for $p<j<l$.

We treat at first the unchanged part of $\Lambda^{\circ,i+1}$: due to the sliding, we have $\lambda^{\circ,i+1}_{\{\lambda^{\circ,i+1}_{p,\cdots,l-2}\leq\hat{\lambda}\}}=\lambda^{\circ,i}_{\{\lambda^{\circ,i}_{p+1,\cdots,l-1}\leq\hat{\lambda}\}}$.

For the non right-isolated sequence, let $\epsilon_{\lambda}>0$; $\Lambda^{a}=\lambda^{\circ,i+1}_{\{\lambda^{\circ,i+1}_{l-1,\cdots,m}\leq\hat{\lambda}\}}$ can be estimated by $\text{DP-TV}(y^{+}_{a},\tau^{+}_{a})$ with the virtual segment:

  • $y^{+}_{a}=\{v^{*}_{j^{*}_{l}}(\hat{\lambda})-\frac{\hat{\lambda}+\epsilon_{\lambda}}{2\,\text{sign}(v_{j^{*}_{l}}-v_{j^{*}_{l}-1})},y_{\{l+i-1,\cdots,m+i\}}\}$

  • $\tau^{+}_{a}=\{1,\tau_{\{l+i-1,\cdots,m+i\}}\}$

For the non left-isolated sequence, $\Lambda^{b}=\lambda^{\circ,i+1}_{\{\lambda^{\circ,i+1}_{1,\cdots,p-1}\leq\hat{\lambda}\}}$ can be estimated by $\text{DP-TV}(y^{+}_{b},\tau^{+}_{b})$ with the virtual segment:

  • $y^{+}_{b}=\{y_{\{i+1,\cdots,i+p-1\}},v^{*}_{j^{*}_{p}}(\hat{\lambda})+\frac{\hat{\lambda}+\epsilon_{\lambda}}{2\,\text{sign}(v_{j^{*}_{p}+1}-v_{j^{*}_{p}})}\}$

  • $\tau^{+}_{b}=\{\tau_{\{i+1,\cdots,i+p-1\}},1\}$

The isolated and non-isolated sequences can be assembled into $\Lambda^{c}=\{\lambda^{c}_{1},\cdots,\lambda^{c}_{m}\}$ with $\lambda^{c}_{k}=\lambda^{a}_{k-l+3}$ for $k\geq l-1$, $\lambda^{c}_{k}=\lambda^{b}_{k}$ for $k\leq p-1$ and $\lambda^{c}_{k}=\lambda^{\circ,i}_{k+1}$ for $p\leq k\leq l-2$.

It remains to compute $\{\lambda^{\circ,i+1}>\hat{\lambda}\}$. Let $d=\{\lambda^{\circ,i+1}>\hat{\lambda}\}=\{\lambda^{c}_{k}>\hat{\lambda}\}$, which indeed contains all the last points of the segments of $\hat{u}(\hat{\lambda})$; we can then get $\Lambda^{d}=\lambda^{\circ,i+1}_{d}=\text{DP-TV}(\hat{v}(\hat{\lambda}),\hat{\mathcal{T}}(\hat{\lambda}))$.

Finally, Λ,i+1\Lambda^{\circ,i+1} can be assembled in the following way:

$\lambda^{\circ,i+1}_{k}=\begin{cases}\lambda^{c}_{k},&\text{if }\lambda^{c}_{k}\leq\hat{\lambda},\\ \lambda^{d}_{\alpha(k)},&\text{if }\lambda^{c}_{k}>\hat{\lambda},\end{cases}$   (4)

with $\alpha\colon\mathbb{Z}\to\mathbb{Z}$ giving the position of the index $k$ in the vector $d$.

Algorithm 1 Adaptation of the online implementation to the sliding window
Input: $(y_{i},\cdots,y_{i+m-1})$, $(\tau_{i},\cdots,\tau_{i+m-1})$, $(y_{i+m},\tau_{i+m})$, $\Lambda^{\circ,i}$, $\Delta g^{\circ,i}$, $\hat{\lambda}$
Find the non right-isolated sequence $(l,\cdots,m)$ of $u^{*}(\hat{\lambda})$
Find the non left-isolated sequence $(2,\cdots,p)$ of $u^{*}(\hat{\lambda})$
if $p<l-2$ then
     $(\Lambda^{a},\Delta g^{a})=\text{DP-TV}(y^{+}_{a},\tau^{+}_{a})$
     $(\Lambda^{b},\Delta g^{b})=\text{DP-TV}(y^{+}_{b},\tau^{+}_{b})$
     $\Lambda^{\circ,i+1}=\{\Lambda^{b}_{\{1,\cdots,p-1\}},\Lambda^{\circ,i}_{\{p+1,\cdots,l-1\}},\Lambda^{a}_{\{2,\cdots,m-l+2\}}\}$
     $\Delta g^{\circ,i+1}=\{\Delta g^{b}_{\{1,\cdots,p-1\}},\Delta g^{\circ,i}_{\{p+1,\cdots,l-1\}},\Delta g^{a}_{\{2,\cdots,m-l+2\}}\}$
     $d=\{\lambda^{\circ,i+1}>\hat{\lambda}\}$
     $(\lambda^{\circ,i+1}_{d},\Delta g^{\circ,i+1}_{d})=\text{DP-TV}(\hat{v}(\hat{\lambda}),\hat{\mathcal{T}}(\hat{\lambda}))$
else    ▷ Offline approach
     $(\Lambda^{\circ,i+1},\Delta g^{\circ,i+1})=\text{DP-TV}(\{y_{j},\tau_{j}\}_{j=i+1,\cdots,i+m})$
end if
return $\Lambda^{\circ,i+1}$ and $\Delta g^{\circ,i+1}$

The algorithm adapted to the sliding window is gathered in Algorithm 1. With a good choice of $\hat{\lambda}$, only a small part ($\Lambda^{a},\Lambda^{b},\Lambda^{d}$) of the window $w^{m}_{i+1}$ needs to be updated from $w^{m}_{i}$. The overall complexity for estimating the restoration in a window of size $m$ is in $O(m)$.

IV Performance analysis

IV-A Application: noise variance monitoring

We propose a method to detect a shift of the noise variance based on our restoration method. For an observed signal $y$ of size $n$, we take a sliding window of size $m$ with $m<n$. For each window $w_{i}^{m}=[i,i+m-1]$, we apply our denoising algorithm with the automatic choice of $\lambda$ to the sequences $(y_{i},\cdots,y_{i+m-1})$ and $(\tau_{i},\cdots,\tau_{i+m-1})$ to estimate the restored signal $u^{*}(\lambda_{ours})=(u^{*}_{1},\cdots,u^{*}_{m})$. The windowed restoration residual for $j=1,\cdots,m$ is given by $r_{j}=y_{j+i-1}-u^{*}_{j}(\lambda_{ours})$. The variance of the residual inside $w_{i}^{m}$ can be estimated by:

$(\sigma_{i}^{m*})^{2}=\frac{1}{m-1}\sum_{j=1}^{m}(r_{j}-\overline{r}_{i})^{2}$   (5)

with the mean value of the residual $\overline{r}_{i}=\frac{1}{m}\sum_{j=1}^{m}r_{j}$.
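The windowed residual statistic (5) is straightforward to compute once a per-window denoiser is available. In the Python sketch below, `denoise` is a placeholder for the TV restoration with the automatic choice of $\lambda$; we illustrate it here with a crude window-mean stand-in, so the names and the stand-in are ours, not the paper's algorithm:

```python
import numpy as np

def window_residual_std(y, tau, m, denoise):
    """Slide a window of length m over y and return, for each window,
    the residual standard deviation of Eq. (5)."""
    sigma = np.empty(len(y) - m + 1)
    for i in range(len(sigma)):
        y_win, tau_win = y[i:i + m], tau[i:i + m]
        r = y_win - denoise(y_win, tau_win)   # windowed restoration residual
        sigma[i] = r.std(ddof=1)              # 1/(m-1) normalisation as in Eq. (5)
    return sigma

# illustration: pure noise with sigma = 1, crude mean "denoiser"
crude = lambda yw, tw: np.full_like(yw, yw.mean())
rng = np.random.default_rng(1)
y = rng.standard_normal(200)
sigma_m = window_residual_std(y, np.ones_like(y), m=50, denoise=crude)
```

On this pure-noise example the residual standard deviations hover around the true value 1, which is the behaviour the monitoring method exploits.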

For the simulation, the realisation of the noise is $\hat{\epsilon}_{j}=y_{j}-u_{net,j}$, with $u_{net}$ the original signal and $j=1,\cdots,n$. The estimation of the noise variance inside $w_{i}^{m}$ is given by $(\hat{\sigma}_{i}^{m})^{2}=\frac{1}{m-1}\sum_{j=i}^{i+m-1}(\hat{\epsilon}_{j}-\overline{\hat{\epsilon}}_{i})^{2}$ with $\overline{\hat{\epsilon}}_{i}=\frac{1}{m}\sum_{j=i}^{i+m-1}\hat{\epsilon}_{j}$.

Since $u^{*}(\lambda_{ours})$ is a good restoration of $y$, $u^{*}(\lambda_{ours})$ is similar to $u_{net}$, which means that $\sigma_{i}^{m*}$ is close to $\hat{\sigma}_{i}^{m}$ inside each window $w^{m}_{i}$. The noise variance $\hat{\sigma}_{i}^{m}$ is not available for real data, so we propose to detect the variance shift using the residual standard deviation vector $\sigma^{m*}=(\sigma^{m*}_{1},\cdots,\sigma^{m*}_{n-m+1})$, with $\sigma^{m*}_{i}$ obtained by (5) inside $w^{m}_{i}$.

For estimating the noise variance in a window of size $m$, the complexity is in $O(m)$ with the online implementation of the sliding windows, and the overall complexity for monitoring the variance of a signal of size $n$ is in $O((n-m)m)$.

IV-B Results

For variance monitoring, the method needs to estimate the noise variance accurately in the ideal case, or at least to capture the variation of the noise variance in a more realistic case. We use the following criteria to evaluate the performance of variance monitoring:

  • Average bias: $\overline{\text{bias}}=\frac{1}{n-m+1}\sum_{i=1}^{n-m+1}(\hat{\sigma}^{m}_{i}-\sigma^{m*}_{i})$.

  • Ratio of $\hat{\sigma}^{m}$ variation explained:

    $\text{RVE}=1-\frac{\sum_{i=1}^{n-m+1}(\hat{\sigma}^{m}_{i}-\overline{\text{bias}}-\sigma^{m*}_{i})^{2}}{\sum_{i=1}^{n-m+1}(\hat{\sigma}^{m}_{i}-\overline{\hat{\sigma}^{m}})^{2}}$   (6)

    where $\overline{\hat{\sigma}^{m}}=\frac{1}{n-m+1}\sum_{i=1}^{n-m+1}\hat{\sigma}^{m}_{i}$. It is indeed the $R^{2}$ score between $\sigma^{m*}$ and $\hat{\sigma}^{m}$ after adjusting these two vectors to the same mean value. The RVE ranges between 0 and 1, and $\text{RVE}=1$ indicates that our estimation explains perfectly the variation of $\hat{\sigma}^{m}$.
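Both criteria are a few lines of NumPy. A sketch with our own function names, following $\overline{\text{bias}}$ and Eq. (6) directly:

```python
import numpy as np

def average_bias(sigma_hat, sigma_star):
    """Average bias between the oracle sigma-hat and the residual-based sigma*."""
    return np.mean(sigma_hat - sigma_star)

def rve(sigma_hat, sigma_star):
    """Ratio of variation explained (Eq. (6)): an R^2 score computed
    after removing the average bias between the two vectors."""
    b = average_bias(sigma_hat, sigma_star)
    num = np.sum((sigma_hat - b - sigma_star) ** 2)
    den = np.sum((sigma_hat - sigma_hat.mean()) ** 2)
    return 1.0 - num / den

# a constant offset is fully explained: RVE = 1
sigma_star = np.array([1.0, 1.1, 1.2, 1.3])
assert abs(rve(sigma_star + 0.2, sigma_star) - 1.0) < 1e-12
```

This makes explicit why a method with a systematic but constant underestimation can still score $\text{RVE}\approx 1$: the bias is subtracted before the squared differences are accumulated.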

We compare our method with the Median Absolute Deviation (MAD), following the propositions of [12] and [13]: $\sigma^{*}=1.4826\ \text{median}(|By-\text{median}(By)|)$ with $By=\frac{1}{\sqrt{2}}(y_{2}-y_{1},\cdots,y_{n}-y_{n-1})$.
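The MAD baseline is a one-liner in practice. A Python sketch (the $1/\sqrt{2}$ scaling of the first-order differences is the usual Haar finest-scale convention, which we assume here):

```python
import numpy as np

def mad_sigma(y):
    """MAD noise-std estimate on the scaled finest-scale differences of y:
    sigma* = 1.4826 * median(|By - median(By)|), By = diff(y)/sqrt(2)."""
    By = np.diff(y) / np.sqrt(2.0)
    return 1.4826 * np.median(np.abs(By - np.median(By)))

rng = np.random.default_rng(2)
sigma_est = mad_sigma(rng.normal(0.0, 2.0, size=10_000))  # true sigma = 2
```

Differencing cancels slowly varying signal content, which is why MAD is robust to the piece-wise constant part of $y$ but, as the experiments below show, it tracks a time-varying variance less faithfully than the windowed residuals.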

Figure 1: Up: example of a simulated signal $y=u_{net}+\epsilon$ with $\epsilon\sim\mathcal{N}(0,\sigma_{a}^{2}(t))$. Colors: $y$ (red) and $u_{net}$ (blue). Middle: the standard deviation of the noise realisation $\epsilon=y-u_{net}$ (red line), the one estimated by our method (green line) and by MAD (blue line) inside every $w^{400}_{i}$. Down: histogram of the non right-isolated sequence lengths inside each sliding window.

At first, we fix the window length $m=400$. Our method requires a parameter $q$ which stands for the length of the approximation of the derivative; see p. 13 in [3] for more details. We apply our method to a simulated piece-wise constant signal with the parameter $\log_{10}(q)=1$. An example of $y=u_{net}+\epsilon$ with $\epsilon\sim\mathcal{N}(0,\sigma_{a}^{2}(t))$, together with the noise variance estimates for the 400-point windows $w^{400}_{i}$, $i=1,\cdots,1601$, is shown in Figure 1. Both MAD and our method give an estimate similar to the variance of the noise realisation for every sliding window. In this example, our method underestimates the variance, but the variation of our estimate inside each sliding window follows well that of $\hat{\sigma}^{400}$. MAD gives a precise estimate, but does not follow the variation of $\hat{\sigma}^{400}$. The lengths of the non right-isolated sequence of each window (shown in Figure 1) are all smaller than the window size $m=400$, which means the influence of the new sample remains inside each window and validates our assumption about the local influence.

We tested the performance of MAD and of our approach under different noise hypotheses:

  1. $\mathcal{N}(0,\sigma_{a}^{2}(t))$ with $\sigma_{a}(t)=1+0.0005t$.

  2. $\mathcal{N}(0,\sigma_{b}^{2}(t))$ with $\sigma_{b}(t)=1$ for $t\leq 1000$ and $\sigma_{b}(t)=1+0.001t$ for $t>1000$.

  3. $\mathcal{U}(-d(t),d(t))$ with $d(t)=1+0.0005t$.

  4. $\mathcal{N}(0,\sigma_{a}^{2}(t))+\mathcal{U}(-1,1)$.

Figure 2: Result of 100 simulations with the different types of noise (1): $\mathcal{N}(0,\sigma_{a}^{2}(t))$, (2): $\mathcal{N}(0,\sigma_{b}^{2}(t))$, (3): $\mathcal{U}(-d(t),d(t))$ and (4): $\mathcal{N}(0,\sigma_{a}^{2}(t))+\mathcal{U}(-1,1)$. Up: RVE score of our method; Middle: RVE score of MAD; Down: $\overline{\text{Bias}}$ of our method.
Figure 3: RVE score of 100 simulations with different window lengths ($m$).

We ran 100 simulations for each type of noise, and the results are gathered in Figure 2. For most of the simulations, our method has $\text{RVE}>0.95$ and outperforms MAD. The high RVE score indicates that the variation of our estimate fits well that of the real values, which allows us to monitor the variation of the noise variance for real signals collected from plants. However, the $\overline{\text{Bias}}$ results show that our method always underestimates the noise variance, and the bias depends on the type of noise, which means we cannot estimate a universal offset to remove this bias in the general case.

The window length $m$ plays an important role in the restoration and the variance monitoring, and needs to be chosen carefully. A short window can capture the local variation of the noise variance, but the new point (i.e. the $(i+m)^{th}$ point) may still have a strong influence on the past points (i.e. $i-1,i-2,\cdots$), which limits the restoration performance for those points. We test $m=200,400,600$ with the noise $\epsilon\sim\mathcal{N}(0,\sigma^{2}_{a}(t))$, and the results are shown in Figure 3. A longer window provides a better monitoring performance. But the local variation of $\sigma^{2}$ is neglected for a long window, and the stationarity hypothesis inside the window is no longer valid. The choice of $m$ is out of the scope of this article and remains an open question.

To sum up, our method cannot provide a precise estimate of the noise variance values, but our estimate follows well the variation of the real value, with an estimation error depending on the type of noise and the shape of the original signal.

One of the applications of our method is to detect a shift of the noise variance. The residual variance may stay stable (i.e. $\sigma^{m*}_{i}=\sigma$) during the correct functioning period, and a noise variance shift can be detected by a prefixed threshold on $\sigma^{m*}_{i}$ compared to the stable regime: e.g. $\sigma^{m*}_{i}>1.2\sigma$ for several successive sliding windows $w^{m}_{i}$.
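The thresholding rule above can be sketched in a few lines. The factor 1.2 comes from the text; the requirement of several successive windows is expressed here by an illustrative `patience` parameter of our own choosing:

```python
import numpy as np

def detect_shift(sigma_m, sigma_ref, factor=1.2, patience=5):
    """Return the first window index where sigma_m exceeds factor*sigma_ref
    for `patience` successive windows, or -1 if no shift is detected."""
    above = sigma_m > factor * sigma_ref
    run = 0
    for i, flag in enumerate(above):
        run = run + 1 if flag else 0
        if run == patience:
            return i - patience + 1   # first window of the alarm run
    return -1

sigma_m = np.array([1.0, 1.1, 1.0, 1.5, 1.6, 1.5, 1.7, 1.6])
print(detect_shift(sigma_m, sigma_ref=1.0, patience=5))   # → 3
```

Requiring several successive windows above the threshold filters out isolated fluctuations of $\sigma^{m*}_{i}$ and only raises an alarm on a sustained variance shift.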

V Conclusion

In this article, we adapted the TV denoising method proposed in [3] to sliding windows, based on the local property of the TV-denoised signal. We believe our adaptive choice of $\lambda$ also works for non-stationary signals. We applied the adaptation to signals with time-varying noise variance. The simulation results show that the variance of the restoration residuals follows well that of the noise, and this method can be used to monitor the ground noise variance in real time with limited computation resources. However, the performance of these methods (both restoration and variance monitoring) depends on the choice of the window length $m$, which is our on-going research interest. The application to real data is also one of the on-going works.

Acknowledgment

This work has been supported by the EIPHI Graduate School (ANR-17-EURE-0002).

References

  • [1] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, 1992.
  • [2] A. Chambolle and P.-L. Lions, “Image recovery via total variation minimization and related problems,” Numerische Mathematik, 1997.
  • [3] Z. Liu, M. Perrodin, T. Chambrion, and R. Stoica, “Revisit 1d total variation restoration problem with new real-time algorithms for signal and hyper-parameter estimations,” arXiv preprint arXiv:2012.09481, 2020.
  • [4] R. J. Tibshirani and J. Taylor, “The solution path of the generalized lasso,” The Annals of Statistics, 2011.
  • [5] I. Pollak, A. S. Willsky, and Y. Huang, “Nonlinear evolution equations as fast and exact solvers of estimation problems,” IEEE Trans. on Signal Processing, 2005.
  • [6] C. M. Stein, “Estimation of the mean of a multivariate normal distribution,” The Annals of Statistics, 1981.
  • [7] D.-H. Shin, R.-H. Park, S. Yang, and J.-H. Jung, “Block-based noise estimation using adaptive gaussian filtering,” IEEE Transactions on Consumer Electronics, 2005.
  • [8] S. Beheshti and M. A. Dahleh, “A new information-theoretic approach to signal denoising and best basis selection,” IEEE Transactions on Signal Processing, 2005.
  • [9] M. Hashemi and S. Beheshti, “Adaptive noise variance estimation in bayesshrink,” IEEE Signal Processing Letters, 2010.
  • [10] J. Han and L. Xu, “A new method for variance estimation of white noise corrupting a signal,” in I2MTC, 2006.
  • [11] S. Pyatykh, J. Hesser, and L. Zheng, “Image noise level estimation by principal component analysis,” IEEE Transactions on Image Processing, 2012.
  • [12] D. L. Donoho and J. M. Johnstone, “Ideal spatial adaptation by wavelet shrinkage,” Biometrika, 1994.
  • [13] S. Sardy and H. Monajemi, “Efficient threshold selection for multivariate total variation denoising,” Journal of Computational and Graphical Statistics, 2019.