Millisecond Cadence Radio Frequency Interference Filters

Joseph W. Kania,^1,2 Kevin Bandura,^3,2 Duncan R. Lorimer,^1,2 Richard Prestage ,^1,¹¹1Deceased
¹Department of Physics and Astronomy, West Virginia University, PO Box 6315, Morgantown, WV 26506, USA
²Center for Gravitational Waves and Cosmology, West Virginia University, Chestnut Ridge Research Building, Morgantown, WV 26505, USA
³Lane Department of Computer Science and Electrical Engineering, 1220 Evansdale Drive, PO Box 6109, Morgantown, WV 26506, USA E-mail: jkania@mix.wvu.edu (JWK)

(Accepted XXX. Received YYY; in original form ZZZ)

Abstract

Radio Frequency Interference (RFI) greatly reduces sensitivity of radio observations to astrophysical signals and creates false positive candidates in searches for radio transients. Real signals are missed while considerable computational and human resources are needed to remove RFI candidates. In the context of transient astrophysics, this makes effective RFI removal vital to effective searches for fast radio bursts and pulsars. Radio telescopes typically sample at rates that are high enough for there to be tens to hundreds of samples along the transient’s pulse. Mitigation techniques should excise RFI on this timescale to account for a changing radio frequency environment. We evaluate the effectiveness of three filters, as well as a composite of the three, that excises RFI at the cadence that the data are recorded. Each of these filters operates in a different domain and thus excises as a different RFI morphology. We analyze the performance of these four filters in three different situations: (I) synthetic pulses in Gaussian noise; (II) synthetic pulses injected into real data; (III) four pulsar observations. From these tests, we gain insight into how the filters affect both the pulse and the noise level. This allows use to outline which and how the filters should be used based on the RFI present and the characteristics of the source signal. We show by flagging a small percentage of the spectrum we can substantially improve the quality of transit observations.

keywords:

light pollution – pulsars: general – fast radio bursts – software: data analysis

^†^†pubyear: 2022^†^†pagerange: Millisecond Cadence Radio Frequency Interference Filters–A

1 Introduction

Radio Frequency Interference (RFI) refers to anthropomorphic radio energy received by radio telescopes. These signals can be significantly brighter than the radio sky, obscuring or degrading the observed signal. These effects are of grave concern for transient radio astronomy, where some events are not repeated. fast radio bursts (FRBs; Lorimer et al., 2007) and a subset of the pulsar population, the rotating radio transients (RRATs; McLaughlin et al., 2006) in particular have a limited number of pulses and if these individual pulses are missed, that information is lost. This contrasts with persistent sources such as pulsars, where longer integration can increase the signal-to-noise ratio. Effective RFI mitigation is still important for these sources because it can save valuable telescope time and reduce the amount of false positive candidates that must be discarded. Currently, there is significant interest in transient astrophysical sources. More FRBs will allow their use as cosmological probes (Walters et al., 2018) and allow better constraints on progenitor models. Pulsars provide insights into different states of matter, interstellar medium, galactic dynamics, general relativity, and gravitational waves. An overview of pulsars and their uses can be found in Lorimer & Kramer (2004). The interested in FRBs and pulsars has lead to dedicated campaigns to search for these sources, see CHIME/FRB Collaboration et al. (2018) and Deneva et al. (2009) for an example of a notable FRB and pulsar search, respectively.

RFI can originate from anything that has an electric current. This means there is a very wide variety of RFI sources that differ between telescope locations, observing times, telescope pointings, and observing frequencies. Different sources produce RFI with different morphologies. Sparks from spark plugs or bad connections in power lines will cause a broadband response across all channels. Digital communications will be limited to a few channels, and the power (for our purposes) will be effectively randomly distributed. Transitional radar will likewise be limited to a few channels, but will be perfectly periodic. We also encounter salt-and-pepper noise; bright random RFI that does not have a clear source.

RFI mitigation is a multi-pronged approach that takes place at every step in the signal path. Spectrum management attempts to reduce conflict between human made signals and frequencies of astronomical interest. (Pankonin & Price, 1981). Increasing spectrum usage (Liszt, 2019) and larger telescope bandwidths (Deng, 2020; Velazco et al., 2019) are leading to greater conflict. Radio observatories are being planned and built in isolated locations protecting them for household electronics like microwaves or television broadcasts. These locations are often legally protected by radio-quiet zones, which limit the use of transmitters around the instruments. This isolation does not protect them from satellites or airplane RADAR, the latter being bright enough that radio telescopes can see them in other star systems (Siemion et al., 2015). Telescopes are now also being designed to be less sensitive to RFI, for example the Green Bank Telescope’s off axis feed (Norrod et al., 1992). Persistently bad frequencies are often blocked with band-stop filters. After the signal is digitized, a variety of techniques can be employed to flag undesirable sections of data. Criteria include usually bright regions (Agarwal et al., 2020; Offringa et al., 2010), statistical tests for Gaussianity (Ransom, 2011; Nita et al., 2007), and projecting the dynamic spectra into a new basis where the RFI can easily be removed (Maan et al., 2021; Kocz et al., 2012). Interferometers can use adaptive pointing to steer their beams away from RFI sources (Jeffs & Warnick, 2013).

Most of the above post detection techniques focus on using a single filter on sections of dynamic spectra. We evaluate the three such filters, one to remove usually bright points in the dynamic spectra, another to remove narrow-band periodic signals, and finally one to remove broad-band signals. These three provide improvements to similar filters found in the literature. We also evaluate the effect of using all three filters sequentially. These filters are a part of JESS²²2https://github.com/josephwkania/jess, a Python package we are writing that includes many RFI filters as well as utilities to better understand the data quality.

The paper is organized as follows, in Section 2 we will describe the three filters. Section 3 we discuss the effects of our filters on synthetic pulses in ideal and real noise. We show the effectiveness of the filters on four pulsars. In Section 4 we discuss the performance of the filers and how they can be extended to multi-beam systems. We summarize of findings and speculate future improvements in Section 5

2 Three High Cadence Filters

We present three filters in this work. The first is a median absolute deviation (MAD) filter that operates in the time domain, the second is a Fourier domain MAD filter that removes bright narrow-band periodic signals. Finally, we use a high-pass filter that removes wide-band RFI. The first two filters work by calculating a robust measure of scale and central value in their respective domain. These two values are then used to find outliers (i.e., data points that are significantly far from the central value). This outliers are then replaced with values that are less destructive to the observation. The high-pass filter removes slowly varying signals that are present across most of the frequencies at a given time. If many different sources of RFI are present, these three filters can be used together to remove RFI that are more prominent in their respective filtering domains, synergistically cleaning the observation.

To justify the use of Gaussian statistics, we will walk through the telescope’s signal chain. The astronomical signal is captured at the focus of the telescope, converted into a current in a wire, then filtered and amplified. This amplified signal is then sampled by an analogue-to-digital Converter (ADC) that produces a quantized representation of the signal. ADCs often provide the real and imaginary components. Astronomical signals are expected to produce Gaussian distributed voltages (Nita et al., 2007). The real and imaginary samples will also be Gaussian distributed. A poly-phase filter bank is then used to channelize the data while minimizing spectral leakage between channels. This produces complex channelized samples. These complex samples are square-law detected (i.e., multiplied by their complex conjugate and summed). This produces samples with a $\chi^{2}$ distribution of power samples for each channel and time sample. These samples are at a time resolution that is $\mathcal{O}(10,000)$ times faster that what is needed for transient radio astronomy, so the samples are accumulated by this factor to make processing and storage reasonable. The central limit theorem tells us that the result of this accumulation will be nearly Gaussian samples. This allows us to use techniques designed to normal data for calculate a central value and a measure of scale.

2.1 Robust Measures of Scale

To find outliers, we need both a central location and a measure of scale. The central location tells us the expected value of our data and is a useful replacement value for bad data. The measure of scale tells us the dispersion of the distribution, standard deviation being the most common measure of scale. RFI can be orders of magnitude brighter than the signal of interest, so we must use estimators that are robust to large outliers. There are many possible choices, see Fridman (2008) for an overview. We choose Median Absolute Deviation (MAD) because of its stability. For a set of samples, $X$ , we define

\text{MAD}=\text{median}(|X_{i}-\text{median}(X)|).

(1)

Rousseeuw & Croux (1993) calculate the coefficient $\hat{k}=1.4826$ that relates MAD to standard deviation

\sigma=\hat{k}\cdot\text{MAD}.

(2)

From Eq (1) we see MAD will be stable as long as the bottom half of the distribution is unaffected by RFI. This stability gives MAD an advantage over other robust measures of scale that use data higher in the distribution. We considered other robust measures (Interquartile Range, Rousseeuw & Croux (1993) estimators $S_{n}$ and $Q_{n}$ , and Biweight Midvariance (Beers et al., 1990)), but they either did not provide as stable results as MAD, had high computation cost, or proved difficult to implement.

2.2 Time Domain MAD

The time domain MAD filter works to remove bright points in the dynamic spectra while preserving the astrophysical signal. Implementations of this filter have been used successfully in several experiments, for examples see, CHIME/FRB Collaboration et al. (2018); Buch et al. (2019); Rajwade et al. (2020). These implementations are bespoke to a particular instrument. Our aim is a more generic implementation that will work a high time resolution data from a variety of telescopes and sources.

Fig. 1 shows a section of dynamic spectra that includes three bursts from pulsar B0329+54 recorded using the GREENBURST backend (Agarwal et al., 2020; Surnis et al., 2019). GREENBURST is an L-Band transient search backend, the filtering performance here is typical for other telescope and backends. We want to remove the bright terrestrial sources while preserving the large pulses. We need to choose sections of dynamic spectra which are large compared to the RFI but smaller than the pulse. Such a block of data will provide robust statistics to remove and replace the RFI, while the pulse affects the statics and is preserved. We will operate two dimensional sub-blocks of the dynamic spectra.

The analogue receiver chain can add substantial frequency structure to the dynamic spectra. An example of this structure is show in the bottom right plot of Fig. 1. To perform outlier tests on two dimensional chunks of dynamic spectra, we must remove this structure, while preserving the the outlining RFI. If done correctly, this separates the RFI from the shape of the frequency response across the observing band. The RFI will then be removed as a non-Gaussian outlier.

Refer to caption — Figure 1: GREENBURST dynamic spectra with three pulses from B0329+54, which has a dispersion measure of 26.7 cm^-3 pc and period of 0.715 s. The central plot shows the original (i.e., not corrected for the effects of dispersion) dynamic spectra. The top plot is integrated along frequency, giving the time series. The right plot is averaged over time, giving the bandpass. Also visible are bright sharp peaks from channels badly affected by RFI. The top right plot is the histogram of the dynamic spectra.

We considered several fitting methodologies, some of which are shown in Fig. 2. The non-linear nature of RFI precludes filters such as high-pass or Savitzky–Golay (Savitzky & Golay, 1964). Iterative Chebyshev polynomial (Chebyschev, 1853) fitting, where an initial fit to all the points is used to exclude large outliers for a subsequent fit. B-splines (de Boor, 1978) fitted with a Huber loss function, where nearby points are fitted with a squared error loss and outlier points are fitted with a linear loss (Huber, 1964). We also the investigated asymmetrically re-weighted penalized least squares (arPLS) algorithm (Baek et al., 2015; Zeng et al., 2020). arPLS gives very good results, however its iterative matrix algebra is much slower than other filters. For this work, we settled on a rolling median filter. The kernel size is a adjustable parameter, which we provide a reasonable default. The median filter has the advantage of being very fast as well as being stable, were the polynomial and arPLS will sometimes diverge, leading to good data being flagged. Iterating the median filter to produce a cascade filter similar to Galassi et al. (2009) did not produce meaningful improvement.

Once the dynamic spectra has been detrended in frequency and time, we subdivide the dynamic spectra into subbands of $n$ channels. This allows for changes in dynamic spectra noise level across the bandpass, from a bandstop filter for example We then calculate the MAD and the median along the time axis of each of the subbands. We then use the rolling median filter to find the median of median and the median of the MADS along the time axis. This procedure is a recursive approximation to finding median and MAD of two dimensional blocks of spectra, while relaxing the requirement that the dynamic spectra dimensions be an integer number of subblock dimensions. We perform two iterations. The first iteration detrends using the median filter on the channel and time medians, preserving any RFI spikes while removing the intrinsic band and time structure. The MAD filter is run and pixels that are flags are replaced with Not a Numbers (NaNs). The NaNs are not considered during the subsequent detrends, this allows us to better model the good parts of the spectrum. We detrend the data again, now using only the medians along the two axes without the median filter. This iteration allows the filter to be more sensitive to smaller scale structure. The MAD filter is run again and the flagged values from both iterations are replaced by NaNs. A final detrend is followed by replacing NaNs with the desired value, which is now the median along both axes of the dynamic spectra. If the pulse has a low dispersion measure we can re-add the time trend, to preserve pulse power.

Comparing Fig. 1 and Fig. 3 shows the effects of the MAD filter. The frequency axis has been flattened, removing the effects of the signal chain. The time domain had likewise been flattened, however we re-added the total powers, to preserve the brightness of the the low-DM pulses. Almost all of the bright salt-and-pepper noise visible in Fig. 1 has been removed. The narrow band RADAR around channel 1300 has been greatly reduced, but is still present. This calls for frequency-domain filtering which we’ll discuss in the next section. Likewise, the bright narrow time RFI present around channel 3500 has been greatly reduced, but is still present. The communication band around channel 1900 has been completely removed. Importantly, much of the bright pulsar pulses remains. This indicates that the filter is working as expected.

For a given pulse morphology, larger DMs will cause the pulse to be spread out over a larger section of the dynamic spectra. This causes the MAD sections to be larger compared to the pulse. This can cause a DM dependence on the recovered pulse. However, as shown in Section 3 that for ( $\mathcal{O}(100)$ S/N) candidates this effect is not measurable. As discussed in Amiri et al. (2021), FRB DM and pulse brightness are related. High DM FRBs will likely be less bright, and thus less likely to have signal excises by the MAD filter.

2.3 Frequency Domain MAD

As seen in Figs. 1 and 3, we often encounter persistent periodic signals which can sometimes survive the time domain MAD filter. We would like to remove these anthropomorphic signals while preserving the periodic pulsar signal. This cleaning is best done in the frequency domain. We use the fast Fourier transform (FFT) along the time axis to make a representation of the dynamic spectra in Fourier space. The FFT give a real and imaginary components, or equivalently a magnitude and phase. We are interested in excising powerful periodic signals in each of the channels, but are uninterested when these signals start relative to the start of the dynamic spectra. Thus we can ignore the phase and flag the magnitudes. To flag the magnitudes, we roughly follow the same procedure as described in Section 2.2.

We take the magnitudes of the Fourier Transformed dynamic spectra, split these powers into frequency subbands. This split allows for changing noise and pulse structure across frequencies. We robustly fit in time and frequency to remove large trends show structure across the bandpass. The origin of these trends include changes in instrument sensitivity and stranding waves in the analogue chain. Red noise will leave a large spike in the Fourier domain time space. Once we have the detrended the magnitudes, we compute MAD across channels for each of the subbands. As discussed by Nita et al. (2007), for Gaussian noise the real and imaginary parts of the Fourier transform will be in turn be Gaussian distributed. The magnitudes will be $\chi^{2}$ distributed. The filter will be slightly more aggressive than pure Gaussian nose of a given sigma.

Broadband signals (such as FRBs and Pulsars) will be present across many channels. When only considering the magnitudes, broadband signals will be aligned across the band, since their arrival time is controlled by the phase. These astronomical signals will increase the median and the noise level uniformly and will be protected from the filter while narrow band RFI, and will only show in a few channels and will be excised. This filtering methodology contrasts with that of Maan et al. (2021), who also filter in the Fourier domain. Maan et al. (2021) use statistics on per-channel blocks in Fourier space flagging blocks that have outlying differences in mean or standard deviation. Our method is more forgiving to broadband signals. This is advantageous for bright pulsars, at the cost of excising broadband periodic noise (such as power lines). We will filter these broadband signals with our final filter.

Frequency channels that are used used for communication, but don’t contain periodic signals can still benefit. Digital communication consists of steps between on and off. These steps are abrupt and causes considerable ringing in the Fourier domain. The Fourier MAD filter removed the largest peaks of the ringing leading to a smoother channel in the time domain. This smoothing can be seen between Channels #1500-1700 in Figs. 1 & 3. After the filtering, these channels are much smoother, lowering the noise level of the dynamic spectra.

Once magnitudes above a defined threshold are flagged, we must make a choice about replacement. The easiest replacement option is zero. This can be problematic because sharp edges in the Fourier domain can lead to ringing in the time domain since we are computing finite FFTs. We did not find these effects noticeable, so we found zero replacement satisfactory. Alternatively we could rescale the powers or use Least-Squares Spectral Analysis (LSSA) to handle missing samples (Chapter 13 Press et al., 2007).

We see the effects of this filter on B0329+54 in Fig. 4. The bright radar pulses have been removed, additionally the communication RFI has been greatly reduced. Bright salt and pepper noise remains. The three pulses are largely the same, losing only a little power in the sub-pulse that precedes and follows the main pulse. We also see that the bandpass shape is preserved. For the time domain MAD filter, we removed this because it provided no useful information and makes replacement values difficult to determine. We did not have to do this flattening step with this filter because the total is determined by the first component for the FFT. We preserve this component and thus do not require the use of a replacement value. In this sense, the Fourier MAD filter can be seen as a smoothing filter, spreading out RFI power over the channel. This is acceptable because we are interested in change in power level, not the absolute level.

2.4 High-pass

The first highpass filter for pulsar searches was described by Eatough et al. (2009), in which the dynamic spectra is collapsed over frequency to form a zero-DM time series. This time series is then subtracted off the dynamic spectra, thereby removing signals that are common across all channels at a given time sample. This is very effective at removing structureless RFI sources, such as spark plugs and unshielded electrical equipment.

This highpass filter worked well for the Parkes Multibeam Pulsar Survey (Manchester et al., 2001), which has one bit dynamic range and 288 MHz bandwidth made for uniform telescope response. If RFI is strong enough to flip one bit, the RFI was likely to show up across all the channels. More modern receivers can more than a GHz of bandwidth and often have more than eight bits of dynamic range. These combined give rise to more complex zero-dm RFI than was seen in Eatough et al. (2009), as shown in Figure 5. Bandstop filters and band roll off coupled with changing RFI intensity leads the the RFI being under-subtracted in some frequencies and over subtracted in others. To account for this we extend methodology of Eatough et al. (2009) into a more general highpass filter. We FFT along the frequency axis, replace the lowest $n$ Fourier components with zeros and FFT back, this filtering is equivalent to Eatough et al. (2009) for $n=1$ . As both then remove the DC component across frequencies. As discussed by Eatough et al. (2009), this filtering will reduce the pulse energy as a function of dispersion measure. Low DM pulses occupy a large portion of the dynamic spectra. These pulses can then lose a significant portion of their energy to the filter. As discussed in Eatough et al. (2009), this energy loss can be partially recovered by using convolution kernel that takes into account the how the filter changes the shape of the time series. However, we don’t know of any search pipelines that have implemented these kernels.

This ideal high pass filter’s jump discontinuity will introduce Gibbs ringing. We tried tapered transition windows, but the improvement was minimal and this made the filter less intuitive to use. As discussed in the next section, this filter is used almost exclusively with the MAD filter, which detrends frequency structure, reducing the size of the low Fourier components before this filter is run. As shown in the right most panel of Fig. 6, some ringing is visible, however it is near the noise level, and thus not of great concern.

2.5 Composite Filter

Each of the above filters operates in a different domain, so it is often useful to combine the three of them. We show such a case in Fig. 6 where we have multiple bright RFI sources. We begin by running the time domain MAD filter. This removes salt-and-pepper RFI and detrends the data in time and frequency. As shown in the second panel of Fig. 6, periodic RFI is still bright and present after the MAD filter. We then apply the FFT-MAD filter to remove this persistent narrow-band RFI. The third box of Fig. 6 shows the now periodic RFI has been removed. The long sweeping pulses are zero-DM RFI. They show a sweep because these blocks are dispersed. As shown in Figure 5, mean subtraction does not fully remove the effects of this signal. We then apply the high-pass filter, removed the first six modes to remove the broad-band RFI. Shown by Figure 6, this process greatly reduced the structure in the dynamic spectra, revealing candidates that would be missed if only one filter was used.

This filter order was chosen because the MAD filter detrends the dynamic spectra and remove large spikes, linearizing the dynamic spectra which make the operations in Fourier space more stable. We noticed zero-DM subtraction alone would often increase the standard deviation of de-dispersed time series. This is possibly due to very bright narrow-band RFI, which as we see in Fig. 6, can be significantly brighter than broadband RFI, being smeared out in frequency by the zero-DM subtraction. Hence we put the highpass filter last, after all the narrow-band RFI is removed.

3 Filter Effects

3.1 In Ideal noise

We used will³³3https://github.com/josephwkania/will (Kania & Bandura, 2022) to create synthetic pulses and inject them into Gaussian noise that has the same MAD and median as a GREENBURST dynamic spectra. The pulse creation will be described in detail in the next section. For this section and the rest of the paper, we used the pulse detection in will. We do a pulse detection of a chunk of dynamic spectra by dispersing the dynamic spectra around the pulse, perform a boxcar convolution and detrend and measure the peak in the time series as well as the noise level. This later measurement is an additional reason why we used will, as common single pulse search pulse packages don’t report noise levels. This noise measurement is needed because we are interested in measuring the interaction between noise level and recovered pulse characteristics An additional reason for using our own search implementation is for a standard single pulse search would be extended over many DM and width trials. Candidates are then clustered in DM–time–pulse width space, with one value reported for the cluster. RFI can affect this clustering. See Aggarwal et al. (2021) for a discussion of clustering methodologies. Since we know the pulse and width of the injected pulse, so we know the optimal width and DM to search, this allows us to skip the clustering step and simplify our analysis.

We cleaned the dynamic spectra with our four filters. The effects of filters are shown in Fig. 7. We see that over DM and pulse pulse intensity, the filters behave as expected. The time domain MAD filter removes parts of very bright pulses, however some part of the pulse persists and is recoverable. The MAD filter drives the response of the composite filter bright pulse regime. The FFT MAD filter had no dependency on pulse power, the pulse magnitudes line up and the pulse protects itself. The highpass filter also behaves as expected. The signal removed by this filter is passes on how the fraction of the total bandwidth the pulse occupies and this of course is not a function of pulse power.

The response over DM similarly behaves as expected. Low DM pulses occupy a higher percentage of the band and more of the pulse gets removed by the highpass filter. At higher DM, the pulse occupies less of frequencies and less of the pulse is removed. This DM-response filter shows us that filtering introduced some variability, on order of 10%, to the recovered S/N. This variability was expected, but we are surprised by size of the effect. This variability comes from two sources. First, the background and the pulse are randomly generated. These realizations will have different random bright points both on and off the pulse that will get flagged. The flagging will cause random variation around the mean injected S/N. The second effect, for the MAD and composite filters, is as the pulse changes in DM, the pulse is spread out over different numbers of MAD sub-arrays. If the pulse only occupies a small portion of the sub-array, more of the pulse could be flagged. If the pulse occupies a large portion of the sub-array, then the pulse statistics will drive the sub-array statistics and none of the pulse will be flagged. This filter-pulse interaction is determined by the characteristics of the pulse, the pulse phase relative to the filter sub-arrays, and DM. We partially marginalize over these last two qualities in Fig. 7 by using multiple pulses randomly phase shifted relative the filter and multiple DM averaged into a bin. We use the standard deviation of S/N in the DM bins to produce one sigma errors.

We can use these two plots to determine what filtering scheme would be useful. If we are interested in low DM pulses, we should use a less aggressive high-pass filter, or forgo this filtering step. If we expect bright pulses, we would forgo MAD filtering which has a poor response to pulse where S/N is above a few hundred. The former decision can be made by even during blind searched by consulting estimated dispersion measure modules, such as Cordes (2004) and Yao et al. (2017) for a given pointing direction and estimating a minimum reasonable DM.

3.2 Synthetic Pulses in Real Noise

We used three hours of GREENBURST data from a single pointing that had been search but and did not have any bright astrophysical pulses. This observation was done at night which usually has less RFI but during a period where the telescope elevation was low which can allow more RFI into the telescope. GREENBURST records filterbank files (i.e. a discretely sampled array of frequency and time data from the telescope) that that are 597 s long. We took twenty two of these filterbank files and used them as the base to inject pulses. For each file we samples the parameter space described in Table 1, all the pulse properties except for the individual pulse powers are the some for a given file. See Agarwal et al. (2020) for the physical insight behind the parameter ranges. Each file had 288 pulses interjected. We repeated this processes 115 times for a total of 33120 pulses. This base files where used multiple times, but our analysis is focused on the effects on burst morphology and not the effects of sky position.

We create pulses that are Guassian along the time and the frequency axis. We convolve the time profile with an exponential scatter function $e^{-t/\tau}/\tau$ where $\tau$ is the scatter time in Table 1. For the frequency profile we multiply the Gaussian by a spectral index $(F/F_{\text{center}})^{\alpha}$ , where $\alpha$ is the spectral index, $F$ is the channel frequency and $F_{\text{center}}$ is the reference frequency, we choose the centre channel. The radio light travels through ionized gas on its path from the source to the telescope. This plasma in plasma has random structure. The structure creates the structure, causes amplitude variations across frequency (Burke & Graham-Smith, 2009). We simulate scintillation by multiplying the frequency profile with the factor

S\equiv\Big{|}\cos\Big{(}2\pi N_{s}(F/F_{\text{center}})^{2}\Big{)}\Big{|},

(3)

where $N_{s}$ is half of the number of bright patches. We injected all the scintles at the same phase, since we do not anticipate this having a significant effect. Finally we take into account the receiver roll off and bandstop filter. These are frequencies where the analogue chain has low sensitivity where we are unlikely to see an astrophysical signal. We module the rolloff by calculating the medians along the band-pass, we then smooth these medians with a median filter. We calculate the standard deviation of this smoothed band-pass. We flag channels that are within 2/3 of a standard deviation of zero power. The transition between on and off regions is made using a Gaussian filter.

The previous paragraphs describe the continuous functions used to make the pulse profile. We need discrete values to inject into the the dynamic spectra. To transform from continuous to discrete, we sample continuous function for a given number of samples. The number of samples will be proportional to the pulse brightness and is reported in Table 1. We reference this number and not S/N because we do not have a way to know the noise level without RFI because we cannot perfectly remove all RFI. Based on Mickaliger et al. (2018), we used a log-normal distribution of pulse powers, but did not include an exponential tail. Since we precisely know the injected pulse powers, we can see how the recovered pulse energy distribution is affected by both RFI and RFI filters.

First we sample the space described by Table 1, which we use to create a pulse profile. From this profile we calculate the ideal boxcar width. We sample this profile and inject this into the file.

We sequentially clean this file with each one of the filters, and perform the same pulse detection as described above. We record these values and provide a summary in Table 2. The first column is the average pulse amplitude, $\overline{\text{Amplitude}}$ , that is the mean amplitude of the boxcar convolved pulse. We see that each filter has some loss and composite filter inherits these losses. The next column, $\overline{\text{Noise}}$ is the average of the convolved noise levels at each pulse window. We see that both outlier filters provide a substantial reduction in the noise level. The average S/N over all the pulses is $\overline{\text{S/N}}$ .

To test the distribution of injected against that of the recovered pulses we used two statistical tests, the Kolmogorov-Smirnov (KS) and the Anderson-Darling (AD) two sample tests. The goal of both these tests is to quantify the difference between the injected and recovered pulse distributions. In the ideal case, the RFI filtering should bring these distributions closer together by removing spurious signals that distort our view of the recovered pulse without degrading the pulses. Since we are injecting pulse samples we need to find a way to map these values to S/Ns. We take inspiration from Lilliefors Test (Lilliefors, 1967). Lilliefors test is a normality test for when the mean and variance is unknown. The test divides by the variance and subtracts the mean. Likewise we can make the powers and the recovered S/N unit variance and zero mean, and then apply both two sample tests. This will change the degrees of freedom, so we cannot report significance values without numerical simulation of these tests.

We see that we have mixed results. For some RFI filters, particularly the FFT-MAD, we see this values go down indicating that the two distributions are closer to each other. We see that this is not the case for the MAD filter, the Anderson-Darling value increases, while the Komogrorov-Smirnov value decreases. The Anderson-Darling statistic, $A^{2}$ for two continuous distributions $F$ and $F_{n}$ is defined by Anderson & Darling (1952) as

A^{2}=n\int^{\infty}_{-\infty}\frac{(F_{n}(x)-F(x)))^{2}}{F(x)(1-F(x))}dF(x)

(4)

This equation shows that the Anderson-Darling test is weighted toward the tails of the distribution. Our test these tails will be high and low S/N pulses. This gives us an explanation why the KS and AD statistics diverge when applying the MAD filter. The MAD filter stabilized the noise level and separates low S/N candidates from the noise, this improves the KS statistic, which is equally weighted among all the pulses. This filter will clip bright pulses, reducing the returned fidelity of the S/N high tail and increasing the AD static. Which filter is most useful partially depends on the characteristics of the pulses of interest. We see that the outlier filters do a good job of increasing the occupancy, that is the percent of pulse windows that have a S/N above six. However in in this test, the composite filter is too aggressive. The % flagged tells us the amount of data that is flagged in each domain, the composite % flagged is the sum of the percent flagged of reach filter. The high cadence filters can make significant improvements to the recovered pulse characteristics while flagging only a small amount of of the data.

Table 1: Distributions sampled for the pulse parameter space.

Parameter	Distribution	Range
Pulse Bright. Median	Log-Normal	( $5\times 10^{2}$ , $5\times 10^{6}$ ) Samp.
Pulse Bright. Stand. Dev.		20% of median
Pulse Width	Uniform	(0.5, 30) ms
Spectral Index	Uniform	(-3, 3)
Frequency Width	Uniform	(350, 1000) MHz
Center Frequency	Uniform	(1056, 1728) MHz
# Scintiles	Uniform	(0, 3)
Scatter Time	Uniform	(0.2, 5) ms
Dispersion Measure	Uniform	(10, 3000) pc $\text{cm}^{-3}$

Table 2: A comparison of the four filters on synthetic pulsars.

\overline{\text{Amplitude}}

is the average amplitude of the box car convoluted pulse.

\overline{\text{Noise}}

is the average noise level of the deterened time series around each pulse.

\overline{\text{S/N}}

is the mean of the S/N of each pulse. Folded S/N is the weighted mean of the folded S/Ns across epochs. KS is the modified Kolmogorov-Smirnov test between and injected power and recovered S/N. AD is the modified Anderson-Darling test for the aforementioned distance. Occupancy is the percent of pulse windows with a pulse above S/N 6. % Flagged is the percent of the data flagged by each filter.

Filter	$\overline{\text{Amplitude}}$	$\overline{\text{Noise}}$	$\overline{\text{S/N}}$	Stacked S/N	KS	AD	Occupancy.	% Flagged
None	10.3	0.263	51.5	353	0.109	1.29	78.1	None
MAD	7.36	0.0918	82.9	829	0.0813	1.94	91.7	2.36
FFT-MAD	8.89	0.0880	95.9	1352	0.0625	0.342	91.1	2.29
Highpass	9.73	0.253	49.6	351	0.108	2.36	75.6	0.293
Composite	6.71	0.0864	78.4	1100	0.0824	1.93	89.5	4.10

3.3 Using real pulses

The final test makes use of real pulses in GREENBURST data. Pulsar will have more complex burst structures than previously modeled. We’ve chosen four pulsars that cover a range of dispersion measures and brightness as shown in Table 4. These pulsars are bright, and will have many visible single pulses.

As in the previous subsection we report the $\overline{\text{Amplitude}}$ , $\overline{\text{Noise}}$ , $\overline{\text{S/N}}$ , and stacked S/N. We again see that the filters can significantly improve both the individual pulse S/N and the stacked S/N. However there is no one-size-fits-all solution. For bright pulsar J1935+1616, the FFT-MAD filter does the best, but the time domain MAD filter removes more of the pulse than is ideal. This result is as we expect from Figure 7. A less expected result is J0358+5413, this pulsar is about half as bright as J1935+1616 but we see the same effect with the latter. This is because much of J0358+5413’s energy is in very bright (above 200 S/N) pulses. This is discussed further in Appendix A. These bright pulses get clipped. The less bright pulses are not removed, and their S/N increases as expected. This gives mixed result for the MAD filter, the number of windows with a pulse above six sigma increases while the average pulse S/N decreases.

PSR J1705–1906 has a low DM of 23 cm^-3 pc, this causes the pulse take up a significant fraction of the bandwidth. The highpass filter subsequently removes 10% of the pulse power. This result again shows the need to consider the source while choosing the filtering. Instead of three modes, we should have excised one or zero Fourier modes.

Unlike the case with synthetic pulses, we do not know the intrinsic power distribution, so we cannot perform the KS or AD tests. Instead we will use the Spearman’s rank correlation coefficient to measure the relationship between a pulses S/N and the noise level. We choose the Spearman rank coefficient because it describes how well the relationship between two variables can be described by a monotonic function. We expect for a given signal brightness that S/N and noise level will be monotonically decreasing. In the ideal case, S/N and noise level will be uncorrelated, the noise will be Normal and any changes in pulse S/N will reflect intrinsic changes to the source brightness. If these variables are correlated, then S/N is partially measuring the evolving radio frequency environment, and the pulse measurements as reported by S/N will be less meaningful.

We see that this raw value is negative, as expected. With filtering, we can bring this metric closer to zero. This trend to zero is an indication that the S/Ns better reflect the intrinsic pulse energy and is less affected by the varying radio frequency environment.

Even though these are bright pulses, the filtering noticeably increases the percent of windows with pulses. This means that search pipelines should be able to find more pulses. These pulses can then be used to better understand the source.

The percentage flagged is higher than for the simulated pulses. This reflects that these observations took place at times throughout the day, leading to a worse RFI environment. The composite filter removes less than ten percent of the data. This small amount of flagging is an advantage of filtering on the shortest timescale. Flagging methods that remove blocks of data will often flag higher percent of the data. More flagging means more signal lost to the filter.

Table 3: A comparison of the four filters on four pulsars.

\overline{\text{Amplitude}}

is the average amplitude of the box car convoluted pulse.

\overline{\text{Noise}}

is the average noise level of the detrended time series around each pulse.

\overline{\text{S/N}}

is the mean S/N of each pulse. Stacked S/N is the weighted mean of the folded S/Ns across epochs. Spearman is the Spearman coefficient between the pulse S/N and the standard deviation. Occupancy is the percent of pulse windows with a pulse above S/N 6. % Flagged is the percent of the data flagged by each filter.

PSR	Filter	$\overline{\text{Amplitude}}$	$\overline{\text{Noise}}$	$\overline{\text{S/N}}$	Stacked S/N	Spearman	Occupancy.	% Flagged
	None	11.0	0.100	118	1640	-0.664	98.1	None
	MAD	6.93	0.0510	155	2000	-0.395	98.3	4.48
	FFT-MAD	10.5	0.0620	178	2000	-0.593	98.2	4.82
	High-pass	11.0	0.102	116	2040	-0.593	98.1	0.293
J1935+1616	Composite	6.85	0.0451	153	1780	-0.391	98.3	7.06
	None	3.31	0.110	31.2	133	-0.340	99.0	None
	MAD	2.53	0.0588	43.6	199	-0.186	99.9	5.15
	FFT-MAD	3.20	0.0800	41.4	167	-0.321	99.7	2.92
	High-pass	3.31	0.111	31.0	153	-0.335	98.9	0.293
J0742-2822	Composite	2.51	0.0589	43.2	202	-0.194	99.9	5.72
	None	1.28	0.0815	17.2	20.8	-0.285	66.7	None
	MAD	0.714	0.0427	16.5	11.2	0.174	84.9	4.59
	FFT-MAD	1.22	0.0603	21.1	23.2	-0.192	75.7	4.85
	High-pass	1.28	0.0819	17.1	19.5	-0.282	66.4	0.293
J0358+5413	Composite	0.733	0.0425	17.0	14.7	0.163	85.0	6.55
	None	3.00	0.211	14.5	95.1	-0.269	93.5	None
	MAD	2.73	0.0739	38.1	218	-0.242	99.9	4.82
	FFT-MAD	2.90	0.136	22.2	131	-0.343	98.6	3.18
	High-pass	2.69	0.211	13.0	34.2	-0.229	90.2	0.293
J1705-1906	Composite	2.49	0.0790	31.8	61.5	-0.077	99.9	6.16

Table 4: Four pulsars considered. For each pulsar, we list the period, DM, and the mean flux density at 1400 MHz, S₁₄₀₀. These values are as reported in Manchester et al. (2005). Windows are the number of pulse windows considered in the analysis.

Pulsar	Period [ms]	DM [pc $\text{cm}^{-3}$ ]	S₁₄₀₀ [mJy]	Windows
J1935+1616	359	159	58	8276
J0742-2822	167	73.7	26	7482
J0358+5413	156	57.1	23	10438
J1705-1906	300	22.9	5.66	10754

4 Performance

Using a Nvidia 20 Series GPU, the composite filter can clean data an 1.7 times the acquisition speed for GREENBURST filtebanks (4096 channels, 256 $\mu$ s sampling, 8 bit dynamic range). This is more than sufficient for offline processing and use for real time observations when a GPU is available. This runtime is favorable to that of Presto (Ransom, 2011) (1.0 times real time) and rficlean (Maan et al., 2021) (1.0 times real time), but slower than IQRM Kurtosis (Morello et al., 2022) (3.8 times real time) for the same data.

The majority of the compute time for the composite filter is spend on time domain MAD (80%), while the FFT-MAD (17%) and the highpass (1.5%) takes significantly less. The remainder of the time is for data handling and I/O. The MAD filter is more expensive because it carries out multiple detrendings, which require calculating the medians along the time and frequency axis. By leveraging GPUs the medians can be calculated massively in parallel.

There are several steps we could take for better performance, which would be of necessary for multiple beams or for real time CPU processing. We could replace the first iteration of the MAD filter with a Gaussianity test, such as a Kurtosis filter. This would save both the MAD calculation and the initial detrend, making this filter almost twice as fast. MAD is very expensive because it requires two partitions of the data, one to find the median and another to find the median of deviations. We could use the interquartile range which requires the partition in three places (20th, 75th for the range and 50th percentile for the replacement value) this tends to be faster than the double partition needed for MAD. A third consideration is we calculate the MAD along subbands on the time axis, by taking the median of these medians, we can more cheaply calculate the medians for the time deterend. A combination of these above in higher performance language would greatly reduce the computational requirements.

jess is written in Python 3 (Van Rossum & Drake, 2009) and uses CuPy, Matplotlib (Hunter, 2007), NumPy (Harris et al., 2020), Rich, SciPy (Virtanen et al., 2020), Sphinx, and Your (Aggarwal et al., 2020).

5 Conclusions

Radio frequency interference (RFI) imposes serious changes to radio observations. RFI degrades the observation, making it difficult to find faint pulses and produces significant numbers of false positive candidates. This false positives require significant human and computational resources to eliminate them. Good RFI filters reduce the noise level of an observation allowing for fainter pulses to be found. The filters should also remove RFI structure to reduce the number of false positives. In this paper we introduce implementations of three RFI filters that have advantages to their earlier implementations. This filters work in different domains, this lead us to the conclusion that they would work in concert to remove a wide variety of RFI.

Understanding the transfer function on true pulses is vital to determining true physical quantities. We investigated the effects of the filters in three ways. Injected synthetic pulses into Normal noise, an ideal filter would flag not data in this situation. We see that the MAD filter removes very bright pules. We also see that the highpass filter removes power from low Dispersion Measure pulses.

Next we injected pulses onto an actual observation. This allowed us to see the effects of the filters on a wide variety of pulses. We see that the MAD and FFT MAD filters greatly reduce the noise level which is further reduced by the highpass. We use two statistical tests to show that filtering can allow the recovered S/Ns to better reflect the intrinsic pulse energies.

We filtered four pulses that range over brightness and DMs. This experiment shows that there is no universal filtering. The optimal filtering depends on the RFI present, the characteristics of the pulse, and what we want to know about the pulse. The composite filter has a robust overall response and can be used in a wide range of situations to greatly reduce the noise level of an observation.

The filters present here are not a unique combination, adding or removing a test may be useful in different radio frequency environments and for different sources. These filters can also be improved in both RFI excision and performance. For example a better MAD filtering can be done at dedispersion. As the dedispersed dynamic spectra is collapsed, the MAD filter can be iterative applied. Filtering along the frequency axis, sum nearby frequencies, and then filter again. This filter is more sensitive to large structures but is very expensive.

Our goal was a filter that removed the minimal amount of signal, that is fast, and works over a wide range of telescopes and sources. We believe that these filters meet the criteria. For blind searches, the number of false positives and the number of true positives are important. We will address this question in a future work. These filters are available at https://github.com/josephwkania/jess and can clean both filterbank and psrfits files. We hope that others find them useful in their pulsar and FRB searches.

Acknowledgements

JWK and KB are supported by a National Science Foundation (NSF) award 2006548. DRL acknowledges support for GREENBURST from the NSF through award AAG-1616042. The Green Bank Observatory is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc. We would like to thank Devansh Agarwal for his help with GREENBURST data collection.

Data Availability

The output for the real and synthetic pulse searches are available at https://zenodo.org/record/6487651#.YmeiH6YpDok. The corresponding author can make GREENBURST fitlerbanks available on reasonable request.

References

Agarwal et al. (2020) Agarwal D., et al., 2020, MNRAS, 497, 352
Aggarwal et al. (2020) Aggarwal K., et al., 2020, Journal of Open Source Software, 5, 2750
Aggarwal et al. (2021) Aggarwal K., et al., 2021, The Astrophysical Journal, 914, 53
Amiri et al. (2021) Amiri M., et al., 2021, ApJS, 257, 59
Anderson & Darling (1952) Anderson T. W., Darling D. A., 1952, The Annals of Mathematical Statistics, 23, 193
Baek et al. (2015) Baek S.-J., Park A., Ahn Y.-J., Choo J., 2015, The Analyst, 140, 250
Beers et al. (1990) Beers T. C., Flynn K., Gebhardt K., 1990, AJ, 100, 32
Buch et al. (2019) Buch K. D., Naik K., Nalawade S., Bhatporia S., Gupta Y., Ajithkumar B., 2019, Journal of Astronomical Instrumentation, 8, 1940006
Burke & Graham-Smith (2009) Burke B. F., Graham-Smith F., 2009, An Introduction to Radio Astronomy. Cambridge
CHIME/FRB Collaboration et al. (2018) CHIME/FRB Collaboration et al., 2018, ApJ, 863, 48
Chebyschev (1853) Chebyschev P., 1853, Thèorie des mèccanismes connus sous le nom de parallèlogrammes. Imprimerie de l’Académmie impériale des science,, St.-Pétersbourg
Cordes (2004) Cordes J. M., 2004, in Clemens D., Shah R., Brainerd T., eds, Astronomical Society of the Pacific Conference Series Vol. 317, Milky Way Surveys: The Structure and Evolution of our Galaxy. p. 211
Deneva et al. (2009) Deneva J. S., et al., 2009, ApJ, 703, 2259
Deng (2020) Deng M., 2020, PhD thesis, University of British Columbia, doi:http://dx.doi.org/10.14288/1.0395494, https://open.library.ubc.ca/collections/ubctheses/24/items/1.0395494
Eatough et al. (2009) Eatough R. P., Keane E. F., Lyne A. G., 2009, MNRAS, 395, 410
Fridman (2008) Fridman P. A., 2008, AJ, 135, 1810
Galassi et al. (2009) Galassi M., et al., 2009, A GNU manual. OCLC, 552301619
Harris et al. (2020) Harris C. R., et al., 2020, Nature, 585, 357
Huber (1964) Huber P. J., 1964, The Annals of Mathematical Statistics, 35, 73
Hunter (2007) Hunter J. D., 2007, Computing in Science & Engineering, 9, 90
Jeffs & Warnick (2013) Jeffs B. D., Warnick K. F., 2013, in 2013 US National Committee of URSI National Radio Science Meeting (USNC-URSI NRSM). pp 1–1, doi:10.1109/USNC-URSI-NRSM.2013.6525067
Kania & Bandura (2022) Kania J., Bandura K., 2022, WILL - Weighted Injector of Luminous Lighthouses, Manuscript in Preparation
Kocz et al. (2012) Kocz J., Bailes M., Barnes D., Burke-Spolaor S., Levin L., 2012, Monthly Notices of the Royal Astronomical Society, 420, 271
Lilliefors (1967) Lilliefors H. W., 1967, Journal of the American Statistical Association, 62, 399
Liszt (2019) Liszt H., 2019, Managing spectrum for evolving technologies.
Lorimer & Kramer (2004) Lorimer D. R., Kramer M., 2004, Handbook of Pulsar Astronomy. 4, Cambridge University Press
Lorimer et al. (2007) Lorimer D. R., Bailes M., McLaughlin M. A., Narkevic D. J., Crawford F., 2007, Science, 318, 777
Maan et al. (2021) Maan Y., van Leeuwen J., Vohl D., 2021, A&A, 650, A80
Manchester et al. (2001) Manchester R. N., et al., 2001, MNRAS, 328, 17
Manchester et al. (2005) Manchester R. N., Hobbs G. B., Teoh A., Hobbs M., 2005, AJ, 129, 1993
McLaughlin et al. (2006) McLaughlin M. A., et al., 2006, Nature, 439, 817
Mickaliger et al. (2018) Mickaliger M. B., McEwen A. E., McLaughlin M. A., Lorimer D. R., 2018, MNRAS, 479, 5413
Morello et al. (2022) Morello V., Rajwade K. M., Stappers B. W., 2022, MNRAS, 510, 1393
Nita et al. (2007) Nita G. M., Gary D. E., Liu Z., Hurford G. J., White S. M., 2007, PASP, 119, 805
Norrod et al. (1992) Norrod R., Srikanth S., Balister M., 1992, in 1992 IEEE MTT-S Microwave Symposium Digest. pp 1365–1368 vol.3, doi:10.1109/MWSYM.1992.188259
Offringa et al. (2010) Offringa A. R., de Bruyn A. G., Biehl M., Zaroubi S., Bernardi G., Pandey V. N., 2010, Monthly Notices of the Royal Astronomical Society, 405, 155
Pankonin & Price (1981) Pankonin V., Price R. M., 1981, IEEE Transactions on Electromagnetic Compatibility, EMC-23, 308
Press et al. (2007) Press W. H., Teukolsky S. A., Vetterling W. T., Flannery B. P., 2007, Numerical Recipes. Cambridge University Press
Rajwade et al. (2020) Rajwade K. M., Mickaliger M. B., Stappers B. W., Bassa C. G., Breton R. P., Karastergiou A., Keane E. F., 2020, MNRAS, 493, 4418
Ransom (2011) Ransom S., 2011, PRESTO: PulsaR Exploration and Search TOolkit (ascl:1107.017)
Rousseeuw & Croux (1993) Rousseeuw P. J., Croux C., 1993, Journal of the American Statistical Association, 88, 1273
Savitzky & Golay (1964) Savitzky A., Golay M. J. E., 1964, Analytical Chemistry, 36, 1627
Siemion et al. (2015) Siemion A., et al., 2015, in Advancing Astrophysics with the Square Kilometre Array (AASKA14). p. 116 (arXiv:1412.4867)
Surnis et al. (2019) Surnis M. P., et al., 2019, Publ. Astron. Soc. Australia, 36, e032
Van Rossum & Drake (2009) Van Rossum G., Drake F. L., 2009, Python 3 Reference Manual. CreateSpace, Scotts Valley, CA
Velazco et al. (2019) Velazco J. E., Ledezma L., Bowen J., Samoska L., Soriano M., Akgiray A., Weinreb S., Lazio J., 2019, in 2019 IEEE Aerospace Conference. pp 1–6, doi:10.1109/AERO.2019.8742126
Virtanen et al. (2020) Virtanen P., et al., 2020, Nature Methods, 17, 261
Walters et al. (2018) Walters A., Weltman A., Gaensler B. M., Ma Y.-Z., Witzemann A., 2018, The Astrophysical Journal, 856, 65
Yao et al. (2017) Yao J. M., Manchester R. N., Wang N., 2017, The Astrophysical Journal, 835, 29
Zeng et al. (2020) Zeng Q., Chen X., Li X., Han J. L., Wang C., Zhou D. J., Wang T., 2020, arXiv e-prints, p. arXiv:2008.11949
de Boor (1978) de Boor C., 1978, A practical guide to splines. Springer

Appendix A J0358+5413

We see that some very bright pulses make a significant portion of the total pulse energy. As expected by Figure 7, the composite filter greatly reduces the energy of these large pulses. However the S/N of the pulses closer to the noise level increases. These leads to the results in Table 3, were we see an increase in the number of number of pulse windows with a pulse above S/N of six. However the average S/N decreases. This is also reflected in the stacked S/N as well. The bright pulses contain a significant portion of the total power, and their removal adversely affects the combined S/N.