# FPGA Implementation of a Cyclostationary Detector for OFDM Signals

Douglas Allan\*, Louise Crockett, Stephan Weiss, Kenneth Stuart and Robert W. Stewart Department of Electronic and Electrical Engineering, University of Strathclyde 204 George Street, Glasgow, G1 1XW, Scotland, UK Email: \*d.allan@strath.ac.uk

Abstract: Due to the ubiquity of Orthogonal Frequency Division Multiplexing (OFDM) based communications standards such as IEEE 802.11a/g/n and 3GPP Long Term Evolution (LTE), a growing interest has developed in techniques for reliably detecting the presence of these signals in dynamic radio systems. A popular approach for detection is to exploit the cyclostationary nature of OFDM communications signals. In this paper, we focus on a frequency domain cyclostationary detection algorithm first introduced by Giannakis and Dandawate and study its performance in detecting IEEE 802.11a OFDM signals in the presence of practical radio impairments such as Carrier Frequency Offset (CFO), Phase Noise, I/Q Imbalance, Multipath Fading and DC Offset. We then present a hardware implementation of this algorithm developed using MathWorks HDL Coder and provide implementation results after targeting a Xilinx 7 Series FPGA device.

## I. INTRODUCTION

Orthogonal Frequency Division Multiplexing (OFDM) is an efficient modulation technique wherein a number of subcarriers are employed to transmit digital information. OFDM is particularly noted for its immunity to multipath propagation effects and its ease of implementation using the Fast Fourier Transform (FFT). Due to the many benefits that OFDM can provide, it has found widespread use in various commercial communications standards including IEEE 802.11a/g/n [1] and 3GPP Long Term Evolution (LTE) [2].

A typical wireless OFDM transmission will include, as well as the data, various forms of redundancy that are artificially added to the signal in order to aid the receiver in synchronising to the transmitted signal. For example, in IEEE 802.11a systems, the OFDM symbol includes special pilot subcarriers that are used for phase noise and residual frequency offset compensation after demodulation has been performed. A feature common to the majority of OFDM systems is the Cyclic Prefix (CP), which makes OFDM robust to multipath fading and simplifies the process of equalisation. These sources of redundancy introduce regularities into the statistics of the signal which can be exploited for the purposes of detection, even in low Signal to Noise Ratio (SNR) environments.

A random signal is called wide-sense cyclostationary if its mean and autocorrelation function are periodic. This property is directly related to redundancy that has been deliberately added to the signal and is, therefore, typically only a feature of man-made signals. This makes it particularly useful in distinguishing a Signal of Interest (SOI) from, say, thermal noise, which does not possess this property. Equally, the cyclostationary features of a particular signal are usually unique and can be used to identify it in the presence of interference from other man-made signals, provided that they do not share exactly the same cyclostationary properties. Much of the early groundwork on cyclostationary signal processing was conducted by William A. Gardner and colleagues [3]. A comprehensive review of existing literature on the subject can be found in [4].

In a landmark paper [5], the authors developed time and frequency domain statistical tests that can be used to identify a particular SOI based on its cyclostationary properties. In the intervening years since its publication, there have been numerous papers that have applied these algorithms. In [6] and [7], the authors discuss FPGA implementation aspects of the frequency domain cyclostationary test and demonstrate its applicability to OFDM systems. Also, the authors in [8] survey FPGA implementations of various detection algorithms including the time and frequency domain statistical tests and an autocorrelation feature detector. These do not appear to target Xilinx or Altera devices, making a comparison difficult. The authors in [9] discuss the effects of Carrier Frequency Offset (CFO), timing offset and multipath propagation on the performance of the frequency domain detector. It is shown that detection can be achieved with imprecise frequency synchronisation, without timing synchronisation and in the presence of multipath propagation. However, as Root Mean Square (RMS) delay spread is increased, the performance of the detector deteriorates.

In this paper we extend the analysis in [9] to include the effects of Phase Noise, I/Q imbalance and DC offset. We then implement the algorithm using HDL Coder [10], a powerful tool that enables a subset of the functionality of MATLAB and Simulink to be converted to a Hardware Description Language (HDL) such as Verilog or VHDL. We then target the design to the Xilinx Zynq xc7z020 device [11], which consists of both an ARM processor and an FPGA, concentrating exclusively on the FPGA part.

The rest of the paper is organised as follows. In Section 2, we review the cyclostationarity of OFDM signals, focussing particularly on IEEE 802.11a. The frequency domain statistical test detection algorithm is introduced in Section 3, followed by an analysis of its performance under various radio impairments in Section 4. In Section 5 we describe the implementation of the algorithm in HDL Coder, and conclusions are drawn in Section 6.

## II. CYCLOSTATIONARITY OF OFDM SIGNALS

A random process x(t) is called wide-sense cyclostationary if its mean and autocorrelation are periodic with fundamental cyclic period  $T_0$ , such that

$$\mu(t) = \mu(t + T_0) \tag{1}$$

$$R_{xx}(t,\tau) = R_{xx}(t+T_0,\tau) \tag{2}$$

with continuous time t and lag parameter  $\tau$ . Since the autocorrelation function is periodic, it can be decomposed into a Fourier Series as

$$R_{xx}(t,\tau) = \sum_{m=-\infty}^{+\infty} R_x^{m/T_0}(\tau) e^{j2\pi \frac{m}{T_0}t} , \tag{3}$$

where m is the harmonic index and  $R_x^{m/T_0}(\tau)$  is the Cyclic Autocorrelation Function (CAF). The CAF is defined as

$$R_x^{m/T_0}(\tau) = \frac{1}{T_0} \int_{-\frac{T_0}{2}}^{\frac{T_0}{2}} R_{xx}(t,\tau) e^{-j2\pi \frac{m}{T_0}t} dt . \tag{4}$$

Theoretically,  $R_{xx}(t,\tau)$  is obtained by performing an expectation operation. In practice,  $R_{xx}(t,\tau)$  and therefore  $R_x^{m/T_0}(\tau)$  have to be estimated from the data through temporal averaging, such that

$$\hat{R}_{x}^{m/T_{0}}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_{t-\frac{T}{2}}^{t+\frac{T}{2}} x(t) x^{*}(t+\tau) e^{-j2\pi \frac{m}{T_{0}}t} dt , \tag{5}$$

where T is the period of observation. We assume here that the fundamental cyclic period  $T_0$  is known. Since our interest is in digital systems, it is prudent to define the discrete time estimate of the CAF as follows

$$\hat{R}_x^{m/N_0}[\nu] = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} x[n] x^*[n+\nu] e^{-j2\pi \frac{m}{N_0}n} , \tag{6}$$

where N is the discrete observation interval,  $N_0$  is the discrete time fundamental cyclic period,  $\nu$  is the discrete lag parameter and n is the sample index. The discrete time cyclic frequencies are  $\alpha_m = m/N_0$  where m is in the range  $(-\infty, +\infty)$ . It is important to note that the CAF is simply a sampled version of the Discrete Fourier Transform (DFT) of the autocorrelation at a particular lag, i.e. sampled at the bins corresponding to the cyclic frequencies.
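As a minimal sketch of (6) over a finite window (the limit is dropped and N lag products are averaged), the estimate can be written in a few lines of NumPy. The function name `caf_estimate` and its argument names are illustrative assumptions and do not come from the implementation described later in this paper.

```python
import numpy as np

def caf_estimate(x, nu, m, N0, N):
    """Finite-window estimate of the cyclic autocorrelation in (6).

    x  : complex samples, at least N + nu long
    nu : discrete lag
    m  : harmonic index, so the cyclic frequency is m / N0
    N0 : fundamental cyclic period in samples
    N  : number of lag products averaged
    """
    x = np.asarray(x, dtype=complex)
    if len(x) < N + nu:
        raise ValueError("need at least N + nu samples")
    n = np.arange(N)
    lag_product = x[:N] * np.conj(x[nu:N + nu])
    return np.mean(lag_product * np.exp(-2j * np.pi * m * n / N0))

# A constant signal has a constant lag product, so only m = 0 is non-zero
# when N spans a whole number of cyclic periods.
x = np.ones(160 + 5)
r0 = caf_estimate(x, nu=5, m=0, N0=80, N=160)
r1 = caf_estimate(x, nu=5, m=1, N0=80, N=160)
```

When N is a multiple of $N_0$, the cyclic frequency $m/N_0$ falls exactly on DFT bin $mN/N_0$, which is the sense in which the CAF is a sampled DFT of the lag product.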

OFDM systems exhibit baud rate cyclostationarity due to the insertion of the CP. By taking the DFT of the autocorrelation function at  $\nu = N_u$ , where  $N_u$  is the useful OFDM symbol period, spikes appear at the overall OFDM symbol rate and its harmonics. In IEEE 802.11a systems, the symbol consists of a total of 80 samples: the useful symbol length is  $N_u = 64$  samples and the CP length is  $N_g = 16$  samples. Therefore, in IEEE 802.11a,  $N_0 = N_u + N_g = 80$ . The sampling rate is  $f_s = 20$ MHz, meaning that the positive cyclic frequencies are  $f_s \alpha_1 = f_s / N_0 = 0.25$ MHz and its integer multiples. The detector requires that we sample at the correct rate and that we have knowledge of the symbol rate.
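The CP-induced spike can be illustrated with a toy experiment. The sketch below builds a noiseless CP-OFDM stream with the IEEE 802.11a symbol dimensions, forms the lag product at $\nu = N_u$ and inspects the DFT bin of the fundamental cyclic frequency. It is a simplified illustration under stated assumptions: BPSK is placed on all 64 subcarriers (a real IEEE 802.11a symbol leaves some unused), no preamble, pilots or channel are modelled, and the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

N_U, N_G = 64, 16          # useful symbol and cyclic prefix lengths
N0 = N_U + N_G             # 80 samples per OFDM symbol
FS = 20e6                  # sampling rate in Hz

# Simplification: BPSK on all 64 subcarriers.
num_symbols = 200
data = 2.0 * rng.integers(0, 2, size=(num_symbols, N_U)) - 1.0
useful = np.fft.ifft(data, axis=1) * np.sqrt(N_U)       # unit average power
x = np.hstack([useful[:, -N_G:], useful]).ravel()       # prepend the CP

N = 100 * N0                                            # whole number of symbols
lag_product = x[:N] * np.conj(x[N_U:N + N_U])
F = np.fft.fft(lag_product) / N

k1 = N // N0                                            # bin of the fundamental
peak = np.abs(F[k1])
floor = np.median(np.abs(F))
fundamental_hz = FS / N0
print(f"fundamental cyclic frequency: {fundamental_hz / 1e6} MHz")
print(f"peak-to-median ratio at bin {k1}: {peak / floor:.1f}")
```

The lag product has a non-zero mean only over the 16 CP samples of each 80-sample symbol, which is what produces the periodic component at $1/N_0$ and its harmonics.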

## III. FREQUENCY DOMAIN CYCLOSTATIONARY DETECTOR

We now describe a detector based on an algorithm first developed by the authors in [5]. An estimate of the CAF (6) can be obtained by taking the DFT (in a practical scenario the FFT) of the autocorrelation at a particular lag  $\nu$ . This is expressed as follows

$$F[k] = \frac{1}{N} \sum_{n=0}^{N-1} x[n] x^*[n+\nu] e^{-j\frac{2\pi kn}{N}} , \tag{7}$$

where k is the DFT bin index. The CAF is estimated at  $k = \alpha$  where  $\alpha$  represents a particular cyclic frequency that we wish to exploit. In this paper we choose to exploit the fundamental cyclic frequency ( $f_s \alpha_1 = 0.25$ MHz) in our detector. The test statistic is formulated as

$$\hat{T} = [X[\alpha] \ Y[\alpha]] \hat{\Sigma}^{-1} [X[\alpha] \ Y[\alpha]]^T \tag{8}$$

where  $X[\alpha]$  and  $Y[\alpha]$  are the real and imaginary parts of the estimate of the CAF, and  $\hat{\Sigma}$  is an estimate of the covariance matrix for two zero mean random variables. The covariance matrix is expressed as [6][7][8]

$$\hat{\Sigma} = \begin{bmatrix} \hat{E}[X[k]^2] & \hat{E}[X[k]Y[k]] \\ \hat{E}[X[k]Y[k]] & \hat{E}[Y[k]^2] \end{bmatrix} \tag{9}$$

where X[k] and Y[k] denote the real and imaginary parts of F[k] respectively, and  $\hat{E}$  is the expectation operator. The elements of the covariance matrix are obtained as follows:

$$\hat{E}[X[k]^2] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]^2 \tag{10}$$

$$\hat{E}[X[k]Y[k]] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]Y[k] \tag{11}$$

$$\hat{E}[Y[k]^2] = \frac{1}{N} \sum_{k=0}^{N-1} Y[k]^2. \tag{12}$$

As noted in [6], we can assume that  $\hat{E}[X[k]Y[k]] \ll \hat{E}[X[k]^2], \hat{E}[Y[k]^2]$ , leading to two approximations:  $\hat{E}[X[k]Y[k]]^2 \approx 0$  and  $\hat{E}[X[k]Y[k]] \approx 0$ . The validity of this assumption is discussed in [6]. After some manipulation (8) becomes

$$\hat{T} = \frac{X[\alpha]^2 \hat{E}[Y[k]^2] + Y[\alpha]^2 \hat{E}[X[k]^2]}{\hat{E}[X[k]^2] \hat{E}[Y[k]^2]}. \tag{13}$$
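The manipulation can be sketched as follows. Inverting the 2×2 covariance matrix in (9) and expanding the quadratic form in (8) gives, before any approximation,

$$\hat{T} = \frac{X[\alpha]^2 \hat{E}[Y[k]^2] - 2 X[\alpha] Y[\alpha] \hat{E}[X[k]Y[k]] + Y[\alpha]^2 \hat{E}[X[k]^2]}{\hat{E}[X[k]^2] \hat{E}[Y[k]^2] - \hat{E}[X[k]Y[k]]^2} .$$

Setting $\hat{E}[X[k]Y[k]]^2 \approx 0$ in the denominator and $\hat{E}[X[k]Y[k]] \approx 0$ in the numerator removes the cross terms and leaves (13), which is equivalent to $X[\alpha]^2 / \hat{E}[X[k]^2] + Y[\alpha]^2 / \hat{E}[Y[k]^2]$.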

The test statistic is then compared to a pre-defined threshold chosen to satisfy a desired Probability of False Alarm ( $P_{fa}$ ). Under the null hypothesis, i.e. when Additive White Gaussian Noise (AWGN) is received, the test statistic is  $\chi_2^2$  distributed [6][7][8]. The threshold  $\eta$  for the detector is calculated as

$$\eta = F_{\chi_2^2}^{-1} (1 - P_{fa}) \tag{14}$$

If the test statistic exceeds the threshold, it is determined that cyclostationarity is present in the input signal.
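The decision rule can be sketched numerically. For two degrees of freedom the $\chi^2$ CDF is $1 - e^{-t/2}$, so (14) has the closed form $\eta = -2\ln(P_{fa})$ and no statistics library is needed. The function names below are hypothetical, and the sketch assumes that the moments in (10) and (12) are averages over the DFT bins of F[k].

```python
import math
import numpy as np

def chi2_2dof_threshold(p_fa):
    """Threshold in (14) for two degrees of freedom.

    A chi-square variable with 2 degrees of freedom has CDF
    1 - exp(-t / 2), so the inverse CDF has a closed form.
    """
    return -2.0 * math.log(p_fa)

def cyclo_statistic(F, k_alpha):
    """Simplified statistic (13) from the DFT F of the lag product."""
    X, Y = np.real(F), np.imag(F)
    ex2 = np.mean(X ** 2)     # second moment of the real part, as in (10)
    ey2 = np.mean(Y ** 2)     # second moment of the imaginary part, as in (12)
    num = X[k_alpha] ** 2 * ey2 + Y[k_alpha] ** 2 * ex2
    return num / (ex2 * ey2)

eta = chi2_2dof_threshold(0.05)

# Toy spectrum in which every bin has the same magnitude, so no bin stands out.
F_toy = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
t_toy = cyclo_statistic(F_toy, k_alpha=0)
detected = t_toy > eta
```

For the toy spectrum both second moments equal 1, so the statistic is 2 and stays below the threshold; a bin must carry noticeably more energy than the average bin before a detection is declared.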



Fig. 1. Pd Vs. SNR Curves for Various Radio Impairments

## IV. PERFORMANCE OF DETECTOR IN THE PRESENCE OF RADIO IMPAIRMENTS

In any practical situation, the assumption that only the SOI plus noise is received does not hold. In our study, we perform Monte Carlo simulations with an IEEE 802.11a OFDM signal using Binary Phase Shift Keying (BPSK) in order to understand the effects of phase noise, I/Q imbalance and DC offset on the performance of the detector. We carry out 1000 trials in our simulations.

The incoming data is decimated by a factor of M = 16 before passing it into the detector. As noted in [7], decimating by an integer factor before applying the FFT allows us to increase the probability of detection using a fixed length FFT, albeit at the expense of an increase in overall sensing time. The detector uses an N = 1024 point FFT. The threshold was chosen in order to guarantee a  $P_{fa}$  of 5%. This corresponds to  $\eta = 5.991$  calculated using (14). Each impairment was simulated alongside AWGN with SNR ranging from -15dB to 0dB.
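A quick sanity check of the threshold is to run the detector on noise only and count how often it fires. The following sketch does this for complex AWGN with illustrative parameters (N = 256, lag 8, bin 13, 2000 trials) that are our assumptions, chosen for speed rather than taken from the simulations reported here. Decimation is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

N, NU, K_ALPHA = 256, 8, 13      # illustrative sizes, not the paper's settings
ETA = 5.991                      # threshold quoted for P_fa = 5%
TRIALS = 2000

def statistic(x):
    y = x[:N] * np.conj(x[NU:N + NU])        # lag product
    F = np.fft.fft(y) / N                    # DFT of the lag product, as in (7)
    X, Y = F.real, F.imag
    ex2, ey2 = np.mean(X ** 2), np.mean(Y ** 2)
    return (X[K_ALPHA] ** 2 * ey2 + Y[K_ALPHA] ** 2 * ex2) / (ex2 * ey2)

false_alarms = 0
for _ in range(TRIALS):
    # Unit-power circular complex Gaussian noise
    noise = (rng.standard_normal(N + NU)
             + 1j * rng.standard_normal(N + NU)) / np.sqrt(2)
    false_alarms += int(statistic(noise) > ETA)

p_fa_hat = false_alarms / TRIALS
print(f"empirical P_fa over {TRIALS} noise-only trials: {p_fa_hat:.3f}")
```

The empirical rate should sit near the 5% design value, with some scatter from the finite number of trials and from estimating the second moments over a finite number of bins.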

In order to understand the effect of phase noise on the performance of the detector we apply three phase noise levels at a 100Hz offset from the carrier: (1) -100dBc/Hz, (2) -75dBc/Hz and (3) -50dBc/Hz. These values represent a linear increase (in dB) in phase noise severity. Fig. 1a) compares Probability of Detection ( $P_d$ ) vs. SNR curves for each of the above cases alongside the curve for an AWGN channel only. For cases (1) and (2) the detection performance is unaffected as compared with the AWGN channel. For case (3), a slight reduction in detection performance can be observed at SNRs lower than -8dB. Therefore, higher levels of phase noise may cause a decrease in detector sensitivity. However, the effect of phase noise on performance is minimal, leading us to conclude that the detector can still function well in its presence.

We now study the effects of I/Q imbalance on the performance of the detector. This impairment is a result of direct down conversion from Radio Frequency (RF) to baseband. We choose to analyse a random selection of I/Q amplitude imbalances in the range -10dB to 10dB and phase imbalances between  $-30^{\circ}$  and  $30^{\circ}$ . Fig. 1b) shows the effects of the various amplitude imbalances on detection performance alongside the AWGN only channel. It can be seen that amplitude imbalance has very little effect on detection performance. Equally, Fig. 1c) shows the effects of various phase imbalances. Again, it is clear that variations of this parameter do not significantly impact the detection performance. I/Q imbalance can be corrected prior to the detector [12] or eliminated by avoiding the use of direct conversion receivers.

Finally, we consider the effect of DC offset on detection performance. This effect is also a by-product of direct conversion to baseband. The detector was simulated with DC offsets of 0.2, 0.4 and 0.6. Fig. 1d) shows that the detection performance is degraded severely with increasing DC offset. Therefore, it is essential that DC offset is eliminated through the use of a DC removal filter or by avoiding direct conversion receivers.
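One common form of DC removal filter is a first-order DC blocker. The sketch below is given purely as an illustration: the filter structure and the pole value `a = 0.99` are assumptions, and we do not claim that this is the filter used in the simulations above.

```python
import numpy as np

def dc_block(x, a=0.99):
    """First-order DC blocker: y[n] = x[n] - x[n-1] + a * y[n-1]."""
    x = np.asarray(x, dtype=complex)
    y = np.zeros_like(x)
    prev_x = 0.0
    prev_y = 0.0
    for n, sample in enumerate(x):
        y[n] = sample - prev_x + a * prev_y
        prev_x, prev_y = sample, y[n]
    return y

# A pure offset of 0.4 decays geometrically towards zero at the output.
offset_only = np.full(2000, 0.4 + 0.0j)
residual = np.abs(dc_block(offset_only)[-1])
```

A pole closer to 1 gives a narrower notch around DC at the cost of a longer settling time, which would add to the overall sensing time.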

Since [9] deals explicitly with the effects of Multipath Fading and CFO on the detection performance, we have chosen not to discuss these separately in our paper. However, Fig. 2 shows  $P_d$  vs. SNR curves for floating point and fixed point implementations of the detector in the presence of a typical indoor fading channel with 150ns RMS delay spread [13], a random CFO in the range of -1/2 to 1/2 a subcarrier spacing (312.5kHz in IEEE 802.11a), a random phase noise level, a random I/Q imbalance and with DC offset corrected. The wordlengths correspond to the HDL Coder implementation and are discussed further in Section 5.



Fig. 2. Pd Vs. SNR for floating and fixed point detectors with Impairments

We compare detection performance with an equivalent implementation in [6], which uses N = 2048 and M = 8, achieving 100%  $P_d$  for an SNR of -7dB in floating point and an AWGN channel only. It can be seen from Fig. 2 that both the floating point and fixed point implementations can achieve almost 100%  $P_d$  for an SNR of -5dB. This represents a performance drop of 2dB. This can be attributed mainly to the degradation of the correlation between the CP and the end of the OFDM symbol introduced by multipath propagation effects. As noted in [9], this effect becomes more pronounced with increasing RMS delay spread. However, it is clear that performance is nearly optimal down to an SNR of -7dB, leading us to conclude that the detector is largely robust to radio impairments.

TABLE I Resource Utilisation of Detector on Xilinx xc7z020 FPGA

| FPGA Resource | No. Used | No. Available | % Used |
|---------------|----------|---------------|--------|
| Flip Flops    | 12,208   | 106,400       | 11     |
| LUTs          | 9,062    | 53,200        | 17     |
| BRAMs         | 13       | 140           | 9      |
| DSP48s        | 66       | 220           | 30     |

TABLE II Resource Utilisation of Modified Detector on Xilinx xc7z020 FPGA

| FPGA Resource | No. Used | No. Available | % Used |
|---------------|----------|---------------|--------|
| Flip Flops    | 11,095   | 106,400       | 10     |
| LUTs          | 8,330    | 53,200        | 16     |
| BRAMs         | 13       | 140           | 9      |
| DSP48s        | 28       | 220           | 13     |

## V. HDL CODER IMPLEMENTATION

In this section, we discuss an implementation of the detector using HDL Coder software. A high level block diagram of the detector is shown in Fig. 3.



Fig. 3. Block Diagram of Detector Implementation

The Decimation and FFT stages were implemented using dedicated blocks in HDL Coder. Elements of the covariance matrix were implemented using efficient Integrator Comb (IC) filters. The autocorrelation and test statistic calculation stages were implemented using a combination of product and addition blocks. Also, large delays were targeted to Block Random Access Memory (BRAM) to reduce the burden on the FPGA fabric.

For input to the detector, we assumed a 16 bit Analogue to Digital Converter (ADC). The wordlength grew to a total of 26 bits at the output of the detector, due to the various internal calculations. The coefficients of the decimation filter were represented using 10 bits. The threshold was stored as a constant and represented using an unsigned wordlength of 15 bits with 12 fractional bits. In [6] and [8], the division operation in (13) was implemented using an iterative shift and add algorithm. In [7], only the autocorrelation unit, the FFT unit and calculation of the elements of the covariance matrix are implemented on the FPGA. In our design, the division operation was implemented in HDL Coder using Newton-Raphson techniques. Table I captures the cost of the HDL Coder design after synthesis and implementation on the Xilinx xc7z020 FPGA in Vivado, in terms of Flip Flops, Look Up Tables (LUTs), arithmetic blocks (DSP48s) and BRAMs. The design consumes 30% of the available DSP48s, which is very costly, and it was only possible to achieve a maximum clock frequency of 79.3MHz. The division operation represented the main performance bottleneck in the design. Therefore, we made the following modification to the test statistic

$$A > \eta B \tag{15}$$

where  $A = X[\alpha]^2 \hat{E}[Y[k]^2] + Y[\alpha]^2 \hat{E}[X[k]^2]$  and  $B = \hat{E}[X[k]^2]\hat{E}[Y[k]^2]$ . This simplifies the computation in the receiver by eliminating the division operation.
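The equivalence between (15) and the comparison of (13) against $\eta$ relies on B being positive, which holds because B is a product of two mean-square values. A minimal software sketch of the division-free decision is shown below. The function name is hypothetical and floating point arithmetic is used for clarity, whereas the hardware operates on fixed point words.

```python
def exceeds_threshold(x_a, y_a, ex2, ey2, eta):
    """Division-free decision (15): compare A against eta * B."""
    a = x_a ** 2 * ey2 + y_a ** 2 * ex2
    b = ex2 * ey2
    return a > eta * b

ETA = 5.991
strong = exceeds_threshold(3.0, 0.0, 1.0, 1.0, ETA)   # A = 9, B = 1
weak = exceeds_threshold(1.0, 1.0, 1.0, 1.0, ETA)     # A = 2, B = 1
```

The divider is replaced by a single multiplication by the constant $\eta$, at the cost of no longer producing the value of the test statistic itself.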

Table II shows the resource consumption after applying the modification. The consumption of DSP48s is reduced from 30% to 13%, and less than 20% of each of the remaining resources is used. The design was able to achieve a maximum clock frequency of 113.6MHz, compared with 79.3MHz for the original implementation.

Fig. 4 shows an overlay of the HDL Coder output for both the original and modified detectors. The test vector consists of an IEEE 802.11a signal plus impairments with DC Offset compensated and an overall SNR of -7dB, followed by a block containing random noise samples.



Fig. 4. HDL Coder Output for Test Signal

It can be seen that when the input contains the SOI, the test statistic exceeds the threshold and is held for a duration of 1024 samples, i.e. the length of the FFT. It then drops below the threshold when the SOI is absent, demonstrating that IEEE 802.11a signals can be detected at very low SNR with radio impairments present. It is clear that both outputs are identical, confirming the validity of our modified detector. Initial latency is caused by the FFT block, the time required to calculate elements of the covariance matrix, and insertion of pipeline registers. At the decimated sample rate of 1.25MHz, this corresponds to a latency of approximately 2.5ms.

## VI. CONCLUSIONS

In conclusion, we have evaluated the performance of a detector for OFDM signals based on the frequency domain statistical test for the presence of cyclostationarity, first introduced by the authors in [5]. We found that the detector performed well in the presence of phase noise, I/Q imbalance, CFO and multipath fading with moderate delay spread. However, we found that DC offset has a detrimental effect on detection performance and must be mitigated by applying a DC removal filter or by avoiding the use of direct conversion to baseband in the RF hardware.

This paper has also discussed implementation of the algorithm using HDL Coder software. Various features included with HDL Coder were used to successfully target the design to a Xilinx xc7z020 FPGA device, including HDL optimised blocks and targeting of BRAM resources. We also proposed a simple modification to the test statistic calculation to avoid a costly division operation. Simulations demonstrated that it was possible to successfully detect the presence of a test signal generated in MATLAB using the fixed point HDL Coder model. We conclude that the algorithm can be compactly implemented on a modern day FPGA device.

## REFERENCES

- [1] IEEE Std 802.11<sup>TM</sup>-2012, IEEE Standard for Information Technology-Telecommunications and information exchange between systems-Local and metropolitan area networks-Specific requirements-Part 11: WLAN MAC and PHY specifications, 2012.
- [2] 3GPP TS 36.211 V8.4.0: "Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation (Release 8)," 2008-2009.
- [3] W. A. Gardner, *Cyclostationarity in signal processing and communications*, 1st ed. New York, USA: IEEE Press, 1994.
- [4] W. A. Gardner, A. Napolitano, and L. Paura, "Cyclostationarity: Half a century of research," *Signal Processing*, vol. 86, no. 4, pp. 639-697, Apr. 2006.
- [5] A. V. Dandawate and G. B. Giannakis, "Statistical tests for presence of cyclostationarity," *IEEE Transactions on Signal Processing*, vol. 42, no. 9, pp. 2355-2369, Sep. 1994.
- [6] V. Turunen, M. Kosunen et al, "Spectrum estimator and cyclostationary detector for cognitive radio," in *European Conference on Circuit Theory and Design, 2009*, Aug. 2009, pp. 283-286.
- [7] V. Turunen, M. Kosunen et al, "Implementation of Cyclostationary Feature Detector for Cognitive Radios," in *4th International Conference on Cognitive Radio Oriented Wireless Networks and Communications, 2009*, Jun. 2009, pp. 1-4.
- [8] M. Kosunen, V. Turunen et al, "Survey and Analysis of Cyclostationary Signal Detector Implementations on FPGA," *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, vol. 3, no. 4, pp. 541-551, Oct. 2013.
- [9] V. Sebesta, R. Marsalek et al, "OFDM Signal Detector Based on Cyclic Autocorrelation Function and its Properties," *Radioengineering*, vol.86, no. 4, pp. 926-931, Dec. 2011.
- [10] MathWorks, "HDL Coder: Generate Verilog and VHDL Code for FPGA and ASIC designs," [Online]. Available: http://uk.mathworks.com/products/hdl-coder. [Accessed: Feb. 19, 2016].
- [11] Xilinx All Programmable, "Zynq All Programmable SoC," [Online]. Available: http://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html. [Accessed: Feb. 19, 2016].
- [12] L. Anttila, M. Valkama et al, "Blind Compensation of Frequency Selective I/Q imbalances in Quadrature Radio Receivers: Circularity-Based Approach," in *IEEE International Conference on Acoustics, Speech and Signal Processing, 2007*, Apr. 2007, pp. 245-248.
- [13] J. Medbo and P. Schramm, "Channel models for HIPERLAN/2," ETSI/BRAN document no. 3ERI085B.