Fast comparison of IS radar code sequences for lag profile inversion

Abstract. A fast method for theoretically comparing the posteriori variances produced by different phase code sequences in incoherent scatter radar (ISR) experiments is introduced. Alternating codes of types 1 and 2 are known to be optimal for selected range resolutions, but the code sets are inconveniently long for many purposes like ground clutter estimation and in cases where coherent echoes from lower ionospheric layers are to be analyzed in addition to standard F-layer spectra. The method is used in practice for searching binary code quads that have estimation accuracy almost equal to that of much longer alternating code sets. Though the code sequences can consist of as few as four different transmission envelopes, the lag profile estimation variances are near to the theoretical minimum. Thus the short code sequence is equally good as a full cycle of alternating codes with the same pulse length and bit length. The short code groups cannot be directly decoded, but the decoding is done in connection with more computationally expensive lag profile inversion in data analysis. The actual code searches as well as the analysis and real data results from the found short code searches are explained in other papers sent to the same issue of this journal. We also discuss interesting subtle differences found between the different alternating codes by this method. We assume that thermal noise dominates the incoherent scatter signal.


Introduction
In incoherent scatter radar measurements the desired range resolution is often much better than the available pulse length Correspondence to: M. S. Lehtinen (markku.lehtinen@sgo.fi) of the radar.In order to maximize the statistics of the results special coding methods like alternating codes have been developed to allow the radar to be used at full duty cycle and still produce range resolutions much shorter than the pulse length used (Lehtinen and Häggstrom, 1987).Typical features of these kinds of methods are: -Long pulses are modulated by a binary phase pattern, which gives range resolution approximately equal to the baud length of this pattern.
-This requires relatively little computation because the resolution is achieved by a simple summation/subtraction procedure of the corresponding lag products of all the codes in the code set.
Then there are the newer codes that have the advantage that the code set can contain fewer codes, especially important when the equivalent alternating code set has many codes (Virtanen et al., 2007;Damtie et al., 2002).These codes: 1. preserve the range resolution that the equivalent alternating code set provides, 2. but do not allow this resolution to be accessed by such a simple procedure as adding/subtracting corresponding lag products.
3. Instead they require a possibly computationally intensive inversion, and thus are practical now, but were not two decades ago.
4. Since these codes are found by a search technique, computationally intensive due to the vast number of possible code sets, it is necessary to make this process efficient.
5. The important thing is not whether data from a code group can be inverted, but rather how much the signal to noise ratio of the experiment is degraded by the inversion.
Published by Copernicus Publications on behalf of the European Geosciences Union.
6. Since different code sets degrade the SNR by vastly different amounts, a very important part of the search technique is the evaluation of the degradation.
7. This paper describes a fast method for doing this, a method that is not necessarily as accurate as slower methods, but nonetheless is good enough to allow good code sets to be separated from the vast number of bad code sets.
We also use the developed comparison method to discuss some interesting differences found between strong alternating codes, weak alternating codes (Lehtinen and Häggstrom, 1987) and alternating codes of type 2 (Sulzer, 1993), seen in the way they perform in terms of variance behavior when post-integrating lag profile data to coarser resolutions.The actual (accurate) method for decoding the codes is to apply linear statistical inversion to the lag profiles measurements, which represent the desired unknown plasma scattering lag profiles convolved with the range ambiguity functions corresponding to the different lagged products of the codes in the code group under study.If the lag profiles are very long, the faster method of inversion can be based on Fourier transforms of the profiles.These methods will fail near the ends of the profiles, but they provide us with a much faster way to estimate the variance behaviour of the inversion process with results valid if we do not worry about what happens near the ends of the profiles.Because the code groups are searched with time-consuming computer searches the speed obtained this way is essential.
The linear statistical inversion based method is described in Virtanen et al. (2007).These methods are developed towards general-purpose experiments in another paper in this special issue (Virtanen et al., 2008).There are also other related papers in this special issue: Another paper (Vierinen et al., 2008) describes the efficient search algorithms used for actual searches of the code quads.We also discuss the advantages of using polyphase codes and amplitude modulation instead of binary modulation.
We discuss the possibility of doing inversion analysis of echo amplitude data to solve for the scattering amplitudes instead of correlation properties of the target (Vierinen et al., 2008b).This way of analysis is advantageous with very narrow targets like meteor heads and narrow ionospheric layers.
We have shown that the new experiments can be used together with advanced MCMC (Markov Chain Monte Carlo) techniques combined with ionospheric models to do analysis in cases beyond the capabilities of traditional GUISDAP analysis or the lag profile inversion (Kero et al., 2008).
We have also generalized binary alternating codes to a general number of phases (Markkanen et al., 2008).This paper is purely a generalization of the classical way of coding and decoding by summation -with no need for inversion methods.

Code selection
The focus of the code selection is to find the code sequences that produce minimum posteriori variances for the inverted lag profiles.Starting from the scattering process happening in the ionosphere it is possible to derive an analytical formula for the posteriori variance of any lag value for any code sequence.For each code length and each lag value the variance has a theoretical minimum.By comparing the calculated variances of different codes to the minimum variances at all lag values the best codes can be selected.
We use here a discrete formalism, where all times (and ranges which are also specified as corresponding times) are represented as integers.The discrete formalism can be derived from the continuous-time formalism developed in (Lehtinen, 1986;Lehtinen and Huuskonen, 1996) by studying samples of signals given by z i = 1 t (i+1) t i t z(t)dt in the limit t→0.

Scattering from the ionosphere
The scattering coefficient of an ionospheric plasma element is noted with x r,t , where r is signal travel time to the scattering volume and back and t is time.The scattering is modeled as a zero mean stochastic process with covariance x r;t x r ;t = δ rr σ t−t r , where δ rr is the Kronecker delta and x is the complex conjugate of x.Plasma scattering properties are modeled by the plasma autocorrelation function σ t−t r .In a practical incoherent scatter experiment the illumination by radio waves is not continuous, but the radar sends phase coded pulses to the ionosphere.The scattered signal is recorded in the form of discrete IQ-samples (raw data).
If the discretization step t is small enough, the amplitude ambiguity function for range simplifies to be equal to the transmission waveform env t and thus the received complex signal is a weighted sum of the scattering coefficients where env q is the complex transmission envelope and ε q the measurement error due to thermal noise.It's correlation properties are expressed by where C is a constant depending on the amplitude of the thermal noise.

Lag profiles and inversion
In an incoherent scatter radar experiment lagged products of the received signal and its complex conjugate, are calculated.Using Eq. ( 1) the expectations of the lagged products can be written in the form The expectations of the terms containing measurement errors are zeros.
A range profile of plasma autocorrelation function σ τ q with a fixed value of time lag τ is called a "lag profile".In the analysis several lag profiles σ τ q with selected values of τ are solved as separate problems to sample the whole plasma autocorrelation function.The difference between a measured ambiguous lagged product and its expectation is The error term in Eq. ( 6) is rather complicated.The fourth moment's theorem for complex Gaussian processes imply ε q;τ ε q ;τ = z q z q −τ z q z q−τ ε q;τ ε q ;τ = z q z q z q−τ z q −τ .
These formulas, represented in terms of the real and imaginary parts of the lag profile expectations m q;τ can be found in the references cited above.The mutual correlations of the error term ε q;τ between both different lags and different ranges can be fully explained by these formulas.This is the correct way to model the self-noise of the lag profile estimates, when the signal-to-noise ratio is not very low.
For the purposes of this study, we assume a very low signal-to-noise ratio, and the error term simplifies to The Fourier transform of the measured lag profile is as indicated by the convolution theorem of the Fourier transforms.The Fourier transform of the plasma autocorrelation function σ τ r is not the plasma scattering spectrum, because the transform is taken with respect to the range variable r, not the time lag τ .
The inversion solution (and the error term) of the Fourier transform of the unknown lag profile is . (10)

Posteriori noise
Now lets take a closer look at the transform of the noise, i.e. the last term in Eq. ( 9).Using Eq. ( 8) its covariance can be written as The latter term in Eq. ( 10) is the Fourier transform of the posteriori noise.By taking the inverse transform we get the posteriori noise in time domain The covariance of the posteriori noise in Eq. ( 12) is The previous holds for a single transmission envelope that leads to a single range ambiguity function for each lag value.
In case of several envelopes, i.e. a code sequence longer than one, each of the envelopes produces a different range ambiguity function for each lag value.From a sequence of N codes, equations u = 1 . . .N, are produced.Each of these range ambiguity functions produce an equation similar to Eq. ( 6) and has a Fourier transform The inversion solution (and the error term) for the Fourier transform of the lag profile using all the measurements can now be written as where ετ (ω) is the Fourier transform of a normalized white noise.The posteriori noise covariance is From Eq. ( 18) we get the posteriori variance of the inversion result that has a minimum 2.4 Proof of the accuracy limit It is surprisingly easy to see that the inequality (20) must be true.For this, let us assume that the only unknown lag profile value is σ τ q and that we know all the other values of σ τ p , if p =q.Our only unknown then appears in the measurement Eq. ( 15) multiplied by all the values W u r;τ with 1≤u≤N and −∞<r<∞.
As the error terms in Eq. ( 20) are assumed normalized and independent, the inversion accuracy for this only unknown variable is easily seen to be equal to the right hand side of Eq. ( 20).
If we now drop the assumption about exact knowledge of the other scattering terms σ τ p , p =q, the estimation accuracy of the previously only unknown cannot get any better.Thus, the inequality in Eq. ( 20) must be true.
The inequality ( 20) is also a consequence of the wellknown mathematical fact that the harmonic mean of numbers is smaller or equal to the arithmetic mean of the numbers.

Code selection criteria
Now we have finally arrived to the formulas that can be used in comparing different code sequences.The efficiency of a code is defined as ratio of the minimum posteriori variance as defined in Eq. ( 20) and the actual posteriori variance of the code under study, In practical calculations the Fourier transform of the range ambiguity function can be handled with DFT.When the integral in the denominator is written in discrete form using Riemann sum, we get the form In the case of binary code groups the nominator can be simplified even further.The amplitude of range ambiguity function is |W u r;τ |=1 in all non-zero parts of the function.If the codes have N b bits and only the full lags (multiples of the bit length) are studied, Eq. ( 22) can be written in form

Range integration
As lag profile inversion is usually not performed with the best possible range resolution, but range integration is used at least at larger ranges, it is useful to investigate how the code sequences behave in the integration process.Whereas the maximum range resolution is achieved by using the original range ambiguity samples in the inversion process, convolutions of range ambiguity function and an averaging function ψ are used to get the range integrated values.The convolution gives an extra term to the posteriori variance (Eq.19), so that the variance of the integrated lag profile is where ˜ (ω) is the Fourier transform of the averaging convolution kernel .The discrete version of this formula is easily obtained by replacing 1/2π by 1/M and the integral over ω by summation over the discrete FFT values ω i .A very natural choise for the averaging kernel would be the boxcar function However, it was discovered that the boxcar shape performed very badly.For this reason we defined a triangular averaging function by setting and Moreover, a sinusoidal shaped averaging function as well as an iterated sinusoidal averaging function (rather close to boxcar) were defined by All these kernels are shown in Fig. 1.

Numerical results
As our goal was to find a code sequence of 15 bauds of 10 µs duration, we first evaluated such a code set of randomly chosen binary bauds.The results for the randomly chosen code set are shown in Fig. 3.We have plotted the variances of the experiments as a function of range integration.The results from the simple and fast formula ( 22) or ( 23) are shown by the red curve and the results for more careful calculations, modelling the signal values as 5 samples taken for each baud by the blue curves.The three different rows of the curves correspond to a different number of fractional lag profiles taken to represent each full lag value.The formulas used for both cases are the same.The main difference is that in the simple case we use 1 discrete sample to represent each baud, while in the more realistical case we use 5 samples to represent each baud.Moreover, in the more realistic case the full lags are multiples of 5 samples and each of them is estimated using fractional lags around this exact multiple of 5 samples.
It can be seen that realistic curves rather well follow the fast formula ( 22) or ( 23), but that there are occasionally even significant discrepancies between them.However, we feel this anyway justifies the use of rapid formulae in extensive computer searches.

Careful explanation of the result figures
The main contribution of this paper, in addition to Eqs. ( 22) or ( 23) and ( 24) are the numerical results shown in form of matrices of curves for different situations.As these graphics contain a great wealth of information, it is necessary to carfully explain everything shown in them.
The figures consist of three of five rows of panes.The first row shows the pulse forms and the range ambiguity functions.The second row shows the squared moduli of the Fourier transforms of the individual range ambiguities, normalized by dividing by the number of codes and the sum of these individual Fourier transforms.This sum can be considered as the average Fisher information of the inverse convolution problem.The last 1 or 3 rows show how the results behave when range-integrated to various resolutions.We show the behaviour for both the simplest formula as the red curve as well as a set of other curves -calculated more carefully by discretizing the analysis to 5 steps for each baud in the code.We always show the results for calculation of fractional lags ±2, ±1 and 0 and in the 5-row panels also for other kinds of fractional lag sets involved in the analysis.

First line of figure panes
Here we simply show the range ambiguity functions for all full lags.LAG 0 is different, as the range ambiguity function is just a constant of the length of the pulse itself and it is not very informative to show it.Instead, we plot the codes themselves in that position.To emphasize the difference, we plot the code in blue.At the top of these panels we also print the relevant full lag number.

Second line of figure panes
On the second line of figure panes we plot the average of the squared moduli of the Fourier transforms of the ambiguity functions, calculated by simple (no fractional samples) formulas.As the convolution inverse problem diagonalizes on the Fourier side, this curve can be understood as the representation of the diagonal Fisher information matrix of the inverse problem of deconvolving the ambiguity functions.This function appears in the denominator of the integral in Eq. ( 19).If the ambiguity is a Kronecker delta of unit magnitude, the corresponding Fisher information is a constant one in Fourier side.Thus, a constant one represents a set of codes representing exactly the same kind of information that can be obtained from a single unit-strength pulse pair.A constant unit value here represents a perfect code set in the sense that the statistical properties of the analysis result are exactly equal to those corresponding to a simple pulse pair.However, we have normalized this curve by dividing it by the length of the ambiguity function -so it actually means that a constant unit value means pulse compression perfectly done with a apparent power corresponding to the length of the ambiguity function.
Please, note that the frequency axis is that of the FFT software: zero frequency corresponds to the left side of the figure while small negative frequencies are found on the right side of the x-axis.The center of the x-axis corresponds to the highest frequencies seen with the time discretization chosen.
The Fourier side information is easily interpreted in terms of what happens with range integration.If we have much information at zero or low frequencies, it is obvious that range integration should perform well.This happens clearly in lag 8 in Fig. 4. Also, zero lags integrate well if fractional lags are taken into account.For the simple discrete case zero lags are singular and integration cannot help, except for the case of integration length equal to pulse length, which is seen as the downward spike on the red curve in some panels.On the other hand, having little information at DC results in very bad range integration results.This is remarkably clear for lags 5 and 10 in Fig. 3.

Last rows of figure panes
The last 1 row or 3 rows of the figure panes show the behaviour of the codes, when range-integrated to resolution between 1 and 20 baud lengths.If we show 3 rows, the difference between those is in the number of fractional lags we include to form the final estimate for each full lag.The Fig. 3. Evaluation of a randomly chosen 4-code binary sequence of length 15.In the first row of panels the ambiguity functions for each lag are shown, corresponding to all full lag values.For lag 0 we show the codes themselves (drawn in blue), as the ambiguity functions would be just constants shapes of length equal to the length of the total pulse.The second row of panels shows the average of the squared moduli of the Fourier transforms of the ambiguity over the code set (red) and also the individual terms in this average shown by the darker red lines in bottom.The lower three panels show the behavior of variances of lag profiles if integrated to different resolutions between 1 and 20 baud lengths.The fourth row corresponds to using fractional lags of ±2, ±1 and 0 to represent each full lag, while in the third row we have used fractional lags ±1 and 0 and in the fifth row fractional lags ±3, ±2, ±1 and 0. The red curves in all the rows correspond to a simple calculation using no fractional lags at all, while the blue curves correspond to more realistic analysis, all using the fractional lags specified but 3 different kernels for range integration.The simple red curves are used for fast computer searches and we see that they rather well approximate the more realistic variance estimates.
alternatives are fractional lags ±1 and 0 (3rd row), ±2 ±1 and 0 (4th row) and ±3 ±2 ±1 and 0 (5th row).If only one row is shown, the fractional lags are ±2 ±1 and 0, which is the most natural choise, as for 5 samples for each baud this results in each fractional lag profile being used once and only once for each full lag profile estimation.
In each of these figures we have plotted in red the simplest possible result for range integration with only full lags taken into account in Eq. ( 24).This is to make it easy to compare the simple formula to the more careful variance calculations shown by the thick blue, thin blue and dark blue curves corresponding to more careful calculations, discretized at 1/5 of the baud length and using the different range integration kernels defined in Sect.2.6.

Normalization of the curves
It is important to understand that we have normalized all the variance curves by dividing them with the number of independent estimates going into the analysis.These factors include: -Number of codes in the sequence -Number of fractional lags -Range integration Also, the constant C in Eq. ( 3) is chosen to be equal to the baud length, so that for the simplest situations -discretized to the baud length -the signal thermal noise is assumed to be of unit variance.In the more accurate calculations discretizing the signal so that each baud corresponds to N flag samples, the signal variances are equal to N flag as it should be for sampled white noise.
The reason for these choises is to make the scales of the different curves comparable.All variances shown in the curves should actually be equal to 1 if they would be calculated by simple estimates just taking into account the number of independent data going to the analysis.The difference from 1 is then a measure of non-perfectness of the inversion decoding process and the range-post-integration process caused by correlations of the errors of the inversion process as well as detailed behaviour of the ambiguity functions and the post-integration kernel.
Without this normalization the values of the variance curves at range-integration 20 should be 1/20 of the value they have at range-integration 1, which would make these curves difficult to read.This way the reader may understand the curves as additional correction factors, which must be  While not perfect, the variances as function of range integration perform significantly better than in the previous figure with a randomly chosen code.For lags 2, 8, 10, 11 and 12 the variances go down faster than the range integration would imply.This is connected with the fact that the Fourier side information curve (red curve in the second row of panels) shows high values for DC and low frequencies.included after simple scalings taking into account the above listed factors are used.

Numerical data shown in the panes
In the panes we have also helped the inspector by showing some numeric data relevant to the curves.We have shown the R value defined in (Lehtinen et al., 2004) and (Damtie et al., 2002) to describe the close-to-perfectness of a code (or here a code set) on the panels of the second row for each lag profile.There we also print the minimum and maximum values of the Fourier side information curve, to make it easier to see how much it differs from a constant 1.

Conclusions
We have developed a fast method useful in comparing variances of lag profiles and range-integrated lag profiles using sets of codes in incoherent scatter radar measurements.We have shown that faster and more approximative ways to apply the method produce approximatively the same kinds of results as better modeled and more realistic ways to apply the method.This makes it possible to use the method for extensive computer searches of codes, where speed is essential.
We have also shown that the behavior of a new 4-pulse 15-baud code (Fig. 4) is almost as good as known methods of same number of bauds but with much larger number of pulses in the cycle.
In addition, we have found subtle differences in the behavior of the previously known different types of alternating codes and have seen that randomly chosen code sets need to be very long to produce equally good results as the other methods discussed.
if τ ≥0 and τ ≥0.The lag profile estimates are thus independent of each other if either the lag values or the range values differ.The variance of the zero lag profile is two times the variance of the other lag profiles.Negative values of the lags are completely dependent on the positive values, being complex conjugates of each other.

Fig. 2 .
Fig. 2. Evaluation of a simple multiple-pulse type code.

Fig. 4 .
Fig.4.Evaluation of the chosen code for a 15-baud 4-pulse experiment.While not perfect, the variances as function of range integration perform significantly better than in the previous figure with a randomly chosen code.For lags 2, 8, 10, 11 and 12 the variances go down faster than the range integration would imply.This is connected with the fact that the Fourier side information curve (red curve in the second row of panels) shows high values for DC and low frequencies.

Fig. 5 .
Fig. 5.An alternating code set made from a 16-bit strong alternating code by truncating to 12 bits.We have only shown the most natural fractional lag set: ±2/5, ±1/5 and 0. It is clear that all lags ≥2 display a perfect decoding behaviour -they behave as if plain 2-pulse pairs corresponding to all 2-pulse-pairs in the set would have been available.

Fig. 10 .
Fig.10.A set of 150 randomly chosen pulses of 15 bauds each.This many pulses are necessary to make random code sets to be comparable to alternating code sets.
Range integration averaging kernels used in the calculations.The boxcar kernel (magenta) was not used in the subsequent calculations.The triangle kernel (thick blue) seemed to perform best while the sin-shaped kernel (thin blue) and the iterated sinshaped kernel (dark blue) performed slightly worse.The results corresponding to the last three kernels are shown with correspondingly colored curves in the results.