# Novel serial code concatenation strategies for error floor mitigation of low-density parity-check and turbo product codes


Damian A. Morero and Mario R. Hueda\*

This paper presents a novel multiple serial code concatenation (SCC) strategy to combat the error-floor problem in iterated sparse graph-based error correcting codes such as turbo product codes (TPC) and low-density parity-check (LDPC) codes. Although SCC has been widely used in the past to reduce the error floor in iterative decoders, the main stumbling block for its practical application in high-speed communication systems has been the need for long and complex outer codes. Alternatively, short outer block codes with interleaving have been shown to provide a good tradeoff between complexity and performance. Nevertheless, their application to next-generation high-speed communication systems is still a major challenge because long, complex interleavers must be carefully designed to meet the requirements of these applications. The SCC scheme proposed in this work is based on the use of short outer block codes. Departing from techniques used in previous proposals, the long outer code and interleaver are replaced by a simple block code combined with a novel encoding/decoding strategy. This allows the proposed SCC to provide a better tradeoff between performance and complexity than previous techniques. Several application examples showing the benefits of the proposed SCC are described. In particular, a new coding scheme suitable for high-speed optical communication is introduced.


Keywords: concatenated codes; error correction codes; high-speed optical communication; product codes; turbo codes.

# I Introduction

In future high speed communication systems (e.g., next generation optical transport networks (OTN)), forward error correction (FEC) codes with net coding gains (NCGs)  $\geq \! 10 \; \text{dB}$  at a bit error rate (BER) of  $10^{-15}$  with an overhead (OH) as low as possible (e.g.,  $\sim 20\%$ ) are mandatory [1]–[3]. Given their superior performance and suitability for parallel processing, large block size low density parity check (LDPC) codes and turbo product (TP) codes have been considered as FEC coding schemes of choice for ultra-high speed transmission systems. Unfortunately, these iterative coding schemes usually have error floor problems which significantly degrade their performance at low BER.

Manuscript received January 27, 2013; accepted March 7, 2013 \* D. A. Morero and M. R. Hueda are with Laboratorio de Comunicaciones Digitales - Universidad Nacional de Cordoba - CONICET. Av. Velez Sarsfield 1611 - Cordoba (X5016GCA) - Argentina; Email: dmorero@gmail.com, mhueda@gmail.com

This work was supported in part by Fundación Fulgor. Associate Editor managing this paper's review: S. Yousefi

Numerous techniques have been proposed in the literature to lower the error floor [1]–[5]. These techniques can be divided into two categories. The first one aims at eliminating all weaknesses in the decoder algorithm that create the error floor, while the second one aims at correcting the residual error pattern by adding an outer code. The first category comprises several post-processing [1], [4] and improved decoding algorithms [5] which are mainly proposed for LDPC codes. The design and performance evaluation of these algorithms may be difficult since the knowledge of both the weight and structure of the dominant error patterns is required. On the other hand, the addition of an outer code provides a simple and more general solution to the error floor problem. In particular, its performance can be estimated based on the knowledge of the weights and the probabilities of the error patterns. For these reasons, several SCC FEC schemes for 100 Gigabits per second (Gb/s) OTN applications have been proposed (see [2]–[3] and references therein). In [2], it is experimentally shown that a concatenated code with 20.5% overhead, based on an inner LDPC and an outer Reed-Solomon (RS) code, achieves an NCG of 9 dB at a BER $= 10^{-13}$. The concatenation of two hard-decision block codes with an LDPC is another alternative proposed in [2]. The total overhead of this triple-concatenated approach is 20% and the expected NCG is 10.80 dB at a BER $= 10^{-15}$. In [3], the authors present a concatenated LDPC+RS coding scheme with 20.5% OH and NCG = 11.3 dB at a BER $= 10^{-15}$.

Figure 1: Encoding of SCC-I.

The use of short outer block codes with interleaving has also been considered in the past to (i) improve the performance and (ii) reduce the error floor [6]. Based on the structure of the dominant error patterns, the interleaving-based SCC solution is able to achieve a good tradeoff between performance and complexity. However, the evaluation of the dominant error patterns is highly complex in numerous codes of practical interest as a result of the very low BER and the high NCGs required in high-speed applications such as OTN. To avoid the evaluation of the structure of the error patterns, long pseudo-random interleavers can be used [6]. Unfortunately, long interleavers significantly increase not only the implementation complexity but also the latency. Therefore, the use of SCC with interleaving in future high-speed transmission systems is still a major challenge.

The present paper describes a novel SCC strategy designed to reduce the error floor problem in very high-speed communication systems. The key ingredient of our technique is the replacement of the long outer code and interleaver by simple block codes in combination with a novel encoding/decoding strategy [7]. Based on this finding, we demonstrate that an error floor reduction of LDPC codes in multigigabit applications can be achieved with a drastic reduction of complexity<sup>1</sup> (e.g., one order of magnitude) in comparison with existing SCC solutions. As a second contribution of this work, we extend the new SCC strategy to combat the error floor caused by both (i) the near-codewords<sup>2</sup> with non-zero syndrome [8] and (ii) the minimum distance codewords. The latter allows the new SCC technique to be efficiently used with TP codes (TPC). In particular, we show that a TPC composed of two extended Hamming (EH) codes, in combination with the proposed SCC algorithm, can achieve an NCG of $\sim 11.2$ dB at BER $= 10^{-15}$ with $\sim 22\%$ total overhead and an error floor at $\sim 7 \cdot 10^{-17}$. It is important to highlight that this NCG is approximately 0.4 dB higher than that achieved by TPC schemes reported in past literature [2], [9]–[10].

The rest of this paper is organized as follows. Section II introduces the basic notation used throughout the paper and describes the classical SCC schemes proposed for ultra high speed transmission systems. The new SCC technique is described and analyzed in Section III. An example of the use of the proposed SCC for error floor reduction of TPC with application in high-speed optical communication is presented in Section IV. Finally, Section V reviews the main conclusions of the paper.

# II Background

This section introduces basic concepts and notation used in the paper. Let $\Omega$ be the set of all possible error-patterns at the output of the inner decoder. An *error-pattern* is defined as the set of all bits in error that jointly take place in one received codeword. Let $p(\omega)$ be the probability of a certain error pattern $\omega \in \Omega$ at a given signal-to-noise ratio (SNR). The exact word error rate (WER) due to $\Omega$ is defined by

$$P_{w}(\Omega) = \sum_{\omega \in \Omega} p(\omega), \tag{1}$$

while the transmission BER,  $P_b(\Omega)$ , and the information BER,  $\tilde{P}_b(\Omega)$ , are given by

$$P_b(\Omega) = \frac{1}{n} \sum_{\omega \in \Omega} p(\omega) w(\omega), \tag{2}$$

$$\tilde{P}_b(\Omega) = \frac{1}{k} \sum_{\omega \in \Omega} p(\omega) \tilde{w}(\omega), \tag{3}$$

where $n$ and $k$ are the length and the dimension of the code, respectively, while $w(\omega)$ and $\tilde{w}(\omega)$ are the weight (including the redundant bits) and the *information*-weight (which does not include the redundant bits) of the error pattern $\omega$, respectively. Set $\Omega$ can be divided into two disjoint subsets $\mathfrak{Q}$ and $\bar{\mathfrak{Q}}$ (i.e., $\Omega = \mathfrak{Q} \cup \bar{\mathfrak{Q}}$ and $\mathfrak{Q} \cap \bar{\mathfrak{Q}} = \emptyset$), where $\mathfrak{Q}$ is the set of all error patterns that cause the error floor and $\bar{\mathfrak{Q}}$ is the set of non-problematic error patterns. From (1) and (3), it is simple to derive the following upper bound for the information bit error rate due to $\mathfrak{Q}$:

$$\tilde{P}_b(\mathfrak{Q}) = \frac{1}{k} \sum_{\omega \in \mathfrak{Q}} p(\omega) \tilde{w}(\omega) \le \frac{w_{max}}{k} P_w(\mathfrak{Q}), \tag{4}$$

where $w_{max} = \max_{\omega \in \mathfrak{Q}} \{ \tilde{w}(\omega) \}$ [11]. Parameters $w_{max}$ and $P_w(\mathfrak{Q})$ will be used throughout this paper to design various code concatenation schemes.
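As a minimal numerical sketch of (1)–(4), the snippet below evaluates the WER, the two BER definitions and the upper bound for a toy error-floor set. The three error patterns, their probabilities and their weights are invented purely for illustration and are not taken from any code discussed in this paper; the names `ErrorPattern`, `floor_set`, etc. are likewise hypothetical.

```python
# Toy illustration of Eqs. (1)-(4). All numbers are made up for the example.
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorPattern:
    prob: float        # p(omega)
    weight: int        # w(omega): information plus redundant bits in error
    info_weight: int   # w~(omega): information bits in error only


def word_error_rate(patterns):
    """Eq. (1): sum of the pattern probabilities."""
    return sum(e.prob for e in patterns)


def transmission_ber(patterns, n):
    """Eq. (2): weight-averaged probability, normalised by the code length n."""
    return sum(e.prob * e.weight for e in patterns) / n


def information_ber(patterns, k):
    """Eq. (3): information-weight average, normalised by the dimension k."""
    return sum(e.prob * e.info_weight for e in patterns) / k


def information_ber_bound(patterns, k):
    """Right-hand side of Eq. (4): (w_max / k) * P_w."""
    w_max = max(e.info_weight for e in patterns)
    return w_max * word_error_rate(patterns) / k


n, k = 2640, 1320  # hypothetical rate-1/2 code
floor_set = [
    ErrorPattern(prob=4e-6, weight=12, info_weight=6),
    ErrorPattern(prob=3e-6, weight=14, info_weight=7),
    ErrorPattern(prob=1e-6, weight=18, info_weight=9),
]

exact = information_ber(floor_set, k)
bound = information_ber_bound(floor_set, k)
print(f"WER              = {word_error_rate(floor_set):.3e}")
print(f"transmission BER = {transmission_ber(floor_set, n):.3e}")
print(f"information BER  = {exact:.3e} (bound from Eq. (4): {bound:.3e})")
```

The bound is tight only when every pattern in the set has information weight $w_{max}$; otherwise it overestimates the information BER, which is the conservative direction for design purposes.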

# II.A Serial Code Concatenation (SCC)

Code concatenation is a known FEC technique based on the combination of an inner code and an outer code [6], [12]–[14]. Let $\mathbb{C}_1$ and $\mathbb{C}_2$ denote the inner and outer code, respectively. Each code is defined by the set of parameters $[n_j,k_j,d_j]$, where $n_j$, $k_j$, and $d_j$ are the block size, the dimension, and the minimum distance of the code $\mathbb{C}_j$, respectively. The overhead of the code is defined as $\Theta_j = (n_j - k_j)/k_j$. Let $C_j^i$ denote the $i$th codeword of $\mathbb{C}_j$. A codeword $C_j^i$ is composed of the information data block $D_j^i$ of length $k_j$, and the parity block $P_j^i$ of length $r_j = n_j - k_j$. In high-speed communication systems, the inner code $\mathbb{C}_1$ is generally an LDPC or TP code, while the outer code $\mathbb{C}_2$ is a block code with error correction capability $t_2 = \lfloor (d_2 - 1)/2 \rfloor$ designed to eliminate or reduce the error floor of $\mathbb{C}_1$.

# SCC Scheme I (SCC-I)

Figure 1 shows a classical serial code concatenation scheme denoted here by SCC-I. The encoding process is composed of two steps. First, the uncoded frame is divided into $m$ blocks of $k_2$ bits denoted $D_2^i$ for $i=1,\ldots,m$ (e.g., $m=3$ in Fig. 1). Each $D_2^i$ block is encoded by $\mathbb{C}_2$ generating the codeword $C_2^i$. In the second step, the codewords $C_2^i$ are used as the datawords of code $\mathbb{C}_1$ (i.e., $D_1^i=C_2^i$) and they are encoded by $\mathbb{C}_1$ generating the codewords $C_1^i$ that will be transmitted. In order to eliminate the error floor of $\mathbb{C}_1$ generated by the error patterns $\mathfrak{Q}$, $\mathbb{C}_2$ must correct at least $w_{max}$ bits. Note that $\mathbb{C}_2$ will also correct the error patterns $\omega \in \Omega$ with $\widetilde{w}(\omega) \leq t_2$ but it may introduce additional errors in the error patterns $\widetilde{\Omega} = \{\omega \in \Omega : \widetilde{w}(\omega) > t_2\}$. Since the decoder for $\mathbb{C}_2$ modifies a maximum of $t_2$ bits, the maximum number of additional errors introduced over the error patterns $\widetilde{\Omega}$ is no larger than $t_2$. Therefore, as a worst case scenario, the information BER $\widetilde{P}_b(\Omega)$ is increased by a factor $(2t_2+1)/(t_2+1) < 2$. This penalty can be neglected at very low BER (such as $10^{-15}$) where the slope of the BER vs. SNR curve is very high (in particular for codes with performance close to the Shannon limit).

<sup>1</sup>In this paper we use the equivalent number of gates of different logic cells (e.g., AND, XOR, etc.) to compute the complexity of a given SCC approach.

<sup>2</sup>Let C be a binary linear code of length n. An (a, b) near-codeword is a binary vector of length n and Hamming weight a whose syndrome has weight b [8]. In particular, if b = 0, the (a, b) near-codeword is a codeword. The error floor in LDPC codes is typically dominated by (a, b) near-codewords where b is small but higher than zero, and a is lower than the minimum distance of the code.

Figure 2: Encoding of SCC-I with interleaving.

Figure 4: Encoding of the proposed SCC technique.

Figure 2 shows a variation of SCC-I where an interleaver is introduced between  $\mathbb{C}_1$  and  $\mathbb{C}_2$ . Depending on the structure of the error pattern of  $\mathbb{C}_1$ , it may be possible to design a proper interleaver to divide each error pattern into several codewords of  $\mathbb{C}_2$ . In this way, the correction capability required for  $\mathbb{C}_2$  can be relaxed.

# SCC Scheme II (SCC-II)

The SCC-I scheme can be generalized as depicted in Fig. 3. This scheme, denoted SCC-II, uses a longer outer code  $\mathbb{C}_2$  to protect a frame of m inner datawords (e.g., m=3 in Fig. 3). Let  $t_2=\tau \cdot w_{max}$  be the error correction capability of  $\mathbb{C}_2$ . Then, the error floor is eliminated if  $\tau=m$ . On the other hand, if  $\tau < m$  the error floor is reduced but not eliminated. Based on the binomial distribution [11], the residual error floor  $\tilde{P}_b^{(II)}(\mathfrak{Q})$  of SCC-II can be approximated by

$$\tilde{P}_b^{(II)}(\mathfrak{Q}) \approx \sum_{i=\tau+1}^m \frac{i \cdot w_{max}}{m \cdot k_1} {m \choose i} [P_w(\mathfrak{Q})]^i [1 - P_w(\mathfrak{Q})]^{m-i}. \tag{5}$$

In virtually all applications  $P_w(\mathfrak{Q}) < 10^{-4}$ ,  $w_{max} < 50$  and  $k_1 > 500$ . Therefore, by choosing m = 20 and  $\tau = 4$  it is possible to reduce the error floor below  $10^{-17}$ . Furthermore, because the required value of  $\tau$  is significantly lower than m, the outer code overhead of SCC-II is much lower than that of SCC-I. However, this advantage comes at the expense of implementing an outer code m-times longer with a correction capability  $\tau$ -times higher. Unfortunately, this complexity increase makes the use of SCC-II prohibitive in most high speed applications.
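The approximation (5) is straightforward to evaluate numerically. The sketch below, with a hypothetical helper name `residual_error_floor`, plugs in the boundary values quoted above ($P_w(\mathfrak{Q}) = 10^{-4}$, $w_{max} = 50$, $k_1 = 500$, $m = 20$, $\tau = 4$) as a pessimistic case, and also the parameters later used in Example 1 of Section III.C ($m = 36$, $\tau = 3$, $P_w(\mathfrak{Q}) \approx 10^{-5}$, $w_{max} = 9$, $k_1 = 1320$).

```python
from math import comb


def residual_error_floor(m, tau, p_w, w_max, k1):
    """Approximation (5): more than tau of the m inner codewords in a frame
    are hit by an error-floor pattern of information weight w_max."""
    return sum(
        (i * w_max) / (m * k1) * comb(m, i) * p_w**i * (1.0 - p_w) ** (m - i)
        for i in range(tau + 1, m + 1)
    )


# Boundary values of the ranges quoted in the text (pessimistic case).
worst_case = residual_error_floor(m=20, tau=4, p_w=1e-4, w_max=50, k1=500)

# Parameters of Example 1 (Margulis LDPC code).
example_1 = residual_error_floor(m=36, tau=3, p_w=1e-5, w_max=9, k1=1320)

print(f"worst case: {worst_case:.2e}")
print(f"example 1 : {example_1:.2e}")
```

In both cases the sum is dominated by its first term ($i = \tau + 1$), since each additional corrupted codeword costs another factor of $P_w(\mathfrak{Q})$.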



Figure 3: Encoding of SCC-II.



Figure 5: Decoding process of the proposed SCC.

# III New serial code concatenation strategy

This section describes a novel SCC scheme to combat the error floor. The new technique is able to achieve an error floor reduction similar to that accomplished by the SCC-II. However, the new approach builds on short outer block codes as in SCC-I; therefore the implementation complexity can be significantly reduced.

Next, we assume that the error floor of the inner code $\mathbb{C}_1$ is caused by a set of *detectable* error patterns $\mathfrak{Q}$, as experienced in most LDPC codes (i.e., the error patterns are not codewords). The new SCC approach uses two short outer block codes (denoted as $\mathbb{C}_2$ and $\mathbb{C}_3$) to combat the error floor of the inner code $\mathbb{C}_1$. The encoding process comprises three steps (see Fig. 4):

- 1) The uncoded frame is divided into *m* datawords of $k_3$ bits denoted $D_3^i$ for $i = 1,\ldots,m$ (e.g., $m = 3$ in the example of Fig. 4). Each dataword $D_3^i$ is encoded by $\mathbb{C}_3$ generating the parity bits $P_3^i$.
- 2) The *m* parity blocks $P_3^i$ are grouped together into the dataword $D_2^1$, which is encoded by $\mathbb{C}_2$, generating the parity bits $P_2^1$.
- 3) The parity bits $P_2^1$ are divided into *m* sub-blocks of equal (or almost equal) size denoted as $P_2^{1,i}$ with $i=1,\ldots,m$. Each dataword $D_1^i$ is generated by the concatenation of the dataword $D_3^i$ and the parity bits $P_2^{1,i}$. Finally, each dataword $D_1^i$ is encoded by code $\mathbb{C}_1$ generating the codeword $C_1^i$ to be transmitted over the channel.

Figure 5 presents an example of the decoding process when the error pattern occurs in the third codeword,  $C_1^3$ . The process comprises the following steps:

Table 2
Binary BCH and RS code complexities

|     | Block            | Register    | 2-input XOR | 2-input Mux | Const Mul | Mul    | Gate Count                            |
|-----|------------------|-------------|-------------|-------------|-----------|--------|---------------------------------------|
| BCH | Encoder          | 2t          | 2tp         | 2p          | 0         | 0      | 8tp + 15t + 7p                        |
|     | Syndrome         | tq          | tpq        | tq          | t         | 0      | $4tpq + 2tq^2 + 7tq$                  |
|     | Key Equation     | 4(t+1)q     | (2t+1)q    | 3t + 3 + tq | 0         | 3t + 1 | $24tq^2 + 5.5tq + 22q + 28.5t + 26.5$ |
|     | Chien Search     | tq          | tpq        | tq          | tp        | t      | $2tpq^2 + 8tq^2 - tq + 6t$            |
|     | FIFO             | k + 5tp     | 0          | 0           | 0         | 0      | 0.8(k+5tp)                            |
| RS  | Encoder          | 2tq         | 2tpq       | 2pq         | 2tp       | 0      | $4tpq^2 + 15tq + 7pq$                 |
|     | Syndrome         | 2tq         | 2tpq       | 2tpq        | 2tp       | 0      | $4tpq^2 + 15tpq + 7pq$                |
|     | Forney Synd.     | 2tq         | 2tq        | 2tq         | 0         | 2t     | $30tq + 2t(8q^2 - 12q + 6)$           |
|     | Erasure Locator  | 2tq         | 2tq        | 0           | 0         | 2t - 1 | $23tq + (2t - 1)(8q^2 - 12q + 6)$     |
|     | Errata Evaluator | (3t+1)pq    | 3tpq       | 2(3t-1)pq   | 2(3t-1)p  | 0      | pq(31.5t + 12tq - 4q + 8.5)           |
|     | FIFO             | (n+4tp+8p)q | 0          | 0           | 0         | 0      | 0.8(n+4tp+8p)q                        |

where $p$, $q$, $t$, $n$, and $k$ are the parallelism factor, the Galois field size (the field is GF($2^q$)), the correction capability, the code length, and the code dimension, respectively.

Table 1
TSMC gate count of basic cells

| Cell                                   | Gate Count |
|----------------------------------------|------------|
| Inverter (INVD0BWP35)                  | 0.5        |
| 2-input AND (AN2D2BWP35)               | 2.0        |
| 2-input XOR (GXOR2D2BWP35)             | 4.0        |
| 2-input MUX (MUX2D2BWP35)              | 3.5        |
| D Flip-Flop (DFD2BWP35)                | 7.5        |
| One Bit of a RAM†                      | 0.8        |

† estimated based on the area relation between a 1024x16 SPSRAM and a NAND ND2D1BWP35

- 1) Codewords  $C_1^i$  for i = 1,...,m are decoded and those containing uncorrectable errors are detected.
- 2) From the error-free datawords $D_1^i$ obtained at Step 1, the datawords $D_3^i$ are extracted and encoded in order to recover the parity bits $P_3^i$.
- 3) The dataword  $D_2^1$  is reconstructed from the parity bits  $P_3^i$  generated at Step 2, while the unavailable parity bits (those that belong to the corrupted codewords) are marked as erasures (e.g.,  $P_3^3$  in the example of Fig. 5). The parity bits  $P_2^1$  are extracted from the error-free datawords  $D_1^i$  and those bits related to the corrupted datawords are marked as erasures. An erasure decoding is carried out over the codeword  $C_2^1$ , where the parity bits  $P_3^i$  of the corrupted codewords  $C_3^i$  are regenerated.
- 4) Finally, the codewords  $C_3^i$  with residual errors are decoded.

# III.A Performance and overhead

The proposed SCC achieves a performance similar to the one derived from SCC-II if  $\mathbb{C}_3$  corrects  $w_{max}$  bits and  $\mathbb{C}_2$  recovers  $\tau \cdot (n_3 - k_3) + (\tau / m) \cdot (n_2 - k_2)$  erased bits. Note that  $\tau \cdot (n_3 - k_3)$  are the parity bits of  $\tau$  codewords of  $\mathbb{C}_3$ , while  $(\tau / m) \cdot (n_2 - k_2)$  are the parity bits of  $\mathbb{C}_2$  that belong to  $\tau$  corrupted codewords of  $\mathbb{C}_1$ . Assuming that  $\mathbb{C}_2$  is a maximum distance separable (MDS) code (e.g., RS codes), it is verified that its erasure correction capability is equal to its redundancy [14], therefore

$$n_2 - k_2 = \tau \cdot (n_3 - k_3) + \frac{\tau}{m} (n_2 - k_2), \tag{6}$$

then,

$$n_2 - k_2 = \frac{m \cdot \tau}{m - \tau} \cdot (n_3 - k_3). \tag{7}$$

Finally, the overhead of the new SCC is

$$\Theta(\tau) = \frac{n_2 - k_2}{m \cdot k_3} = \frac{\tau}{m - \tau} \left[ \frac{n_3 - k_3}{k_3} \right]. \tag{8}$$

Figure 6: Encoding and Decoding Block Diagram of the Proposed SCC Scheme.

Numerical evaluation of (8) shows that the overhead of the proposed SCC scheme is lower than that required by the classical SCC-I and SCC-II schemes in practical high-speed applications where $k_1 > 500$, $\tau \le 4$, $w_{max} \le \sqrt{k_1} / 2 \le 200$, and $m \ge 20$ [7].
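The erasure budget behind (6)–(8) can be checked with exact rational arithmetic. The following is a small sketch under the MDS assumption stated above; the function names and the parameter values ($m = 20$, $\tau = 4$, $k_3 = 1000$, $n_3 - k_3 = 100$) are illustrative assumptions, not values from the paper.

```python
from fractions import Fraction


def outer_redundancy(m, tau, r3):
    """Eq. (7): redundancy n2 - k2 of an MDS code C2, in the same unit as r3."""
    return Fraction(m * tau, m - tau) * r3


def erased(num_corrupted, m, r3, r2):
    """Erasures seen by C2 when num_corrupted inner codewords fail:
    the C3 parity of those codewords plus their share of the C2 parity."""
    return num_corrupted * r3 + Fraction(num_corrupted, m) * r2


def overhead(m, tau, r3, k3):
    """Eq. (8): overhead of the proposed SCC."""
    return Fraction(tau, m - tau) * Fraction(r3, k3)


m, tau, k3, r3 = 20, 4, 1000, 100  # illustrative values only
r2 = outer_redundancy(m, tau, r3)

print("n2 - k2                 :", r2)
print("erasures, tau failures  :", erased(tau, m, r3, r2))
print("erasures, tau+1 failures:", erased(tau + 1, m, r3, r2))
print("overhead                :", float(overhead(m, tau, r3, k3)))
```

With exactly $\tau$ corrupted codewords the erasure count equals the redundancy of $\mathbb{C}_2$, which is the equality (6); one more corrupted codeword exceeds the erasure correction capability of an MDS code.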

# III.B Complexity

The proposed SCC scheme is composed of three encoders at the transmitter side (one for each code) and one  $\mathbb{C}_3$  encoder and three decoders at the receiver side (one decoder for each code), as observed in Fig. 6. Let  $\Phi(\mathbb{C}^e_x, \mathcal{T})$  and  $\Phi(\mathbb{C}^d_x, \mathcal{T})$  be the complexity of the encoder and decoder respectively of code  $\mathbb{C}_x$  operating at throughput  $\mathcal{T}$  bits/s. In this work, we use the *equivalent number of gates* of different logic cells (e.g., AND, XOR, etc.) as a complexity measure of a given SCC approach. The equivalent gate count is computed based on the TSMC cells described in Table 1 [15]. The estimated number of logic gates of RS over GF ( $2^q$ ) and BCH codes for a classic p-parallel implementation is summarized in Table 2 [16]–[17].
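To make the gate-count bookkeeping concrete, the sketch below recomputes the BCH encoder entry of Table 2 from the cell costs of Table 1. It assumes that the "Register" column maps to the D flip-flop cell (7.5 gates), which is consistent with the closed form $8tp + 15t + 7p$ listed in the table. The values $t = 9$ and $p = 160$ are chosen to match the $d = 19$ BCH codes and the parallelism factor used in the examples that follow; the function names are hypothetical.

```python
# Gate-equivalents of the basic cells, copied from Table 1.
# Assumption: the "Register" column of Table 2 is costed as a D flip-flop.
GATES = {"register": 7.5, "xor2": 4.0, "mux2": 3.5}


def bch_encoder_gates(t, p):
    """Weight the BCH encoder cell counts of Table 2 by the Table 1 costs."""
    cells = {"register": 2 * t, "xor2": 2 * t * p, "mux2": 2 * p}
    return sum(GATES[name] * count for name, count in cells.items())


def bch_encoder_closed_form(t, p):
    """Closed form given in the 'Gate Count' column of Table 2."""
    return 8 * t * p + 15 * t + 7 * p


t, p = 9, 160
print(bch_encoder_gates(t, p), bch_encoder_closed_form(t, p))
```

The same pattern extends to the other rows once a cost for the constant and general Galois field multipliers is fixed; those costs are not listed in Table 1, so they are left out of this sketch.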

The total complexity  $\Phi_T$  of the proposed SCC scheme is

$$\Phi_{T} \approx 2\Phi(\mathbb{C}_{3}^{e}, \mathcal{T}) + \Phi(\mathbb{C}_{3}^{d}, (\tau/m)\mathcal{T}) + \Phi(\text{Buffer}) + \Phi(\mathbb{C}_{2}^{e}, (r_{3}/k_{3})\mathcal{T}) + \Phi(\mathbb{C}_{2}^{d}, (r_{3}/k_{3})\mathcal{T}), \tag{9}$$

where the decoder of $\mathbb{C}_3$ has to correct $\leq \tau$ of the $m$ codewords per frame and therefore its throughput is reduced to $\approx (\tau/m)\mathcal{T}$. Similarly, the decoder of $\mathbb{C}_2$ has to correct one codeword per frame, which reduces its throughput to $\approx (r_3/k_3)\mathcal{T}$. Finally, $\Phi(\text{Buffer})$ is the complexity of approximately $3mk_1$ bits of buffering required to compensate for the latencies of the different codes. These complexities depend on the adopted outer codes. A good tradeoff between performance and complexity can be achieved by using the new SCC with an RS code over GF($2^q$) as code $\mathbb{C}_2$ and a binary BCH code as code $\mathbb{C}_3$. As we shall show in the following examples, the complexity of the proposed SCC scheme is significantly lower than that of the existing SCC as a result of the throughput reduction of the decoders.

Table 3
Performance and complexity comparison of example 1

| Scheme    | SNR Penalty | Overhead | Error Floor                | Outer Code $\mathbb{C}_2$ | Outer Code $\mathbb{C}_3$ | Relative Complexity |
|-----------|-------------|----------|----------------------------|---------------------------|---------------------------|---------------------|
| SCC-I     | 0.3386 dB   | 8.1081 % | None                       | BCH[1320, 1221, 19]       | None                      | 1.00                |
| SCC-II    | 0.0397 dB   | 0.9174 % | $\approx 5 \cdot 10^{-19}$ | BCH[47520, 47088, 55]     | None                      | 6.00                |
| This work | 0.0297 dB   | 0.6865 % | $\approx 5 \cdot 10^{-19}$ | RS[432, 396, 37]          | BCH[1410, 1311, 19]       | 0.67                |

Figure 7: Encoding process of the proposed SCC optimized for $\tau = 1$.

# III.C Example 1: Concatenated LDPC+RS+BCH Code

Let $\mathbb{C}_1$ be the [2640,1320] Margulis LDPC code [8]. This code has an error floor starting at $P_w(\mathfrak{Q}) \approx 10^{-5}$. This error floor is caused by *trapping-sets* (TS), in particular (12,4) TS and (14,4) TS<sup>3</sup>, which have 6 and 7 information bits, respectively. However, as noticed in [18], there are also trapping sets of weight 15, 16, 17 and 18 bits. We take into account these additional trapping sets by designing an SCC scheme to correct error patterns with $w_{max} = 9$ information bits. The solutions for the different SCC schemes are described in the following items:

- SCC-I: the outer code $\mathbb{C}_2$ must be the BCH [1320,1221,19]. Therefore, the SNR penalty and bandwidth overhead are $10 \cdot \log_{10}(1320/1221) = 0.3386$ dB and $(1320-1221)/1221 = 8.11\%$, respectively.
- SCC-II: the outer code $\mathbb{C}_2$ must be a binary BCH [47520,47088,55] code, which corrects up to $\tau=3$ error patterns of 9 bits. The error floor is not entirely eliminated, but it is reduced to $\tilde{P}_b^{(II)}(\mathfrak{Q}) \approx 5 \cdot 10^{-19}$ with an SNR penalty and bandwidth overhead of 0.0397 dB and 0.9174%, respectively.
- Proposed SCC: setting $m = 36$ and $\tau = 3$ as in SCC-II (in this way, the same residual error floor is achieved), $\mathbb{C}_3$ can be a BCH [1410,1311,19] over GF($2^{11}$), and $\mathbb{C}_2$ can be an RS [432,396,37] over GF($2^{9}$). The number of additional parity bits is $36 \cdot 9 = 324$. This OH introduces an SNR penalty and a bandwidth overhead of 0.0297 dB and 0.6865%, respectively.
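The figures quoted in the three items above can be reproduced with a few lines of arithmetic. The sketch below (helper names are hypothetical) also checks that the chosen RS code satisfies (7) and that each inner dataword, i.e., $k_3$ information bits plus its share of the $\mathbb{C}_2$ parity, exactly fills the $k_1 = 1320$ information bits of the Margulis code.

```python
from math import log10


def snr_penalty_db(n, k):
    """Rate loss of an [n, k] outer code expressed in dB."""
    return 10 * log10(n / k)


def overhead_percent(n, k):
    return 100 * (n - k) / k


m, tau = 36, 3
k3, r3 = 1311, 1410 - 1311      # C3 = BCH[1410,1311,19], r3 = 99 parity bits
q, n2, k2 = 9, 432, 396         # C2 = RS[432,396,37] over GF(2^9)

parity_bits = (n2 - k2) * q                      # transmitted C2 parity, in bits
assert parity_bits == m * tau * r3 // (m - tau)  # Eq. (7): 324 bits
assert k2 * q == m * r3                          # D2 holds the m parity blocks of C3
sub_block = parity_bits // m                     # bits of P2 carried by each D1
assert k3 + sub_block == 1320                    # each D1 fills k1 of the inner code

frame_info_bits = m * k3
schemes = {
    "SCC-I": (1320, 1221),
    "SCC-II": (47520, 47088),
    "Proposed": (frame_info_bits + parity_bits, frame_info_bits),
}
for name, (n, k) in schemes.items():
    print(f"{name:9s} penalty = {snr_penalty_db(n, k):.4f} dB, "
          f"overhead = {overhead_percent(n, k):.4f} %")
```

Only the parity of $\mathbb{C}_2$ is transmitted in the proposed scheme, which is why its overhead is computed from 324 bits per frame rather than from the 99 parity bits of each $\mathbb{C}_3$ codeword.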

The complexity of the three SCC schemes was estimated as described in Section III.B. A base parallelism factor of $p = 160$ bits was used as a reference to achieve a throughput of $\mathcal{T} \approx 100$ Gb/s in 28 nm CMOS technology operating at a clock frequency of 625 MHz. Table 3 summarizes the relative complexity of the three schemes and their performances. Note that:

- The performance of SCC-II is better than that of SCC-I at the expense of a higher implementation complexity (because of the longer outer BCH code).
- The new SCC achieves an NCG ~ 0.3 dB higher than that provided by the SCC-I with a similar complexity.
- The performances of SCC-II and the new SCC algorithm are similar. However, the implementation complexity of our technique is approximately one order of magnitude lower with respect to the SCC-II.

# III.D Example 2: Concatenated LDPC+RS+RS Code

As a second example, we consider the LDPC+RS concatenation scheme proposed in [19], which has been designed for next generation optical communication systems. This scheme comprises an inner LDPC [9252,7967] code and an outer RS [992,956,37] code. The total overhead is 20.5% and the NCG at BER $= 10^{-15}$ is 10 dB. The outer RS code introduces an SNR penalty of 0.1605 dB. Next, we use the new SCC approach with the same inner LDPC code $\mathbb{C}_1$ defined before (i.e., LDPC [9252,7967]). For the outer codes, we consider the RS [830,794,37] as $\mathbb{C}_3$ (a shortened version of the original RS code), and the RS [1014,936,79] code as $\mathbb{C}_2$, where $m = 26$ and $\tau = 2$. Note that the throughput of $\mathbb{C}_2$ is $k_3 / r_3 \approx 22.14$ times lower than that of the original RS code (i.e., RS [992,956,37]); therefore, the extra complexity required by the new SCC is negligible. From [19], the error pattern probability is $P_w(\mathfrak{Q}) \approx 5 \cdot 10^{-7}$. Then, the residual error floor is $\tilde{P}_b(\mathfrak{Q}) \approx 10^{-19}$. The SNR penalty and bandwidth overhead are 0.0164 dB and 0.378%, respectively. Therefore, the NCG is increased from 10 dB to 10.1441 dB and the total overhead is reduced from 20.5% to 16.568%. This not only reduces the SNR requirement of the system but also increases the spectral efficiency and reduces the power dissipation (since the sampling rate can be reduced because of the lower overhead).
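The overhead and NCG figures of this example follow from (7), (8) and the code parameters alone. The sketch below recomputes them, assuming that the inner and outer overheads compose multiplicatively (an assumption on our part that reproduces the 20.5% and 16.568% totals quoted above); variable names are hypothetical.

```python
from math import log10

inner_n, inner_k = 9252, 7967   # inner LDPC code
ref_n, ref_k = 992, 956         # outer RS code of the reference scheme [19]
m, tau = 26, 2
n3, k3 = 830, 794               # C3 = RS[830,794,37]
n2, k2 = 1014, 936              # C2 = RS[1014,936,79]

r3 = n3 - k3
assert (n2 - k2) * (m - tau) == m * tau * r3   # Eq. (7), in RS symbols
assert k2 == m * r3                            # D2 holds the m parity blocks of C3

outer_oh_ref = (ref_n - ref_k) / ref_k
outer_oh_new = (n2 - k2) / (m * k3)            # Eq. (8)


def total_overhead(outer_oh):
    """Inner and outer rate losses combined multiplicatively (assumption)."""
    return (inner_n / inner_k) * (1 + outer_oh) - 1


penalty_ref = 10 * log10(1 + outer_oh_ref)
penalty_new = 10 * log10(1 + outer_oh_new)
ncg_new = 10.0 + penalty_ref - penalty_new     # NCG of the reference is 10 dB

print(f"outer overhead: {100 * outer_oh_ref:.3f} % -> {100 * outer_oh_new:.3f} %")
print(f"SNR penalty   : {penalty_ref:.4f} dB -> {penalty_new:.4f} dB")
print(f"total overhead: {100 * total_overhead(outer_oh_ref):.3f} % -> "
      f"{100 * total_overhead(outer_oh_new):.3f} %")
print(f"NCG           : 10 dB -> {ncg_new:.4f} dB")
```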

# III.E Example 3: Concatenated LDPC+SPC+BCH Code

Most LDPC codes proposed for optical applications have a low error floor (BER <  $10^{-10}$ ) that can be reduced below  $10^{-15}$  by correcting only one error pattern per frame (i.e.,  $\tau = 1$ ). For these cases, it is possible to lower the overhead penalty and implementation complexity by using  $(n_3 - k_3)$  single parity check (SPC) codes as the outer code  $\mathbb{C}_2$ . The new encoding process, as depicted in Fig. 7, comprises the following steps:

- 1) The uncoded frame is divided into $m$ blocks. The first $m-1$ datawords $D_3^i$ for $i=1,\ldots,m-1$ correspond to the first $m-1$ blocks. The last dataword $D_3^m$ is the concatenation of $(n_3-k_3)$ zeros and the last block $\widetilde{D}_3^m$ of $k_3-(n_3-k_3)$ bits.
- 2) Each dataword $D_3^i$ is encoded by $\mathbb{C}_3$ generating the parity bits $P_3^i$.
- 3) For $i=1,\ldots,n_3-k_3$, the $m$-bit dataword $D_2^i$ is the concatenation of the $i$-th parity bit of $P_3^j$ for $j=1,\ldots,m$. Each $D_2^i$ is encoded by an SPC code.
- 4) The dataword for $\mathbb{C}_1$ is $D_1^i = D_3^i$ for $i = 1, \ldots, m-1$. The last dataword $D_1^m$ is the concatenation of $\widetilde{D}_3^m$ and the $n_3 - k_3$ parity bits generated in Step 3. Finally, each dataword $D_1^i$ is encoded by $\mathbb{C}_1$.

<sup>3</sup>In the notation "(*e*, *d*) TS", *e* is the number of wrong bits and *d* is the number of unsatisfied check nodes (see [8] and [18] for more details).

**Figure 8:** BER vs SNR for SCC-II and the proposed SCC with $\tau = 1$.

The decoding process is similar to that of the original scheme (e.g., see Fig. 5). The main benefits of this optimized scheme are:

- The implementation complexity of the encoder and decoder of the outer code $\mathbb{C}_2$ is very low since they can be implemented recursively with $n_3 - k_3$ XOR gates and flip-flops, i.e., $\Phi(\mathbb{C}_2^e) + \Phi(\mathbb{C}_2^d) = 2(n_3 - k_3)(\Phi(\text{XOR}) + \Phi(\text{FlipFlop})) = 23 \cdot (n_3 - k_3)$ gates.
- The overhead penalty is reduced from  $[1/(m-1)] \cdot [(n_3-k_3)/k_3]$  (see eq. (8) with  $\tau=1$ ) to  $(1/m) \cdot [(n_3-k_3)/k_3]$  because the corrupted parity-bits of  $\mathbb{C}_2$  do not have to be erased.
- The encoder latency is reduced because the  $P_2^1$  parity bits are transmitted in the last codeword of  $\mathbb{C}_1$  (i.e., the parity bits can be computed while the codewords of  $\mathbb{C}_1$  are being transmitted). This also reduces the buffer length from  $\approx 3mk_1$  to  $\approx 2mk_1$  bits.
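The role of the SPC codes in this optimized scheme can be illustrated with a toy sketch of the parity-recovery step for $\tau = 1$. The function `toy_parity` below is only a deterministic stand-in for the parity computation of $\mathbb{C}_3$ (it is not a BCH encoder and has no error-correcting meaning), and the datawords, the value of $n_3 - k_3$ and the corrupted index are assumptions made for the example. The sketch also ignores the zero-padding of the last dataword described in Step 1.

```python
import hashlib
from functools import reduce

R3 = 16  # number of C3 parity bits per dataword (toy value)


def toy_parity(data: bytes) -> int:
    """Stand-in for the C3 parity bits: any deterministic R3-bit function of
    the dataword suffices to illustrate the SPC step. NOT a real BCH code."""
    return int.from_bytes(hashlib.sha256(data).digest()[: R3 // 8], "big")


def spc_parity(parities):
    """One single-parity-check bit per parity position: bitwise XOR over blocks."""
    return reduce(lambda a, b: a ^ b, parities, 0)


# Encoder side: m datawords, their C3 parities, and the R3 SPC bits.
datawords = [bytes([i] * 8) for i in range(1, 11)]   # m = 10 toy datawords
p3 = [toy_parity(d) for d in datawords]
p2 = spc_parity(p3)

# Decoder side: the inner decoder flags one corrupted codeword (tau = 1).
corrupted = 6
regenerated = [toy_parity(d) for i, d in enumerate(datawords) if i != corrupted]
recovered = p2 ^ spc_parity(regenerated)

# The C3 parity of the corrupted dataword is recovered exactly; a real C3
# decoder would now use it to correct the residual errors in that dataword.
print(recovered == p3[corrupted])
```

Because each SPC codeword has only one unknown position (the bit belonging to the corrupted codeword), a single XOR per parity position is enough, which is where the low implementation complexity of this variant comes from.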

Figure 9: Encoding process of the generalized SCC.

The performance of the optimized SCC approach for $\tau=1$ is evaluated by computer simulation in Fig. 8. The [2640,1320] Margulis LDPC code with m=10 is used as the inner code $\mathbb{C}_1$. Owing to time constraints, an artificially high error floor with probability $P_w(\Omega)=10^{-3}$ is inserted after the decoding process. As with the *real* error floor, the artificial one is generated from error patterns with 7 information bits and 7 parity bits. Fig. 8 shows the BER of the inner code $\mathbb{C}_1$ (circles), SCC-II (squares) and the proposed SCC (triangles). For SCC-II, the outer code $\mathbb{C}_2$ is a BCH [13200,13102,15] code. For the new SCC, on the other hand, $\mathbb{C}_3$ is a BCH [1397,1320,15] code while $\mathbb{C}_2$ is an SPC [11,10,2] code. Note from Fig. 8 that $\mathbb{C}_1$ has an error floor at BER $\approx (7/1320)P_w(\Omega) = 5.3 \cdot 10^{-6}$. This error floor is reduced by the outer codes to BER $\approx 4.73 \cdot 10^{-8}$. The figure shows that the new SCC achieves a performance similar to that of SCC-II. It is important to realize that our technique achieves this performance by using short outer block codes. Compared with SCC-II, this significantly reduces the implementation complexity in integrated circuits. In particular, using a parallelism factor p=160 to achieve a throughput $T \approx 100$ Gb/s in 28 nm CMOS technology operating at a clock frequency of 625 MHz as in example 1 (see Section 3.3), the complexity of SCC-II is $\approx 575$ Kgates while the complexity of the proposed scheme is $\approx 107$ Kgates, i.e., 5.38 times lower.
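Two of the figures quoted in this paragraph are simple arithmetic and can be checked in a few lines. The sketch assumes that the parallelism factor p counts bits processed per clock cycle, which the text implies but does not define here; the function names are illustrative.

```python
def error_floor_ber(info_bits_in_error, k1, p_w):
    """BER floor when each error-floor event hits a fixed number of information bits."""
    return info_bits_in_error / k1 * p_w

def throughput_bps(parallelism, clock_hz):
    """Throughput, assuming `parallelism` bits are processed per clock cycle."""
    return parallelism * clock_hz

floor = error_floor_ber(7, 1320, 1e-3)   # 7 information bits, k1 = 1320, P_w = 1e-3
rate = throughput_bps(160, 625e6)        # p = 160 at 625 MHz
print(f"{floor:.2e}", rate / 1e9, "Gb/s")
```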

# IV Error Floor Reduction in TPC

The SCC strategy introduced previously can be extended to mitigate the error floor caused by low-weight codewords (i.e., undetectable error patterns). Note that this feature is particularly useful for decoding of turbo codes such as turbo product codes (TPC). This approach uses a subset of g parity check bits of  $\mathbb{C}_3$  to detect those inner codewords with residual errors after the inner code decoder<sup>4</sup>.

# IV.A Generalized SCC Scheme

Figure 9 depicts the encoding process of the generalized SCC. Unlike in the previous strategy, in the generalized SCC a subset of g parity bits of $P_3^i$, denoted as $\hat{P}_3^i$, is not encoded by $\mathbb{C}_2$. Instead, this subset is transmitted as a part of the dataword $D_1^i$ of the inner code $\mathbb{C}_1$. In the decoding process (see Fig. 10), $\hat{P}_3^i$ is used to detect the corrupted $\mathbb{C}_1$-codewords. The decoding process starts by applying the decoder of code $\mathbb{C}_1$ to the m received codewords $C_1^i$. After that, the dataword $D_1^i$ is extracted from $C_1^i$. For each dataword $D_1^i$, the dataword $D_3^i$ is extracted and partially encoded with $\mathbb{C}_3$ in order to regenerate only the parity bits $\hat{P}_3^i$. If these regenerated parity bits are not equal to the corresponding bits in $D_1^i$, this dataword is marked as corrupted. Once all corrupted datawords $D_1^i$ are identified, the rest of the decoding process continues as in the original proposed SCC.
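The detection step can be sketched as follows. This is only an illustration under stated assumptions: `zlib.crc32` stands in for the partial $\mathbb{C}_3$ encoder (the scheme above uses a subset of the $\mathbb{C}_3$ parity bits, not a CRC), g is fixed to 32 bits, and the "data followed by check bits" layout of each dataword is hypothetical.

```python
import zlib

G_BYTES = 4  # g = 32 check bits stored as 4 bytes (illustrative choice)

def check_bits(d3: bytes) -> bytes:
    """Stand-in for regenerating the g check bits from the dataword D3."""
    return zlib.crc32(d3).to_bytes(G_BYTES, "big")

def build_dataword(d3: bytes) -> bytes:
    """Hypothetical layout of D1: the data D3 followed by its g check bits."""
    return d3 + check_bits(d3)

def find_corrupted(datawords):
    """Return the indices of datawords whose regenerated check bits mismatch."""
    corrupted = []
    for i, d1 in enumerate(datawords):
        d3, received = d1[:-G_BYTES], d1[-G_BYTES:]
        if check_bits(d3) != received:
            corrupted.append(i)
    return corrupted

# Build a frame of m = 5 datawords and flip one bit in dataword 3 to mimic
# a residual error left by the inner decoder.
frame = [build_dataword(bytes([i]) * 8) for i in range(5)]
damaged = bytearray(frame[3])
damaged[0] ^= 0x01
frame[3] = bytes(damaged)

print(find_corrupted(frame))  # [3]
```

The indices returned by `find_corrupted` play the role of the erasure positions passed to the decoder of $\mathbb{C}_2$.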

Figure 10: Description of the decoding process of the generalized SCC.

<sup>4</sup>It is also possible to use an additional *error-detecting code*, such as a *cyclic redundancy check* (CRC) code, as part of the proposed SCC scheme to detect those inner codewords with residual errors. Furthermore, in a different approach, it is also possible to replace the erasure decoder of code $\mathbb{C}_2$ by an error-correcting decoder at the expense of increasing the minimum distance of $\mathbb{C}_2$.

The generalized SCC scheme may have a residual error floor caused by the occurrence of more than $\tau$ error patterns $\in \Omega$ in the same frame. This residual error floor, denoted as $\tilde{P}_b^{(1)}(\Omega)$, can be estimated from (5). Additionally, a residual error floor caused by an error pattern that cannot be detected by the g parity bits $\hat{P}_3^i$ is also possible. The probability $P_w^{(2)}(\Omega)$ that an undetectable error pattern $\omega \in \Omega$ takes place in the inner codeword $C_1^i$ can be computed as

$$P_w^{(2)}(\Omega) = \sum_{\omega \in \Omega} I_{\nu} \left( \hat{P}_3(\omega) = 0 \right) p(\omega) \tag{10}$$

where $\hat{P}_3(\omega)$ are the first g parity bits of $\mathbb{C}_3$ associated with the error pattern $\omega \in \Omega$ and $I_{\nu}(X)$ is the *Iverson operator*, which is equal to 1 if the statement X is true and 0 otherwise. Because the error patterns of different inner codewords are independent, the probability of at least one undetectable error pattern in the frame can be computed based on the binomial distribution as

$$P_f^{(2)}(\Omega) = \sum_{i=1}^{m} {m \choose i} [P_w^{(2)}(\Omega)]^i [1 - P_w^{(2)}(\Omega)]^{m-i}, \tag{11}$$

while the BER can be estimated as

$$\tilde{P}_{b}^{(2)}(\Omega) \approx \sum_{i=1}^{m} {m \choose i} \frac{i \cdot w_{max}}{m \cdot k_{1}} [P_{w}^{(2)}(\Omega)]^{i} [1 - P_{w}^{(2)}(\Omega)]^{m-i}. \tag{12}$$
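As a reading aid, both sums have simple closed forms. These identities are not stated in the original text; they follow from the binomial theorem and from the mean of a binomial distribution. Writing $p$ for the per-codeword probability $P_w^{(2)}$ of (10):

$$P_f^{(2)} = 1 - (1-p)^m \approx m\,p \quad (m\,p \ll 1), \qquad \tilde{P}_b^{(2)} \approx \frac{w_{max}}{m\,k_1} \sum_{i=1}^{m} i {m \choose i} p^i (1-p)^{m-i} = \frac{w_{max}}{m\,k_1}\, m\,p = \frac{w_{max}}{k_1}\, p.$$

Under the approximation used in (12), the residual BER therefore scales linearly with $p$ and does not depend on the frame length m.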

# IV.B Generalized SCC with Turbo Product Codes (TPC)

As mentioned before, powerful FEC codes must be designed to satisfy the need of future multigigabit transmission systems. For instance, net coding gains > 10 dB at a BER of  $10^{-15}$  and overhead of  $\sim 20\%$  are mandatory for next generation OTN [2]. In order to meet these requirements, numerous LDPC and TP codes have been reported in the literature (e.g., see [2] and references therein). In particular, TPC based on  $\geq 2$ -error-correcting BCH codes (or TPC-BCH) with block sizes  $\geq 32$  Kbits have been used in high-speed systems to provide an acceptable tradeoff between performance and complexity [9]–[10], [20]. The feasibility of TPC-BCH for commercial applications at 100 Gbps with NCG of  $\sim 11.4$  dB at BER =  $10^{-15}$  and a total overhead of  $\sim 20$  % has been demonstrated in [20].

In the following, we consider the use of the proposed generalized SCC technique to improve the behavior of TPC in high-speed applications. We demonstrate that a TPC based on simple extended Hamming codes (TPC-EH) with a block size of 8192 bits and minimum distance of 16 can be combined with the new SCC strategy in order to achieve an NCG of $\sim 11.2$ dB at BER = $10^{-15}$ with $\sim 22$% total overhead and an error floor at $\sim 7 \cdot 10^{-17}$. Notice that this performance is: (i) 0.45 dB better than the one achieved by the BCH(144,128,5) × BCH(256,239,6) TPC with block size of 36864 bits and minimum distance 30 proposed in [9]; (ii) 0.4 dB better than the one accomplished by the BCH(128,113,6) × BCH(256,239,6) TPC with block size of 32768 bits and minimum distance 36 proposed in [10]; and (iii) 0.4 dB better than that achieved by the triple-concatenated codes proposed in [2]. Furthermore, the implementation complexity of the TPC-based proposed SCC technique is expected to be lower than that of non-concatenated TPC-BCH schemes. This is mainly because the component codes in the TPC-EH with the proposed SCC are much simpler than those required in the non-concatenated TPC-BCH codes. In particular, the latter require longer BCH codes with an error correction capability higher than that of the EH code in order to reduce the error floor. These features make the TPC-EH with the proposed SCC, introduced here, a suitable option for next-generation optical fiber communication networks [2].

**Figure 11:** BER vs SNR simulations of the TPC-EH and the TPC-EH combined with the proposed SCC

# IV.C Concatenated TPC-EH + BCH + RS

Let the inner code  $\mathbb{C}_1$  be a TPC based on two extended Hamming codes with parameters [128,120,4] and [64,57,4]. Let  $A_w$  be the number of codewords with weight w in  $\mathbb{C}_1$ . For w < 28,  $A_w$  can be computed as described in [21] obtaining

$$A_{w} = \begin{cases} 1, & \text{if } w = 0\\ 888943104, & \text{if } w = 16\\ 7154214100992, & \text{if } w = 24\\ 0, & \text{other } w < 28 \end{cases} \tag{13}$$
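The value of $A_{16}$ in (13) can be cross-checked without the full machinery of [21]. The sketch relies on two standard facts that the text does not state: the weight-4 codewords of an extended Hamming code of length n form a Steiner system S(3,4,n), so there are n(n-1)(n-2)/24 of them, and the minimum-weight codewords of a product code are the products of minimum-weight codewords of its component codes. The function name is illustrative.

```python
def min_weight_count_extended_hamming(n: int) -> int:
    """Number of weight-4 codewords in the extended Hamming code of length n = 2^r."""
    return n * (n - 1) * (n - 2) // 24

# Component codes [128,120,4] and [64,57,4]: minimum distance 4 * 4 = 16.
a16 = min_weight_count_extended_hamming(128) * min_weight_count_extended_hamming(64)
print(a16)  # 888943104
```

The weight-24 term depends on the weight-6 codewords of the component codes and is not reproduced here.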

Figure 11 shows the BER vs SNR of this code when it is decoded with 8 turbo iterations between the two component codes. The optimal *maximum a-posteriori probability* (MAP) decoder proposed in [22] is used to decode both EH codes. As observed in Fig. 11, $\mathbb{C}_1$ has an error floor at a BER $\leq 10^{-7}$ which is caused by the $A_{16}$ codewords of minimum weight $w_{min} = 16$ [23]. Fig. 11 also reports the BER estimations based on the union bound [23] for codewords of weight 16 and 24.

In order to avoid an error floor at BER = $10^{-15}$, we infer from Fig. 11 that codewords with weight 24 must be corrected. Furthermore, since a suboptimal iterative decoding algorithm is used, non-codeword stopping sets also have to be analyzed. These can be computed as described in [24], which gives minimum non-codeword stopping sets of weight 24 and multiplicity 81782765568 (i.e., $\approx 87$ times lower than $A_{24}$). A frame composed of m=19 shortened TPC (7859,6507) codewords is required to accommodate the 122368 bits of the *Optical Channel data Unit* (ODU) of the G.709 OTN frame [25] and the parity bits of the outer codes. To reduce the error floor below $10^{-15}$, a correction capability of $\tau=2$ error patterns of weight $\leq 24$ and a corrupted codeword detection based on g=32 bits are used. The shortened BCH [6753,6441,49] code over GF($2^{13}$), with a correction capability of 24 bits, is used as code $\mathbb{C}_3$. Code $\mathbb{C}_2$ is the shortened RS [596,532,65] code over GF($2^{10}$). The total additional overhead due to $\mathbb{C}_2$ and $\mathbb{C}_3$ is therefore $\approx 1.02\%$, which introduces an SNR penalty of 0.044 dB. The residual error rate caused by the occurrence of more than $\tau$ error patterns is $\tilde{P}_b^{(1)}(\Omega) \approx 7 \cdot 10^{-17}$, while the residual error rate due to undetectable error patterns is $\tilde{P}_b^{(2)}(\Omega) \approx 4.3 \cdot 10^{-18}$. These values have been derived from (5) and (12), respectively, with $P_w(\Omega) \approx 5 \cdot 10^{-6}$, which has been obtained from computer simulation at SNR = 6.7 dB.
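Two of the numbers in this paragraph can be approximated with a short calculation. The sketch makes two assumptions that are not stated in the text: an error pattern escapes the g check bits with probability $2^{-g}$ (i.e., the check bits behave as if random), and the SNR penalty is the rate loss $10\log_{10}(1+\text{overhead})$. The function names are illustrative.

```python
from math import comb, log10

def residual_ber_undetected(p_w2, m, k1, w_max):
    """Evaluate the binomial sum of eq. (12) term by term."""
    return sum(
        comb(m, i) * (i * w_max) / (m * k1) * p_w2**i * (1 - p_w2) ** (m - i)
        for i in range(1, m + 1)
    )

def snr_penalty_db(extra_overhead):
    """Rate-loss penalty in dB for a fractional overhead (assumed definition)."""
    return 10 * log10(1 + extra_overhead)

P_W, G = 5e-6, 32                 # simulated P_w at 6.7 dB and g detection bits
M, K1, W_MAX = 19, 6507, 24       # frame size, TPC dataword length, pattern weight

p_w2 = P_W * 2.0**-G              # ASSUMPTION: miss probability 2^-g per pattern
ber2 = residual_ber_undetected(p_w2, M, K1, W_MAX)
penalty = snr_penalty_db(0.0102)  # 1.02% additional overhead

print(f"{ber2:.2e}", f"{penalty:.3f} dB")
```

Under these assumptions both results come out close to the quoted $4.3 \cdot 10^{-18}$ and 0.044 dB.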

The error floor problem of the TPC-EH can also be solved with schemes SCC-I and SCC-II. However, as shown below, the proposed SCC scheme provides a better performance vs. complexity tradeoff:

- SCC-I requires a frame of m = 19 shortened TPC (8105,6753) codewords combined with m BCH [6753,6441,49] codewords. The total overhead is 25.8%, the NCG is 11.05 dB and the complexity is 2.02 times higher than that of the proposed scheme. Therefore, the proposed SCC scheme has better spectral efficiency, 0.15 dB higher NCG and lower complexity than SCC-I.
- SCC-II requires a frame of m = 19 shortened TPC (7836,6484) codewords combined with one BCH [123196,122380,97] codeword. As with the proposed scheme, the total overhead is 21.7% and the NCG is 11.2 dB. However, the complexity is 6.73 times higher, which represents a significant complexity advantage in favor of the proposed scheme.
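The total overhead figures in this comparison can be reproduced from the frame parameters alone. The sketch assumes that total overhead is measured as transmitted bits per frame relative to the 122368-bit ODU payload; this reading is ours, although it matches the quoted percentages.

```python
ODU_BITS = 122368   # payload per frame (ODU of the G.709 OTN frame)
M = 19              # inner codewords per frame

# Length n1 of the shortened TPC codeword used by each scheme.
INNER_LENGTH = {"proposed": 7859, "SCC-I": 8105, "SCC-II": 7836}

def total_overhead_percent(n1, m=M, payload=ODU_BITS):
    """Redundant bits per frame as a percentage of the payload (assumed definition)."""
    return 100.0 * (m * n1 - payload) / payload

overhead = {name: total_overhead_percent(n1) for name, n1 in INNER_LENGTH.items()}
for name, value in overhead.items():
    print(f"{name}: {value:.1f}%")
```

This gives about 22.0% for the proposed scheme, 25.8% for SCC-I and 21.7% for SCC-II.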

The above illustrates the advantages of the proposed scheme for implementing forward error correction codes in high-speed applications. In particular, the TPC-EH+RS+BCH scheme proposed here represents a low-complexity alternative to the non-concatenated TPC based on $\geq 2$-error-correcting BCH codes [2],[9]–[10], since such BCH codes have higher implementation complexity than EH codes.

# V Conclusions

We have introduced a novel SCC scheme to combat the error floor problem experienced in iterated sparse graph-based error correcting codes. This SCC scheme is based on the use of two short outer codes combined with a novel encoding/decoding strategy. We have shown that the new approach significantly reduces the complexity with negligible penalty. The proposed SCC can be efficiently used with both LDPC and TP codes. In particular, the new SCC approach can be used to improve the performance of high-speed optical communication systems, where high coding gain and very low BER are required. The SCC technique introduced in this work provides a new general framework for solving the error floor problem induced by low-weight error patterns of any coding scheme.

# References

- [1] D.A. Morero, et al., "Non-Concatenated FEC Codes for Ultra-High Speed Optical Transport Networks," *IEEE Global Telecomm. Conf.*, pp.1–5, Dec. 2011
- [2] K. Onohara, et al., "Soft-Decision-Based Forward Error Correction for 100 Gb/s Transport Systems," *IEEE J. Sel. Topics Quantum Electron.*, vol.16, no.5, pp.1258–1267, Sept.–Oct. 2010
- [3] N. Kamiya and S. Shioiri, "Concatenated QC-LDPC and SPC codes for 100 Gbps ultra long-haul optical transmission systems," Optical Fiber Comm. (OFC), collocated National Fiber Optic Eng. Conf. (OFC/NFOEC), pp.1–3, March 2010
- [4] Z. Zhengya, et al., "Lowering LDPC Error Floors by Postprocessing," *IEEE Global Telecomm. Conf.*, pp.1–6, Nov.-Dec. 2008
- [5] N. Varnica, M. Fossorier, A. Kavcic, "Augmented Belief-Propagation Decoding of Low-Density Parity-Check Codes," *IEEE Trans. Comm.* vol.54, no.10, pp.1896, Oct. 2006
- [6] S. Benedetto, et al., "Serial concatenation of interleaved codes: performance analysis, design, and iterative decoding," *IEEE Trans. Inf. Theory*, vol.44, no.3, pp.909–926, May 1998
- [7] D.A. Morero and M.R. Hueda, "Efficient concatenated coding schemes for error floor reduction of LDPC and turbo product codes," *IEEE Global Telecomm. Conf.*, pp.1–5, Dec. 2012
- [8] D.J. MacKay and M.S. Postol, "Weaknesses of Margulis and Ramanujan-Margulis low-density parity-check codes," *Elect. Notes in Theoretical Computer Science*, 2003
- [9] T. Mizuochi, et al., "Forward error correction based on block turbo code with 3-bit soft decision for 10-Gb/s optical communication systems," *IEEE J. Sel. Topics Quantum Electron.*, vol.10, no.2, pp.376–386, March-April 2004
- [10] M. Akita, et al., "Third generation FEC employing turbo product code for long-haul DWDM transmission systems," Optical Fiber Comm. (OFC), pp.289–290, Mar. 2002

- [11] J. G. Proakis, "Digital Communications," McGraw-Hill Higher Education, Third Edition, 1996.
- [12] W. Ryan and S. Lin, "Channel codes: Classical and modern," Cambridge University Press, 2009
- [13] G.D. Forney, "Concatenated codes," Cambridge, MA: MIT Press, 1966
- [14] W.C. Huffman and V. Pless, "Fundamentals of error-correcting codes," Cambridge University Press, 2003.
- [15] Taiwan Semiconductor Manufacturing Company Ltd, "N28HP standard cell library," Datasheet TCBN28HPBWP35, Nov. 2010.
- [16] S. Lin and D. Costello, "Error control coding, fundamental and applications," Pearson Prentice Hall, Second Edition, 2004.
- [17] Hsie-Chia Chang, et al., "A Universal VLSI Architecture for Reed-Solomon Error-and-Erasure Decoders," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol.56, no.9, pp.1960–1967, Sept. 2009
- [18] H. Yang and W.E. Ryan, "LDPC decoder strategies for achieving low error floors," Inf. Theory and Applications Workshop, pp.277–286, Jan.-Feb. 2008
- [19] Y. Miyata, et al., "Efficient FEC for Optical Communications using Concatenated Codes to Combat Error-floor," Optical Fiber Comm. (OFC), collocated National Fiber Optic Eng. Conf. (OFC/NFOEC), pp.1–3, Feb. 2008
- [20] S. Dave, et al., "Soft-decision forward error correction in a 40-nm ASIC for 100-Gbps OTN applications," Optical Fiber Comm. (OFC), collocated National Fiber Optic Eng. Conf. (OFC/NFOEC), pp.1–3, Mar. 2011
- [21] L.M.G.M. Tolhuizen, "More results on the weight enumerator of product codes," IEEE Trans. Inf. Theory, vol.48, no.9, pp.2573–2577, Sep. 2002
- [22] A. Ashikhmin and S. Litsyn, "Simple MAP decoding of first-order Reed-Muller and Hamming codes," *IEEE Trans. Inf. Theory*, vol.50, no.8, pp.1812–1818, Aug. 2004
- [23] F. Chiaraluce and R. Garello, "Extended Hamming product codes analytical performance evaluation for low error rate applications," *IEEE Trans. Wireless Comm.*, vol.3, no.6, pp.2353–2361, Nov. 2004
- [24] E. Rosnes, "Stopping Set Analysis of Iterative Row-Column Decoding of Product Codes," *IEEE Trans. Inf. Theory*, vol.54, no.4, pp.1551–1560, Apr. 2008
- [25] Int. Telecomm. Union, "Interfaces for the optical transport network," ITU-T G.709, Feb. 2010.



Damian A. Morero received the degree in electronic engineering (with honors) from the National University of Cordoba (UNC), Cordoba, Argentina, where he is currently working toward the Ph.D. degree in Engineering Science. In 2003 and 2005, he received the Academic Excellence Award from the Engineers Association of Cordoba, Argentina, and the UNC, respectively. From 2006 to 2009, he held a Ph.D. fellowship from the Secretary of Science and Technology (SeCyT), Argentina. He is currently with ClariPhy Argentina S.A., where he has been engaged in the research and development of error correction coding schemes for high-speed optical communications. His research interests include coding, information theory and signal processing.



Mario R. Hueda received the degree in electrical and electronic engineering and the Ph.D. degree from the National University of Cordoba, Cordoba, Argentina, in 1994 and 2002, respectively. From March 1994 to 1996, he received a fellowship from the Scientific and Technological Research Council of Cordoba to carry out research and development in the area of voiceband-data transmission. During the summer of 1996, he was a Visiting Scholar with Lucent Technologies-Bell Laboratories, Murray Hill, NJ, where he worked on code-division multiple-access receivers. Since 1997, he has been with the Digital Communications Research Laboratory, Department of Electronic Engineering, National University of Córdoba. He is currently with the National Scientific and Technological Research Council (CONICET), Cordoba. His research interests include digital communications and performance analysis of communication systems.