A short-ranged memory model with preferential growth

In this work we introduce a variant of the Yule-Simon model for preferential growth by incorporating a finite kernel to model the effects of bounded memory. We characterize the properties of the model combining analytical arguments with extensive numerical simulations. In particular, we analyze the lifetime and popularity distributions by mapping the model dynamics to corresponding Markov chains and branching processes, respectively. These distributions follow power-laws with well defined exponents that are within the range of the empirical data reported in ecologies. Interestingly, by varying the innovation rate, this simple out-of-equilibrium model exhibits many of the characteristics of a continuous phase transition and, around the critical point, it generates time series with power-law popularity, lifetime and inter-event time distributions, and non-trivial temporal correlations, such as a bursty dynamics in analogy with the activity of solar flares. Our results suggest that an appropriate balance between innovation and oblivion rates could provide an explanatory framework for many of the properties commonly observed in many complex systems.


I. INTRODUCTION
Usually known as Zipf's law or Pareto distributions, power-laws and long-tailed distributions are scattered among many natural systems 1 . For instance, these distributions can be found in physics, biology, earth and planetary sciences, economics and finance, and demography and the social sciences. A short list of the systems where these laws are observed includes the sizes of earthquakes 2 and solar flares 3 , the frequency of use of words in human languages 4 , the number of species in biological taxa 5 , people's annual incomes 6 and the distribution of the popularities of game lines in chess 7 .
Several mechanisms have been proposed for generating highly skewed distributions with long tails 1,8,9 . In particular, the underlying probabilistic mechanism of the Yule-Simon process 10,11 is one of the most studied due to its simplicity. The Yule-Simon model, originally inspired by observations of the statistics of biological taxa, reemerged in the literature several times. In fact, its most recent variant, known as preferential attachment, became one of the most important ideas in the early development of the theory of complex networks 12 . Yule's model was introduced to explain the distribution of the number of species in a genus, family or other taxonomic group. Later, a more particular version of the model was introduced by Simon to explain the frequency of words in literary corpuses, i.e., to explain Zipf's law. Within the context of text generation, the Yule-Simon's model works as follows. At each discrete time step a new word is generated with probability p, or an already existing word is copied with probability 1 − p. This process generates a text with few words that are very popular and many others that are barely used. Namely, the popularity distribution of the words generated by this process is a skewed distribution with a power-law tail of exponent α = 2 + p/(1 − p), meaning that the number n s of words with popularity s ≫ 1 satisfy n s ∼ s −α .
Another interesting effect that is frequently observed in empirical systems that display a long-tailed frequency distribution, is the presence of memory in the form of long-range correlations [13][14][15][16] or a bursty dynamics 17,18 . These memory effects are not accounted by the standard preferential growth models 7,10,11,19 . In order to address the observed memory effects, Cattuto et al. 20 introduced a variant to the Yule-Simon model incorporating a fattailed memory kernel. The series generated by Cattuto's model exhibit temporal long-range correlation which are missing in the Yule-Simon model 21,22 . Moreover, it preserves the long-tailed popularity distribution and is able to reproduce quite well the statistics of tag occurrences in the collaborative activity of web users 21 . In a recent work 22 we have shown that the long-range correlations introduced by Cattuto's model allow to describe the memory effects observed in the growing process of an extensive chess database. However, this model fails to describe the inter-event time distribution in the use of popular game lines by individual players. In other words, Cattuto's model introduces long-range correlations but fails to produce a bursty inter-event time dynamics.
Aiming to generate a bursty dynamics, in this work we replace the fat-tailed kernel of Cattuto's model by a finite-sized memory kernel. This modification introduces bursts in the occurrence of sets of words but eliminates the long-range correlations, although its characteristic length exhibits a non-trivial finite size scaling. It also preserves the power-law distribution expected in Yule-Simon model. However, the distribution exponent is neither the corresponding for the Yule-Simon model nor Cattuto's model.
The article is organized in the following manner. In section I, we introduce the model with the finite-size ker-nel and analyze the emerging lifetime and popularity distributions. In section II we characterize the memory effects, such as correlations and burstiness. In section III we perform an analysis of the kernel state as a function of the model parameters to understand the phenomenology of the model, where we establish an analogy with the statistical mechanics theory of equilibrium system. Finally, in Section IV we provide a discussion of the main results of this work.

A. Models
The mechanism to generate artificial sequences in preferential growth models is as follows. The process begins with an initial state of n 0 randomly generated words, or more generally, elements of different classes. At each discrete temporal step t there are two options: i) to introduce an element of a new class with probability p or ii) to copy an already existing element with probabilitȳ p = 1 − p. The way to select the element to be copied in option (ii) establishes the difference between Yule-Simon and Cattuto's model. In the Yule-Simon model every existing element has the same probability to be copied, while in Cattuto's model the probability of copying an element that occurred at time t − ∆t is given by the functional form of the memory kernel Q(∆t), where C(t) is a logarithmic normalization factor and κ C is the characteristic length or memory kernel extension.
The model we introduce in this work is a simple variant of Cattuto's model. The difference resides in the memory kernel, which in our model is defined by a step function: where κ is the memory kernel extension. We call this variant of the process Bounded Memory Preferential Growth model (BMPG model). Fig. 1 outlines the process for the BMPG model. With probability p the added element is of a new class, i.e., a new color ball is introduced. With complementary probability 1−p an existing element is copied, chosen uniformly among the last elements of the series. Elements that are more than κ time steps away-the shaded balls in the figure-cannot be copied.

B. Master equation
As a first analysis of our model we describe the dynamics of the elements inside the memory kernel. Notice that, at each time step t, the last element in the kernel leaves the kernel in the next time step. As a consequence, a class of elements may go extinct if the leaving element is the last of its class within the kernel-e.g., the pink shaded balls in Fig. 1. In this context, it is interesting to study the extinction dynamics of the elements in the kernel through the analysis of lifetime distributions. Let P n (t), (n = 0, 1, ..., κ) be the probability at time t that a given class of element inside the kernel has popularity n, and P(t) the vector with components P n (t) (n = 0, 1, 2, ..., κ). The probability P n (t) can be approached by a discrete Markov chain process, considering that an element of a class can be copied increasing its popularity n → n+1, can be removed decreasing its popularity, n → n − 1, or can be copied and removed in the same time step keeping its popularity. In the BMPG process the older element in the kernel is removed. However, to derive the Markov chain we make an approximation, where at each time step we remove instead a randomly chosen element of the kernel. In this approximation we derive the following transition probabilities for a kernel with n elements of a particular class, where n = 0, 1, 2, ..., κ. Then, if the probability distribution is known at time t, the probability distribution at time t+1 can be computed as P(t+1) = P(t)Π, where Π is the matrix with elements {Π n,m } in which the no null elements correspond to m = n, n − 1, n + 1 and are given by Eqs. (3). In particular, using dP dt ≈ P(t + 1) − P(t) = P(t)(Π − 1) the master equation can be obtained as dP n (t) dt = Π n−1,n P n−1 (t) + Π n+1,n P n+1 (t) (4) where the transition matrix is extended to include the special cases Π −1,0 = Π κ+1,κ = 0. Expressing the time in units of κ-i.e.,t = t/κ-and rearranging terms, the master equation can be written as: where b (−1) = d (κ+1) = 0 and, for the other cases, Here β(n) = (1 − p)(1 − n κ ). When β is constant this master equation corresponds to a Galton-Watson process 23 . However, since the number of states is bounded in our model-i.e., the popularity inside the kernel of a given element cannot overstep the size of the kernel-β is a decreasing function of n. In fitness landscape models, it is usually assumed that the total population size is fixed and determined by the carrying capacity of the environment 24 . Then, it is interesting to analyze the lifetime distribution for the elements in the kernel, since former studies used this kind of approach to model the lifetime distribution of species in ecologies 23 . As we will later show, different regimes arise in the BMPG model dynamics depending on the value of the product pκ. In  Fig 2 we show the lifetime distributions for the elements in the BMPG model for three different regimes p > p c , p = p c , and p < p c , where p c = 1/κ. These distributions, like all the results shown in this paper, are obtained by generating sequences of N = 10 7 elements. In these series we compute the lifetime of each class of elements g, averaging over 20 realizations for each value of the parameters. In the three cases the lifetime of the elements is distributed according a long-tailed power-law distribution with exponent ≈ −1.9. This exponent is between −3/2, which is the exit time of a random walk problem, and −2, the critical Galton-Watson branching process 23 . Furthermore, it has been suggested 25 that the lifetimes of taxa in the fossil record have a highly right-skewed distribution with a power-law tail with exponents in the range from −3/2 to −2.

C. Popularity distribution
As a further characterization of the model we calculate the popularity distribution P (s) of the generated sequences for the BMPG model using various values of the parameters p and κ. In Fig. 3 (top panel) we show the results obtained for κ = 100, 300 and 500, where in each case p = p c = 1/κ. For all calculated P (s) we find that the distributions are very well fitted by a power-law, with α ≃ 3/2, which means that the finite-sized kernel maintains the power-law decay found in the original Cattuto and Yule-Simon models, but it does modify the exponents α in a particular form as we will discuss below. In order to analyze the popularity distribution of elements in the BMPG model, we map the generation of elements of each class to a corresponding branching process. A branching process can be associated to a rooted tree. The process begins with a root node which creates children, and those children create children of their own and so on. The number of children that each node generates is represented by a stochastic variable. Moreover, each node belongs to a specific generation defined by its distance to the root node of the corresponding tree. In the scheme of the branching process, the generation of elements of a given class in the BMPG model is approximated by the evolution of a corresponding branching process. Each copy of an element of a given class corresponds to one of the nodes of a rooted tree, except for the root which corresponds to the first apparition of this class of elements. In this way, each time the root is copied, a child of the first generation is born. In a similar manner, if a first generation node of this element is copied, a second generation child is born and so on.
Each element can have at most κ children, as it can only be copied while it is inside the memory kernel. In this process, the probability that a given node has i children can be well approximated by a binomial distribution, is the probability that a given element in the kernel is copied at every time step. In a branching process ruled by a binomial distribution, each link in the corresponding tree is generated with probability θ. Since the number of links in a tree is s − 1, where s is the number of nodes or elements, then the probability of generating s − 1 links is θ s−1 . The number of missing links to complete a κ-ary tree with s internal nodes is κs − (s − 1), where internal means nodes that are not leaves. Then the probability that this process generates one particular κ-ary tree with s internal nodes and κs − (s − 1) missing links is 26 The number of κ-arys trees with s internal nodes is 27 , and then the probability of having a κ-ary tree with s internal nodes and κs − (s − 1) missing links, or in the context of the BMPG model an element of popularity s, is: Using Stirling's approximation for κ ≫ 1 and s ≫ 1 we can obtain the asymptotic behavior of Eq. (10), Fig. 3 (bottom panel) we show the resulting distributions P (s) calculated from the generated database with the BMPG model, together with the theoretical prediction Eq. (11). It can be seen that the theoretical prediction fits very well the distribution calculated from the sequences generated by the model, when the values of κ are large enough; this confirms that a branching process with a binomial distribution is a good approximation in this scenario. Moreover, when p = p c = 1/κ, the exponential cutoff of distribution P (s) (Eq. (11)) has the form s 0 ∝ κ 2 . Therefore, for large values of κ, such as the ones used in Fig. 3 (top panel), the distribution tends to a power-law, P (s) ∼ s −3/2 , in a wide range of values of s as observed in the figure.
Finally, we fitted the exponents of the frequency distribution for different values of the memory kernel extension κ and the probability of introducing a new element in the kernel (not shown) and we found that for small values of p all the exponents α are nearly constant, while for larger values of p they grow as p increases. As expected, for p ≤ 1/κ, all exponents have nearly the same value α ≈ 3/2.

A. Correlations
As we mentioned in the introduction, the sequences generated in the original Cattuto's model have long-range correlations. Since the BMPG model has a finite sized memory kernel, we do not expect long-range correlations for the generated time series x[t]. To build the time series we assigned to each class of elements a random generated number from a Gaussian distribution 22 . In order to see how the correlation length behaves in this kernel, we computed the autocorrelation function C(∆t, t) of the time series x[t], where µ(t) = x[t] and σ 2 (t) = (x[t] − µ) 2 and ... means the expectation value. Considering the time series is stationary, then µ(t) = µ(t + ∆t) = µ 0 , σ(t) = σ(t + ∆t) = σ 0 and the autocorrelation function C(∆t, t) = C(∆t), which can be computed as We found that C(∆t) decays exponentially, C(∆t) ∼ e −∆t R , as can be seen in Fig. 4. We calculated the autocorrelation function for several values of κ and p = p c = 1/κ and computed the correlation length R by fitting the exponential decay function. In the inset of Fig. 4 we show the correlation length R as a function of κ. As can be seen R grows as a power-law with exponent γ = 1.9. Fig. 4 also shows the collapse of the autocorrelation curves when the time is measured in units of the correlation length R ∼ κ 1.9 .

B. Inter-event time analysis
Inter-event time analysis is common to many natural phenomena that exhibit memory such as earthquakes 2,28 , sunspots 29 , neuronal activity 30 and in several human activities 22,31,32 . In particular, following the analogy we already established between the occurrence of words in texts and our temporal series, the inter-event time distribution can be analyzed in a similar manner than in texts 17 . All the elements in the sequence have a corresponding time of occurrence. Given an element of class g of the time series, t (g) (j) ∈ {1, 2, ..., N } corresponds to the time of the j-th occurrence of g. Accordingly, the j-th inter-event time of a particular element g is defined as: If s (g) is the total number of occurrences of g in the generated series, i.e., its popularity, then we can estimate the average inter-event time as τ (g) ≈ t is the lifetime of the elements of class g, which is usually called the Zipf's wavelength of word g in text analysis 17 . For simplicity, let us write τ instead of τ (g) and also introduce the probability density f (τ ) of inter-event times for any particular class of elements in the series. A bursty dynamics is characterized by a sequence of short periods of high activity followed by long periods of low activity 33 . The presence of burstiness in a time series can be studied through the probability distribution of inter-event times f (τ ), where any departure of f (τ ) from an exponential distribution may indicate the existence of memory in the inter-event time process. For instance, in the case of words in a text a deviation is observed, which is usually well described by the stretched exponential distribution, or Weibull function 17 , For this distribution τ = τ 0 Γ( β+1 β ), where Γ is the Gamma function and 0 < β ≤ 1. If β = 1, burstiness is not present, as is the case for the generated series with Cattuto's or the Yule-Simon's models 22 . On the other hand, 0 < β < 1 is an indicator of the presence of burstiness and, as the value of β approaches zero, the apparition of bursts in the time series increases. A more clear indication of burstiness comes when the distribution of inter-event times is described by a power-law, where the degree of burstiness is characterized by the exponent η of the power-law; the lower the exponent, the more bursty the dynamics is. Finally, the presence of burstiness in empirical systems can also be characterized by calculating the coefficient of variation σ τ / τ , where σ τ is the standard deviation of the inter-event times. Using this coefficient we compute the burstiness parameter B as 33 , This parameter is greater than zero for a bursty dynamics and lower than zero when the dynamics becomes regular. When B = 0 there is neither burstiness nor regularity like, for instance, in a Poisson process. Using the approaches described above, we analyze the inter-event time dynamics, i.e. burstiness, of several elements in series generated by the BMPG model. First we define the level of activity of a given element g as the inverse of the Zipf's wavelength, as a g = 1/ τ (g) where τ (g) is the mean inter-event time of this element. Then, analyzing the time series we found that, on average, the activity of the elements depends on their popularity through the following relation a ∼ s 1/2 , i.e., the most active elements are also the most popular. Secondly, we analyze the inter-event time distributions for different sets of class of elements. Each set is built by aggregating elements with decreasing order of popularity until reaching a specific number of total inter-event times. In this way we construct four sets of ∼ 2 × 10 6 inter-event times with decreasing level of activity.
In the top panel of Fig. 5 we show the distribution of inter-event times f (τ ) for the most active set, using κ = 500 and three different values of the parameter p (p = 0.1p c , p = p c , and p = 10p c ). As can be seen, f (τ ) has a power-law tail in the three cases with an exponent η = 3, which indicates a clear presence of burstiness. For p = p c the whole distribution is well described by a power-law. However, the deviation of p from p c affects the distribution for short times. At p = 0.1p c the series is homogeneous, i.e. composed almost entirely of elements of a single class, increasing the number of shorter interevent times. As p increases (p = 10p c ), the number of different classes of elements also increases and the interevent times are longer, resulting in a flat distribution for lower values of τ , as can be seen in this figure. We also observed that, independently of the value of κ used in the model (not shown), the decay exponent remains the same. For comparison, in this figure we also show the inter-event time distribution obtained with Cattuto's model for κ C = 500 and p = p c and the corresponding fit using a Weibull distribution with β = 1. As we mentioned, in this case the inter-event time distribution is well explained by a Poisson process.
In the bottom panel of Fig. 5 we show the distributions of inter-event times f (τ ) for κ = 500 and p = p c , for several classes of elements grouped into subsets corresponding to different levels of average popularity. All distributions present a power-law decay with exponent η = 3 and collapse when the axes are rescaled by the factor τ = s −1/2 , where s is the average of the popularity of the corresponding subset.
We also calculated the burstiness parameter B for different values of κ and p, using in this case the most active set; see the inset of Fig. 5 (bottom panel). For all κ there is a range around κp ∼ 10 where a bursty dynamics is evidenced, while for low values of κp, B < 0, since the series becomes more regular. As the distribution of inter-event times for p = p c is a power-law with exponent η = 3, we can calculate the corresponding values of B for different values of κ by computing τ and τ 2 directly from the normalized distribution, resulting in and With these expressions we computed B (Eq.

IV. KERNEL ANALYSIS
In order to understand the emergence of burstiness in finite-sized kernels we analyzed the state inside the kernel in the BMPG model as a function of the model's parameters. In analogy with phase transitions, in a first approach we analyzed the state inside the kernel by defining an order parameter 0 ≤ φ ≤ 1 as follows, where n (g) κ is the popularity of the class of elements g inside the kernel of length κ. According to this definition, when the kernel is full of elements of a single class, the order parameter is equal to one. On the other hand, when all elements in the kernel are of different classes φ = 1/κ, taking its minimum value. In Fig. 6 (top panel) we show the mean value of the order parameter as a function of pκ for several values of the kernel extension κ, where φ is computed inside N/κ disjoint segments of length κ all across the generated series (N being the length of the series), and the mean φ is calculated over all N/κ segments. One can note that all the curves collapse when plotted as a function of κp, also it can be observed a continuous transition from a disordered state φ ∼ 1/κ ≈ 0 to an ordered state φ ≈ 1 as κp decreases. The inflexion of the curves that occurs at pκ ≈ 1, suggests of a transition at this point.
In order to test the existence of some form of criticality in the transition we estimate the fluctuations of the order parameter by computing the variance σ 2 φ = φ 2 − φ 2 . The variance as a function of pκ shows a peak at pκ ≈ 1 -see inset of the bottom panel of Fig. 6-indicating that fluctuations reach a maximum at the inflexion point of the order parameter. Since all the curves collapse when plotted as a function of pκ, the magnitude of the fluctuations is independent of the size of the kernel. Using a heuristic argument we can make an analogy with statistical mechanics by associating p to the temperature-since this variable introduces disorder in the kernel-and the kernel extension κ to the size of the system. If we think that the probability of introducing a new element results from an activated process, then p ∝ exp(−∆E/T ), where ∆E is the activation energy. From this we obtain, T ∝ −1/ ln(p), or in a simple assumption T = −1/ ln(p) which introduces all the temperature range. In this scheme we can also define the susceptibility as χ = κ T σ 2 φ = − ln(p)κσ 2 φ . A plot of the susceptibility χ as a function of T = −1/ ln(p) is shown in Fig. 6 (bottom panel), where it can be seen that the peak of the susceptibility grows with the size of the kernel and is expected to diverge at T = 0 in the thermodynamic limit κ → ∞, resembling the behavior of, for instance, the one dimensional Ising model. In the top panel of Fig. 7 we show kernel configurations for the three main states: super-critical (p > p c ), critical (p = p c ) and sub-critical (p < p c ). In the subcritical state, the kernel is full with elements of the same class (φ ∼ 1), in the super-critical there are many elements coexisting in the kernel with similar weights and in the critical case one can distinguish the most popular element by inspection. Finally, a better description of the fluctuations of φ in the kernel can be obtained by calculating the distribution of the order parameter. We computed the distributions at the point where the fluctuations are maximum, i.e. p = p c = 1/κ, for several values of κ. In the bottom panel of Fig. 7 we show these distributions corresponding to κ = 100, 300 and 500. The distributions do not depend on the value of κ and broaden around φ = 0.5, further confirming the results obtained when measuring the variance. In the inset we show distributions for the sub-critical case p < p c and the super-critical case p > p c for the case of κ = 500 with p = 0.1p c and p = 10p c . Both distributions are narrow, the first one is peaked close to 1 and the second one close to 0, as expected. The distribution of elements inside the kernel can be studied by computing the kernel entropy where, as before, n (g) κ is the popularity of the g-th class of elements inside the kernel. In Fig. 8 we show the mean value of the entropy, µ S (top panel) and its fluctuations, σ S (bottom panel), as a function of pκ, and for several values of κ. Similar to what was observed for the order parameter, all the entropy curves collapse when plotted as a function of pκ. The variance of the entropy behaves as the fluctuations of the order parameter, reaching a maximum value at p ∼ p c , also indicating the existence of a transition. However, it seems to increase with the size of the kernel. The entropy fluctuations are related to the presence of burstiness, since they imply that the rate at which an element is copied is a varying quantity.
It is interesting to note that the bursty activity of solar flares has a distribution of inter-event times with a power-law tail with exponent ∼ 3, in accordance with the results of BMPG model. The distribution of solar flares has been satisfactorily explained using a time-dependent Poisson process that results from the superposition of piecewise-constant Poisson processes 34 . Within this approach, the process is decomposed in time intervals, in which the inter-event time is consistent with a constant rate Poisson process. In analogy with solar flares, the copy rate of a given class of elements varies in the BMPG model when it is near the transition, as evidenced by the fluctuations of the entropy and the order parameter. The mentioned mechanism is robust against variations of the distribution of Poisson rates. Hence, it can be easily adapted to explain the distribution of inter-event times of the BMGP model.

V. DISCUSSION
In this work we have studied and characterized a preferential growth stochastic model with a bounded memory kernel. Specifically, we have modified the Yule-Simon stochastic process by introducing a finite-sized memory kernel of extension κ and called this model the Bounded Memory Preferential Growth model (BMPG model). We studied several statistical properties of the series of elements generated by BMPG model using numerical simulations and standard tools from stochastic processes.
We found that the lifetime distributions of the different classes of elements in the series follow a power-law. We derived the master equation that rules the probability of having n copies of a given class in the kernel at time t and we found that this equation is similar to the ones proposed for models of species lifetime in ecologies 23 , which consists in a birth and death process. Moreover, the exponents of the obtained distributions from the BMPG model, ∼ 1.9, are in the range of the reported in empirical systems 25 .
The BMPG model also generates elements whose popularity is distributed according to a power-law. This is in accordance with the highly skewed and long-tailed distributions found in the original Yule-Simon model and in the modified version introduced by Cattuto et al., which includes a long-tailed memory kernel. In particular, the exponent of the distribution (3/2) in the case of the finitesized kernel can be explained in terms of a branching process ruled by a binomial distribution, where the maximum number of children each node can have is equal to the size of the kernel. This approach works well for large enough kernels and also is able to explain the exponential cut-off observed in the distributions obtained with our model.
As expected, the correlations observed in the series of elements produced by BMPG model are short-ranged, unlike the long-range correlations observed in Cattuto's model. However, the correlation length in the finite-sized kernel grows nearly quadratically with the kernel extension.
An interesting effect observed in the presence of a finite-sized kernel-that is not observed neither in the Yule-Simon model nor in Cattuto's model-is that the tail of the inter-event time distributions of elements of the series, with a defined level of activity, decays as a power-law with exponent ∼ 3 and is independent of the activity level of the pool of elements. This means that the generated sequences of a pool of elements show bursts of activity followed by periods of latency, at least for a given range of the model parameters. Moreover, the inter-event time distributions collapse when rescaled by the level of activity of the pool of elements and, in this respect, we also found that the level of activity increases with the popularity of the elements in the series. The burstiness parameter is greater than zero in a range of the model's parameters, reinforcing the evidence of the presence of burstiness in the generated series. The mechanism of this bursty dynamics can be associated to a superposition of Poisson processes with a particular distribution of rates, in particular this mechanism was used to explain the inter-event time distribution of solar flares 34 that, like in BMPG model, is distributed by a power-law with exponent 3. In contrast, our previous studies of the Yule-Simon and Cattuto's models showed that the inter-event time distribution is well explained by a Poisson process with a single rate 22 .
To explain the presence of the bursty dynamics in the sequences generated by the BMPG model, we characterized the state of the memory kernel defining an order parameter that measures the occupation ratio of the ker-nel by the most popular class of elements and, also, as a measure of the distribution of classes of elements inside the kernel, we computed the kernel entropy. Studying the mean value and fluctuations of the order parameter and the entropy, we found that the state of the kernel goes through a transition from an ordered state (low values of p) to a disordered state (high values of p) and that there is a critical point at p = p c = 1/κ where the fluctuation of both quantities are maximum. Particularly, entropy fluctuations can be closely related to the emergence of burstiness in BMPG model, since it implies that the rate at which an element is copied is a varying quantity. Moreover, since all the curves collapse when plotted as a function of pκ, the magnitude of the fluctuations of the order parameter and the entropy result independent of the size of the kernel.
Finally, in complementary results not shown in this paper, in which we repeated all the analysis using an exponential short-ranged memory kernel Q(∆t) = 1 κe exp(−∆t/κ e ) with decaying rate κ e , we found similar behaviors for the popularity and inter-event time distributions to those obtained with the finite-sized kernel. This suggests that these statistical properties are independent of the specific functional form of the memory kernel, depending instead on how fast the distribution decays, i.e. on the asymptotic behavior for large ∆t.