# Full text of "Moments of the frequency spectrum of a splitting tree with neutral Poissonian mutations"

## See other formats

arXiv:1509.06500v2 [math.PR] 2 Sep 2016 Moments of the frequency spectrum of a splitting tree with neutral Poissonian mutations. Nicolas Champagnat^’^, Benoit Henry^’^ Abstract We consider a branching population where individuals live and reproduce independently. Their lifetimes are i.i.d. and they give birth at a constant rate b. The genealogical tree spanned by this process is called a splitting tree, and the population counting process is a homogeneous, binary Crump-Mode-Jagers process. We suppose that mutations affect individuals independently at a constant rate 6 during their lifetimes, under the infinite-alleles assumption: each new mutation gives a new type, called allele, to his carrier. We study the allele frequency spectrum which is the numbers A(k^ t) of types represented by k alive individuals in the population at time t. Thanks to a new construction of the coalescent point process describing the genealogy of individuals in the splitting tree, we are able to compute recursively all joint factorial moments of (A(fc, These moments allow us to give an elementary proof of the almost sure convergence of the frequency spectrum in a supercritical splitting tree. MSC 2000 subject classifications: Primary 60J80; secondary 92D10, 60J85, 60G51, 60G57, 60F15. Key words and phrases, branching process - coalescent point process - splitting tree - Grump-Mode- Jagers process - linear birth-death process - allelic partition - frequency spectrum - infinite alleles model - Levy process - scale function - random measure - Palm measure - Campbell’s formula. 1 Introduction In this work, we study a branching population in which every individual is supposed to have a lifetime independent from the other individuals in the population. Moreover, during their lifetimes, they give birth to new individuals at Poisson rate. The genealogical tree underlying the history of the population, the so called splitting tree, has been widely studied in the past [EKinilH]. In our model, individuals also experience mutations at Poisson rate. Each mutation leads to a totally new type replacing the previous type of the individual, this is the infinitely-many alleles assumption. Every time an individual gives birth to a new individual, it transmits its type to his child. This mutation process may model the occurrence of a new species in an area or a new ^TOSCA project-team, INRIA Nancy - Grand Est, lECL - UMR 7502, Nancy-Universite, Campus scientifique, B.P. 70239, 54506 Vandoeuvre-les-Nancy Cedex, France ■^lECL - UMR 7502, Nancy-Universite, Campus scientifique, B.P. 70239, 54506 Vandoeuvre-les-Nancy Cedex, France, E-mail: Nicolas . Chainpagnat@inria.fr, benoit .henryOuniv-lorraine .f r 1 phenotype in a given species. Our study concerns the allelic partition of the living population at a fixed time t, which is characterized by the frequency spectrum of the population, where each integer A(k, t) is the number of families represented by k alive individuals at time t. A famous example is the Ewen’s sampling formula which gives the distribution of the frequency spectrum when the genealogy is given by the Kingman coalescent model [8]. Other works studied similar quantities in the case of Galton-Waston branching processes (see [3] or [H]). The purpose of this work is to obtain explicit formulas for the moments of the frequency spectrum. The model with Poissonian mutations was studied in Champagnat and Lambert HE], where many properties of the frequency spectrum and the clonal family (the family who carries the type of the ancestral individual at time 0) were obtained. The population counting process {Nt, t G M+) and the frequency spectrum {A{k,t))f^^^ belong to the class of general branching processes counted by random characteristics. This class of processes has been deeply studied by Jagers and Nerman, who give, for instance, criteria for the long time convergence of such processes [la [m n la I23j . Using these tools, Richard and Lambert [la |22] shown the almost sure convergence of Nt, properly renormalized, to an exponential random variable in the supercritical case. The almost sure convergence of the ratios was proved in [4] using similar tools. Prom this, one can easily deduce the a.s. convergence of where W{t) is the average number of individuals at time t conditionally on Nt > 0. This result was stated without proof in [6]. An important tool is the so called coalescent point process (CPP): given the individuals alive at a hxed time, the coalescent point process at time t is the tree describing the relation between the lineages of all individuals alive at time t. Here, the term lineage of an individual refers to the succession of individuals, from child to parent, backward in time until the ancestor of the population. Roughly speaking, the CPP is the genealogical tree of the lineages of the individuals. This tool goes back to Aldous and Popovic [T] who introduced it for a Markovian model. Later in m, Lambert showed the general link between coaslescent point processes and splitting trees. In this work, we use the representation of the CPP of a splitting tree as an i.i.d. sequence of random variables We introduce a new construction of the coalescent point process. Thanks to a new formula for the expectation of an integral w.r.t. a random measure with specific independence structure, this allows us to obtain explicit recursive formulas for the moments of the frequency spectrum, valid for any parameter of the model. As an application, we prove the almost sure convergence of the frequency spectrum avoiding the use of the theory of general branching processes counted by random characteristics in the supercritical case. Of course, these moment formulas can also provide many valuable informations, for instance, on the error in the aforementioned convergence, which suggest CLT-type results. Indeed, such results can be proved m but leads to many additional difficulties. Section [2] is dedicated to the description of the models and the introduction of the classical tools (from [T9|) used in the sequel. In Section [3] we state our main results (Theorems 13.II and 13.2|) giving explicit formulas for the factorial moments of the frequency spectrum {A{k, expressed in terms of the lower order moments. SectionHis devoted to the proof of an extension of the Campbell formula concerning the expectation of the integral of a random process with respect to a random measure when both objects present some local independence properties. Even if this result is used as a tool 2 for this work, it is interesting by itself. Section [5] is devoted to the proof of the moments formulas stated in Section [3l We give the key decomposition of the CPP in Subsection l5.ll The rest of Section [5] is dedicated to the proofs of theorems 13.11 and 13.21 In particular, we provide a computation of the first moment much simpler than the one of [1] . We give the asymptotic behaviour of higher moments in Section [6l Section [3 is dedicated to the proof the following law of large numbers: limllM t-)-oo W{t) almost surely, k>l, where £ is an exponential random variable with parameter 1 conditionally on non-extinction, and the constants are explicit. 2 Splitting trees and the coalescent point process We study a branching model of population dynamics called splitting tree where individuals live and reproduce independently from each other. Their lifetimes are i.i.d. following a given arbitrary distribution Py on (0, oo]. During this lifetime, an individual gives birth to new individuals, with binary reproduction (i.e. new individuals appear singly), at independent Poisson times with positive constant rate b until his death. We also suppose that the population starts with a single individual called the root or ancestor. A graphical representation of a splitting tree is shown in Figure [TJ The finite measure A := 6Py is called the lifespan measure, and plays an important role in the study of the model. Moreover, we assume that individuals undergo mutations at Poisson times with rate 0 during their lifetimes independently from each other and from their reproduction processes. Each new mutation leads to a brand new type replacing the preceding type of the individual {infinitely many alleles model). Parents yield their current type to their children. A family at a given time t is a set of alive individuals carrying the same type at time t. Our purpose is to study the distribution of the sizes of families in the population at time t. For our study, it is easier to work with the genealogical tree of the population alive a time t. Indeed, since mutations are Poissonian, the different types in the population only depend of the coalescence times of the lineages of the alive population. In order to derive the law of that genealogical tree, we need to characterize the law of the times of coalescence between pairs of individuals in the population, which are the times since their lineages have split. In [19], Lambert introduces a contour process Y, which codes for the tree, and hence its genealogy. Suppose we are given a tree T, seen as a subset of M x (Ufc>oN^) with some compatibility conditions (see m), where N refers to the set of non-negative integers. On this object, Lambert constructs a Lebesgue measure A and a total order relation ^ which can be roughly described as follows: let X, y in T, the point of birth of the lineage of x during the lifetime of the root splits the tree in two connected components, then y x if y belong to the same component as x but is not an ancestor of X (see Figure [2]). If we assume that A(T) is finite, then the application, (/?:¥—>• [0, A (T)], X ^ X{{y\yfix}), 3 t J Figure 1: Graphical representation of a splitting tree. The vertical axis represents the biological time for the population. The horizontal axis has no biological meaning. The vertical segments represent the lifetimes of the individuals; the lower bounds their birth-times and the upper bounds death-times. The dotted lines denote the filiations between individuals. is a one-to-one correspondence. In a graphical sense (see Figure [2]), (p{x) measures the length of the part of the tree which is above the lineage of x. The contour process is then defined, for all s, by X I Figure 2: Graphical representation of the set {y € T | y ^ x} (in grey). n := Hm (s)) , where IIk is the projection from M x (Ufc>oN^) to M. In the case where A(T) is infinite, one has to consider truncations of the tree above hxed levels in order to define contours (see |19] for more details). 4 In a more graphical way, the contour process can be seen as the graph of an exploration process of the tree: it begins at the top of the root and decreases with slope —1 while running back along the life of the root until it meets a birth. The contour process then jumps at the top of the life interval of the child born at this time and continues its exploration as before. If the exploration process does not encounter a birth when exploring the life interval of an individual, it goes back to its parent and continues the exploration from the birth-date of the just left individual (see Figure [3|). It is then readily seen that the intersections of the contour process with the line of ordinate t are in one-to-one correspondence with the individuals in the tree alive at time t. In [TO], Lambert shows that the contour process of the splitting tree which has been pruned from every part above t (called truncated tree above t), has the law of a spectrally positive Levy process reflected below t and killed at 0 with Laplace exponent V’(x) = X — [ (l — e“™) A{dr), x G M+. J (0,oo] The largest root of ip, denoted a, is called the Malthusian parameter and, as soon as a > 0, gives the rate of growth of the population on the survival event. The time of coalescence of two individuals alive at time t corresponds to the amount of time one needs to go back in the past along their lineages to get their first common ancestor. The time of coalescence between an individual alive at time t and the next one visited by the contour is exactly the depth of the excursion of the contour process below t between this two successive individuals (see Figure [3|). We are interested in the sequence of coalescence times shown in Figure [3l which contain the minimal information needed to reconstruct the genealogy at time t. More precisely, it follows from well known fluctuation properties of spectrally positive Levy processes (see |18] . Theorem 8.1 for spectrally negative Levy processes) that the law of the depth H of an excursions below t is given by where W is the scale function of the Levy process characterized by its Laplace transform e {r)dr = 1 ip{ty t > a. ( 2 . 1 ) Since the contour process is strong Markov, the sequence of excursion depths is i.i.d. To summarize, given the population is still alive at time t, one can forget the splitting tree and code the genealogy of the living individuals alive at time f by a new object called the coales cent point process (CPP) at time t shown in Figured] Its law is the law of a sequence where the family is i.i.d. with the same law as H, stopped at its first value greater than t, and Hq is deterministic equal to t (see Figure d])- The heights Hi,..., Hmi-i are called branch lengths of the CPP. Remark 2.1. Let N be an integer valued random variable. In the sequel we say that a random vector with random size (Xj);^<^<^ form an i.i.d. family of random variables independent of N, if and only if {Xi,...,Xn) = 5 Figure 3: Construction of the contour process and link between the excursions of the contour process and the times of coalescence in the tree. where i Xi) is a sequence of i.i.d. random variable distributed as Xi independent of N. V J i>l From the CPP at time t, the genealogical tree of alive individuals at time t is obtained considering that the ith branch coalesces with the first branch on its left such that Hj > Hi (for j < i) (see Figured]). The number Nt of alive individuals at time t in the splitting tree is then given by Nt = inf{i > 1 \ Hi > t}. From the comments above, Nt is a geometric random variable given Nt > 0. More precisely, = = V.>1. Finally, we can define the occurrence of mutations directly on the CPP as the atoms of a random measure. Let T* be a Poisson random measure on (0, t) xN with intensity measure 6X0 C where A is the Lebesgue measure on (0, t) and C is the counting measure on N. The mutation random measure on the CPP at time t is then defined by M {da, di) = lHi>t-a^i<Nt'P {di, da ), (2.2) 6 I I Figure 4; A coalescent point process for 16 individuals, hence 15 branches. The filiation relation between lineages is indicated by horizontal dashed lines. where an atom at (a, i) means that the ith branch of the CPP experiences a mutation at time t — a. Note that, when one looks at the allele distribution at time t, this construction is equivalent to the construction of Poissonian mutations on the original splitting tree [5] . We assume that each mutation gives a totally new type to its holder (infinitly-many alleles model) and that the types are transmitted to offspring. This rule yields a partition of the population by type at a given time t. The distribution of the frequency of types in the population is called the frequency spectrum and is defined as the sequence {A{k, where A{k, t) is the number of types carried by exactly k individuals in the alive population at time t (or, for short, the number of families of size k at this time) excluding the family holding the original type of the root. In the study of the frequency spectrum, an important role is played by the family carrying the type of the root. The type of the ancestor individual at time 0 is said clonal. Moreover, at any time t, the set of individuals carrying this type is called the clonal family at time t. We denote by ZQ{t) the size of the clonal family at time t. To study this family it is easier to consider the clonal splitting tree constructed from the original splitting tree by cutting every branches beyond mutations. This clonal splitting tree is a standard splitting tree without mutations where individuals are killed as soon as they die or experience a mutation. The new lifespan law Pvg is then the minimum between an exponential random variable of parameter 6 and an independent copy of V. As a splitting tree, one can study its contour process whose Laplace exponent is given, using simple manipulations on Laplace transforms, by Tpe{x) = X — 1 - e-'-") ke{dr) x'ip{x + 6) X + 6 In the case where a — 0 > 0 (resp. a — 0<0, a — 0 = 0) the clonal population is supercritical (resp. sub-critical, critical), and we talk about clonal supercritical (resp. sub-critical, critical) case. We denote by Wg the scale function of the Levy process induced by this new tree, related to V'e 7 P(Z„(t) = k\Z„(t)>0) = ^^(l-^'j . Moreover, E [Nt] satisfies the renewal equation f{t) = ¥{V >t) + b [ f{t- s)P {V > s) ds, Jo which, applied to the clonal splitting tree, allows obtaining after some easy calculations, F{Zo{t) > 0) _ e-^^W{t) P {Nt > 0) “ We{t) ’ from which one can deduce P(Z„(*) = . I «, > 0 ) = (l - . V. > 1 . (2.3) and P(Z„(t)= 0 |V,> 0 ) = l-^^. The main idea underlying our study is that the behaviour of any family in the CPP is the same as the clonal one but on a smaller time scale. For the rest of this paper, unless otherwise stated, the notation P^ refers to P (• | > 0) and Pqo refers to the probability measure conditioned on the non-extinction event, denoted Non-Ex in the sequel. Finally, we recall the asymptotic behavior of the scale functions W{t) and Wg{t), which is widely used in the sequel. Lemma 2.2. (Champagnat-Lambert Assume a > 0, there exists a positive constant 7 such that e-’^^il;'{a)W{t) -1 = 0 . In the case that 6 < a (clonal supercritical case), Wg{t) ~ t^oo In the case that 9 > a (clonal sub-critical case), In the case where 9 = a (clonal critical case). From this lemma, one can obtain that the probability that the clonal family reaches a fixed size at time t decreases exponentially fast with t. Corollary 2.3. In the supercritical case (a > 0), for any positive integer k, Pi (Zo(t) = k) = 0 (e-^*) , where 6 is equal to 6 (resp. 2a — 9) in the clonal critical and sub-critical cases (resp. supercritical case). Remark 2.4. Note that Lemma \2.2\ imvlies in particular that, for any positive integer k, tW{tf-^ = o (W{t)^^ . 3 Statement of main results In this section are stated the main results of the paper. In particular, the formulas for the moments of the frequency spectrum are given in Theorems 13.11 and 13.21 For two positive real numbers a < t, we denote by the number of individuals alive at time t — a who have descent alive at time t. In the CPP at time t, corresponds to the number of branches higher than t — a, that is Card {Hi \ i G {0,... , — 1}, Hi > t — a}. In the sequel, we use the following notation for multi-indexed sums: let K, N be two positive integers and li,... ,1 k some non-negative integers, then the notation E - refers to the sum E + ---+n^=lK In order to lighten notation, we also use the convention that for any integer n and any negative integer k, = 0 . Recalling that Pt is the conditional probability on the event {Nt > 0} and that E* is the corresponding expectation, we now state our main results. Theorem 3.1. For any positive integers n and k, we have, E. A{k, t) n = E. < f 9N it) <-a E E, niH-hn (t) =n-l Nf ^ t — a A{k, a] ni ^Zoia)=k N (t) Re. m=2 A{k, a) ftm da . 9 Theorem 3.2. Let ni, tin and ki,... ,kN be positive integers. We have where 5 refers to the Kronecker symbol and di:N,i = • • • ,bN,i)- In Subsection 15.31 we also give formulas for the moments Et[ni Zo(t)=i]- These formulas are explicit in the sense that any moments can be computed recursively from the lower order moments. As an application, these formulas we obtain an elementary proof of the following law of large numbers. Theorem 3.3. We have, £ t-j-oo il;'[a) (Ck) k>l ! a.s. and in where £ is an exponential random variable with parameter 1 conditionally on non-extinction, and Ck is given by Ck ■ = -f Oe -es Weis) 1 - 1 fc-i ds, Mk e N\{0}. Weis), But before proving such Theorems 13.11 and 13.21 we need to introduce an important tool allowing to compute expectations of integrals w.r.t. random measures presenting particular independence structures. This is the purpose of the next section. 4 Expected stochastic integral using Palm theory. In this section, we use notation and vocabulary from [7]. Let A a be Polish space. We recall that a random measure is a measurable mapping from a probability space to the space Adj, (A) of all boundedly hnite measures on A, i.e. such that each bounded set has hnite mass. The purpose of this section is to prove an extension of the Campbell formula (see Proposition 13.1.IV in m), giving the expectation of an integral with respect to a random measure when the integrand has specihc “local” independence properties w.r.t. to the measure. For this purpose, we need to introduce the notion of Palm measure related to a random measure A/". The presentation is borrowed from [7]. So let A/" be a random measure on A with intensity measure p, and (V^,, x G A) be a continuous random process with value in M_|_. Since this section is devoted to prove relations concerning only the distributions of J\f and X, we can assume without loss of generality that our random elements X and J\f are dehned (in the canonical way) on the space C(A) X Mb (A), 10 where C{X) denotes the space of continuous function on X. This space is Polish as a product of Polish spaces. We denote by T the corresponding product Borel cr-held. For the random measure A/", the corresponding Campbell measure is the measure dehned on a {J- X 13 (X)) by extension of the following relation on the semi-ring T x 13 (X), Cj^{F X B) = E[lFAf{B)], F€F, B€B{X). It is straightforward to see that Cj\f is cr-finite and for each F in F the measure Cj\f {F x •) is absolutely continuous with respect to /x. Then, from Radon-Nikodym’s theorem, for each F & F, there exist y ^ X Py (F) in (y) such that, Cm{FxB)= [ Py{F) JB uniquely dehned up to its values on /x-null sets. Since our probability space is Polish, P can be chosen to be a probabilistic kernel, i.e. for all F in F, y £ X 1 -^ Py {F) is mesurable, and for all y in X, F £ F ^ Py {F) is a probability measure. The probability measure Py is called the Palm measure of Af at point y. Since X is continuous, it is B {X) (g) F measurable, and it is easily deduced from this point that E Xx M{dx) lEp, [^x] d-{dx), (4.1) where Ep^ denotes the expectation w.r.t. Px- Formula ()4.ip is the so-called Campbell formula. We can now state, the main results of this section which are the aforementioned extensions of the above formula. Theorem 4.1. Let X be a continuous process from X to M+. Let M he a random measure on X with finite intensity measure fi. Assume that X is locally independent from N, that is, for all x £ X, there exists a neighbourhood 14 of x such that Xx is independent from J\f {Vx C •). Suppose moreover that there exists an integrahle random variable Y such that \Xx\ <Y, Vx G X, a.s. and E[yAA(T)] < oo. Then we have E [Xx] y (dx). (4.2) However, the continuity condition of the preceding theorem prevent the application of this result to our model. We need a more specihc result. 11 Theorem 4.2. Let X be a process from [0,r] x X to M+ such that is cadlag for all x and Xg^. is continuous for all s. Let M he a random measure on [0,T] x X with finite intensity measure fi. Assume that, for each s in [0,T], the family (Xs^x,x G X) is independent from the restriction of M on [0,s], that there exists an integrable random variable Y such that <Y, Vx G X, Vs G [0,t], a.s. and that E[yAr(T)] < oo. Then we have E '[o,r]xA' Ys,x Af {ds, dx) <[0,T]xX E[Xs,j:] fi{ds,dx) (4.3) Let |l,n] denotes the set Nn [l,n]. Before going further, we recall that a dissecting system is a sequence {Anj,j G [Ijof nested partitions of X, where is an increasing sequence of integers, such that lim max diam An , = 0 . n^oojfzli^Knl In the spirit of the works of Kallenberg on the approximation of simple point processes, the proof of Theorems 14.11 is based on the following Theorem which can be found in [16] or in |20| (Section WIII.9). Theorem 4.3 (Kallenberg |16jl. Let p, and v be two finite measures on the Polish space X, such that p is absolutely continuous with respect to u. Let f be the Radon-Nikodym derivative of p w.r.t. u. Then, for any dissecting system {Anj,j G [Ij-fi^nl}„>o of X, we have JXn /A \ lim ^ ls^A„ i = f{s), for p-almost all s £ X. Proof of Theorem \4-l\ Let {Anp,j G [[1,K„]}^>q be a dissecting system of X. We denote by An{x) the element of the partition (^n,j)i<j<x„ which contain x. Let also T be a denumerable dense subset of X. We use lower and upper approximations of X. More precisely, let for all positive integer k and for all a un T, Kk XW : = inf {X,|s G rnMfc(x)} = i=i _ Kk xjf : = sup{Xs|s G Tn^fc(x)} = 1=1 with xf'^ = sup{Xs|s G Aj^k nT} and = inf G Aj^k n T} . —J 12 Note that the supremum and infinimum are taken on T D Ak{a) to ensure that and are measurable, but the set T could be removed by continuity of X. We remark that, for any j, k, the measure -EF(fc) E (•) is absolutely continuous with respect to p and it follows from Campbell’s formula ()4.ip that the Radon-Nikodym derivative is Thus, it follows from Theorem 14.31 that, /i-a.e.. _(fc) x) _(fc) X E = lim xf'^M{An{x)) n^oo fl{An{x)) /7 \ Then, since W •' and X are finite sums of such random variables, E and lEp, = lim xi^W{An{x)) n^oo p (Anix)) E = lim '^^M{An{x)) n^oo p (Anix)) outside a //-null set which can be chosen independent of k by countability. Now, since it follows that lEp, < ,i„, i„f < .i,n sup < Ep. n^oo E[X(T„(x))] n^oo E [X(T„(x))] X [kj /t—almost everywhere. Now, since X is continuous, xl^^ —^ X, and Xl'^) X. k^oo k^oo it follows, from Lebesgue’s Theorem, that ^ E[X,M{An{x))] , ^ = hm , ... , // - almost everywhere. n^oo !iL[N[An{x))\ Now, since Anj is a dissecting system, there exists an integer N such that, for all n > N, An{x) C 14 That is, for n large enough, E [X^M{An{x))] E[X'(u4,,(x))] = E[X^ Finally, Ep^ [Xx] = E [X^;] , // — almost everywhere. And the conclusion comes from ()4.ip . □ 13 Proof of Theorem 14-^ Clearly, we may assume without loss of generality that T integer M, M-l X, M V k=0 X sG k fc+1 \. M ) 1. Define, for all Since X^^x is cadlag, this sequence of processes converges pointwise to {Xg^x, s G [0,1]) for all uj. Then, by Lebesgue’s theorem. E / Xs,x X'ids, dx) J[0,l\xX '[0,l]xA’ M-l fi{ds,dx), lim fc+iivV ^Ps M—>-oo /mil M k=0 X k+l M ’ fi{ds, dx). Clearly, for fixed k, {s,x) Xk+i ^ is continuous on []^,^^] x X. Hence, Theorem 14.11 can be applied to [A Mil X X Lm’ m 1 ^ ^ {s,x) I —y X k+l M to conclude the proof. □ 5 Proofs of the moments formulas The main goal of this section is to prove Theorems 13.11 and 13.21 Their proofs are given in Subsection [Ql Subsection 15.31 is devoted to the computation of the joint moments of the frequency spectrum with Ti-Zo{t)=e- Subsection 15.41 shows an application of our theorems to the computation of the covariances of the frequency spectrum. The next subsection gives the key decomposition of the CPP. 5.1 Recursive construction of the CPP Here we describe the general idea of the proof of Theorems 13.11 and 13.21 and give an alternative construction of the CPP. We consider the CPP at some time t. Suppose that a mutation occurs on branch i at a time a. Then, by construction of the CPP, the future of this family depends only on what happens on the branches {Hj,i < j < t) (see Figured]), where T = inf {j > i \ Hj > a} . In fact, this set of branches is also a CPP with scale function W stopped at a (we talk about sub-CPP), and the number of individuals carrying the mutation at time t is the number of clonal individuals in this sub-CPP. To capitalize on this fact, we introduce a construction of the CPP which underlines this inde¬ pendence. Suppose we are given a sequence of coalescent point processes stopped at time a with scale function W. Then, take an independent CPP V, where the law of the branches corre¬ sponds to the excess over o of a branch with scale function W conditioned to be higher than a. As stated in the next proposition, the tree build from the grafting of the above each branch of V is also a CPP with scale function W stopped at time t (see Figured]). 14 0 0 1 2 3 4 5 6 t 7 8 9 10 Figure 5: The future of a mutation only depends on a sub-tree of the genealogical tree. -p(l) -^(2) -plS) -pii) Figure 6: Grafting of trees. Proposition 5.1. Let be an i.i.d. sequence of coalescent point processes with scale function W at time a, and let be their respective population sizes. Let V be a coalescent point process, independent of the previous family, with scale function W{t) : = W{t + a) W{a) at time t — a, and let Nt-a denotes its population size. Let Sq := 0 and i=i Then the random vector [Hk, H. = 0 < k < S^ defined, for all k >0, by if Si < k < 5'i+i, for some i > 0 , Vi + a if k = Si, for some i > 0, 15 is a CPP with scale function W at time t, for which = Nt-a cl.s. Proof. Note that Pfo = Vq + a. To prove the result, it is enough to show that the sequence is an i.i.d. sequence with the same law as H, given by The independence follows from the construction. We details the computation for the joint law of and leave the easy extension to the general case to the reader. Let k > I he two positive integers, and let also si, S 2 be two positive real numbers. We denote by S the random set {5j, i > 1}. Hence, P {Hk < si, Hi < S 2 ) {H < si \ H < a)W> {H < S 2 \ H < a)¥ {I i S, k S) + ¥[a + H <S]^¥{H <S 2 \H <a)¥{liS, k e S) + F{H <si\H <a)F(a + H <S 2 'jr{l€S, k ^ S) + P ^£i + H < P ^a T H < S2^ P (/ G S, k G 5), where H denotes a random variable with the law of the branches of P, i.e. such that f(h>s] = ' , Vs>0. W{s + a)'‘ Now, since the random variables Si are sums of geometric random variables, we get F{Hk < si, Hi < S2) = (pF {H < si \ H < a) + {I — p)P (^a + H < ^ ^pP {H < S 2 \ H < a) + {I — p)P [a + H < 52 )), with p = P (A: G <S). Moreover we have, P < s) = ^ {P (^fc < s I GlSi_i, 5,1) P {k Gl5i_i, 5, i>l + F{Hk<s\k = S^)F{k = S^)^ =P < s I F < a) P I IJ {A: Gl5i_i, 5^} ,i>l + P (iL < s I // > a) P IJ {A; = 5i} Since the Sfs are sums of geometric random variables of parameters W{t — a)~^, they follow binomial negative distributions with parameters i and W{t — a)~^. Hence, since 0, if A; < i, '{Si = k)= ■ (t-J)' W{t - a)-* {l-W{t-a) k—i ' else. 1 16 some elementary calculus leads to P U ,i>l = P(F > a), yk € N. which ends the proof. □ Proposition 15.11 shows that, under Pf, Nj:^_}a is geometrically distributed with parameter Moreover, although Nt-a may depends on informations which do not appear in the CPP stopped at time t, only depends on the lineages of the population at time t. A very simple application of this construction is the derivation of the expectation of A{k,t). Recall that this expectation was first calculated in [4], with a much more complicated proof. Theorem 5.2. Cor. 3-4]) For any positive integer k, we have Proof. Since A{k, t) is the number of types represented at time t hy k individuals, it is equivalent to enumerate all the mutations and ask if they have exactly k clonal children at time t. This remark leads to the following integral representation of A{k,t): A{k,t)= [ lzUa)=k ■^{da,di) , (5.1) ./[0,4xN ° where M is defined in (12.2p , and Zq (a) denotes the number of alive individuals at time t carrying the same type as the type carried at time t — a on the ith branch of the CPP of the individuals alive at time t (the notation comes from the fact that ZQ{a) corresponds to the size of the clonal family in the sub-CPP induced by the ith individual at time t — a, see Figure [5|). From Proposition 15.11 it follows that lz^(^a)=k satisfies the conditions of Theorem 14.21 so / t j-t Op—9a / 1 \ k—1 9 P, (Zo(a) = k) da = W{t) (^1 - j da, using (|2.3I) . □ 5.2 Proof of Theorems 13.11 and 13.21 Let a and t be two positive real numbers such that a < t, and n a positive integer. We call fc-mutation, a mutation represented by k alive individuals at time t in the splitting tree. Let (A(®)(A:, be the frequency spectrum in the 1-th subtree of construction provided by Proposition 15.11 To count the number of n-tuples in the set of A:-mutations, we look along the tree and seek for mutations in the CPP. For each fe-mutation encountered, we count the number of (n —l)-tuples made of younger A:-mutations. The (n — l)-tuples should be enumerated by decomposition in each subtree in order to exploit the independence property of the subtrees of Proposition 15.11 Suppose that a 17 mutation is encountered at a time a, then the number of (n — l)-tuples made of younger mutations is given by N, it) E n ni-\ - \-n /-f\ =71—1 m=l at) '' t — a So the number of n-tuples of fe-mutations is given by '[0,t]xN t't-a ^ziia)=k n ni-l - \-n (t) =71—1 m=l t — a A^^\k,a) Ur, M{da, di), i / y / ^Zl{a)=k ^[0,t]xN - E n Ur, IjvW =i ■^ida,di), (5.2) where ZQ[a) was defined in the proof of Theorem 15.21 Finally, using the independence provided by Proposition 15.11 it follows from Theorem 14.21 applied to all the integrals with respect to the random measures _^A/’(da, di), that A{k, t) n = E. '[0,t]xN E n^ + ...+n u) E„ =n—1 A{k^ a) ni ^Zo{a)=k K it) riE. m=2 A{k, a) Ur M {da, di) Finally, using that the M{da, di) = lHi>t-aTii<Nt'P{di, da) where V independent from the CPP (and, hence, from it follows that E. A{k, t) n = E* m it) t — a He m=2 E E„ N, niH- \-7i (i) =71—1 n: ^ t — a A{k, a] 'ktm A{k, a] ni ^Zoia)=k = eJ mil ^ _i— C{di)9da, ^A{k, a) Ea niH- \-n cf) =n—1 Nl > t — a ni lZo(a)=fc N, it) He. -■ 771=2 A{k, a] Tim da. (5.3) which ends the proof of Theorem 13.11 The proof of Theorem 13.21 follows exactly the same lines, and we leave it to the reader. 18 5.3 Joint moments of the frequency spectrum and lzo{t)=i In order to compute the terms of the form ^Zo{t)=l involved in m, we need to extend the representation (j5.2p of to take into account the indicator function of {ZQ{t) = 1}. To do this, when integrating w.r.t. M{da, di), we need to ask that the sum of the number of clonal individuals in each subtree for which the type at time t — a is the ancestral type, is equal to k. We begin with the case E [A{k,t)lzQ(t)=i] in order to highlight the ideas. In this case, we have the following result. Proposition 5.3. E, [A{k, t)lz,it)=i] =Et [ Pa (Zoia) = k) ^ J] t /0 n \ \ n _;»•_i + Ej / zW(a)Ea (^o(a) = ^) Y. 11 “ '■+■■■+'4*1,.,-,-' (5.4) Proof. Recalling that refers to the size whole population in the lower tree V of the construction of Proposition 15.11 we similarly define Z^'^ (a) as the size of the clonal population in the same tree (with the convention that mutations that occur at time t — a, i.e. on the leaves of the tree V, do not affect ZQ\a)). It follows that A{k,t)lzg{t)=i - f -7- J[o,t]xn I i 1 Zl{a)=k cr£l a is ancestral h+-+l (t) Y n ^z;;Ha)=k^ida,dj), =l i=l (5.5) where I is the set of injections from |l,..., (a) — Bj'^ to |l,..., Bj is the indicator function of the event {the jth individual at time t — a is clonal} , and ”cr is ancestral” denotes the event that the individuals ai,, a „ at time t — a have the ■^0 W~^j 19 ancestral type. Now, using the same method as in the proof of Theorem 13.11 leads to Et \A{k,t)lLZQ(t)=i\ = Et [ Fa{Zo{a) J [0,t]xN = Et [ Pa(^o(a) J[0,t]xN = Et [ Pa(^o(a) J[0,t]xN k)-£i a is ancestral n ^a{Zo{a) (j^I - \-l (i) — I i —1 ^ E n Pa iZo{a) = k) N{da,dj) ll-\ - \-l = l i = l N{da, dj) k) E ll-\ - \-l n ^a{Zo{a) zW(a)-S^ = l 2 = 1 h) iH, >t—a^j<Nt'Pi,d(l, dj). Now, ZQ*^(a) is not independent from V, but we have that ZQ*^(a) is independent from V {[a,T] n •) for all a <T. Hence, Theorem 14.21 applies to Xa := ZQ\t — a) and V defined for all measurable set A C [0, t] by and, as in (|5.3p . Et [A{k,t)^Zoit)=i] p(A)=V{t-A), = E. '[0,4]xN Pa iZo{a) = k) E Pa (^o(a) = k) tH^>t-aAj<Ntdda C{dj). =l i=l Finally, integrating with respect to C{dj) leads to the result. □ This last proposition in not exactly a closed formula since its involves the law of the couple ^o\4)- To close the formula, we need an explicit formula for the joint generating function of N'E Let F{u,v) = Ej U^t-ay^O W , U,V G [0,1], which is given, thanks to Proposition 4.1 of Il],by F{u, v) = u Wit — a,u) ( e °^W{t — a^u) 1 - W{t-a) \ ^^+We{t-a,u) ) where W is the scale function of the lower CPP, V, defined in Proposition 15.11 Wit) (5.6) Wit,u) := Wit)-u (wit) - 1 20 and We{t, u) := u) + 6 [ W{s, ds. Jo Proposition 5.4. For all k > 1 and I > 0, Et [A{k,t)\zo{t)=}\ i) “( Tw) (vF,(a)2p(Zo(a) = 0 ) r,—6a Hj 1,1- + [ Ea {ZQ{a) Jo -‘)eG: i=i where Hj{u,v) : and 1 - 1 1-3 ( e-^^W{a) We{a)) \W 0 {af¥ {Zo{a) = 0 ) J ' We{a) ) We{a) J 9da 9da, nE }. Gj := v^-^diEt v4\^) Proof. Let Ai and A 2 denote the two terms of the r.h.s. of (15.41) . We detail the computations of Ai. The case A 2 is similar. Ai = Et (Tvil - zW(a)) Pa (Zoia) = k) X E i=i J Y, l[Ea{Zo{a)=ii)Ea{Zo{a) = 0) _ r\\^oA)—j 9da. £iH- \-£j=l i=l i ,>0 Since, from (12.3p . 3 llEa{Zo{a) i=l we get <*)=n e-^“W(a) / We{a)^ /e-®“W(o)yY 1 Y"^' V We{aY J V~Weia)) e-^“W(o)Y/ 1 Weia)^ J V fW 1-3 P, (Zo(a) = 9da 21 Finally, if we define, for all integer j, Hj{u,v) := v^diduuF{u,v) — |, and we get G, := v^-^diEt y4"Ha) . 1 ^'■V e-<^-W{a) We{a)) [We{a)^F{Zo{a) = 0)) ^V’ We {a) J + Pa ( o(a) ) ^ (j _ ij j! Wg{a)) \We{afE{Z^{a) = 0) J ( We{a) ) □ These ideas also lead to the following formula, which is proved similarly. Corollary 5.5. Let ni,..., nw and fci,..., /cw be positive integers. Let I be a positive integer. We have E, N .1=1 m N N (t) Y.E, [Nl‘\-zf(a E n E. =ny,M-h-.N,l m=Z^‘^(a)+2 ^t — a ^2"l- (t) 4 (“)+i ■ N n _i=l A{ki,a) nl 4‘^(“)+i X n E. N TT I Mh,ay 111 j^Zo(a)=l„ m=2 \_i=l N nt E„ N TT / Mh,ay ill )^Zo{a)=k^ .i=l n Oda N „ + EeT zitHa) Y. n '^=1 nl^^H- =ni,N-&i-.N,lm,=Zy\a)+l ■ N n .1=1 A{ki,a) nt- -1"^ ftl Zyya) + 1 zy\a) n E. m=2 N TT I Mh,a) 111 yi ) Zo(a)—lm ,2 .i=l ^ ^ E„ N TT I Mki,ay J-J. i rni j ZQ(a)—kK J=1 n\ Oda. 22 Proof. According to Section 15.21 we have the following integral representation. N n ‘’)=E E 'ff ft “iy A I V 2 . / j I t/[0,i]xN 2_-N I I I N X m — 1 7 1 ^ ^Tn J Now, using this equation in conjunction with the decomposition of Ti.Zo{t)=i used in Section 15.31 we have N 2=1 n. N Zo{t)=l = ^ / ^Zpa)=ki is ancestral * / ^ J[0,t]xN ^ <7G1 N E nn riV + ...+„l:iV =ni,jv-5l:JV,i ”^1 = 1 N} ^ t — a - \-i (f) A^^{ki,a) n, mi J\f{da, dj) m,=i (zW(a)-i?,)! We refer the reader to the proof of Proposition 15.31 for the definitions of I,Bj^ and the event {cr is ancestral}. The definitions of A^'^\k,a) and z'^^a) can be found in the beginning of this section. Now, we take the expectation in the last equality. Thanks to the method used in the proof of Proposition 15.31 we have Er N .i=l N m / J Ej \ / / J 1(T is ancestral K=1 Nda r N ^ n n - =ni,N-&i-.N,l "^1 = 1 ,, L*=l t — a - \-l (^\ =l A^^{ki,a) nt Zlf\a)-Bj X Ea m2=l ■ N n .*=1 A'^'^'i{ki,a) n-L 1. {a)=lm2 E„ N _ 2=1 m Z^{a)=kK M{da, dj) (Z«(a) -Bj]\ 23 where mi ^ a means that mi ^ d ^|l,..., . Now, following, as above, we get E, N n 2=1 N A{ki,t) m ^Zo{t)=l (t) K=1 I ^ ^ is ancestral [0,t]xN =ni:Ar-5l:iV,i mi=Z^^\a)-Bi + l E n ■ N n A{ki,a) .1=1 ^ h+'-'+i (t) X ]J Eo m 2 =2 A{ki,a) ■ N 1 . 1 . \ 77 ^ .i=l V '^'-ma llzo(a)—Zm2 N E„ TT fA{ki,a) i 111 Tjl j^Zo(a)=k. .1=1 ^ * ^Hi>t-a'^j<Nt(^da C{di) Z^\a)-B,)\ Then, the sum with a can be removed since there is no term depending on a. Finally, integrating with respect to C{di) leads to the result. □ Together with Theorems 13.11 and 13.21 and using the joint law of and Zg*^(a) given in (15.6p . these formulas give explicit recursion to compute each factorial moment of the frequency spectrum. Remark 5.6. Although, these formulas are quite heavy, an important interest lies in the method used to compute them. Indeed, this method should work to obtain the joint moments of A{k, t) with any quantity which can be expressed, at any time a, as the sum of contributions of each subtrees. For instance, since Nt=J2K, VaE[0,t], 2=1 where Nl^ is the number of individuals of the i-th subtrees at time a, we are able to compute the joint moments of Nt and {A{k,t))^^.^. For example, using the integral representation (15.ip of A{k,t) and following the proof of Theorem 15.21 , we have that El [A{k,t)Nt] n: (t) = Ei '[0,i]xN ida,di) eE, '[0,t] N, (i) - 1 t—a \ t—a '[0,t] W{tf 1 de Ea [iVa] Ea (^o(«) = k) da + -da / 1 \ k-1 [ 9Et nP ^ ’t—a ho,t] *- -* W{t)J Weia)^ 1 - 1 We{a] da + W{t) Ea [Nalzo{a)=k] dda [ [^-^Zo(^a)=k] '[o,t] W{a) 5.4 Application to the computation of the covariances of the frequency spectrum A quantity of particular interest is the limit covariance between two terms of the frequency spectrum. 24 Proposition 5.7. Suppose that a > 0. Let k and I two positive integers, then, Covt {A {k, t ), A (/, t)) = W[tfckci + o[W (tf) , where Ck '■ = ■-L OO Q^-es 1 - 1 fe -1 ds, Vk e N\{0}. Weisf \: We{s) Proof. In order to show how quantities in Theorem 13.21 can be manipulated, we detail the proof. Using Theorem 13.21 we obtain Et[A{k,t)A{l,t)] = / eEt N, (t) - 1 t—a \ ^t—a t (Pa [Zoia) = k) Ea [A{1, o)] + Pa {Zo{a) = 1) Ea [A{k, a)]) da + I 6EtN^^]^ (Ea [A{l,a)lzo(a)=k] +IEa [A{k,a)lzo(a)=i]) da. /o Recalling, from Proposition 15.11 that Nj:^^ is geometrically distributed with parameter under E. N, (t) t—a wjt) W{a) and Ej N, it) - 1 t—a \ ^t—a = 2 W{tf W{ay 1^\ W{t)) • Since E[A{k,a)lz,{a)=i] <E[A{k,a)]=0{W{a)), it follows by Lemma 12.21 and Theorem 15.21 that Et [A{k, t)A{l, f)] = 2 e^^{Fa{Zo{a) = l)Ea [A{k, a)] + Pa(^o(a) = k)Ea [A{1, a)] }da + O {tW{t)). By Theorem 15.21 and (12.31) . the r.h.s. is equal to The last equality follows from the identity [ g{s)ds [ f{s)ds = [ f{a) [ g{s)dsda + f g{a) [ f{s)dsda. Jo Jo Jo Jo Jo Jo The proof ends thanks to Remark 12.41 25 6 Asymptotic behaviour of the moments of the frequency spectrum In this part, we study the long time behaviour of the moments of the frequency spectrum. From this point and until the end of this work, we suppose that the tree is supercritical, that is a > 0. Proposition 6.1. For any positive multi-integers n and k in , E. ■ N n J=1 A{ki,t) m W{t)\A n\\ N n N -1 rii n<:+o 2=1 2=1 n| —1 ( 6 . 1 ) where the c^. ’s are as defined in Proposition Proof. Step 1: Preliminaries and ideas. The proposition is proved by induction. Using the symmetry of the formula provided by Theorem 13.21 we may restrict to the study of the term ^ = 1 in m- Hence, we want to study ^t-a ( 6 . 2 ) We recall that the terms of the multi-sum in the above formula correspond to the ways of allocating the mutations in the subtrees. The analysis relies on the fact that the growth of each term depends on the repartition of the mutations. In particular, the main term correspond to the case where all mutations are allocated to different subtrees. To capitalize on this fact, let the subset of _i')xAr (1^) (the space of matrices of size X N with coefficients in N), such that each n in A4^(t) satisfies the relation ^ n^ = ni-(5i,i, Vi € N. m=l The notations and n* refer to the multi-integers [n ^^,..., tively. To simplify the analysis, we highlight three cases of interest: Cl := |n G 1 \/i,n\ = 0, Vi > l,Vm >2, < 1, and This set corresponds to the case where all the mutations are taken in different subtrees and are not taken in the tree where a mutation just occurred. In fact, this corresponds to the dominant term of (j6.2p because as tends to be large, the mutations tend to occur in different subtrees. Let also C 2 := I n G I Vi, = 0 \Ci. 26 Finally, let C 3 := < n e yW N, (t) ^ . 1 ^nl >oL i=i ) Step 2: Uniform bound on the number of tuple of mutations in the subtrees. Assuming that the relation of Lemma [6.II is true for any multi-integer n* such that |n*| = |n| — 1, we have N, (t) ■ N n m=l Li=l riE. A{ki,a] uL N, (t) n m=l nil <! 2 = 1 (6.4) (6.5) Since there are at most |n| — 1 multi-integers Hm such that |nm| > 0 (because of the condition (16.,'lj) '). we can assume without loss of generality, up to reordering the indices, that n\^ = 0, for all m > |n|, and so all the terms with m > \n\ in the product of (|6.4I) are equal to one. Hence, n m=l ■ N n . 2=1 A{ki,a] n; ( 6 . 6 ) for some constant Cn depending only on the choice of n in Aijn\- Moreover, since A4|„| is finite, then N. (0 ■ N n A{ki,a) nL m=l L2=1 Step 3: Analysis of Ci. For n G Cl, and in this case only, the product < ClF(a)l”l-\ (6.7) N n 2=1 A{ki,a] nl 7^ '"m has only one term different from 1, and it follows from Theorem 15.21 that n E m=l ■ N n .2 = 1 A{ki,a) nt = W{a |n|-l N n 2=1 a g^-es 1 - Weis) ki ds The corresponding contribution in (16.21) is h ■■= ft j n —ds / 1 :^/.lF(a)H-P.(Zo(f)^.i)n(/^ri- (s)2 Weis) i ds En A(lCard(Ci) da. Now, Card(Ci) is the number of ways we can choose \n\ — 1 subtrees among the — 1 possible subtrees and choosing a way to allocate to each chosen subtree a mutation sizes ki,... ,k]\f, i.e. - 1 Card(Ci) = ' n — \n\ — 1)! 1 27 Finally, h = ft N / 0VF(a)W-ip„(Zo(t) = A:i)TT i=i a Q^-es 1 - Weisy V Weis) ki ni-SiA En ds (0 t—a (H) da, where (a:)(|„|) is the falling factorial of order |n|. Since, is geometrically distributed under Pt with parameter it follows that h = \n ft p-Oa / 1 1 - nliini-S^,i)lJo Weia) ki — l ^ / pa 0Q—0S n m=l Weis)^^ 1 - 1 Weis) + 0 ni—SiA ds da Step 4: Analysis of C 2 . We denote rW /•£ c —a h := Ei / Ail E = ^1) n nGC2 m=l ■ N n . 2=1 Aiki,a) nL da. Now, since Card(C2) = o((Ail) \ n \-2 we have using estimation (16.71) . n\ — l ( 6 . 8 ) h < I El E CWia)\A-^da 11^02 < C ^*(E1)'"' V(a)W-ida, for some positive real constant C. Using that A^l is geometrically distributed with parameter it follows that there exists a positive real number C such that rt / \ UI-1 h < 4 (Ki) Which imply that, l2=0 n| —1 Step 5: Analysis of C 3 . In the case where there is a positive n\ (Ca case), using that E„ N n j Aiki,a, [ j^Zo{a)=ki 2=1 < Ea ■ N n i=l Aiki,a] n 28 we have. which is very similar to the the other steps. This term is because the condition Y2- n| > 0 reduces the number of terms in the multi-sum. Indeed, Card(C3) = E 1 il:JV=0 S.t. Eih>0 «.2^-(t) = E il;JV=0 S.t. J2iji>0 * = 1 E Jl:iV=0 S.t. Eiii>0 nil -1+"i - <5i,i - ji) n—^ (ui 5i^i ji) n*=i {rii - - ji)\ (t) \ I’^l-i-Eii t—a N, Then, the expectation of the last quantity gives a polynomial of degree |n| — 1 in Using the same study as I 2 shows that this part is of order O . Finally, summing over I ends the proof since the leading term is while the rest is a finite sum of O -terms. By Lemma 12.21 Cfc /■t Q^-es 0 k-l ds + O (e-^*) , where 7 is equal to 9 (resp. 2a — 9) in the clonal critical and subcritical cases (resp. supercritical case). Hence, we deduce (16.11) . □ Remark 6.2. Taking the behavior o/P(Zo(a) = k) into account and using the Cauchy-Schwartz inequality for E [A{k,a)lzo(a)=i] one could actually prove that the error term in (16.Ih is of order O (lU(t)H-i) in the clonal sub-critical and super-critical cases, and O (logt in the clonal critical case. 29 Corollary 6.3. We have, conditionally on non-extinction, k>l ^ i^k)k>i distribution where E is an exponential random variable with parameter 1. Proof. Prom Lemma l6.11 we have N lim Et ':^oo Since the finite dimensional law of the process £ {ck)k>i is fully determined by its moments, it follows from the multidimensional moment problem (see m ) and from the fact that the events {Nt > 0} increase to the event of non-extinction, that we have the claimed convergence. □ 7 An elementary proof of the a.s. convergences of the frequency spectrum and the population counting process The goal of this section is to prove in Subsection 17.21 the a.s. convergence of the frequency spectrum. We begin by showing the law of large numbers for Nt. We recall once again that we are in the supercritical case (a > 0). 7.1 Convergence of the popnlation counting process. Assume that a > 0, that is W{t) ~ :^ 7 ^- The goal of this section is to prove the almost sure conver¬ gence of the population counting process. We hrst show that the convergence holds in probability, using the convergence of the process which counts at time t the number of individuals having infinite descent. More formally, recalling that a splitting tree is a subset of M x (Ufc>oN^) (see pTUjh an individual (u, t) in the tree T is said to have inhnite descent at time t if for any T > t there exist u in |Jn>o (T) uu) belong to T. Finally, to obtain the almost sure convergence, we show in Theorem 17.21 that Nt can not fluctuate faster than a Yule process. Proposition 7.1. Let {Nf°, t G M+) be the number of alive individuals at time t having alive de¬ scendant at infinity. Then, under is a Yule process with parameter a. (T) Proof. Let T,t & M+. Recalling that, for T < t, N^ ' is the number of individuals at time t who (T) have alive children at time T, we extend the notation to t > T by setting N^ = 0 in this case. Fix S a positive real number, we consider the quantity. sup nP - Nf° t<s 30 Q (T^) There exists a finite time T* such that ' = N^. This means that the progeny of the all individuals alive at time S who have finite descent are extinct at time T*. Moreover, ' = iV“ for all t < S, since, otherwise, there would exist an individual at time t who has alive descent at time but which do not have an infinite descent. Hence, for all T > , sup ^<5 nP - = 0. In particular, as T —>■ oo, converges to N°° a.s. for the Skorokhod topology of D [0, oo) and is a.s. cadlag. Now, it remains to derive from the law of the process iV°°. Let 0 < si < S 2 < ■ ■ ■ < Sn < T. By a recursive use of Proposition 15.11 we see that, under Pt’, the process , 1 < / < is a time inhomogeneous Markov chain with geometric initial distribution with parameter Ft{H>T\H>T-si), (T) (T) (T) and the law of Ngf ^ given Nsi_\ is the law of a sum of Nsi_\ i.i.d. geometric random variable with parameter Pi = F{H > T - si-i \ H > T - si), (T) i.e. a binomial negative with parameters and 1 — pi. Hence, = mi,...,NP ( i=2 ^ Moreover, we have, by Lemma l‘2.21 nii T TTli—i 1 \ mi—I /\mi — l rui I'H-l /I \r Pi (1-Pi) Pi = W{T-si) „-asi t—)-oo and This leads to. Pi = W{T) W{T-si) IT(T — S;_i) t^oo P. mi,...,NP =mr^ —asi t^oo (1 - PI' ”^i-l lA ^-ami-i(si-si_i) i=2 V / mi-1 Since the right hand side term corresponds to the finite dimensional distribution of a Yule process with parameter a, this concludes the proof. □ As is a Yule process, converges a.s. to an exponential random variable of parameter 1, denoted £ hereafter, when t goes to infinity (see for instance 12 ]). Theorem 7.2. ITe have 1 t^oo £, a.s. and L^, where £ is the a.s. limit of e which is an exponential random variable with parameter one under P^o. 31 Figure 7: Reflected JCCP with overshoot over t. Independence is provided by the Markov property. Proof. We first look at the quantity, Ei First note that can always be written as a sum of Bernoulli trials, Nt Nr = Y.Bf\ (7.1) i=l corresponding to the fact that the ith. individual has infinite descent or not. Now, by construction of the splitting tree, the descent of each individual alive at time t can be seen as a (sub-)splitting tree where the lifetime of the root follows a particular distribution (that is the law of the residual lifetime of the corresponding individual). We denote by Oi the residual lifetime of the ith individual. In particular, these subtrees are dependent only through the residual lifetimes individuals. Hence, the random variables are independents conditionally on the family (Oi)i<i<Ari' addition, the family has independence properties under P^. Lemma 7.3. Under Ft, the family {Oi, i € |l,Wl) forms a family of independent random variables, independent of Nt, and, except Oi, having the same distribution. The proof of this lemma is postponed at the end of this subsection. Hence, it follows that, under Pt, the random variables (Hj'*^)i>j>Arj are independent and identically distributed for i > 2 (in the sense of Remark 1 ^ . Let us denote by pt the parameter of , and by pt the common parameter of the others i.i.d. Bernoulli random variables. It follows from (17.ip that Et [Nn=Pt{W{t)-l)+pt and from the Yule nature of under Pqo (Proposition 17. ip that Eqo [Nf^] = e“h Now, since Eoo wn Ei wn F{Nt>0) P (Non-ex) ’ 32 we have e“* = (pi {W{t)-l)+pt) '{Nt>0) We recall from m that, and P (Non-ex) ' (Non-ex) = E , 'w{t-vy ’ (W > 0) = E W{t) where 1/ is a random variable with law Py (i.e. the lifetime of a typical individual). It then follows, from Lesbegue Theorem that, P (Nt > 0) -1 = 0 e-^M, P (Non-ex) with y = a A j where the constant 7 is given by Lemma 12.21 Hence, Pte-“‘W(t) = 1 + 0 (7.2) (7.3) Now, using (dl), we have Et [N^Nt] = Ei [Nt {Nt - 1)] pt + pANt = 2W{tfpt + O (e“') , (7.4) where the second equality comes from the fact that Nt is geometrically distributed with parameter lT(t)“^ under P*. Recalling also that Nt° is geometrically distributed with parameter under Pqo, it follows that E, {Nr-y'{a)Nty = - 4:'iP'{a)W{typt + + O (e“*) P {Nt > 0) Hence, it follows from (17.3p . (j7.4l) . ()7.2p and Lemma \T2[ that O . (7.5) Let us define now, for all integer n, tn = ^ log n. Then, by the previous estimation, it follows from Borel-Cantelli lemma and a Markov-type inequality that. lim e = 'y'{oi)£, a-s-, (7.6) rL—>oo on the survival event. From this point, we need to control the fluctuation of N between the times i^n)n>i- Th® births can be controlled by comparisons with a Yule process, but the deaths are harder to control. For this, we use that, by (I7.6p . is small, for n large. It then follows that, if the quantity inf se[+,++i] 33 takes very low negative values, then sup SE[tn ,in+l] must take very high positive value. More precisely, Pt„ sup - e-'^^Ns\ > e < P*J sup > e yG[tTi,tn + l] / yG[tTi,tn + l] / + Pt„ ( + sup > e ) y s£[t„,tn+l] J < I sup e-^^Ns - > e ) + P* J sup e-“*-+iiV. , - e-“*iV, > e L SG[tn,tn + l] ^^sG[in,iTi+l] J + Pi„ (e-“‘"iV,„ - > e ) Now, there exists a Yule process Y with parameter b such that Yq = Nt^ and for all s in [0,tri,+i —tn], Nt„ - Ns < Ys-t„ - Yo, a.s. (7.7) This Yule process can be constructed from the population at time tn by extending the lifetimes of all individuals to infinity, and constructing births from the same Poisson process as in the splitting tree. This leads to Pr sup . SG [tn lin + l] 3 Nt„ - > e < Pt„ sup Ys_tn -Yq> € e I SG [tn jin-f l] atn + Pi„ sup - Ys.tn > e + Pt„ > e ) VG[4n,4n+l] / < 2 {Yt„^, -Yt„>e e“‘") + Pt„ > e ) . Since Markov inequalities are not precise enough to go further, we need to compute exactly the probability, {Ytn+,-tn -Yo>e e“*") . From the branching and Markov properties, — Yq is a sum of a geometric number, with parameter W{tn)~^, of independent and i.i.d. geometric random variables supported on Z+ with parameter ^-Htn+i-tn)_ Hence, Yt^_^-^-t„ —Yq is geometric supported on Z+ with parameter g ^{^n + l tn'} W{tn) (l - e-MWi-in) (l - ’ and, we have ^tn (^„ + l-4n ^0 Y /c) 1 W{tn) — l) + 1 k 34 Using W{tn) = O = O (nTy W{tn) - l) = O . Pin {Yt„^i-tr. -Yo> ee“‘-) < (l- V l + Cn2/3 V for some positive real constant C. Borel-Cantelli’s Lemma then entails we have Finally, lim sup |e e “^A^s|=0, almost surely, se[t„,t„+i] which ends the proof of the almost sure convergence. Now, for the convergence in L^, we have that < 2 Et g- 2 at [a)Ntf + 2 Et - sy The first term in the right hand side of the last inequality converges to 0 according to (|7.5I) . For the second term, since and £ vanish on the extinction event, we have lim Ef t^OO ^e-»tN^_£y = lim Ec t^OO - £y The conclusion comes from the fact that (e t G M+) is a martingale uniformly bounded in L 2 . □ In the preceding proof, we postponed the demonstration of the independence of the residual lifetimes of the alive individuals at time t. We give its proof now, which is quite similar to the Proposition 5.5 of m- Proof of Lemma \ 7. 4 Let a family of independent Levy processes with Laplace ex¬ ponent V'(x) = X — (l — e '''^) A(dr), x G M+, J (0,oo] conditioned to hit (t, oo) before 0, for i G {0,..., iV^ — 1}, and conditioned to hit 0 before (t, oo) for i = Nt- We also assume that. and y(o) =t^V, iG {!,...,W}. Now, denote by Ti the exit time of the ith process of (0, t) and n—1 Tn = '^Ti, n G {0,..., w + 1} • i=0 35 Then, the process defined for all s € [OjTatJ by Nt i=0 has the law of the contour process of a splitting tree cut under t. Moreover, the quantity ' — Y^._ is the lifetime of the ith alive individual at time t. The family of residual lifetime has then the same distribution as the sequence of the overshoots of the contour above u. Thus, the independence of the Levy processes ensures us that (Oj, i € |2, A^tl) is an i.i.d family of random variables, and that Oi is independent of the other Oj’s. □ 7.2 Convergence of the frequency spectrum We end up this section with the almost sure convergence of the frequency spectrum. Theorem 7.4. We have, £ a.s. and in L^, t^oo '' ’ where £ is the same random variable as in Theorem \ 7. 4 and was defined in Proposition Proof. Using (j5.7p and the bound E [Nalzo(a)=k] Y E [Na], it follows that E. {ckNt-A{k,t)Y =2W{tf roo Q^-ea Weia) 1 - Weia) k-l da] +OiWit)). Finally, it follows from Lemma 12.21 that E, (ckNt - Aik,t)Y ~ Ce-Y^ . t^OO where 7 is equal to 6 (resp. 2 a — 9) in the clonal critical and sub-critical cases (resp. supercritical case). From this point we follow the proof of Theorem l7.2l except that the Yule process used in (17.7p must be replaced by another Yule process corresponding to the a binary hssion every time an individual experiences a birth or a mutation, i.e. the new Yule process has parameter b + 9. Indeed, the process Aik,t) can make a positive jump only in two cases: the first corresponding to the birth of an individual in a family of size k — 1, the other one correspond to a mutation occurring on an individual in a family of size A: -|- 1. □ Acknowledgement This work was partly founded by Federation Charles Hermite. The authors also wish to warmly thank anonymous referees for their many valuable comments and their careful reading. 36 References [1] David Aldous and Lea Popovic. A critical branching process model for biodiversity. Adv. in Appl Probab., 37(4):1094-1115, 2005. [2] K. B. Athreya and P. E. Ney. Branching processes. Dover Publications, Inc., Mineola, NY, 2004. Reprint of the 1972 original [Springer, New York; MR0373040]. [3] Jean Bertoin. The structure of the allelic partition of the total population for Galton-Watson processes with neutral mutations. Ann. Probab., 37(4): 1502-1523, 2009. [4] Nicolas Champagnat and Amaury Lambert. Splitting trees with neutral Poissonian mutations I: Small families. Stochastic Process. Appl., 122(3):1003-1033, 2012. [5] Nicolas Champagnat and Amaury Lambert. Splitting trees with neutral Poissonian mutations II: Largest and oldest families. Stochastic Process. Appl., 123(4):1368-1414, 2013. [6] Nicolas Champagnat, Amaury Lambert, and Mathieu Richard. Birth and death processes with neutral mutations. Int. J. Stoch. Anal., pages Art. ID 569081, 20, 2012. [7] D. J. Daley and D. Vere-Jones. An introduction to the theory of point processes. Vol. II. Prob¬ ability and its Applications (New York). Springer, New York, second edition, 2008. General theory and structure. [8] Warren J. Ewens. Mathematical population genetics. I, volume 27 of Interdisciplinary Applied Mathematics. Springer-Verlag, New York, second edition, 2004. Theoretical introduction. [9] J. Geiger and G. Kersting. Depth-first search of random trees, and Poisson point processes. In Classical and modern branching processes (Minneapolis, MN, 1994), volume 84 of IMA Vol. Math. Appl., pages 111-126. Springer, New York, 1997. [10] Jochen Geiger. Size-biased and conditioned random splitting trees. Stochastic Process. Appl., 65(2):187-207, 1996. [11] R. C. Griffiths and Anthony G. Pakes. An infinite-alleles version of the simple branching process. Adv. in Appl. Probab., 20(3):489-524, 1988. [12] Benoit Henry. Cits for general branching processes related to splitting trees. Preprint available at http://arxiv.org/abs/1509.06583. [13] Peter Jagers. Convergence of general branching processes and functionals thereof. J. Appl. Probability, 11:471-478, 1974. [14] Peter Jagers and Olle Nerman. The growth and composition of branching populations. Adv. in Appl. Probab., 16(2):221-259, 1984. [15] Peter Jagers and Olle Nerman. Limit theorems for sums determined by branching and other exponentially growing processes. Stochastic Process. Appl., 17(1):47-71, 1984. 37 [16] Olav Kallenberg. Random measures. Akademie-Verlag, Berlin; Academic Press, Inc., London, fourth edition, 1986. [17] Christian Kleiber and Jordan Stoyanov. Multivariate distributions and the moment problem. J. Multivariate Anal, 113:7-18, 2013. [18] Andreas E. Kyprianou. Fluetuations of Levy processes with applieations. Universitext. Springer, Heidelberg, second edition, 2014. Introductory lectures. [19] Amaury Lambert. The contour of splitting trees is a Levy process. Ann. Probab., 38(l):348-395, 2010 . [20] Paul-A. Meyer. Probability and potentials. Blaisdell Publishing Co. Ginn and Co., Waltham, Mass.-Toronto, Ont.-London, 1966. [21] Olle Nerman. On the convergence of supercritical general (C-M-J) branching processes. Z. Wahrsch. Verw. Gebiete, 57(3):365-395, 1981. [22] Mathieu Richard. Arbres, Proeessus de branchement non Markoviens et processus de Levy. These de doctorat, Universite Pierre et Marie Curie, Paris 6. [23] Z. Taib. Branching processes and neutral mutations. In Stoehastic modelling in biology (Hei¬ delberg, 1988), pages 293-306. World Sci. Publ., Teaneck, NJ, 1990. 38