Turnover Rate of Popularity Charts in Neutral Models 



T.S. Evans 1,2 and A. Giometto 1
1 Institute for Mathematical Sciences, Imperial College London, London, SW7 2PG, UK
2 Theoretical Physics, Imperial College London, SW7 2AZ, UK






It has been shown recently that in many different cultural phenomena the turnover rate of the
most popular artefacts in a population exhibits some regularities. A very simple expression for
this turnover rate has been proposed by Bentley et al. [10] and its validity in two simple models
for copying and innovation is investigated in this paper. It is found that Bentley's formula is an
approximation of the real behaviour of the turnover rate in the Wright-Fisher model, while it is not
valid in the Moran model.



I. INTRODUCTION 

In genetics, neutral models in which the different variations provide no intrinsic advantage
play a central role. Classic examples are the Wright-Fisher and Moran models [H, which are
Markov processes. It is not surprising that they are also particular limits of other statistical
physics models such as urn models and zero range processes [3, 4] or network rewiring models
[H, [3 (see [12] for further examples of these models). One of the most interesting applications
is to cultural transmission. In this case the popularity of 'artefacts' with no intrinsic value
(uniform fitness) varies because individuals change their choice of artefact either by copying
the artefact chosen by another individual (inheritance) or by innovating, that is by choosing a
new artefact (mutation). Under this hypothesis a certain cultural trait becomes more popular
than others simply through imitation and not because of an intrinsic benefit it provides.
Despite their simplicity, these models can reproduce some of the features of real data sets,
such as Neolithic pottery [5, 9], popular music charts [6], baby names [7, 9], patents [9] and
dog breeds [8].

Much is known about these neutral models, including many exact results [1, 2]. However the
context of cultural transmission throws up new and unanswered questions, since they are of
little practical use in other applications. In this paper we study popularity charts, the list
of the $y$ most popular artefacts at any one time, and ask how many artefacts enter or leave
this list each time the chart is updated, the turnover rate $z$. This is motivated by the work
of Bentley et al. [10] who find that in the Wright-Fisher model the turnover rate $z$ of the
top $y$ chart is $z = \sqrt{\mu} \cdot y$, where $\mu$ is the innovation rate. This result is
interesting because the turnover rate $z$ is independent of the population size $N$ and the
square root dependence on $\mu$ is reminiscent of a random walk process that might be addressed
with a theoretical analysis. It is of practical use as sometimes we only have access to data on
the most popular artefacts and we may not have a useful sample of the whole population. In such
situations it can be used to provide estimates of the model parameters from a data set [11].
The aim of this paper is to perform a comprehensive study of the turnover rate in the
Wright-Fisher and Moran models.

[Figure 1: diagram of six individuals, each either copying another individual or innovating
between times t and t+1, with the resulting top 3 chart (A, B, C at time t; A, B, D at time t+1).]



FIG. 1. A simple representation of the Wright-Fisher model. The artefacts are labelled by
letters. In this example two successive time steps are shown for a population of six individuals
and the top three chart with a turnover of two ($N = 6$, $y = 3$, $z = 2$).



II. THE WRIGHT-FISHER MODEL 



The model investigated here and in [10] is one of the family of Wright-Fisher models and is
illustrated in Fig. 1. In it each of $N$ individuals is characterised by an artefact of no
intrinsic value (e.g. a brand of shoes, a dog breed, a name, etc.). At each time step all
individuals in the population are simultaneously assigned a new artefact. With probability
$(1 - \mu)$ an individual will copy the artefact chosen at the previous time step by an
individual selected uniformly at random. Otherwise, with probability $\mu$, an individual
innovates by choosing a new artefact.

The analytical solutions of this model [3] show that the frequency of artefacts in a population
is typically a power law with a cutoff (at least for $N \gg \mu N > 1$) and this has been fitted
to data on the frequency of various modern cultural variants 043-






Definition of turnover 

The turnover $z$ in the top $y$ chart, the list of the $y$ most popular artefacts, is defined
as the number of artefacts exiting the top chart plus the number of new artefacts entering the
top chart at the same time step. This definition of turnover is slightly different from that of
[10], where it is defined as the number of new artefacts that enter the top $y$ chart relative
to the previous time step. In most situations the two definitions differ by a factor of 2. Our
definition is more informative for situations where an artefact exits the top chart by becoming
extinct with no new artefact entering it: in this configuration we have a turnover $z = 1$,
while the definition in [10] would have $z = 0$. In our notation the result of [10] is that

$$ z = 2 \sqrt{\mu} \cdot y \tag{1} $$

Bentley et al. [10] find this through numerical analysis of this Wright-Fisher model and also
find support for this form in data for baby names and dog breeds.
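The definition of turnover used here can be written as a short function. The following is a minimal Python sketch, not the authors' code; the function name `turnover` and the representation of a chart as a list of labels are assumptions made for illustration.

```python
def turnover(previous_chart, current_chart):
    """Return z: artefacts leaving the chart plus artefacts entering it."""
    previous, current = set(previous_chart), set(current_chart)
    exits = len(previous - current)    # in the old chart, not in the new one
    entries = len(current - previous)  # in the new chart, not in the old one
    return exits + entries


# Fig. 1 example: C leaves the top 3 and D enters it.
z_fig1 = turnover(["A", "B", "C"], ["A", "B", "D"])

# An artefact goes extinct and nothing replaces it in the chart.
z_extinction = turnover(["A", "B", "C"], ["A", "B"])
```

With this definition the Fig. 1 example gives a turnover of two, and the extinction case gives one, whereas counting entries only would give zero for the latter.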



Simulations of the model 

During one simulation the model starts with every individual assigned a unique artefact and is
first updated $\tau$ times. After reaching a steady state, the frequency of every artefact in
the population is computed at each time step and the top $y$ chart is built using the quicksort
algorithm for the next $T$ steps. The temporal average $\bar{z}$ of the turnover rate $z$ is
then computed by comparing successive top $y$ charts and recorded. To perform an ensemble
average the model is rerun $E$ times¹ and the ensemble average $\langle \bar{z} \rangle$ of
$\bar{z}$ is computed. This is the estimate of the turnover rate $z$ that is stored for further
analysis, along with an estimate of the standard deviation of this measurement.
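The simulation procedure can be sketched in a few lines of Python. This is an illustrative single-run version under stated assumptions, not the code used for the results: it uses `collections.Counter.most_common` instead of an explicit quicksort, breaks ties in popularity arbitrarily, and omits the ensemble average. All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import count


def wright_fisher_step(population, mu, new_labels, rng):
    """Return the next generation: each individual innovates or copies."""
    n = len(population)
    return [
        next(new_labels) if rng.random() < mu   # innovate with probability mu
        else population[rng.randrange(n)]       # copy a choice from the previous step
        for _ in range(n)
    ]


def top_chart(population, y):
    """Labels of the (at most) y most popular artefacts."""
    return {label for label, _ in Counter(population).most_common(y)}


def mean_turnover(n, mu, y, burn_in, steps, seed=0):
    """Time-averaged turnover over `steps` updates after `burn_in` updates."""
    rng = random.Random(seed)
    new_labels = count()
    population = [next(new_labels) for _ in range(n)]  # all artefacts distinct
    for _ in range(burn_in):
        population = wright_fisher_step(population, mu, new_labels, rng)
    total, chart = 0, top_chart(population, y)
    for _ in range(steps):
        population = wright_fisher_step(population, mu, new_labels, rng)
        new_chart = top_chart(population, y)
        total += len(chart ^ new_chart)  # exits plus entries
        chart = new_chart
    return total / steps


# Example run with burn_in = 4 / mu and steps = 50 + 1 / mu
z_bar = mean_turnover(n=200, mu=0.02, y=10, burn_in=200, steps=100)
```

Repeating the call with different seeds and averaging the results would give an estimate of the ensemble average.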

We started our simulations from a configuration where all the individuals had a different
artefact. We checked that our simulations had reached a steady state by studying

$$ F_2(t) = \sum_{k} \frac{k(k-1)}{N(N-1)} \langle n(k,t) \rangle \tag{2} $$

where $n(k,t)$ is the number of artefacts chosen by $k$ individuals at time $t$ and the symbols
$\langle \ldots \rangle$ indicate an ensemble average. This quantity can be shown analytically
to evolve in time as

$$ F_2(t) = F_2(\infty) + [F_2(0) - F_2(\infty)] \cdot (\lambda_2)^t \tag{3} $$

with $\lambda_2 = (1 - \mu)^2 (N - 1)/N$ @ so that $\tau^{-1} \sim |\ln(\lambda_2)|$ and hence
$\tau < \min((2\mu)^{-1}, N)$ (a similar result holds for all eigenvalues). Given the values
used in our simulations we chose $\tau = 4\mu^{-1}$ and ran for $T = 50 + \mu^{-1}$ time steps
(50 time steps were added to ensure a minimum number of time steps even for large values of
$\mu$).

¹ $E$ is chosen so that the error on $z$ is not larger than 10%. To reduce computational times
the model is run for $\tau$ time steps to reach a steady state only once. Successive iterations
start from the last configuration of the previous ones. This procedure does not affect the
results because the system is in a steady state.
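The steady-state check can be illustrated numerically. The sketch below, with hypothetical function names, evaluates the single-run quantity whose ensemble average is $F_2(t)$, the eigenvalue $\lambda_2$ quoted above, and the relaxation time defined through $\lambda_2^t = e^{-t/\tau}$ (that last definition is an assumption about how the time scale is read off).

```python
import math
from collections import Counter


def f2_single_run(population):
    """Sum over artefacts of k(k-1) / (N(N-1)), with k the number of holders."""
    n = len(population)
    return sum(k * (k - 1) for k in Counter(population).values()) / (n * (n - 1))


def lambda2_wright_fisher(n, mu):
    """Eigenvalue controlling the relaxation of F_2 in equation (3)."""
    return (1 - mu) ** 2 * (n - 1) / n


def relaxation_time(lam):
    """Time scale tau such that lam**t == exp(-t / tau)."""
    return -1.0 / math.log(lam)


tau = relaxation_time(lambda2_wright_fisher(n=773, mu=0.02))
bound = min(1 / (2 * 0.02), 773)
```

For these illustrative values `tau` comes out below `bound`, and the burn-in of $4\mu^{-1} = 200$ steps is several relaxation times long.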



Analysis 

Motivated by [10] we fitted our data for $z$ as a function of $\mu$, $y$ and $N$ to the
following form

$$ z = d \cdot \mu^a y^b N^c \tag{4} $$

We looked at around 6000 different parameter values taken from the ranges
$\mu \in [5 \cdot 10^{-5}, 0.115]$, $y \in [2, 1411]$ and $N \in [180, 3993]$ with the
constraint that $y < N$. This greatly extends the range of values studied in [10], which come
from $\mu \in [2 \cdot 10^{-4}, 0.02]$, $y \in [5, 50]$ and $N \in [500, 4000]$. Estimates for
the coefficients $a$, $b$, $c$ and $d$ in equation (4) were obtained using a linear fit to the
data for $\ln(z)$.

It is found that the turnover rate $z$ exhibits two different behaviours in the two regions
$N\mu < 0.15 \cdot y$ and $N\mu > 0.15 \cdot y$, as can be seen in figure 2, with the
transition between these two behaviours occurring around $N\mu \approx 0.15 \cdot y$.
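Taking logarithms turns the power-law form (4) into a linear model, so the coefficients can be estimated by ordinary least squares. The sketch below is an unweighted, standard-library illustration of that idea; it is not the fitting routine used for the results, and the function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_power_law(points):
    """Least-squares fit of ln z = a ln mu + b ln y + c ln N + ln d.

    `points` is a sequence of (mu, y, N, z) tuples; returns (a, b, c, d).
    """
    rows = [(math.log(mu), math.log(y), math.log(n), 1.0) for mu, y, n, _ in points]
    target = [math.log(z) for *_, z in points]
    # Normal equations: (X^T X) beta = X^T ln z
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xtz = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(4)]
    a, b, c, ln_d = solve_linear(xtx, xtz)
    return a, b, c, math.exp(ln_d)


# Synthetic check: data generated exactly from z = 2 * sqrt(mu) * y
synthetic = [(mu, y, n, 2.0 * mu**0.5 * y)
             for mu in (0.001, 0.01, 0.1)
             for y in (5, 10, 50)
             for n in (500, 1000, 4000)]
a, b, c, d = fit_power_law(synthetic)
```

On the synthetic data the fit recovers exponents close to 0.5, 1 and 0 and a prefactor close to 2, which is a useful sanity check before applying it to simulation output.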




[Figure 2: plot with reference lines labelled "Slope 0.5" and "$N\mu/y = 0.15$".]

FIG. 2. Existence of a critical point $N\mu \approx 0.15 \cdot y$. Plotted curves are for fixed
values of $y$ and $N$. Error bars are smaller than symbols.



Region $N\mu < 0.15 \cdot y$:

The observed behaviour of the turnover rate $z$ in this region is $z \propto \mu$ (see figure 2
and figure 3). Fitting the data points in this region with the functional form (4), the
following set of values for the fitting parameters is obtained:

$$ a = 0.99999(4), \quad b = -0.00004(7), \quad c = 1.0003(2), \quad d = 1.997(2), \tag{5} $$

that is to say:

$$ z = 2 \cdot N\mu \tag{6} $$

within two standard deviations. Note that in this region $z$ is independent of the top chart
size $y$. This is best seen in figure 3, where the turnover rate is plotted against the product
$N\mu$.

[Figure 3: plot against $N\mu$ for $N = 773$, $y = 16$; $N = 773$, $y = 60$; $N = 447$,
$y = 16$; $N = 447$, $y = 60$, with reference lines of slope 1 and slope 0.5.]

FIG. 3. Existence of a critical point $N\mu \approx 0.15 \cdot y$. Plotted curves are for fixed
values of $y$ and $N$. In the region $N\mu < 0.15 \cdot y$ all curves collapse onto
$z = 2N\mu$. Error bars are smaller than symbols.

[Figure 4: plot of frequency against the $n$-th most popular variant.]

FIG. 4. Average frequency of artefacts (from the most popular one to the least popular). In the
steady state there are no more than 11 artefacts in the population. With $N = 180$,
$\mu = 0.001$ and $y = 20$ we have $N\mu \ll 0.15 \cdot y$ and the number of artefacts in the
population (11) is smaller than the top chart size $y$ (20).

This behaviour of the turnover rate $z$ can be explained in the following way: for
$N\mu \ll y$ the average number of new artefacts that enter the population in one time step is
lower than the top chart size $y$. In this configuration only a very small number of artefacts
survive in the population in the steady state. It is then likely that the total number of
artefacts at a certain time $t$ is lower than the top chart size $y$, as is confirmed by
observations (see figure 4). In one time step, then, the $N\mu$ new artefacts (on average)
introduced in the previous time step (which are in the top $y$ chart) are extinguished through
copying, while on average $N\mu$ new artefacts enter the population and the top chart through
innovation: the turnover rate $z$ is then equal to $2 \cdot N\mu$. This mechanism is
illustrated in figure 5.



[Figure 5: symbol diagram of two successive generations, with artefact counts (23, 7, 1) and
(22, 8, 1), and the corresponding top 20 charts.]

FIG. 5. A population of different symbols, the shape being the artefact. With $N\mu \ll y$ only
a few artefacts survive in the population and the top chart has empty spots. New artefacts
introduced through innovation (on average $N\mu = 1$ per generation) enter the top $y$ chart
but are extinguished in one time step, producing a turnover equal to $2 \cdot N\mu$.

Region $N\mu > 0.15 \cdot y$:

This is the region studied in [10] and we also find that the dependence of the turnover rate
$z$ on the innovation rate $\mu$ is very roughly $z \propto \mu^{0.5}$, as can be seen in
figure 2. However we have also fitted the data in this region to the same functional form
$z = d \cdot \mu^a y^b N^c$ using a linear fit to the logarithm of our parameters and $z$. The
resulting values for the fitting parameters $a$, $b$, $c$ and $d$ are:

$$ a = 0.550(2), \quad b = 0.860(1), \quad c = 0.130(2), \quad d = 1.38(2). \tag{7} $$

These values are not statistically compatible with the proposed form in equation (1), for which
$a = 1/2$, $b = 1$, $c = 0$ and $d = 2$. The dependence on $\mu$ and $y$ is not so far off that
proposed in [10] ($a$ and $b$ are 10% and -14% off the values in (1)), so the practical
difference in studying a real data set may be minimal. However we find a significant dependence
on $N$: even with this small power $c$ we have a 50% variation in $z$ over the range
$N \in [180, 3993]$. This dependence on the population size $N$ is clearly seen in figure 6.
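The two regimes and the size of the $N$ dependence can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, the sharp switch at $N\mu = 0.15 \cdot y$ is a simplification of what is really a gradual transition, and neither branch should be trusted close to it.

```python
def wright_fisher_turnover(mu, y, n):
    """Empirical turnover estimate for the Wright-Fisher model."""
    if n * mu < 0.15 * y:
        return 2 * n * mu                           # equation (6)
    return 1.38 * mu**0.550 * y**0.860 * n**0.130   # fit values (7)


# Low-innovation example with the parameters of Fig. 4
z_low = wright_fisher_turnover(mu=0.001, y=20, n=180)

# Relative change of the fitted form across the simulated range of N
# at fixed mu and y: (3993 / 180) ** c with c = 0.130
ratio = (3993 / 180) ** 0.130
```

The ratio comes out close to 1.5, which is the 50% variation quoted in the text.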




[Figures 6 and 8: plots for $y = 21$, $N = 537$; $y = 21$, $N = 1337$; $y = 21$, $N = 3993$;
$y = 102$, $N = 2773$; $y = 102$, $N = 3993$. Horizontal axis of figure 8: $2\mu^{1/2} y$.]

FIG. 8. Turnover rate $z$ vs $2\sqrt{\mu}\,y$ for the Wright-Fisher model in the region
$N\mu > 0.15 \cdot y$. Error bars are smaller than symbols.

FIG. 6. Turnover rate $z$ vs $\mu$ for the Wright-Fisher model in the region
$N\mu > 0.15 \cdot y$. There is a clear dependence of $z$ on the population size $N$. Error
bars are smaller than symbols.



As a final illustration we plot the turnover rate $z$ against the form (4) using our best fit
values (7) to create a data collapse. As can be seen in figure 7, the collected data lie on the
diagonal $z = d \cdot \mu^a y^b N^c$. The same plot using the form suggested in [10] is
presented in figure 8.



[Figure 7: plot of $z$ against $d \cdot \mu^a y^b N^c$ for $y = 21$, $N = 537$; $y = 21$,
$N = 1337$; $y = 21$, $N = 3993$; $y = 102$, $N = 2773$; $y = 102$, $N = 3993$.]



Residuals 

Residuals from the fit are shown in figure 9. While the peaks for low values of the product
$N\mu$ are related to data near the critical point $N\mu \approx 0.15 \cdot y$, there is an
evident systematic deviation from the proposed functional form (4) for $N\mu > 0.15 \cdot y$.



[Figure 9: residuals plotted against $N\mu$ for $N = 1337$, $N = 2311$ and $N = 3993$.]

FIG. 9. Residuals from equation (4) vs $N\mu$. There is a systematic deviation from the
proposed functional form. The deviations for small $N\mu$ are related to data near the critical
region $N\mu \approx 0.15 \cdot y$.



FIG. 7. Turnover rate $z$ vs $d \cdot \mu^a y^b N^c$ for the Wright-Fisher model in the region
$N\mu > 0.15 \cdot y$. Values used were $a = 0.550$, $b = 0.860$, $c = 0.130$ and $d = 1.38$,
the best fit values found in (7). Error bars are smaller than symbols.



The minimum point of the residual curves lies at $N\mu \approx y$ for all values of $N$ and $y$
and increases in absolute value with increasing population size $N$, causing a systematic drift
from equation (4).

We tried many different functional forms and the best fits we found were obtained with the
following:



Res($\mu$, $y$, $N$) = R a

$R_{\min}(y, N) = A \cdot y + B \cdot N$



where $R_{\min}$ is the value of the residual at the minimum. The best fit values for the
parameters are $R_0 = 0.062(4)$, / = 0.0096(4), $A = 0.00034(5)$ and $B = -0.0000193(8)$.
Combining this expression for the residuals with the original form (4), the following function
might be used to describe the behaviour of the turnover rate $z$ of the Wright-Fisher model in
the region $N\mu > 0.15 \cdot y$:

$$ z = d \cdot \mu^a y^b N^c \cdot [1 + \mathrm{Res}(\mu, y, N)] \tag{8} $$

Residuals from equation (8) are plotted in figure 10. The functional form (8) can now reproduce
the data within 2% for $N\mu$ above $0.15 \cdot y$ and there is no systematic drift at
$N\mu \approx y$ for increasing values of $N$.

However this form (8) is not very satisfactory because of the high number of parameters used to
fit the data and because the range of applicability of equation (8) is strongly limited, due to
the negative value of $B$.

[Figure 10: residuals plotted against $N\mu$ for $N = 1337$, $N = 2311$ and $N = 3993$.]

FIG. 10. Residuals from equation (8) vs $N\mu$. No systematic deviation from the proposed
functional form (8) can be identified. The deviations for small $N\mu$ are related to data near
the critical region $N\mu \approx 0.15 \cdot y$.



Large populations 

The numerical studies of the Wright-Fisher model in [10] have only a small number of
individuals, $N < 4000$, whereas the real data sets are often much larger: e.g. around $10^6$
births per year in [7] and $6 \times 10^5$ dog breed registrations per year in [8]. The number
of albums released in the US (data used in [6, 11]) is hard to estimate, but the Recording
Industry Association of America has suggested that major labels release around 7,000 new CDs
every year.

Unfortunately all data sets to which this model has been applied involve populations of
millions of individuals, which would take too long to simulate. The biggest population size
that we have been able to simulate is of the order of 100000 individuals. Results are listed in
table I.





mu       y     N        z         z_[10]/z   z_small/z   z_large/z
0.0012   200   100000   11.9(1)   1.16       1.22        1.06
0.0024   200   100000   17.6(1)   1.11       1.21        1.05
0.0012   400   100000   24.3(1)   1.14       1.09        0.95
0.0024   400   100000   33.3(2)   1.18       1.16        1.03
0.0012   200   120000   12.1(1)   1.15       1.23        1.06
0.0024   200   120000   18.0(1)   1.09       1.21        1.05
0.0012   400   120000   23.8(1)   1.16       1.13        0.99
0.0024   400   120000   33.6(2)   1.17       1.18        1.03
0.0012   200   144000   12.2(1)   1.14       1.25        1.07
0.0024   200   144000   18.3(1)   1.07       1.22        1.05
0.0012   400   144000   23.6(1)   1.17       1.17        1.02
0.0024   400   144000   34.0(2)   1.15       1.19        1.04

TABLE I. Simulation results for large population size $N$. The last three columns give the
ratio of one of the fitted forms $z = d \cdot \mu^a y^b N^c$ to the measured value of the
turnover $z$: respectively $z = 2\sqrt{\mu}\,y$ of [10] and (1), $z_{small}$, the best fit for
small $N$ of (7), and finally $z_{large}$, the best fit for large $N$ of (9). All results and
best fits are for the $\mu N > 0.15 y$ region.

A surprising result is that for the data in table I the simple form suggested in [10],
$z = 2\sqrt{\mu}\,y$, is a better estimate of the turnover rate (typically a 15% overestimate)
than our best fit to $z = d \cdot \mu^a y^b N^c$ of (7), obtained using the data for small $N$
but large $y$ (typically around a 20% overestimate). A factor that might produce this
unexpected result is that in all real situations where the model has been tested, and in the
data in table I, the size of the top chart is much smaller than the population size:
$y \ll N$. We have therefore performed a new data collection in this region (and above the
critical point $N\mu \approx 0.15 \cdot y$).
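The comparison in table I can be reproduced for a single row with a short calculation. The sketch below uses the first row of the table and the fit values (7) and (9) as quoted in this paper; the variable names are illustrative.

```python
import math

# First row of Table I
mu, y, n, z_measured = 0.0012, 200, 100_000, 11.9

z_simple = 2 * math.sqrt(mu) * y                     # equation (1)
z_small = 1.38 * mu**0.550 * y**0.860 * n**0.130     # fit values (7), small N
z_large = 1.79 * mu**0.558 * y**0.879 * n**0.091     # fit values (9), large N

# Ratio of each fitted form to the measured turnover
ratios = [z / z_measured for z in (z_simple, z_small, z_large)]
```

The three ratios come out close to the 1.16, 1.22 and 1.06 listed in the first row, that is, each fitted form overestimates the measured turnover for this parameter set.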

Approximately 1200 data points have been collected in the range $\mu \in [0.001, 0.13]$,
$y \in [5, 56]$, $N \in [1000, 13000]$ subject to the constraints that $N\mu > y$ (to avoid
data from the critical region $N\mu \approx 0.15 \cdot y$) and $y < N/100$. The resulting
values for the fit $z = d \cdot \mu^a y^b N^c$ (4) are

$$ a = 0.558(1), \quad b = 0.879(1), \quad c = 0.091(1), \quad d = 1.79(2). \tag{9} $$

We plot the turnover rate $z$ against the form (4) using our best fit values (9) to create a
data collapse. As can be seen in figure 11, the collected data lie on the diagonal
$z = d \cdot \mu^a y^b N^c$. The changes in the coefficients $a$ and $b$ for this large $N$
data of (9) are statistically significant but not especially large. Using them to derive values
for $\mu$ and $y$ from a real data set is not likely to cause difficulties. The problem lies
with the dependence on $N$, as the power $c$ in $N^c$ varies significantly and so leads to
large differences in numerical estimates from the fitted forms. These values for large $N$ in
(9) are again not compatible with the simple form (1) proposed in [10]. However the form
$z = d \cdot \mu^a y^b N^c$ using the values derived from large $N$ (9) now reproduces the data
in table I for very large $N$ within 6%. It should be noted, though, that in the
$N \in [1000, 13000]$ region used for the results (9) the dependence of the turnover rate $z$
on the population size $N$ is weaker than in the first fit performed. This might suggest a
logarithmic dependence of the turnover rate $z$ on the population size $N$, as for $c \ll 1$:
$N^c \approx 1 + c \cdot \ln N$. Fits with this form work just as well as the one with the
$N^c$ dependence.

[Figure 11: plot of $z$ against $d \cdot \mu^a y^b N^c$ for $y = 25$, $N = 2657$; $y = 25$,
$N = 6265$; $y = 51$, $N = 12100$; $y = 74$, $N = 12100$; $y = 106$, $N = 12100$.]

FIG. 11. Turnover rate $z$ vs $d \cdot \mu^a y^b N^c$ for the Wright-Fisher model in the region
$N\mu > y$, $y < N/100$. Values used were $a = 0.558$, $b = 0.879$, $c = 0.091$ and
$d = 1.79$, the best fit values found in (9). Error bars are smaller than symbols.

[Figure 12: diagram of six individuals at times t, t+1 and t+2, with one individual updated per
step (a copy, then an innovation), and the top 3 chart at each time.]

FIG. 12. A simple representation of the Moran model. Shown are two successive time steps for a
population of six individuals and the top chart at each time step.
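The logarithmic form follows from a standard expansion of the power law, written out here for reference:

```latex
N^{c} = e^{c \ln N}
      = 1 + c \ln N + \tfrac{1}{2} (c \ln N)^{2} + \dots
```

Truncating after the linear term is accurate only when $c \ln N \ll 1$. As a rough guide, for $c = 0.091$ and $N = 13000$ one has $c \ln N \approx 0.86$, so over the simulated range the two forms are not identical, even though they may fit the data comparably well.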

III. THE MORAN MODEL 

In the Wright-Fisher model at each time step all individuals are assigned a new artefact
simultaneously. While this can be a good update rule in some situations, it is certainly
reasonable to think that cultural transmission and copying between individuals might occur in a
more gradual way, with individuals changing taste one at a time.

This behaviour can be introduced in the model by changing the update rule so that at each time
step only one randomly chosen individual is assigned a new artefact, either by copying it from
another individual in the population (chosen at random) or, with probability $\mu$, by
inventing a new one. This model is known as the "Moran model" 0.
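A single Moran update can be sketched as follows. This is an illustrative version with hypothetical names; it assumes the copied individual is drawn uniformly from the whole population, so an individual may occasionally copy itself.

```python
import random
from itertools import count


def moran_step(population, mu, new_labels, rng):
    """Update one randomly chosen individual in place."""
    n = len(population)
    target = rng.randrange(n)
    if rng.random() < mu:
        population[target] = next(new_labels)              # innovate
    else:
        population[target] = population[rng.randrange(n)]  # copy
    return population


rng = random.Random(0)
labels = count()
population = [next(labels) for _ in range(6)]  # six individuals, all distinct

# N single updates correspond to one Wright-Fisher generation on average
for _ in range(len(population)):
    moran_step(population, 0.1, labels, rng)
```

Because only one individual changes per step, a fair comparison with the Wright-Fisher model counts time in blocks of $N$ updates.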

In this section the regularity of turnover in the top $y$ chart of the most popular artefacts
in the population is investigated for the Moran model, as was done for the Wright-Fisher model
in the previous section. The turnover $z$ is defined as before and, unless otherwise stated,
the investigation is conducted in exactly the same way as for the Wright-Fisher model.



Steady state 

Analytical results [1, 12] give that the typical time scale $\tau$ to reach a steady state is
$\tau \sim N \min(\mu^{-1}, N)$, differing from that in the Wright-Fisher model by a factor
$N$, reflecting the different number of individuals it is possible to change at each time step.
This implies that simulations need to be run for a longer time than for the previous model.
Again the steady state was confirmed numerically by studying $F_2(t)$ of (2), which evolves in
time in the same way as in the Wright-Fisher model (3) but has a different eigenvalue
controlling its evolution, $\lambda_2 = 1 - 2(\mu/N) - 2(1 - \mu)/N^2$. For our parameter
values we can safely choose $\tau = 4N\mu^{-1}$ as the starting point to compute $z$.
Simulations have been run for $T = 50 + N\mu^{-1}$ time steps (50 time steps have been added to
ensure a minimum number of time steps for large values of $\mu$).



Data collection and analysis 

The usual ansatz for the functional dependence of $z$ on $\mu$, $y$ and $N$ is made,
$z = d \cdot \mu^a y^b N^c$ (4), and linear fits to $\ln(z)$ are used to estimate $a$, $b$, $c$
and $d$. Approximately 350 data points have been collected in the following range:
$\mu \in [0.001, 0.357]$, $y \in [15, 256]$ and $N \in [100, 506]$. As figure 13 shows, there
is a transition point in $\mu$ that separates two different behaviours of the turnover rate
$z$. This critical point is roughly $\mu_c \approx (N/y)^{-3/2}$, as can be seen in figure 14.
In the region $\mu < \mu_c$ the observed behaviour of the turnover rate $z$ is
$z \propto \mu$, while for $\mu > \mu_c$ the turnover rate does not obey a simple power law as
a function of the innovation rate $\mu$.

FIG. 13. Existence of a critical point. In the region $\mu < \mu_c$ we have $z \propto \mu$
($y$ and $N$ are fixed for each of the plotted curves). Error bars are smaller than symbols.

[Figure 14: plot for $N = 225$, $y = 33$; $N = 337$, $y = 33$; $N = 506$, $y = 33$; $N = 337$,
$y = 15$; $N = 506$.]

FIG. 14. Existence of a critical point $\mu_c \approx (N/y)^{-3/2}$ ($y$ and $N$ are fixed for
each of the plotted curves). Error bars are smaller than symbols.

Region $\mu < \mu_c$

Fitting the data points in this region with $z = d \cdot \mu^a y^b N^c$ (4) we find

$$ a = 0.9998(8), \quad b = -0.002(3), \quad c = 0.001(3), \quad d = 2.02(3), \tag{10} $$

so that $z = 2 \cdot \mu$ within one standard deviation. In this region $z$ is independent of
the top chart size $y$ and the population size $N$. This is best seen in figure 13.

Keeping in mind that the average number of new artefacts introduced at each time step is equal
to $\mu$, it is straightforward to see that the mechanism that produces $z = 2 \cdot \mu$ is
similar to the one described for the Wright-Fisher model: for $\mu \ll \mu_c$ only a very small
number of artefacts survive in the population in the steady state. It is then likely that the
number of artefacts in the population at a certain time step $t$ is lower than the top chart
size $y$, as is confirmed by observations (see figure 15). Suppose now that we let the
population evolve for $N$ time steps, which is the unit of time in which all individuals in the
population are assigned a new artefact. In this interval, on average, $N \cdot (1 - \mu)$
individuals will copy another artefact in the population, most likely one of the most popular
ones, while $N\mu$ individuals will invent a new artefact. At this point we will have a few
very popular artefacts in the population and the $N\mu$ new artefacts recently introduced, all
in the top chart. If we now take another $N$ time steps these very unpopular artefacts will
likely be extinguished and replaced by another $N\mu$ invented artefacts. In the last $N$ time
steps $N\mu$ artefacts exit the top chart and $N\mu$ new artefacts enter it. The average
turnover in these $N$ time steps is therefore

$$ z = \frac{N\mu + N\mu}{N} = 2 \cdot \mu $$

[Figure 15: plot of frequency against the $n$-th most popular variant.]

FIG. 15. Average frequency of artefacts (from the most popular one to the least popular). In
the steady state there are no more than 18 artefacts in the population. With $N = 268$,
$\mu = 0.00289$ and $y = 140$ we have $\mu < \mu_c \approx (N/y)^{-3/2}$ and the maximum number
of artefacts in the population (18) is smaller than the top chart size $y$ (140).
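The rough location of the Moran transition and the low-innovation turnover are simple enough to evaluate directly. The sketch below uses hypothetical function names and the parameters quoted in the caption of Fig. 15.

```python
def moran_critical_mu(n, y):
    """Rough transition point mu_c ~ (N / y) ** (-3/2)."""
    return (n / y) ** -1.5


def moran_turnover_below(mu):
    """z = (N*mu + N*mu) / N = 2*mu, expected to hold for mu well below mu_c."""
    return 2 * mu


mu_c = moran_critical_mu(n=268, y=140)   # parameters of Fig. 15
is_below = 0.00289 < mu_c
z_low = moran_turnover_below(0.00289)
```

For these values `mu_c` is of order 0.4, so the innovation rate of Fig. 15 lies far below the transition, consistent with the chart having empty positions.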



Region $\mu > \mu_c$

We have seen in Fig. 13 that above the critical point the hypothesis of a power law dependence
of the turnover rate $z$ on the innovation rate $\mu$ is incorrect. Our results do suggest that
the turnover rate $z$ is a function of the ratio $y/N$, as can be seen in figure 16.



[Figure 16: plot for $N = 506$, $y = 50$; $N = 337$, $y = 33$; $N = 337$, $y = 15$; $N = 506$,
$y = 22$.]

FIG. 16. $z$ is a function of $y/N$: curves with the same value of the ratio $y/N$ collapse.
Error bars are smaller than symbols.



CONCLUSIONS

In this paper we have investigated the dependence of the turnover rate $z$ on the parameters of
the Wright-Fisher model and the Moran model.

We have found that in both models there is a critical point that separates two different
regimes for the dependence of $z$ on the innovation rate $\mu$, the top chart size $y$ and the
population size $N$. This critical point satisfies the equation $N\mu \approx 0.15 \cdot y$ for
the Wright-Fisher model and $\mu \approx (N/y)^{-3/2}$ for the Moran model. The two models
behave similarly below the critical point, where the turnover rate is given by two times the
average number of new artefacts entering the population in one time step. This is equal to
$N\mu$ in the Wright-Fisher model and to $\mu$ in the Moran model.

Most data sets though will be above the critical region, where we have seen that the two models
behave differently, with the ansatz $z = d \cdot \mu^a y^b N^c$ being a good approximation for
the functional dependence of the turnover rate on the model variables for the Wright-Fisher
model only. The simple form $z = 2 \cdot \sqrt{\mu} \cdot y$ suggested by Bentley et al. [10]
is excluded statistically by our results. We find that the powers of $\mu$ and $y$ differ by
10% from the values in [10] and that there is significant dependence on the size of the system
$N$. In particular we have also shown that our fit in the region where the top chart size is
much smaller than the population size ($y \ll N$) (9) reproduces our simulation results for
large populations within 6%, and we believe that this should be used when extracting
information from real data sets. One outstanding issue is that our two fits (7) and (9) are not
compatible in terms of the dependence of the turnover rate $z$ on $N$. Further work is needed
on this, as this suggests the dependence of $z$ on $N$ may not be a simple power law.

For the Moran model for full top $y$ charts, the usual situation in real data, we have seen
that the turnover does not follow a simple power law in $\mu$, $y$ and $N$. We have however
noted that the turnover rate in the Moran model appears to be a function of the ratio $y/N$.

Since data on cultural transmission are sometimes available only in the form of popularity
charts, the study of the turnover is of real practical use. However the models considered here
and in [10] are only the simplest examples. It would be interesting to study the behaviour of
the turnover in more complicated models, such as examples which interpolate between the
Wright-Fisher and Moran models [12, 13]. Another interesting avenue is to understand how the
social network between individuals [13] alters the turnover.

ACKNOWLEDGEMENTS

We thank the High Performance Computing Centre at Imperial College London for use of their
cluster in the large N studies.


APPENDIX 

The information in this appendix is supplementary ma- 
terial which will not appear in the published version. 

Wright-Fisher model, small N 

For the Wright-Fisher model, we used $F_2(t)$ to check that our numerical simulations reach a
steady state on the time scale $\tau$, as can be seen in Fig. 17.



[Figure 17: plot against $t\mu$ for $N = 447$, $\mu = 0.02$; $N = 773$, $\mu = 0.02$;
$N = 1114$, $\mu = 0.03$.]

FIG. 17. Ensemble average $F_2(t)$ of equation (2). The approach to a steady state can clearly
be seen at $t \approx 2\mu^{-1}$.



The observed approximate values of $\tau$ are listed in table II and compared with $\mu^{-1}$
in the two regions $N\mu < 1$ and $N\mu > 1$: it is found that the relation
$\tau \approx \mu^{-1}$ holds in the region $N\mu > 1$, while $\tau < \mu^{-1}$ for
$N\mu < 1$. The observed behaviour of $\tau$ in the second region may be due to the fact that,
there being fewer than one innovator per generation on average ($N\mu < 1$), there is less
variability in the artefacts and therefore a faster approach to the steady state through
copying.



mu \ N     50     100    500    1000   1500   mu^-1
0.02       50     50     50     50     50     50
0.01       80     100    100    100    100    100
0.002      250    300    500    500    500    500
0.001      300    400    800    1000   1000   1000
0.00067    300    300    1000   1500   1500   1500

TABLE II. Approximate values of $\tau$. Above the main diagonal $N\mu > 1$, below $N\mu < 1$.

The data were fitted to (4) by applying the linear fit routine fit in Gnuplot to the logarithm
of $z$ and the variables $\mu$, $y$ and $N$

$$ \ln z = a \cdot \ln \mu + b \cdot \ln y + c \cdot \ln N + \ln d \tag{11} $$

on collected data points. For this purpose code and scripts have been set up to record
$\ln(z)$ and $\sigma_{\ln(z)}$.

To perform the fit with data equally spaced in the logarithm, $\mu$, $y$ and $N$ are varied in
the following way: $\mu = \mu_0 \cdot q^m$, $y = y_0 \cdot r^n$ and $N = N_0 \cdot s^p$ with
fixed $\mu_0$, $y_0$, $N_0$, $q$, $r$, $s$ and $m, n, p = 0, 1, 2, \ldots$. We choose
$q = 1.4$, $r = 1.3$, $s = 1.2$.

To perform the fit with data equally spaced in the logarithm, $\mu$, $y$ and $N$ are varied in
the following way: $\mu = \mu_0 \cdot q^m$, $y = y_0 \cdot r^n$ and $N = N_0 \cdot s^p$ with
fixed $\mu_0$, $y_0$, $N_0$, $q = 1.5$, $r = 1.8$, $s = 1.5$ and $m, n, p = 0, 1, 2, \ldots$.
A further constraint is that the top chart size satisfies $y < N$.
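A geometric parameter grid of this kind is straightforward to generate. In the sketch below the ratios $q = 1.4$, $r = 1.3$ and $s = 1.2$ are taken from the text, while the starting values, the number of steps, the rounding of $y$ and $N$ to integers and the function name are all illustrative assumptions.

```python
def geometric_grid(start, ratio, count):
    """Values start * ratio**k for k = 0 .. count-1 (equally spaced in log)."""
    return [start * ratio**k for k in range(count)]


# Starting values and counts are assumptions chosen for illustration only.
mus = geometric_grid(5e-5, 1.4, 5)
ys = sorted({round(v) for v in geometric_grid(2, 1.3, 5)})    # integer chart sizes
ns = sorted({round(v) for v in geometric_grid(180, 1.2, 5)})  # integer populations

# Keep only combinations with the top chart smaller than the population
grid = [(mu, y, n) for mu in mus for y in ys for n in ns if y < n]
```

Rounding small values of $y$ can produce duplicates, which is why the sketch removes them with a set before building the grid.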



Moran model 



We have from analytic results [1, 12] that the time scale for equilibrium satisfies
$\tau^{-1} \sim |\ln(\lambda_2)|$, so that $\tau < N \min((2\mu)^{-1}, N)$. This can be seen in
our numerical simulations in figure 18.

[Figure 18: plot against $t/(N/\mu)$ for $N = 1000$, $\mu = 0.05$; $N = 1000$, $\mu = 0.1$;
$N = 500$, $\mu = 0.05$; $N = 500$, $\mu = 0.1$.]

FIG. 18. Ensemble average $F_2(t)$ of equation (2). The approach to a steady state can clearly
be seen at $t \approx 2N\mu^{-1}$.






[1] G.A. Watterson, The Annals of Mathematical Statistics 32, 716-729 (1961).

[2] W.J. Ewens, Mathematical Population Genetics: I. Theoretical Introduction (Springer-Verlag,
New York, 2004).

[3] C. Godreche and J.M. Luck, "Nonequilibrium dynamics of urn models", J. Phys. Cond. Matter
14, 1601 (2002).

[4] M.R. Evans and T. Hanney, "Nonequilibrium statistical mechanics of the zero-range process
and related models", J. Phys. A 38, R195-R240 (2005).

[5] F.D. Neiman, "Stylistic variation in evolutionary perspective: Inferences from decorative
diversity and interassemblage distance in Illinois Woodland Ceramic assemblages", American
Antiquity 60, 1 (1995).

[6] R.A. Bentley and H.D.G. Maschner, "Subtle criticality in popular album charts", Advances in
Complex Systems 2, 197-209 (1999).

[7] M.W. Hahn and R.A. Bentley, "Drift as a mechanism for cultural change: an example from baby
names", Proc. R. Soc. Lon. B 270, S120-S123 (2003).

[8] H.A. Herzog, R.A. Bentley and M.W. Hahn, "Random drift and large shifts in popularity of
dog breeds", Proc. R. Soc. Lon. B (Suppl.) 271, S353-S356 (2004).

[9] R.A. Bentley, M.W. Hahn and S.J. Shennan, "Random drift and culture change", Proceedings of
the Royal Society B 271, 1443-1450 (2004).

[10] R.A. Bentley, C.P. Lipo, H.A. Herzog and M.W. Hahn, "Regular rates of popular culture
change reflect random copying", Evolution and Human Behavior 28, 151-158 (2007).

[11] R.A. Bentley, P. Ormerod and M. Batty, "An evolutionary model of long tailed distributions
in the social sciences", Behav. Ecol. Sociobiol. 65, 537-546 (2011).

[12] T.S. Evans and A.D.K. Plato, "Exact Solution for the Time Evolution of Network Rewiring
Models", Physical Review E 75, 056101 (2007).

[13] T.S. Evans, A.D.K. Plato and T. You, "Are Copying and Innovation Enough?", in Progress in
Industrial Mathematics at ECMI 2008, eds. A. Fitt, J. Norbury, H. Ockendon and E. Wilson,
Mathematics in Industry 15, 825-831 (The European Consortium for Mathematics in Industry,
Springer-Verlag, 2010).

[14] T.S. Evans, "Exact solutions for network rewiring models", Eur. Phys. J. B 56, 65 (2007).