arXiv: 1501.04229vl [physics.soc-ph] 17 Jan 2015
A Potts Model for Night Light
and Human Population
Gabrieli Mate * 1
institute for Theoretical Physics, Heidelberg University,
Philosophenweg 19, Heidelberg, Germany
January 20, 2015
Abstract
The Potts model was one of the most popular physics models of the
twentieth century in an interdisciplinary context. It has been applied
to a large variety of problems. Many generalizations exists and a whole
range of models were inspired by this statistical physics tool. Here we
present how a generic Potts model can be used to study complex data.
As a demonstration, we engage our model in the analysis of night light
patterns of human settlements observed on space photographs.
1 Introduction
The Potts model [I] lias enjoyed a great popularity in the second half of the
last century. As a generalization of the Ising model i2], it not only explained
different aspects of ferromagnetic systems, but it was also successfully used for
studying interdisciplinary phenomena. Examples include community structure
detection, tumor growth and especially image segmentation mg. In fact, many
Image Processing and Machine Learning applications rely heavily on approaches
similar to the Potts model. Inspired by Statistical Physics models, a whole
model-family emerged. Nowadays, these are known as Undirected Graphical
Models or Markov Random Fields (MRFs) [B\ In this sense, most of the MRF
models can be viewed as a generalization of the Potts model.
In the generic framework of MRFs, many algorithms have been developed
to perform a variety of tasks. Generally, a model with some parameters would
be set up to describe a some empirical data. Then, usually the first question
considers the parameter values for which the proposed model would capture the
* Corresponding author
email: g. mat e@ t phy s.uni-heidelberg.de
1
often complex nature of the data. To solve this tasks, specialized algorithms are
used to calculate or approximate these parameters (see for instance [ZHU]).
As a consequence of the availability of robust estimation approaches, MRFs
are mostly used to represent complex dependency structures in random variables
which describe some data, and to manipulate the data (for instance, detect
clusters) based on the obtained representations. However, in most practical
applications, the mathematical formalisms are rather abstract and the physical
meanings of the estimated parameters are lost or not of interest.
Our main goal in this study is to illustrate how generalized Potts models,
that is MRFs, can be used to model data with complex interdependencies and
how to gain access to hidden information by looking at the parameters of the
model. Concept-wise, this approach is similar to a scenario in which we are given
a sample of an ensemble generated by a Potts model at an unknown temperature
T. In this case, we can estimate the temperature T using the provided samples
and thus characterize the system to the best of our knowledge.
Starting out from the original Potts model, first we will rewrite the model
Hamiltonian, and transform it into a more general form. Then, we present
the applicability of the idea through a concrete example. We will implement
the rewritten model for the complex patterns of human settlements observed
through night-light photographs of the Earth. We will estimate the parameters
of the model, interpret them in the context of the data and, finally, based on
the estimated values, we will draw conclusions regarding the spatial structure
of the night light and settlement patterns.
2 The Used Model
2.1 The Potts model in general
In its original form, the Potts model is defined in the following way: A spin,
encoded by a scalar variable with q possible values, for instance, the integers
{l,...,g}, is placed on each site of a (usually) regular lattice. These scalar
variables represent the direction of the spins. The spins interact with their
nearest neighbors and an external magnetic field b! (if present). The interaction
energy is defined by the Hamiltonian
H = (!)
<*J> *
where the (z, j) notation means neighboring i and j lattice indexes, cq is the
state of spin i, J' is the coupling constant, defining the strength of the interaction
between two neighboring spins found in similar states, <5 is the Kronecker delta,
and s is a particular spin orientation. In this sense, the magnetic field is parallel
with the spin orientation s.
Given a set of spin-configurations, known to be a sample of an ensemble
generated by a Potts model at a given temperature T with fixed parameters
J' and h\ it is rather straightforward to estimate the parameter combinations
2
J'/T and h'/T, and thus characterize the ensemble within the error limits set
by the quality of the sample. This kind of estimation is called unsupervised
learning , as it requires no input (such as hand-labeling certain features in the
samples) from humans.
2.2 The modified model
While many modifications and extensions of the Potts model have been studied
and engaged in different applications, here we will present a generalization of
the model based on a relatively simple observation. Note that the Kronecker
delta in the first term in Equation (ID is nothing else but a q x q unity matrix
indexed by the states cq and (J 3 . According to this formalism, spins interact
only if they are in the same state, that is, if they are parallel. Replacing this
unity matrix by an arbitrary q x q symmetric matrix introduces interactions
between non-parallel states. Furthermore, we can also rewrite the last term
in the Hamiltonian, introducing an arbitrary interaction of the different states
with the external magnetic field in the following way:
tf = -£^-£^«> (2)
<*>i) *
where J is a q x q symmetric matrix, and h is a q dimensional vector. J here
defines the strength of the interactions between the different orientations. For
instance J 12 (= J 21 ) tells us how strongly spin orientations 1 and 2 interact if
they are neighbors. Similarly, J 33 drives the interactions between two neighbor¬
ing spins when they both are in state 3.
In a further step, we note that it is possible to imagine a locally changing
external field with r possible orientations. Assuming that this field may have
different orientations around the different spins, and that it may interact with
each spin orientation differently, the Hamiltonian can be extended as
tf = -£^-£ - £ K U > ( 3 )
(*.j> * »
where h is the orientation of the locally changing external field around spin i,
and K' is a q x r matrix which defines the interactions of the spins with the
locally changing field. Note that here, the orientations of the local field are also
encoded by integers, in this case k € {1,... , r}. As an example, K 21 tells us
how strongly a spin in orientation 2 interacts with a locally changing field with
an orientation 1. This is not equivalent with K[ 2 which tells us how a spin in
state 1 interacts with a field oriented in the direction 2. Similarly, K' i4 dictates
the interaction of spin state 4 with field orientation 4.
As a last generalization step, we observe that the h from the second sum in
Equation ([3]) can be absorbed into K' by adding h s to each entry in row s of
K'. As a result, we get
tf = -£^-£^b- (4)
(i,j) i
3
While the model in Equation Q is formally very similar to the original Potts
model, it is also very general and flexible because of the matrix parameters. We
can instantly get back the original Potts model from Equation © by replacing
J with J'l and K by h'L s , where I is the identity matrix and L s is a matrix
with all zeros except the entry in the s-th row and s-th column, which is 1.
Once the Hamiltonian is defined, the distribution over the configurations is
given by the Boltzmann distribution:
p(H\J,K) = ±e~ H , (5)
where Z is the partition function which normalizes the distribution. Note that
here the inverse temperature was absorbed into the parameters J and K , thus
when estimating the parameters, we in fact estimate the J//3 and I\//3 ratios.
Note also that in fact H depends on the particular configuration W of the spins
and that of the locally changing external field, which will be denoted by D. This
means that in some sense H is a function of these configurations: H = H(W, , D).
2.3 Learning
As we mentioned, the main question in most applications of MRFs is the value
of the model parameters, given some sample configuration. However, because
the partition function Z in Equation ([5]) contains exponentially many terms, the
calculation of the likelihood function is computationally intractable for systems
with many spins, therefore standard maximal likelihood estimations are not
viable. However, there are plenty of alternatives.
Even though the likelihood itself cannot be calculated, learning approaches
usually take the derivative of the log-likelihood as a common starting point.
This derivative for the model defined in Equation (|4|) can be given as
dlo gj P(0|{S o })
dO
1
dO
y, J & ifrj + y K aik - log A
<i,j> i
MS o)
d\ogZ
80
( 6 )
where 0 G {J a b\a, 6 G {1, 2,..., g}} U {K ac \a G {1,2, ...,g},cG {1,2,..., r}} is
one of the parameters of the model, So is some observed data for which we want
to estimate the parameters 0 (these observations should contain configurations
both for W and D), and 4>e(S) is what the Machine Learning literature calls the
potential corresponding to the parameter 0 in a system with configuration S.
Note that these potentials are not potentials in a physical sense. The potential
for a given J a b parameter is the negative derivative of the Hamiltonian with
respect to this parameter, and it can be given as
fijab
^ v tin a, tibcjj -
<i,j>
(7)
4
Similarly
Sa<7i 5cli ■ ( 8 )
i
The last term in the derivative (O is in fact the theoretical expected value
of the potentials:
log Z
dO
E sMS)e
~E<
-E,
Z
^2^e(S)p(S) = {(/)g)m,
s
( 9 )
where the notation (-} m is the theoretical (model) average . As we already
discussed this, this term is intractable, and its calculation or approximation is
the main problem in parameter estimations.
Different methods handle this problem with different approaches. Some
methods calculate a pseudo-likelihood [12], others, like Markov chain Monte-
Carlo (MCMC) maximum likelihood estimation, use importance sampling to
estimate the partition function [14] .
2.3.1 Stochastic Approximation Procedure
In the present study we applied a method called Stochastic Approximation Pro¬
cedure (SAP) as presented in [15]. This approach uses MCMC sampling to
estimate the model average ( <j>e)m■ SAP has been found to work very well when
implemented for Markov Random Fields.
Let us define a Markov process characterized by a transition probability
n(St — > St+ 1 ) which satisfies the detailed balance condition
p(S t )n(S t -»■ St+i) = p(S t+1 )n(S t+1 -»■ S t ). (10)
We initialize M Markov chains with a random initial configuration {S t *_i, Sf =1 ,
...,S' t ^ 1 }. The parameters 0 are also initialized with random values. We
simulate the Markov chains and after each Monte-Carlo step we calculate the
(■ 4>g{S t ))MC average of the potentials over the Monte-Carlo samples. Based on
these averages we update the parameters according to the update rule
0 = 0 + r] [<j>g(So) - (4>e{S t ))Mc] ,
where rj is the learning rate and it is decreased after each update. In case we
have a set (S'q, Sq, ..., ,Sq V } of observed data sample, we can replace the 4>g(So)
with the {(j)g(So)) average calculated over the observed samples. In this case,
the update rule is modified and it writes as
0 = 0 + ri[(MSo))-(&(&)} MC ] ■ (11)
To calculate the 7 t(5 * —>• St+ 1 ) transition probabilities, we can use the simple
Metropolis algorithm m-
5
Figure 1: Night light and population density coverage of North America. The
a resolution of the night light data (bright patterns) is 1000x400, a cell corre¬
sponds to roughly 10 kms. The gray area indicates the geographical regions for
which we have population density data. The side of a population density cell is
approximately 120 kms.
3 Describing the night-light distribution
In the following we will present how data with relatively complex interdepen¬
dencies can be modeled with the MRF presented in Equation ((Tj. For this,
we will use a relatively low resolution gridded population density data HZ! and
night-light data m for the year 2000. The latter one is practically contain¬
ing space photographs of the earth taken during night, wherein the artificial
light of human settlements is captured. Many studies relate these night light
patterns to the economic development of the regions m ■ They indicate that
the intensity and density of the night light can be viewed as an objective eco¬
nomic development index, especially for countries with a low-quality statistical
systems.
We carried out the calculations using the data as is, that is, no corrections
were applied. Without loss of generality, we chose to work only on North Amer¬
ica as presented in Figured) First we matched the grid of the population density
to that of the night light data. This meant refining the grid of the population
density from a resolution of around 85 x 35 to a resolution of 1000 x 400. In this
process no interpolation was used, we simply cut up the big population density
cells to many small ones.
We implemented the model from Equation (j4j for the night light intensities
w. In this case, the orientation (or the state) of a spins will correspond to
a given intensity level of the night lights. On the other hand, we considered
the population density d as the external field, the environment which has an
influence on the patterns in w. That is, we assumed the population density as
constant, and thought about the night light patterns as feature resulting from
the underlying spatial distribution of the population density. Since a spin in
our model can point only to a limited number of directions, and similarly, the
6
D
Figure 2: Graph representation of the MRF we implement for the example night
light data. W is the set of variables representing the discretized night light
intensities while D is the set of variables representing the discretized population
densities.
locally changing external field can have only a limited number of orientations,
we discretized both the population density data and the night light intensity
data. To achieve this, we need to project w and d to a discrete set. We will
call the elements of this set states instead of orientations as this naming fits
better in this context. In case of w, these states correspond to areas with weak,
medium and strong night light (encoded with the integers 1,2 and 3, that is,
<7i £ {1, 2, 3} for any i). Let us denote the sample of discretized light intensities
with W. In the same fashion, for d , the discretized population density values
indicate small, medium and large population densities (also encoded with the
same integers, i.e. I,, £ {1,2,3}). The sample of discretized densities will be
denoted by D.
Because of the spatial structure of the data, we arranged the variables repre¬
senting the night light according to a two dimensional square lattice, and, since
there is known spillover effect (meaning that new infrastructural/economic in¬
vestments tend to be made next to already developed regions), we connected
first order neighbors. Then we added another layer of variables which repre¬
sent the population densities d. In this case, the graph representation of the
model corresponds to the one in Figure[2] Note however, that in other scenarios,
variables might be arranged in structures other than a regular lattice. In fact,
this approach allows the usage of arbitrary networks. Note also that variables
representing the population densities are not connected to each other, that is
they don’t depend on each other in this study. We chose this implementation
only because we focus on the night light patterns and consider the population
densities as constants. As such, we did not care about how they depend on the
densities in the neighboring regions.
Again, the energy of the system is defined by Equation (J4]) . The values
of J will influence how compatible two different possible values of er are (for
7
instance, Jn will score how likely it is to observe a dim region next to another
dim region, while J23 drives the probability of finding a bright area next to
a medium lit area). On the other hand, K defines the compatibility of the
night light states with the states of the variables representing the population
densities (the latter variable being represented with red spheres in Figure[2j). For
example, while K 12 indicates how likely it is to have a dark area coupled with a
medium population density, A '31 will influence the probability of finding a bright
region with a low population density. Let us emphasize, that the configuration
of the locally changing external field is constant D , and in fact mathematically,
it can be handled as a parameter of the distribution. Then, in the formalism
of the Canonical Ensemble [20], the probability distribution over the possible
configurations in W can be given as
P(W\D,J,I<) = ^e~ H , (12)
where again the inverse temperature term was absorbed in the parameters of the
Hamiltonian J and K. While W, D 1 J and K do not show up directly on the
right hand side of Equation (fl2l) , the distribution is a function of these parame¬
ters/variables as the Hamiltonian H depends on them. Another important thing
to mention is that if the local Markov property is satisfied, meaning that a given
spin probabilistically depends only on the variables that it is directly connected
with in the graph representation, the Hammersley-Clifford theorem m states
that the probability distribution of a given night light configuration W can al¬
ways be given in the form Equation (ED- This is a fundamental point which
establishes a connection between different MRFs and ensures that algorithms
developed for a given MRF can be adapted to other models of the family.
Using the sample data, presented in Figurejl] the model parameters J and K
are then estimated through the approach presented in Section [2731 . The results
of this estimation are presented in Table |T] Note that the absolute values of
the parameters do not matter, it is their relative values which is important.
Therefore, in order to easily perceive the relative differences, the values in the
matrices are shifted by subtracting the minimums of each J and K matrix from
all of the entries of the corresponding matrix, thus the new minimums are 0 .
At this point, let us mention that, since the values of J and K are parameters
playing similar roles in the same energy function, we can compare their values.
This is one of the beneficial “side effects” of the approach presented here. Such
a comparison would not be possible if we would simply calculate neighboring
probabilities and population density and light intensity pairing probabilities, as
the these belong to different probability spaces.
Analyzing the values in Table [I] we can deduce the following: If we look
at the spillover effect of the night light, which is given by J, the larger entries
along the diagonal of J indicate that it is more probable to observe same intensity
levels than different ones in neighboring cells. This effect is the strongest for high
intensity regions. On the other hand, comparing the J values with the values
in K, we see that the prerequisite for observing low light regions is driven both
by having low light surrounding and low or at most medium density population
J
K
0.5316
0.3370
0
0.5851
0.5519
0.3780
0.3370
0.4570
0.2234
1.9544
2.0211
2.0554
0
0.2234
1.2354
0
0.6043
0.8941
Table 1: Results for the estimation of the J and K parameters.
(Jll Kll and K12 are comparable). At the same time, it is the least likely that
we will find dark areas associated with high population densities (K13 is the
smallest in row 1 of K). Although it is less salient when looking at the learned
parameters, the situation is similar for medium lit regions in the sense that
these regions will most probably have a medium lit surrounding and a medium
or high population density, the latter is scored slightly better. These regions
could correspond, for instance, to living neighborhoods, regardless whether they
are located in a metropolis area or more in the country side. Finally, for bright
regions it is much more likely to be observed close to other brightly lit regions,
and this effect is more important than the population density in the area (J33
has the biggest weight compared to other entries in the third row of J and K).
These territories most probably correspond to bright metropolitan centers.
4 Discussion and Conclusions
We presented how a generalized Potts model can be used to model and study
data with complex spatial interdependencies. Starting out from the original
Potts model, we rewrote the Hamiltonian allowing arbitrary interactions be¬
tween any two spin-states. In addition, we considered a locally changing exter¬
nal field which may have a different orientation from spin to spin. We present
the usefulness of the approach through a simple example: We investigated the
interdependencies of the night light patterns and population densities of a given
geographic territory.
The introduced modifications of the Potts model enabled us to think about
the night light as a spin configuration. In this case, the population densities
were taken into account by considering them as a locally changing external field.
Similarly to the interaction between the different spin-states, the effect of the
locally changing filed on the different spin state was also considered arbitrary.
In this setting, the aim is to estimate all these interactions.
First, we presented how parameters can be estimated in such a framework.
Then, after some necessary data preprocessing steps, which were kept minimal,
we estimated the parameters. For this, we used an in-house developed software
in our work. Once the parameters were learned based on the data sample, we
were ready to draw our conclusions. Here we exploited the advantage of the
presented framework, which consists in the fact that the parameters J and I\
are of the same nature.
Based on such calculations, we could observe how territories with certain
light intensity group while for the formation of other intensity levels the popu-
9
lation density is a more important factor. Our simple conclusions make sense
even if the data we used is rather imprecise as the resolution of the population
density map we used is extremely coarse. It is obvious that with good quality
data the outcomes would be more reliable and relevant.
5 Acknowledgments
The author would like to thank Dieter W. Heernrann, Zoltan Neda and Miriam
Fritsche for the useful discussions.
References
[1] R. B. Potts and C. Dornb. Some generalized order-disorder transformations.
Proceedings of the Cambridge Philosophical Society, 48:106, 1952.
[2] Ernst Ising. Beitrag zur theorie des ferromagnetismus. Zeitschrift fur
Physik, 31(1):253—258, February 1925.
[3] Jorg Reichardt and Stefan Bornholdt. Detecting fuzzy community struc¬
tures in complex networks with a potts model. Physical Review Letters,
93(21):218701, November 2004.
[4] Katarzyna A. Rejniak and Alexander R. A. Anderson. Hybrid models
of tumor growth. Wiley Interdisciplinary Reviews: Systems Biology and
Medicine , 3(1):115-125, 2011.
[5] Fan Chen, Kazuyuki Tanaka, and Tsuyoshi Horiguchi. Image segmenta¬
tion based on bethe approximation for gaussian mixture model. Interdis¬
ciplinary Information Sciences, 11(1):17—29, 2005.
[6] Andrew Blake, Pushmeet Kohli, and Carsten Rother. Markov Random
Fields for Vision and Image Processing. MIT Press, 2011.
[7] R. Redner and H. Walker. Mixture densities, maximum likelihood and the
EM algorithm. SIAM Review, 26(2):195-239, April 1984.
[8] B. S. Manjunath and R. Chellappa. Unsupervised texture segmentation
using markov random field models. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 13(5):478-482, 1991.
[9] Y. Boykov, O. Veksler, and R. Zabih. Markov random fields with efficient
approximations. In 1998 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, 1998. Proceedings, pages 648-655, June
1998.
[10] X. Descombes, R.D. Morris, J. Zerubia, and Marc Berthod. Estimation
of markov random held prior parameters using markov chain monte carlo
maximum likelihood. IEEE Transactions on Image Processing, 8(7):954-
963, July 1999.
10
[11] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Con¬
ditional random fields: Probabilistic models for segmenting andLabeling
sequence data. In International Conference on Machine Learning, pages
282-289, 2001.
[12] Fuchun Huang and Yosihiko Ogata. Generalized pseudo-likelihood esti¬
mates for markov random fields on lattice. Annals of the Institute of Sta¬
tistical Mathematics, 54(1):1 18, March 2002.
[13] Chuanshu Ji and Lynne Seymour. A consistent model selection proce¬
dure for markov random fields based on penalized pseudolikelihood. The
Annals of Applied Probability , 6(2):423-443, May 1996. Mathematical Re¬
views number (MathSciNet): MR1398052; Zentralblatt MATH identifier:
0856.62082.
[14] Ronald Charles Neath. Monte Carlo Methods for Likelihood-based Inference
in Hierarchical Models. ProQuest, 2006.
[15] Ruslan Salakhutdinov. Learning in markov random fields using tempered
transitions. In Advances in neural information processing systems , pages
1598-1606, 2009.
[16] Nicholas Metropolis, Arianna W. Rosenblutli, Marshall N. Rosenbluth, Au¬
gusta H. Teller, and Edward Teller. Equation of state calculations by fast
computing machines. The Journal of Chemical Physics, 21(6):1087-1092,
June 1953.
[17] Columbia University Center for International Earth Science Informa¬
tion Network (CIESIN) and Centro Internacional de Agricultura Tropi¬
cal (CIAT). Gridded population of the world version 3(GPWv3): Popula¬
tion density grids, http://sedac.ciesin.columbia.edu/gpw, 2005.
[18] NASA’s Visible Earth. Earth’s city lights.
http://visibleearth.nasa.gov/view.php?id=55167, August 2009.
[19] William Nordhaus and Xi Chen. A sharper image? estimates of the pre¬
cision of nighttime lights as a proxy for economic statistics. Journal of
Economic Geography , page lbuOlO, May 2014.
[20] R. K. Pathria and Paul D. Beale. Statistical Mechanics. Academic Press,
April 2011.
[21] Julian Besag. On spatial-temporal models and markov fields. In J. Koenik,
editor, Transactions of the Seventh Prague Conference on Information The¬
ory, Statistical Decision Functions, Random Processes and of the 197f Eu¬
ropean Meeting of Statisticians, number 7A in Transactions of the Seventh
Prague Conference on Information Theory, Statistical Decision Functions,
Random Processes and of the 1974 European Meeting of Statisticians, pages
47-55. Springer Netherlands, January 1977.
11