python - Fast, elegant way to calculate empirical/sample covariogram
Does anyone know a good method to calculate the empirical/sample covariogram, if possible in Python?
This screenshot of a book contains the definition of the covariogram:
If I understood correctly, for a given lag/width h, I'm supposed to take all the pairs of points separated by h (or less than h), multiply their values and, for each of these points, calculate its mean, which, in this case, is defined as m(x_i). However, according to the definition of m(x_i), if I want to compute m(x1), I need to obtain the average of the values located within distance h of x1. This looks like a very intensive computation.
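For reference, one standard way to write the empirical covariogram (a reconstruction on my part, since the screenshot is not reproduced here, not necessarily the book's exact notation) is

\hat{C}(h) = \frac{1}{|N(h)|} \sum_{(i,j) \in N(h)} Z(x_i)\, Z(x_j) \; - \; m_{-h}\, m_{+h}

where N(h) is the set of point pairs separated by (approximately) the lag h, and m_{-h}, m_{+h} are the means of the first and second values over those pairs. The per-bin quantity computed in the answer below (szz - mnh*mph) is exactly this.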
First of all, am I understanding this correctly? If so, what is a good way to compute it assuming a two-dimensional space? I tried to code it in Python (using numpy and pandas), but it takes a couple of seconds and I'm not even sure it is correct, which is why I had refrained from posting the code here. Here is my attempt at a naive implementation:
import numpy as np
from scipy.spatial.distance import pdist, squareform

distances = squareform(pdist(np.array(coordinates)))  # coordinates is an nx2 array
z = np.array(z)                                       # z holds the data values
cutoff = np.max(distances)/3.0                        # arbitrary cutoff
width = cutoff/15.0
widths = np.arange(0, cutoff + width, width)
Z = []
Cov = []

for w in np.arange(len(widths)-1):  # for each width
    # for each pairwise distance
    for i in np.arange(distances.shape[0]):
        for j in np.arange(distances.shape[1]):
            if distances[i, j] <= widths[w+1] and distances[i, j] > widths[w]:
                m1 = []
                m2 = []
                # when a distance is within the given width, calculate the means of
                # the points involved
                for x in np.arange(distances.shape[1]):
                    if distances[i, x] <= widths[w+1] and distances[i, x] > widths[w]:
                        m1.append(z[x])
                for y in np.arange(distances.shape[1]):
                    if distances[j, y] <= widths[w+1] and distances[j, y] > widths[w]:
                        m2.append(z[y])
                mean_m1 = np.array(m1).mean()
                mean_m2 = np.array(m2).mean()
                Z.append(z[i]*z[j] - mean_m1*mean_m2)
    Z_mean = np.array(Z).mean()  # calculate covariogram for width w
    Cov.append(Z_mean)           # collect covariances for all widths
However, I have confirmed that there is an error in my code. I know that because I used the variogram to calculate the covariogram (covariogram(h) = covariogram(0) - variogram(h)) and I get a different plot:
And this is what it is supposed to look like:
Finally, if you know of a Python/R/MATLAB library that can calculate empirical covariograms, let me know. At the very least, that way I can verify what I did.
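As a side note, here is a minimal numpy/scipy sketch (my addition, not part of the original question) of the cross-check mentioned above: compute an empirical semivariogram gamma(h) and compare C(0) - gamma(h) against the covariogram. The function name empirical_semivariogram is mine; the only assumed inputs are the (n, 2) coordinates array and the values array z from the question.

import numpy as np
from scipy.spatial.distance import pdist

def empirical_semivariogram(coordinates, z, bin_edges):
    # gamma(h) = 0.5 * mean((z_i - z_j)**2) over pairs whose distance falls in each bin
    z = np.asarray(z, dtype=float)
    d = pdist(np.asarray(coordinates, dtype=float))   # condensed pairwise distances
    i, j = np.triu_indices(len(z), k=1)               # same pair ordering as pdist
    sqdiff = (z[i] - z[j])**2
    which = np.digitize(d, bin_edges)                 # bin index of each pair
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for b in range(1, len(bin_edges)):
        mask = which == b
        if mask.any():
            gamma[b-1] = 0.5*sqdiff[mask].mean()
    return gamma

# usage sketch: compare with the covariogram through C(h) ~ C(0) - gamma(h)
# bin_edges = np.arange(0, cutoff + width, width)     # same bins as in the question
# gamma = empirical_semivariogram(coordinates, z, bin_edges)
# cov_from_vario = np.var(z) - gamma                  # np.var(z) as an estimate of C(0)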
One could use scipy.cov, but if one does the calculation directly (which is very easy), there are more ways to speed this up.
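For concreteness, the "direct calculation" for a single lag bin is just the product-moment identity cov = mean(z0*z1) - mean(z0)*mean(z1). A tiny toy check (my illustration, not part of the original answer):

import numpy as np

rng = np.random.default_rng(0)
z0 = rng.normal(size=500)            # values at the first point of each pair in one bin
z1 = z0 + 0.3*rng.normal(size=500)   # values at the second point (correlated, for the toy)

c_direct = (z0*z1).mean() - z0.mean()*z1.mean()
c_npcov  = np.cov(z0, z1, ddof=0)[0, 1]   # same quantity via cov, with the 1/N normalization
print(c_direct, c_npcov)                  # these agree; the direct form lets products be reused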
First, make some fake data that has spatial correlations. I'll do this by first making the spatial correlations themselves, and then using random data points generated from this, where the data is positioned according to the underlying map and also takes on the values of the underlying map.
Edit 1:
I changed the data point generator so the positions are purely random, but the z-values are proportional to the spatial map. And, I changed the map so that the left and right sides are shifted relative to each other to create a negative correlation at large h.
from numpy import *
import random
import math
import matplotlib.pyplot as plt

S = 1000
N = 900

# first, make some fake data, with correlations on two spatial scales
#     density map
x = linspace(0, 2*pi, S)
sx = sin(3*x)*sin(10*x)
density = .8*abs(outer(sx, sx))
density[:, :S//2] += .2

#     make a point cloud motivated by this density
random.seed(10)  # so this can be repeated
points = []
while len(points) < N:
    v, ix, iy = random.random(), random.randint(0, S-1), random.randint(0, S-1)
    if True:  # v < density[ix, iy]:
        points.append([ix, iy, density[ix, iy]])

locations = array(points).transpose()
print(locations.shape)
plt.imshow(density, alpha=.3, origin='lower')
plt.plot(locations[1, :], locations[0, :], '.k')
plt.xlim((0, S))
plt.ylim((0, S))
plt.show()

#     build these into the main data: all pairs of distances and z0 z1 values
L = locations
m = array([[math.sqrt((L[0,i]-L[0,j])**2 + (L[1,i]-L[1,j])**2), L[2,i], L[2,j]]
           for i in range(N) for j in range(N) if i > j])
which gives:
The above is just the simulated data, and I made no attempt to optimize its production, etc. I assume this is where the OP starts, with the task below, since the data already exists in a real situation.
Now calculate the "covariogram" (which is much easier than generating the fake data, btw). The idea here is to sort all the pairs and associated values by h, and then index into these using ihvals. That is, summing up to index ihval is the sum over N(h) in the equation, since this includes all pairs with h's below the desired values.
Edit 2:
As suggested in the comments below, N(h) is now taken as only the pairs that are between h-dh and h, rather than all pairs between 0 and h (where dh is the spacing of the h-values in ihvals -- ie, S/1000 was used below).
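To make the indexing concrete, here is a tiny standalone illustration (toy numbers of my own, not part of the answer) of how searchsorted turns the sorted h array into the per-bin slices described above:

import numpy as np

h = np.array([0.2, 0.7, 1.1, 1.4, 2.5, 3.0, 3.2])   # sorted pair distances (toy values)
hvals = np.array([0.0, 1.0, 2.0, 3.0, 4.0])         # h-values with spacing dh = 1.0
ihvals = np.searchsorted(h, hvals)                   # -> array([0, 2, 4, 5, 7])

# consecutive entries of ihvals delimit exactly the pairs with h between h-dh and h
for k in range(1, len(ihvals)):
    start, stop = ihvals[k-1], ihvals[k]
    print(hvals[k], h[start:stop])   # the bin's upper edge and the distances falling in it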
# now the real calculations for the covariogram
#    sort by h and give clear names
i = argsort(m[:, 0])   # sort everything by h
h = m[i, 0]
zh = m[i, 1]
zsh = m[i, 2]
zz = zh*zsh

hvals = linspace(0, S, 1000)  # the values of h to use (S should be in units of distance; here I just used ints)
ihvals = searchsorted(h, hvals)
result = []
for i, ihval in enumerate(ihvals[1:], 1):
    start, stop = ihvals[i-1], ihval
    n = stop - start
    if n > 0:
        mnh = sum(zh[start:stop])/n
        mph = sum(zsh[start:stop])/n
        szz = sum(zz[start:stop])/n
        c = szz - mnh*mph
        result.append([h[ihval], c])
result = array(result)
plt.plot(result[:, 0], result[:, 1])
plt.grid()
plt.show()
This looks reasonable to me, as one can see bumps or troughs at the expected h values, but I haven't done a careful check.
The main speedup here over scipy.cov is that one can precalculate all of the products, zz. Otherwise, one would feed zh and zsh into cov for every new h, and all the products would be recalculated. The calculation could be sped up even more by doing partial sums, ie, from ihvals[n-1] to ihvals[n] at each timestep n, but I doubt this will be necessary.
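For completeness, a sketch of that partial-sum idea (my addition, not in the original answer): precompute cumulative sums of zh, zsh and zz once, so each bin's sums become differences of the cumulative arrays at the ihvals indices and no slice is ever re-summed.

import numpy as np

# assumes h, zh, zsh, zz, hvals and ihvals already exist as in the code above
czh  = np.concatenate(([0.0], np.cumsum(zh)))
czsh = np.concatenate(([0.0], np.cumsum(zsh)))
czz  = np.concatenate(([0.0], np.cumsum(zz)))

result = []
for k in range(1, len(ihvals)):
    start, stop = ihvals[k-1], ihvals[k]
    n = stop - start
    if n > 0:
        mnh = (czh[stop] - czh[start])/n    # mean of zh over this bin
        mph = (czsh[stop] - czsh[start])/n  # mean of zsh over this bin
        szz = (czz[stop] - czz[start])/n    # mean of zh*zsh over this bin
        result.append([hvals[k], szz - mnh*mph])
result = np.array(result)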