python - Fast, elegant way to calculate empirical/sample covariogram


Does anyone know of a good method to calculate the empirical/sample covariogram, if possible in Python?

This screenshot from a book contains the definition of the covariogram:

[image: the book's definition of the covariogram]
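In case the image doesn't come through: as far as I can tell the screenshot shows the usual empirical covariogram,

\hat{C}(h) = \frac{1}{|N(h)|} \sum_{(i,j) \in N(h)} Z(x_i)\,Z(x_j) \; - \; \bar{m}_{-h}\,\bar{m}_{+h},
\qquad N(h) = \{ (i,j) : \lVert x_i - x_j \rVert \approx h \},

where \bar{m}_{-h} and \bar{m}_{+h} are the means of the Z-values taken over the first and second members, respectively, of the pairs in N(h).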

If I understood correctly, for a given lag/width h, I'm supposed to find all pairs of points separated by h (or less than h), multiply the values of each of these points, and calculate the mean, which in this case is defined as m(x_i). However, according to the definition of m(x_i), if I want to compute m(x1), I need to obtain the average of the values located within distance h of x1. That looks like a very intensive computation.

First of all, am I understanding this correctly? If so, what is a good way to compute it assuming a two-dimensional space? I tried to code it in Python (using numpy and pandas), but it takes a couple of seconds and I'm not even sure it is correct, which is why I have refrained from posting the code until now. Here is my attempt at a naive implementation:

import numpy as np
from scipy.spatial.distance import pdist, squareform

distances = squareform(pdist(np.array(coordinates)))  # coordinates is an nx2 array
z = np.array(z)                       # z values
cutoff = np.max(distances) / 3.0      # somewhat arbitrary cutoff
width = cutoff / 15.0
widths = np.arange(0, cutoff + width, width)
Z = []
Cov = []

for w in np.arange(len(widths) - 1):  # for each width
    # for each pairwise distance
    for i in np.arange(distances.shape[0]):
        for j in np.arange(distances.shape[1]):
            if distances[i, j] <= widths[w + 1] and distances[i, j] > widths[w]:
                m1 = []
                m2 = []
                # when a distance is within the given width, calculate the means of
                # the points involved
                for x in np.arange(distances.shape[1]):
                    if distances[i, x] <= widths[w + 1] and distances[i, x] > widths[w]:
                        m1.append(z[x])
                for y in np.arange(distances.shape[1]):
                    if distances[j, y] <= widths[w + 1] and distances[j, y] > widths[w]:
                        m2.append(z[y])
                mean_m1 = np.array(m1).mean()
                mean_m2 = np.array(m2).mean()
                Z.append(z[i] * z[j] - mean_m1 * mean_m2)
    Z_mean = np.array(Z).mean()  # calculate the covariogram for width w
    Cov.append(Z_mean)           # collect the covariances for all widths

However, I have confirmed that there is an error in my code. I know this because I used the variogram to calculate the covariogram (covariogram(h) = covariogram(0) - variogram(h)) and I get a different plot:

[image: covariogram produced by my code]

And it is supposed to look like this:

[image: expected covariogram, obtained from the variogram]
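For reference, the variogram-based comparison above was computed roughly like this (a quick sketch reusing the coordinates, z and widths from my attempt; empty bins would produce NaNs, so treat it only as a cross-check):

import numpy as np
from scipy.spatial.distance import pdist

d = pdist(np.array(coordinates))                # pairwise distances (condensed form, i < j)
dz2 = pdist(z.reshape(-1, 1), 'sqeuclidean')    # (z_i - z_j)**2 for every pair

bins = np.digitize(d, widths) - 1               # same bin edges as above
gamma = np.array([0.5 * dz2[bins == k].mean()   # semivariance per bin
                  for k in range(len(widths) - 1)])

c0 = z.var()                                    # covariogram at h = 0
cov_from_vario = c0 - gamma                     # covariogram(h) = covariogram(0) - variogram(h)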

Finally, if you know of a Python/R/MATLAB library that can calculate empirical covariograms, please let me know. At the very least, that would give me a way to verify what I did.

One could use scipy.cov, but if one does the calculation directly (which is easy), there are more ways to speed this up.
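To make the "directly" part concrete: for the pairs that fall at a single lag h, the covariogram value is just a cross-covariance, which one could also get from np.cov (zh and zsh below are the first and second z-values of those pairs, as built in the code further down; this snippet is only an illustration of the equivalence):

import numpy as np

c_direct = (zh * zsh).mean() - zh.mean() * zsh.mean()  # direct calculation for one lag
c_np = np.cov(zh, zsh, bias=True)[0, 1]                # same number via the 2x2 covariance matrix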

First, make some fake data that has spatial correlations. I'll do this by first making the spatial correlations, and then using random data points generated from this, where the data is positioned according to the underlying map and also takes on the values of the underlying map.

Edit 1:
I changed the data point generator so the positions are purely random, but the z-values are proportional to the spatial map. Also, I changed the map so that the left and right sides are shifted relative to each other, to create a negative correlation at large h.

from numpy import *
import math
import random
import matplotlib.pyplot as plt

S = 1000
N = 900
# first, make some fake data, with correlations on two spatial scales
#     density map
x = linspace(0, 2*pi, S)
sx = sin(3*x)*sin(10*x)
density = .8 * abs(outer(sx, sx))
density[:, :S//2] += .2
#     make a point cloud motivated by this density
random.seed(10)  # so this can be repeated
points = []
while len(points) < N:
    v, ix, iy = random.random(), random.randint(0, S-1), random.randint(0, S-1)
    if True:  # v < density[ix, iy]:
        points.append([ix, iy, density[ix, iy]])
locations = array(points).transpose()
print(locations.shape)
plt.imshow(density, alpha=.3, origin='lower')
plt.plot(locations[1, :], locations[0, :], '.k')
plt.xlim((0, S))
plt.ylim((0, S))
plt.show()
#     build these into the main data: all pairs, as distances and z0, z1 values
L = locations
m = array([[math.sqrt((L[0, i]-L[0, j])**2 + (L[1, i]-L[1, j])**2), L[2, i], L[2, j]]
           for i in range(N) for j in range(N) if i > j])

which gives:

[image: density map with the generated point cloud overlaid]

The above is just the simulated data, and I made no attempt to optimize its production, etc. I assume this is where the OP starts, with the task below, since the data already exists in a real situation.


Now calculate the "covariogram" (which is much easier than generating the fake data, by the way). The idea here is to sort all the pairs and associated values by h, and then index into these using ihvals. That is, summing up to index ihval is the sum over N(h) in the equation, since it includes all pairs with hs below the desired values.

Edit 2:
As suggested in the comments below, N(h) is now taken to be only the pairs that are between h-dh and h, rather than all pairs between 0 and h (where dh is the spacing of the h-values in ihvals -- i.e., S/1000 is used below).

# now the real calculation for the covariogram
#    sort by h and give everything clear names
isort = argsort(m[:, 0])      # sort the pairs by their separation h
h = m[isort, 0]
zh = m[isort, 1]
zsh = m[isort, 2]
zz = zh * zsh

hvals = linspace(0, S, 1000)  # the values of h to use (S should be in units of distance, here I just used ints)
ihvals = searchsorted(h, hvals)
result = []
for i, ihval in enumerate(ihvals[1:]):
    start, stop = ihvals[i], ihval   # pairs with h between hvals[i] and hvals[i+1]
    n = stop - start
    if n > 0:
        mnh = sum(zh[start:stop]) / n
        mph = sum(zsh[start:stop]) / n
        szz = sum(zz[start:stop]) / n
        c = szz - mnh * mph
        result.append([h[ihval], c])
result = array(result)
plt.plot(result[:, 0], result[:, 1])
plt.grid()
plt.show()

[image: resulting covariogram plot]

which looks reasonable to me, as one can see bumps or troughs at the expected h values, but I haven't done a careful check.

The main speedup here over scipy.cov is that one precalculates all of the products, zz. Otherwise, one would feed zh and zsh into cov for every new h, and all the products would be recalculated. The calculation could be sped up even more by doing partial sums, i.e., going from ihvals[n-1] to ihvals[n] at each step n, but I doubt that will be necessary.
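For what it's worth, a rough sketch of that partial-sum idea using cumulative sums (same variable names as above; treat it as an illustration rather than a drop-in replacement):

# precompute running sums once; every bin's sum is then a difference of two entries
czh, czsh, czz = cumsum(zh), cumsum(zsh), cumsum(zz)

def binsum(c, start, stop):
    # sum of the underlying values over the slice [start, stop)
    return c[stop-1] - (c[start-1] if start > 0 else 0.0)

result2 = []
for k in range(1, len(ihvals)):
    start, stop = ihvals[k-1], ihvals[k]
    n = stop - start
    if n > 0:
        c = binsum(czz, start, stop)/n - (binsum(czh, start, stop)/n) * (binsum(czsh, start, stop)/n)
        result2.append([hvals[k], c])
result2 = array(result2)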

