python - Optimize Cython code for numpy variance calculation
I am trying to optimize some Cython code, and there seems to be quite a bit of room for improvement. Here is part of a profile from the %prun extension in the IPython notebook:
    7016695 function calls in 18.475 seconds

    Ordered by: internal time

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       400722    7.723    0.000   15.086    0.000 _methods.py:73(_var)
       814815    4.190    0.000    4.190    0.000 {method 'reduce' of 'numpy.ufunc' objects}
            1    1.855    1.855   18.475   18.475 {_cython_magic_aed83b9d1a706200aa6cef0b7577cf41.knn_alg}
       403683    0.838    0.000    1.047    0.000 _methods.py:39(_count_reduce_items)
       813031    0.782    0.000    0.782    0.000 {numpy.core.multiarray.array}
       398748    0.611    0.000   15.485    0.000 fromnumeric.py:2819(var)
       804405    0.556    0.000    1.327    0.000 numeric.py:462(asanyarray)
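For reference, output like the above comes from profiling the call with the %prun line magic in the notebook; the arguments in this sketch are placeholders, since the actual call is not shown:

    %prun sim = knn_alg(temp, jan1, l, w, B)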
Seeing that the program is spending nearly 8 seconds calculating variances, I am hoping this can be sped up.
I am calculating the variance, using np.var(), of a 1D array of length 404, roughly 1000 times. I checked the C standard library, and unfortunately there is no variance function, and I don't want to write my own in C.
1. Is there another option?
2. Is there any way to reduce the time spent in the second item on the list (the 'reduce' method of numpy.ufunc objects)?
Here is my code, if it helps to see it:
    # (imports and the helpers lnn_alg, scale and eig are defined elsewhere in the module)
    cpdef knn_alg(np.ndarray[double, ndim=2] temp, np.ndarray[double, ndim=1] jan1, int l, int w, int B):

        cdef np.ndarray[double, ndim=3] lnn = np.zeros((l+1,temp.shape[1],365))
        lnn = lnn_alg(temp, l, w)

        cdef np.ndarray[double, ndim=2] sim = np.zeros((len(temp),temp.shape[1]))
        cdef np.ndarray[double, ndim=2] a = np.zeros((l+1,lnn.shape[1]))
        cdef int b
        cdef np.ndarray[double, ndim=2] c = np.zeros((l,lnn.shape[1]-3))
        cdef np.ndarray[double, ndim=2] lnn_scale = np.zeros((l,lnn.shape[1]))
        cdef np.ndarray[double, ndim=2] cov_t = np.zeros((3,3))
        cdef np.ndarray[double, ndim=2] dk = np.zeros((l,4))
        cdef int random_selection
        cdef np.ndarray[double, ndim=1] day_month
        cdef int day_of_year
        cdef np.ndarray[double, ndim=2] lnn_scaled
        cdef np.ndarray[double, ndim=2] temp_scaled
        cdef np.ndarray[double, ndim=2] eig_vec
        cdef double pc_t
        cdef np.ndarray[double, ndim=1] pc_l
        cdef double k
        cdef np.ndarray[double, ndim=2] knn
        cdef np.ndarray[double, ndim=1] val
        cdef np.ndarray[double, ndim=1] pn
        cdef double rand_num
        cdef int nn
        cdef int index
        cdef int inc
        cdef int i

        sim[0,:] = jan1

        for i in xrange(1,len(temp),B):

            #if leap day, randomly select feb 29 or mar 1
            if (temp[i,4]==2) & (temp[i,3]==29):
                random_selection = np.random.randint(0,1)
                day_month = np.array([[29,2],[1,3]])[random_selection]
            else:
                day_month = temp[i,3:5]

            #convert day and month to day of year for the l+1 nearest neighbors selection
            current = datetime.datetime(2014, (<int>day_month[1]), (<int>day_month[0]))
            day_of_year = current.timetuple().tm_yday - 1

            #take out the current day from the l+1 nearest neighbors
            a = lnn[:,:,day_of_year]
            b = np.where((a[:,3:6] == temp[i,3:6]).all(axis=-1))[0][0]
            c = np.delete(a,(b), axis=0)

            #scale and center the nearest neighbors and the spatially averaged historical data
            lnn_scaled = scale(c[:,0:3])
            temp_scaled = scale(temp[:,0:3])

            #calculate the covariance matrix of the nearest neighbors
            cov_t[:,:] = np.cov(lnn_scaled.T)

            #calculate eigenvalues and vectors of the covariance matrix
            eig_vec = eig(cov_t)[1]

            #calculate principal components of the scaled l nearest neighbors and the current day
            pc_t = np.dot(temp_scaled[i],eig_vec[0])
            pc_l = np.dot(lnn_scaled,eig_vec[0])

            #calculate the mahalanobis distance
            dk = np.zeros((404,4))
            dk[:,0] = np.array([sqrt((pc_t-pc)**2/np.var(pc_l)) for pc in pc_l])
            dk[:,1:4] = c[:,3:6]

            #extract the k nearest neighbors
            dk = dk[dk[:,0].argsort()]
            k = round(sqrt(l),0)
            knn = dk[0:(<int>k)]

            #create the probability density function
            val = np.array([1.0/k for k in range(1,len(knn)+1)])
            wk = val/(<int>val.sum())
            pn = wk.cumsum()

            #select the next day's value from the knns using the probability density function and a random value
            rand_num = np.random.rand(1)[0]
            nn = (abs(pn-rand_num)).argmin()
            index = np.where((temp[:,3:6] == knn[nn,1:4]).all(axis=-1))[0][0]

            if i+B > len(temp):
                inc = len(temp) - i
            else:
                inc = B
            if (index+B > len(temp)):
                index = len(temp)-B
            sim[i:i+inc,:] = temp[index:index+inc,:]

        return sim
The variance calculation is in this line:
    dk[:,0] = np.array([sqrt((pc_t-pc)**2/np.var(pc_l)) for pc in pc_l])
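Worth noting: np.var(pc_l) sits inside the list comprehension, so it is re-evaluated once per element, roughly 404 elements times ~1000 iterations, or about 400,000 calls, which lines up with the 400,722 calls to _var in the profile. A minimal pure-NumPy sketch that hoists the variance out and vectorizes the rest, reusing the pc_t, pc_l, and dk names from the function above:

    import numpy as np

    # compute the variance once, not once per element of pc_l
    var_pc = np.var(pc_l)

    # vectorized equivalent of the comprehension sqrt((pc_t-pc)**2/var)
    dk[:,0] = np.sqrt((pc_t - pc_l)**2 / var_pc)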
Any advice would be helpful, as I am quite new to Cython.
I went through the calculation above, and I think the reason it was going so slow is that I was using np.var(), a Python (or NumPy) function that does not allow the loop to be compiled in C. If anyone knows a way to do this while still using NumPy, let me know.
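The per-call overhead of np.var() on an array this small is easy to confirm with a quick timing sketch (the array contents are arbitrary, and absolute numbers will vary by machine):

    import numpy as np
    import timeit

    pc_l = np.random.rand(404)  # same length as in the profile

    # much of the cost per call is Python-level dispatch
    # (asanyarray, _count_reduce_items, two ufunc reductions)
    # rather than the arithmetic itself
    print(timeit.timeit(lambda: np.var(pc_l), number=10000))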
What I ended up doing was recoding this calculation:
    dk[:,0] = np.array([sqrt((pc_t-pc)**2/np.var(pc_l)) for pc in pc_l])
into a separate function:
    cimport cython
    cimport numpy as np
    import numpy as np
    from libc.math cimport sqrt as csqrt
    from libc.math cimport pow as cpow

    @cython.boundscheck(False)
    @cython.cdivision(True)
    cdef cy_mahalanobis(np.ndarray[double, ndim=1] pc_l, double pc_t):
        cdef unsigned int i, j, l
        l = pc_l.shape[0]
        cdef np.ndarray[double] dk = np.zeros(l)
        cdef double x, total, mean, var

        # first pass: mean of pc_l
        total = 0
        for i in xrange(l):
            x = pc_l[i]
            total = total + x
        mean = total / l

        # second pass: (population) variance of pc_l
        total = 0
        for i in xrange(l):
            x = cpow(pc_l[i]-mean,2)
            total = total + x
        var = total / l

        # distance of each element from pc_t, scaled by the variance
        for j in xrange(l):
            dk[j] = csqrt(cpow(pc_t-pc_l[j],2)/var)
        return dk
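The original line then becomes a single call (a usage sketch, using the same variable names as in knn_alg above):

    dk[:,0] = cy_mahalanobis(pc_l, pc_t)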
And because I am not calling any Python functions (including NumPy ones), the entire loop is compiled to C: there are no yellow lines when using the annotate option (cython -a file.pyx on the command line, or %%cython -a in the IPython notebook).
Overall, the code ended up being an order of magnitude faster! It was well worth the effort of coding it by hand. My Cython (and Python, for that matter) is not the greatest, so any additional suggestions or answers would be appreciated.