python - Optimize Cython code for numpy variance calculation -


I am trying to optimize some Cython code, and there seems to be quite a bit of room for improvement. Here is part of a profile from the %prun extension in an IPython notebook:

     7016695 function calls in 18.475 seconds

     Ordered by: internal time

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     400722    7.723    0.000   15.086    0.000 _methods.py:73(_var)
     814815    4.190    0.000    4.190    0.000 {method 'reduce' of 'numpy.ufunc' objects}
          1    1.855    1.855   18.475   18.475 {_cython_magic_aed83b9d1a706200aa6cef0b7577cf41.knn_alg}
     403683    0.838    0.000    1.047    0.000 _methods.py:39(_count_reduce_items)
     813031    0.782    0.000    0.782    0.000 {numpy.core.multiarray.array}
     398748    0.611    0.000   15.485    0.000 fromnumeric.py:2819(var)
     804405    0.556    0.000    1.327    0.000 numeric.py:462(asanyarray)

Seeing that the program is spending about 8 seconds calculating variances, I was hoping that could be sped up.

I am calculating the variance with np.var() of a 1D array of length 404, roughly 1000 times. I checked the C standard library, and unfortunately there is no variance function there, and I don't want to write my own in C.

1. Is there any other option?

2. Is there any way to reduce the time spent in the second item on the list (the 'reduce' method of numpy.ufunc objects)?
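On option 1: for arrays this small, much of np.var()'s cost is Python-level dispatch overhead rather than arithmetic, so one alternative is to do the same population-variance math by hand. A minimal sketch, assuming an illustrative 404-element array (the data here is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
pc_l = rng.standard_normal(404)  # stand-in for the real length-404 array

def var_two_pass(x):
    """Population variance (ddof=0), same math as np.var, two passes."""
    mean = x.sum() / x.size
    return ((x - mean) ** 2).sum() / x.size

# Both compute the same quantity; the hand-rolled version is what a
# typed Cython loop would do, minus NumPy's per-call dispatch overhead.
assert np.isclose(var_two_pass(pc_l), np.var(pc_l))
```

In pure Python this is no faster than np.var, but the same loop translated to typed Cython compiles straight to C.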

Here is the code, if it helps to see it:

    cpdef knn_alg(np.ndarray[double, ndim=2] temp, np.ndarray[double, ndim=1] jan1, int l, int w, int b):

        cdef np.ndarray[double, ndim=3] lnn = np.zeros((l+1,temp.shape[1],365))
        lnn = lnn_alg(temp, l, w)

        cdef np.ndarray[double, ndim=2] sim = np.zeros((len(temp),temp.shape[1]))
        cdef np.ndarray[double, ndim=2] a = np.zeros((l+1,lnn.shape[1]))
        cdef np.ndarray[double, ndim=2] c = np.zeros((l,lnn.shape[1]-3))
        cdef np.ndarray[double, ndim=2] lnn_scale = np.zeros((l,lnn.shape[1]))
        cdef np.ndarray[double, ndim=2] cov_t = np.zeros((3,3))
        cdef np.ndarray[double, ndim=2] dk = np.zeros((l,4))
        cdef int random_selection
        cdef np.ndarray[double, ndim=1] day_month
        cdef int day_of_year
        cdef np.ndarray[double, ndim=2] lnn_scaled
        cdef np.ndarray[double, ndim=2] temp_scaled
        cdef np.ndarray[double, ndim=2] eig_vec
        cdef double pc_t
        cdef np.ndarray[double, ndim=1] pc_l
        cdef double k
        cdef np.ndarray[double, ndim=2] knn
        cdef np.ndarray[double, ndim=1] val
        cdef np.ndarray[double, ndim=1] pn
        cdef double rand_num
        cdef int nn
        cdef int index
        cdef int inc
        cdef int i

        sim[0,:] = jan1

        for i in xrange(1,len(temp),b):

            # if leap day, randomly select Feb 29 or Mar 1
            if (temp[i,4]==2) & (temp[i,3]==29):
                random_selection = np.random.randint(0,1)
                day_month = np.array([[29,2],[1,3]])[random_selection]
            else:
                day_month = temp[i,3:5]

            # convert day/month to day of year for the l+1 nearest neighbors selection
            current = datetime.datetime(2014, (<int>day_month[1]), (<int>day_month[0]))
            day_of_year = current.timetuple().tm_yday - 1

            # take out the current day from the l+1 nearest neighbors
            a = lnn[:,:,day_of_year]
            b = np.where((a[:,3:6] == temp[i,3:6]).all(axis=-1))[0][0]
            c = np.delete(a,(b), axis=0)

            # scale and center the nearest neighbors and the spatially averaged historical data
            lnn_scaled = scale(c[:,0:3])
            temp_scaled = scale(temp[:,0:3])

            # calculate covariance matrix of nearest neighbors
            cov_t[:,:] = np.cov(lnn_scaled.T)

            # calculate eigenvalues and vectors of covariance matrix
            eig_vec = eig(cov_t)[1]

            # calculate principal components of the scaled l nearest neighbors
            pc_t = np.dot(temp_scaled[i],eig_vec[0])
            pc_l = np.dot(lnn_scaled,eig_vec[0])

            # calculate Mahalanobis distance
            dk = np.zeros((404,4))
            dk[:,0] = np.array([sqrt((pc_t-pc)**2/np.var(pc_l)) for pc in pc_l])
            dk[:,1:4] = c[:,3:6]

            # extract k nearest neighbors
            dk = dk[dk[:,0].argsort()]
            k = round(sqrt(l),0)
            knn = dk[0:(<int>k)]

            # create probability density function
            val = np.array([1.0/k for k in range(1,len(knn)+1)])
            wk = val/(<int>val.sum())
            pn = wk.cumsum()

            # select the next day's value from the knns using the pdf and a random value
            rand_num = np.random.rand(1)[0]
            nn = (abs(pn-rand_num)).argmin()
            index = np.where((temp[:,3:6] == knn[nn,1:4]).all(axis=-1))[0][0]

            if i+b > len(temp):
                inc = len(temp) - i
            else:
                inc = b

            if (index+b > len(temp)):
                index = len(temp)-b

            sim[i:i+inc,:] = temp[index:index+inc,:]

        return sim

The variance calculation is in this line:

    dk[:,0] = np.array([sqrt((pc_t-pc)**2/np.var(pc_l)) for pc in pc_l])
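Note that written as a list comprehension, np.var(pc_l) is re-evaluated once per element of pc_l, so a 404-element array triggers 404 variance calls per outer iteration, which is consistent with the ~400,000 calls to _var in the profile. A minimal sketch of hoisting the variance out and vectorizing the whole line (the arrays here are made-up stand-ins for pc_l and pc_t):

```python
import numpy as np
from math import sqrt

rng = np.random.default_rng(1)
pc_l = rng.standard_normal(404)  # stand-in for the principal components
pc_t = 0.5                       # stand-in for the scalar projection

# Original form: np.var is recomputed for every element of pc_l.
dk0_loop = np.array([sqrt((pc_t - pc) ** 2 / np.var(pc_l)) for pc in pc_l])

# Hoisted and vectorized: one variance call, one array expression,
# using sqrt((a-b)^2 / v) == |a-b| / sqrt(v).
dk0_vec = np.abs(pc_t - pc_l) / sqrt(np.var(pc_l))

assert np.allclose(dk0_loop, dk0_vec)
```

This alone removes the repeated variance calls without leaving NumPy.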

Any advice would be very helpful, as I am quite new to Cython.

I went through the above calculation, and I think the reason it was going slow is that I was using np.var(), a Python (or NumPy) function, which does not allow the loop to be compiled in C. If anyone knows how to do that while still using NumPy, please let me know.

What I ended up doing was converting this calculation:

    dk[:,0] = np.array([sqrt((pc_t-pc)**2/np.var(pc_l)) for pc in pc_l])

into a separate function:

    cimport cython
    cimport numpy as np
    import numpy as np
    from libc.math cimport sqrt as csqrt
    from libc.math cimport pow as cpow

    @cython.boundscheck(False)
    @cython.cdivision(True)
    cdef cy_mahalanobis(np.ndarray[double, ndim=1] pc_l, double pc_t):
        cdef unsigned int i,j,l
        l = pc_l.shape[0]
        cdef np.ndarray[double] dk = np.zeros(l)
        cdef double x,total,mean,var

        # first pass: mean
        total = 0
        for i in xrange(l):
            x = pc_l[i]
            total = total + x

        mean = total / l

        # second pass: population variance
        total = 0
        for i in xrange(l):
            x = cpow(pc_l[i]-mean,2)
            total = total + x

        var = total / l

        for j in xrange(l):
            dk[j] = csqrt(cpow(pc_t-pc_l[j],2)/var)

        return dk

And because I am not calling any Python functions (including NumPy ones), the entire loop could be compiled to C (no yellow lines when using the annotate option, cython -a file.pyx, or %%cython -a in an IPython notebook).
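For reference, the two mean/variance passes above can also be fused into a single pass with Welford's algorithm, which avoids sweeping pc_l twice. A minimal Python sketch of that variant (the loop body translates directly into the typed Cython style above; the test data is made up):

```python
import numpy as np

def welford_var(x):
    """Single-pass population variance (ddof=0) via Welford's update."""
    mean = 0.0
    m2 = 0.0
    for n, val in enumerate(x, start=1):
        delta = val - mean
        mean += delta / n          # running mean
        m2 += delta * (val - mean) # running sum of squared deviations
    return m2 / len(x)

rng = np.random.default_rng(2)
x = rng.standard_normal(404)
assert np.isclose(welford_var(x), np.var(x))
```

For length-404 arrays the two-pass version is fine; Welford mainly matters for streaming data or when numerical cancellation in the naive sum-of-squares formula is a concern.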

Overall the code ended up being an order of magnitude faster! It was well worth the effort of coding it by hand. My Cython (and Python, for that matter) is not the greatest, so any additional suggestions or answers would be appreciated.

