Tuesday, December 20, 2011

Rescaling

This is an efficient function for rescaling an n-dimensional NumPy array to a new range, [newmin, newmax].

import numpy as np

def rescale( data, newmin, newmax ):
    """Linearly rescale an n-dimensional array to the range [newmin, newmax]."""
    oldmin, oldmax = np.min( data ), np.max( data )
    newscale = float( newmax - newmin )
    oldscale = float( oldmax - oldmin )
    # a constant array cannot be rescaled
    if oldscale == 0: return None
    ratio = newscale / oldscale
    # work on a float copy so integer inputs are not truncated in place
    c = data.astype( np.float64 )
    c -= oldmin
    c *= ratio
    c += newmin
    return c
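
For example, a quick sanity check (assuming NumPy is imported as np and the function above is defined):

    >>> a = np.array( [ [1, 2], [3, 4] ] )
    >>> rescale( a, 0.0, 1.0 )
    array([[ 0.        ,  0.33333333],
           [ 0.66666667,  1.        ]])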

Thursday, December 8, 2011

Merging Segments in an Image


I'll walk through some code for merging segments in an image using two classes, Segment and Segments. The first class holds the data for a single segment:


class Segment:
    def __init__( self, elements, neighbors, label ):
        self.elements = [ elements ] # tuple addresses
        self.neighbors = neighbors   # class labels of neighboring segments
        self.label = label           # this segment's own class label
        self.quality = None          # either textured or smooth
        self.mean = None             # mean reflectance
        self.texture = None          # texture vector


<< OTHER CODE >>



    def merge( self, a, b ):
        # for readability
        newclass = a.label
        oldclass = b.label
        # a subsumes b's elements
        a.elements += b.elements
        # set of class labels:
        # the class labels neighboring a, plus the class label of a
        mine = set( a.neighbors + [ newclass ] )
        # set of class labels:
        # the class labels neighboring b, plus the class label of b
        theirs = set( b.neighbors + [ oldclass ] )
        # set of class labels for elements of a (after adding b to a)
        ours = set( [ self.element_segment[i] for i in a.elements ] )
        # update a's new neighbors
        a.neighbors = list( mine.union( theirs ) - ours )
        # update class membership of b's elements
        for m in b.elements:
            self.element_segment[ m ] = newclass
            self.segment_element[ newclass ].append( m )
        # remove the oldclass from the dict of class elements
        del self.segment_element[ oldclass ]
        # remove the oldclass from the dict of segments
        del self.segments[ oldclass ]
        # point old neighbors of the oldclass at the newclass
        for s in self.segments:
            if oldclass in self.segments[s].neighbors:
                while oldclass in self.segments[s].neighbors:
                    self.segments[s].neighbors.remove( oldclass )
                if newclass not in self.segments[s].neighbors:
                    self.segments[s].neighbors.append( newclass )
        
The merge() function figures out the elements of the new segment (easy), figures out the neighbors of the new segment (harder), and then does some cleanup.

In the merge() function, newclass = a.label and oldclass = b.label take the class labels of the segments to be merged, a and b. In this implementation, a is dominant: b's elements are relabeled with a's class label.

Then, with a.elements += b.elements, we add the elements of b to the elements of a. These elements are tuple addresses for image chips, or individual pixels.
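
For reference, merge() is a method of the Segments container class, whose full code is elided above. A minimal sketch (not the original code) of the bookkeeping it relies on might look like the following; the attribute names come straight from the merge() method, but the constructor and the add() helper are my own guesses:

class Segments:
    def __init__( self ):
        self.segments = {}         # class label -> Segment instance
        self.element_segment = {}  # tuple address -> class label
        self.segment_element = {}  # class label -> list of tuple addresses

    def add( self, segment ):
        # register a Segment and its elements under its class label
        self.segments[ segment.label ] = segment
        self.segment_element[ segment.label ] = list( segment.elements )
        for m in segment.elements:
            self.element_segment[ m ] = segment.label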

Imagine the following setup:

|---|---|---|---|
| 3 | 8 | 8 | 8 |
|---|---|---|---|
| 3 | 1 | 2 | 7 |
|---|---|---|---|
| 4 | 1 | 2 | 7 |
|---|---|---|---|
| 4 | 4 | 5 | 6 |
|---|---|---|---|

Here, segment a is denoted by the ones, and segment b is denoted by the twos. The line mine = set( a.neighbors + [ newclass ] ) is just the set {3,4,2,8,1}, the class labels for the n,s,e,w neighbors of segment a, which are {3,4,2,8}, and the class label for a, which is {1}. Similarly for theirs we have the class labels for the neighbors of b, and the class label for b, and this ends up being the set {1,8,7,5,2}. Then ours is the class labels in the new segment, which will be the set {1,2}, since 1 and 2 are the class labels of segment a and segment b.

Then the new neighbors of the new segment are found by taking the union of mine and theirs, which is {3,8,4,7,5,1,2}, and then tossing ours, giving us the set {3,4,8,7,5}; these are the tiles cardinally neighboring the tiles labeled 1 and 2. Note that tile 6 is not considered a neighbor because it is to the south-east of tile 2; only tiles that share a whole side are counted as neighbors.
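
Spelled out with the numbers from the example (just the set arithmetic, independent of the classes):

    mine   = set( [3, 4, 2, 8] + [1] )        # neighbors of a, plus a's own label
    theirs = set( [1, 8, 7, 5] + [2] )        # neighbors of b, plus b's own label
    ours   = set( [1, 2] )                    # labels now inside the merged segment
    neighbors = mine.union( theirs ) - ours   # {3, 4, 5, 7, 8}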

After cleaning up the requisite dictionaries that keep track of these things, we should have the following diagram,

|---|---|---|---|
| 3 | 8 | 8 | 8 |
|---|---|---|---|
| 3 | 1 | 1 | 7 |
|---|---|---|---|
| 4 | 1 | 1 | 7 |
|---|---|---|---|
| 4 | 4 | 5 | 6 |
|---|---|---|---|

Tuesday, November 22, 2011

pywt versus PyWT

To be specific, I am talking about the difference between pywt (PyWavelets) and PyWT (Python Web Toolkit).

I thought I had installed the wavelets package, as I had used pywt freely for several years; however, when I executed,
$ pip install pywt
I got the web toolkit package instead, even though it goes by PyWF now. Executing the inverse:
$ pip uninstall pywt
I got the reassuring,
$ pip uninstall pywt
Uninstalling PyWT:
Proceed (y/n)? y
  Successfully uninstalled PyWT
But when I went to use the wavelet package, it was still picking up the supposedly uninstalled web toolkit, which does absolutely nothing for wavelets. So... I imported pywt at the terminal again and looked at the path:
>>> pywt.__path__
['/usr/local/lib/python2.6/dist-packages/pywt']
And then I went in like gangbusters and shot up that joint with,
$ sudo rm -r /usr/local/lib/python2.6/dist-packages/pywt/
Then everything worked out okay.

    >>> import pywt
    >>> from pywt import dwt
    >>> dwt([1,2,3,4,5,6,7,8],'db4')
    (array([  7.06453146,  4.23073611,  1.41360717,  2.83605428,
              5.6633906 ,  8.49718595, 11.3143149  ]),
     array([  2.37131306e-02,  4.09620864e-02,  -6.46752170e-02,
             -9.90107302e-13, -2.37131306e-02,  -4.09620864e-02,
              6.46752170e-02]))
    >>> 

Monday, November 7, 2011

Matrix Inversion with HDF5 files

#!/usr/bin/env python

import sys
import h5py
import numpy as np




def doolittle_h5( a_filename, side_size ):
    """Doolittle LU decomposition of the matrix in a_filename, writing the
    factors to l.h5 and u.h5 (both are assumed to already exist, each with a
    float 'data' dataset of at least side_size x side_size)."""

    fa = h5py.File( a_filename, 'r' )
    a = fa['chunked']

    fl = h5py.File( 'l.h5', 'r+' )
    l = fl[u'data']

    fu = h5py.File( 'u.h5', 'r+' )
    u = fu[u'data']

    n = side_size

    for i in range( n ):
        for j in range( i, n ):
            u[i,j] = a[i,j]
            for k in range( i ):
                u[i,j] = u[i,j] - l[i,k] * u[k,j]
        for j in range( i, n ):
            l[j,i] = a[j,i]
            for k in range( i ):
                l[j,i] = l[j,i] - l[j,k] * u[k,i]
            l[j,i] = l[j,i] / float( u[i,i] )
        print 'd: ' + str( i )

    fa.close()
    fl.close()
    fu.close()

    return None

 

def solve_L_system_h5( a_filename, side_size, b ):
    """Forward substitution: solve L x = b, where L (unit lower triangular)
    is stored in the 'data' dataset of a_filename."""

    f = h5py.File( a_filename, 'r' )
    a = f[u'data'][ :side_size, :side_size ]

    x = np.zeros_like( b ).astype( np.float32 )

    for i in range( b.size ):
        x[i] = b[i]
        for j in range( i ):
            x[i] -= a[i,j] * x[j]

    f.close()

    return x

 

def solve_U_system_h5( a_filename, side_size, b ):
    """Back substitution: solve U x = b, where U (upper triangular) is
    stored in the 'data' dataset of a_filename."""

    f = h5py.File( a_filename, 'r' )
    a = f[u'data'][ :side_size, :side_size ]

    x = np.zeros_like( b ).astype( np.float32 )

    for i in range( b.size-1, -1, -1 ):
        x[i] = b[i]
        for j in range( i+1, b.size ):
            x[i] -= a[i,j] * x[j]
        x[i] /= float( a[i,i] )

    f.close()

    return x



def inverse_h5( a_filename, side_size ):
    """Invert the matrix in a_filename column by column: LU factor it, then
    solve L y = e_i and U x = y for each unit vector e_i."""

    doolittle_h5( a_filename, side_size )

    print "Done doing little."

    finv = h5py.File( 'ainv.h5', 'w' )
    finv.create_dataset( u'data', ( side_size, side_size ), np.float32, chunks=( side_size, 1 ) )
    inv = finv[u'data']

    for i in range( side_size ):
        # unit vector e_i as the right-hand side
        b = np.zeros( ( side_size, ), dtype=np.float32 )
        b[i] = 1.0
        y = solve_L_system_h5( 'l.h5', side_size, b )
        x = solve_U_system_h5( 'u.h5', side_size, y )
        inv[:,i] = x
        print "inverting: " + str( 100.0 * i / side_size ) + "% done"

    finv.close()

    return None


#fa = h5py.File( 'a.h5', 'w' )
#d = fa.create_dataset( 'chunked', ( 3,3 ), np.float32 )
#a = fa['chunked']

#da = [ [ 2,-1,-2 ], [ -4,6,3 ], [ -4,-2,8 ] ]
#for i in range( 3 ):
#    for j in range( 3 ):
#        a[i,j] = da[i][j]

#fa.close()

#inverse_h5('a.h5',3)

# at the command line: d0=$(date); nice -n -20 python doolittle.py xx.h5 250000; d1=$(date)

inverse_h5( sys.argv[1], int( sys.argv[2] ) )
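
Note that doolittle_h5() opens l.h5 and u.h5 in 'r+' mode, so they have to exist before the script runs. A rough sketch (my own, not part of the original script) of pre-allocating them for a side_size-by-side_size problem:

# Hypothetical setup step: pre-allocate the L and U factor files that
# doolittle_h5() opens in 'r+' mode. Not part of the original script.
import h5py
import numpy as np

side_size = 3  # match the size passed to inverse_h5()
for name in ( 'l.h5', 'u.h5' ):
    f = h5py.File( name, 'w' )
    f.create_dataset( u'data', ( side_size, side_size ), np.float32, chunks=( side_size, 1 ) )
    f.close()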

Wednesday, October 19, 2011

Deleting tons of files at once

Okay, suppose you have tons of files you need to delete... so many that you can't open up the folder, or remove them in one fell swoop with a single rm. Suppose further that your files are labeled like "x_0_0_1_2.dat". Then you can write a loop in your bash terminal as follows:
$ for i in 0 1 2 ; do rm x_${i}_* ; done
The key thing to note here is the curly braces around the iterating variable. If you leave off the braces, it won't work: bash reads $i_ as a single (empty) variable name, so the glob collapses to x_* and matches every file at once.
$ for i in 3 4 5 ; do rm x_$i_* ; done
bash: /bin/rm: Argument list too long
bash: /bin/rm: Argument list too long
bash: /bin/rm: Argument list too long


Friday, April 22, 2011

An Online Physics Textbook!

It claims to be a work in progress, but it seems pretty complete to me. You can find it at http://physics.info/.

Monday, April 18, 2011

PyCUDA Documentation

Here is the PyCUDA documentation.

Tuesday, April 12, 2011

Maxima

Maxima is a free computer algebra system originally developed at MIT. Here is a great tutorial to get you up and running.

Thursday, April 7, 2011

L-System Fractals in Python using PIL

Here is some code from ActiveState detailing some L-system fractal stuff.

Pretty neat site on fractals

Go here.

A good source for the Python Imaging Library

I've run across this site multiple times, but I don't think I've ever posted it here. Anyway, it has lots of stuff about the PIL module.

Tuesday, April 5, 2011

Python Sorting

So... list.sort() sorts a list in place, whereas sorted() accepts any iterable and returns a new sorted list. This also works on an iterable of objects, via the key argument! So, here's the run-down.
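
A quick illustration (my own toy example) of the difference, including sorting objects with a key:

    # sorted() returns a new list; sort() reorders in place and returns None
    nums = [3, 1, 2]
    print sorted( nums )   # [1, 2, 3]
    print nums             # [3, 1, 2]  (unchanged)
    nums.sort()
    print nums             # [1, 2, 3]

    # sorting an iterable of objects with a key function
    pairs = [ ('b', 2), ('a', 3), ('c', 1) ]
    print sorted( pairs, key=lambda p: p[1] )   # [('c', 1), ('b', 2), ('a', 3)]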

Wednesday, March 30, 2011

PyOpenCV - Access OpenCV Through Python!

OpenCV is an open source computer vision library written in C/C++. Here is the Python module for it.

Tuesday, March 29, 2011

Create Image with NumPy

This is a good resource. Apparently better than using PIL putpixel(), etc.

Sunday, March 27, 2011

A Nice Introduction to R

Here is a really good R time series resource.

Some RPy Examples

This is a pretty good resource for RPy. It has a lot of economics data. It also uses json, which is something I've never heard of.

This is another great introduction to RPy!

Friday, March 25, 2011

Installing Packages in R, using Ubuntu

R... is the unintuitive statistics "programming language". I'm being overly disdainful; just because you don't like something doesn't mean it isn't useful. It has a fractal dimension package, fdim, which you can read about on CRAN.

I found this useful in getting fdim up and running, even though it has deprecated dependencies and everything.

Wednesday, March 23, 2011

PCA in Python

Here is the stack-overflow page on PCA in Python.

Tuesday, March 22, 2011

Hypertext Help With LaTeX

This is a great general site on LaTeX.

LaTeX Standard Environments

Here is a good page on the LaTeX standard environments.

Thursday, March 17, 2011

Example of using threading in Python

#!/usr/bin/env python

import Queue
import threading
import mahotas
import time
import sys

# this is the list where
# the final data goes
f = list()

# create a Queue
queue = Queue.Queue()

# the code that needs to be executed
# we override the init function in order to pass parameters to the threads
class MyThread( threading.Thread ):
    def __init__( self, f, queue ):
        self.f = f
        self.queue = queue
        threading.Thread.__init__( self )
    def run( self ):
        while True:
            task = self.queue.get()
            self.f.append( mahotas.features.haralick( task ) )
            self.queue.task_done()

# reads the image and converts it to a numpy int ndarray
img = mahotas.imread( sys.argv[1] )

# chip size dimensions
chipx = int( sys.argv[2] )
chipy = int( sys.argv[3] )

# figure out image size
imgx, imgy, imgz = img.shape

# the number of chips in each direction
nbxchips = imgx/chipx
nbychips = imgy/chipy

start = time.time()

# create some threads
for i in range( 3 ):
    t = MyThread( f, queue )
    t.setDaemon( True )
    t.start()

# put chips into the queue
for i in range( nbxchips ):
    for j in range( nbychips ):
        queue.put( img[i*chipx:(i+1)*chipx, j*chipy:(j+1)*chipy, :] )

# wait for all the work to finish
queue.join()

print 'Elapsed Time: %s' % ( time.time() - start )

print len( f )

Tuesday, March 15, 2011

Haralick Features, and other features, too!

I found this site that describes a Python module that calculates Haralick textures, SURF features, and others.

Notable algorithms:
  • watershed
  • convex points calculations
  • hit & miss, thinning
  • Zernike & Haralick, LBP, and TAS features
  • freeimage-based numpy image loading (requires the freeimage libraries to be installed)
  • Speeded-Up Robust Features (SURF), a form of local features
  • thresholding
  • convolution
  • Sobel edge detection
PythonVision is another good site.

Python PCA (ii)

# Code from Chapter 10 of Machine Learning: An Algorithmic Perspective
# by Stephen Marsland (http://seat.massey.ac.nz/personal/s.r.marsland/MLBook.html)

# You are free to use, change, or redistribute the code in any way you wish for
# non-commercial purposes, but please maintain the name of the original author.
# This code comes with no warranty of any kind.

# Stephen Marsland, 2008

# An algorithm to compute PCA. Not as fast as the NumPy implementation
from pylab import *
from numpy import *

def pca(data,nRedDim=0,normalise=1):

    # Centre data
    m = mean(data,axis=0)
    data -= m

    # Covariance matrix
    C = cov(transpose(data))

    # Compute eigenvalues and sort into descending order
    evals,evecs = linalg.eig(C)
    indices = argsort(evals)
    indices = indices[::-1]
    evecs = evecs[:,indices]
    evals = evals[indices]

    if nRedDim>0:
        evecs = evecs[:,:nRedDim]

    if normalise:
        for i in range(shape(evecs)[1]):
            evecs[:,i] = evecs[:,i] / linalg.norm(evecs[:,i]) * sqrt(evals[i])

    # Produce the new data matrix
    x = dot(transpose(evecs),transpose(data))
    # Compute the original data again
    y = transpose(dot(evecs,x))+m
    return x,y,evals,evecs

Python PCA (i)

#!/usr/bin/env python

""" a small class for Principal Component Analysis
Usage:
p = PCA( A, fraction=0.90 )
In:
A: an array of e.g. 1000 observations x 20 variables, 1000 rows x 20 columns
fraction: use principal components that account for e.g.
90 % of the total variance

Out:
p.U, p.d, p.Vt: from numpy.linalg.svd, A = U . d . Vt
p.dinv: 1/d or 0, see NR
p.eigen: the eigenvalues of A*A, in decreasing order (p.d**2).
eigen[j] / eigen.sum() is variable j's fraction of the total variance;
look at the first few eigen[] to see how many PCs get to 90 %, 95 % ...
p.npc: number of principal components,
e.g. 2 if the top 2 eigenvalues are >= `fraction` of the total.
It's ok to change this; methods use the current value.

Methods:
The methods of class PCA transform vectors or arrays of e.g.
20 variables, 2 principal components and 1000 observations,
using partial matrices U' d' Vt', parts of the full U d Vt:
A ~ U' . d' . Vt' where e.g.
U' is 1000 x 2
d' is diag([ d0, d1 ]), the 2 largest singular values
Vt' is 2 x 20. Dropping the primes,

d . Vt 2 principal vars = p.vars_pc( 20 vars )
U 1000 obs = p.pc_obs( 2 principal vars )
U . d . Vt 1000 obs, p.obs( 20 vars ) = pc_obs( vars_pc( vars ))
fast approximate A . vars, using the `npc` principal components

Ut 2 pcs = p.obs_pc( 1000 obs )
V . dinv 20 vars = p.pc_vars( 2 principal vars )
V . dinv . Ut 20 vars, p.vars( 1000 obs ) = pc_vars( obs_pc( obs )),
fast approximate Ainverse . obs: vars that give ~ those obs.


Notes:
    PCA does not center or scale A; you usually want to first
        A -= A.mean(axis=0)
        A /= A.std(axis=0)
    with the little class Center or the like, below.

See also:
http://en.wikipedia.org/wiki/Principal_component_analysis
http://en.wikipedia.org/wiki/Singular_value_decomposition
Press et al., Numerical Recipes (2 or 3 ed), SVD
PCA micro-tutorial
iris-pca .py .png

"""

from __future__ import division
import numpy as np
dot = np.dot
# import bz.numpyutil as nu
# dot = nu.pdot

__version__ = "2010-04-14 apr"
__author_email__ = "denis-bz-py at t-online dot de"

#...............................................................................
class PCA:
    def __init__( self, A, fraction=0.90 ):
        assert 0 <= fraction <= 1
        # A = U . diag(d) . Vt, O( m n^2 ), lapack_lite --
        self.U, self.d, self.Vt = np.linalg.svd( A, full_matrices=False )
        assert np.all( self.d[:-1] >= self.d[1:] )  # sorted
        self.eigen = self.d**2
        self.sumvariance = np.cumsum(self.eigen)
        self.sumvariance /= self.sumvariance[-1]
        self.npc = np.searchsorted( self.sumvariance, fraction ) + 1
        self.dinv = np.array([ 1/d if d > self.d[0] * 1e-6 else 0
                               for d in self.d ])

    def pc( self ):
        """ e.g. 1000 x 2 U[:, :npc] * d[:npc], to plot etc. """
        n = self.npc
        return self.U[:, :n] * self.d[:n]

    # These 1-line methods may not be worth the bother;
    # then use U d Vt directly --

    def vars_pc( self, x ):
        n = self.npc
        return self.d[:n] * dot( self.Vt[:n], x.T ).T  # 20 vars -> 2 principal

    def pc_vars( self, p ):
        n = self.npc
        return dot( self.Vt[:n].T, (self.dinv[:n] * p).T ).T  # 2 PC -> 20 vars

    def pc_obs( self, p ):
        n = self.npc
        return dot( self.U[:, :n], p.T )  # 2 principal -> 1000 obs

    def obs_pc( self, obs ):
        n = self.npc
        return dot( self.U[:, :n].T, obs ).T  # 1000 obs -> 2 principal

    def obs( self, x ):
        return self.pc_obs( self.vars_pc(x) )  # 20 vars -> 2 principal -> 1000 obs

    def vars( self, obs ):
        return self.pc_vars( self.obs_pc(obs) )  # 1000 obs -> 2 principal -> 20 vars


class Center:
    """ A -= A.mean() /= A.std(), inplace -- use A.copy() if need be
        uncenter(x) == original A . x
    """
    # mttiw
    def __init__( self, A, axis=0, scale=True, verbose=1 ):
        self.mean = A.mean(axis=axis)
        if verbose:
            print "Center -= A.mean:", self.mean
        A -= self.mean
        if scale:
            std = A.std(axis=axis)
            self.std = np.where( std, std, 1. )
            if verbose:
                print "Center /= A.std:", self.std
            A /= self.std
        else:
            self.std = np.ones( A.shape[-1] )
        self.A = A

    def uncenter( self, x ):
        return np.dot( self.A, x * self.std ) + np.dot( x, self.mean )


#...............................................................................
if __name__ == "__main__":
import sys

csv = "iris4.csv" # wikipedia Iris_flower_data_set
# 5.1,3.5,1.4,0.2 # ,Iris-setosa ...
N = 1000
K = 20
fraction = .90
seed = 1
exec "\n".join( sys.argv[1:] ) # N= ...
np.random.seed(seed)
np.set_printoptions( 1, threshold=100, suppress=True ) # .1f
try:
A = np.genfromtxt( csv, delimiter="," )
N, K = A.shape
except IOError:
A = np.random.normal( size=(N, K) ) # gen correlated ?

print "csv: %s N: %d K: %d fraction: %.2g" % (csv, N, K, fraction)
Center(A)
print "A:", A

print "PCA ..." ,
p = PCA( A, fraction=fraction )
print "npc:", p.npc
print "% variance:", p.sumvariance * 100

print "Vt[0], weights that give PC 0:", p.Vt[0]
print "A . Vt[0]:", dot( A, p.Vt[0] )
print "pc:", p.pc()

print "\nobs <-> pc <-> x: with fraction=1, diffs should be ~ 0"
x = np.ones(K)
# x = np.ones(( 3, K ))
print "x:", x
pc = p.vars_pc(x) # d' Vt' x
print "vars_pc(x):", pc
print "back to ~ x:", p.pc_vars(pc)

Ax = dot( A, x.T )
pcx = p.obs(x) # U' d' Vt' x
print "Ax:", Ax
print "A'x:", pcx
print "max |Ax - A'x|: %.2g" % np.linalg.norm( Ax - pcx, np.inf )

b = Ax # ~ back to original x, Ainv A x
back = p.vars(b)
print "~ back again:", back
print "max |back - x|: %.2g" % np.linalg.norm( back - x, np.inf )

# end pca.py

Sunday, February 27, 2011

Threading and Queues in Python!

Here is a good site on the Queue module.

Here is a good site on the Threading module.

Here is the IBM site on Threading and Queues.

Sunday, February 13, 2011

Instantaneous Amplitude, Phase and Frequency

Here is a link to a good discussion on this with MATLAB code.

Here is another place with a good discussion on doing a phase shift.
The Hilbert transform is fft ---> phase shift ---> ifft.
Doing the fft and ifft is easy in numpy, and a phase shift is just multiplying each term of the transformed signal by exp(j*p), where j is the imaginary unit and p is the desired phase shift (for the Hilbert transform, -pi/2 on the positive frequencies and +pi/2 on the negative ones).
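
As an illustration, here is a small numpy sketch (my own toy example) that uses the equivalent analytic-signal route of zeroing the negative frequencies, rather than an explicit exp(j*p) multiply, and then reads off the instantaneous attributes:

import numpy as np

fs = 500.0                                     # assumed sample rate, Hz
t = np.arange( 0, 1.0, 1.0/fs )
x = np.cos( 2*np.pi*50*t )                     # toy 50 Hz test signal

# analytic signal: fft, zero the negative frequencies, ifft
X = np.fft.fft( x )
h = np.zeros( len(x) )
h[0] = 1.0                                     # keep DC
h[1:len(x)//2] = 2.0                           # double the positive frequencies
if len(x) % 2 == 0:
    h[len(x)//2] = 1.0                         # keep the Nyquist bin
z = np.fft.ifft( X*h )                         # z = x + j*hilbert(x)

amplitude = np.abs( z )                        # instantaneous amplitude
phase = np.unwrap( np.angle( z ) )             # instantaneous phase
frequency = np.diff( phase ) * fs / (2*np.pi)  # instantaneous frequency, Hz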

Wednesday, February 9, 2011

ROC Curves

Here is a good site on ROC curves.

Sunday, February 6, 2011

Project Results!

This is the raw data
This is the data after subtracting the reference signal
This is the data after each row has been differenced
This density function represents the sum of each column of the previous image
This is the result of using a five point smoothing function on the previous data

Wednesday, January 26, 2011

NASDAQ, NYSE, AMEX Data

Here is a site that provides historical data over 10 years for three exchanges, the NYSE, NASDAQ, and AMEX. Go to 'Stock Chart,' and then look for 'Historical Quotes' (or something like it) in the sidebar. Link is here.

LaTeX Crashing Course

This is a good tutorial that walks through things pretty slowly. I have so far only used it to fill in fundamental gaps; I haven't perused it for anything fancy. Here.

This is a good tutorial on tables.

Tuesday, January 25, 2011

Best Bash Tutorials Ever. Period.

I can stop looking for Bash tutorials forever now. This is better than the expensive book I bought on this topic.

http://www.faqs.org/docs/abs/HTML/index.html

Monday, January 17, 2011

Colors

Here is a good color scheme designer.

Thursday, January 6, 2011

Paul McCartney II

Awesome opossum. Some selected tracks can be found, here.

Coming Up, the album's single, can be found, here.

Digital Signal Processing

A free book on digital signal processing (DSP) can be found, here.

Monday, January 3, 2011

Computing Talk

A page on Python software is found here.