This is an efficient function for rescaling n-dimensional arrays.
import numpy as np

def rescale( data, newmin, newmax ):
    # linearly map the values of data onto the range [newmin, newmax]
    # assumes data is a float array, since the in-place ops below keep its dtype
    oldmin, oldmax = np.min( data ), np.max( data )
    newscale = float( newmax - newmin )
    oldscale = float( oldmax - oldmin )
    if oldscale == 0:
        return None   # constant array: nothing sensible to rescale
    ratio = newscale / oldscale
    c = data.copy()
    c -= oldmin
    c *= ratio
    c += newmin
    return c
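For example, a quick sanity check on a small float array (the numbers are made up, just to show the mapping):

a = np.array( [ [ 0.0, 5.0, 10.0 ] ] )
b = rescale( a, 0.0, 1.0 )
# b is [[ 0.0, 0.5, 1.0 ]]; a is untouched because rescale works on a copy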
Tuesday, December 20, 2011
Thursday, December 8, 2011
Merging Segments in an Image
I'll walk through some code for merging segments in an image using two classes, Segment and Segments. The first class, Segment, just holds the per-segment data:
class Segment:
    def __init__( self, elements, neighbors, label ):
        self.elements = [ elements ]    # tuple addresses
        self.neighbors = neighbors      # class labels of neighboring segments
        self.label = label              # this segment's own class label ('class' is a reserved word)
        self.quality = None             # either textured or smooth
        self.mean = None                # mean reflectance
        self.texture = None             # texture vector
    << OTHER CODE >>
def merge( self, a, b ):
    # for readability
    newclass = a.label
    oldclass = b.label
    # a subsumes b's elements
    a.elements += b.elements
    # set of class labels:
    # the class labels neighboring a, plus the class label of a
    mine = set( a.neighbors + [ newclass ] )
    # set of class labels:
    # the class labels neighboring b, plus the class label of b
    theirs = set( b.neighbors + [ oldclass ] )
    # set of class labels for elements of a (after adding b to a)
    ours = set( [ self.element_segment[i] for i in a.elements ] )
    # update a's new neighbors
    a.neighbors = list( mine.union( theirs ) - ours )
    # update class membership of b's elements
    for m in b.elements:
        self.element_segment[ m ] = newclass
        self.segment_element[ newclass ].append( m )
    # remove the oldclass from the dict of elements
    del self.segment_element[ oldclass ]
    # remove the oldclass from the dict of segments
    del self.segments[ oldclass ]
    # update old neighbors of the oldclass
    for s in self.segments:
        if oldclass in self.segments[s].neighbors:
            while oldclass in self.segments[s].neighbors:
                self.segments[s].neighbors.remove( oldclass )
            if newclass not in self.segments[s].neighbors:
                self.segments[s].neighbors.append( newclass )
In the merge() function, newclass = a.label and oldclass = b.label take the class labels of the segments to be merged, a and b (the attribute is called label rather than class because class is a reserved word in Python). In this implementation, a is dominant, and subsumes b's class label.
Then, with a.elements += b.elements, we add the elements of b to the elements of a. These elements are tuple addresses for image chips, or individual pixels.
Imagine the following setup:
|---|---|---|---|
| 3 | 8 | 8 | 8 |
|---|---|---|---|
| 3 | 1 | 2 | 7 |
|---|---|---|---|
| 4 | 1 | 2 | 7 |
|---|---|---|---|
| 4 | 4 | 5 | 6 |
|---|---|---|---|
Here, segment a is denoted by the ones, and segment b is denoted by the twos. The line mine = set( a.neighbors + [ newclass ] ) is just the set {3,4,2,8,1}, the class labels for the n,s,e,w neighbors of segment a, which are {3,4,2,8}, and the class label for a, which is {1}. Similarly for theirs we have the class labels for the neighbors of b, and the class label for b, and this ends up being the set {1,8,7,5,2}. Then ours is the class labels in the new segment, which will be the set {1,2}, since 1 and 2 are the class labels of segment a and segment b.
Then the new neighbors of the new segment will be found by taking the union of mine and theirs, which is {3,8,4,7,5,1,2}, and then tossing ours, giving us the set {3,4,8,7,5}; these are the tiles that share an edge with the tiles labeled 1 and 2. Note that tile 6 is not considered a neighbor because it is to the south-east of tile 2; only tiles that share a whole side are counted as neighbors.
After cleaning up the requisite dictionaries that keep track of these things, we should have the following diagram,
|---|---|---|---|
| 3 | 8 | 8 | 8 |
|---|---|---|---|
| 3 | 1 | 1 | 7 |
|---|---|---|---|
| 4 | 1 | 1 | 7 |
|---|---|---|---|
| 4 | 4 | 5 | 6 |
|---|---|---|---|
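For completeness, merge() is a method of the second class, Segments, and it leans on three bookkeeping structures that I haven't shown. Here is a minimal sketch of what that container might look like; the attribute names match the code above, but everything else is assumed:

class Segments:
    def __init__( self ):
        self.segments = {}           # class label -> Segment object
        self.element_segment = {}    # tuple address (row, col) -> class label owning that chip
        self.segment_element = {}    # class label -> list of tuple addresses in that segment
    << OTHER CODE, including merge() above >>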
Tuesday, November 22, 2011
pywt versus PyWT
To be specific, I am talking about the difference between pywt (PyWavelets) and PyWT (Python Web Toolkit).
I thought I had installed the wavelets package, as I had used pywt freely for several years to do things like this:
>>> import pywt
>>> from pywt import dwt
>>> dwt([1,2,3,4,5,6,7,8],'db4')
(array([ 7.06453146, 4.23073611, 1.41360717, 2.83605428,
5.6633906 , 8.49718595, 11.3143149 ]),
array([ 2.37131306e-02, 4.09620864e-02, -6.46752170e-02,
-9.90107302e-13, -2.37131306e-02, -4.09620864e-02,
6.46752170e-02]))
>>>
However, when I executed
$ pip install pywt
I got the web toolkit package instead, even though it goes by PyWF now. Executing the inverse:
$ pip uninstall pywt
I got the reassuring,
$ pip uninstall pywt
Uninstalling PyWT:
Proceed (y/n)? y
Successfully uninstalled PyWT
But when I went to use the wavelet package, it was still accessing the not-actually-uninstalled web toolkit, which does absolutely nothing for wavelets. So... I imported pywt at the terminal again and looked at the path:
>>> pywt.__path__
['/usr/local/lib/python2.6/dist-packages/pywt']
And then I went in like gangbusters and shot up that joint with,
$ sudo rm -r /usr/local/lib/python2.6/dist-packages/pywt/
Then everything worked out okay:
>>> import pywt
>>> from pywt import dwt
>>> dwt([1,2,3,4,5,6,7,8],'db4')
(array([ 7.06453146, 4.23073611, 1.41360717, 2.83605428,
5.6633906 , 8.49718595, 11.3143149 ]),
array([ 2.37131306e-02, 4.09620864e-02, -6.46752170e-02,
-9.90107302e-13, -2.37131306e-02, -4.09620864e-02,
6.46752170e-02]))
>>>
Monday, November 7, 2011
Matrix Inversion with HDF5 files
#!/usr/bin/env python
import sys
import h5py
import numpy as np

def doolittle_h5( a_filename, side_size ):
    # Doolittle LU decomposition, done against HDF5 datasets on disk.
    # 'l.h5' and 'u.h5' must already exist with a float dataset named 'data'
    # of shape (side_size, side_size), since they are opened with 'r+'.
    import h5py
    import numpy as np
    fa = h5py.File( a_filename, 'r' )
    a = fa['chunked']
    fl = h5py.File( 'l.h5', 'r+' )
    l = fl[u'data']
    fu = h5py.File( 'u.h5', 'r+' )
    u = fu[u'data']
    n = side_size
    for i in range( n ):
        for j in range( i, n ):
            u[i,j] = a[i,j]
            for k in range( i ):
                u[i,j] = u[i,j] - l[i,k] * u[k,j]
        for j in range( i, n ):
            l[j,i] = a[j,i]
            for k in range( i ):
                l[j,i] = l[j,i] - l[j,k] * u[k,i]
            l[j,i] = l[j,i] / float( u[i,i] )
        print 'd: ' + str( i )
    fa.close()
    fl.close()
    fu.close()
    return None

def solve_L_system_h5( a_filename, side_size, b ):
    # forward substitution; L has a unit diagonal, so no division is needed
    import h5py
    import numpy as np
    f = h5py.File( a_filename, 'r' )
    a = f[u'data'][ :side_size, :side_size ]
    x = np.zeros_like( b ).astype( np.float32 )
    for i in range( b.size ):
        x[i] = b[i]
        for j in range( i ):
            x[i] -= a[i,j] * x[j]
    f.close()
    return x

def solve_U_system_h5( a_filename, side_size, b ):
    # back substitution
    import h5py
    import numpy as np
    f = h5py.File( a_filename, 'r' )
    a = f[u'data'][ :side_size, :side_size ]
    x = np.zeros_like( b ).astype( np.float32 )
    for i in range( b.size-1, -1, -1 ):
        x[i] = b[i]
        for j in range( i+1, b.size ):
            x[i] -= a[i,j] * x[j]
        x[i] /= float( a[i,i] )
    f.close()
    return x

def inverse_h5( a_filename, side_size ):
    # invert A column by column: solve A x = e_i for each unit vector e_i
    import h5py
    import numpy as np
    doolittle_h5( a_filename, side_size )
    print "Done doing little."
    finv = h5py.File( 'ainv.h5', 'w' )
    dinv = finv.create_dataset( u'data', ( side_size, side_size ), np.float32, chunks=(side_size,1) )
    inv = finv[u'data']
    for i in range( side_size ):
        b = np.zeros( (side_size,) ) ; b[i] = 1.0
        b = b.astype( np.float32 )
        y = solve_L_system_h5( 'l.h5', side_size, b )
        x = solve_U_system_h5( 'u.h5', side_size, y )
        inv[:,i] = x
        print "inverting: " + str( 100.0 * i / side_size ) + "% done"
    finv.close()
    return None

#fa = h5py.File( 'a.h5', 'w' )
#d = fa.create_dataset( 'chunked', ( 3,3 ), np.float32 )
#a = fa['chunked']
#da = [ [ 2,-1,-2 ], [ -4,6,3 ], [ -4,-2,8 ] ]
#for i in range( 3 ):
#    for j in range( 3 ):
#        a[i,j] = da[i][j]
#fa.close()
#inverse_h5('a.h5',3)

# at command line: d0=date; nice -n -20 python doolittle.py xx.h5 250000; d1=date;
inverse_h5( sys.argv[1], int( sys.argv[2] ) )
Wednesday, October 19, 2011
Deleting tons of files at once
Okay, suppose you have tons of files you need to delete... so many that you can't open up the folder, or remove them all with a single rm in one fell swoop. Suppose further that your files are labeled like "x_0_0_1_2.dat". Then you can write a loop in your bash terminal as follows:
$ for i in 0 1 2 ; do rm x_${i}_* ; done
The key thing to note here is the curly braces around the iterating variable. If you leave off the braces, bash reads $i_ as a variable named i_, which is unset, so the pattern collapses to x_* and rm gets handed far too many arguments:
$ for i in 3 4 5 ; do rm x_$i_* ; done
bash: /bin/rm: Argument list too long
bash: /bin/rm: Argument list too long
bash: /bin/rm: Argument list too long
Friday, April 22, 2011
An Online Physics Textbook!
It claims to be a work in progress, but it seems pretty complete to me. You can find it at http://physics.info/
Monday, April 18, 2011
Tuesday, April 12, 2011
Thursday, April 7, 2011
L-System Fractals in Python using PIL
Here is some code from ActiveState detailing some L-System fractal stuff
A good source for the Python Imaging Library
I've run across this site multiple times, but I don't think I've ever posted it here. Anyway, it has lots of stuff about the PIL module.
Tuesday, April 5, 2011
Python Sorting
So... list.sort() sorts a list in place, whereas sorted() returns a sorted copy of any iterable. This also works on an iterable of objects, via the key argument. Here's the run-down.
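A quick illustration with some made-up objects (the Stock class is just for show):

class Stock( object ):
    def __init__( self, symbol, price ):
        self.symbol, self.price = symbol, price

quotes = [ Stock( 'GOOG', 600.0 ), Stock( 'AAPL', 350.0 ), Stock( 'IBM', 180.0 ) ]

# sorted() returns a new list and leaves quotes alone
cheapest_first = sorted( quotes, key=lambda s: s.price )

# list.sort() reorders quotes in place and returns None
quotes.sort( key=lambda s: s.symbol )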
Wednesday, March 30, 2011
PyOpenCV - Access OpenCV Through Python!
OpenCV is an open source computer vision library written in C/C++. Here is the Python module for it.
Tuesday, March 29, 2011
Sunday, March 27, 2011
Some RPy Examples
Friday, March 25, 2011
Installing Packages in R, using Ubuntu
R... is the unintuitive statistics "programming language". I'm being overly disdainful; just because you don't like something doesn't mean it isn't useful. It has a fractal dimension package, fdim, which you can read about on CRAN.
I found this useful in getting fdim up and running, even though it has deprecated dependencies and everything.
Wednesday, March 23, 2011
Tuesday, March 22, 2011
Thursday, March 17, 2011
Example of using threading in Python
#!/usr/bin/env python
import Queue
import threading
import mahotas
import time
import sys

# this is the list where
# the final data goes
f = list()

# create a Queue
queue = Queue.Queue()

# the code that needs to be executed
# we override the init function in order to pass parameters to the threads
class MyThread( threading.Thread ):
    def __init__( self, f, queue ):
        self.f = f
        self.queue = queue
        threading.Thread.__init__( self )
    def run( self ):
        while True:
            task = self.queue.get()
            self.f.append( mahotas.features.haralick( task ) )
            self.queue.task_done()

# reads the image and converts it to a numpy int ndarray
img = mahotas.imread( sys.argv[1] )

# chip size dimensions
chipx = int( sys.argv[2] )
chipy = int( sys.argv[3] )

# figure out image size
imgx, imgy, imgz = img.shape

# the number of chips in each direction
nbxchips = imgx/chipx
nbychips = imgy/chipy

start = time.time()

# create some threads
for i in range( 3 ):
    t = MyThread( f, queue )
    t.setDaemon( True )
    t.start()

# put chips into the queue
for i in range( nbxchips ):
    for j in range( nbychips ):
        queue.put( img[i*chipx:(i+1)*chipx,j*chipy:(j+1)*chipy,:] )

# wait for all the work to finish
queue.join()

print 'Elapsed Time: %s' % ( time.time() - start )
print len( f )
Tuesday, March 15, 2011
Haralick Features, and other features, too!
I found this site that describes a Python module that calculates Haralick textures, SURF features, and others.
Notable algorithms:
- watershed.
- convex points calculations.
- hit & miss. thinning.
- Zernike & Haralick, LBP, and TAS features.
- freeimage based numpy image loading (requires freeimage libraries to be installed).
- Speeded-Up Robust Features (SURF), a form of local features.
- thresholding.
- convolution.
- Sobel edge detection
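Of the list above, the Haralick textures are the ones I lean on most; here is a minimal sketch of calling them through mahotas (the filename is made up, and the image is assumed to load as a 2-D integer greyscale ndarray):

import mahotas
import mahotas.features

# load the image as a numpy integer ndarray (made-up filename)
img = mahotas.imread( 'chip.png' )

# one 13-element feature row per co-occurrence direction
h = mahotas.features.haralick( img )
print h.shape
print h.mean( axis=0 )   # average over directions for a single texture vector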
Python PCA (ii)
# Code from Chapter 10 of Machine Learning: An Algorithmic Perspective
# by Stephen Marsland (http://seat.massey.ac.nz/personal/s.r.marsland/MLBook.html)
# You are free to use, change, or redistribute the code in any way you wish for
# non-commercial purposes, but please maintain the name of the original author.
# This code comes with no warranty of any kind.
# Stephen Marsland, 2008

# An algorithm to compute PCA. Not as fast as the NumPy implementation
from pylab import *
from numpy import *

def pca(data,nRedDim=0,normalise=1):
    # Centre data
    m = mean(data,axis=0)
    data -= m

    # Covariance matrix
    C = cov(transpose(data))

    # Compute eigenvalues and sort into descending order
    evals,evecs = linalg.eig(C)
    indices = argsort(evals)
    indices = indices[::-1]
    evecs = evecs[:,indices]
    evals = evals[indices]

    if nRedDim>0:
        evecs = evecs[:,:nRedDim]

    if normalise:
        for i in range(shape(evecs)[1]):
            evecs[:,i] = evecs[:,i] / linalg.norm(evecs[:,i]) * sqrt(evals[i])

    # Produce the new data matrix
    x = dot(transpose(evecs),transpose(data))
    # Compute the original data again
    y=transpose(dot(evecs,x))+m
    return x,y,evals,evecs
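A small usage sketch (random data, made up only to show the call and the shapes):

import numpy as np

data = np.random.randn( 100, 5 )              # 100 observations of 5 variables
x, y, evals, evecs = pca( data, nRedDim=2 )
# x : the data projected onto the top 2 components (shape 2 x 100 here)
# y : the reconstruction of data from those 2 components (shape 100 x 5)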
Python PCA (i)
#!/usr/bin/env python
""" a small class for Principal Component Analysis
Usage:
    p = PCA( A, fraction=0.90 )
In:
    A: an array of e.g. 1000 observations x 20 variables, 1000 rows x 20 columns
    fraction: use principal components that account for e.g.
        90 % of the total variance
Out:
    p.U, p.d, p.Vt: from numpy.linalg.svd, A = U . d . Vt
    p.dinv: 1/d or 0, see NR
    p.eigen: the eigenvalues of A*A, in decreasing order (p.d**2).
        eigen[j] / eigen.sum() is variable j's fraction of the total variance;
        look at the first few eigen[] to see how many PCs get to 90 %, 95 % ...
    p.npc: number of principal components,
        e.g. 2 if the top 2 eigenvalues are >= `fraction` of the total.
        It's ok to change this; methods use the current value.
Methods:
    The methods of class PCA transform vectors or arrays of e.g.
    20 variables, 2 principal components and 1000 observations,
    using partial matrices U' d' Vt', parts of the full U d Vt:
    A ~ U' . d' . Vt' where e.g.
        U' is 1000 x 2
        d' is diag([ d0, d1 ]), the 2 largest singular values
        Vt' is 2 x 20. Dropping the primes,

    d . Vt      2 principal vars = p.vars_pc( 20 vars )
    U           1000 obs = p.pc_obs( 2 principal vars )
    U . d . Vt  1000 obs, p.obs( 20 vars ) = pc_obs( vars_pc( vars ))
        fast approximate A . vars, using the `npc` principal components

    Ut              2 pcs = p.obs_pc( 1000 obs )
    V . dinv        20 vars = p.pc_vars( 2 principal vars )
    V . dinv . Ut   20 vars, p.vars( 1000 obs ) = pc_vars( obs_pc( obs )),
        fast approximate Ainverse . obs: vars that give ~ those obs.
Notes:
    PCA does not center or scale A; you usually want to first
        A -= A.mean(A, axis=0)
        A /= A.std(A, axis=0)
    with the little class Center or the like, below.
See also:
    http://en.wikipedia.org/wiki/Principal_component_analysis
    http://en.wikipedia.org/wiki/Singular_value_decomposition
    Press et al., Numerical Recipes (2 or 3 ed), SVD
    PCA micro-tutorial
    iris-pca .py .png
"""

from __future__ import division
import numpy as np
dot = np.dot
    # import bz.numpyutil as nu
    # dot = nu.pdot

__version__ = "2010-04-14 apr"
__author_email__ = "denis-bz-py at t-online dot de"

#...............................................................................
class PCA:
    def __init__( self, A, fraction=0.90 ):
        assert 0 <= fraction <= 1
            # A = U . diag(d) . Vt, O( m n^2 ), lapack_lite --
        self.U, self.d, self.Vt = np.linalg.svd( A, full_matrices=False )
        assert np.all( self.d[:-1] >= self.d[1:] )  # sorted
        self.eigen = self.d**2
        self.sumvariance = np.cumsum(self.eigen)
        self.sumvariance /= self.sumvariance[-1]
        self.npc = np.searchsorted( self.sumvariance, fraction ) + 1
        self.dinv = np.array([ 1/d if d > self.d[0] * 1e-6 else 0
                               for d in self.d ])

    def pc( self ):
        """ e.g. 1000 x 2 U[:, :npc] * d[:npc], to plot etc. """
        n = self.npc
        return self.U[:, :n] * self.d[:n]

    # These 1-line methods may not be worth the bother;
    # then use U d Vt directly --

    def vars_pc( self, x ):
        n = self.npc
        return self.d[:n] * dot( self.Vt[:n], x.T ).T  # 20 vars -> 2 principal

    def pc_vars( self, p ):
        n = self.npc
        return dot( self.Vt[:n].T, (self.dinv[:n] * p).T ) .T  # 2 PC -> 20 vars

    def pc_obs( self, p ):
        n = self.npc
        return dot( self.U[:, :n], p.T )  # 2 principal -> 1000 obs

    def obs_pc( self, obs ):
        n = self.npc
        return dot( self.U[:, :n].T, obs ) .T  # 1000 obs -> 2 principal

    def obs( self, x ):
        return self.pc_obs( self.vars_pc(x) )  # 20 vars -> 2 principal -> 1000 obs

    def vars( self, obs ):
        return self.pc_vars( self.obs_pc(obs) )  # 1000 obs -> 2 principal -> 20 vars


class Center:
    """ A -= A.mean() /= A.std(), inplace -- use A.copy() if need be
        uncenter(x) == original A . x
    """
        # mttiw
    def __init__( self, A, axis=0, scale=True, verbose=1 ):
        self.mean = A.mean(axis=axis)
        if verbose:
            print "Center -= A.mean:", self.mean
        A -= self.mean
        if scale:
            std = A.std(axis=axis)
            self.std = np.where( std, std, 1. )
            if verbose:
                print "Center /= A.std:", self.std
            A /= self.std
        else:
            self.std = np.ones( A.shape[-1] )
        self.A = A

    def uncenter( self, x ):
        return np.dot( self.A, x * self.std ) + np.dot( x, self.mean )


#...............................................................................
if __name__ == "__main__":
    import sys

    csv = "iris4.csv"  # wikipedia Iris_flower_data_set
        # 5.1,3.5,1.4,0.2  # ,Iris-setosa ...
    N = 1000
    K = 20
    fraction = .90
    seed = 1
    exec "\n".join( sys.argv[1:] )  # N= ...
    np.random.seed(seed)
    np.set_printoptions( 1, threshold=100, suppress=True )  # .1f
    try:
        A = np.genfromtxt( csv, delimiter="," )
        N, K = A.shape
    except IOError:
        A = np.random.normal( size=(N, K) )  # gen correlated ?

    print "csv: %s N: %d K: %d fraction: %.2g" % (csv, N, K, fraction)
    Center(A)
    print "A:", A

    print "PCA ..." ,
    p = PCA( A, fraction=fraction )
    print "npc:", p.npc
    print "% variance:", p.sumvariance * 100

    print "Vt[0], weights that give PC 0:", p.Vt[0]
    print "A . Vt[0]:", dot( A, p.Vt[0] )
    print "pc:", p.pc()

    print "\nobs <-> pc <-> x: with fraction=1, diffs should be ~ 0"
    x = np.ones(K)
    # x = np.ones(( 3, K ))
    print "x:", x
    pc = p.vars_pc(x)  # d' Vt' x
    print "vars_pc(x):", pc
    print "back to ~ x:", p.pc_vars(pc)

    Ax = dot( A, x.T )
    pcx = p.obs(x)  # U' d' Vt' x
    print "Ax:", Ax
    print "A'x:", pcx
    print "max |Ax - A'x|: %.2g" % np.linalg.norm( Ax - pcx, np.inf )

    b = Ax  # ~ back to original x, Ainv A x
    back = p.vars(b)
    print "~ back again:", back
    print "max |back - x|: %.2g" % np.linalg.norm( back - x, np.inf )

# end pca.py
""" a small class for Principal Component Analysis
Usage:
p = PCA( A, fraction=0.90 )
In:
A: an array of e.g. 1000 observations x 20 variables, 1000 rows x 20 columns
fraction: use principal components that account for e.g.
90 % of the total variance
Out:
p.U, p.d, p.Vt: from numpy.linalg.svd, A = U . d . Vt
p.dinv: 1/d or 0, see NR
p.eigen: the eigenvalues of A*A, in decreasing order (p.d**2).
eigen[j] / eigen.sum() is variable j's fraction of the total variance;
look at the first few eigen[] to see how many PCs get to 90 %, 95 % ...
p.npc: number of principal components,
e.g. 2 if the top 2 eigenvalues are >= `fraction` of the total.
It's ok to change this; methods use the current value.
Methods:
The methods of class PCA transform vectors or arrays of e.g.
20 variables, 2 principal components and 1000 observations,
using partial matrices U' d' Vt', parts of the full U d Vt:
A ~ U' . d' . Vt' where e.g.
U' is 1000 x 2
d' is diag([ d0, d1 ]), the 2 largest singular values
Vt' is 2 x 20. Dropping the primes,
d . Vt 2 principal vars = p.vars_pc( 20 vars )
U 1000 obs = p.pc_obs( 2 principal vars )
U . d . Vt 1000 obs, p.obs( 20 vars ) = pc_obs( vars_pc( vars ))
fast approximate A . vars, using the `npc` principal components
Ut 2 pcs = p.obs_pc( 1000 obs )
V . dinv 20 vars = p.pc_vars( 2 principal vars )
V . dinv . Ut 20 vars, p.vars( 1000 obs ) = pc_vars( obs_pc( obs )),
fast approximate Ainverse . obs: vars that give ~ those obs.
Notes:
PCA does not center or scale A; you usually want to first
A -= A.mean(A, axis=0)
A /= A.std(A, axis=0)
with the little class Center or the like, below.
See also:
http://en.wikipedia.org/wiki/Principal_component_analysis
http://en.wikipedia.org/wiki/Singular_value_decomposition
Press et al., Numerical Recipes (2 or 3 ed), SVD
PCA micro-tutorial
iris-pca .py .png
"""
from __future__ import division
import numpy as np
dot = np.dot
# import bz.numpyutil as nu
# dot = nu.pdot
__version__ = "2010-04-14 apr"
__author_email__ = "denis-bz-py at t-online dot de"
#...............................................................................
class PCA:
def __init__( self, A, fraction=0.90 ):
assert 0 <= fraction <= 1
# A = U . diag(d) . Vt, O( m n^2 ), lapack_lite --
self.U, self.d, self.Vt = np.linalg.svd( A, full_matrices=False )
assert np.all( self.d[:-1] >= self.d[1:] ) # sorted
self.eigen = self.d**2
self.sumvariance = np.cumsum(self.eigen)
self.sumvariance /= self.sumvariance[-1]
self.npc = np.searchsorted( self.sumvariance, fraction ) + 1
self.dinv = np.array([ 1/d if d > self.d[0] * 1e-6 else 0
for d in self.d ])
def pc( self ):
""" e.g. 1000 x 2 U[:, :npc] * d[:npc], to plot etc. """
n = self.npc
return self.U[:, :n] * self.d[:n]
# These 1-line methods may not be worth the bother;
# then use U d Vt directly --
def vars_pc( self, x ):
n = self.npc
return self.d[:n] * dot( self.Vt[:n], x.T ).T # 20 vars -> 2 principal
def pc_vars( self, p ):
n = self.npc
return dot( self.Vt[:n].T, (self.dinv[:n] * p).T ) .T # 2 PC -> 20 vars
def pc_obs( self, p ):
n = self.npc
return dot( self.U[:, :n], p.T ) # 2 principal -> 1000 obs
def obs_pc( self, obs ):
n = self.npc
return dot( self.U[:, :n].T, obs ) .T # 1000 obs -> 2 principal
def obs( self, x ):
return self.pc_obs( self.vars_pc(x) ) # 20 vars -> 2 principal -> 1000 obs
def vars( self, obs ):
return self.pc_vars( self.obs_pc(obs) ) # 1000 obs -> 2 principal -> 20 vars
class Center:
""" A -= A.mean() /= A.std(), inplace -- use A.copy() if need be
uncenter(x) == original A . x
"""
# mttiw
def __init__( self, A, axis=0, scale=True, verbose=1 ):
self.mean = A.mean(axis=axis)
if verbose:
print "Center -= A.mean:", self.mean
A -= self.mean
if scale:
std = A.std(axis=axis)
self.std = np.where( std, std, 1. )
if verbose:
print "Center /= A.std:", self.std
A /= self.std
else:
self.std = np.ones( A.shape[-1] )
self.A = A
def uncenter( self, x ):
return np.dot( self.A, x * self.std ) + np.dot( x, self.mean )
#...............................................................................
if __name__ == "__main__":
import sys
csv = "iris4.csv" # wikipedia Iris_flower_data_set
# 5.1,3.5,1.4,0.2 # ,Iris-setosa ...
N = 1000
K = 20
fraction = .90
seed = 1
exec "\n".join( sys.argv[1:] ) # N= ...
np.random.seed(seed)
np.set_printoptions( 1, threshold=100, suppress=True ) # .1f
try:
A = np.genfromtxt( csv, delimiter="," )
N, K = A.shape
except IOError:
A = np.random.normal( size=(N, K) ) # gen correlated ?
print "csv: %s N: %d K: %d fraction: %.2g" % (csv, N, K, fraction)
Center(A)
print "A:", A
print "PCA ..." ,
p = PCA( A, fraction=fraction )
print "npc:", p.npc
print "% variance:", p.sumvariance * 100
print "Vt[0], weights that give PC 0:", p.Vt[0]
print "A . Vt[0]:", dot( A, p.Vt[0] )
print "pc:", p.pc()
print "\nobs <-> pc <-> x: with fraction=1, diffs should be ~ 0"
x = np.ones(K)
# x = np.ones(( 3, K ))
print "x:", x
pc = p.vars_pc(x) # d' Vt' x
print "vars_pc(x):", pc
print "back to ~ x:", p.pc_vars(pc)
Ax = dot( A, x.T )
pcx = p.obs(x) # U' d' Vt' x
print "Ax:", Ax
print "A'x:", pcx
print "max |Ax - A'x|: %.2g" % np.linalg.norm( Ax - pcx, np.inf )
b = Ax # ~ back to original x, Ainv A x
back = p.vars(b)
print "~ back again:", back
print "max |back - x|: %.2g" % np.linalg.norm( back - x, np.inf )
# end pca.py
Sunday, February 27, 2011
Threading and Queues in Python!
Sunday, February 13, 2011
Instantaneous Amplitude, Phase and Frequency
Here is a link to a good discussion on this with MATLAB code.
Here is another place with a good discussion on doing a phase shift.
The Hilbert transform is fft ---> phase shift ---> ifft.
Doing the fft and ifft is easy in numpy, and the phase shift is just multiplying each term of the transformed signal by exp(j*p), where j is the imaginary unit and p is the desired phase shift.
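Here is a minimal numpy sketch of that recipe, assuming the phase shift is the usual -90 degrees on positive frequencies and +90 degrees on negative ones; the test signal is made up:

import numpy as np

def hilbert_fft( x ):
    # fft -> phase shift -> ifft
    X = np.fft.fft( x )
    freqs = np.fft.fftfreq( x.size )
    X *= -1j * np.sign( freqs )   # exp(-j*pi/2) on positive freqs, exp(+j*pi/2) on negative
    return np.fft.ifft( X ).real

# made-up test signal: a 10 Hz cosine sampled at 1024 Hz for one second
t = np.linspace( 0.0, 1.0, 1024, endpoint=False )
x = np.cos( 2 * np.pi * 10 * t )

analytic  = x + 1j * hilbert_fft( x )
amplitude = np.abs( analytic )                                    # instantaneous amplitude
phase     = np.unwrap( np.angle( analytic ) )                     # instantaneous phase
frequency = np.diff( phase ) / ( 2 * np.pi * ( t[1] - t[0] ) )    # instantaneous frequency, in Hz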
Wednesday, February 9, 2011
Sunday, February 6, 2011
Project Results!
Wednesday, January 26, 2011
NASDAQ, NYSE, AMEX Data
Here is a site that provides historical data over 10 years for three exchanges, the NYSE, NASDAQ and AMEX. Go to 'Stock Chart,' and then 'Historical Quotes' (or something like it) in the sidebar. Link is here.
LaTeX Crashing Course
Tuesday, January 25, 2011
Best Bash Tutorials Ever. Period.
I can stop looking for Bash tutorials forever now. This is better than the expensive book I bought on this topic.
http://www.faqs.org/docs/abs/HTML/index.html
Monday, January 17, 2011
Thursday, January 6, 2011
Paul McCartney II
Monday, January 3, 2011