Thursday, November 14, 2013

Fernando Perez: An ambitious experiment in Data Science takes off:...

Today, during a White House OSTP event combining government, academia and industry, the Gordon and Betty Moore Foundation and the Alfred P...

Wednesday, September 4, 2013

Those funny offset numbers in matplotlib

If you are a heavy matplotlib user, you are bound to have seen the funny offset numbers in the top left of the plot window:

They are there to help the viewer focus on the digits that actually change, by pulling out the leading part of the numbers that stays the same.

But I claim that, because of the way we recognize patterns, there are quite a few cases where this confuses more than it helps. In this example, the people on my team and I are used to seeing 5-digit numbers, and it takes quite some time to figure out that these are indeed 5-digit numbers.

Therefore I researched how to switch this behavior off.


First, one imports the ScalarFormatter class from the matplotlib.ticker module:

from matplotlib.ticker import ScalarFormatter
Then, one creates a formatter object with offset numbers switched off:

y_formatter = ScalarFormatter(useOffset=False)

Finally, one applies it to an Axes object. You can get one from fig.add_subplot(), via plt.gca() (short for "get current axes"), or from a plotting command that returns one (note that matplotlib's plt.plot() returns a list of lines, not the Axes):

ax.yaxis.set_major_formatter(y_formatter)
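Put together, the steps above might look like the following self-contained script. It is only a sketch: the data (values around 12345 that change only in the decimals) is made up, plt.subplots() is just one convenient way to get an Axes, and the Agg backend is chosen so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import ScalarFormatter

# Made-up data: 5-digit values that change only in the decimals,
# which is the case where matplotlib would normally show an offset.
x = np.arange(10)
y = 12345 + 0.01 * x

fig, ax = plt.subplots()
ax.plot(x, y)

# Formatter that always prints the full tick values, without an offset.
y_formatter = ScalarFormatter(useOffset=False)
ax.yaxis.set_major_formatter(y_formatter)

fig.canvas.draw()  # tick labels are only computed when the figure is drawn
print(repr(y_formatter.get_offset()))  # empty string: no offset text
```

After the draw, get_offset() returns an empty string, meaning no offset text appears above the axis.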
There you go, hope this helps someone.

Here is the Stack Overflow question that helped me find the solution.

Update (2013-10-20):

An easier way is to take the Axes object and call:
ax.ticklabel_format(useOffset=False)
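A minimal sketch of this shortcut, with made-up data and plt.subplots() as my choice for getting the Axes. If the call does not seem to work, two things are worth checking: ticklabel_format is a method of the Axes (plt.plot() returns a list of Line2D objects, which don't have it), and it only works when the axis uses the default ScalarFormatter; on a log-scaled axis, for example, it raises an AttributeError.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
y = 12345 + 0.01 * x          # made-up data that would normally get an offset

fig, ax = plt.subplots()      # ax is the Axes object
lines = ax.plot(x, y)         # plot() returns a list of Line2D objects, not the Axes

# Switch the offset off; axis="both" is the default, axis="y" limits it to y.
ax.ticklabel_format(useOffset=False)
fig.canvas.draw()

# On a log-scaled axis the formatter is not a ScalarFormatter,
# so the same call raises an AttributeError.
fig2, ax2 = plt.subplots()
ax2.set_yscale("log")
try:
    ax2.ticklabel_format(axis="y", useOffset=False)
    log_axis_error = None
except AttributeError as err:
    log_axis_error = err
print(log_axis_error)
```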
I opened a GitHub issue asking for this to be included in matplotlib; it has already been answered with a solution, so this will be configurable in the future, yay!
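Newer matplotlib versions (1.4 and later) do have such a setting: the rcParams key axes.formatter.useoffset. A minimal sketch, assuming such a version, that switches the offset off globally at the top of a script (the data is made up):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Every ScalarFormatter created from now on starts with the offset switched off.
plt.rcParams["axes.formatter.useoffset"] = False

fig, ax = plt.subplots()
ax.plot(np.arange(10), 12345 + 0.01 * np.arange(10))
fig.canvas.draw()  # compute the tick labels
```

The same setting can also go into a matplotlibrc file as axes.formatter.useoffset: False.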

Update 2, same day:
Weird, I thought I had the above shortcut working at some point, but now it doesn't. If anyone knows under which circumstances it works and when it doesn't, please comment.

Wednesday, June 5, 2013

polyfit

A follow-up to the previous post.

Polynomial fitting is also very easy with numpy's polyfit function and poly1d class.


In [196]: x = range(100)
In [197]: y = randn(100)
In [198]: plot(x,y)
Out[198]: []
Here I ask polyfit to fit a 2nd-degree polynomial:
In [199]: polyfit(x,y,2)
Out[199]: array([-0.00018313,  0.01669275, -0.09621319])
The polyfit function returns the polynomial coefficients as an array, highest power first.
To use them directly as a fit function, just wrap them in a new polynomial object:
In [200]: fitfunc = poly1d(polyfit(x,y,2))
In [201]: plot(x,fitfunc(x))
Out[201]: []
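The same steps as a self-contained script, with the np. prefix so it also works outside a pylab session. This is a sketch with made-up data: instead of pure noise it uses a known parabola plus a little noise (the coefficients and the random seed are arbitrary choices), so you can check that polyfit recovers the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)         # fixed seed so the run is reproducible
x = np.arange(100)
true_coeffs = [0.002, -0.3, 5.0]       # made-up parabola: 0.002 x^2 - 0.3 x + 5
y = np.polyval(true_coeffs, x) + rng.normal(scale=0.5, size=x.size)

coeffs = np.polyfit(x, y, 2)           # highest power first, as poly1d expects
fitfunc = np.poly1d(coeffs)            # callable polynomial built from the fit

print(coeffs)
print(fitfunc(50))                     # value of the fitted parabola at x = 50
```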
I saved the plot like this:
In [202]: savefig('/Users/maye/Desktop/blog_polyfit.png')
and it looks like this:


Friday, May 31, 2013

Polynomials with Python

Seriously, can it be any easier? ;)

If you are not in a pylab session, import the class like this:
In [148]: from numpy import poly1d
In a pylab session, poly1d is already available, so no import is needed.
Now let's create a polynomial with the coefficients [3,2,1] (always in decreasing order of powers!):
In [149]: p = poly1d([3,2,1])
Printing it gives a semi-analytical printout:
In [150]: print(p)
   2
3 x + 2 x + 1
Evaluating it at new x values is easy, because the poly1d object is callable like a function:
In [152]: newx = linspace(0,10,10)
In [153]: p(newx)
Out[153]:
array([   1.        ,    6.92592593,   20.25925926,   41.        ,
         69.14814815,  104.7037037 ,  147.66666667,  198.03703704,
        255.81481481,  321.        ])
Lots of other things are possible with this object. IPython's tab completion makes it easy to discover them:
In [154]: p.<TAB>
p.coeffs    p.deriv     p.integ     p.order     p.variable

In [155]: p.deriv()
Out[155]: poly1d([6, 2])
In [156]: pderiv = p.deriv()
In [157]: print(pderiv)
6 x + 2
The roots of this polynomial can be determined either with the roots function, which is available in a pylab session (or importable with from numpy import roots), or with the r attribute of the poly1d object:
In [158]: roots(p)
Out[158]: array([-0.33333333+0.47140452j, -0.33333333-0.47140452j])
In [159]: p.r
Out[159]: array([-0.33333333+0.47140452j, -0.33333333-0.47140452j])
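Here is the same session condensed into a plain script (outside pylab, hence the numpy imports), including the integ method from the tab-completion list. The checks at the end are my own addition, not part of the session above: each root should make the polynomial numerically zero.

```python
import numpy as np
from numpy import poly1d

p = poly1d([3, 2, 1])        # 3x^2 + 2x + 1, highest power first
print(p)

print(p(2))                  # 3*4 + 2*2 + 1 = 17
print(p.deriv())             # derivative 6x + 2
print(p.integ())             # antiderivative x^3 + x^2 + x (integration constant 0)

r = p.r                      # same roots as np.roots(p)
print(r)

# Plugging the roots back in should give (numerically) zero.
print(np.allclose(p(r), 0))
```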




PS: One of these days I really have to find out how to do code highlighting in Blogger, or, preferably, go all the way and do IPython notebook posts.

Friday, February 25, 2011

A scientific Python starter

As requested by some, here is a list of important websites, doc-sites and modules for a successful start in Python.

1. Getting Python
If there is one problem with Python, it is that some modules depend on specific versions of other modules. Therefore I wholeheartedly recommend installing a big distribution that bundles all the modules you need in one chunk.
I personally have had good experiences with http://www.enthought.com; they provide free academic licenses and also offer the 64-bit version for free on personal email request (they did for me, at least).
There are other distributions like Python(x,y), but as far as I know Enthought is the only one that provides a package for Mac, Windows AND Linux.

2. First steps
The tutorial on http://docs.python.org is really excellent! When I worked through it years ago, I continued the next day with Python GUI programming tutorials and was coding my own graphical user interfaces in Python within one day! The clarity of Python makes this possible, I believe. I had never learned a language that was this easy to learn (and I have used BASIC, C, C++, FORTRAN77, Java and IDL).

Read the tutorial up to and including section 5 (Data Structures; everything up to here is a MUST!). Then you can move on to the scientific tutorials for now, but if you feel puzzled later on, you really should continue the tutorial at least through section 9 to see what classes are all about and, importantly, how to read files (section 7).

3. A warning for IDL switchers:
Before we go on to the sciency stuff in Python, a warning:

If you do this in IDL:
a = [1,2,3]
b = a

then you have a new array b, copied from a, and you can do whatever you want with it without affecting a.
This is convenient, but it can quickly make your IDL programs very slow, because after a while you are carrying multiple copies of your data around.
As Python is a real programming language, it tries to be memory efficient, and it does so by avoiding copies unless you explicitly ask for one.
So in Python when you do:

a = [1,2,3]
b = a

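then you don't get a copy but a second name (a reference) for the same list. So if you do b[0]=4, a has changed as well! (Try it out!)
So what do you do if you really want a copy without changing the original?
Well, you ask for a copy:
b = list(a)
(b = a[:] works too; numpy arrays, and lists in Python 3.3 and later, also have a copy() method, as in b = a.copy().)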
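A small script you can run to see the difference between a second name and a real copy. It needs nothing beyond numpy; list(a) and a[:] copy a list in any Python version, while list.copy() only exists in Python 3.3 and later. Numpy arrays have a copy() method.

```python
import numpy as np

a = [1, 2, 3]
b = a             # b is just another name for the same list
b[0] = 4
print(a)          # [4, 2, 3]: a changed too

c = [1, 2, 3]
d = list(c)       # a real (shallow) copy; c[:] works as well
d[0] = 4
print(c)          # [1, 2, 3]: c is untouched

arr = np.array([1, 2, 3])
alias = arr       # same array object under a second name
dup = arr.copy()  # independent copy of the data
alias[0] = 4
print(arr)        # [4 2 3]
print(dup)        # [1 2 3]
```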

4. Numpy and Scipy
Now on to the most important Python modules for scientists, called 'numpy' and 'scipy'.
Scipy uses Numpy so they are closely linked.

The documentation for both can be found here:
http://docs.scipy.org/doc/ or just start browsing around at http://www.scipy.org, it's very interesting.
I recommend starting with the last document linked on the docs page, the Scipy Reference Guide.
Why? Because it has a tutorial for Numpy and Scipy and also introduces you to the plotting module 'matplotlib' at the same time! So it's really worth reading.

Before I leave you alone on your Python adventures (I put all the links together again at the end), one more important comment on a common source of confusion for beginners:
There is an important difference between numpy arrays (which look a lot like lists) and the original Python lists.
Python lists can take anything, so a list like this is possible:

myList = ['aString', 3.1415, ('a', 'tuple')]

but the Python interpreter needs extra time to deal with all these different types, so if you want efficient arrays whose elements are all of the same type, you need numpy arrays.
So one important difference is the type of the elements (many types in Python lists, only one for numpy arrays); the other is how they react to mathematical operations.
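A quick way to see the "one type per array" rule, as a sketch with made-up values: a list keeps the type of each element, while numpy picks a single dtype for the whole array and converts the elements to it.

```python
import numpy as np

# A Python list can hold anything; each element keeps its own type.
my_list = ['aString', 3.1415, (1, 2)]
print([type(item).__name__ for item in my_list])

# A numpy array has a single element type (dtype) for all entries.
ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # the ints are converted to floats
print(ints.dtype, mixed.dtype)
```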

For Python lists, it can be quite handy that you can do:
[3,4]*3
to get
[3,4,3,4,3,4]

For writing text files in certain formats this is quite useful.
But for scientific calculations this doesn't make any sense, of course, which is why numpy arrays do exactly what you would expect here:

import numpy as np
a = np.array([3,4])
print(a*3)

and you get
[ 9 12]

What you also saw here is that the np.array function converts normal Python lists to numpy arrays without problems (the other way around works too, in case you need it: use the array's tolist() method).
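The two behaviors side by side, plus the conversion back to a plain list with the array's tolist() method. A minimal sketch:

```python
import numpy as np

lst = [3, 4]
print(lst * 3)              # [3, 4, 3, 4, 3, 4]: the list is repeated

arr = np.array(lst)         # Python list -> numpy array
print(arr * 3)              # [ 9 12]: every element is multiplied

back = (arr * 3).tolist()   # numpy array -> plain Python list
print(back)                 # [9, 12]
```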

Ok, have fun with Python and don't be shy about asking questions; the Python community is very helpful, I think because everybody is so happy with it. ;)

Here are the promised links of all important Python websites for scientists:

http://www.enthought.com
(to get a full scientific Python environment, that runs exactly the same on Win, Mac and Linux)
http://docs.python.org
(Documentation of core Python itself, home of the Tutorial!)
http://www.scipy.org
(the home of Scientific Python, very nice to browse through...)
http://docs.scipy.org/doc/
(the docs page of scipy)
http://matplotlib.sourceforge.net/
(The home of the matplotlib Python module, a very powerful plotting library. I always look at the gallery to find what I need, and you get the example code for each graph! Very helpful.)

Maybe one last tip: the Enthought environment also installs an Examples folder, so you get a lot of example code on your computer for the times when you are not online!

Now, enjoy!