Archive for the ‘Python’ Category

Algebraic Mesher in Python

Monday, July 14th, 2008

This year I have tried to make it a habit to track how much time I spend on numerical programming. I’d like to put in 10 hours or so a week and this lets me know if I am doing that. Anyway, I went back to my records from February when I was developing the Algebraic Mesher in C. It took me 20.9 hours to complete, with about half of that time spent in a frustrating effort to track down numerous memory problems. The promise of scripting languages is they allow you to develop programs much more efficiently, at the price of slower run time performance. So I decided to create the mesher in Python and see how long it took me.

Method

Four months had elapsed since I finished the C version and I couldn’t recall how I had programmed it. All I remembered is it was a miserable experience. Thus it seemed that I could create a Python version from scratch without undue influence from the previous effort. To get started, I reviewed the equations linking the ID numbers for vertices, faces and cells from earlier, but nothing more. I didn’t look at the C code or my follow up blog entry on the experience.

So I opened up the IDLE editor and looked at a perfectly blank page. I wasn’t sure where to start. Then I thought: I need a cell object. So I made a very simple one:

class cellclass: def __init__(self,CID): self.CID = CID

All it does is store the ID number of the cell. I made a list of cells and stored some numbers. So far so good. I then made a vertex object, and created a list of vertices:

verts=[] for i in range(NumRows+1): for j in range(NumCols+1): VID = (NumCols+1)*i + j x = j*delx; y = i*dely temvert = vertclass(VID,x,y) verts.append(temvert)

In this way I added features incrementally, in baby steps, without really any planning. Proceeding along in this fashion I had the mesh generator finished in 1.5 hours. I couldn’t believe how fast it went. The code wasn’t especially pretty – nearly everything is lumped in one routine – but it seemed to run fine.

Testing

I wanted to make sure I tested the code as thoroughly as I had tested the C version, so I opened up the C code at this point, reviewed the tests there, and replicated them, one by one in Python. With the C version, every test revealed new bugs which had to be laboriously tracked down and fixed. With the Python version, the code passed the first four tests the first time. The fifth test found a bug where I was writing multiple copies of vertices to the face objects. I fixed that in a couple of minutes.

In a little more than 2 hours I had a working mesher. And it was fun to write. The code just seemed to flow out of my finger tips (and I am not especially skilled at Python). Yes, this code probably should be rewritten into a more modular style. And perhaps it could be made more ‘Pythonic’. So maybe another hour or two are needed to really polish it up. But there is something extremely satisfying about getting a rough version working with so little effort.

One major disappointment with the C version of the program was the memory problems. I had spent some time developing a generic list structure in C, so I could make my programming in that language less painful. The list and FOR_EACH macro were supposed to make C more like Python. The thought was good, but the experience was not.

Stats

The table shows the time and lines of code needed for each version of the code. From Prechelt’s study, we expect Python to be 3.8x faster and 3.6x shorter than C/C++. Roughly the speed up in development time mirrors that in lines of code. In my case, the speed up was nearly 9x, while the needed lines of code was about 2x less. This may be due to the inordinate amount debugging time the C code required. In any case the general trend holds: Python is a lot faster to develop in.

. Time --Lines of code -- Lang (hrs) Mesher Test Code Total ------------------------------------------------------------------------------------- C 20.9 299 163 462 Python 2.4 124 89 213 ------------------------------------------------------------------------------------- Ratio 8.7x 2.4x 1.8 2.2x

Run Times

The big downside of fast development time is slow execution time. Python does all kinds of nice things for you (like memory management), but that comes at a price. To measure this, I had both the C and Python meshers create a million cell mesh and timed them. Here are the results:

. Time RAM Lang (sec) (GB) ------------------------------------------------------------------------------------ C 6.9 0.366 Python 344 1.33 ------------------------------------------------------------------------------------ Ratio 50x 3.6x

This test was run on an Intel Core2 Duo (T7300) at 2 GHz, with 2GB of RAM. I think much of the time, for both programs, went into memory allocation. This was corroborated when I tried compiling the C version with various optimizations and got no change in the timing. There may some ways to allocate memory in bigger chunks, and thus speed up the code.

There are tools for speeding up the execution of Python programs. Tools like Psyco and Pyrex can offer dramatic improvements, at least under some conditions. That might be a good experiment for next time.

File: PyAlgMesher.py

Programmer Efficiency

Monday, April 14th, 2008

Programming is a slow process. You write a few lines, test them, find some errors and write a few more lines. The C programming language seems especially slow to me. Being statically typed, you must declare every variable. This takes time. Being a low level language, each statement typically doesn’t do a whole lot. And since its compiled, you periodically must leave the editor and compile and test it. But what really eats the time is the very ugly mistakes you can easily make in C. If you forget to allocate some memory, there is no pleasant reminder to set you back on track. Instead you get a cryptic message and mysterious crash. Thus a lot of time is spent tracking such issues.

Another thing that eats a lot of time is the whole pass by value convention of C. Do I pass var, *var, **var, &var, etc? When the data structures are complex and interconnected it can be darn confusing. You pretty much have to draw it all out on a piece of paper.

When caught up in the frustrations of C programming I often think: there must be a better programming language. Recently I did an internet search to see if anyone has studied programmer efficiency as a function of programming language. While I didn’t find much, there is a very interesting study by Lutz Prechelt at Karlsruhe University: Are Scripting Languages Any Good? A Validation of Perl, Python, Rexx, and Tcl against C, C++, and Java

Prechelt’s Study

In this study Prof. Prechelt gave programmers a specific programming task. They had to convert phone numbers into words from a German dictionary using a particular encoding scheme. The programmers had different levels of experience and came from different backgrounds, but Prechelt was able to extract meaningful results from this experiment.

Prechelt looked at many different things, like memory consumption, run time, and reliability of the programs. In those measures, the scripting languages did surprisingly well. Of course a well written C program will be faster than a well written Python program, but in this study there was a huge variability in the results and many of the Python programs were as fast as many of the C/C++ or Java programs. However, our interest is in programmer efficiency. Look at this graph:

Line of Code

It shows that the scripting languages (TCL, Rexx, Python and Perl) required an average of 98 lines of code to complete the program (not including comments). The non-scripting languages (C, C++, Java) needed around 277 lines. That is a huge difference. If you look at the shortest programs, the results are even more dramatic. The shortest C/C++ program was 150 lines, while the shortest Python script was 42 lines. That is 3.6 times more code for the C/C++ program.

There is a rule of thumb in programming that it takes about the same amount of time to program a line of code regardless of language. If true, this suggests that the script writers completed their task more quickly than the compiled language programmers. That was indeed the case. Here is the plot from Prechelt’s paper:

Programming Time

Sure enough, the same trend is seen. The average scripting time was 3.8 hours while the non-scripting languages are at 14.3 hours. The minimum reported times were 1.5 hours for Python and 10.5 hours for C/C++. We probably shouldn’t put much weight on the minimum values, since the times were self reported. But the averages are worth comparing. The scripting programs were developed 3.8x faster. That is amazing when you think about it. A task that might take a week of slogging in C could be done in a little over a day with Python.

Amazing, but True?

So how believable are these results? If we trust Prof Prechelt’s methodology, they are believable for the phone-code problem, but do they apply to numerical programming? There is a hint in the paper:

The shorter program length of the script programs can
be explained by the fact that most of the actual search is
done simply by the hashing algorithm used internally by
the associative arrays. In contrast, the non-script programs
require most of the elementary steps of the search process
to be coded explicitly by the programmer. This is further
pronounced by the effort (or lack of it) for data structure
and variable declarations.

This suggests that the scripting languages were more efficient because they had high level tools, like associative arrays, that were well suited to the phone-code problem. One could argue that had suitable tools existed in the non-scripting languages, they would have enjoyed similar efficiencies.

There is a slight problem with this argument. Namely some of the non-scripting languages do have such tools, but the programmers didn’t use them:

It is an interesting observation that despite the existence
of hash table implementations in both the Java and
the C++ class libraries none of the non-script programmers
used them (but rather implemented a tree solution by hand),
whereas for the script programmers the hash tables built
into the language were the obvious choice.

The implication of this is that it is not just the presence of high level tools that gives scripting languages their efficiency; it is also the ease with which you can use those tools. Python is a good example of this. After you import a module, you can access that module’s documentation from the command line:

>>>import somemodule >>>help(somemodule)

Not only is this very handy, it also encourages one to explore what is available in modules. In addition, the language is easy to read, so you can peruse the source code for additional insights. This greatly lowers the threshold for reusing other people’s code, and that is a great productivity booster.

What about Speed?

Scripting languages are slow, typically 10x to 100x slower than compiled languages, in terms of execution speed. Is it really helpful to create a numerical program in record time if it takes two months to run? Scripting advocates point to the Pareto principle: 20% of the routines take 80% of the execution time. Thus it makes sense to code everything in the scripting language, see where the bottlenecks are, and rewrite those in C, C++ or Fortran. Sounds good. I would like to try some experiments to see for myself.