Monday, November 8, 2010

Scripps class - Introduction to Computers at SIO

A great post from Kurt. I earned my stripes in grad school taking a computational numerics course that forced us to use Fortran. It was valuable learning about how computers actually think and use numbers, especially for someone who came in (like many) without a computer programming background; before that I *gasp* basically just used Excel for data processing. However, I think there is a lot to be said for going through that process. It certainly helped me appreciate scripting languages a lot more. Matlab is great for quick prototyping, and for a lot of what I do the inefficiencies are an easy trade-off. Even when I was doing real-time forecast models of mine burial for the Navy, Matlab could easily handle the loads, and that was with a lot of lazy, kludgy code. Matrix math is powerful, and it is an important lesson that Matlab really helped me understand. Another advantage is that there is a lot of existing code, especially in the oceanographic community, so that is a bonus. There are free alternatives such as R that many people I know have gone to, but I haven't switched... guess I've become an old fogy. I guess it comes down a bit to what the discipline herd tends to use: atmospheric folks seem to use IDL, and a lot of biologists I've known use Mathematica.

One thing to consider, especially for a field or lab observational need, is that many of these are good for number crunching or modeling but not good for data acquisition. I learned a lot about how instruments work by learning LabVIEW, and in fact I got my first job out of grad school because of my LabVIEW skills. This G-language was a very different approach to programming and in some ways a welcome and intuitive way of seeing problems and sketching out the workflow. It is inherently parallel, which sometimes was a problem both for efficiency and because it could make debugging hard. We used it a lot with our first AUV, and I used it to develop the web GUI for the original VIMS OOS and the mine burial model, but I've since drifted from it.

Anyways that's my $0.02

Kurt's Weblog
November 8, 2010 3:25 PM
by Kurt

Scripps class - Introduction to Computers at SIO

I was just answering a quick question from a friend and ran into <a href="http://magician.ucsd.edu/~ltauxe/">Lisa Tauxe's web page</a> on <a href="http://mahi.ucsd.edu/class233/">SIO 233: Introduction to Computers at SIO</a>. When I was at SIO, the class was taught just by <a href="http://mahi.ucsd.edu/shearer/">Peter Shearer</a> (<a href="http://mahi.ucsd.edu/shearer/COMPCLASS/">COMPCLASS</a>). Lisa is bringing Python to the class, which is really exciting. Being a programmer before I came to SIO (who had seen too much F77 code), I didn't take the course, but I looked over some of the other students' assignments and saw that they were learning some great material. <br /><br />
I personally argue that <a href="http://mahi.ucsd.edu/shearer/COMPCLASS/fortran.txt">Fortran</a> should not be the first language for anybody, and Python is a great place to start. Yeah, I'm definitely biased. I might have pushed Lisa towards Python while doing my thesis <img src="http://schwehr.org/blog/moods/smilies/smiley.gif" alt=":)" border="0" /> <br /><br />
I'm in the process of trying to work similar material into the Research Tools course at CCOM for Fall 2011. If you are willing to deal with typos and really rough drafts, you are welcome to look at the material I am putting together. The command-line chapter is the farthest along. <br /><br />
<a href="http://vislab-ccom.unh.edu/~schwehr/Classes/2011/esci895-researchtools/">2011/esci895-researchtools</a> <br /><br />
My style is different from Lisa's and Peter's, so I encourage people to check out their lecture notes. I especially like their introduction: <br /><br />
<a href="http://mahi.ucsd.edu/class233/intro.txt">intro.txt</a> <br /><br />
<pre>
----- Why you should learn a "real" language -----
<!-- -->
Many students arrive at SIO without much experience in FORTRAN or C,
the two main scientific programming languages in use today. While it
is possible to get by for most class assignments by using Matlab, you
will likely be handicapped in your research at some point if you don't
learn FORTRAN or C. Matlab is very convenient for quick results but
has limited flexibility. Often this means that a simple FORTRAN or C
program can be written that will perform a task far more cleanly and
efficiently than Matlab, even if a complicated Matlab script can be
kluged together to do the same thing. In addition, Matlab is a
commercial product that does not have the long-term stability of other
languages, including large libraries of existing code that are freely
shared among researchers.
<!-- -->
Your research may involve processing data using a FORTRAN or C
program. If you do not understand the program, you will not be able to
modify it to do anything other than what it can already do. This will
make it difficult to do anything original in your research. You may
resort to elaborate kludges to get the program to do what you want,
when a simple modification to the code would be much easier. Worse,
you may drive your colleagues crazy by continually requesting that the
original authors of the program make changes to accommodate your
wishes.
<!-- -->
Finally, you will be in a more competitive position to get a job after
you graduate if you have real programming experience.
</pre>
I would suggest replacing FORTRAN with "Python" every time you see it above. They have a section comparing languages.
Here is what they have on FORTRAN and C: <br /><br />
<pre><b>FORTRAN</b>
Advantages
  Large amount of existing code
  Preferred language of most SIO faculty
  Complex numbers are built in
  Choice of single or double precision math functions
<!-- -->
Disadvantages
  Column sensitive format in older versions
  Dead language in computer science departments
<!-- -->
<b>C</b>
Advantages
  Large amount of existing code
  Preferred language of incoming students, some younger faculty
  Free format, not column sensitive
  More efficient I/O
  Easier to use pointers
<!-- -->
Disadvantages
  Less user-friendly than FORTRAN (I think so, but others may debate this)
  Fewer built in math functions (but easy to fix)
  No standard complex numbers (but easy to fix)
  Easier to use pointers
</pre>
Here is my take on Python based roughly on their format:
<pre><b>Python</b>
Advantages
  One standard version of the language (many C and Fortran compilers in use: GNU, IBM, Microsoft, and many other companies have their own versions)
  Features easy to use <a href="http://en.wikipedia.org/wiki/Procedural_programming">Procedural</a> or <a href="http://en.wikipedia.org/wiki/Object-oriented_programming">object-oriented programming (OOP)</a>
  Easy to call C or C++ code (possible, but not fun, to call Fortran)
  Less strict type system
  Standard math and science libraries: <a href="http://docs.python.org/library/math.html">math</a>, <a href="http://numpy.scipy.org/">numpy</a>, and <a href="http://www.scipy.org/">scipy</a>
  Lots of integrated libraries for getting, parsing, and manipulating data
  A standard way to install extra packages: <a href="http://pypi.python.org/pypi/distribute/">Distribute</a>, which pulls from <a href="http://pypi.python.org/pypi">12,000 packages</a> (most of which are open source)
  Interactive shells that work like a terminal shell (e.g. bash)
  Automatic memory management
  Encourages development of libraries of reusable code
  Formatting controls the structure of the code (indentation matters)
<!-- -->
Disadvantages
  Formatting controls the structure of the code (indentation matters)... this drives some people crazy
  Sometimes slower to execute scripts (but often easy to replace a slow loop with C or C++ code)
</pre>
And here is my take on Matlab:
<pre><b>Matlab</b>
Advantages
  Built in Integrated Development Environment (IDE) that is standard and has built in documentation
  Lots of built in math capabilities
  Strong plotting capabilities in a standard interface
  Strong community of people writing Matlab code (Python has matplotlib, modelled after this)
  Easy to prototype complicated systems
<!-- -->
Disadvantages
  "Everything is an array (aka Matrix)", which can be in a structure module.
  Expensive (very expensive for non-academics)
  Requires a license key (this can really bite you on a ship or if the $ runs out)
  Hard to embed into other software or systems
  Slow and inflexible design (hard to create proper data structures)
  Encourages poor programming practices
  Harder to automate
  The MATLAB <a href="http://en.wikipedia.org/wiki/MATLAB#Classes">object-oriented system is not great</a>.
  Built in IDE (yeah, I'm an emacs guy)
  Not much of an open source community
</pre>
And yes, I've written a lot of code in all of the above languages. If you disagree with my take, write your own comparison and post it.
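
For anyone who has not seen Python, here is a minimal sketch of a few items from the Python list above: the standard math library, a less strict type system, and indentation as structure. It is only an illustration and not from Kurt's post or the SIO notes; the function name `describe` is made up for the example, and it uses only the standard library (`math` and `cmath`). It also shows that complex numbers, listed above as a FORTRAN advantage, are a built-in type in Python.

```python
import cmath
import math


def describe(value):
    """Return a short description of a number (hypothetical helper)."""
    # Indentation alone marks the body of the function and of each branch.
    if isinstance(value, complex):
        magnitude, phase = cmath.polar(value)
        return f"complex, |z|={magnitude:.3f}, phase={math.degrees(phase):.1f} deg"
    # No type declarations: the same function accepts an int or a float.
    return f"real, sqrt={math.sqrt(value):.3f}"


z = complex(3, 4)  # complex numbers are a built-in type
print(describe(z))
print(describe(16))
```

The same function takes a complex number or an integer without any declarations, which is the convenience (and, for some, the worry) behind the "less strict type system" item.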


Art Trembanis
CSHEL
Department of Geological Sciences
University of Delaware
