I've written this page chiefly to offer some advice to anyone thinking of developing the abcMIDI package. The code has been written with the aims of being portable and easy to maintain, but working with such a large program is a daunting task, particularly if you haven't tried anything similar before. Also, the code is written in C, so you should be prepared to work in that language. If you plan on writing your own separate utility, then you have a free hand to do pretty much what you like in terms of re-organizing the code. However, if you are making changes in the code that you want to 'hand back' to other users of the original code, then you need to make sure your changes disrupt the code as little as possible, or else the handing back may become impossible. A simple change to the code can be handed on as a patch: a small file showing which lines have to be inserted, deleted or changed. The patch can be generated automatically using the diff utility. There is also a utility called patch which will apply such patches, but usually the patch can be applied by hand.
Here I describe the tools and techniques that I use for working with the code. This is part documentation, part essay on software engineering; I hope you find something interesting. You will probably find places where the code fails to live up to the high ideals I have set down.
I like to think of a complicated program as a big machine built out of "black boxes", where each function is a black box. A black box is a unit that does a well-defined job (described by a short comment at the start in the case of the function). If we open up the machine, we find it is made up of a small number of black boxes connected together in a fairly simple manner (at this level we are just examining the code for main()). Suppose we are trying to change some aspect of the program. There will be one of the black boxes that does not behave quite as we want. We will need to open up that new black box, see how it works and which component black box or boxes are not behaving as desired, open them up and so on down until we reach the code that does what we are interested in.
Naturally enough, the black boxes will not always be laid out next to each other and you will have to jump around in the code a bit to trace them. This is where an editor with a search facility becomes vital. However, I have tried to group related routines together.
What I've described is really just a top-down approach to reading a program. From the program maintainer's point of view, the moral is that by using a large number of short functions, each with a well-defined job, you can get away with reading only a small proportion of the code to understand it well enough to start modifying it. This brings me to the first 2 aims for the abcMIDI coding style:
I have also tried to use a consistent indentation style. I use braces round if .. else .. clauses, even if they only contain one statement, so that it is easier to check for correct nesting visually. One quirk of my style is that case statements are not indented within a switch statement. If you do indent, it will look as if there is a missing brace at the end of the switch statement.
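As a small sketch of the layout described above: braces round every if .. else .. clause, and case labels level with the switch. The function names and the two-space indent are my own assumptions for illustration and are not taken from the abcMIDI source.

```c
/* Hypothetical example: semitone offset of a note letter above C.
   Returns -1 if the character is not a note letter A-G. */
int semitone_offset(char note)
{
  int offset;

  switch (note) {
  case 'C':
    offset = 0;
    break;
  case 'D':
    offset = 2;
    break;
  case 'E':
    offset = 4;
    break;
  case 'F':
    offset = 5;
    break;
  case 'G':
    offset = 7;
    break;
  case 'A':
    offset = 9;
    break;
  case 'B':
    offset = 11;
    break;
  default:
    offset = -1;
    break;
  }
  return offset;
}

/* Returns 1 if note is a valid note letter, 0 otherwise.
   Braces are used even though each branch is a single statement. */
int is_note_letter(char note)
{
  int valid;

  if (semitone_offset(note) >= 0) {
    valid = 1;
  } else {
    valid = 0;
  }
  return valid;
}
```

With the case labels at the same level as the switch, the closing brace of the switch lines up directly under the statement it closes.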
Of course, a very important tool is your C compiler. I use and recommend DJGPP, a port of gcc, the GNU C compiler, to DOS/Windows. This not only performs compilation, but also comes with a number of useful utilities including make and split/merge, utilities for breaking up a large file into components small enough to go on a floppy and then combining them back into the original file afterwards. These latter two are needed to install DJGPP from floppy. The GNU C compiler also has options to perform the checks on C code that were traditionally done by lint. There are also a number of other programs, including symify, an extremely useful post-mortem tool that can pinpoint where pointer errors are causing segmentation faults.
Of course, there are other compilers, many of which are smaller and easier to install. I have tried to make the code portable, which means that I have avoided features that are only provided by one particular compiler or operating system. PCC is a smaller compiler which will compile abcMIDI and is much simpler to install.
If I am adding a complicated unit to the code (for instance, the queue-handling procedures in abc2midi), I generally try to verify the operation of the unit separately before incorporating it into the code. I don't want to add a faulty sub-system which may cause subtle errors later.
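To illustrate the idea of verifying a unit on its own, here is a sketch of a tiny fixed-size queue together with a self-test routine. This is not the queue code from abc2midi; the type, the function names and the size QSIZE are all hypothetical and chosen only to keep the example short.

```c
#define QSIZE 4   /* deliberately tiny so the self-test reaches the limits */

struct queue {
  int items[QSIZE];
  int head;    /* index of the oldest item */
  int count;   /* number of items currently stored */
};

void q_init(struct queue *q)
{
  q->head = 0;
  q->count = 0;
}

/* Add v at the tail. Returns 1 on success, 0 if the queue is full. */
int q_put(struct queue *q, int v)
{
  if (q->count == QSIZE) {
    return 0;
  }
  q->items[(q->head + q->count) % QSIZE] = v;
  q->count++;
  return 1;
}

/* Remove the oldest item into *v. Returns 1 on success, 0 if empty. */
int q_get(struct queue *q, int *v)
{
  if (q->count == 0) {
    return 0;
  }
  *v = q->items[q->head];
  q->head = (q->head + 1) % QSIZE;
  q->count--;
  return 1;
}

/* Stand-alone check of the unit. Returns 1 if every step behaves
   as expected, 0 as soon as one does not. */
int q_self_test(void)
{
  struct queue q;
  int v = 0;
  int i;

  q_init(&q);
  if (q_get(&q, &v) != 0) {          /* empty queue must refuse a get */
    return 0;
  }
  for (i = 1; i <= QSIZE; i++) {     /* fill the queue completely */
    if (q_put(&q, i) != 1) {
      return 0;
    }
  }
  if (q_put(&q, 99) != 0) {          /* full queue must refuse a put */
    return 0;
  }
  if (q_get(&q, &v) != 1 || v != 1) {
    return 0;
  }
  if (q_put(&q, 5) != 1) {           /* this put wraps round the array */
    return 0;
  }
  for (i = 2; i <= 5; i++) {         /* items must come out in order */
    if (q_get(&q, &v) != 1 || v != i) {
      return 0;
    }
  }
  if (q_get(&q, &v) != 0) {          /* and the queue is empty again */
    return 0;
  }
  return 1;
}
```

Calling q_self_test() from a throwaway main() and checking that it returns 1 exercises the empty, full and wrap-around cases before the unit goes anywhere near the main program.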
Pointers tend to be a source of mysterious errors, though much less so once I discovered symify. I have tried to follow a fairly strict discipline with pointer variables. When they do not hold a valid pointer, they are assigned the value NULL. When they are dereferenced, I usually first check for a NULL value.
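The pointer discipline above might look something like the following sketch. The struct and function names are hypothetical and do not come from the abcMIDI source; the point is only that the pointer is NULL whenever it holds nothing, and is checked before it is dereferenced.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type, for illustration only. */
struct title {
  char *text;   /* NULL whenever no title is stored */
};

void title_init(struct title *t)
{
  t->text = NULL;
}

/* Store a private copy of s. Returns 1 on success, 0 on failure. */
int title_set(struct title *t, const char *s)
{
  char *copy;

  if (t == NULL || s == NULL) {
    return 0;
  }
  copy = malloc(strlen(s) + 1);
  if (copy == NULL) {
    return 0;
  }
  strcpy(copy, s);
  if (t->text != NULL) {
    free(t->text);
  }
  t->text = copy;
  return 1;
}

/* Release the stored string and reset the pointer to NULL so that
   no stale pointer is left behind. */
void title_clear(struct title *t)
{
  if (t != NULL && t->text != NULL) {
    free(t->text);
    t->text = NULL;
  }
}

/* Length of the stored title, or 0 if there is none. */
size_t title_length(const struct title *t)
{
  if (t == NULL || t->text == NULL) {
    return 0;
  }
  return strlen(t->text);
}
```

Because title_clear() resets the pointer after freeing it, a later call to title_length() or title_clear() sees NULL and does nothing harmful, rather than touching freed memory.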
For some things, providing this sort of extensibility is not worth doing because you can choose an upper bound which is only going to be exceeded by incorrect or very bizarre input. For example, the level of bracket nesting in a part specifier is unlikely to exceed the hard limit of 10.
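A fixed upper bound of this kind is only safe if the code checks it. As a rough sketch (the function name, return codes and the bad_pos parameter are hypothetical, not the abc2midi parser), a bracket checker with a hard limit of 10 could be written like this:

```c
#include <stddef.h>

#define MAX_NESTING 10   /* the hard limit mentioned in the text */

/* Check the brackets in a part specifier such as "A(BC(DE)2)3".
   Returns the deepest nesting level found (0 if no brackets),
   -1 if the brackets do not match, or -2 if the nesting goes
   deeper than MAX_NESTING. On error, if bad_pos is not NULL,
   *bad_pos is set to the index of the offending bracket. */
int check_nesting(const char *spec, int *bad_pos)
{
  int open_pos[MAX_NESTING];   /* fixed-size stack of '(' positions */
  int depth = 0;
  int deepest = 0;
  int i;

  if (spec == NULL) {
    return -1;
  }
  for (i = 0; spec[i] != '\0'; i++) {
    if (spec[i] == '(') {
      if (depth >= MAX_NESTING) {
        /* refuse to write past the end of the array */
        if (bad_pos != NULL) {
          *bad_pos = i;
        }
        return -2;
      }
      open_pos[depth] = i;
      depth++;
      if (depth > deepest) {
        deepest = depth;
      }
    } else if (spec[i] == ')') {
      if (depth == 0) {
        if (bad_pos != NULL) {
          *bad_pos = i;
        }
        return -1;
      }
      depth--;
    }
  }
  if (depth != 0) {
    /* report the innermost bracket that was never closed */
    if (bad_pos != NULL) {
      *bad_pos = open_pos[depth - 1];
    }
    return -1;
  }
  return deepest;
}
```

The array never needs to grow, and input that exceeds the limit is reported as an error instead of overrunning the array.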
If you look at the source for abc2ps (not written by me), you will find it consists of many files, but they are all referenced using #include statements by one 'master' file. Therefore, what the compiler sees after pre-processing is one massive file. Doing it this way is perfectly valid C, but it does have some drawbacks when compared to using modules.
Perhaps the most obvious reason for using modules is to break a very large program into manageable chunks. Most editors have a limit on the size of file they can handle. Also, many compilers (including PCC) have a limit to the size of file they can compile before they overflow their internal tables. Using modules means that only the linker need deal with the whole thing (and linkers are usually written to be capable of this). An added fringe benefit is that by using a makefile to do the compilation, you only need to re-compile the modules that have changed since your last compilation.
Dividing the code into modules also breaks up the code into logical units, which makes it easier to read. The C language only allows access to variables in another module if they are declared with an extern statement. This enforces the logical separation of the modules since the programmer cannot inadvertently access global variables in other modules. Doing things the other way round, a global variable declared in one module can be made local to that module and invisible to the other modules by declaring it as static, reinforcing the "black box" approach.
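The effect of extern and static can be sketched as follows. To keep the example in one block, the two "modules" are shown as sections of a single file, with comments marking where each part would live; the file names and identifiers are hypothetical and not taken from abcMIDI.

```c
/* ---- this part would live in a module such as counter.c ---- */

static int count = 0;     /* static: visible only inside this module */
int total_notes = 0;      /* global: other modules can reach it via extern */

/* static function: a private helper, not callable from other modules */
static void bump(void)
{
  count++;
}

/* part of the module's public interface */
void add_note(void)
{
  bump();
  total_notes++;
}

/* the only way for other modules to read the private counter */
int notes_added(void)
{
  return count;
}

/* ---- this part would live in another module such as main.c ---- */

extern int total_notes;   /* declares the variable defined in counter.c */
void add_note(void);      /* prototypes for the public routines */
int notes_added(void);

/* There is no declaration that main.c could write to reach count or
   bump(): the linker does not export static names. */
int report_total(void)
{
  add_note();
  add_note();
  return total_notes;
}
```

In a real multi-file build the extern declaration and the prototypes would normally be collected in a header file that both modules include.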
Another reason why I used modules was to be able to keep the midifile code (not written by me) as a separate unit that couldn't be affected by my own coding changes. This meant that the code was modular right from the start. The midi2abc code has remained small, but the abc2midi code grew so large that I had to break up one of the large files into smaller modules to get it to compile with PCC.
If you wish to add a new body of code to abc2midi, one way you might consider doing it is to write a new module and link it into the main code with a small number of changes to the main code providing the interface to your module. This way, plugging in and unplugging your module becomes a simple matter.
To program with modules, you do need to understand how C handles module interaction (in particular the extern and static keywords), but it gives a number of advantages. A good way of thinking about a module is to think of it supplying a set of routines in the same way that a system library does.
There is a utility called diff3 which will merge together two variants of a program, but I usually find my own hand-editing is good enough to do the job.
One thing to be wary of is making lots of changes for no good reason; for example using indent or some other program to pretty-print the code in a style which is more to your personal taste. Doing this is likely to make it impossible to use diff to pinpoint new code and result in a variant abc2midi strain which cannot be merged back with the original.
Ideally, when I add a new feature to one of the programs, I should add a new test to show whether that feature is working properly. However, I have been fairly lax about this. From time to time, I change the way error-handling is done and the tests show differences between the output and the reference file. As long as I can convince myself that the new output is correct, I update the reference file.
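Comparing program output against a reference file is usually a job for the diff utility, but the check itself is simple. As a rough sketch in C (first_difference is a hypothetical helper, not part of abcMIDI), the comparison amounts to this:

```c
#include <stdio.h>

/* Compare two open streams byte by byte.
   Returns 0 if they are identical, the 1-based number of the first
   line on which they differ otherwise, or -1 if either stream is
   NULL. A stream that ends early counts as a difference on the
   line where it ends. */
long first_difference(FILE *out, FILE *ref)
{
  long line = 1;
  int a;
  int b;

  if (out == NULL || ref == NULL) {
    return -1;
  }
  for (;;) {
    a = getc(out);
    b = getc(ref);
    if (a != b) {
      return line;
    }
    if (a == EOF) {
      return 0;
    }
    if (a == '\n') {
      line++;
    }
  }
}
```

A test run would open the freshly generated output and the stored reference file, call first_difference() on the pair, and treat any non-zero result as something to inspect by hand before deciding whether the reference file should be updated.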
I always try to put out bug fixes for reported problems fairly quickly. However, there were a few times when I released bug-fix versions which had worse problems than the original bug. This is what convinced me of the need for a quick automatic test.
If you are working with the code and adding new functionality, it is better to release a series of small updates than to release your masterpiece after six months of work. This way, improvements in the code can be blended in relatively painlessly, and you are unlikely to be duplicating someone else's work. This is my philosophy at least and the reason why I need 3 numbers to specify the version. The automatic tests can be applied quickly and either expose problems or give me a lot of confidence in the current version.