This is the title of chapter 11 from the book “The mythical man-month”, by Frederick Brooks. It is not exactly my case but I always have in mind the concept: you should be ready to change your mind, throw away the job done, and rollback or start from scratch.
You can do interview to users, analyze processes, write stories, you never know how a project will develop. In this case I’m doing a porting of a program written in the late 80s so I need to understand the original programmer workflow. My idea on how to port the program had to be validated spending as little time as possible. Some were succesful, now I’m facing the first one that didn’t, and it is a show-stopping one.
I repeated my idea several times: unify all the executables into one. The idea got a stop when I discovered, merging the first external group of source files, that the original programmer used function with identical names in different sources and that the code inside the functions is different!
Let’s have a look at the lockok() function:
FUNCTION lockok() SELECT dbf02 IF ! my_file_lock( ... ) RETURN .F. ENDIF SELECT dbf05 IF ! my_file_lock( ... ) RETURN .F. ENDIF ... RETURN .T.
It is a really simple code: it tries to put a file level lock on several files. This function is defined in 3 different source files (linked in 3 different executables, of course) with a different list of databases. It is used in places where the programmer wanted to be sure that nobody else could change the data during an update that spans 3 to 8 databases. Unfortunately, if the function reports that non all files could be locked, the program just skips the big update…
We have two easy ways to handle this situation. Both need that you understand to which executable belong the source files that have the function definitions and, more important, the ones where the function is just used. Once you trace them, you can simply prefix the function name with a identifier and change all the calls that belong to that executable, or refactor the function to receive the databases alias as a parameter.
In this case the second option is probably better, so that similar looking code is deduplicated. I also discovered that there are a couple of other functions, with similar sounding names, that perform the same job: probably the programmer was creating duplicates in his own programs..
This is a really simple case, one that can be handled very quickly. But the second is really more delicate. Infact the second duplicate function I analyzed was about 150 lines long and it looked really similar, I’d say that they were perfectly identical… I was about to port the function to the application library but in the end decided to use a diff tool and I finally noticed that a couple of lines were a bit different: 2 out of 150 lines were different in 2 IFs that had 2 expressions vs 3 expressions. I don’t know what the code really does, and if this check is really important to the data I’ll use and I don’t know if I could add it to the missing one. Solution: add a prefix to the function names.
In the previous post I already reported about one function duplicated, different, but used only in the source file they were defined: adding STATIC solved this case.
Another case was a function whose code is completely different but functionally equivalent. Imagine you have to report the sum of the values in a array and use a FOR loop once and a aEval() the other time. But to deduplicate the code I should write test cases to see that there are not edge cases handled differently.
Too many different cases: first 4 checked, 4 different problems and 4 different solutions. And there are 47 to go! Probably it’s time to change strategy. Merging this code I had a couple of compilation problems I solved with a temporary patch: disabling that code! So I decided to step back and fully mimic the original program, compile all different source code trying to clean up compilation errors and in meanwhile try to experiment a robust solution for passing the subdir to use as current working directory.
So I decided to throw away the changes done to deduplicate the functions and to start in a new direction.