Category Archives: refactoring

Can you open my data files?

In the previous posts I described the steps to have the program compile, link and start… I don’t use the verb run but start. Infact the main menu is displayed but whatever selection I do I get one of the following: the program exits or a file open error.

The program exits when it needs to call an external executable. It is normal and expected. What is not expected is the file not found message. Why, since my_netuse() correctly opened file01.dbf?

Let’s introduce function smart_netuse()

smart_netuse() uses a lookup table to know which data files and indexes to open. Suppose you have 2 dbf: clients and invoices. Clients.dbf has indexes cli01 and cli02. Invoices.dbf has indexes ind01, ind02, ind03.
There is a table, dbftab, with several columns holding the infos related to each file, keyed by an alias (that in my case is the same of the dbf name). For example, the row related to file clients has these fields:

ALIAS: clients
DBFNAME: clients
INDEX1: CLI01
INDEX2: CLI02
INDEX3: CLI03

All the fields are C 8.

smart_netuse() is called in this way:

result := smart_netuse( action, ;
          "CLIENTS,INVOICES", ;
          "DBFTAB", ;
           exclusive, can_exit, timeout,...)

The function opens the lookup table passed as third parameter (DBFTAB) and then splits (using sub-optimal code) the second parameter and uses the values to lookup the proper rows.

The function code could have been really simple but it is not. First of all it opens the lookup table. Then it starts a loop from record 1 to LastRec() and in the loop, for each record, it extracts the first value of the comma delimited string passed as second parameter (in this case, CLIENTS) and checks if an alias() with that name is already opened. If it is not already open, if the value matches the alias field of the current record, it uses the field values to open the file and indexes. But it can’t open them and reports an error…

If I correctly read the code, I think that there are some problems to be aware, something I should keep in mind if I’ll need to add some new features. The order of the alias in the comma delimited list (second parameter) must match the order of the records in DBFTAB. I confirmed this requirement looking at the dbf and doing some greps. So, if I want to refactor this code in the future I must check if this requirement is needed in some part of the code, that may expect that the data files are opened in a certain order in adjacent workareas.

Then there is the reason of the error, a probable incompatibility between Clipper 87 and Harbour. The function does the following:

file = FIELD->DBFNAME
key1 = FIELD->INDEX1
key2 = FIELD->INDEX2
key3 = FIELD->INDEX3
IF my_netuse( file, ... )
   SET INDEX to &KEY1, &KEY2, &KEY3

Do you remember that the fields are C 8 and that my_netuse( file ) performed a simple USE &file ALIAS &file? Well, the program reported a missing file, and I should use monospaced fonts to explain:

The file not found error is for CLIENTS .DBF

Did you notice the space?

Since the error was reported by my_netuse() I went there and added a

file := alltrim( file )

to be able to open the file.

Library compiled, linked, run: now there is another error, missing index files… I’m sure you think to know where the problem is… and you are probably right, but not in this moment: the index files are really NOT present on the disk !

I used a backup to restore the dbf files and backups don’t have indexes included (they may be recreated…) so they are not present on my disk… it’s time to create them…

… but the indexes are created by a different executable…

It’s time to start integrating external source files.

Do you refactor?

In the previous posts I reformatted the source code, had a first look at the code to see that it was compilable, had a look at how the data structure is organized in the directories, compiled the first executable writing or decompiling the missing functions.

The first executable, the main one, has the main menu but only a few of the functions are internal: in a lot of situations it just sets errorlevel and exits. Well, I discovered that it does this trick also when a workarea is chosen! You are presented with a list of workareas (client/year) on screen, select one and the program first sets the current work directory, then sets the errorlevel and finally exits: the batch file checks the errorlevel value and in this case restarts the main menu. But I want to eliminate the batch file, since I want to port all the code inside one executable…

Let’s start changing code. The first step is to define the strategy of the data dirs layout, something we already spoke about in
this post. There needs to be an environment variable that points to the root data dirs and all the subdirectories (one each for client/year) must be its children.
Current code uses a mix of curdir(), curdrive() and environment variable to determine the root data dir, then uses aDir to read the directory contents into an array (predefined with 512 items), loops on the array looking for a directory and when finds one, checks to see if there are 2 files and if both are found, opens one of them and reads a value that is added to an array. When the loop is completed, the maximum length of the strings in the array is calculated and finally the user is shown a form with an aChoice.

Would you write this code yourself, in this style? No, I won’t. And I’m trying to stay away for rewriting the whole function.

At first rewriting the whole function seems the smart thing to do. The code will look modern, in your personal coding style. But I don’t thing it will be a rewarding idea. This function seems to be a self contained one, with very weak relations with other part of the code. Infact it just sets a new working directory and exits! It will be used 5, 10 times per day, and if it is a few millisecond slower than the optimum it won’t be a problem. It’s a bit ’87 style, it uses aDir() instead of the more modern Directory(), it does some loop that is not necessary but the code works flawlessy, and it has been working for the last 20 years…

There are just a few things that I feel should be done and can be done quite safely. The programmer defined all the variables as PRIVATE. There are no LOCALs and no STATICs. This must raise a big alert flag for when I will put all the code in one executable. The programmer used a lot of macro expression: I can see a lot of them in the code, and they may refer to some of these PRIVATEs.
In this specific source file (but I think the situation is common) the PRIVATE variables are defined at the top of the file, in the first procedure (implicitely defined), also if they are used only in one function!
For a6/a7 variables a RELEASE statement let me understand that the variables are not used elsewhere, but just in that function. So I moved their definition from the first function to the function they are used:

PRIVATE a7[512],a6[512],nDir

becomes

FUNCTION ChooseWorkarea
LOCAL a7[512], a6[512], nDir

I could move some more variables from PRIVATE to LOCAL because a grep returned that the name was not used anywhere else, in none of the 200 source files. The program was going to exit anyway so probably it could be safe to LOCALize some more, but I want to be on the safe side at the moment. Refactoring is my passion, but I had to stop to continue the porting job!

The code is now modified as described in the previous post. The code that set the errorlevel was removed and a RETURN was added in the nidified code.

// set the working directory
CHDIR( drive + a8[ choice ] )
// FP: instead of setting errorlevel and exiting, return
RETURN

The code works. After the user choice a workarea, the control goes back to the main menu. I can confirm that the workarea is set because the function that handles the screen painter can find file1.dbf, open it and read the company name to display on-screen.

aMaxStrLen(): how many ways to write it?

In the previous post I published my version of Funcky aMaxStrLen(). Maurizio commented posting a different version, written using a completely different strategy. I think there is a third way, using FOR EACH. And probably a fourth and a fifth. And probably more.

I put the two versions here, and I will comment on my version, that is correct and wrong at the same time.

// My version
// AMAXSTRLEN: get longest string length in array
FUNCTION aMaxStrLen( a )
   LOCAL n
   LOCAL i
 
   IF Len( a ) == 0
      RETURN 0
   ENDIF
   n := Len( a[ 1 ] )
   FOR i := Len( a ) TO 2 STEP -1
      IF HB_ISCHAR( a[ i ] ) .AND. Len( a[ i ] ) > n
         n := Len( a[ i ] )
      ENDIF
   NEXT
   RETURN n

Now the version posted by Maurizio

// Maurizio version
FUNCTION aMaxStrLen( aArr )
   LOCAL nMaxLen := 0
 
   AEval( aArr, { | e | nMaxLen := Max( nMaxLen, IIf( HB_IsString( e ), Len( e ), 0 ) ) } )
   RETURN nMaxLen

aMaxStrLen() is used just once in the code (actually twice, but in duplicated code), on an array that I’m sure that has all string items: it is created a few lines as an empty string and then populated with aAdd using string values.
So when I started coding the function I was aware of this. And infact I wrote this line of code:

n := Len( a[ 1 ] )

and it is perfectly ok, and is the “correct” part.

The “wrong” part starts just two lines after, when I wrote:

IF HB_ISCHAR( a[ i ] ) .AND. Len( a[ i ] ) > n

Why am I checking if the item is a string if I know that all of them are strings?
So, or I trust the array passed as parameter, or I don’t trust it, and I should check also the first item:

IF HB_ISCHAR( a[ 1 ] )
   n := Len( a[ 1 ] )
ELSE
   n := 0
ENDIF

Of course I won’t write code this way, it is just an example.

Did I commit the over-optimize error? Yes, probably I did, separating the three cases of array lenght: zero lenght, length equal to 1 and length more than 1.
In a real clone of aMaxStrLen, where there may be an array with up to 4096 items (is it the upper limit of Clipper 87 arrays?), it may be nice to have these checks. But this is a special case, where I know that the items won’t be more than a couple dozen.

And now the real, philosophical question: the use of aMaxStrLen in this program is in a special case, within specific bounds. The implementation I did was partly optimized for the specificity of the program. Wouldn’t be better to change the function name to something that makes clear that the recreated function is a version for a specific task?

What do you think: it is better to spend time to write a perfect clone of aMaxStrLen (doing comparisons using tests compiled with Clipper 87 version, since the docs are not very clear on edge cases) or create a specific version giving it a new, derived name? MyAMaxStrLen for example?

How would you write this function?

PS: I’m going to remove this function call. In the loop where the array is populated, I can check the length of the item to be added and keep track of the longest one: no need to duplicate the loop!

The missing sources part 2

Now I have a more clear idea how to continue the work, let’s document the next steps.

First of all I want to remember that the first test of building the application listed about 30 missing functions, all used only in the first source file but I’m sure others are used in other source files. The missing functions may be part of the Funcky library or an application level library of which I only have the .lib file. A friend decompiled a couple of executables so that I could recover some of these missing sources. A blinker generated file I found in the archive lists all the object files linked in one exectuable and the functions it contains so that I could create the source code files using the same name.

I used the online Norton Guide of the Funcky libs to detect which functions were used from the library, and from the first list of missing functions I had the following list:

CLS: clear entire screen
ROLOC: reverse attribute
PRINT: print a string
BORDER: set border color (CGA/EGA/VGA)
CLOCK24, UNCLOCK24: display/turn off a 24 hour military format clock
ONDO: execute expression  in the list
FLRESET: reset default font/mode
CENTER: add leading spaces to center in  width
AMAXSTRLEN: get longest string length in array
CURDRIVE: get the logged drive letter (d:) 
ALEN: length of array without undefined elements
CHDIR: change directory

Looking at the samples in the NG and their use in the code I created the following code:

#include "set.ch"


// Unused, no more valid, functions

// CLOCK24, UNCLOCK24: display/turn off a 24 hour military format clock

FUNCTION clock24()
   RETURN .T.

FUNCTION unclock24()
   RETURN .T.

// BORDER: set border color (CGA/EGA/VGA)

FUNCTION border( n )
   RETURN .T.

// FLRESET: reset default font/mode

FUNCTION FLReset()
   RETURN .T.


// STUB functions (need changes)

// CURDRIVE: get the logged drive letter (d:)
// PROBLEM: returns a fixed drive letter
FUNCTION curdrive()
   RETURN "f:"

// CHDIR: change directory
FUNCTION chdir( cPath )
   hb_cwd( cPath )
   RETURN .T.

// CENTER: add leading spaces to center in <nn> width
// PROBLEM: samples in NG says that there should be a trim()
FUNCTION center( c, n )
   RETURN PadC( c, n )

// ROLOC: reverse attribute
// PROBLEM: Returns a fixed value
FUNCTION roloc( x )
   RETURN "w"

// PRINT: print a string
// PROBLEM: nColore not used
FUNCTION print( y, x, cTesto, nColore, nRepl )
   IF ! HB_ISNUMERIC( nRepl )
      nRepl := 1
   ENDIF
   @ y, x SAY Replicate( cTesto, nRepl )
   RETURN .T.

// CLS: clear entire screen
// PROBLEM: cChar not used, not all cases covered
FUNCTION cls( cColor, cChar )
   @ 0, 0 CLEAR TO MaxRow(), MaxCol()
   RETURN .T.

// Completed functions, may need changes

// ONDO: execute expression <n> in the list
FUNCTION ondo( nValue, ... )
   LOCAL b, c
   IF ! HB_ISNUMERIC( nValue )
      ? "ERRORE ONDO valore non numerico"
      QUIT
   ENDIF
   IF nValue > Len( hb_AParams() ) - 1
      ? "ERRORE ONDO, passato valore", nValue, "ma solo ", Len( hb_AParams ) -1, "valori"
      QUIT
   ENDIF
   c := hb_AParams()[ nValue + 1 ]
   b := hb_macroBlock( c )
   Eval( b )
   RETURN .F.

// AMAXSTRLEN: get longest string length in array
FUNCTION aMaxStrLen( a )
   LOCAL n
   LOCAL i

   IF Len( a ) == 0
      RETURN 0
   ENDIF
   n := Len( a[ 1 ] )
   FOR i := Len( a ) TO 2 STEP -1
      IF HB_ISCHAR( a[ i ] ) .AND. Len( a[ i ] ) > n
         n := Len( a[ i ] )
      ENDIF
   NEXT
   RETURN n

// ALEN: length of array without undefined elements
FUNCTION aLen( a )
   LOCAL i
   LOCAL n := 0

   FOR i := 1 TO Len( a )
      IF ValType( a[ i ] ) != "U"
         n++
      ENDIF
   NEXT
   RETURN n

Then I created funcky.hbp file

# I want a library
-hblib
# I want to call it "funcky"
-ofuncky
# 
src\funcky
</pre>

And built it
<pre>
hbmk2 funcky.hbp
</pre>

The library was successfully created.

I then updated exe01.hbp file adding the following lines:
<pre>
-L.
-lfuncky
</pre>
to link the library just created. I could have used a .hbc file. I will, in the future.

Now the build process reports only the functions that should be in the missing application-level library. I create a .hbp file for this new library (I called it applib) adding all the decompiled functions that I'm aware of, using the blinker generated file as a guide.

1
# I want a library
-hblib
# I want to call it "applib"
-oapplib
# source files
src\decomp01
src\decomp02
..
src\decomp13

There are still missing functions reported by the linker, and all used in the first source file only. Looking at the names and how they are used in the code, they are almost all related to screen handling, to create fancy effects, like imploding or exploding windows... Just one is another clone of padr function.
So I create all empty stub functions, except for the clone function, and finally added a function I usually use for debugging:

#pragma BEGINDUMP
#include <windows.h>
HB_FUNC( OUTPUTDEBUGSTRING )
{
    OutputDebugString( (LPCSTR) hb_parcx( 1 ) ) ; 
}
#pragma ENDDUMP

Now the last step, trying to build the program and finally get the new missing functions from the next source files.... ready ?

hbmk2 exe01.hbp

Nice sensation when you get the errors you were looking for !!!
The errors had the same pattern: if there is only one missing function the linker processes the following file, otherwise it stops. So I have an error for file02.prg and file03.prg but 3 for file04.prg so I have to solve them before getting new missing functions...

Now I have to loop: compile and get the list of the missing functions. If the function is from Funcky lib, check its docs and its usage, then create a clone.
If not from Funcky lib, check in the decompiled files. At the moment I had just 2 minor executables decompiled; the main one, with almost all functions, fails to decompile. I know this will be a problem in the future... I know...

The job was quite repetitive, but in the process I found something interesting...

I had to correct this code snippet that failed to compile:

   Chr( K_CTRL_HOME ) + Chr( K_DOWN )  // fails to compile in Harbour

there was obviously a missing KEYBOARD()... I don't have Clipper 87 installed so I can't check if it also failed to compile... but anyway, it never worked as intended...

I found several Funcky functions used to set/delete the volume label, or check if a drive is ready (A:) or if the passed string points to a valid dir. I created stubs to always return .T. These functions are used just in a couple of source files to perform backups, move data between clients/year or create specially formatted data to upload/send for tax filings or other law related matters.

Then there was a subtly named function, used in this way

CASE ISCHAR( LastKey() ) .OR. LastKey() = K_ENTER

I quickly, and wrongly, converted ISCHAR to HB_ISCHAR. Wrong, wrong, wrong !
Since Lastkey() always returns a numeric value, HB_ISCHAR will be always .F. Instead, the real job of the function is to return .T. if the pressed key is a "char", a letter of the alphabet. I will check better in the following lines to understand if the Lastkey() value is used for some other job, in the meantime I created this:

FUNCTION ischar( nValue )
   IF nValue >= 32 .and. nValue <= 128
      RETURN .T.
   ENDIF
   RETURN .F.

Another function I found is Funcky ReadScreen(): it returns the string at passed screen coordinates. It's used only once and I plan to replace it, when I will look at that code. At the moment I just exit the application.

FUNCTION ReadScreen()
   ? "Function ReadScreen not implemented"
   QUIT

Then there is one last function that I hope I can replace easily. This function is not listed in the blinker generated files. Looking at the few places where it is used, it seems to be a sort of "change directory": it is used in the sourcse that handle index creation and file copying. I will have to understand it better. In the meantime I will create a stub function that prints a debug line and returns .T.

To have my program compile and link I had to add a couple of functions from the decompiled executables, to write 6 application functions and some more Funcky stubs (see code below).

Of the 6 rewritten functions, one is related to licensing (that was already disabled in the code so the code could also be removed); another is a clone of pad, some are screen related, all created as stubs.

The first executable is now ready to be tested. The environment variable set to the root data directory, program launched....

Now I'm were I was a couple of days ago, but with clean, well formatted code, the process documented in this blog and some ideas on how to continue on the drawing board...

The program runs, the menu appears on screen... now the next step is to let the software find its data files...

During the compile/link/error/add_functions loop I added the following functions to funcky.prg

// get status of Clipper's various set commands
// PROBLEM: incomplete
FUNCTION Status( nIndex )
   DO CASE
   CASE nIndex == 6
      RETURN( _SET_CURSOR )
   END CASE
      clear screen
      ? "Richiesto status valore ", nIndex
      QUIT
  RETURN .F. 

// csrput() - place cursor at specified row/column
// PROBLEM: is it correct ?  
FUNCTION csrput( y, x )
   @ y,x SAY ""
   RETURN .T.


// palette() - access 64 colors on EGA and VGA adaptors
FUNCTION PALETTE()
   RETURN 1

FUNCTION SWAP( p1, p2 )
   LOCAL t := p1

   p1 := p2
   p2 := t
   RETURN .T.