aMaxStrLen(): how many ways to write it?

In the previous post I published my version of Funcky aMaxStrLen(). Maurizio commented posting a different version, written using a completely different strategy. I think there is a third way, using FOR EACH. And probably a fourth and a fifth. And probably more.

I put the two versions here, and I will comment on my version, that is correct and wrong at the same time.

// My version
// AMAXSTRLEN: get longest string length in array
FUNCTION aMaxStrLen( a )
   LOCAL n
   LOCAL i
 
   IF Len( a ) == 0
      RETURN 0
   ENDIF
   n := Len( a[ 1 ] )
   FOR i := Len( a ) TO 2 STEP -1
      IF HB_ISCHAR( a[ i ] ) .AND. Len( a[ i ] ) > n
         n := Len( a[ i ] )
      ENDIF
   NEXT
   RETURN n

Now the version posted by Maurizio

// Maurizio version
FUNCTION aMaxStrLen( aArr )
   LOCAL nMaxLen := 0
 
   AEval( aArr, { | e | nMaxLen := Max( nMaxLen, IIf( HB_IsString( e ), Len( e ), 0 ) ) } )
   RETURN nMaxLen

aMaxStrLen() is used just once in the code (actually twice, but in duplicated code), on an array that I’m sure that has all string items: it is created a few lines as an empty string and then populated with aAdd using string values.
So when I started coding the function I was aware of this. And infact I wrote this line of code:

n := Len( a[ 1 ] )

and it is perfectly ok, and is the “correct” part.

The “wrong” part starts just two lines after, when I wrote:

IF HB_ISCHAR( a[ i ] ) .AND. Len( a[ i ] ) > n

Why am I checking if the item is a string if I know that all of them are strings?
So, or I trust the array passed as parameter, or I don’t trust it, and I should check also the first item:

IF HB_ISCHAR( a[ 1 ] )
   n := Len( a[ 1 ] )
ELSE
   n := 0
ENDIF

Of course I won’t write code this way, it is just an example.

Did I commit the over-optimize error? Yes, probably I did, separating the three cases of array lenght: zero lenght, length equal to 1 and length more than 1.
In a real clone of aMaxStrLen, where there may be an array with up to 4096 items (is it the upper limit of Clipper 87 arrays?), it may be nice to have these checks. But this is a special case, where I know that the items won’t be more than a couple dozen.

And now the real, philosophical question: the use of aMaxStrLen in this program is in a special case, within specific bounds. The implementation I did was partly optimized for the specificity of the program. Wouldn’t be better to change the function name to something that makes clear that the recreated function is a version for a specific task?

What do you think: it is better to spend time to write a perfect clone of aMaxStrLen (doing comparisons using tests compiled with Clipper 87 version, since the docs are not very clear on edge cases) or create a specific version giving it a new, derived name? MyAMaxStrLen for example?

How would you write this function?

PS: I’m going to remove this function call. In the loop where the array is populated, I can check the length of the item to be added and keep track of the longest one: no need to duplicate the loop!

The missing sources part 2

Now I have a more clear idea how to continue the work, let’s document the next steps.

First of all I want to remember that the first test of building the application listed about 30 missing functions, all used only in the first source file but I’m sure others are used in other source files. The missing functions may be part of the Funcky library or an application level library of which I only have the .lib file. A friend decompiled a couple of executables so that I could recover some of these missing sources. A blinker generated file I found in the archive lists all the object files linked in one exectuable and the functions it contains so that I could create the source code files using the same name.

I used the online Norton Guide of the Funcky libs to detect which functions were used from the library, and from the first list of missing functions I had the following list:

CLS: clear entire screen
ROLOC: reverse attribute
PRINT: print a string
BORDER: set border color (CGA/EGA/VGA)
CLOCK24, UNCLOCK24: display/turn off a 24 hour military format clock
ONDO: execute expression  in the list
FLRESET: reset default font/mode
CENTER: add leading spaces to center in  width
AMAXSTRLEN: get longest string length in array
CURDRIVE: get the logged drive letter (d:) 
ALEN: length of array without undefined elements
CHDIR: change directory

Looking at the samples in the NG and their use in the code I created the following code:

#include "set.ch"


// Unused, no more valid, functions

// CLOCK24, UNCLOCK24: display/turn off a 24 hour military format clock

FUNCTION clock24()
   RETURN .T.

FUNCTION unclock24()
   RETURN .T.

// BORDER: set border color (CGA/EGA/VGA)

FUNCTION border( n )
   RETURN .T.

// FLRESET: reset default font/mode

FUNCTION FLReset()
   RETURN .T.


// STUB functions (need changes)

// CURDRIVE: get the logged drive letter (d:)
// PROBLEM: returns a fixed drive letter
FUNCTION curdrive()
   RETURN "f:"

// CHDIR: change directory
FUNCTION chdir( cPath )
   hb_cwd( cPath )
   RETURN .T.

// CENTER: add leading spaces to center in <nn> width
// PROBLEM: samples in NG says that there should be a trim()
FUNCTION center( c, n )
   RETURN PadC( c, n )

// ROLOC: reverse attribute
// PROBLEM: Returns a fixed value
FUNCTION roloc( x )
   RETURN "w"

// PRINT: print a string
// PROBLEM: nColore not used
FUNCTION print( y, x, cTesto, nColore, nRepl )
   IF ! HB_ISNUMERIC( nRepl )
      nRepl := 1
   ENDIF
   @ y, x SAY Replicate( cTesto, nRepl )
   RETURN .T.

// CLS: clear entire screen
// PROBLEM: cChar not used, not all cases covered
FUNCTION cls( cColor, cChar )
   @ 0, 0 CLEAR TO MaxRow(), MaxCol()
   RETURN .T.

// Completed functions, may need changes

// ONDO: execute expression <n> in the list
FUNCTION ondo( nValue, ... )
   LOCAL b, c
   IF ! HB_ISNUMERIC( nValue )
      ? "ERRORE ONDO valore non numerico"
      QUIT
   ENDIF
   IF nValue > Len( hb_AParams() ) - 1
      ? "ERRORE ONDO, passato valore", nValue, "ma solo ", Len( hb_AParams ) -1, "valori"
      QUIT
   ENDIF
   c := hb_AParams()[ nValue + 1 ]
   b := hb_macroBlock( c )
   Eval( b )
   RETURN .F.

// AMAXSTRLEN: get longest string length in array
FUNCTION aMaxStrLen( a )
   LOCAL n
   LOCAL i

   IF Len( a ) == 0
      RETURN 0
   ENDIF
   n := Len( a[ 1 ] )
   FOR i := Len( a ) TO 2 STEP -1
      IF HB_ISCHAR( a[ i ] ) .AND. Len( a[ i ] ) > n
         n := Len( a[ i ] )
      ENDIF
   NEXT
   RETURN n

// ALEN: length of array without undefined elements
FUNCTION aLen( a )
   LOCAL i
   LOCAL n := 0

   FOR i := 1 TO Len( a )
      IF ValType( a[ i ] ) != "U"
         n++
      ENDIF
   NEXT
   RETURN n

Then I created funcky.hbp file

# I want a library
-hblib
# I want to call it "funcky"
-ofuncky
# 
src\funcky
</pre>

And built it
<pre>
hbmk2 funcky.hbp
</pre>

The library was successfully created.

I then updated exe01.hbp file adding the following lines:
<pre>
-L.
-lfuncky
</pre>
to link the library just created. I could have used a .hbc file. I will, in the future.

Now the build process reports only the functions that should be in the missing application-level library. I create a .hbp file for this new library (I called it applib) adding all the decompiled functions that I'm aware of, using the blinker generated file as a guide.

1
# I want a library
-hblib
# I want to call it "applib"
-oapplib
# source files
src\decomp01
src\decomp02
..
src\decomp13

There are still missing functions reported by the linker, and all used in the first source file only. Looking at the names and how they are used in the code, they are almost all related to screen handling, to create fancy effects, like imploding or exploding windows... Just one is another clone of padr function.
So I create all empty stub functions, except for the clone function, and finally added a function I usually use for debugging:

#pragma BEGINDUMP
#include <windows.h>
HB_FUNC( OUTPUTDEBUGSTRING )
{
    OutputDebugString( (LPCSTR) hb_parcx( 1 ) ) ; 
}
#pragma ENDDUMP

Now the last step, trying to build the program and finally get the new missing functions from the next source files.... ready ?

hbmk2 exe01.hbp

Nice sensation when you get the errors you were looking for !!!
The errors had the same pattern: if there is only one missing function the linker processes the following file, otherwise it stops. So I have an error for file02.prg and file03.prg but 3 for file04.prg so I have to solve them before getting new missing functions...

Now I have to loop: compile and get the list of the missing functions. If the function is from Funcky lib, check its docs and its usage, then create a clone.
If not from Funcky lib, check in the decompiled files. At the moment I had just 2 minor executables decompiled; the main one, with almost all functions, fails to decompile. I know this will be a problem in the future... I know...

The job was quite repetitive, but in the process I found something interesting...

I had to correct this code snippet that failed to compile:

   Chr( K_CTRL_HOME ) + Chr( K_DOWN )  // fails to compile in Harbour

there was obviously a missing KEYBOARD()... I don't have Clipper 87 installed so I can't check if it also failed to compile... but anyway, it never worked as intended...

I found several Funcky functions used to set/delete the volume label, or check if a drive is ready (A:) or if the passed string points to a valid dir. I created stubs to always return .T. These functions are used just in a couple of source files to perform backups, move data between clients/year or create specially formatted data to upload/send for tax filings or other law related matters.

Then there was a subtly named function, used in this way

CASE ISCHAR( LastKey() ) .OR. LastKey() = K_ENTER

I quickly, and wrongly, converted ISCHAR to HB_ISCHAR. Wrong, wrong, wrong !
Since Lastkey() always returns a numeric value, HB_ISCHAR will be always .F. Instead, the real job of the function is to return .T. if the pressed key is a "char", a letter of the alphabet. I will check better in the following lines to understand if the Lastkey() value is used for some other job, in the meantime I created this:

FUNCTION ischar( nValue )
   IF nValue >= 32 .and. nValue <= 128
      RETURN .T.
   ENDIF
   RETURN .F.

Another function I found is Funcky ReadScreen(): it returns the string at passed screen coordinates. It's used only once and I plan to replace it, when I will look at that code. At the moment I just exit the application.

FUNCTION ReadScreen()
   ? "Function ReadScreen not implemented"
   QUIT

Then there is one last function that I hope I can replace easily. This function is not listed in the blinker generated files. Looking at the few places where it is used, it seems to be a sort of "change directory": it is used in the sourcse that handle index creation and file copying. I will have to understand it better. In the meantime I will create a stub function that prints a debug line and returns .T.

To have my program compile and link I had to add a couple of functions from the decompiled executables, to write 6 application functions and some more Funcky stubs (see code below).

Of the 6 rewritten functions, one is related to licensing (that was already disabled in the code so the code could also be removed); another is a clone of pad, some are screen related, all created as stubs.

The first executable is now ready to be tested. The environment variable set to the root data directory, program launched....

Now I'm were I was a couple of days ago, but with clean, well formatted code, the process documented in this blog and some ideas on how to continue on the drawing board...

The program runs, the menu appears on screen... now the next step is to let the software find its data files...

During the compile/link/error/add_functions loop I added the following functions to funcky.prg

// get status of Clipper's various set commands
// PROBLEM: incomplete
FUNCTION Status( nIndex )
   DO CASE
   CASE nIndex == 6
      RETURN( _SET_CURSOR )
   END CASE
      clear screen
      ? "Richiesto status valore ", nIndex
      QUIT
  RETURN .F. 

// csrput() - place cursor at specified row/column
// PROBLEM: is it correct ?  
FUNCTION csrput( y, x )
   @ y,x SAY ""
   RETURN .T.


// palette() - access 64 colors on EGA and VGA adaptors
FUNCTION PALETTE()
   RETURN 1

FUNCTION SWAP( p1, p2 )
   LOCAL t := p1

   p1 := p2
   p2 := t
   RETURN .T.

The missing sources and data directory

As I said in previous posts, before delving more into the code base I decided to do a simple test: try to compile and link the first program run by the batch file, just to see what to expect.

Porting the .lnk file was quite easy. The original file was

BLINKER INCREMENTAL OFF

FI EXE01,FUNCKY,RESIDENT

SEA FUNBLINK

BEGINAREA

  FILE file01
  FILE file02
...
  FILE file40

  ALLOCATE OVERCL
  ALLOCATE APPLIB
  ALLOCATE FUNCKY
  ALLOCATE EXTEND
ENDAREA

LIB CLIPPER

The converted file is (with the first 3 lines not strictly necessary)

-gui
-gtwvt
-inc

-oEXE01

src/file01
src/file02
...
src/file40

I was sure I was headed to hit a wall… but I was deliberately looking for a failure!

The results were (from the log I saved):
hbmk2: Suggerimento: aggiungere l’opzione ‘hbxpp.hbc’ per la funzione(i) mancante: CurDrive()
hbmk2: Suggerimento: aggiungere l’opzione ‘hbct.hbc’ per la funzione(i) mancante: Center()
hbmk2: Suggerimento: aggiungere l’opzione ‘hbblink.hbc’ per la funzione(i) mancante: BliOvlClr()

And then other 30 missing functions.

Some that I found in Funcky docs: CLS(), ROLOC(), PRINT(), BORDER(), UNCLOCK24(), ONDO(), FLRESET(), CLOCK24(), ALEN(), CHDIR(), AMAXSTRLEN().

And some – I won’t name them – that were in that missing application lib I already told about in a previous post. Among the 700+ files present in the directory there were a couple generated by the linker that list the object files and where they come from so I could confirm I had no sources for them.
A couple of the function’s names referred to the screen: setting attributes, showing messages on screen, setting fonts (we are talking about EGA/VGA times….).
Others were file related functions, to open database files. Others were utility functions and a couple were clones of padr/padl/padc functions that were not present in Clipper 87.

Using the Funcky docs found online I was able to quickly rewrite the most important ones, with very basic functionality, discarding the ones like the clock (on-screen clock was good to have on text-mode programs, not in GUI programs…).

Looking at the code I was able to quickly create stub functions for the others: it has been a very interesting process, trying to understand by the function call which was their use, parameters and the expected return values.

During this process I had to look at several source code files and noticed that there were calls to functions that were expected to be in the missing library and that I had still not rewritten! More, the only missing functions were from file01.prg… possible that the other 39 files used no missing functions?

This surprised me a lot. So I swapped the order of the files in the .hbp file, and put file03.prg as the first one. Now the build process reported different missing functions – but always related to file03. I could not explain exactly why it was happening and had several hypothesis, none fully verified. For one of them I thought to have an answer: browsing the source code I saw that Funcky ondo() command was used.

Ondo has this syntax:

ONDO(n,'file03()','file04()','file05()','file06()','file07()')

Depending on the value of the n variable, it calls the function listed in the n+1 parameter, using the macro operator. This means that you must have it linked to work at runtime. My idea was that Blinker was linking them while mingw linker wasn’t… and some other strange ideas…

But this was not the problem. It’s really a lot easier: if there is only one missing function to link in the file currently being linked, the linker continues and reports errors from other object files. As soon it reports more than 1 error in a linked file, the other errors from other object files are not reported.

It’s impossible (unless there is a linker switch to use that I’m not aware of) to have a full reports of all the missing functions in all the files.

Doing the tests and looking at the blinker generated reports I found, I could list more than 20 functions I had no source code. I’m not sure the software house has these sources or can provide them to me. So I decided to do something I didn’t want to do: use a decompiler. I don’t own one but a trusted friend of mine does, so I sent one of the executables and I got back well formatted, easy to read, source files for the application. Using the blinker generated file I could recreate the source files reproducing the same structure as the original. For a moment I also wanted to use the decompiler to generate more readable code for ALL the source code but I fear that it may introduce problems – and I will lose all the comments, remarked code, and so on.

Among the decompiled functions there are the routines used to open data tables, create index, create temporary tables, lookup values. Some of them could apparently be easily rewritten, but some have some not easily understandable features or side effects. Original (well… decompiled) code is better.

Ok, so finally the program run, the main menu appeared on the screen but any option I selected would stop the program. I made sure the modules were correctly linked and now the program stopped after a few more program lines.

In the second post I talked about the structure of the data files: one root directory pointed by a environment variable and one subdirectory for each client/year and the data files that can be in both root and subdirectory (the latter has precedence). The main menu, when you choose an action and a workspace is not selected, lists all the available subdirs and then uses chdir() to set the working directory. Then, depending on the selection, executes the menu action or sets the errorlevel and exit the program letting the calling batch file to execute the proper executable.

The program could not locate the data files… so it arrived the time to start to look at the code.

The code of this part is not very complex but it gave me an idea of the difficulties I may need to solve in the future. Let’s have a look at the first few lines of this function, keeping in mind this is Clipper 87 code!

drive=CURDRIVE()+'\'
dir=SUBSTR(CURDIR(),4)

dir=SUBSTR(gete('ROOTDIR'),4)

n=AT('\',dir)
IF n>0
  drive=drive+LEFT(dir,n)
  dir=SUBSTR(direct,n+1)
ENDIF

ADIR(drive+'*.*',a6,.T.,.T.,.T.,a7)
nDir=ALEN(a7)

The first thing to note is that there is a really strange code: drive and dir variables are set, then dir variable is overwritten, but not drive. I know that ROOTDIR environment variable points to the root directory of the application and from the code I see that is quite mandatory: without that env variable aDir() reads the root directory of the current drive not the root directory of the application. It can be a valid setup but really not a good idea to have all the subdirectoirs in the drive root dir… unless it is a drive dedicated to the data.

My idea is to change this code to have the env variable mandatory and exit the program if not set. In this way, any data setup you want, you may have it. Now the code is just 2 lines (check of ROOTDIR presence is done at startup):

drive := gete( "ROOTDIR" ) 
ADIR(drive+'*.*',a6,.T.,.T.,.T.,a7)

I don’t know yet if this change is good or not since it needs to be validated with other issues I will introduce now.

The code after this snippet loops on the a6/a7 arrays and when it finds a directory it does:

 IF CHDIR( drive + a6[nIndex] )
   IF FILE( 'file1.DBF') .AND. FILE( 'file2.DBF')
      my_netuse( 'file1' )
      company = FILE1->companyname
      CLOSE ALL
      // not shown: add the data to the array to be used
      // later for aChoice
   ENDIF
 ENDIF

From this code I learn that file1 and file2 must reside in the subdirectory and that chdir() is used to set it. But with my previous browsing in the code I remembered to have seen something of interest…

Infact in one of the initialization functions, this code is executed:

_temp=GETE('ROOTDIR')
_path='.;'+_temp
SET PATH TO &_path

This code tells Clipper to try to open the data files in the current directory but if it doesn’t find them, to try in the data root directory.
Looking at the docs, the file() function respects this directive, and checks if the file is present in both directories…
my_netuse function opens the data file and can’t handle paths, infact use its parameter to name the alias:

 use &par alias &par

I hope you start to see the problem I’m facing: the code expects to be able to set a working directory AND to be able to keep this setting between executables (something I was not able to achieve using hb_cwd calls) and completely rely on Clipper to handle where the needed file actually is.

So, related to this particular aspect of the application, I’m starting to think about the following points:
– include all the code into one executable (instead of the 15/16 currently necessary)
– have a public variable to hold the current working directory, just for reference, if necessary
– change the program to use the (properly validated) directory in the env variable as the main data directory, with one subdirectory for each client/year; hb_cwd should work in one executable, so that no other changes in file handling code would be necessary.
– use Harbour, 2015 era, directory functions that must also work in linux

Since we have just one executable we won’t need to keep the settings between executables. I just hope that somewhere in the code there isn’t something stopping this idea…

Quest for formatted code

There was a post already scheduled to go online during the weekend that describes the first tests done with the code: just to check if the code was complete, which libraries used, if compiled, etc, I did some quick and dirty changes to the code. The changes were not documented, and done one after the other with the goal to have the code compile and link. Documenting the steps I probably misplaced something, so I want to redo everything in order, with proper documentation.

In the meanwhile I also started to document the progres in this blog so I took a decision: start from a clean situation and document all the changes. And the first step is to reformat the source code with a common standard. Actually there is code I received, code decompiled and code I wrote (the latter 2 will be described in next posts)

As I wrote, I received a compressed archive, decompressed it and moved files in different directories and then all of them (and I mean ALL of them, including useless clipper .obj files) were added to a mercurial repository. I also the decompiled code and now it is time to cleaning up the code.

The tool of choice is hbformat, of course. I already know that committing the hbformatted code will create a monster diff but I think we, as programmers, should work with source code that is easy to work with, pleasant to read…

And my idea was that a quick hbformat *.prg would be the solution… I had to change my idea…

hbformat *.prg

The result was depressing… There are at least 3 problems with hbformat running on this code.

The first problem is that code uses short 4-chars or abbreviated commands: among them ENDC is used instead of ENDCASE (or END). ENDC is not recognized by hbformat. Changing it manually to ENDCASE works ok.

The second problem is that it doesn’t reformat the code (all lines start at column 0) until it finds one procedure or function statement. And almost none of my 200 files has a procedure or function statement as first line of code, using implicit declaration.

The third one is that hbformat uses stdio (OutStd()) for progress report and stderr (OutErr()) for errors and there isn’t the filename in the error message. Since you can’t redirect stdio and stderr to the same file, it means that you have a file with the progress log and one with just the error type and line number… useless.
So I went to hbformat source code and added code to print the filename when an error occours:

OutErr( cFilename + ": error", oRef:nErr, "on line", oRef:nLineErr, ":", oRef:cLineErr, hb_eol() )

Now I know where the errors are! Problem 3 solved.

Now there are 87 files to manually check for errors, one by one… I don’t trust global source and replace…

After checking about 10 files I noticed they all had the ENDC command that was misleading hbformat logic, so I had a look at hbformat source code formatter routines to try to add support for ENDC but I stopped after 20 minutes… better changing by hand…

So, revert to the original files and from the error log, with a bit of vim magic, create a batch file to load all 87 files and start to manually change the files. But this time I noticed that the programmer also used OTHERW in a case statement. A quick check and I confirmed that OTHER was not recognized and not properly indented by hbformat.

Ok, after 20 minutes all ENDC and OTHERW were converted to the longer forms.

Now let’s address problem 2: add a procedure/function line to each source file so that hbformat can indent the code. As usual I will do it manually, so that I can do a little check of each file. In the process I discovered several duplicated files, some tests, some empty, some really strange. There is need for a cleanup. It took almost an hour but now the code is ready to be formatted:

hbformat *.prg

Finally, no errors reported. But is it the code properly formatted? Unfortunately it isn’t.

In the middle of the code there are these lines

EXTE HELPER, CALCOL
SET KEY K_F1 TO HELPER
SET KEY K_ALT_F1 TO CALCOL

EXTE is the abbreviation of EXTERNAL, used to tell the linker to include that functions. I checked in Harbour and preprocessor correctly converts the SET KEY to code blocks so it isn’t needed. I can’t be sure for Clipper 87. Anyway, in hbformat code, when EXTE/EXTERNAL is found, the internal state machine is reset and following lines are at column 0, like if we were out of a procedure. This is another problem of hbformat since these commands can be anywhere in the code.

I moved the EXTE directives at the top of the 16 source files in which they were present. Then a new formatting round:

hbformat *.prg

I was expecting that hg stat reported just the 16 modified files. Instead it reported more, for this kind of changes:
diff01

I browsed 30 files and found no visible errors in the formatting so, finally, it’s time to commit. The diff is a mess as expected but in the future I may strip the previous commits and have this one as the first.

Just to have the source code in a form that is easy, or confortable to work with, took several hours and needed a change in a tool to know exactly where the errors were. More changes are needed since the tool was not able to cope with completely valid Clipper/Harbour code: that time-consuming manual job was needed.