Approximate Date: 2001-2002 Computer: Intel Based Pentiums
Tools & Technologies that didn’t exist: -----
Historical remarks: There has been very little significant change in the technology landscape since that time. .Net was in Beta. I bought my Beta copy, which I still have, in late 2002. .Net released in early 2003. Java was starting to become a player as a backend webserver language but was still heavily reliant on the old AWT classes. Java JSP was competing against the non .net version of ASP which couldn’t begin to compare from a language perspective. Raw JavaScript still ruled the browser front end.
Sybase SQL had been purchased by Microsoft and we were now on MS SQL 2000. MS SQL contained a built in ETL tool in the form of DTS (Data Transformation Services) which was very powerful in its day but rudimentary having really been meant as a tool to get data into an SQL database.
“Reading” Excel workbooks
The “Asset Refresh” project was a bank initiative to replace all of the banks older servers and desktop computers with brand new units. The Asset management team was working very closely with IBM sales staff to maintain and provide weekly updates to an internal catalog of replacement equipment for staff who had been selected for new equipment, to select from.
The first challenge was that the Asset Management team was making the catalog available to staff via a Lotus Notes database while IBM was providing their catalog updates via an Excel spreadsheet and confessed they didn’t have access to any means to transform the data for the Asset team even though the Excel spreadsheet had itself been compiled though IBM internal automation! Put simply, this often meant Asset Management staff had to spend hours and occasionally days manually transposing data from the IBM spreadsheet into the banks Lotus Notes database.
I was eventually asked to explore ways to automate this data transposition.
The spreadsheet was a multi-sheet workbook with the main sheet outlining products and subsequent sheets dedicated to pricing of that single product and all of its components and accessories. Needless to say the number of sheets in the workbook were dictated by the number of products being offered in the catalog that week.
After a couple days and playing with simplified proof of concepts, I called a meeting to establish routes to the solution. Would IBM be willing to alter the layout of their spreadsheet to make it easier to code an ETL for or would I have to code an Excel workbook reader that would be able to seek out and identify specific verbiage on various worksheets in unpredictable locations? After all, these workbooks were designed for human consumption, not to be read by an ETL. Needless to say, IBM was unable so I got to work.
After about 9 days, I had completed an engine that would open an Excel workbook, read the product list, search out the associated product specific price and accessory sheet, identify which items on the sheets were items and capture their names and prices while skipping over titles and headings. It was an enormous success and did the job so fast that the person who used to do the data entry spent a couple of months wasting a third of the same time he used to waste on manual entries, verifying that the import was successful. He only ever found a single error which resulted from a typo in the IBM spreadsheet.
Dynamic On demand Reporting
The Asset Management team needed to build regular reports for various entities within the bank. To simplify the task, Diane, the manager of the Asset Management team, would ask me for custom views within her Lotus Notes database. Add to this Diane never wanted to delete old indexes. The challenge here isn’t with the Lotus Notes tool specifically, it’s with the technology of indexes.
An index is a very simple technique used to speed access to specific records in any data store, be that flat or RDBMS. The most common use of an index is to represent the sort order of records in your data store. Instead of showing you the record list in its raw form, the data engine consults the index which will have the record numbers listed in the order they need to be displayed in and the engine will then list them in that order.
If your tool gives you access to numerous sort orders (i.e., numerous indexes), that means your tool could be loading numerous indexes all at the same time! If there’s a lot of data to be indexed this could kill your database access. Diane’s database had 93,000 records in her flat Lotus Notes database and her performance was being hammered! I saw this problem building and started to work testing a solution.
What started out as a simple proof of concept for the approach ended up being completed in 10 days as a 3 page website based on Diane’s database that completely resolved her issue.
The first page offered a list of user friendly field names. Neither Diane nor her team would understand the programmer short names that had been used for the database fields so I needed to provide a translation to something more user friendly. So I came up with a profile document that took a list of field names the users were allowed to search by and a corresponding list of user friendly names to present to the user. Adding new fields to the reporting system or removing them was as easy as editing the profile record.
With the mechanism in place, users were presented with a list of user friendly field names and a blank list. They simply double clicked the field names they wanted and those names would be moved to the blank list. The first name in their “Selected names” list was designated as the sort column and they could actively move fields back and forth until they were happy with their list of selected fields.
The next page was a search criteria page and was modeled after my very old Dbase “Search by example” screen from the late 1980’s. The page was filled with the field names selected by the user and a blank field next to it. In the blank field the user could type anything they felt would help narrow the results based on the search value in the specific field. For example, they could enter part of a person’s last name in the last name field and an area code in the phone number field and the result set they would get back would be limited to results that matches in those specific fields only.
The fields search screen also included selection users could use to specify if a column should act as a totals column or the basis for some form of categorization.
The final page was simply for output selection. The user could choose to have the report displayed in MS Word, MS Excel or as a standard web page.
The reporting system worked exceptionally well. It was slower to build the more complex reports but all reports were custom and it completely alleviated the need for numerous custom indexes within the database itself thus restoring performance to the 93,000 record database.