Summary:
Since the original article was written, there have been many changes and improvements to the J language. While the mathematically oriented users no doubt drove these, many of the changes have not only simplified the code, but also considerably sped up the operations typical of commercial processing. As a result I have updated the original article with discussion of these changes.
J, like its precursor APL, is used in business for financial and mathematical modeling, analysis of large databases, engineering, actuarial work, forecasting, operational research studies, and other applications where a knowledgeable person is mostly employing it in their own work or as a teaching and communications tool. Quite often those employing it are very reticent about how they use it, or just exactly what the application consists of. They view it as conferring a definite edge, a competitive advantage in their business that they would prefer others not be aware of. When competing for fixed-price applications development contracts, others are equally disinclined to broadcast the fact that they employ such languages to gain a speed and cost advantage. All this makes it rather difficult for an IT or business manager to evaluate whether J is applicable to their own business.
The J language is a powerful interpretive language with a concise syntax. Its very power and conciseness make it difficult for a novice to master, but also make it of interest to a manager because little code and effort is required to achieve results. This article discusses the features of J that make it applicable to typical commercial processing tasks. It provides simple code examples to illustrate for a manager how little code may be required to implement powerful data manipulation capabilities.
Hopefully the article will be of value to those evaluating high-level languages suitable for commercial processing in a distributed, client/server or mainframe environment.
The J language is an interpreted language, like others such as Java and Visual Basic. The interpreter is written in C so as to be portable to multiple operating systems. It currently runs under the various flavors of Windows, Mac, and Sun OS, and various versions of Unix and Linux. In hardware it runs on Intel's X86 architecture and equivalents, as well as the Power PC, Alpha and all the way down in scale to ARM in a pocket PC running CE. There is a windows and forms capability. On the PC, it can interface with DDE, OLE, Java, ODBC, and so on, so as to permit its ready integration into environments using multiple tools, languages and data structures. Integration can also be in the other direction, a J engine being called from more traditional applications.
Run-time only versions of applications software can be distributed to multiple systems with no per-seat charge. This can give developed applications a definite cost-benefit advantage over purchased ones when many users are involved. The base developer's edition with support is priced for the PC market rather than mainframes, which also means a relatively low cost. To support evaluation of the product, the full version is available for download over the Internet so that an evaluation can be performed at the cost only of time.
If maximum advantage is taken of code reusability and object oriented design, use of the language in commercial applications removes code development and maintenance as a constraint in the IT function. The conciseness, reusability, data independence, modularity, and ease of extension with a problem specific syntax mean very little code is needed to run a business. In many ways, J, or rather, tools built with J, should be part of every analyst’s tool kit, since most programming is potentially bypassed in the application development process. At most, extensive commercial development in J needs a small (1 or 3 or 5) member core of J experts supporting a much larger complement of analysts and application developers.
While not all analysts or programmers are adept at the type of thinking required to master J, this is compensated for by the fact that its use greatly reduces the numbers needed. As stated by one programmer who must have preferred typing to thinking, "it's OK if you want a quick solution". Right. But I have yet to encounter a manager who wants a slow solution.
If for some reason J is not wanted as the "main" language, it (or better still, tools created with it) can be used for Rapid Applications Development (RAD), with several design iterations in production use before the code is frozen and treated as the system specification for another, more "acceptable", language. Of course, given the backlog of applications usually needed by the organization, one might wonder why resources are available to convert a perfectly satisfactory implementation, rather than moving on to the next problem.
The performance of today's hardware is such that one should expect what might be termed Real Time Commercial (RTC) processing instead of long batch processes. By employing larger main memory and the high-speed disk transfer now possible for contiguously located data, virtually every process should be essentially instant - instant in the context of the expectation. Payroll for 1000 employees in the time it takes to lift a coffee cup after hitting the enter key. One thousand invoice images or EDI transactions created in about the same time. Fortunately the J language provides a number of features to make this possible. Besides its array processing capabilities, there is a built in means of measuring the memory usage, elapsed time, and processing speed (CPU usage) of individual modules to simplify performance tuning. It also simplifies the evaluation of programmer competence, since lousy code is easy to identify.
During development it is a relatively trivial matter to measure these performance parameters for any or all modules. Because it is interpreted, each piece of code is executable on line from the developer’s keyboard and screen. Each module can be tested, optimized to minimize or maximize a given parameter, or set of parameters, and then incorporated into another module. The whole application can be similarly performance tested with minimal effort. All of this allows development of a high performance design, or easy analysis of an existing one that does not seem to be performing as anticipated.
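As a rough sketch of what such a measurement can look like, the lines below time one sentence and report the space it needs. The data (one million made-up sales values in 100 groups) and the names group and sales are invented for illustration. timex and timespacex are utility verbs from the standard library built on the 6!:2 and 7!:2 foreigns. No results are shown because they depend on the machine.

```j
NB. Made-up test data: one million sales values spread over 100 groups
group =: ?. 1e6 $ 100
sales =: ?. 1e6 $ 1000

timex 'group +//. sales'        NB. elapsed seconds for one execution
10 timex 'group +//. sales'     NB. average seconds over 10 executions
7!:2 'group +//. sales'         NB. bytes of memory needed to execute the sentence
timespacex 'group +//. sales'   NB. both figures at once: seconds, then bytes
```

Because the sentence is passed as a character string, any module or phrase can be measured this way without altering it.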
The high-speed performance objective is sometimes denigrated as unnecessary. If Material Requirements Planning for a plant runs once a week and takes several hours, does it really matter? But it is only run once a week because it takes several hours. If it took one minute or less, it could be run any time, on demand. Then one can really manage a "Just In Time" (JIT) operation. Speed, in other words, is desirable in commercial processing, and any technology that makes it easier to achieve is also desirable.
Besides the better management that can be achieved with fast response, response time is also a factor in the cost of running the IT operation itself. Critical batch processes that take a long time to complete cost more in terms of check point/restart procedures, extra duplicated files, operators and business analysts on call or standby in case of failures, and so on. If no process takes more than 20 minutes, most of this extra cost disappears. If it drops to a few seconds or minutes, all such processes can become the responsibility of the owning department - with no IT involvement. Such a change has a major impact on cost and complexity. Use of J makes such a result possible.
One could argue that similar results can be achieved with a variety of other languages. This is true, but the array processing and easy measurement features of J make it relatively easy for a small group of developers to provide such benefits to an organization.
One of the major problems that IT management has in overseeing an applications development effort is the amount of detail involved in the code. When it requires pages and pages of code for even simple data manipulation, few (very few) IT managers will bother to look at it, let alone evaluate the code. Unfortunately, "God is in the details". And the "got you" is there as well. Even the program structure and problem solving approach are difficult to infer from the mass of code needed for most programming languages.
With a high level language such as J, very few modules require even as much as half a page. Indeed, J suffers from the opposite problem, that a single line of symbols can represent the whole of a critical code module. While few managers may wish to decipher such code, it is easily tested, demonstrable as to performance, and understandable as to function within a visible structure of a few pages. Because it is so much more compact than C, COBOL, Visual Basic, or Java, the total application can often be presented in a few pages along with "where used" lists to show which modules are invoked where, the calling sequence hierarchy, and so on.
In other words, while the language may appear indecipherable to the casual reader, a manager can readily infer the total picture, get the gestalt view, while the details can also be demonstrated and tested on line, with little effort. All that is required is meaningful names for the verbs used to process the data and some simple conventions as to structure and naming for reasonable readability.
Another advantage of J as a development language is the ease of using either a top down or bottom up design approach, or of mixing the two. Functions operate on arguments passed as global values or directly, and results are returned similarly. One can therefore create a whole structure with dummy functions that pass predetermined results, or create specific lower level modules to test out approaches and performance.
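The top down style might be sketched as follows, assuming a hypothetical payroll structure. The names calcTax and netPay and the flat 20% rate are invented for the example. The higher level verb is written first against a dummy that returns a predetermined result, and the dummy is replaced later without touching the caller.

```j
NB. Dummy lower level module: returns a predetermined 0 for every employee
calcTax =: 0"0

NB. Higher level module written against the dummy: pay less tax
netPay =: ] - calcTax

netPay 620 836 855 600      NB. 620 836 855 600 while the dummy is in place

NB. Later the dummy is replaced (here by an assumed flat 20% rate)
calcTax =: 0.2 * ]

netPay 620 836 855 600      NB. 496 668.8 684 480, with netPay itself unchanged
```

Because netPay refers to calcTax by name, the new definition takes effect the next time netPay is run.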
Critical modules can be assembled in a simplistic manner just to have something working, and then passed to the more expert for rewriting in a more compact or faster form. Any team working on the design and implementation need not all have the same expertise, but best of all, it can be small. The smallness of the code is mirrored in the smallness of the required development team. This reduces the communication needs and further expedites the implementation process. While the team size can often be one person, continuity of support and better design choices would indicate more than one person for most critical development.
What is it that makes J suitable for commercial processing? What built in features of the language give it its relevance? It is somewhat difficult to provide a management overview of why program code is suitable without getting into too much detail for the reader. Nevertheless, let us try.
Suppose the requirement is to determine the sales dollars by numbered sales Group. Such a requirement is representative of many commercial applications. The data might consist of 2 lists: a list of sales Group identification numbers and a list of the related sales dollar values. The result desired is the list of unique Group numbers and associated total sales dollars. The calculation means sorting or otherwise manipulating the data to arrange the lists, determining how many distinct Groups exist, adding up the values belonging to each, and returning two lists. This is not a completely trivial calculation, and would require quite a number of lines of code in most languages.
In J, it requires a few symbols, namely " +//. " to add the sales values by Group number, " ~. " to determine the unique Group numbers, and " ,. " to convert the 2 lists into an n x 2 table.
There is no need to determine the size of the lists being handled, allocate memory space, manage any stepping through the values, etc. It does not matter if there are 10 values or a million.
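A minimal sketch of these three pieces working together, using two small made-up lists named group and sales for illustration:

```j
group =: 3 1 3 2 1          NB. sales Group number for each sale
sales =: 100 50 25 10 5     NB. sales dollars for each sale

~. group                    NB. unique Group numbers:  3 1 2
group +//. sales            NB. total sales per Group: 125 55 10

(~. group) ,. group +//. sales
NB. the n x 2 table:
NB. 3 125
NB. 1  55
NB. 2  10
```

The Groups appear in order of first occurrence. The same three lines apply unchanged to lists of any length.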
To the uninitiated, this "code" in J may look like a sequence of meaningless symbols. To a manager, the important insight is the very small amount required to achieve a significant result. If other data manipulation is similarly simple, why should development require such large resources?
Consider another typical case, the need to fill in a template with data from a record. This might be to create a customer statement or invoice to be mailed, an HTML form to be displayed, or some other such form. Replacing specific character locations on the template by character data from the record can do this. Where the "_" represents the characters to be replaced, code something like the following does the job.
into =: (# i.@#)@:=&'_'@:] }"1
That was the code needed when the original article was written. Now an even shorter equivalent is:
into =: I.@:=&'_'@:]}"1
This is simpler than before and incidentally, it is also faster and requires less space.
Suppose a thing called "Record" contains an ASCII version of the record data, with the same number of characters as there are "_" positions on the character template called "Form". Then the phrase:
Record into Form
yields the desired result of a filled-in form.
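For a concrete picture, here is a small sketch with an invented one-line template and a matching 9-character record. Both values are made up for the example.

```j
into =: I.@:=&'_'@:]}"1

Form   =: 'Name: _____ Due: ____'   NB. template with 5 + 4 positions to fill
Record =: 'Smith12.5'               NB. 9 characters of record data

Record into Form                    NB. Name: Smith Due: 12.5
```

The characters of the record are placed, in order, wherever an "_" appears on the template.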
While the sequence of symbols in the code may appear meaningless, such a reaction is no different than saying an unfamiliar language is meaningless to those unfamiliar with it. Each symbol or combination of symbols has a precisely defined function. The important result is the conciseness, and associated ease of development and testing.
Again, ignore the apparent jumble of symbols. Just note that only 22 were required to achieve the result and now it is only 15 plus the name given to it. The same capability in some other typical languages could require pages of code.
Suppose the Form is a statement from an electrical company, while the Record is the customer address and related electricity usage and charge data for the month. Then " Record into Form " can yield an image of the invoice to be mailed to the customer. More important, where Record is a suitably formatted file of 10000 customer records and Form is the single template for all invoices with embedded CR and FF codes, " Record into Form " yields all the images for 10000 invoices. It used the identical code and the only extra required was the 2 symbols "1 to make it applicable to 1 or many sets of data. The point is:
One should also note that the same small piece of code defined by " into " could be used with any number of record sets and corresponding form templates. There are no embedded sizes in the code which make it specific to a given record and form, no naming (in the code) of each of the fields of data to be transferred, and so on. The data sizing and naming requirements of more conventional languages are not present, which is what makes the language so applicable to the creation of data independent, re-usable code for commercial applications. In addition, the form content itself can be generated dynamically from dictionary data and some simple layout rules.
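The sketch below illustrates both points under stated assumptions. The "dictionary" is just two made-up lists (labels and widths), and the layout rule is a hypothetical one: each label is followed by one "_" per character of its field. The template is generated from that data and then filled from two records at once.

```j
into =: I.@:=&'_'@:]}"1

NB. Hypothetical dictionary data: a label and a field width per data element
labels =: 'Name';'Account';'Amount'
widths =: 12 6 9

NB. Assumed layout rule: label, then ': ', then one '_' per character of the field
field =: ': ' , #&'_'
lines =: labels ,&.> field&.> <"0 widths
Form  =: ; ,&LF&.> lines            NB. one template string, lines ended by LF

NB. Two fixed width records of 27 characters each (12 + 6 + 9)
rec1 =: (12 {. 'Smith, A.') , (6 {. 'A1001') , _9 {. '125.50'
rec2 =: (12 {. 'Jones, B.') , (6 {. 'B2002') , _9 {. '40.00'
Records =: rec1 ,: rec2

Records into Form                   NB. two filled forms, one per record
```

Changing a width or adding a label alters the generated template, while into itself is untouched.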
A further advantage of this concise code is that code distribution over a network is fast. Typically, applications such as data capture to be performed on client systems might require as little as a few thousand bytes of code stored as scripts in ASCII form. One can therefore place all applications code only on servers where it is more easily controlled and maintained, with only the relatively small and stable executable for the interpreter preloaded on the clients.
In looking at the total development process, the usual so-called "waterfall" leading from initial statement of requirements through design, reviews, coding, etc., to final testing and release, the coding part represents a small fraction of the total. Even reducing it to zero would not appear to have much effect on the total duration. However, such a conclusion ignores the reason why the process has become so long and complicated.
The reason is that it is much cheaper to fix design errors in the early stages, rather than after coding completes. Consequently the design process tries to ensure rigor in the early stages, complete specification of requirements, and full documentation and review at all phases. Because it is so time consuming and expensive, extra processes, project management techniques, code release controls, etc., must all be added to try (usually unsuccessfully) to prevent bad design, the release of errors in the code, bugs, and costly overruns. The whole process thus becomes even more expensive because of the risks inherent in an already expensive process. This is a self-defeating cycle.
Let us consider the impact of time and cost on the whole process. And let us think of it as production of a widget, not an application.
Think of the limiting case where production is free and instant. Would you set up a project management process and associated meetings for a product that could be produced by tomorrow and at no cost? Would you spend months on the design specification? Obviously not, although I have seen it done. You would talk to the builders and when they seemed to understand enough of the needs you would request a prototype to evaluate. No doubt the first one, instantly produced (well almost), would miss some performance or other attribute, but it does not matter since a new modified version is equally fast and cheap to produce; and so on, until the perfected unit is produced.
In other words, the whole cycle changes with rapid development. Costs and times are reduced in all the phases, not just the coding.
In actual use, and despite its power, J should usually not be used directly to implement commercial applications. Rather, J is used to implement a series of even higher-level tools, which are then used for the implementation. By this approach the tools can incorporate data, coding standards and design rules to meet the business needs. Via the common tools, these rules are then automatically incorporated in every application. Furthermore, future development is almost entirely with approved tools, which are already tested and in production use by other earlier applications. Once the tool set is created and in use, the risk of developing with untested tools is largely gone. Given the similarities between the various applications needed in a business, this is also a factor that contributes to reducing the needed code and hence the cost of maintenance.
A further advantage of the tools approach is that as the tools are upgraded (a new automatic form layout rule, a faster algorithm, or better auto formatting for reports or displays), all the applications receive the benefit while maintaining the common approach and standards.
While J includes an Application Programming Interface (API) for windows forms, which meets most business needs, it is also possible to use other tools such as Visual Basic for the forms, calling J code for the actual data processing or business rules. The approach is valid, unless one wishes to apply comprehensive data standards.
One advantage of Visual Basic for forms design is the advanced graphical user interface to assist in laying out a form, entering label names, using drag and drop to size data entry fields, locate buttons, and so on. A similar capability is available in J. While these features expedite manual design, they also allow each individual designer to adopt their own naming conventions, field sizes, and other data attributes that are better defined in a dictionary. In addition, if any changes are needed to a common data element that appears in many forms, it is necessary to find and manually change the code for the multiple form layouts incorporating that element.
An alternative, and far better approach, is NOT to write or create code to generate each form required - which usually adds up to hundreds of forms in all, but to produce code which dynamically creates the forms at run time following built in rules, plus the data attributes found in a dynamic dictionary. The "tools" (referred to above as the preferred development approach), use such an approach, and the interpretive nature of J makes it rather easy to achieve.
Because of the relatively low level API required by windows and the lack of any formal standards as to behavior of forms as a whole, the event driven windows interface requires much more code than is required to implement the basic data processing. Typically in J (even without adding the more verbose Visual Basic for the forms), adding a windows interface to a "command line" J version of a powerful query capability means at least 3-5 times more code.
Stated another way, if it takes 2 pages of code to provide some data processing capability, it probably needs another 10 pages to provide a windows interface to the process. This extra effort can be reduced by instead writing code which itself creates code for windows forms. A sort of generic Meta code created with just a little more effort than that for a single set of forms. The first one may cost more, but the extra cost of the windows interface is then spread over many subsequent applications and provides one common look and feel for all of them.
Typical commercial data processing consists of passing the data against the code, one record at a time.
It is an approach necessitated in the distant past by a punch card record and limited memory space, but still used with today's technology when even a PC can have 2GB of memory. The record-by-record approach requires navigation of large amounts of code for each record processed. In the case of an interpreted language without a JIT compiler, which allows retention of compiled code, it also can mean re-interpreting the code to be processed for every record. That means potentially thousands of bytes interpreted to process tens of bytes of each record.
The alternative approach is to pass code against data rather than data against code.
Now the code is interpreted and navigated once to process all the records read. In practice, virtually all commercial file processing can be restated as an array process of this type and consequently allows very fast processing with such languages as J, often much faster than the equivalent compiled languages doing one record at a time. Such array processing could also be implemented in some conventional languages, but they lack the many features in J that greatly simplify such coding.
Utilizing the array capabilities of J in commercial processing is pretty much a requirement for acceptable performance with large data files. This does mean rethinking the typical process and perhaps redesigning the typical data storage approach. While one can argue that a similar approach and corresponding benefit can be gained by using a compiled language for the array processing, there are few that are suitable. In addition, the size of the arrays processed can vary each time, which means more complex programming and memory management – a requirement that is handled automatically by J.
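To make the contrast with record-by-record processing concrete, here is a small sketch of a gross pay calculation done on whole lists at once. The figures, the names, and the overtime rule (time and a half beyond 40 hours) are assumptions chosen for the example.

```j
hours =: 40 38 45 20        NB. hours worked, one value per employee
rate  =: 15.5 22 18 30      NB. hourly rate, one value per employee

ot    =: 0 >. hours - 40    NB. overtime hours:          0 0 5 0
gross =: (rate * hours) + 0.5 * rate * ot
gross                       NB. 620 836 855 600
+/ gross                    NB. total payroll: 2911
```

There is no loop and no record count in the code. The same sentences serve 4 employees or 4000, with the interpreter reading each sentence once.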
Another feature of the J language is ease of code reuse. Frequently used pieces of code can be made available as utilities to be included in other code as needed. The user can create these common pieces and the language developers also provide a large number of the most commonly used ones. The distribution includes many scripts for specific kinds of data manipulation such as string manipulation, file handling, or inter process communications.
These scripts distributed with the software cover everything from a simple command, which deletes extra blanks in a string, (as might be used in removing extra spaces from an Electronic Data Interchange transaction), to complex numeric formatting or communications support. Other scripts support graphics, actuarial calculations, file access, statistics, Microsoft’s OLE, ODBC, sockets, window forms, form grids, and much more. The developer is thus assisted by the availability of many reusable modules as well as by a set of idioms or phrases – short groups of code symbols that perform typical processes on data. The modules also provide a template or guide for the development of new modules, and they help in explaining use of the various features of the language. The supplied scripts are often generic, so as to apply to a broad range of processes or multidimensional data. In a specific application with more restricted needs, a simpler, faster implementation may suffice and can be easily created by modifying the examples given.
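Two of the simplest string utilities can be tried directly, as sketched below. deb (delete extra blanks) and dltb (delete leading and trailing blanks) are standard library verbs. The sample strings, including the EDI-like segment, are invented.

```j
deb '  too   many    blanks  '      NB. too many blanks
deb 'N1*ST*ACME    SUPPLY   CO'     NB. N1*ST*ACME SUPPLY CO
dltb '   keep  inner  blanks   '    NB. keep  inner  blanks
```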
J does not have a built-in data dictionary. It is made available as a language, not a commercial applications development environment. However, adding a dynamic data dictionary for commercial applications development is not difficult. Treating the Meta data of a dictionary as just another piece of reference data means that creating a dictionary is similar to creating a first small application. Once done, it becomes the repository for the data element attributes, as well as the attributes of the records, tables, transactions and forms, the validation rules, business rules and algorithms.
As a "dynamic" dictionary accessed at run time or transaction time, it obviates the need to embed data definitions in the code. While this can involve a slight performance penalty at start up, it does mean that maintenance associated with data changes is confined to the dictionary and need not involve any code changes. In the long term this has a major impact on cost of ownership. Just think of how little impact the Year 2000 problem would have had if all applications had been built with a dynamic dictionary and data independent code!
The language comes with scripts to implement various types of file access. The tools developers can extend these to support inverted file structures, transaction storage, code storage, local caching of reference data, WORM data access, and so on - file structures and access techniques specifically optimized for the type of processing, business needs and geographic distribution of the organization. In addition, the language supports ODBC for access to SQL compliant databases, with tools to retrieve multiple records or fields from records in the array form best suited to subsequent processing in J.
The ease of use and performance of the J specific files, including a capability called “mapped files”, confers such an advantage over other systems that all the organization's needs can likely be met from the J environment. Use of, and access to, the other file types in more traditional databases would likely be needed only if they must maintain access to legacy applications that were not yet supplanted.
This rather esoteric language, J, offers many advantages to an organization and its IT department for commercial data processing. The power of primitive operators in the language itself, the conciseness of presentation, the ease of code distribution in the form of small scripts, the speed of development and testing, the integration into the windows environment, and so on, are all capabilities one looks for in an applications development language. The major drawback to its ready adoption is one of perception. It lacks the hype applied to Java, the marketing muscle of a giant corporation supporting Visual Basic, or the name recognition held by SQL. Not being recognized in the IT mainstream, it also lacks what I call the "resume" factor - it is not a tool the aspiring analyst/programmer feels they need, or even wants to have show up on their own resume. As such, it has difficulty even getting evaluated, let alone used. The IT manager who may know about it is also somewhat ambivalent. Even if they recognize its utility, do they want to introduce an approach that could potentially reduce the size of their department? Would they be rewarded or penalized for increased productivity?