Object Oriented Language Design

by Josh Levenberg

Definitions and Preliminaries

There is a body of work defining exactly what an object is. I will make a few definitions here, though I may have to refine them later. For now, let me say that objects are like records or structures in that they provide storage for data, organized into ‘member variables.’ Objects are also associated with functions for operating on that data, called ‘member functions.’ The information specifying the interface for interacting with an object is called a ‘type,’ which usually includes the layout and types of the member variables as well as the names, parameters, and return types of the member functions, though not the implementations of those functions.

A type A is a ‘subtype’ of another type B if A satisfies all the requirements that B promises: A provides member functions with the same names that meet certain criteria, and so on. Note that subtyping adds requirements to the type; for example, the subtype may specify more member functions. We will occasionally make the distinction between the ‘static type,’ the type known at compile-time, and the ‘dynamic type’ or ‘true type,’ the actual type of the object at run-time.

Finally, a ‘class’ is a type plus code which implements all the member functions. A class D may be ‘subclassed’ from a class C (so D ‘extends’ C, or D is C’s ‘child,’ or C is D’s ‘parent’), which gives class D the implementations of functions from class C, though D may then ‘override’ some of those functions, which means replacing C’s implementations with ones specific to D. Note that there are a few object oriented languages without classes, and in many others subtyping is tied to subclassing. Also note that terminology varies somewhat, especially when discussing specific languages rather than the literature on general object oriented language design. In particular, sometimes the term ‘class’ includes ‘abstract classes,’ which do not have implementations for all member functions.
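As a rough illustration of these definitions, here is a minimal C++ sketch. The names Animal, Dog, Puppy, describe, and demo are hypothetical and chosen only for this example. C++ is also one of the languages in which subtyping is tied to subclassing, so the two ideas are not cleanly separated here.

```cpp
#include <cassert>
#include <string>

// All names below are hypothetical illustrations.

// An abstract class: name() is declared but has no implementation here.
class Animal {
public:
    virtual ~Animal() = default;
    virtual std::string name() const = 0;
    virtual std::string greet() const { return "I am " + name(); }
};

// Dog is subclassed from Animal: it inherits greet() and implements name().
// Its type is a subtype of Animal's because it promises everything Animal
// does, plus one more member function, fetch().
class Dog : public Animal {
public:
    std::string name() const override { return "a dog"; }
    std::string fetch() const { return "fetching"; }
};

// Puppy overrides greet(), replacing the implementation it would inherit.
class Puppy : public Dog {
public:
    std::string greet() const override { return "Yip! " + Dog::greet(); }
};

// The static type of `a` is Animal; its dynamic type is decided by the caller.
std::string describe(const Animal& a) { return a.greet(); }

// Usage sketch: the same call yields different results per dynamic type.
void demo() {
    Dog d;
    Puppy p;
    assert(describe(d) == "I am a dog");
    assert(describe(p) == "Yip! I am a dog");
}
```

Inside describe, only the member functions promised by Animal may be called, even when the object passed in is really a Dog or a Puppy.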

I will declare classes using the terminology above, rather than a specific pseudo code. I will generally stick to the following format:

(‘class’ | ‘abstract class’) <name of class> [‘extends’ <name of class or list of classes>] ‘has’
( ‘member function’ <name> ‘(’ <parameters> ‘)’ ‘returning’ <return type> [‘with no implementation’]
| ‘member variable’ <name> ‘of type’ <type>
| ‘defines an implementation for’ <member function list>
)*

Note again that it will be an ‘abstract class’ if it has member functions without implementations. I may add ‘and’s and commas to make the class declaration read more like English.
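To make the format concrete, the sketch below writes two declarations in it (as comments) and pairs each with one possible C++ rendering. The classes Shape and Square, their members, and the mapping of ‘number’ to double are assumptions made up for this illustration.

```cpp
#include <cassert>
#include <string>

// "abstract class Shape has member function area() returning number
//  with no implementation, and member function label() returning string"
class Shape {
public:
    virtual ~Shape() = default;
    virtual double area() const = 0;                       // no implementation
    virtual std::string label() const { return "shape"; }  // implemented here
};

// "class Square extends Shape has member variable side of type number,
//  and defines an implementation for area"
class Square : public Shape {
public:
    explicit Square(double s) : side(s) {}
    double side;                                           // member variable
    double area() const override { return side * side; }
};

// Usage sketch: Square supplies area() and inherits label() from Shape.
void demo() {
    Square sq(3.0);
    assert(sq.area() == 9.0);
    assert(sq.label() == "shape");
}
```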

There are many tradeoffs when designing an object oriented language. We will be most concerned with:

Speed: having fewer lookups or levels of indirection is generally faster. If many constraints can be established at compile-time, various techniques such as inlining of functions may be used to produce faster code. At some point, however, greater code or data size can slow the program down more than other considerations.

Error checking or safety: it is nicer to have mistakes caught at compile-time rather than at run-time. It is nicer to have mistakes caught at run-time rather than not at all. Run-time error checks have the disadvantages of introducing a speed hit and delaying the detection of bugs (possibly until after the code is shipped) compared to compile-time checks.

Expressiveness: this generally falls into two categories, the ability to express relationships and the ability to express functionality. Expressiveness of relationships is usually important in large projects, but can be a burden in smaller projects if the language requires it. Expressiveness of functionality is especially important in smaller projects, but is always desirable.

Ease of compilation: doing more work at compile-time is generally better than inconveniencing programmers using the language or producing inferior output. At some point, however, language complexity can be a barrier to compiler implementations being made or being improved.

We won’t be so concerned about syntax, though this may have a big impact on the users of the language as well as the compiler. More verbose syntax can make the language less ambiguous and therefore easier to parse, but may be less convenient for programmers. Some syntaxes may be more familiar to programmers, but may be less expressive or more ambiguous.

An example of an error checking or safety feature is a compiler that is able to ensure that the dynamic type of an object is always a subtype of the static type. Such languages are said to be ‘type sound.’
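The difference between compile-time checks, run-time checks, and no check at all can be sketched in C++, which offers all three for conversions between related types. The names Base, Derived, Other, as_derived, and demo are hypothetical. This is only an illustration of the tradeoff, not a claim about how any particular language achieves type soundness.

```cpp
#include <cassert>

// Hypothetical classes for illustration only.
struct Base { virtual ~Base() = default; };
struct Derived : Base { int value = 7; };
struct Other : Base {};

// Run-time check: dynamic_cast yields a null pointer when the dynamic type
// of *b is not Derived (or a class derived from it).
const Derived* as_derived(const Base* b) {
    return dynamic_cast<const Derived*>(b);
}

void demo() {
    Derived d;
    Other o;
    const Base* b1 = &d;  // static type Base, dynamic type Derived
    const Base* b2 = &o;  // static type Base, dynamic type Other

    assert(as_derived(b1) != nullptr);
    assert(as_derived(b1)->value == 7);
    assert(as_derived(b2) == nullptr);  // mistake caught at run-time

    // Compile-time check: the next line is rejected by the compiler,
    // because Other is not a subtype of Derived.
    //   Derived* bad = &o;

    // No check: static_cast<const Derived*>(b2) compiles, but the behavior
    // is undefined because the object is not actually a Derived.
}
```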

I will make the distinction between objects and references to objects. This distinction is explicit in some languages, C++ for example. In fact, in C++ there are two mechanisms for referring to an object: references and pointers. Java, on the other hand, only allows you to declare variables which are references to objects (not counting the primitive types, which are not objects). Unlike C++ references, however, Java’s references may be ‘reseated,’ that is, made to refer to a different object. Java does not allow pointer arithmetic, though. Some reference mechanism is usually needed to refer to an object whose dynamic type is not known at compile-time.
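A short C++ sketch of the two mechanisms may help. The struct Counter and the function demo are hypothetical names for this example. It shows that a pointer can be reseated, while assigning through a reference copies into the object the reference was originally bound to.

```cpp
#include <cassert>

// Hypothetical type for illustration only.
struct Counter { int count = 0; };

void demo() {
    Counter a;
    Counter b;
    b.count = 5;

    Counter* p = &a;       // p refers to a
    p = &b;                // reseated: p now refers to b
    assert(p->count == 5);

    Counter& r = a;        // r is bound to a for its whole lifetime
    r = b;                 // not a reseat: copies b's value into a
    assert(&r == &a);      // r still refers to a
    assert(a.count == 5);

    b.count = 9;           // changing b afterwards does not affect r
    assert(r.count == 5);
}
```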

Object oriented languages usually implement some form of ‘dynamic dispatch’ for when the type of an object includes a member function but the implementation of that member function is not known at compile-time. Typically, objects include, or point to, a ‘virtual table’ which provides a mechanism for looking up an implementation (or function pointer) at run-time. The details of how the virtual table is stored, what information is needed to perform the lookup, and what information is provided as a result vary from language to language.
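One simple way such a table could work is sketched below by hand, without using the language's built-in virtual functions. Every name here (Figure, VTable, make_rect, make_triangle, area, name) is hypothetical. The layout shown, one shared table per class with a pointer to it stored in each object, is an assumption for the example and not a description of what any particular compiler does.

```cpp
#include <cassert>
#include <string>

struct Figure;  // forward declaration so VTable can mention it

// One table of function pointers per "class".
struct VTable {
    double (*area)(const Figure*);
    const char* (*name)(const Figure*);
};

// Each object stores a pointer to the table of its dynamic type.
struct Figure {
    const VTable* vtable;
    double a;  // member variables; their meaning depends on the class
    double b;
};

// Implementations for the "rectangle" class.
double rect_area(const Figure* f) { return f->a * f->b; }
const char* rect_name(const Figure*) { return "rectangle"; }

// Implementations for the "triangle" class.
double tri_area(const Figure* f) { return 0.5 * f->a * f->b; }
const char* tri_name(const Figure*) { return "triangle"; }

const VTable rect_vtable = { rect_area, rect_name };
const VTable tri_vtable = { tri_area, tri_name };

Figure make_rect(double w, double h) { return Figure{ &rect_vtable, w, h }; }
Figure make_triangle(double base, double height) {
    return Figure{ &tri_vtable, base, height };
}

// Dynamic dispatch: look up the implementation at run-time, then call it.
double area(const Figure* f) { return f->vtable->area(f); }
const char* name(const Figure* f) { return f->vtable->name(f); }
```

A caller holding only a Figure pointer can write area(&fig) without knowing which implementation will run. The cost is an extra load and an indirect call compared with calling rect_area directly.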