Josh’s Object Oriented Scripting Language

(Maybe it should be called ‘JOOS’, pronounced ‘juice’.)

Or:

Josh’s Object ScripTing LanguagE

(and called ‘JOSTLE’)

Chapter 1. Resources

In the beginning there was the resource.

When developing games these days, the largest part of the application is the assets. Assets include art, sound, maps, and all the other data which is created before the game is shipped, as opposed to the data which is created as the game is played. Assets are never changed while the program executes; they could be left on the CD (though they may be copied to the hard disk, since CD access is slow). Assets need not all be in memory at the same time; in fact, it is quite likely that the assets won’t even fit in memory. Instead, assets are loaded from disk and then cached. Since assets are immutable, those which are not being referenced may be safely discarded when memory gets tight. This should be a big win over virtual memory systems, which have to write modified memory back to disk before reusing it. For the same reason, assets don’t need to be saved with saved games.

Actually, I should be more precise. Assets really are all the data (‘content’) produced by the artists, level designers, writers, and sound guys. Assets rightfully deserve version control, mechanisms for delegating work, and a bunch of other tools for asset management. Think of resources as the end product which gets included with the game. My game framework has a module called the resource manager for accessing those resources. The resource manager assumes that all the assets have been compiled into resource files using game-specific formats.

The resource file begins with an index which maps a numeric resource id to a position and length in the file. Actually, it stores a little more to support compressed resources, but that is transparent to anything outside of the resource manager and resource compiler. While I’m mentioning technicalities, the index may also indicate that the resource is in another file (‘linked’ rather than ‘embedded’, which is useful during development). The resource id encodes the type of the resource and the module it was compiled in, as well as a number to make the id unique (well, unique with a very minor caveat). Storing which module the resource was compiled from helps reduce dependencies between modules: the compiler need only ensure the id is unique within the module to guarantee global uniqueness. The resource manager reads in this index when the resource file is opened; the rest is loaded on demand.
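The text does not specify how the type, module, and unique number are packed into a resource id, or exactly what an index record holds. Purely as an illustration, the sketch below assumes a 32-bit id with the high bit clear, 7 bits of type, 8 bits of module, and 16 bits of per-module sequence number; `MakeResId`, `IndexEntry`, and its fields are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout (not specified in the design):
//   bit 31     : always 0 for resource ids
//   bits 24-30 : type id (7 bits)
//   bits 16-23 : module that compiled the resource (8 bits)
//   bits 0-15  : sequence number, unique only within its module
using ResId = std::uint32_t;

constexpr ResId MakeResId(std::uint32_t type, std::uint32_t module, std::uint32_t seq) {
    return ((type & 0x7Fu) << 24) | ((module & 0xFFu) << 16) | (seq & 0xFFFFu);
}

constexpr std::uint32_t ResType(ResId id)   { return (id >> 24) & 0x7Fu; }
constexpr std::uint32_t ResModule(ResId id) { return (id >> 16) & 0xFFu; }
constexpr std::uint32_t ResSeq(ResId id)    { return id & 0xFFFFu; }

// One hypothetical index record: where the (possibly compressed) data lives.
struct IndexEntry {
    ResId id;
    std::uint32_t offset;          // position of the data in the file
    std::uint32_t storedLength;    // bytes on disk
    std::uint32_t unpackedLength;  // equals storedLength when uncompressed
    bool linked;                   // true if the data lives in another file
};
```

Because the module number is part of the id, two modules can both hand out sequence number 1 without colliding, which is the dependency-reducing property described above.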

Every resource has an associated type given by a number called the ‘type id’. Classes can be made loadable as resources by:

1) Assigning the class a type id unique to that class with the member declaration:

enum { TypeID=1 };

2) Giving the class a member function for loading the data:

void Load(ResourceLoader &rl);

3) Having a default constructor

Note that there is no global list keeping track of which classes are assigned to particular ids. The id is only used as a type check for safety reasons. Also note that the ‘Load’ function takes an abstract ‘ResourceLoader’ base class as its only parameter. This hides the specifics of how the file is stored: text vs. binary, endian-ness, compression, etc.
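A minimal C++ sketch of a class meeting the three requirements. The interface of ResourceLoader is not given in the text, so `ReadInt` and `ReadString` are assumed member names, and `MemoryLoader` and `DialogText` are hypothetical classes invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Abstract loader; ReadInt and ReadString are assumed member names.
class ResourceLoader {
public:
    virtual ~ResourceLoader() = default;
    virtual std::int32_t ReadInt() = 0;
    virtual std::string ReadString() = 0;
};

// Hypothetical concrete loader that serves values already decoded in memory.
class MemoryLoader : public ResourceLoader {
public:
    MemoryLoader(std::vector<std::int32_t> ints, std::vector<std::string> strs)
        : ints_(std::move(ints)), strs_(std::move(strs)) {}
    std::int32_t ReadInt() override { return ints_.at(nextInt_++); }
    std::string ReadString() override { return strs_.at(nextStr_++); }
private:
    std::vector<std::int32_t> ints_;
    std::vector<std::string> strs_;
    std::size_t nextInt_ = 0;
    std::size_t nextStr_ = 0;
};

// Hypothetical resource class meeting the three requirements.
class DialogText {
public:
    enum { TypeID = 1 };                  // 1) type id unique to this class
    DialogText() = default;               // 3) default constructor
    void Load(ResourceLoader &rl) {       // 2) load through the abstract loader
        speaker_ = rl.ReadString();
        line_ = rl.ReadString();
        mood_ = rl.ReadInt();
    }
    const std::string &Speaker() const { return speaker_; }
    const std::string &Line() const { return line_; }
    std::int32_t Mood() const { return mood_; }
private:
    std::string speaker_;
    std::string line_;
    std::int32_t mood_ = 0;
};
```

`DialogText::Load` never learns whether the data came from text, binary, or compressed storage; swapping in a different loader requires no change to the class.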

In order to access a resource, you declare a variable of type Res<T>, where T is some class meeting the requirements listed above. The parameter to the constructor is the numeric resource id. The constructor of the Res<T> class then asks the resource manager if the resource has already been loaded.

1) If the resource was not loaded:

a) Res<T> asks the resource manager for a ResourceLoader appropriate for loading the data.

b) Res<T> then creates a ResWrap<T> (which holds an instance of T as well as a reference count).

c) Res<T> calls T’s Load function, passing it the ResourceLoader.

d) Res<T> adds the resulting object to the ResourceManager’s database under the id number.

2) If the resource is already loaded:

a) The reference count for the data is increased.

b) If the data was not already in use, it is removed from the ‘not in use’ list.

Once constructed, the Res<T> acts as a ‘smart pointer,’ emulating C++’s pointer semantics using operator overloading. Since resources are immutable, the access functions of Res<T> only return const pointers/references. This means Res<T> objects may be copied (which increments the reference count) and the data may be safely shared. When a Res<T> is destroyed by going out of scope, the reference count is decremented. If the reference count ever falls to 0, the data is moved to the ‘most recently used’ end of the ‘not in use’ list, so that the least recently released data is the first to be discarded when memory gets tight.
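The construction steps and smart-pointer behaviour above could be sketched roughly as follows. This is a simplified, single-threaded illustration, not the framework's actual code: the manager's member functions (`Find`, `Insert`, `Retain`, `Release`, `TrimIdle`), the global accessor `TheResourceManager`, the stub `FakeLoader` that stands in for reading a file, and the example resource type `Number` are all hypothetical, and the type-id check is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <memory>
#include <unordered_map>
#include <utility>

using ResId = std::uint32_t;

class ResourceLoader {                      // abstract: hides file format details
public:
    virtual ~ResourceLoader() = default;
    virtual std::int32_t ReadInt() = 0;
};

class FakeLoader : public ResourceLoader {   // hypothetical stub: yields the id itself
public:
    explicit FakeLoader(ResId id) : id_(id) {}
    std::int32_t ReadInt() override { return static_cast<std::int32_t>(id_); }
private:
    ResId id_;
};

struct ResWrapBase {                        // all the manager ever sees
    int refCount = 0;
    bool idle = false;                      // true while on the 'not in use' list
    std::list<ResId>::iterator idlePos;     // valid only while idle
    virtual ~ResWrapBase() = default;
};

template <class T>
struct ResWrap : ResWrapBase { T value; };

class ResourceManager {
public:
    ResWrapBase *Find(ResId id) {
        auto it = table_.find(id);
        return it == table_.end() ? nullptr : it->second.get();
    }
    void Insert(ResId id, std::unique_ptr<ResWrapBase> w) { table_[id] = std::move(w); }
    void Retain(ResWrapBase *w) {
        if (w->refCount++ == 0 && w->idle) {    // in use again: leave the idle list
            notInUse_.erase(w->idlePos);
            w->idle = false;
        }
    }
    void Release(ResId id, ResWrapBase *w) {
        if (--w->refCount == 0) {               // back of the list = most recently used
            w->idlePos = notInUse_.insert(notInUse_.end(), id);
            w->idle = true;
        }
    }
    void TrimIdle(std::size_t keep) {           // discard least recently used first
        while (notInUse_.size() > keep) {
            table_.erase(notInUse_.front());
            notInUse_.pop_front();
        }
    }
    int RefCount(ResId id) { ResWrapBase *w = Find(id); return w ? w->refCount : 0; }
    std::size_t IdleCount() const { return notInUse_.size(); }
    std::size_t LoadedCount() const { return table_.size(); }
private:
    std::unordered_map<ResId, std::unique_ptr<ResWrapBase>> table_;
    std::list<ResId> notInUse_;
};

ResourceManager &TheResourceManager() { static ResourceManager rm; return rm; }

template <class T>
class Res {
public:
    explicit Res(ResId id) : id_(id) {
        ResourceManager &rm = TheResourceManager();
        ResWrapBase *base = rm.Find(id);
        if (!base) {                                    // 1) not loaded yet
            FakeLoader loader(id);                      // a) obtain a loader
            auto w = std::make_unique<ResWrap<T>>();    // b) wrapper with refcount
            w->value.Load(loader);                      // c) T::Load, chosen at compile time
            base = w.get();
            rm.Insert(id, std::move(w));                // d) record under the id
        }
        wrap_ = static_cast<ResWrap<T> *>(base);
        rm.Retain(wrap_);                               // count this reference, leave idle list
    }
    Res(const Res &o) : id_(o.id_), wrap_(o.wrap_) { TheResourceManager().Retain(wrap_); }
    Res &operator=(const Res &) = delete;               // omitted for brevity
    ~Res() { TheResourceManager().Release(id_, wrap_); }
    const T &operator*() const { return wrap_->value; } // const access only
    const T *operator->() const { return &wrap_->value; }
private:
    ResId id_;
    ResWrap<T> *wrap_ = nullptr;
};

struct Number {                                         // hypothetical resource type
    enum { TypeID = 2 };
    std::int32_t n = 0;
    void Load(ResourceLoader &rl) { n = rl.ReadInt(); }
};
```

Note that `Load` is reached through the template parameter rather than a virtual call; the only virtual functions are the destructor of `ResWrapBase` and the loader interface.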

The system described above has virtually no dynamic dispatch. The only virtual functions are the destructors and the member functions of ResourceLoader. Instead, the compiler’s template mechanism figures out which ‘Load’ function to call. Note that the resource manager only manipulates pointers to ResWrapBase, from which ResWrap<T> is derived.

The resource manager also provides a preload mechanism which makes sure the data has been fetched from disk. This function is not aware of the type of the data so no ‘Load’ function is called if the object is not already in memory. This function should take advantage of nonblocking file reads if the platform supports it.

In the code, if you want to refer to a resource without forcing it to be loaded, store the numeric resource id rather than a Res<T>.

My asset system can deal with a few versioning issues which crop up. It supports loading multiple resource files and varying which files are loaded. The resource files that are loaded might depend on whether you want the strings in French, German, English, or Japanese. Some of the art depends upon screen resolution. This slightly relaxes the uniqueness of resource ids. Still, only a single version of a resource is available for a given resource id at runtime. This is a handy feature, but it is mostly transparent. I only mention it here for completeness, and because it raises the issue of managing these different versions at the resource creation stage.
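One plausible way to get ‘only a single version per id at runtime’ is to search the loaded resource files in priority order and take the first match. The text does not say how the resource manager resolves this, so the priority-order rule and the `ResourceFile`, `Location`, and `Resolve` names below are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

using ResId = std::uint32_t;

// Hypothetical: one opened resource file, reduced to its index.
struct ResourceFile {
    std::string name;
    std::map<ResId, std::uint32_t> index;  // id -> offset within this file
};

struct Location {
    std::string file;
    std::uint32_t offset;
};

// Files are searched in priority order (e.g. a French strings file before
// the base file); the first file that defines an id wins, so only one
// version of each id is visible at run time.
std::optional<Location> Resolve(const std::vector<ResourceFile> &files, ResId id) {
    for (const ResourceFile &f : files) {
        auto it = f.index.find(id);
        if (it != f.index.end()) return Location{f.name, it->second};
    }
    return std::nullopt;
}
```

Changing the language or screen resolution then amounts to changing which files appear in the list, and in what order.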

Chapter 2. The Object Model

Script files started out as just a description of what was supposed to be compiled into a resource file. Really they should have asset management features such as the ability to convert many different source file formats (.jpg, .gif, .tga, .png, etc. for images), detecting which files have been updated, etc.

The idea of the scripting language is to introduce another concept, called an object. Objects are going to contain all of the dynamic data of the script engine, but should be accessed in a way much like resources. So:

Objects have an id which uniquely identifies the object. You can think of an id as a segment selector.
The high bit of object ids is always 1, while the high bit of resource ids is always 0. Many primitive commands work with either; they check the high bit and do the right thing.
Objects have a type. Due to the dynamic nature of objects, the type is not embedded in the object id as it is for resources. However, there is a primitive function for getting the type associated with an id which works for both object ids and resource ids.
Objects have a data area consisting of a sequence of primitive types. All primitive types take four bytes. This data area may be resized.
To access the data area of an object, you need the object id and the offset into its data area.
The only special primitive type is an object reference. Part of the type information indicates which offsets in the object have object references. Note that if the reference is known at compile time to have a resource id, then it is optional for the type to indicate the object reference. This is most important for the ‘type’ resource, below, whose object references only refer to resource ids.
The other data types (int, float, bool, enum, ...) are not interpreted by the runtime engine at all.
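To make the object rules concrete, here is a small C++ model of an object heap. It assumes the high-bit convention above and, for resources, a hypothetical type field in bits 24-30 of the id. As a simplification, the reference flags live in each object instead of in its type, and an object's type is a small integer code rather than a type resid; `Heap`, `SetRef`, and `References` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Id = std::uint32_t;
constexpr Id kObjectBit = 0x80000000u;  // set for objects, clear for resources

inline bool IsObject(Id id) { return (id & kObjectBit) != 0; }

// Hypothetical: the type field of a resource id (bits 24-30 in this sketch).
inline std::uint32_t ResourceType(Id id) { return (id >> 24) & 0x7Fu; }

struct Object {
    std::uint32_t type = 0;
    std::vector<std::uint32_t> data;  // every primitive is one 4-byte slot
    std::vector<bool> isRef;          // simplification: kept per object here
};

class Heap {
public:
    Id New(std::uint32_t type, std::size_t slots) {
        Id id = kObjectBit | next_++;
        objects_[id].type = type;
        Resize(id, slots);
        return id;
    }
    // Works for both kinds of id: the high bit selects the behaviour.
    std::uint32_t TypeOf(Id id) const {
        return IsObject(id) ? objects_.at(id).type : ResourceType(id);
    }
    std::uint32_t Get(Id obj, std::size_t off) const { return objects_.at(obj).data.at(off); }
    void Set(Id obj, std::size_t off, std::uint32_t v) {
        Object &o = objects_.at(obj);
        o.data.at(off) = v;
        o.isRef.at(off) = false;              // plain data, never traced
    }
    void SetRef(Id obj, std::size_t off, Id target) {
        Object &o = objects_.at(obj);
        o.data.at(off) = target;
        o.isRef.at(off) = IsObject(target);   // resource ids need not be marked
    }
    void Resize(Id obj, std::size_t slots) {
        Object &o = objects_.at(obj);
        o.data.resize(slots, 0);
        o.isRef.resize(slots, false);
    }
    std::size_t Size(Id obj) const { return objects_.at(obj).data.size(); }
    // The object references a garbage collector would have to follow.
    std::vector<Id> References(Id obj) const {
        const Object &o = objects_.at(obj);
        std::vector<Id> out;
        for (std::size_t i = 0; i < o.data.size(); ++i)
            if (o.isRef[i]) out.push_back(o.data[i]);
        return out;
    }
private:
    std::unordered_map<Id, Object> objects_;
    Id next_ = 1;
};
```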

A consequence of this is that even a thread’s stack and global variables are objects, since they need to contain dynamic data. The stack object will use all of the object features above, including the ability to resize. The first few data members of the stack contain the current function resid, the current instruction pointer, a base pointer, and a bit array storing which stack items are object references. There will be one global data object for each compiled module, and it will have an object id that is fixed at compile time. For enumeration purposes, global functions are considered members of the global data object. Really, though, functions are only associated with objects by convention.
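A possible layout for the stack object is sketched below. The text names the header fields but not their order, and it does not describe what happens on a call, so saving the caller's function, instruction pointer, and base pointer as a three-slot frame is an assumption. The reference bit array is also kept beside the slots rather than inside them to keep the example short; `StackObject` and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical header layout; the design lists these fields but not their order.
enum StackHeader : std::size_t { kFunction = 0, kIP = 1, kBase = 2, kHeaderSlots = 3 };

class StackObject {
public:
    explicit StackObject(std::uint32_t function)
        : slots_(kHeaderSlots, 0), isRef_(kHeaderSlots, false) {
        slots_[kFunction] = function;  // a function resid, so never marked as a reference
        slots_[kBase] = kHeaderSlots;  // locals of the outermost frame start here
    }
    std::uint32_t Get(std::size_t slot) const { return slots_.at(slot); }
    void Push(std::uint32_t v, bool isObjectRef) {
        slots_.push_back(v);
        isRef_.push_back(isObjectRef);
    }
    std::uint32_t Pop() {
        assert(slots_.size() > slots_[kBase]);  // never pop into the caller's frame
        std::uint32_t v = slots_.back();
        slots_.pop_back();
        isRef_.pop_back();
        return v;
    }
    // Save the caller's state as a frame, then make the callee current.
    void Call(std::uint32_t function) {
        Push(slots_[kFunction], false);
        Push(slots_[kIP], false);
        Push(slots_[kBase], false);
        slots_[kFunction] = function;
        slots_[kIP] = 0;
        slots_[kBase] = static_cast<std::uint32_t>(slots_.size());
    }
    // Discard the callee's locals and restore the saved frame.
    void Return() {
        std::size_t base = slots_[kBase];
        assert(base >= kHeaderSlots + 3);       // there must be a saved frame
        std::uint32_t savedFunction = slots_[base - 3];
        std::uint32_t savedIP = slots_[base - 2];
        std::uint32_t savedBase = slots_[base - 1];
        slots_.resize(base - 3);
        isRef_.resize(base - 3);
        slots_[kFunction] = savedFunction;
        slots_[kIP] = savedIP;
        slots_[kBase] = savedBase;
    }
    // Object references on the stack: roots for a garbage collector.
    std::vector<std::uint32_t> Roots() const {
        std::vector<std::uint32_t> out;
        for (std::size_t i = 0; i < slots_.size(); ++i)
            if (isRef_[i]) out.push_back(slots_[i]);
        return out;
    }
private:
    std::vector<std::uint32_t> slots_;
    std::vector<bool> isRef_;
};
```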

Now the clever bit is that both types and functions are represented with resources, and so can be represented by their unique resource id. In order to make the language more uniform, a type resource is also associated with every type of resource (this is recursive, but it terminates: there is a single resource id representing the type of type resources, and it is its own type). The type of a resource is fixed, however, so certain optimizations are possible. Note also that some resources are opaque in that no interface is provided for accessing their internal structure; others (such as the type resource) overload such things as operator[] and can even overload the primitive instructions of the virtual machine.

Function resources contain their type (a resource id) and a sequence of instructions. The type of a function contains the return type, the number of parameters, the type of each parameter, and possibly a bunch of debug information. For uniformity, all instructions take some multiple of four bytes.

Type resources conceptually store a mapping of enums to integers, though some of those integers represent resource ids. These values are queried using the same instruction used to get the data at an offset of an object. The only difference is that the valid queries are not contiguous. Some enum subranges are reserved:

for ‘operator()’: describing the return type and parameters. This is the only range set in the type object for functions.
for objects: initialization data
the ‘parent type’

Otherwise the type typically returns the resid of a function given an enum. Some enums are predefined:

a mechanism for storing which data offsets correspond to object references
standard functions such as constructor, destructor, addref, decref, etc.
a mechanism for seeing which types are compatible with this type

It is also common to store constants (which may be overridden in subtypes), and other types (for example, the type of the iterator associated with this container, or the type of the value contained in this container). Another possibility is to store the data offsets of data members, but this capability is not currently provided. One could imagine specific cases where this would be useful, such as the fact that many dialogs have an ‘OK’ control.
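A type resource can be modelled as a sparse map from enum to a 4-byte value. The numeric enum ranges below are placeholders (the text reserves subranges without giving numbers), and `TypeRecord` and `Derive` are hypothetical. `Derive` copies the parent's table and then lets the subtype override entries, which is one reading of how a subtype maps the same enum to a function with the same prototype.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Placeholder enum ranges; the design reserves subranges but gives no numbers.
namespace slot {
constexpr std::uint32_t kCallBase    = 0x0000;  // operator(): return type, parameters
constexpr std::uint32_t kInitBase    = 0x0100;  // initialization data for objects
constexpr std::uint32_t kParent      = 0x0200;  // the 'parent type'
constexpr std::uint32_t kConstructor = 0x0300;  // standard functions
constexpr std::uint32_t kDestructor  = 0x0301;
constexpr std::uint32_t kUserBase    = 0x1000;  // compiler-assigned member enums
}

// A type resource: a sparse enum -> value table. Values are often resource
// ids (of functions, constants, or other types).
class TypeRecord {
public:
    explicit TypeRecord(std::uint32_t selfId) : selfId_(selfId) {}
    // A subtype starts with a copy of the parent's table, then overrides
    // entries with functions of the same prototype.
    TypeRecord Derive(std::uint32_t childId) const {
        TypeRecord child(childId);
        child.table_ = table_;
        child.table_[slot::kParent] = selfId_;
        return child;
    }
    void Set(std::uint32_t e, std::uint32_t value) { table_[e] = value; }
    // Same kind of query as reading offset e of an object, except that the
    // valid enums are not contiguous.
    std::optional<std::uint32_t> Get(std::uint32_t e) const {
        auto it = table_.find(e);
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::uint32_t selfId_;
    std::map<std::uint32_t, std::uint32_t> table_;
};
```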

Once you’ve queried the resid of a particular member function, the function may be called directly or its type may be queried. There is a primitive instruction for calling a member function given an enum and either an object id or a type id. If given an object id, it passes that object id as the first parameter to the function. Subtyping is accomplished by having the subtype map a given enum to a function with the same prototype as in the parent type.

Primitive Instructions mentioned so far:

Get/Set value at offset X of object Y (can’t set the value of resources)

Get/Set size of data area of object X (can’t resize resources)

Get type of object X (I may allow object type to be modified, but certainly not for resources)

Call function X (exactly equivalent to applying operator() to X)

Call member X of Y

Get stack object

For an object Y, the difference between

T=Type of object Y

Call member X of T

and

Call member X of Y

is that the second passes object Y as the first parameter to the function.
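The difference between the two calling forms can be illustrated with a toy dispatcher. Functions here are native callbacks rather than bytecode, and the `VM` struct and its members are hypothetical; the point is only that calling a member through an object id looks up the object's type and prepends the object id to the arguments, while calling through a type id does not.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

using Id = std::uint32_t;
constexpr Id kObjectBit = 0x80000000u;

// Hypothetical stand-in for a function resource: a native callback.
using Function = std::function<int(const std::vector<Id> &)>;

struct VM {
    std::map<Id, Function> functions;                 // function resid -> body
    std::map<Id, std::map<std::uint32_t, Id>> types;  // type resid -> enum -> function resid
    std::map<Id, Id> objectType;                      // object id -> type resid

    // "Call function X"
    int Call(Id fn, const std::vector<Id> &args) const { return functions.at(fn)(args); }

    // "Call member X of Y": Y may be an object id or a type id.
    int CallMember(std::uint32_t member, Id target, std::vector<Id> args) const {
        Id type = target;
        if (target & kObjectBit) {                    // object: look up its type...
            type = objectType.at(target);
            args.insert(args.begin(), target);        // ...and pass it as the first parameter
        }
        return Call(types.at(type).at(member), args);
    }
};
```

In the usage below, the callback simply reports how many arguments it received, which makes the implicit first parameter visible.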

Just to stretch your brain: due to the essential similarity between objects and resources, it is possible to make objects which behave very similarly to resources. Remember the runtime engine / virtual machine can overload the primitive instructions based on whether the high bit indicates it’s a resource or an object. The idea here is to allow string resources to be completely compatible (even down to the virtual machine instructions used) with dynamic strings. This extends to allowing objects with operator()’s to be passed as functions, and objects with a specific list of required functions (see above) can act as types. This is convenient at various points: parameters of type ‘function’ are sufficient in many cases where you would otherwise have to pass an object, the compiler may at times need to create type objects to handle template features not supported by the underlying type system, you could make a type object which delegates most of its functions to another type but provides different initialization in the default constructor, etc. The key is that the [] and () operators are fundamental: they parallel commands in the byte code, and making them work right makes almost everything else work right.

It is worth mentioning that this is the second script language. The first only had a concept of functions. Unfortunately, it did not have much heap management to speak of, and no chance of incorporating garbage collection. I was having a hard time building in support for dialog boxes since the language had no features to support a nice syntax without special case code. A design goal for the new scripting language is to not only support objects, but also a syntax which makes declaring a dialog box clean.

I should say exactly what type checking is done at run-time and what type checking is done at compile-time. The facilities for both are in place: dynamic casting with run-time checks is supported in addition to a parameterized type mechanism with compile-time error checks. The standard library will have a preference for compile-time checks, but can be used with run-time checks as well. In practice there are a lot of abstract classes followed by a layer of concrete classes. Any layers beyond that may require dynamic typing. However, for the safety of the virtual machine, some crucial checks are performed at run-time even though the compiler would never produce code which would cause a problem. If I become extremely ambitious, I’d consider proof-carrying code, but almost certainly not.

----- Left off here -----

Should still talk about

type resources
function resources
function type tells number of parameters, return type, type of each parameter, whether each parameter is in, out, or inout
resources should be type compatible with corresponding objects: if you apply the abstract machine instruction for accessing an array, you should get the same result applying it to a resource.

Chapter 3. Script Language Features

Inheritance

How compiler assigns enums to member functions: optional functions, extern functions, also optional/extern data members.

Restricted multiple inheritance

<rules>

Operator overloading

.get .set

Types (actually just those type references which may be resolved at compile-time) may be cast to a function which takes the same parameters as the constructor, returning a new instance of that type. It would be nice if I could make the beginning of the type record for a class look like the type record of this function.

clone (shallow copy) is a standard member function. Maybe also deepcopy?

Several standard member functions are automatically generated. Ex: clone, constructor glue, does_extend, next_obj_ref, (next_weak_ref,) reflection functions.

Strings ‘compatible’ with string resources, even with eventual ‘final inline’ support (see below)

Bool arrays, like strings, are stored in a space efficient manner. Operator overloading (with .get and .set) handles the translation to and from the compact form.
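The compact storage for bool arrays might look like the following C++ sketch, which packs 32 flags into each 4-byte slot. The proxy class `Ref` plays the role of the script's .get/.set overloads; `BoolArray` and its members are illustrative names, not part of the design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bools packed 32 per 4-byte slot.
class BoolArray {
public:
    explicit BoolArray(std::size_t n) : size_(n), words_((n + 31) / 32, 0) {}
    std::size_t Size() const { return size_; }
    std::size_t Slots() const { return words_.size(); }
    bool Get(std::size_t i) const {                       // the '.get' translation
        assert(i < size_);
        return (words_[i / 32] >> (i % 32)) & 1u;
    }
    void Set(std::size_t i, bool v) {                     // the '.set' translation
        assert(i < size_);
        std::uint32_t mask = std::uint32_t{1} << (i % 32);
        if (v) words_[i / 32] |= mask; else words_[i / 32] &= ~mask;
    }
    // Proxy so that a[i] reads through Get and a[i] = v writes through Set.
    class Ref {
    public:
        Ref(BoolArray &a, std::size_t i) : a_(a), i_(i) {}
        operator bool() const { return a_.Get(i_); }
        Ref &operator=(bool v) { a_.Set(i_, v); return *this; }
    private:
        BoolArray &a_;
        std::size_t i_;
    };
    Ref operator[](std::size_t i) { return Ref(*this, i); }
    bool operator[](std::size_t i) const { return Get(i); }
private:
    std::size_t size_;
    std::vector<std::uint32_t> words_;
};
```

A 40-element array occupies two slots instead of forty, while code indexing it still reads and writes ordinary bools.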

‘Importing member functions’: setting member functions to any compatible member function (requires the compiler to store which member functions are referred to by each member function, as well as the data layout each one references). This is needed to resolve member conflicts when using multiple inheritance.

Dynamically sized arrays

Automatic generation of wrappers around primitive types (when an object is expected)

Function objects: any object with an operator() may be passed as a function parameter.

The type type! IsClass primitive function. Future: Has a bounding parameterization (i.e. ‘types extending container<int>’).

Derivation of a class at variable declaration time.

Example:

// Standard Displayable with a title bar
class Window(string title) extends Displayable { ... }

// OKButton is like a button but:
// * sets itself as default
// * has text 'OK'
// * defaults to reasonable position in lower right
class OKButton() extends ButtonControl { ... }

class StatusMessage(string message="") extends Window("Status Update")
{
    override mBackgroundColor=rgba(1,1,1,1); // override default value
    // How do we set modal? mModal=true? something else?
    string mMessage=message; // new member
    class helperclass = Window; // define a type alias
    override OnCreate(object creator) { /* construction? */ }
    // need to make sure mMessage gets set soon enough
    StaticTextControl(message) { y=10; x=CenteredX(); }
    // OKButton's creator is 'StatusMessage'
    OKButton() { void OnPress() { global.log.add(creator.mMessage); } }
    ImageControl(0) mStatusImage { x=20; y=20; }
}

Localization support. Still need to work out some issues here. It would be best if we had a special tool to support this. What do we need besides alternate versions of strings and other resources?

‘inout’ and ‘out’ parameters of functions use a copy-in-copy-out protocol. This means less indirection, and it is easy to make compatible with ‘emulated variables’, i.e. functions with .get and .set modifiers. This protocol fails gracefully if, for example, no function is defined.
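A sketch of copy-in-copy-out for an ‘inout’ argument bound to an emulated variable, assuming such a variable is just a getter/setter pair (`Property` and `CallWithInOut` are hypothetical). Treating a missing getter as a default value and a missing setter as ‘skip the copy-out’ is one guess at what failing gracefully could mean here.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model of an emulated variable: a .get/.set pair.
struct Property {
    std::function<int()> get;
    std::function<void(int)> set;
};

// The callee works on a plain local copy: the value is read before the call
// and written back afterwards, so the callee never needs to know whether the
// argument was a real variable or a .get/.set pair.
template <class Body>
void CallWithInOut(Property &p, Body body) {
    int local = p.get ? p.get() : 0;  // copy in ('out' parameters need no getter)
    body(local);                      // callee sees an ordinary variable
    if (p.set) p.set(local);          // copy out, skipped if there is no setter
}
```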

Use smart pointers to maintain all references from the C++ code into scripts (primarily callbacks and tying of game objects to script objects). These will be root nodes for the garbage collector (in addition to global objects and stack objects inside the script engine).

Future: Reflection/introspection!

Future: ‘final inline’ functions (efficient, but less flexible). Normally this just prevents overrides, but to use resource strings and dynamic strings interchangeably, it requires stricter type compatibility.

Future: templates for the automatic generation of types, especially those that share an implementation. Implementationwise, containers either contain object references or another primitive type. Also: function objects, references, etc. Copy Generic Java! Need to decide on exact restrictions we can live with. Member templates? Funky recursiveness? Bridges?

Future: float support

Future: Unrealscript ‘state’ support via switching between types. ‘Mode-switching’ from ‘A Theory of Objects’. Restriction: only allowed to switch between parent/child with the same data format. Call optional member functions OnLeaveState(type to) and OnEnterState(type from).

Future: garbage collection. Note that an explicit delete operation is supported. Addresses are only reused if, at garbage collection time, the collector decides it needs to compact (it should only compact at garbage collection time, since that is when all references are known and may be renumbered). This makes it easy to detect when dangling references are used and handle them safely. Keeping addresses in age order makes it easy to keep track of generations and maintain a generational write barrier without extra per-object storage. Unfortunately, extra storage is still needed to get an incremental garbage collector.
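Because ids are handed out in increasing order, ‘older than’ reduces to ‘numerically smaller than’, and a single boundary id separates the old generation from the young one. The sketch below assumes object ids with the high bit set, as in Chapter 2, and uses a remembered set of (object, offset) pairs; that particular bookkeeping, and the `GenerationalBarrier` name, are assumptions rather than the planned design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>

using Id = std::uint32_t;
constexpr Id kObjectBit = 0x80000000u;

class GenerationalBarrier {
public:
    Id Allocate() { return next_++; }            // ids only ever increase
    // Everything below the boundary is old. Resource ids (high bit clear)
    // compare as old, so storing them never needs recording.
    bool IsOld(Id id) const { return id < boundary_; }
    // After a minor collection every survivor is old, so no old-to-young
    // references remain and the remembered set can be emptied.
    void Promote() { boundary_ = next_; remembered_.clear(); }
    // Write barrier: record stores of a young id into an old object so the
    // next minor collection can treat that slot as a root.
    void OnStore(Id holder, std::uint32_t offset, Id value) {
        if (IsOld(holder) && !IsOld(value)) remembered_.insert({holder, offset});
    }
    std::size_t RememberedCount() const { return remembered_.size(); }
private:
    Id next_ = kObjectBit | 1u;
    Id boundary_ = kObjectBit | 1u;
    std::set<std::pair<Id, std::uint32_t>> remembered_;
};
```

No per-object generation field is needed: the age test is a single comparison against the boundary.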

Thought: can I implement garbage collection in the language itself? Need hooks for write barrier, but there should be plenty of reflection/introspection information.

Future: weak references. May be able to get away with only having them in the "containedby<T>" base class. Should be easy to implement in the virtual machine, but need a little extra support for compaction. Unfortunately doesn’t solve the "dialogs with an array of dialog controls each which wants a back pointer to the dialog not the array" problem.

Future: ‘thread’ return type for functions. For functions which run in their own thread. Maybe they (immediately) return a thread id which can later be used by thread functions (suspend/kill/query/getstackof thread)

Sather’s rules for function parameters? Yes. In several places we need the concept of ‘this function/class fulfills the requirements promised by a given declaration’. For example: forward declarations. It would be nice if forward declarations could be extern, but the function/class itself (which satisfies the protocol but may vary on details) is not.

Question:

Principle: When given a programmer declaration and a declaration which could be deduced (inheritance, whatever), use the ‘tighter’ or ‘stricter’. When does this arise?

Rules:

Can’t use a type variable as a parameter or return type, have to use a template parameter in that case, or do dynamic typing.

Can’t call constructor of type variable. Can only call constructor on types that may be determined statically (i.e. at compile-time).

Those types which may be determined statically (i.e. at compile time) may be cast to the function type matching the signature of the constructor.

Different kinds of types:

static/compile-time/const type

template type

type variable

Note that a type variable corresponding to the actual type of a template parameter is (secretly) passed during construction.

Future: Expose the object model concept of an offset within the type. Should approximate C++’s member function pointers.

Chapter 4. Parameterized Types and Functions

Also known as ‘templates’ or ‘generics’. Creates a family of classes parameterized by one or more types.

Type parameters occur inside angle brackets (‘<’ ... ‘>’).

Classes, global functions, and member functions may be parameterized.

Supports bounded parameterization: a type parameter must extend a particular class. This is useful since the code can only access functions or members declared in the bound. Parameterization is still useful without a type bound, for example: containers that can contain any object, and don’t require assumptions about what type of object they hold.

Example:

class Comparable
{
    require bool IsSmaller(Comparable);
}

class KeepBiggest<T extends Comparable>
{
    T m;

    T Add(T p)
    {
        T ret;
        if (!m || m.IsSmaller(p)) // m is a null object until the first Add
        {
            ret=m;
            m=p;
        }
        else
            ret=p;
        return ret;
    }
}

class IntLike(member int i) extends Comparable, GarbageCollected
{
    override bool IsSmaller(Comparable p)
    { // use a dynamic cast in this example to keep things simple
        IntLike p2 = Cast<IntLike>(p);
        if (p2)
            return i<p2.i;
        else
            return false;
    }

    /* Better way:
    bridge bool IsSmaller(IntLike p2)
    {
        if (p2)
            return i<p2.i;
        else
            return false;
    }
    */

    // an even better way uses recursive bounded parameterization

    int GetValue()
    {
        return i;
    }
}

void test()
{
    IntLike(2) i2;
    IntLike(3) i3;
    IntLike(4) i4;
    KeepBiggest<IntLike>() k;
    k.Add(i3); // returns null object, sets k.m to i3
    k.Add(i2); // returns i2, k.m still i3
    k.Add(i4); // returns i3, k.m now i4
}

Complicated, terse example:

class A<T1 extends B, T2 extends C<T1>> extends D, E<T1>
{ //...
    T1 member(T2);
    template<T3, T4> T3 templatedmember(T4);
}

template<T5 extends F, T6 extends G> T6 globalfunc(V<T5>, F);

A<H,I>() a;
I() i;
H h=a.member(i);
J() j;
K k=a.templatedmember<K>(j);
class L extends F;
V<L> v;
L() l;
G g=globalfunc<G>(v, l);

Concept of erased types. The type safety afforded by templates comes strictly from static analysis done at compile-time. Most template information is lost at runtime. On the other hand, runtime type querying of objects is more powerful, but slower.

A single bytecode body for each function works for all possible type parameters.
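C++ templates normally generate one copy of code per type argument, so to mimic erasure this sketch routes everything through a single non-template body that stores base-class pointers. `Object`, `ErasedList`, `List<T>`, and the primitive wrapper `Int` are hypothetical names; the typed `List<T>` adds only compile-time checks and unchecked casts, which is the property the text describes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Every script object descends from Object, so one container body that
// stores Object* can serve every type parameter.
struct Object { virtual ~Object() = default; };

// The single, type-erased implementation (the "one bytecode body").
class ErasedList {
public:
    void Add(Object *o) { items_.push_back(o); }
    Object *At(std::size_t i) const { return items_.at(i); }
    std::size_t Size() const { return items_.size(); }
private:
    std::vector<Object *> items_;
};

// Compile-time view: the compiler has already checked that only T's go in,
// so the cast on the way out needs no run-time check.
template <class T>
class List {
public:
    void Add(T *t) { impl_.Add(t); }
    T *At(std::size_t i) const { return static_cast<T *>(impl_.At(i)); }
    std::size_t Size() const { return impl_.Size(); }
private:
    ErasedList impl_;
};

// Hypothetical wrapper so a primitive can be used where an object is required.
struct Int : Object {
    explicit Int(int v) : value(v) {}
    int value;
};
```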

Can’t use primitive types as type parameters, only descendants of ‘object’.

This restriction exists for many reasons: garbage collection, compaction, reference counting, function dispatch, etc.

Careful! The ‘type’ type and ‘function’ type are not concrete/primitive! They usually store a resource id (so incref/decref is safe but not necessary), but sometimes store objects.

Use wrappers for primitive types, e.g. Int, Bool, Float, and Enum<T>.

(Maybe should support automatic usage of the wrapper when a primitive type is specified as a type parameter)

Actual type passed as hidden parameter to constructor.

Recursive parameterization allowed (and encouraged!)

Example: "class LessThanComparable<T extends LessThanComparable<T>> { bool operator<(T); }"

"class String extends LessThanComparable<String>"

If need dynamic dispatch, can then use descendants of String.
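The nearest C++ counterpart to this recursive bound is the curiously recurring template pattern, sketched below as an analogy only: C++ cannot state the bound ‘T extends LessThanComparable<T>’ directly, so the requirement is only checked when `LessThan` is instantiated. `Smaller` is a hypothetical generic function written against the bound.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Base parameterized by the class that derives from it, so generic code can
// compare two T's without any dynamic cast.
template <class T>
struct LessThanComparable {
    bool LessThan(const T &other) const {
        return static_cast<const T &>(*this) < other;  // requires T::operator<
    }
};

struct String : LessThanComparable<String> {
    explicit String(std::string s) : text(std::move(s)) {}
    bool operator<(const String &o) const { return text < o.text; }
    std::string text;
};

// Generic code bounded by LessThanComparable<T>: it only uses LessThan.
template <class T>
const T &Smaller(const LessThanComparable<T> &a, const LessThanComparable<T> &b) {
    const T &x = static_cast<const T &>(a);
    const T &y = static_cast<const T &>(b);
    return a.LessThan(y) ? x : y;
}
```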

Given: "class A<T extends B>"

Restrictions: A can’t call T’s constructor -- don’t know number and type of parameters. Can’t use "T.member" except during construction or if member is ‘final’ in B.