Overview of the Poly/ML Source Code

Poly/ML has a history of over 25 years and the source has undergone many changes in that time. Many of the file names no longer reflect the current function of their code. This is intended as a brief introduction to the source code to enable maintainers and those wanting to experiment with the code to find their way round it. The source code is fairly well commented at the level of individual statements.

The source code consists of three parts. The run-time system is written in C++ with a small amount of assembly code. The compiler is written in Standard ML, and the basis library is a collection of Standard ML files.

The source code changes with each release so the documentation will need to be updated. This version reflects the state of the 5.4 release.

The Poly/ML Compiler

The Poly/ML compiler is written in Standard ML. Each file contains a single module (a signature, functor or structure) whose name matches the name of the file. This is a consequence of using the Poly/ML "make" system to build the compiler. The larger modules typically have a signature file, a functor that contains most of the code and a small file that defines the structure by applying the functor to its arguments.

Compiler control

mlsource/MLCompiler/Debug.ML - Structure
mlsource/MLCompiler/COMPILER_BODY.ML - Functor
mlsource/MLCompiler/CompilerBody.ML - Structure

CompilerBody is the main body of the compiler. The compiler consists of four major passes, although the final code-processing pass in particular involves several minor passes. CompilerBody controls each of the major passes. The source code is parsed into a parse-tree, which is then type-checked and subsequently code-generated into a code-tree. The parsing, type-checking and code-generation passes may each fail because of errors in the source; if one pass fails, the later passes are not attempted. The final pass, which transforms and optimises the code-tree and generates the final machine code, never fails unless there is an internal compiler error.
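As a rough illustration of this control flow, the C++ sketch below runs each pass only when the previous one succeeded. It is not taken from CompilerBody, which is written in Standard ML; the types ParseTree, CodeTree and MachineCode and the functions parse, typeCheck, generateCodeTree and optimiseAndGenerate are hypothetical placeholders with trivial bodies.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical placeholder results for each pass.
struct ParseTree { std::string text; };
struct CodeTree { std::string text; };
struct MachineCode { std::vector<unsigned char> bytes; };
using Errors = std::vector<std::string>;

// Front-end passes: each may fail because of an error in the source.
bool parse(const std::string& src, ParseTree& out, Errors& errs) {
    if (src.empty()) { errs.push_back("parse: empty input"); return false; }
    out.text = src;
    return true;
}

bool typeCheck(const ParseTree& tree, Errors& errs) {
    // Stand-in rule for the sketch: reject any source containing "badtype".
    if (tree.text.find("badtype") != std::string::npos) {
        errs.push_back("type error");
        return false;
    }
    return true;
}

bool generateCodeTree(const ParseTree& tree, CodeTree& out, Errors&) {
    out.text = tree.text;
    return true;
}

// The final pass is assumed never to fail on user errors.
MachineCode optimiseAndGenerate(const CodeTree& code) {
    return MachineCode{std::vector<unsigned char>(code.text.begin(), code.text.end())};
}

// Run the passes in order and stop at the first one that reports errors.
bool compile(const std::string& src, MachineCode& out, Errors& errs) {
    ParseTree tree;
    if (!parse(src, tree, errs)) return false;
    if (!typeCheck(tree, errs)) return false;
    CodeTree code;
    if (!generateCodeTree(tree, code, errs)) return false;
    out = optimiseAndGenerate(code);
    return true;
}
```

Because later passes are never entered after a failure, they can assume a well-formed, well-typed input.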

Debug contains definitions of most of the "tags" that control the compiler. These are the internal representation of the properties that may be passed in to PolyML.compiler. Most of these are used to control the output of debugging information.

Identifiers

mlsource/MLCompiler/STRUCTVALSIG.sml - Signature
mlsource/MLCompiler/STRUCT_VALS.ML - Functor
mlsource/MLCompiler/StructVals.ML - Structure

StructVals contains the fundamental datatypes that describe all Poly/ML values, types, type-constructors, functors, structures and signatures. These are all entities that can appear in the top-level name space. The compiler operates on name-spaces that contain these entities, looking up existing identifiers and making new identifiers as a result of top-level declarations. The actual "values" associated with values, functors or structures are described using the CodeTree datatype (see BaseCodeTree). This allows inline functions to contain the full range of code. Structures are actually represented as tuples and functors as functions, which by default are inline. As well as top-level entities, the datatypes include versions of values and structures that occur only during the compilation process.

Lexical Analysis

mlsource/MLCompiler/Symbols.ML - Structure
mlsource/MLCompiler/Syms.ML - Structure
mlsource/MLCompiler/IntSet.ML - Structure
mlsource/MLCompiler/SYM_SET.ML - Functor
mlsource/MLCompiler/SymSet.ML - Structure

Symbols defines the ML reserved words. Syms, IntSet, SYM_SET and SymSet provide a way of handling sets of symbols during parsing.
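Sets like these are typically used by a recursive-descent parser to describe, for example, the symbols that may start or follow a construct. Assuming each symbol is numbered by a small integer, a compact representation is a bit mask. The C++ sketch below shows that idea; the Sym enumeration is a hypothetical handful of symbols and the class is not a transcription of SymSet.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of ML reserved words and punctuation.
enum class Sym : unsigned { Val, Fun, Let, In, End, Semicolon, Equals, Ident };

// A set of symbols stored as a bit mask: bit n is set when symbol n is a member.
class SymSet {
public:
    constexpr SymSet() : bits_(0) {}
    constexpr explicit SymSet(Sym s) : bits_(bit(s)) {}

    // Set union.
    constexpr SymSet operator|(SymSet other) const { return SymSet(bits_ | other.bits_); }
    constexpr bool contains(Sym s) const { return (bits_ & bit(s)) != 0; }
    constexpr bool empty() const { return bits_ == 0; }

private:
    constexpr explicit SymSet(std::uint64_t b) : bits_(b) {}
    static constexpr std::uint64_t bit(Sym s) { return std::uint64_t{1} << static_cast<unsigned>(s); }
    std::uint64_t bits_;
};
```

Membership tests and unions are then single machine operations, which matters because a parser performs them for almost every symbol it reads.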

mlsource/MLCompiler/LEXSIG.sml - Signature
mlsource/MLCompiler/LEX_.ML - Functor
mlsource/MLCompiler/Lex.ML - Structure

The lexical analyser processes the input text, skipping over comments and blank space. It updates a group of refs that hold information about the current symbol.
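The sketch below shows, in C++, a lexer of this shape: rather than returning tokens, nextSymbol() updates fields that describe the current symbol, analogous to the refs set by Lex. ML comments can be nested, so the comment skipper counts the nesting depth. The symbol kinds and field names are hypothetical, and most of real ML lexing (reserved words, strings, symbolic identifiers, reals) is omitted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical symbol kinds; a real ML lexer distinguishes many more.
enum class Kind { Ident, Number, Punct, Eof };

class Lexer {
public:
    explicit Lexer(std::string src) : src_(std::move(src)) {}

    // Fields describing the current symbol, updated by nextSymbol().
    Kind sym = Kind::Eof;   // kind of the current symbol
    std::string id;         // its text
    std::size_t line = 1;   // line on which it starts

    void nextSymbol() {
        skipBlanksAndComments();
        id.clear();
        if (pos_ >= src_.size()) { sym = Kind::Eof; return; }
        unsigned char c = static_cast<unsigned char>(src_[pos_]);
        if (std::isalpha(c)) {
            while (pos_ < src_.size() && isIdentChar(src_[pos_])) id += src_[pos_++];
            sym = Kind::Ident;
        } else if (std::isdigit(c)) {
            while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
                id += src_[pos_++];
            sym = Kind::Number;
        } else {
            id += src_[pos_++];
            sym = Kind::Punct;
        }
    }

private:
    static bool isIdentChar(char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_' || ch == '\'';
    }

    // True if the two characters at the current position match s.
    bool at(const char* s) const { return src_.compare(pos_, 2, s) == 0; }

    void skipBlanksAndComments() {
        for (;;) {
            while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) {
                if (src_[pos_] == '\n') ++line;
                ++pos_;
            }
            if (!at("(*")) return;
            // Comments nest, so track depth instead of stopping at the first "*)".
            int depth = 0;
            do {
                if (at("(*")) { ++depth; pos_ += 2; }
                else if (at("*)")) { --depth; pos_ += 2; }
                else { if (src_[pos_] == '\n') ++line; ++pos_; }
            } while (depth > 0 && pos_ < src_.size());
        }
    }

    std::string src_;
    std::size_t pos_ = 0;
};
```

Keeping the current symbol in shared state means the parser can inspect it repeatedly without re-reading input, which suits a recursive-descent parser that looks one symbol ahead.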

Parsing

mlsource/MLCompiler/PARSE_DEC.ML - Functor
mlsource/MLCompiler/ParseDec.ML - Structure
mlsource/MLCompiler/PARSE_TYPE.ML - Functor
mlsource/MLCompiler/ParseType.ML - Structure
mlsource/MLCompiler/SKIPS_.ML - Functor
mlsource/MLCompiler/Skips.ML - Structure
mlsource/MLCompiler/UTILITIES_.ML - Functor
mlsource/MLCompiler/Utilities.ML - Structure

ParseDec is the main recursive-descent parser. It calls into the lexical analyser to get the next symbol and calls functions in ParseTree, Signatures and Structures to build the parse-tree as it goes. ParseType does the same for type expressions, such as those in type constraints or signatures. Skips and Utilities contain helper functions for parsing.
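For readers unfamiliar with the technique, the following C++ sketch shows recursive descent on a tiny, hypothetical expression grammar: one function per grammar rule, each consuming input and returning the subtree it built. It is far simpler than ParseDec and scans characters directly instead of calling a separate lexer.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical parse-tree node: a number literal or a binary operation.
struct Node {
    char op;     // '+' or '*', or 0 for a literal
    long value;  // used only when op == 0
    std::unique_ptr<Node> left, right;
};

// Grammar, one parsing function per rule:
//   expr   ::= term { '+' term }
//   term   ::= factor { '*' factor }
//   factor ::= number | '(' expr ')'
class Parser {
public:
    explicit Parser(std::string text) : s_(std::move(text)) {}

    std::unique_ptr<Node> parseExpr() {
        std::unique_ptr<Node> lhs = parseTerm();
        while (peek() == '+') { ++pos_; lhs = binary('+', std::move(lhs), parseTerm()); }
        return lhs;
    }

private:
    std::unique_ptr<Node> parseTerm() {
        std::unique_ptr<Node> lhs = parseFactor();
        while (peek() == '*') { ++pos_; lhs = binary('*', std::move(lhs), parseFactor()); }
        return lhs;
    }

    std::unique_ptr<Node> parseFactor() {
        if (peek() == '(') {
            ++pos_;
            std::unique_ptr<Node> inner = parseExpr();
            if (peek() == ')') ++pos_;  // a real parser would report a missing ')'
            return inner;
        }
        long v = 0;
        while (pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_])))
            v = v * 10 + (s_[pos_++] - '0');
        return std::unique_ptr<Node>(new Node{0, v, nullptr, nullptr});
    }

    // Skip blanks and return the next character without consuming it.
    char peek() {
        while (pos_ < s_.size() && s_[pos_] == ' ') ++pos_;
        return pos_ < s_.size() ? s_[pos_] : '\0';
    }

    static std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
        return std::unique_ptr<Node>(new Node{op, 0, std::move(l), std::move(r)});
    }

    std::string s_;
    std::size_t pos_ = 0;
};

// Walk the tree to check that it has the expected shape.
long eval(const Node& n) {
    if (n.op == 0) return n.value;
    long l = eval(*n.left), r = eval(*n.right);
    return n.op == '+' ? l + r : l * r;
}
```

Operator precedence falls out of the call structure: parseTerm is called from parseExpr, so '*' binds more tightly than '+'.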

Parse Tree

mlsource/MLCompiler/STRUCTURESSIG.sml - Signature
mlsource/MLCompiler/STRUCTURES_.ML - Functor
mlsource/MLCompiler/Structures.ML - Structure
mlsource/MLCompiler/SIGNATURESSIG.sml - Signature
mlsource/MLCompiler/SIGNATURES.sml - Functor
mlsource/MLCompiler/SignaturesStruct.sml - Structure
mlsource/MLCompiler/PARSETREESIG.sml - Signature
mlsource/MLCompiler/PARSE_TREE.ML - Functor
mlsource/MLCompiler/ParseTree.ML - Structure

The parser generates a tree structure to represent the source program during the parsing pass. The type-checking and code-generation passes work on this parse tree. ParseTree contains the definitions for the core language, Structures the definitions for structures and functors and Signatures the definitions for signatures. The datatypes for the parse tree are local to each of these modules so each module contains all the code that needs to walk over the parse tree. Each of these modules makes use of the parse tree support modules to perform particular tasks.

Type Checking

mlsource/MLCompiler/TYPETREESIG.sml - Signature
mlsource/MLCompiler/TYPE_TREE.ML - Functor
mlsource/MLCompiler/TypeTree.ML - Structure
mlsource/MLCompiler/COPIERSIG.sml - Signature
mlsource/MLCompiler/COPIER.sml - Functor
mlsource/MLCompiler/CopierStruct.sml - Structure
mlsource/MLCompiler/PRINT_TABLE.ML - Functor
mlsource/MLCompiler/PrintTable.ML - Structure

TypeTree contains the main type-checking code and various other functions to support operations on types. Copier is used to make a copy of a signature when it is instantiated to a structure. PrintTable contains a list of the current overloadings of overloaded operations. Previously it also held user-provided pretty-printers, but these have since been removed.

Parse Tree Support

mlsource/MLCompiler/VALUEOPSSIG.sml - Signature
mlsource/MLCompiler/VALUE_OPS.ML - Functor
mlsource/MLCompiler/ValueOps.ML - Structure
mlsource/MLCompiler/PRETTYSIG.sml - Signature
mlsource/MLCompiler/Pretty.sml - Structure
mlsource/MLCompiler/DATATYPEREPSIG.sml - Signature
mlsource/MLCompiler/DATATYPE_REP.ML - Functor
mlsource/MLCompiler/DatatypeRep.ML - Structure
mlsource/MLCompiler/EXPORTTREESIG.sml - Signature
mlsource/MLCompiler/ExportTree.sml - Functor
mlsource/MLCompiler/ExportTreeStruct.sml - Structure
mlsource/MLCompiler/TYPEIDCODESIG.sml - Signature
mlsource/MLCompiler/TYPEIDCODE.sml - Functor
mlsource/MLCompiler/TypeIDCodeStruct.sml - Structure
mlsource/MLCompiler/DEBUGGERSIG.sml - Signature
mlsource/MLCompiler/DEBUGGER_.sml - Functor
mlsource/MLCompiler/Debugger.sml - Structure

There are various support modules involved in the process of type-checking and code-generation. ValueOps contains operations on identifiers. As well as simple identifiers, it deals with various sorts of overloaded identifiers and with type-specific functions such as PolyML.print. It also contains many of the functions used to display ML values. Pretty defines the type used in the Poly/ML pretty printer. DatatypeRep produces an optimised representation for the value constructors of a datatype depending on the number and types of the constructors. ExportTree is used to construct the abstract view of the parse-tree that is made available through the IDE interface. TYPEIDCODE produces code for the type-identifiers associated with types and datatypes; these contain the type-specific printing and equality functions. Debugger is used to build the data structures and hooks used for debugging ML code when the code is compiled with PolyML.Compiler.debug set.
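To make the DatatypeRep idea concrete, the sketch below classifies constructors into a few representation choices based on how many constructors carry arguments and whether an argument is always a heap pointer. The categories and rules are illustrative assumptions, not Poly/ML's actual scheme; they only show why both the number and the types of the constructors matter. In a list type, for instance, nil can be a small integer and cons can simply be the pointer to its pair, because the two can be told apart at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one constructor of a datatype.
struct Constructor {
    bool hasArg;          // does the constructor carry a value?
    bool argAlwaysBoxed;  // is that value always a heap pointer (e.g. a tuple)?
};

// Illustrative representation choices; not Poly/ML's own categories.
enum class Rep {
    SmallInt,  // nullary constructor: a tagged integer, no allocation
    SameAsArg, // the datatype's only constructor: the value is just its argument
    Unboxed,   // sole argument-carrying constructor whose argument is always a pointer
    Boxed      // argument stored in a heap cell together with a constructor tag
};

std::vector<Rep> chooseReps(const std::vector<Constructor>& cs) {
    std::size_t withArg = 0;
    for (const Constructor& c : cs) if (c.hasArg) ++withArg;

    std::vector<Rep> reps;
    for (const Constructor& c : cs) {
        if (!c.hasArg)
            reps.push_back(Rep::SmallInt);
        else if (cs.size() == 1)
            reps.push_back(Rep::SameAsArg);
        else if (withArg == 1 && c.argAlwaysBoxed)
            // A pointer can be distinguished from the small integers used for
            // the nullary constructors, so no extra cell is needed.
            reps.push_back(Rep::Unboxed);
        else
            reps.push_back(Rep::Boxed);
    }
    return reps;
}
```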

Code Generation

mlsource/MLCompiler/CodeTree/BaseCodeTreeSig.sml - Signature
mlsource/MLCompiler/CodeTree/BaseCodeTree.sml - Structure
mlsource/MLCompiler/CODETREESIG.ML - Signature
mlsource/MLCompiler/CodeTree/CODETREE.ML - Functor
mlsource/MLCompiler/CodeTree/ml_bind.ML - Structure

The third pass of the compiler generates an intermediate code structure from the parse-tree. BaseCodeTree contains the datatype definition for this structure and a few additional functions. CODETREE contains the optimiser and processing functions that transform the tree structure generated from the ML code into an equivalent tree structure for the low-level code generator. The optimise function performs inline function expansion, tuple optimisation and various constant folding operations. Later passes remove redundant declarations, especially those added during inline expansion, and compute life-time information for the remaining declarations. The low-level code-generator uses this life-time information to aid register allocation.
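Constant folding of the kind mentioned above can be sketched on a much-simplified, hypothetical code-tree containing only constants, variables and additions. This C++ version folds constant sums and removes additions of zero, working bottom-up. It ignores arithmetic overflow, which a real optimiser would have to take into account.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, much-simplified code-tree node.
struct Code {
    enum Kind { Const, Var, Add } kind;
    long value;                  // Const: the constant
    int var;                     // Var: a variable index
    std::shared_ptr<Code> a, b;  // Add: the two operands
};
using CodeP = std::shared_ptr<Code>;

CodeP mkConst(long v) { return CodeP(new Code{Code::Const, v, 0, nullptr, nullptr}); }
CodeP mkVar(int n) { return CodeP(new Code{Code::Var, 0, n, nullptr, nullptr}); }
CodeP mkAdd(CodeP x, CodeP y) { return CodeP(new Code{Code::Add, 0, 0, x, y}); }

// Fold bottom-up: simplify the operands first, then the node itself.
CodeP fold(const CodeP& c) {
    if (c->kind != Code::Add) return c;
    CodeP x = fold(c->a), y = fold(c->b);
    if (x->kind == Code::Const && y->kind == Code::Const)
        return mkConst(x->value + y->value);                // 2 + 3 => 5
    if (x->kind == Code::Const && x->value == 0) return y;  // 0 + e => e
    if (y->kind == Code::Const && y->value == 0) return x;  // e + 0 => e
    return mkAdd(x, y);
}
```

Folding after inline expansion is what makes it pay off: arguments that were constants at the call site become constants inside the expanded body.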

Code Generation - X86

mlsource/MLCompiler/CodeTree/CODE_ARRAY.ML - Structure
mlsource/MLCompiler/CodeTree/CODEGEN_TABLESIG.sml - Signature
mlsource/MLCompiler/CodeTree/CODEGEN_TABLE.ML - Functor
mlsource/MLCompiler/CodeTree/CodeGenTable.ML - Structure
mlsource/MLCompiler/CodeTree/CODECONSSIG.sml - Signature
mlsource/MLCompiler/CodeTree/X86CODESIG.sml - Signature
mlsource/MLCompiler/CodeTree/X86OUTPUTCODE.ML - Functor
mlsource/MLCompiler/CodeTree/X86OPTIMISE.ML - Functor
mlsource/MLCompiler/CodeTree/X86LOWLEVEL.ML - Functor
mlsource/MLCompiler/CodeTree/GENERATE_CODE.ML - Functor
mlsource/MLCompiler/CodeTree/GCode.i386.ML - Structure
mlsource/MLCompiler/CodeTree/CodeCons.i386.ML - Structure

The final part of the compilation process is to generate machine code for the particular architecture. GCode (GENERATE_CODE) processes the code-tree and builds a list of instructions. CodeGenTable is used to keep track of declarations and register allocations. X86LOWLEVEL is the first part of this process. X86OPTIMISE is a peep-hole optimiser that looks for sequences of instructions that can be reduced. The final part of the process is handled by X86OUTPUTCODE which takes the instruction sequence and produces a code-object, a vector containing the X86 machine code and also the constants used in the code. CODE_ARRAY is a helper structure that provides byte and word operations on the code-object.
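A peep-hole pass of the kind X86OPTIMISE performs can be sketched as a single scan over the instruction list that drops patterns with no effect. The instruction form and the two rules below (a move from a register to itself, and a push immediately followed by a pop of the same register) are hypothetical examples, not the rules actually used in X86OPTIMISE.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified instruction: opcode plus up to two register operands.
struct Instr {
    std::string op;   // "mov", "push", "pop", "add", ...
    std::string dst;
    std::string src;
    bool operator==(const Instr& o) const { return op == o.op && dst == o.dst && src == o.src; }
};

// One pass over the sequence, removing instructions that have no net effect.
std::vector<Instr> peephole(const std::vector<Instr>& in) {
    std::vector<Instr> out;
    for (const Instr& i : in) {
        if (i.op == "mov" && i.dst == i.src) continue;    // mov r, r
        if (i.op == "pop" && !out.empty() && out.back().op == "push" && out.back().dst == i.dst) {
            out.pop_back();                               // push r; pop r
            continue;
        }
        out.push_back(i);
    }
    return out;
}
```

Comparing against the last instruction already emitted, rather than the previous input instruction, lets nested pairs such as push a; push b; pop b; pop a collapse in a single pass.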

Bootstrapping

mlsource/MLCompiler/CompilerVersion.sml - Structure
mlsource/MLCompiler/MAKE_.ML - Functor
mlsource/MLCompiler/Make.ML - Structure
mlsource/MLCompiler/INITIALISE_.ML - Functor
mlsource/MLCompiler/Initialise.ML - Structure
mlsource/MLCompiler/ml_bind.ML

CompilerVersion is a tiny structure with the current version information. Make is a wrapper for the compiler and includes a cut-down version of the "use" function so that the basis library can be compiled. Initialise contains declarations needed for bootstrapping: before the basis library can be compiled, certain identifiers have to be added to the initial name-space. In particular, the compiler itself and various compiler switches and datatypes have to be added at this stage. ml_bind is the root when building the compiler using PolyML.make. It sets up the compiler for bootstrapping.

Support Library

mlsource/MLCompiler/Boot/Address.ML
mlsource/MLCompiler/Boot/Misc.ML
mlsource/MLCompiler/Boot/HashTable.ML
mlsource/MLCompiler/Boot/UniversalTable.ML
mlsource/MLCompiler/Boot/StretchArray.ML
mlsource/MLCompiler/Boot/ml_bind.ML

The Boot directory contains a few library structures that are used throughout the compiler. These are gradually being replaced by the Standard Basis Library.

The Run-time System

The Poly/ML run-time system (RTS) is written mostly in C++ with a few files in C and assembly code. All interaction between ML code and the operating system goes through the run-time system. Most interaction is through RTS calls.

Stub Functions

libpolymain/polystub.c
polyimport.c

Every executable program needs an initial entry point (main or WinMain), and this is provided by either polyimport or polystub. polystub is used to create the polymain library. All other RTS files are compiled into the polyml library. polyimport is normally used only during the initial installation and reads a heap that has been exported in the portable (text) format. polystub is used when building an executable by linking in an object file that has been exported with PolyML.export.

Globals and Support Modules

libpolyml/mpoly.cpp
libpolyml/mpoly.h
libpolyml/run_time.cpp
libpolyml/run_time.h
libpolyml/diagnostics.cpp
libpolyml/diagnostics.h
libpolyml/rts_module.cpp
libpolyml/rts_module.h
libpolyml/globals.h
libpolyml/noreturn.h
libpolyml/sys.h
libpolyml/version.h
config.h
winconfig.h

mpoly.cpp contains the main entry point to the RTS and is immediately called by the main program in either polyimport or polystub. run_time.cpp contains the main despatch table for RTS calls from ML code and also various functions that do not fit elsewhere. diagnostics.cpp contains some functions to produce debugging information from the RTS. rts_module defines the RTSModule base class that is used for the more specific modules. sys.h provides symbolic definitions for run-time system calls. The information in it should match basis/RuntimeCalls.ML. globals.h defines the PolyWord and PolyObject classes that provide symbolic access to machine words as well as other global definitions. noreturn.h provides a way of indicating that a function does not return normally. version.h is a small file containing the current RTS version. config.h is produced automatically by the configuration process. winconfig.h is an equivalent for Windows when compiling under Visual C++.
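The despatch table can be pictured as an array of function pointers indexed by call number, which is why the numbers in sys.h must agree with basis/RuntimeCalls.ML: the ML side knows only the number. In this C++ sketch the call numbers, function names and signatures are invented; real RTS calls operate on ML values rather than plain longs.

```cpp
#include <cassert>
#include <cstddef>

// Invented call numbers for the sketch.
enum : std::size_t { CALL_IDENTITY = 0, CALL_ADD_ONE = 1, CALL_DOUBLE = 2, CALL_COUNT };

using RtsFunction = long (*)(long);

long rtsIdentity(long x) { return x; }
long rtsAddOne(long x) { return x + 1; }
long rtsTwice(long x) { return 2 * x; }

// The despatch table: entry n implements call number n.
const RtsFunction dispatchTable[] = { rtsIdentity, rtsAddOne, rtsTwice };
static_assert(sizeof(dispatchTable) / sizeof(dispatchTable[0]) == CALL_COUNT,
              "despatch table and call numbers are out of step");

long rtsCall(std::size_t callNumber, long arg) {
    assert(callNumber < CALL_COUNT);  // an unknown number means the two sides disagree
    return dispatchTable[callNumber](arg);
}
```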

Arithmetic and Strings

libpolyml/arb.cpp
libpolyml/arb.h
libpolyml/reals.cpp
libpolyml/reals.h
libpolyml/realconv.cpp
libpolyml/realconv.h
libpolyml/polystring.cpp
libpolyml/polystring.h

arb.cpp contains the arbitrary precision package. It now uses GMP for the actual arithmetic if GMP is installed and otherwise falls back on its own code. reals.cpp contains real number (floating point) operations. realconv is a slightly modified version of the real to string conversion functions written by David M. Gay.
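When GMP is not available, the package has to implement multi-precision arithmetic itself. The C++ sketch below shows the core of such code, addition with carry propagation, assuming magnitudes stored as vectors of 32-bit limbs, least significant first. This representation is an assumption for illustration and is not necessarily the format used in arb.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Magnitude of a non-negative integer, least significant 32-bit limb first.
using Limbs = std::vector<std::uint32_t>;

Limbs addMagnitudes(const Limbs& a, const Limbs& b) {
    const Limbs& longer = a.size() >= b.size() ? a : b;
    const Limbs& shorter = a.size() >= b.size() ? b : a;
    Limbs result;
    result.reserve(longer.size() + 1);
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < longer.size(); ++i) {
        // At most 1 + 2 * (2^32 - 1), so the 64-bit sum cannot overflow.
        std::uint64_t sum = carry + longer[i] + (i < shorter.size() ? shorter[i] : 0u);
        result.push_back(static_cast<std::uint32_t>(sum));  // low 32 bits
        carry = sum >> 32;                                  // 0 or 1
    }
    if (carry) result.push_back(static_cast<std::uint32_t>(carry));
    return result;
}
```

Using a 64-bit accumulator for 32-bit limbs keeps the carry logic portable C++ without relying on a processor's add-with-carry instruction.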

Basis Library Support

libpolyml/process_env.cpp
libpolyml/timing.cpp
libpolyml/process_env.h
libpolyml/timing.h
libpolyml/io_internal.h
libpolyml/basicio.cpp
libpolyml/network.cpp
libpolyml/basicio.h
libpolyml/network.h
libpolyml/errors.h

libpolyml/proper_io.h
libpolyml/proper_io.cpp

These files contain the operating system interfaces needed to support the Standard Basis Library. proper_io.cpp contains some wrap-around functions to avoid bugs and inconsistencies in some operating system calls. errors.h contains a table that maps between error numbers (the value stored in errno on Unix) and their textual equivalents.
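A table of this kind can be sketched as an array of number and name pairs searched in either direction. The entries below use standard <cerrno> constants, but the layout and the function names errorName and errorNumber are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>

// Illustrative table pairing error numbers with their names.
struct ErrorEntry {
    int number;
    const char* name;
};

const ErrorEntry errorTable[] = {
    { EPERM,  "EPERM"  },
    { ENOENT, "ENOENT" },
    { EINTR,  "EINTR"  },
    { EACCES, "EACCES" },
};

// Number to name; returns nullptr for numbers not in the table.
const char* errorName(int number) {
    for (const ErrorEntry& e : errorTable)
        if (e.number == number) return e.name;
    return nullptr;
}

// Name to number; returns -1 for names not in the table.
int errorNumber(const char* name) {
    for (const ErrorEntry& e : errorTable)
        if (std::strcmp(e.name, name) == 0) return e.number;
    return -1;
}
```

Building the table from the <cerrno> macros, rather than hard-coding numbers, keeps it correct on platforms where the numeric values differ.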

State Saving and Exporting

polyexports.h
libpolyml/exporter.cpp
libpolyml/exporter.h
libpolyml/elfexport.cpp
libpolyml/elfexport.h
libpolyml/machoexport.cpp
libpolyml/machoexport.h
libpolyml/pecoffexport.cpp
libpolyml/pecoffexport.h
libpolyml/pexport.cpp
libpolyml/pexport.h
libpolyml/sharedata.cpp
libpolyml/sharedata.h
libpolyml/savestate.cpp
libpolyml/savestate.h

These files provide mechanisms for exporting the heap in various forms. Different operating systems use different formats for object modules: ELF on Linux and BSD Unix, Mach-O on Mac OS X and PE-COFF on Windows. Poly/ML also has its own portable text format, which is usually used only for the initial installation; pexport.cpp contains the code both to export and to import this format. sharedata.cpp is used to reduce the size of the heap by combining values that are equivalent. Although not strictly related to exporting, it is usually used before a heap is exported. savestate.cpp contains code to export and import the heap as a saved state.
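The idea behind sharedata can be shown on a flat collection of immutable strings: every object is replaced by a canonical copy with the same contents, so duplicates become shared. This C++ sketch is a deliberate simplification; real code has to work over the whole object graph and must not merge mutable objects such as refs and arrays, whose identity is significant.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical immutable heap object: just a string of bytes.
using Object = std::shared_ptr<const std::string>;

// Return the objects with duplicates replaced by a single canonical copy.
std::vector<Object> shareData(const std::vector<Object>& objects) {
    std::map<std::string, Object> canonical;  // contents -> first object seen
    std::vector<Object> result;
    result.reserve(objects.size());
    for (const Object& o : objects) {
        // emplace leaves an existing entry untouched and returns an
        // iterator to whichever entry holds these contents.
        auto entry = canonical.emplace(*o, o).first;
        result.push_back(entry->second);
    }
    return result;
}
```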

Operating-System Specific

libpolyml/Console.h
libpolyml/Console.cpp
resource.h
PolyML.rc
libpolyml/PolyControl.h
libpolyml/windows_specific.cpp
libpolyml/unix_specific.cpp
libpolyml/xwindows.cpp
libpolyml/xcall_numbers.h
libpolyml/xwindows.h
libpolyml/os_specific.h

Parts of the RTS are specific either to Windows or to Posix platforms, i.e. Unix and Cygwin. unix_specific.cpp contains code to support the Unix and Posix structures in the basis library. windows_specific.cpp supports the Windows structure. Console.cpp provides a simple console window in Windows and PolyML.rc is the resource file with the menus and icons. xwindows.cpp contains the X-Windows and Motif interface. It is only included if the appropriate configuration option is set.

Hardware Specific

libpolyml/machine_dep.h
libpolyml/x86_dep.cpp
libpolyml/x86asm.asm
libpolyml/power_dep.cpp
libpolyml/power_assembly.S
libpolyml/sparc_dep.cpp
libpolyml/sparc_assembly.S
libpolyml/int_opcodes.h
libpolyml/interpret.cpp

Poly/ML is compiled into machine code and uses its own linkage conventions. When calling from ML to the RTS there needs to be an interface which saves the ML state and loads the C state for the RTS. Arguments and results need to be transferred. There is a C++ file and an assembly code file for each of the X86 (32 and 64-bit), PPC and Sparc architectures. On other architectures a portable, interpreted byte code is used and the interpreter takes the place of the machine-specific module.

Multi-Threading

libpolyml/processes.cpp
libpolyml/processes.h
libpolyml/locking.cpp
libpolyml/locking.h

Support for multi-threading is mostly contained in processes.cpp. locking.cpp provides the implementation of the PLock, PLocker and PCondVar classes that are used in various places to provide mutual exclusion.
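The PLocker class follows the familiar scoped-locking pattern: the lock is acquired in the constructor and released in the destructor, so it cannot be left held after an early return or an exception. The C++ sketch below illustrates that pattern with std::mutex; the method names Lock and Unlock and the example function incrementCounter are assumptions, and the real classes in locking.h may wrap the platform's threading primitives directly.

```cpp
#include <cassert>
#include <mutex>

// Illustrative stand-in for a lock class.
class PLock {
public:
    void Lock() { mutex_.lock(); }
    void Unlock() { mutex_.unlock(); }
private:
    std::mutex mutex_;
};

// Scoped locker: holds the lock for exactly the lifetime of the object.
class PLocker {
public:
    explicit PLocker(PLock* lock) : lock_(lock) { lock_->Lock(); }
    ~PLocker() { lock_->Unlock(); }
    PLocker(const PLocker&) = delete;
    PLocker& operator=(const PLocker&) = delete;
private:
    PLock* lock_;
};

// Example of use: the counter is only ever updated with the lock held.
PLock counterLock;
long counter = 0;

void incrementCounter() {
    PLocker guard(&counterLock);  // released automatically on return
    ++counter;
}
```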

Memory Management

libpolyml/gc.cpp
libpolyml/gc.h
libpolyml/bitmap.cpp
libpolyml/bitmap.h
libpolyml/memmgr.cpp
libpolyml/memmgr.h
libpolyml/osmem.cpp
libpolyml/osmem.h
libpolyml/save_vec.cpp
libpolyml/save_vec.h
libpolyml/scanaddrs.cpp
libpolyml/scanaddrs.h
libpolyml/check_objects.cpp
libpolyml/check_objects.h

The main part of the garbage collector is in gc.cpp. bitmap.cpp provides the Bitmap class that is used to mark allocated words in memory. memmgr.cpp provides classes to manage the various segments of memory: local segments for local heaps and permanent segments for object file heaps and saved states. osmem.cpp is used for the actual allocation and de-allocation of memory using calls specific to the operating system. save_vec.cpp defines classes that support a save vector for each thread. While in the RTS a thread may need to allocate memory or access values in the ML heap. It always does this through its save vector, whose entries may be modified if there is a garbage collection. scanaddrs.cpp provides classes to process data structures in the heap by following pointers. This is used in the garbage collector and also when exporting the heap. check_objects.cpp is used for debugging.
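The point of the save vector is that RTS code refers to ML objects through handles, i.e. indices into the save vector, rather than through raw addresses. If the garbage collector moves an object, it updates the corresponding save-vector entry, so handles held by RTS code remain valid. The C++ sketch below models this with hypothetical names (SaveVec, Handle, updateAfterGC) and plain long cells instead of real heap words.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical heap address; real RTS code would use PolyWord/PolyObject.
using Address = long*;

class SaveVec {
public:
    // A handle is an index into the save vector, not the object's address.
    using Handle = std::size_t;

    Handle push(Address a) { slots_.push_back(a); return slots_.size() - 1; }
    Address get(Handle h) const { assert(h < slots_.size()); return slots_[h]; }

    // Called by a collector that moves objects: every saved address is replaced
    // by its new location, so existing handles keep working.
    template <typename Forward>
    void updateAfterGC(Forward forward) {
        for (Address& a : slots_) a = forward(a);
    }

private:
    std::vector<Address> slots_;
};
```

Because the collector can find every address the RTS is using by scanning the save vectors, it can treat them as roots and update them, which would be impossible for raw pointers held in C++ local variables.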

Poly/ML Extensions

libpolyml/foreign.cpp
libpolyml/foreign.h
libpolyml/objsize.cpp
libpolyml/objsize.h
libpolyml/poly_specific.cpp
libpolyml/poly_specific.h
libpolyml/profiling.cpp
libpolyml/profiling.h
libpolyml/sighandler.cpp
libpolyml/sighandler.h

As well as the standard basis library, Poly/ML contains various additional structures. foreign.cpp contains the foreign-function interface (CInterface structure). objsize.cpp supports PolyML.objSize and PolyML.showSize. poly_specific.cpp has various additional functions. profiling.cpp supports profiling for time and space. sighandler.cpp supports the Signal structure that allows an ML function to be called as the result of a signal. In more recent releases the foreign-function interface has changed: foreign.cpp has been replaced by polyffi.cpp and the Foreign structure.

Basis Library

The basis library is compiled when Poly/ML is built for a particular platform. Apart from the entries added by the initialisation process all entries in the name space come from the basis library. The library is mostly compiled into a basic name space created during the initialisation process. When this is complete a new name space is built using functions from the basis library and all the declarations are copied over with the exception of some of the support modules that are only used internally in the basis library.

Build control

exportPoly.sml
basis/build.sml

These files are used to control the build process.

Values and Infixes

basis/InitialBasis.ML

Most of the library is arranged as modules (structures or functors and their signatures). InitialBasis contains various values and infix declarations that can appear free in the basis and in particular those that are needed to compile the rest of the basis. A few additional value declarations are made later in the process; in particular, the General structure is opened after it has been compiled.

PolyML structure

basis/InitialPolyML.ML
basis/PrettyPrinter.ML
basis/FinalPolyML.sml
basis/TopLevelPolyML.sml

The PolyML structure is unusual in that it is actually built in several phases. There is a version of the structure created in the initialisation process that contains special definitions such as PolyML.print that are infinitely overloaded and cannot be written in ML. InitialPolyML is compiled at the start of building the library and extends the structure to include some functions, such as onEntry, that are used within the basis library itself. PrettyPrinter, FinalPolyML and TopLevelPolyML are compiled after the rest of the basis library. PrettyPrinter contains a pretty printer, FinalPolyML contains the definition of PolyML.compiler and TopLevelPolyML contains code for the IDE protocol.

Support Modules

basis/LibraryIOSupport.sml
basis/LibrarySupport.sml
basis/VectorOperations.sml
basis/VectorSliceOperations.sml
basis/PolyVectorOperations.sml
basis/BasicStreamIO.sml
basis/ExnPrinter.sml

These support modules are compiled during the build process for use within the basis library itself and are not copied into the final name space.

Standard Basis Library

basis/Array.sml
basis/Array2.sml
basis/BIT_FLAGS.sml
basis/BinIO.sml
basis/BinPrimIO.sml
basis/Bool.sml
basis/BoolArray.sml
basis/Byte.sml
basis/CommandLine.sml
basis/Date.sml
basis/General.sml
basis/GenericSock.sml
basis/IEEEReal.sml
basis/IEEE_REAL.sml
basis/IMPERATIVE_IO.sml
basis/INTEGER.sml
basis/INetSock.sml
basis/IO.sml
basis/ImperativeIO.sml
basis/Int.sml
basis/Int32.sml
basis/IntArray.sml
basis/IntArray2.sml
basis/IntInf.sml
basis/LargeWord.sml
basis/List.sml
basis/ListPair.sml
basis/MATH.sml
basis/MONO_ARRAY.sml
basis/MONO_ARRAY_SLICE.sml
basis/MONO_VECTOR.sml
basis/MONO_VECTOR_SLICE.sml
basis/NetHostDB.sml
basis/NetProtDB.sml
basis/NetServDB.sml
basis/OS.sml
basis/Option.sml
basis/PRIM_IO.sml
basis/PackRealBig.sml
basis/PackWord8Big.sml
basis/Posix.sml
basis/PrimIO.sml
basis/Real.sml
basis/RealArray.sml
basis/STREAM_IO.sml
basis/Socket.sml
basis/String.sml
basis/StringCvt.sml
basis/SysWord.sml
basis/Text.sml
basis/TextIO.sml
basis/TextPrimIO.sml
basis/Time.sml
basis/Timer.sml
basis/Unix.sml
basis/UnixSock.sml
basis/Vector.sml
basis/Windows.sml
basis/Word32.sml
basis/Word32.x86_64.sml
basis/Word16.sml
basis/Word8.sml
basis/Word8Array.sml

These all contain structures, functors and signatures defined in the Standard Basis Library.

Poly/ML Extensions

basis/RuntimeCalls.ML
basis/Signal.sml
basis/SingleAssignment.sml
basis/Thread.sml
basis/Universal.ML
basis/UniversalArray.ML
basis/Weak.sml
basis/HashArray.ML
basis/processes.ML
basis/SML90.sml

These are extensions added by the Poly/ML system. RuntimeCalls lists the RTS call numbers. Signal provides a way to handle Unix signals (and console interrupts in Windows). SingleAssignment provides a reference that can be assigned to only once. Thread provides multi-threading, and processes contains a definition of the old Poly/ML Process structure for backwards compatibility. Weak provides weak references, i.e. references that can be used to detect when a value is no longer referenced. HashArray provides a hash table structure. SML90 provides backwards compatibility for ML/90; it was defined in the original standard basis document but later removed.
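The behaviour of a single-assignment reference can be sketched in C++ as a cell that starts empty, accepts one value and rejects any later assignment. This illustrates the concept only: the class, its method names and the use of an exception for a second assignment are assumptions of the sketch, not a description of the SingleAssignment structure's interface.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// A cell that can be assigned at most once (illustrative sketch).
template <typename T>
class SingleAssignment {
public:
    // Store a value; a second assignment is treated as an error in this sketch.
    void assign(T v) {
        if (value_) throw std::logic_error("already assigned");
        value_.reset(new T(std::move(v)));
    }
    // Returns nullptr while the cell is still unassigned.
    const T* get() const { return value_.get(); }
private:
    std::unique_ptr<T> value_;
};
```

Once assigned, such a cell never changes, so readers can treat its contents as immutable, which is what distinguishes it from an ordinary ref.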