IDE Communications Protocol

Poly/ML release 5.3 introduces support for an Integrated Development Environment to extract extra information about a program. This documents the protocol used to exchange information with the front-end. It is written on top of functions that extract the information from the compiler's parse tree. Some applications may find it more convenient to interact directly with these functions and implement their own protocol. This document is primarily aimed at writers of IDEs or plug-ins who are interacting with the normal ML top-level.

Lexical

The basic format uses a binary XML-like representation in which the escape character (0x1b) is used as a special marker. It may be followed by other characters that determine how the remainder of the input is to be treated. Strings are sent as a sequence of bytes terminated by the escape character. If the escape character itself appears in the string it is sent as two escape characters, except within compilation input (see below). Where a value represents a number it is sent as base ten, possibly preceded by ~ or -.
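
The lexical rules above can be sketched on the IDE side as follows. This is a minimal illustration in Python, not part of Poly/ML; the function names are hypothetical, and writing negative numbers with '~' rather than '-' is a choice made here, since the text allows either.

```python
ESC = b"\x1b"  # the escape character used as the protocol marker


def escape_string(value: bytes) -> bytes:
    """Double each escape byte so it is not read as a marker."""
    return value.replace(ESC, ESC + ESC)


def unescape_string(value: bytes) -> bytes:
    """Undo escape_string: collapse each doubled escape to a single byte."""
    return value.replace(ESC + ESC, ESC)


def format_number(n: int) -> bytes:
    """Render an integer in base ten, using the ML-style '~' for negatives."""
    return (b"~" if n < 0 else b"") + str(abs(n)).encode("ascii")


def parse_number(field: bytes) -> int:
    """Parse a base-ten number that may be preceded by '~' or '-'."""
    text = field.decode("ascii")
    if text[:1] in ("~", "-"):
        return -int(text[1:])
    return int(text)
```

Note that compilation input is the documented exception: the prelude and source text of a compilation request are sent by length and are not escaped.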

Packets and Mark-up

There are two different ways in which escape combinations may occur. Within the communications protocol, data is exchanged between the IDE and the Poly/ML front-end using packets of data. These begin and end with an escape sequence and use escape sequences, usually escape followed by comma, to separate the elements. The opening escape sequence is always escape followed by an upper case character and the closing sequence is always escape followed by the lower case version of the opening character. For many cases the format of the packet is fixed, but there is an exception in the case of marked-up text. Marked-up text can arise in error messages or other output from the compiler, where extra information can be inserted at arbitrary points within the text of the message. Such mark-up uses the same format as the protocol packets but the opening section is terminated by escape followed by semicolon. Having a standard format provides for upwards compatibility since an IDE can easily skip mark-up that it does not recognise.

Output mark-up

Poly/ML can be run in a mode where it produces enhanced output but otherwise runs a normal top-level. This can be used by the IDE to give the user access to a normal interactive ML session. The --with-markup option to Poly/ML runs the normal Poly/ML top-level loop but causes it to add mark-up to some of its messages. Currently it is used in two cases: in error messages and in messages showing where an identifier was declared.
The format of the information showing a location is:

ESC 'D' filename ESC ',' startline ESC ',' startlocation ESC ',' endlocation ESC ';'
This is followed by the identifier itself and then the closing packet:
ESC 'd'

An error message packet consists of

ESC 'E' kind ESC ',' filename ESC ',' startline ESC ',' startlocation ESC ',' endlocation ESC ';'

"kind" is either 'E' indicating a hard error or 'W' indicating a non-fatal warning. This is followed by the text of the message and then the closing packet:

ESC 'e'

Mark up in the future will follow the same pattern allowing the IDE to skip unrecognised mark-up. This mark-up is also used in some of the packets within the full IDE protocol.
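
As an illustration of how an IDE might consume marked-up output, the following Python sketch separates the visible text from the mark-up blocks. It is an assumption-laden sketch rather than a reference implementation: the name parse_markup and the shape of the returned tuples are invented here. Blocks with unrecognised tags are collected in the same way as D and E blocks, so the caller can simply ignore them.

```python
ESC = 0x1B  # escape byte as an integer, for byte-by-byte scanning


def parse_markup(data: bytes):
    """Return (plain_text, blocks); each block is (tag, header_fields, body)."""
    text = bytearray()  # visible text with all mark-up removed
    blocks = []         # completed blocks, innermost first
    stack = []          # blocks that are currently open
    i = 0
    while i < len(data):
        byte = data[i]
        top = stack[-1] if stack else None
        if byte != ESC:
            # Ordinary byte: part of a header field or of the visible text.
            (top["field"] if top and top["in_header"] else text).append(byte)
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated escape sequence")
        code = chr(data[i + 1])
        i += 2
        if code == "\x1b":
            # Doubled escape: a literal escape character.
            (top["field"] if top and top["in_header"] else text).append(ESC)
        elif "A" <= code <= "Z":
            # Opening sequence: start collecting header fields.
            stack.append({"tag": code, "fields": [], "field": bytearray(),
                          "in_header": True, "start": 0})
        elif code == "," and top and top["in_header"]:
            top["fields"].append(bytes(top["field"]))
            top["field"] = bytearray()
        elif code == ";" and top and top["in_header"]:
            # End of the header: what follows is the marked-up text.
            top["fields"].append(bytes(top["field"]))
            top["in_header"] = False
            top["start"] = len(text)
        elif "a" <= code <= "z" and top and code.upper() == top["tag"]:
            stack.pop()
            if top["in_header"]:
                # A block that closed without a text section.
                top["fields"].append(bytes(top["field"]))
                body = b""
            else:
                body = bytes(text[top["start"]:])
            blocks.append((top["tag"], top["fields"], body))
        else:
            raise ValueError(f"unexpected escape code {code!r}")
    return bytes(text), blocks
```

For an E block the first header field is the kind ('E' or 'W') and the remaining fields are the file name, start line, start location and end location; the body is the text of the message.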

Full IDE protocol

When run with the --ideprotocol option the top-level loop runs the full IDE communication protocol. This can also be started by PolyML.IDEInterface.runIDEProtocol() from within PolyML.

This is intended primarily for compiling files while they are being edited, either as the result of an explicit request from the user or automatically. When this option is given the front-end retains the parse tree and requests can be made to extract information from the parse tree.

When the IDE mode is started, PolyML sends the following message to standard output:

ESC 'H' protocol-version-number ESC 'h'
where protocol-version-number is the version number identifying the particular version of the PolyML protocol. Requests to PolyML should wait until this message has been sent by PolyML. The current version of the protocol is 1.0.0.
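
A client might wait for this message before sending anything, along the following lines. This Python sketch assumes the IDE holds a binary stream connected to the standard output of the Poly/ML process; read_hello is a hypothetical helper name.

```python
import io

ESC = b"\x1b"


def read_hello(stream) -> str:
    """Read bytes until the closing ESC 'h' and return the version string."""
    buffer = bytearray()
    while not buffer.endswith(ESC + b"h"):
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream closed before the hello packet completed")
        buffer += byte
    start = buffer.find(ESC + b"H")
    if start < 0:
        raise ValueError("no ESC 'H' opening sequence found")
    # Drop the two-byte opening and closing sequences.
    return buffer[start + 2:-2].decode("ascii")


# Example with an in-memory stream standing in for the process output.
version = read_hello(io.BytesIO(b"\x1bH1.0.0\x1bh"))
```

Reading one byte at a time keeps the sketch from consuming any data that follows the hello packet.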

Requests to PolyML are in terms of byte offsets within the last source text. If the text has been edited since it was last sent to ML the IDE must convert positions within the current source text into positions within the original before sending requests to ML and do the reverse conversion before displaying the results.
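
One possible way to do this conversion, sketched in Python, is to diff the text that was last sent against the current buffer and map offsets through the unchanged regions. This is only an illustration: map_offset is a hypothetical name, difflib is a stand-in for whatever edit tracking the IDE already has, and diffing whole buffers is likely to be too slow for large files.

```python
from difflib import SequenceMatcher


def map_offset(source: bytes, target: bytes, offset: int) -> int:
    """Map a byte offset in `source` to the corresponding offset in `target`."""
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if i1 <= offset < i2:
            # Inside unchanged text the offset shifts with the block; inside
            # changed text it snaps to the start of the replacement.
            return j1 + (offset - i1) if tag == "equal" else j1
    return len(target)  # offset at or beyond the end of source


original = b"val x = 1;\nval y = x;\n"           # text last sent to ML
current = b"val x = 1;\n(* c *)\nval y = x;\n"   # text after an edit

# Before sending a request: current position -> position in the original.
to_ml = map_offset(current, original, current.index(b"y"))
# Before displaying a result: original position -> current position.
to_display = map_offset(original, current, original.index(b"y"))
```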

Requests from IDE to Poly/ML.

Simple requests about the current parse tree all have the same format. They contain a request code that describes the kind of information to return and a pair of positions. Frequently the start and end positions will be the same. PolyML searches for the smallest node in the parse tree that spans the positions and returns information about that node. It always returns the actual span for the node in the result so that the IDE can highlight the actual text in the display.

Every request contains a request identifier which is returned in the result. This allows the IDE to run asynchronously. A request identifier is an arbitrary string generated by the IDE. The request identifier used in a compilation request has a special status. This identifier is used to mark the version of the parse tree that results from the compilation and must be included in commands that query the parse tree. In that way Poly/ML is able to tell whether a request refers to the current tree or to an older or newer version.

The format of a request packet is:

ESC CODE request-id ESC ',' parse-tree-id ESC ',' start-offset ESC ',' end-offset ESC code
The start character is an upper case character, the end character the corresponding lower case character.
The CODE is, currently, one of the following
O Return a list of properties for the node
T Return type information
I Return declaration location
M Move relative to a given position
V Return a list of references to an identifier
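
A simple request in the general format above could be assembled as in the following Python sketch. The helper name build_request is hypothetical, and the identifiers are assumed to be text that the IDE encodes as UTF-8; any escape characters in them are doubled as described in the lexical rules.

```python
ESC = b"\x1b"
SEP = ESC + b","  # escape followed by comma separates the elements


def build_request(code: str, request_id: str, parse_tree_id: str,
                  start: int, end: int) -> bytes:
    """Build ESC CODE request-id , parse-tree-id , start , end ESC code."""
    if len(code) != 1 or not ("A" <= code <= "Z"):
        raise ValueError("request code must be a single upper case letter")
    fields = [
        request_id.encode("utf-8").replace(ESC, ESC + ESC),
        parse_tree_id.encode("utf-8").replace(ESC, ESC + ESC),
        str(start).encode("ascii"),
        str(end).encode("ascii"),
    ]
    return (ESC + code.encode("ascii") + SEP.join(fields)
            + ESC + code.lower().encode("ascii"))


# A type request for the node spanning byte offsets 10 to 12.
packet = build_request("T", "req1", "tree1", 10, 12)
```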

Responses follow a similar structure to the request. The start and end code for a response is the same as the start and end for the request. All responses contain the actual start and end points of the current tree. If there is no parse tree the start and end offsets will be zero. An unrecognised command will return an empty response for forwards compatibility. Where a command is invalid or unrecognised the response will be

ESC CODE request-id ESC ',' parse-tree-id ESC ',' startoffset ESC ',' endoffset ESC code

In particular, because the IDE may issue requests while a compilation is running, the parse tree id in the request packet may not match the current parse tree within Poly/ML. In that case the parse tree id in the result packet will be the id of the current parse tree and not the parse tree id in the request. The IDE must keep a list of the requests it has sent along with the parse tree id used in each. If it receives a response with a different parse tree id it should reissue the request, adjusting the offsets to account for any changes.
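
The bookkeeping this requires might look like the following Python sketch. The class and method names are hypothetical, and adjusting the offsets before reissuing is left to the caller.

```python
class PendingRequests:
    """Remember each outstanding request so stale responses can be detected."""

    def __init__(self):
        self._pending = {}  # request-id -> (code, parse-tree-id, start, end)

    def record(self, request_id, code, parse_tree_id, start, end):
        """Call when a request is sent."""
        self._pending[request_id] = (code, parse_tree_id, start, end)

    def resolve(self, request_id, response_tree_id):
        """Call when a response arrives.

        Returns None if the response refers to the tree the request was
        made against, otherwise the original request so it can be reissued.
        Raises KeyError for a request-id that was never recorded.
        """
        request = self._pending.pop(request_id)
        if request[1] == response_tree_id:
            return None
        return request


pending = PendingRequests()
pending.record("q1", "T", "tree1", 4, 9)
stale = pending.resolve("q1", "tree2")  # Poly/ML answered for a newer tree
```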

O Request:

ESC 'O' request-id ESC ',' parse-tree-id ESC ',' start-offset ESC ',' end-offset ESC 'o'
O Response:
ESC 'O' request-id ESC ',' parse-tree-id ESC ',' start-offset ESC ',' end-offset commands ESC 'o'
The commands are a set of strings separated by ESC ','. At the moment the strings are all single characters and identify the set of valid commands and sub-commands: i.e. T,I,J,S,V,U,C,N and P. For forward compatibility the IDE should accept but ignore other strings.
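
A Python sketch of reading an O response follows. It assumes that each command string, including the first, is preceded by ESC ',' and that the identifiers contain no escape characters; parse_properties is a hypothetical name. Unknown strings are dropped, as the text recommends.

```python
ESC = b"\x1b"
KNOWN_COMMANDS = frozenset("TIJSVUCNP")  # the strings listed in the text


def parse_properties(packet: bytes):
    """Return (request_id, parse_tree_id, start, end, commands)."""
    if not (packet.startswith(ESC + b"O") and packet.endswith(ESC + b"o")):
        raise ValueError("not an O response")
    fields = packet[2:-2].split(ESC + b",")
    if len(fields) < 4:
        raise ValueError("O response is missing header fields")
    request_id = fields[0].decode("utf-8")
    parse_tree_id = fields[1].decode("utf-8")
    start = int(fields[2].decode("ascii"))
    end = int(fields[3].decode("ascii"))
    # Accept but ignore strings that are not recognised.
    commands = {f.decode("utf-8", "replace") for f in fields[4:]} & KNOWN_COMMANDS
    return request_id, parse_tree_id, start, end, commands


result = parse_properties(b"\x1bOr2\x1b,t1\x1b,4\x1b,9\x1b,T\x1b,I\x1b,Z\x1b,U\x1bo")
```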

T Request:

ESC 'T' request-id ESC ',' parse-tree-id ESC ',' start-offset ESC ',' end-offset ESC 't'
T Response:
ESC 'T' request-id ESC ',' parse-tree-id ESC ',' start-offset ESC ',' end-offset ESC ',' type-string ESC 't'
Returns the type of the selected node as a string. There is currently no mark-up in the type. If this is not an expression this returns
ESC 'T' request-id ESC ',' parse-tree-id ESC ',' start-offset ESC ',' end-offset ESC 't'

I Request:

ESC 'I' request-id ESC ',' parse-tree-id ESC ',' startoffset ESC ',' endoffset ESC ',' request-type ESC 'i'
I Response:
ESC 'I' request-id ESC ',' parse-tree-id ESC ',' start-offset ESC ',' end-offset ESC ',' filename ESC ',' startline ESC ',' startlocation ESC ',' endlocation ESC 'i'
These requests return information about an identifier. Poly/ML finds the identifier at the given location and returns a location that depends on the request-type: the location where the identifier was defined if it is 'I'; the location where the identifier was added to the current name space by an "open" declaration if it is 'J'; and the location where the opened structure was declared if it is 'S'. The first two offsets, as usual, indicate the position of the use of the identifier. The remaining information relates to the declaration and is similar to a D block in mark-up. If the selected node is not an identifier or, where appropriate, did not come from opening a structure, the response is
ESC 'I' request-id ESC ',' parse-tree-id ESC ',' startoffset ESC ',' endoffset ESC 'i'
The request types are:
I Return declaration location
J Return the location where an identifier was opened
S Return the location of an identifier's parent structure

M request:

ESC 'M' request-id ESC ',' parse-tree-id ESC ',' startoffset ESC ',' endoffset ESC ',' action ESC 'm'
The action string defines the direction of the move. Currently all moves are single character strings.
"U" Move to the parent node
"C" Move to the first child node
"N" Move to the next sibling
"P" Move to the previous sibling
M Response:
ESC 'M' request-id ESC ',' parse-tree-id ESC ',' startoffset ESC ',' endoffset ESC 'm'
PolyML selects the parse tree node at the given location and moves relative to it. If the move is not possible the location remains unchanged. If the move is successful the new location is returned in the packet.

V Request:

ESC 'V' request-id ESC ',' parse-tree-id ESC ',' start-offset ESC ',' end-offset ESC 'v'
V Response:
ESC 'V' request-id ESC ',' parse-tree-id ESC ',' start-offset ESC ',' end-offset ESC ',' start-ref1 ESC ',' end-ref1 ESC ',' .... start-refn ESC ',' end-refn ESC 'v'
Returns the list of local references to an identifier if the parse tree location refers to a local value identifier. The results are a sequence of start and end offsets. If the specified location is not a local value identifier the result is
ESC 'V' request-id ESC ',' parse-tree-id ESC ',' start-offset ESC ',' end-offset ESC 'v'
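
The offset pairs in a V response can be unpacked as in this Python sketch. parse_references is a hypothetical name, and the identifiers are assumed to contain no escape characters so that the packet can be split on ESC ','.

```python
ESC = b"\x1b"


def parse_references(packet: bytes):
    """Return (request_id, parse_tree_id, node_span, reference_spans)."""
    if not (packet.startswith(ESC + b"V") and packet.endswith(ESC + b"v")):
        raise ValueError("not a V response")
    fields = packet[2:-2].split(ESC + b",")
    request_id = fields[0].decode("utf-8")
    parse_tree_id = fields[1].decode("utf-8")
    numbers = [int(f.decode("ascii")) for f in fields[2:]]
    if len(numbers) < 2 or len(numbers) % 2 != 0:
        raise ValueError("offsets must come in start/end pairs")
    spans = list(zip(numbers[0::2], numbers[1::2]))
    # The first pair is the span of the selected node; the rest are the
    # references. The list is empty if the node is not a local identifier.
    return request_id, parse_tree_id, spans[0], spans[1:]


refs = parse_references(b"\x1bVr3\x1b,t1\x1b,4\x1b,5\x1b,4\x1b,5\x1b,20\x1b,21\x1bv")
```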

Compilation.

In order to compile a piece of text the IDE sends it to ML through the protocol. Because any previous compilation may have executed code and affected the global state it is assumed that the IDE will set up some form of context for the file by previously saving some state. Typically, this would require it to have compiled all the files that this particular piece of source text will depend on and to have saved it in a saved state. A compilation request therefore has the following structure:

ESC 'R' request-id ESC ',' sourcename ESC ',' startposition ESC ',' prelude-length ESC ',' source-length ESC ',' prelude ESC ',' source-text ESC 'r'
'prelude' is a piece of ML code to compile and execute before the user-supplied code. Typically this will include a call to PolyML.SaveState.loadState to load the initial state for the compilation.
'source-text' is the ML code that is being compiled and executed.
'sourcename' is the name to be used as the file name when reporting locations.
'startposition' is the value to be used as the initial offset in the file. The usual case is that this is zero or a null string which is taken as zero.
'prelude-length' and 'source-length' are the number of bytes in the prelude and source text respectively. Since the prelude and source text may be large it is much more efficient to use these lengths to read the input. Because these lengths are provided it is not necessary to search for the escape code within the text and so if the escape character appears within the text it is not itself escaped.
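
A compilation request could be assembled as in the following Python sketch. build_compile_request is a hypothetical name. The prelude and source are passed as bytes so that the lengths are byte counts, and they are written without escaping, as described above.

```python
ESC = b"\x1b"
SEP = ESC + b","


def build_compile_request(request_id: str, source_name: str,
                          prelude: bytes, source: bytes,
                          start_position: int = 0) -> bytes:
    """Build an R request; prelude and source are sent raw, by length."""
    header = [
        request_id.encode("utf-8").replace(ESC, ESC + ESC),
        source_name.encode("utf-8").replace(ESC, ESC + ESC),
        str(start_position).encode("ascii"),
        str(len(prelude)).encode("ascii"),  # prelude-length in bytes
        str(len(source)).encode("ascii"),   # source-length in bytes
    ]
    return (ESC + b"R" + SEP.join(header)
            + SEP + prelude + SEP + source + ESC + b"r")


# An empty prelude and a one-line source file.
packet = build_compile_request("r1", "a.sml", b"", b"val x = 1;")
```

In practice the prelude would usually contain ML code such as a call to PolyML.SaveState.loadState, as noted above.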

Poly/ML responds with a result block. The format of the result block depends on the result of the compilation and possible execution of the code. The result block has the form:

ESC 'R' request-id ESC ',' parse-tree-id ESC ',' result ESC ',' finaloffset ESC ';' errors_and_messages ESC 'r'
"result" is a single character indicating success or failure.
"finaloffset" is the byte position that indicates the extent of the valid parse tree. If there was an error this may be less than the end of the input. It may be the start of the input if there was a syntax error and no parse tree could be created. As usual "request-id" is the ID of the request. The "parse-tree-id" will normally be the same as the "request-id" indicating that the compilation has updated the parse tree, even if type checking failed. However, if there was a failure, such as during parsing, that meant that no new parse tree could be produced the ID returned will be the original parse tree ID, or the empty string if there was none.
Error packets within the errors_and_messages have the same format as described above for mark-up.

The result codes are
S - Success. The file compiled successfully and ran without an exception.
X - Exception. The file compiled successfully but raised an exception when it ran.
L - The prelude code failed to compile or raised an exception.
F - Parse or type checking failure.
C - Cancelled during compilation.
The parse tree will be updated to reflect the result of the compilation and the current parse tree identifier used by Poly/ML will be set to the identifier supplied in the request.
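
The header of a result block can be separated from its messages as in this Python sketch. The names are hypothetical, and the identifiers are assumed to contain no escape characters. The messages are returned undecoded, since they use the mark-up format described earlier.

```python
from typing import NamedTuple

ESC = b"\x1b"

RESULT_CODES = {
    "S": "success",
    "X": "exception raised when run",
    "L": "prelude failed",
    "F": "parse or type checking failure",
    "C": "cancelled during compilation",
}


class CompileResult(NamedTuple):
    request_id: str
    parse_tree_id: str
    result: str        # one of the keys of RESULT_CODES
    final_offset: int
    messages: bytes    # raw errors_and_messages section, still marked up


def parse_result(packet: bytes) -> CompileResult:
    """Split an R result block into its header fields and message section."""
    if not (packet.startswith(ESC + b"R") and packet.endswith(ESC + b"r")):
        raise ValueError("not an R result block")
    header, marker, messages = packet[2:-2].partition(ESC + b";")
    if not marker:
        raise ValueError("result block has no ESC ';' after the header")
    request_id, parse_tree_id, result, final_offset = header.split(ESC + b",")
    return CompileResult(request_id.decode("utf-8"),
                         parse_tree_id.decode("utf-8"),
                         result.decode("ascii"),
                         int(final_offset.decode("ascii")),
                         messages)


outcome = parse_result(
    b"\x1bRr7\x1b,r7\x1b,F\x1b,12\x1b;"
    b"\x1bEE\x1b,a.sml\x1b,1\x1b,3\x1b,5\x1b;oops\x1be\x1br")
```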

For a result code of S (compiled successfully) there may be warnings.

For a result code of L (prelude code failed) the result packet contains the exception packet that was returned and has the form:

ESC 'R' request-id ESC ',' parse-tree-id ESC ',' 'L' ESC ',' startOffset ESC ';' error_message ESC 'r'

For a result code of F (parsing or type checking failed) the result packet contains a list of one or more error packets. The format of the result packet is:

ESC 'R' request-id ESC ',' parse-tree-id ESC ',' 'F' ESC ',' endOffset ESC ';' error_packets ESC 'r'

Where the error packets have the same format as described above for mark-up.

For a result code of C (compilation cancelled) the result may or may not contain error packets, depending on whether the compilation had produced error messages before it was cancelled.

For a result code of X the errors_and_messages section of the result packet contains the exception message first, within an X tag, as a string. The string may also contain output mark-up such as the D-style mark-up showing the location where the exception was raised. Thus the format of the packet with exception data is:

ESC 'R' request-id ESC ',' parse-tree-id ESC ',' 'X' ESC ',' startOffset ESC ';' ESC 'X' exception_message ESC 'x' error_packets ESC 'r'

Cancellation.

Compilation is run as a separate thread and may be cancelled using the K-request.

ESC 'K' request-id ESC 'k'

The request identifier is the identifier used in the R-request that should be cancelled. Unlike other requests this packet does not have its own identifier nor does it have a direct response.
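
For completeness, a cancel packet could be built as in this small Python sketch; build_cancel is a hypothetical name.

```python
ESC = b"\x1b"


def build_cancel(compile_request_id: str) -> bytes:
    """Build a K request naming the R request that should be cancelled."""
    request_id = compile_request_id.encode("utf-8").replace(ESC, ESC + ESC)
    return ESC + b"K" + request_id + ESC + b"k"


packet = build_cancel("r1")
```

Since there is no direct response, the IDE learns the outcome only from the result block of the compilation it tried to cancel.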

The action on receiving a cancel request depends on the current state of the compilation. If the compilation has already finished no action is taken, since Poly/ML will already have sent a result packet for it. If the compilation is in progress Poly/ML will attempt to cancel it by sending an interrupt to the compilation thread to ask it to terminate. If the thread is in the compiler when the interrupt is received the result will be a C result code. If it is executing the compiled code, that code will receive the Interrupt exception and, assuming the code does not trap the exception, the result will be an X result code. The thread may have completed before the interrupt is processed, so any other result is also possible.