Little Island API Description
It is intended that the Little Island system be accessible to users via any system they see fit. Development of alternative client software is encouraged. This API description is intended to help developers who wish to understand and build upon the Little Island system as a whole, and also to help the developers of third-party client applications access the existing system.
There are two primary scripts used for interaction with the server: the read script (usually
read.pl) used for read-only actions such as requesting the details of a node or a
cache page; and the write script (usually littleisland.pl) used for anything that
makes changes to the system's database such as logging in and out and posting comments.
Input Details
- Character set
-
In general, all data sent to the Little Island server is expected in UTF-8. However, characters may be URI encoded and will be decoded by the server (see Escaping / Encoding).
- Format
-
Data sent to the server takes the form of a standard query string in the body of an HTTP request, e.g.
uuid=1234&unid=12&title=My Title&...and so on
This is true regardless of whether the request changes data on the server or not. At present the server will accept GET data as well but this should not be depended upon in future.
- Escaping / Encoding
-
For compatibility with both JavaScript escaped strings and those encoded automatically by browsers through form submission, the %, +, & and = characters within values must always be 'percent-encoded'. Regardless of context the + character will always be decoded by the server as a space, the & as a data element delimiter, and the = as key/value separator.
All characters may be presented, prior to percent-encoding, as named or numeric (decimal or hex) HTML entities which will be decoded to actual characters by the server. Of course, after percent-encoding these entities should look like this: %26#x003D; (=). At the server the entities are parsed by the system's HTML validator and require additional resources to process. Because of this, percent-encoding is preferable. It should be noted that for compatibility with permitted XHTML input and with the XML output of the server the named entities &lt;, &gt;, &amp;, &quot; and &apos; will not be converted to characters. Also, any instance of the raw characters <, >, &, " or ' that is not part of valid XHTML markup will be either converted to its named entity or elided (depending on context) by the HTML validator after decoding. In the case of the apostrophe the conversion is always from ' to &apos;, not &#39; as required by Internet Explorer. The server's output will always use &apos; so any project that requires this data to be displayed in IE will need to make the conversion itself.
The current system can interpret Unicode characters URI encoded in JavaScript via either encodeURIComponent() or escape(). This is even true of characters whose Unicode values are greater than 0xFFFF, which are represented by the client system as UTF-16 surrogate pairs that are then encoded as a pair of Unicode-style percent-encoded sequences by escape(). For example the character 𤏐 (Unicode code point U+243D0, or 148432 decimal) will be encoded by escape() as %uD850%uDFD0, and by encodeURIComponent() as %F0%A4%8F%90, both of which the server is capable of interpreting. These two encoding methods may be used together in the same request but never in the same value. Mixing these methods will cause the encodeURIComponent() encoded text to be returned as mojibake. In order to interpret the Unicode-style percent encoding the server must decode the string from UTF-8 before translating the represented wide character. But when interpreting the standard percent-encoded string the UTF-8 decoding must occur afterward if the represented bytes are to be picked up as a single character.
- Encryption / Security
-
All hashing performed by the client should use MD5 to produce 32 character hex encoded strings, e.g.
75d4b2d3b9f7fba692875c0aa6bc21bd
Normally passwords are concatenated with the user name, encoded as UTF-8, then hashed. This hash is then concatenated with a random token shared by the server prior to each transaction and hashed again. This second hash is then sent with the user's UUID and loginid to authorise write transactions.
A normal transaction will follow this basic pattern:
- Client concatenates the password and user name ("passwordusername").
- Client converts the string to UTF-8.
- Client produces an MD5 hash of the UTF-8 string which is stored for future use until the user logs out.
- At the beginning of a transaction the client concatenates this hash with a token received from the server ("hashtoken").
- Client produces an MD5 hash of the new string.
- Client sends this hash to the server along with the user's UUID.
- Server uses the UUID to retrieve the user's password hash and most recent token from the database.
- Server concatenates the password hash and token ("hashtoken").
- Server produces an MD5 hash of the new string and compares the result with the received hash.
- Server either authorises the user for that transaction if the hashes match, or rejects the transaction.
- Server sends the authorised user a new randomly generated token with the subsequent response while storing a copy in the database.
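The client side of the steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference client: the function names `md5_hex`, `password_hash` and `transaction_hash` are invented for this example, and the password and token are placeholders.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the 32-character hex MD5 digest of text encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def password_hash(password: str, name: str) -> str:
    # First three steps: concatenate password + user name, encode as UTF-8,
    # hash. The client keeps this value until the user logs out.
    return md5_hex(password + name)


def transaction_hash(stored_hash: str, token: str) -> str:
    # Next two steps: concatenate the stored hash with the server's token
    # and hash the result. This is the value sent under the key "hash".
    return md5_hex(stored_hash + token)


# Placeholder credentials and token, for illustration only.
stored = password_hash("secret", "John Smith")
one_time = transaction_hash(stored, "yESC8kFDur1d7qAAX8elMF4yAP1MaWaq")
print(len(stored), len(one_time))  # 32 32
```

The server performs the same second step with its stored copy of the password hash and the most recent token, so the two digests match only when both sides hold the same values.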
The message digest of (password+name)+token is always sent under the key hash in requests.
As SSL and the related costs are rather more trouble than the project can afford right now, the Little Island server security policy for client/server communication is forced to focus on protecting sensitive data in transit rather than in storage.
Without end-to-end encryption the problem breaks down like this: a registered password can be encrypted in the database but this will then require the password given at login to be received in plain text for the server to confirm a user's identity; alternatively the password can be encrypted for transit but will then require the plain text password from the database for confirmation. So the password must be encrypted for transit or storage. It can't be both.
Ultimately the server will need to use SSL/TLS but that's something for the future.
Passwords are stored as password+name MD5 hashes in the database but this merely treats the MD5 hash as the raw password. In this case, security breaches that expose the hashed password column of the user table carry many of the same penalties as exposing a column containing plain text passwords. This is mitigated to some extent by the use of raw passwords in account-altering actions such as changing the password or the registered email address.
Output Details
- Character set
-
All data sent from the Little Island server will be encoded as UTF-8.
- Format
-
All output from the server will be in XML. The element keys that can be expected for each type of transaction are described below.
- Encryption / Security
-
Output from the server will not be encrypted, nor will it contain any value that represents a message digest.
Notes on Transactions
As described above all requests sent to the Little Island server should be in the form of a standard query string in
the body of the HTTP request. The first key in every request should be action with a value corresponding
to those listed below. There are no default actions on either the read or write scripts.
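As a rough sketch of how a client might assemble such a request body, the Python below puts action first and percent-encodes every value. The helper name `build_body` is hypothetical, and actually sending the body (an HTTP POST to the read or write script) is left out because the server addresses are not given here.

```python
from urllib.parse import quote


def build_body(action: str, **fields: str) -> str:
    """Build a query-string request body with `action` as the first key."""
    pairs = [("action", action), *fields.items()]
    # quote(..., safe="") encodes the text as UTF-8 and percent-encodes
    # %, +, & and = (as well as spaces) inside each value.
    return "&".join(f"{key}={quote(str(value), safe='')}" for key, value in pairs)


body = build_body("profile", uuid="1234")
print(body)    # action=profile&uuid=1234

tricky = build_body("test", title="1+1=2 & 100%")
print(tricky)  # action=test&title=1%2B1%3D2%20%26%20100%25
```

Encoding spaces as %20 rather than + avoids any ambiguity with the server's rule that + always decodes to a space.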
All date/time data is sent from the server as timestamps of the form YYYY-MM-DD HH:MM:SS.
All time handling is performed by the server using UTC and no time values are expected from the client.
Any timezone processing should be carried out by the client.
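A client might handle these timestamps along the following lines. This is only a sketch: the helper name `parse_server_time` is invented here, and treating an empty value (such as an unedited comment's edited element) as `None` is an assumption about what is convenient for the client.

```python
from datetime import datetime, timezone
from typing import Optional

SERVER_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_server_time(stamp: str) -> Optional[datetime]:
    """Parse a server timestamp as UTC; return None for an empty value."""
    if not stamp:
        return None
    return datetime.strptime(stamp, SERVER_FORMAT).replace(tzinfo=timezone.utc)


created = parse_server_time("2009-01-28 23:12:46")
print(created.isoformat())    # 2009-01-28T23:12:46+00:00
local = created.astimezone()  # converted to the machine's local timezone for display
```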
For compatibility with the XML standards all responses from the server will come with a root element using an arbitrary name. Responses from the server should always be treated in the context of the request rather than the name of this root element.
Sibling elements in XML responses from the server should not be expected in any particular order.
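In practice this means a client should ignore the root element's name and look child elements up by name rather than by position. A minimal Python sketch, using the example response from action=token:

```python
import xml.etree.ElementTree as ET

response = """<newtoken>
  <token>Oqs5xUgXyKXL5PcHDQIThToZRunmqEdE</token>
</newtoken>"""

root = ET.fromstring(response)
# root.tag is deliberately not inspected; children are keyed by element name,
# so their order in the response does not matter.
fields = {child.tag: (child.text or "") for child in root}
print(fields["token"])  # Oqs5xUgXyKXL5PcHDQIThToZRunmqEdE
```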
Tip: The Little Island Client Greasemonkey script has a debugging option that requires
editing to activate. At the top of the script, around line 50, change the value assigned to
$LIcfg.debug to true. This will add two panels with fixed positions at the top of
the window when the main panel is open. The panel on the left will show the body of the requests sent by
the client. The panel on the right will show the body of the response.
Read Transactions
- action=node
-
The node transaction requires the key url with a value retrieved from the document.location property of the current browser page. This will include the percent-encoding which must be percent-encoded again. If you are using a custom encoding routine then be sure to percent-encode commas to %252C as they are used as list delimiters for this action.
If the page is registered (that is, if it has had comments posted to it) the server will return the details of the node with these elements:
- unid This is the Unique Node IDentifier which corresponds to the arbitrary code used to identify the page address in the database. This should be retained for use in subsequent transactions.
- scheme This is the part of the URL between the beginning of the address and the first occurrence of the string ://, e.g. http. When using this to reconstruct the address, if the scheme has any length then it must be followed by ://. Under certain circumstances the scheme and authority or path may use the short form separator : but its proper use is still under consideration.
- authority This is, usually, the domain part of the URL such as www.example.com (note that this includes the sub-domain and TLD). When reconstructing the address, if either the scheme or authority has any length then a / must follow.
- path This is the section of the URL that falls between the / appended to the authority part and the end of the URL, the ? at the beginning of the query string, or the # at the beginning of a fragment. When using this to reconstruct the address nothing should be appended or prepended. The path has a special function in cases where the address does not conform to the standard URL specification. In these cases it may be that the entire address string is stored in this value leaving the others empty.
- query This is the section of the URL between the first occurrence of ? and either the end of the address or the # at the beginning of a fragment. When using the query to reconstruct the URL, if it has any length it must be prepended with ?.
- url This is the URL as it is stored in the database. This may differ significantly from the URL sent in the initial request as the server removes extraneous sections such as empty query string values and fragments.
- count This is the number of comments currently posted to the node. Although nodes are not generated until the first comment is posted it is still possible for nodes to have a count of zero since comments can also be deleted.
- cachepages This is the number of cache pages stored in the database. Cache pages are one-based so a cache page count of 3 indicates that pages 1, 2 and 3 are available.
- lastcache This is the time of the last post or edit on, or deletion from the node.
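The reconstruction rules given for scheme, authority, path and query can be sketched as a small Python function. The name `rebuild_url` is hypothetical, and the short form : separator is not handled because its use is described as still under consideration.

```python
def rebuild_url(scheme: str, authority: str, path: str, query: str) -> str:
    """Reassemble an address from the parts of a <node> element."""
    url = ""
    if scheme:
        url += scheme + "://"   # a non-empty scheme is followed by ://
    url += authority
    if scheme or authority:
        url += "/"              # a / follows if either part has any length
    url += path                 # nothing is added before or after the path
    if query:
        url += "?" + query      # a non-empty query is prefixed with ?
    return url


print(rebuild_url("http", "www.google.ie", "search", "hl=en&q=little+island"))
# http://www.google.ie/search?hl=en&q=little+island

# A non-standard address stored entirely in the path comes back unchanged.
print(rebuild_url("", "", "about:blank", ""))
# about:blank
```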
Before trying to use these values, especially the content of the url element, the XML entities must be decoded to their actual characters, such as translating &amp; to &. No other decoding should be necessary.
The node details come wrapped in a <node> element within the root element of the XML response. It is possible to request more than one node at a time by providing them as a comma delimited list of up to 25 addresses (be sure to encode any commas within the addresses as %252C, the percent encoding of %2C). In such cases the server will return each matching node found within sibling <node> elements in the root element of the response.
Example request:
action=node&url=http%3A%2F%2Fwww.google.ie%2Fsearch%3Fhl%3Den%26q%3Dlittle%2Bisland%26btnG%3DGoogle%2BSearch%26meta%3D
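A request like the one above could be produced roughly as follows. This is a sketch under assumptions: the function names are invented, and Python's `quote` stands in for the browser's encodeURIComponent(), which differs slightly in which punctuation it leaves unescaped.

```python
from typing import List
from urllib.parse import quote


def encode_address(address: str) -> str:
    # Commas delimit the address list, so a literal comma is first written
    # as %2C; encoding the whole string then turns that into %252C.
    return quote(address.replace(",", "%2C"), safe="")


def node_request(addresses: List[str]) -> str:
    """Build an action=node body for between 1 and 25 page addresses."""
    if not 1 <= len(addresses) <= 25:
        raise ValueError("the node action accepts between 1 and 25 addresses")
    return "action=node&url=" + ",".join(encode_address(a) for a in addresses)


page = "http://www.google.ie/search?hl=en&q=little+island&btnG=Google+Search&meta="
print(node_request([page]))
```

With the address shown, the printed body is identical to the example request. Any percent sign already present in an address is encoded again as %25, which gives the double encoding the server expects.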
Example response:
<nodelist>
  <node>
    <unid>321</unid>
    <scheme>http</scheme>
    <authority>www.google.ie</authority>
    <path>search</path>
    <query>hl=en&amp;q=little+island&amp;btnG=Google+Search</query>
    <count>124</count>
    <cachepages>1</cachepages>
    <lastcache>2009-01-28 23:12:46</lastcache>
  </node>
</nodelist>
- action=read
-
The read transaction is used to request cached comments for a particular node. The maximum number of full comments in a cache page will be approximately 100-200, though the caching system is flexible up to the maximum length of the database column holding cached pages (this allows the caching system to deal with very long comment threads).
The cache page will contain all comments as sibling elements named <comment> which should not be expected in any particular order. The first page will contain the most recently posted comment but not necessarily the most recently edited. The threads should be organised by the client with the most recently commented thread at the top (only surviving comments are tested so deleting a reply comment that has no replies itself can lead to a thread being "un-bumped"). The caching system temporarily organises threads to determine pagination but writes them to the cache page in an arbitrary order.
Each comment element will contain all of these child elements:
- ucid The Unique Comment IDentifier is used to identify the comment in the database.
- parent If this is a reply comment then this will be the UCID of the parent comment, otherwise it will be zero.
- thread If this comment is a reply then this will be the UCID of the root comment of the thread. If this is a root comment itself then this will either be its own UCID or zero.
- name This is the user name of the comment's author. Any XML sensitive characters will be translated to named entities; if you need the raw user name you will need to reverse the translation.
If the comment contains no further elements but the name element contains a string of any length, then it is masked. Masked comments are intended to be hidden by the client. Comments toward the root of a thread are masked when a thread exceeds a certain length (usually 25-50 comments). These comments require some method for un-masking through requesting a specific comment or comments via action=comment.
If the name is empty or the element is missing, it means this comment is a shell left after deletion of a comment with replies and used to maintain thread structure. It is possible for both shell comments and the parent and reply comments to be masked, therefore it is still necessary to provide some method for un-masking via the deleted comment.
In future, very long threads may require another method of indicating masked ranges of comments simply to do away with the per-comment XML markup and reduce the size of the server response.
The following elements will appear in normal, unmasked, non-shell comments:
- created The date and time that the comment was posted. This should not be used to sort comments chronologically as it is not included in masked comments. UCIDs are auto-incremented and should be suitable for the same purpose with less processing.
- edited The date of the last edit to the comment. This will be empty if the comment has not been edited.
- uuid The Unique User IDentification number identifies the user in the database. This should be used for requesting the user's profile and identifying the visitor as owner so that Edit and Delete options can be made available.
- title The title of the comment should contain no HTML. Any XML characters will be in the form of named entities.
- body The body may contain XML compatible HTML. Comment bodies may have newlines which should be treated as line-breaks of some form. The current Little Island Client translates newlines to <br />. If the body is empty but the name is present then this will be a masked comment.
- abrv If this element is present it means the body only contains a portion of the available text. There needs to be some means for a user to ask the client application to download the full comment. For now this element, if present, will contain an integer representing the total number of words in the full comment body. In future this may be expanded with an attribute to indicate the units used (values such as 'words', 'characters', 'bytes' or 'kilobytes').
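The masked, shell and normal cases described above, together with the thread ordering the client is expected to perform, might be handled along these lines. This is a sketch only: the function names and the sample XML are invented for illustration, and treating a body as empty when it has neither text nor child markup is an assumption drawn from the descriptions above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def has_content(element) -> bool:
    """True if the element exists and holds text or child markup."""
    if element is None:
        return False
    return bool((element.text or "").strip()) or len(element) > 0


def classify(comment) -> str:
    """Return 'shell', 'masked' or 'normal' for a <comment> element."""
    if not (comment.findtext("name") or "").strip():
        return "shell"    # deleted comment kept to preserve thread structure
    if not has_content(comment.find("body")):
        return "masked"   # hidden until fetched via action=comment
    return "normal"


def thread_order(comments) -> list:
    """Return thread root UCIDs, most recently commented thread first."""
    newest = defaultdict(int)
    for comment in comments:
        ucid = int(comment.findtext("ucid"))
        # A thread value of zero means the comment is its own thread root.
        root = int(comment.findtext("thread") or 0) or ucid
        if classify(comment) == "shell":
            newest[root] = max(newest[root], 0)     # shells never bump a thread
        else:
            newest[root] = max(newest[root], ucid)  # UCIDs auto-increment
    return sorted(newest, key=newest.get, reverse=True)


# Invented sample data, for illustration only.
sample = ET.fromstring("""<commentlist>
  <comment><ucid>10</ucid><parent>0</parent><thread>0</thread><name>Ann</name>
    <uuid>1</uuid><body>Root</body></comment>
  <comment><ucid>11</ucid><parent>0</parent><thread>0</thread><name>Bob</name></comment>
  <comment><ucid>12</ucid><parent>10</parent><thread>10</thread><name></name></comment>
  <comment><ucid>13</ucid><parent>12</parent><thread>10</thread><name>Cy</name>
    <uuid>3</uuid><body>Reply</body></comment>
</commentlist>""")

comments = sample.findall("comment")
print([classify(c) for c in comments])  # ['normal', 'masked', 'shell', 'normal']
print(thread_order(comments))           # [10, 11]
```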
Example Request:
action=read&unid=321&page=1
Example Response normal comment:
<commentlist>
  <comment>
    <ucid>2345</ucid>
    <parent>2344</parent>
    <thread>2300</thread>
    <name>John Smith</name>
    <created>2009-01-01 00:00:01</created>
    <edited></edited>
    <uuid>1234</uuid>
    <title>Comment Title</title>
    <body>Comment body containing <em>HTML</em>.</body>
  </comment>
  <comment> [ ... ] </comment>
  <comment> [ ... ] </comment>
</commentlist>
- action=comment
-
This is a request for individual comments and requires the unid of the node to which the comments belong and the ucids of the comments. Multiple comments may be requested up to the server's free_return_limit (usually <=50). If multiple comments are requested they must all be from the same node.
The response for this request will be almost identical to that of action=read save that none of the comments will be abbreviated.
Example request:
action=comment&unid=321&ucid=1234,1235,1238
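When a client needs more comments than a single request allows, it could split the UCIDs into batches. The sketch below assumes a limit of 50, following the "usually <=50" figure given for free_return_limit; the function name `comment_requests` is hypothetical.

```python
def comment_requests(unid: int, ucids, limit: int = 50):
    """Yield action=comment request bodies with at most `limit` UCIDs each."""
    ucids = list(ucids)
    for start in range(0, len(ucids), limit):
        batch = ",".join(str(u) for u in ucids[start:start + limit])
        yield f"action=comment&unid={unid}&ucid={batch}"


print(list(comment_requests(321, [1234, 1235, 1238])))
# ['action=comment&unid=321&ucid=1234,1235,1238']
```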
- action=profile
-
This is a request for a user profile and requires only the user's uuid.
If the uuid refers to a valid user, the response will contain all of these elements, some of which may be empty:
- uuid This will be the uuid used to access the profile and is returned only for completeness. However, future versions of the server may provide a name lookup.
- name The name of the user will contain named entities of any XML characters which will need to be translated to actual characters if the name is to be used for anything other than display in an HTML document.
- registered The date and time the account was registered.
- fullname This is suggested as the user's full name though there is no specific requirement for it to be so, nor is there a limit on what other information this field may hold other than to its length and that it will contain no HTML. Essentially this is the equivalent of a comment title.
- bio This is suggested to be any biographical information the user wishes to share and may be considered as a signature of sorts. There are no specific requirements or limits as to the content other than size. This is the equivalent of a comment body.
- accessed This is the last date and time that a token was requested for this account.
Example Request:
action=profile&uuid=1234
Example Response:
<profilelist>
  <profile>
    <uuid>1234</uuid>
    <name>John Smith</name>
    <registered>2009-01-01 00:00:01</registered>
    <accessed>2009-02-01 23:59:59</accessed>
    <fullname>John Smith</fullname>
    <bio>Some biographical details containing: <em>XML compatible HTML</em> and newlines.</bio>
  </profile>
</profilelist>
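Because a bio (like a comment body) may contain inline markup, reading only the text of the element loses everything from the first child tag onward. One way to recover the full content with Python's standard XML parser is sketched below; the helper name `inner_xml` is invented for this example.

```python
import xml.etree.ElementTree as ET


def inner_xml(element) -> str:
    """Serialise an element's content, keeping inline markup such as <em>."""
    parts = [element.text or ""]
    for child in element:
        # tostring() also emits the text that follows the child (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


profile = ET.fromstring(
    "<profile><name>John Smith</name>"
    "<bio>Some details containing: <em>XML compatible HTML</em> and newlines.</bio>"
    "</profile>"
)
bio = profile.find("bio")
print(inner_xml(bio))
# Some details containing: <em>XML compatible HTML</em> and newlines.
```

The parser decodes named entities in plain text fields such as name automatically, so no separate translation step is needed for those.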
Write Transactions
- action=preparelogin
-
User login requires two transactions. action=preparelogin is the first and requires only the key name. This should be the plain text name entered by the user in some form of login dialogue.
If successful, there will be only two elements in the response:
- uuid This is the UUID that corresponds to the user with the given name and should be recorded for all future write transactions.
- logintoken This is a token to be used in the next stage of the login. Currently this is presented as an alphanumeric string of 32 characters from the class [0-9a-zA-Z]. This range may be extended in future but will only include characters from the ASCII compatible, single-byte UTF-8 range and will exclude XML characters.
Example Request:
action=preparelogin&name=John Smith
Example Response:
<login>
  <uuid>1234</uuid>
  <logintoken>slWCfGUUCUiTsHgZLDBAvC7HBgIzYB2e</logintoken>
</login>
- action=login
-
To complete the login the client follows the steps outlined in Encryption/Security starting with the logintoken. Briefly, the client concatenates the password and user name, converts them to UTF-8, and produces an MD5 hash of the result. It then concatenates the result with the logintoken and produces an MD5 hash of this. The result is a unique, one-time code, sent to the server with the key hash, that the server can reconstruct for comparison from stored data. This should be sent along with the user's UUID to complete the login process.
Login tokens are stored separately from tokens used in other transactions. This is intended to prevent DoS attacks in which an attacker repeatedly requests tokens for an account, invalidating the user's last received token and so preventing them from posting comments. Upon successful login the user receives a loginid which authorises them to receive new tokens for general transactions. It is still possible to perform a DoS attack on an account that would prevent the user from logging in, but it is hoped that both client and server systems will prove stable enough that logging in will be an infrequent task for most users.
The response for a successful login will contain the following elements:
- uuid Simply echoing the user's UUID.
- token This is the general use token for any following write transactions. It will have the same characteristics as the logintoken.
- loginid This is an identifying code that authorises the user to receive a new token at any time. Again, this will have the same characteristics as the logintoken and should be stored until the user logs out.
Example Request:
action=login&uuid=1234&hash=bf06da66874cd299605691a64902776b
Example Response:
<login>
  <uuid>1234</uuid>
  <token>yESC8kFDur1d7qAAX8elMF4yAP1MaWaq</token>
  <loginid>Uj7f3KM08bde3jb67jBr471Nbad23mD4</loginid>
</login>
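The second stage of the login can be sketched as a function that turns the action=preparelogin response into an action=login request body. The name `login_body` and the password are placeholders for illustration, and the HTTP transport is omitted.

```python
import hashlib
import xml.etree.ElementTree as ET


def md5_hex(text: str) -> str:
    """Return the 32-character hex MD5 digest of text encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def login_body(prepare_response: str, name: str, password: str) -> str:
    """Build the action=login body from an action=preparelogin response."""
    root = ET.fromstring(prepare_response)
    fields = {child.tag: child.text or "" for child in root}
    stored_hash = md5_hex(password + name)   # kept by the client until logout
    one_time = md5_hex(stored_hash + fields["logintoken"])
    return f"action=login&uuid={fields['uuid']}&hash={one_time}"


prepare_response = (
    "<login><uuid>1234</uuid>"
    "<logintoken>slWCfGUUCUiTsHgZLDBAvC7HBgIzYB2e</logintoken></login>"
)
# "secret" is a placeholder password.
request = login_body(prepare_response, "John Smith", "secret")
print(request[:28])  # action=login&uuid=1234&hash=
```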
- action=token
-
This action is used to request a fresh token from the server which replaces any existing token. Tokens are used to authorise all write actions and each token can only be used once. There may be occasions when a write transaction fails in such a way that the server is unable to replace a used token, perhaps because the connection is interrupted or times out. In these cases the client may request a new token using the user's uuid and loginid.
If successful, the response will contain a single element:
- token An alphanumeric string with characteristics identical to logintoken.
Example request:
action=token&uuid=1234&loginid=Uj7f3KM08bde3jb67jBr471Nbad23mD4
Example response:
<newtoken>
  <token>Oqs5xUgXyKXL5PcHDQIThToZRunmqEdE</token>
</newtoken>
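Since each token authorises only one write, a client has to track whether it currently holds an unused one. The class below is a hypothetical sketch of that bookkeeping: the name `Session` and its methods are not part of the Little Island client, and searching the whole response for a token element is an assumption.

```python
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Session:
    uuid: str
    loginid: str
    stored_hash: str   # MD5 of password + name, kept until logout
    token: str = ""    # empty means a token must be requested before writing

    def auth_fields(self) -> str:
        """Consume the current token and return the uuid, loginid and hash fields."""
        if not self.token:
            raise RuntimeError("no unused token; send token_request() first")
        text = self.stored_hash + self.token
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        self.token = ""   # each token authorises one write only
        return f"uuid={self.uuid}&loginid={self.loginid}&hash={digest}"

    def token_request(self) -> str:
        """Body of the action=token request used to recover from a lost token."""
        return f"action=token&uuid={self.uuid}&loginid={self.loginid}"

    def absorb(self, response_xml: str) -> None:
        """Store the replacement token carried by a successful response."""
        token = ET.fromstring(response_xml).findtext(".//token")
        if token:
            self.token = token


# The stored hash here is a dummy value, for illustration only.
session = Session("1234", "Uj7f3KM08bde3jb67jBr471Nbad23mD4", "0" * 32)
session.absorb("<newtoken><token>Oqs5xUgXyKXL5PcHDQIThToZRunmqEdE</token></newtoken>")
fields = session.auth_fields()
print(session.token_request())
# action=token&uuid=1234&loginid=Uj7f3KM08bde3jb67jBr471Nbad23mD4
```

If a write fails without returning a replacement token, the session is left with no token and the client falls back to `token_request()` before retrying.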
- action=logout
-
This action requires the user's uuid, hash (the standard MD5 digest of (password+name)+token as described above), and their loginid. If successful this action will immediately log the user out of the server by replacing their loginid and token in the database with random values.
The successful response will contain these elements:
- message A simple confirmation of logout which should be communicated to the user.
- uuid The uuid of the user logged out. As a positive integer that only appears on success, this may be used as the indicator for a completed logout.
Example request:
action=logout&uuid=1234&loginid=Uj7f3KM08bde3jb67jBr471Nbad23mD4&hash=1548180defb3d07992ab390f4ca62ecd
Example response:
<logout>
  <message>Logged Out!</message>
  <uuid>1234</uuid>
</logout>
- action=post
-
This action is used to post new comments. Required keys are the standard authorisation details (uuid, loginid, hash) plus the ucid of the parent comment, sent under the key parent (zero for a new root comment). The title may be left blank if the user wishes. The body must have some substance, even if it is merely an <img> tag. Since the body is sent in the body of the HTTP request it may contain raw newlines. Also required is either the unid of an existing node or the url of a new node. When using the url, remember to percent-encode the whole string even if it already contains percent-encoded information.
The response from the server on successfully posting a comment will be a complete comment as described in action=read (though, as in action=comment, it will be neither abbreviated nor masked) together with a replacement token as described in action=token. The comment is provided so that the client can insert it into the comment display as it will appear without the need to refresh from the node cache.
Example request:
action=post&uuid=1234&loginid=Uj7f3KM08bde3jb67jBr471Nbad23mD4&hash=057306522f72739cdd30f92424d5f1bb&unid=321&parent=0&title=New Root Comment&body=This is the multi-line body of a new comment.
Or:
action=post&uuid=1234&loginid=Uj7f3KM08bde3jb67jBr471Nbad23mD4&hash=057306522f72739cdd30f92424d5f1bb&url=http%3A%2F%2Fboingboing.net%2F&parent=0&title=&body=This is the multi-line body of a new comment.
- action=edit
-
Editing a comment requires the uuid, loginid, and hash from the user for authorisation, the unid and ucid of the comment to be edited, and the new title and body. The title and body details are treated by the server in exactly the same way as when posting a new comment.
The response for this request is a complete comment and replacement token identical to action=post but the edited element of the comment will have a date and time later than that reported in the created element.
Example request:
action=edit&uuid=1234&loginid=Uj7f3KM08bde3jb67jBr471Nbad23mD4&hash=8251013f07e22f1e958dbbaeaadf1a03&unid=321&ucid=2345&title=&body=This is the multi-line body of an edited comment.
- action=test
-
This action simply allows the client to run the proposed content of a comment (the title and body) through the Little Island HTML validator without actually posting the comment. Although this does not need to change anything on the server it still requires user authorisation with the uuid, loginid and hash values; this is simply to prevent the system being used as a "markup laundry" by other services. The title and body are treated exactly as they would be for a post or edit action.
These are the elements in response to a successful comment test:
- title The validated title of the comment.
- body The validated body of the comment.
- created The current date and time. This can be used to show a valid time when displaying the test comment.
- token Replacement token.
Example request:
action=test&uuid=1234&loginid=Uj7f3KM08bde3jb67jBr471Nbad23mD4&hash=b4333fa8af21cc5681b7689462d1f492&title=Test Comment Title&body=The body of a test comment.
Example response:
<testcomment>
  <title>Test Comment Title</title>
  <body>The body of a test comment.</body>
  <created>2009-02-03 03:07:35</created>
  <token>xQ7x15aG5logtlt5sPLMSNACSuiM5mdK</token>
</testcomment>
- action=delete
-
Deletion of a comment requires the uuid, loginid and hash of the owner, and the unid and ucid of the comment to be deleted.
The deletion process is based on the condition of the comment. If there are any replies then the comment will be reduced to a shell leaving only the ucid, parent and thread. If there are no replies the comment is deleted entirely from the database. If the comment is deleted entirely and the parent comment is a shell then the same process will be applied recursively until it finds a shell comment with replies.
The response on success will contain these elements:
- ucid This is confirmation of the comment that was deleted. As a positive integer it may be used to test success.
- fulldelete This will be zero for a shelled comment and one if the comment was fully deleted. This should be used to update the client display without refreshing from the cache.
- token The replacement token.
In addition, if any parent shell comments were deleted in the process the response will contain this element:
- alsodeleted This will be a comma delimited list of the UCIDs of the other deleted comments. These will all be fully deleted rather than shelled. This should be used to update the client's comment display.
Example request:
action=delete&uuid=1234&loginid=Uj7f3KM08bde3jb67jBr471Nbad23mD4&hash=0ec95c02b71eb2dd6f54389ec268726d&unid=321&ucid=3456
Example response:
<delete>
  <ucid>3456</ucid>
  <fulldelete>1</fulldelete>
  <alsodeleted>3455,3454,3444</alsodeleted>
  <token>bgvt3oIb9F2WGy8UN5y8lWtiJhTz4RGD</token>
</delete>
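A client could apply a delete response to its local comment display roughly as follows. This is a sketch under assumptions: the display model (a dictionary keyed by UCID) and the function name `apply_delete` are invented for illustration, as are the sample comment entries.

```python
import xml.etree.ElementTree as ET


def apply_delete(comments: dict, response_xml: str) -> None:
    """Update a {ucid: comment} display model from an action=delete response."""
    root = ET.fromstring(response_xml)
    fields = {child.tag: child.text or "" for child in root}
    ucid = int(fields["ucid"])
    if fields.get("fulldelete") == "1":
        comments.pop(ucid, None)
    else:
        # Shelled: keep only what the server keeps (parent and thread).
        old = comments.get(ucid, {})
        comments[ucid] = {"parent": old.get("parent", 0),
                          "thread": old.get("thread", 0)}
    # Parent shells removed by the recursive deletion are always fully deleted.
    for other in filter(None, fields.get("alsodeleted", "").split(",")):
        comments.pop(int(other), None)


# Invented local model: three shells above the comment being deleted.
comments = {
    3400: {"parent": 0, "thread": 0, "title": "Root"},
    3444: {"parent": 3400, "thread": 3400},
    3454: {"parent": 3444, "thread": 3400},
    3455: {"parent": 3454, "thread": 3400},
    3456: {"parent": 3455, "thread": 3400, "title": "Reply"},
}
apply_delete(comments, "<delete><ucid>3456</ucid><fulldelete>1</fulldelete>"
                       "<alsodeleted>3455,3454,3444</alsodeleted>"
                       "<token>bgvt3oIb9F2WGy8UN5y8lWtiJhTz4RGD</token></delete>")
print(sorted(comments))  # [3400]
```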
- action=editprofile
-
This action allows users to update their profiles via the client. It functions in the same manner as action=edit, only replacing the keys title and body with fullname and bio, along with the uuid, loginid and hash of the owner.
Naturally, in order to properly edit a profile the current profile must first be fetched from the server via action=profile.
One difference between editing a profile and editing a comment is that content is not mandatory: users may leave both fields of their profiles empty if they wish. When registering a new account the profile starts with null values.
A successfully edited profile will return a profile with elements exactly as in action=profile with only the addition of a replacement token.
Example request:
action=editprofile&uuid=1234&loginid=Uj7f3KM08bde3jb67jBr471Nbad23mD4&hash=614eba326c532eb7bb1bc4e2727cb831&fullname=John Smith's Edited Name&bio=Some edited biographical details containing: <em>XML compatible HTML</em> and newlines.
Example response:
<editprofile>
  <profile>
    <uuid>1234</uuid>
    <name>John Smith</name>
    <registered>2009-01-01 00:00:01</registered>
    <accessed>2009-02-01 15:10:28</accessed>
    <fullname>John Smith's Edited Name</fullname>
    <bio>Some edited biographical details containing: <em>XML compatible HTML</em> and newlines.</bio>
  </profile>
  <token>leznijZ9urQFwlvAkvN8UfVPLgW5Z1Qr</token>
</editprofile>
- action=testprofile
-
This provides the same service for profiles that action=test provides for comments. Like the comment test this action also requires the uuid, loginid and hash of the user.
The successful response will carry only the tested fullname and bio elements with a replacement token.
Example request:
action=testprofile&uuid=1234&loginid=Uj7f3KM08bde3jb67jBr471Nbad23mD4&hash=022ed2f3edea3023402cebbe364882f2&fullname=John Smith's Test Name&bio=Testing some biographical details containing: <em>XML compatible HTML</em> and newlines.
Example response:
<testprofile>
  <fullname>John Smith's Test Name</fullname>
  <bio>Testing some biographical details containing: <em>XML compatible HTML</em> and newlines.</bio>
  <token>8e9qwqo354rms4NFZepEZRbiOXgbpeVm</token>
</testprofile>