The Research Articles in Simplified HTML (RASH) format is a markup language that restricts HTML to only 32 elements for writing academic research articles. It allows authors to embed RDF annotations. In addition, RASH strictly follows the Digital Publishing WAI-ARIA Module 1.0 for expressing structural semantics on the markup elements used.
The development of RASH started from the whole HTML5 grammar and proceeded by removing elements and restricting the use of those that remained, so that the language is expressive enough to represent the structures of scholarly papers while being fully compliant with the theory of structural patterns for XML documents. These patterns allow one to create unambiguous, manageable and well-structured markup languages and, consequently, documents, fostering reusability (e.g., inclusion and conversion) among different languages. Also, thanks to the regularity they provide, complex operations can easily be performed on pattern-based documents even when very little is known about their vocabulary (e.g., automatic visualisation of documents or inferences on the document structure).
The formal grammar of RASH has been developed in RelaxNG, a simple, easy-to-learn and powerful schema language for XML, and is accompanied by descriptive documentation. The grammar is organised in four distinct logical blocks of syntactic rules, defining respectively elements, attributes, content models for the elements, and their related attribute lists.
The 32 HTML5 elements that can be used in RASH are:
In addition, RASH defines different ways to implement formulas. The standard specification for representing mathematics on the Web is MathML, which can be used in RASH. However, even though MathML is the most accessible way of writing mathematical formulas, the markup needed to define even a fairly simple formula is verbose, and this is an understandable obstacle to its direct adoption. Thus, it is also possible to define formulas by means of an image (an element with @role = 'formula'), or by using an element (also with @role = 'formula') containing a LaTeX or AsciiMath formula, which can be rendered correctly via MathJax.
Two applications have been developed to check whether a document is compliant with the RASH grammar:
a Bash script that enables RASH users to check their documents simultaneously against the specific requirements of the RASH RelaxNG grammar and against the HTML specification through the W3C Nu HTML Checker;
a Python application that validates RASH documents against the RASH grammar and also provides a Web interface for visualising all the validation issues found in RASH documents.
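The idea of checking a document against a restricted element set can be illustrated with a minimal Python sketch. It is not the validator described above and it does not perform RelaxNG validation: it only reports element names that fall outside an allowed set. The names `ALLOWED_SUBSET`, `ElementCollector` and `disallowed_elements` are hypothetical, and the allowed set is a deliberately incomplete placeholder rather than the actual list of 32 RASH elements.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately incomplete placeholder set of permitted element
# names. The real RASH grammar defines the full set of 32 elements.
ALLOWED_SUBSET = frozenset(
    {"html", "head", "title", "body", "section", "h1", "p", "em", "a"}
)


class ElementCollector(HTMLParser):
    """Record the name and source line of every start tag encountered."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; getpos() returns (line, column).
        line, _column = self.getpos()
        self.seen.append((tag, line))


def disallowed_elements(markup, allowed=ALLOWED_SUBSET):
    """Return (tag, line) pairs for elements whose name is not in `allowed`."""
    collector = ElementCollector()
    collector.feed(markup)
    collector.close()
    return [(tag, line) for tag, line in collector.seen if tag not in allowed]


if __name__ == "__main__":
    sample = "<html><body>\n<section><h1>Title</h1><div>text</div></section></body></html>"
    for tag, line in disallowed_elements(sample):
        print(f"line {line}: element <{tag}> is not in the allowed set")
```

A name-based check like this says nothing about content models or attributes, which is why a schema language such as RelaxNG is needed for real compliance checking.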
In addition, the RASH scripts also implement the automatic rendering of paper items, such as references to a bibliographic entry or a figure, so as to reduce the cognitive effort of an author writing a RASH paper in a text editor.
The SPAR Xtractor Suite is a Java application that automatically enriches RASH documents with RDFa annotations defining the actual structure of such documents in terms of the FRBR-aligned Bibliographic Ontology (FaBiO) and the Document Component Ontology (DoCO). SPAR Xtractor is designed as a one-click tool that automatically adds structural semantics to a RASH document.
In particular, SPAR Xtractor takes a RASH document as input and returns a new RASH document in which the markup elements have been annotated with their actual structural semantics by means of RDFa. The tool associates a set of FaBiO or DoCO types with specific HTML elements. The set of HTML elements and their associations with FaBiO or DoCO types can be customised according to specific expressivity needs.
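The general idea of associating ontology types with HTML elements can be sketched in a few lines of Python. This is not SPAR Xtractor's code (which is a Java application) and it does not reproduce its configuration: the function `annotate` and the mapping `TYPE_MAP` are hypothetical, and the sketch assumes the input is well-formed XHTML. It only shows how a customisable element-to-type table could drive the insertion of RDFa `typeof` attributes.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative mapping only (an assumption, not the tool's real association
# table). The mapped classes are taken from the DoCO vocabulary.
TYPE_MAP = {
    "section": "doco:Section",
    "p": "doco:Paragraph",
    "figure": "doco:FigureBox",
}

# RDFa prefix declarations for the two ontologies named in the text.
PREFIXES = "doco: http://purl.org/spar/doco/ fabio: http://purl.org/spar/fabio/"


def annotate(xhtml_text, type_map=TYPE_MAP):
    """Return a copy of the document with RDFa `typeof` attributes added."""
    # Serialise XHTML as the default namespace (this registration is global).
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml_text)
    root.set("prefix", PREFIXES)
    for element in root.iter():
        # Strip the "{namespace}" part to obtain the local element name.
        local_name = element.tag.split("}")[-1]
        rdf_type = type_map.get(local_name)
        if rdf_type is not None:
            # Any existing `typeof` value is overwritten in this sketch.
            element.set("typeof", rdf_type)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    source = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        "<section><h1>Introduction</h1><p>Some text.</p></section>"
        "</body></html>"
    )
    print(annotate(source))
```

Passing a different `type_map` changes which elements are annotated and with which types, mirroring the customisation described above.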
Some of the venues that have adopted RASH as a submission format: