Research Articles in Simplified HTML (RASH)

a subset of 32 HTML5 elements to create and share scholarly articles on the Web

Main citation: Peroni, S., Osborne, F., Di Iorio, A., Nuzzolese, A. G., Poggi, F., Vitali, F., Motta, E. (2017). Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles. PeerJ Computer Science 3: e132. DOI: https://doi.org/10.7717/peerj-cs.132 (also available in RASH)

What is RASH

The Research Articles in Simplified HTML (RASH) format is a markup language that restricts HTML to only 32 elements for writing academic research articles. It allows authors to use embedded RDF annotations. In addition, RASH strictly follows the Digital Publishing WAI-ARIA Module 1.0 to express the structural semantics of the markup elements it uses.

The development of RASH started from the whole HTML5 grammar and proceeded by removing elements and restricting the use of those that remained, so that the language stays expressive enough to represent the structures of scholarly papers while being fully compliant with the theory of structural patterns for XML documents. These patterns make it possible to create unambiguous, manageable and well-structured markup languages and, consequently, documents, fostering increased reusability (e.g., inclusion, conversion, etc.) among different languages. Also, thanks to the regularity they provide, complex operations can easily be performed on pattern-based documents even when very little is known about their vocabulary (e.g., automatic visualisation of documents, inferences on the document structure).

RelaxNG grammar

The formal grammar of RASH, accompanied by descriptive documentation, has been developed in RelaxNG, a simple, easy-to-learn and powerful schema language for XML. The grammar is organised in four logical blocks of syntactic rules, defining respectively elements, attributes, content models for the elements, and their related attribute lists.

The 32 HTML5 elements that can be used in RASH are: a, blockquote, body, code, em, figcaption, figure, h1, head, html, img, li, link, math, meta, ol, p, pre, q, script, section, span, strong, sub, sup, svg, table, td, th, title, tr, ul.
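As a rough illustration of the restriction to these 32 elements, the sketch below uses Python's standard html.parser module to list element names that fall outside the set. It is not the RelaxNG validation: it checks names only, not content models or attributes. The class and function names are hypothetical, and skipping everything nested inside math and svg (on the assumption that those subtrees use the MathML and SVG vocabularies) is a simplification made for this example.

```python
from html.parser import HTMLParser

# The 32 element names listed above.
RASH_ELEMENTS = frozenset("""
a blockquote body code em figcaption figure h1 head html img li link math
meta ol p pre q script section span strong sub sup svg table td th title tr ul
""".split())

# Assumption: content nested in these elements is MathML/SVG and is skipped.
FOREIGN_ROOTS = {"math", "svg"}


class ElementChecker(HTMLParser):
    """Collects the names of elements that are not in RASH_ELEMENTS."""

    def __init__(self):
        super().__init__()
        self.foreign_root = None  # "math" or "svg" while inside one
        self.foreign_depth = 0    # nesting depth of that same element name
        self.disallowed = []

    def handle_starttag(self, tag, attrs):
        if self.foreign_root is not None:
            # Inside math/svg: only track nesting of the root name itself.
            if tag == self.foreign_root:
                self.foreign_depth += 1
            return
        if tag not in RASH_ELEMENTS:
            self.disallowed.append(tag)
        elif tag in FOREIGN_ROOTS:
            self.foreign_root, self.foreign_depth = tag, 1

    def handle_endtag(self, tag):
        if self.foreign_root is not None and tag == self.foreign_root:
            self.foreign_depth -= 1
            if self.foreign_depth == 0:
                self.foreign_root = None


def disallowed_elements(html_text):
    """Return the names of non-RASH elements, in document order."""
    checker = ElementChecker()
    checker.feed(html_text)
    checker.close()
    return checker.disallowed


sample = """<html><head><title>T</title></head>
<body><section><h1>Intro</h1><p>Text with <i>italics</i>.</p>
<div><math><mi>x</mi></math></div></section></body></html>"""
print(disallowed_elements(sample))  # ['i', 'div']
```

A document that passes this check may still be invalid RASH, since the grammar also constrains where each element may appear and which attributes it may carry.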

In addition, RASH defines different ways to implement formulas. The standard specification for representing mathematics on the Web is MathML, which can be used in RASH. However, although MathML is the most accessible way of writing mathematical formulas, its markup is verbose even for quite simple formulas, which is a real obstacle to its direct adoption. Thus, a formula can also be defined by means of an image (element img with @role = 'formula') or by using the element span (with @role = 'formula') containing a LaTeX or AsciiMath formula, which can be rendered correctly via MathJax.
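The three options can be told apart from the markup alone. The following sketch, again based on Python's html.parser, walks a fragment and reports which style each formula uses. The class name, the file name formula.png and the sample formula are hypothetical, and the code assumes that a formula span contains plain text only.

```python
from html.parser import HTMLParser


class FormulaFinder(HTMLParser):
    """Collects (kind, payload) pairs for the three formula styles."""

    def __init__(self):
        super().__init__()
        self.formulas = []
        self._in_span = False  # True while inside <span role="formula">
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        role = attributes.get("role")
        if tag == "math":
            # MathML formula; its inner markup is not inspected here.
            self.formulas.append(("mathml", ""))
        elif tag == "img" and role == "formula":
            self.formulas.append(("image", attributes.get("src", "")))
        elif tag == "span" and role == "formula":
            self._in_span = True
            self._text = []

    def handle_data(self, data):
        if self._in_span:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._in_span:
            # LaTeX or AsciiMath source, left for MathJax to render.
            self.formulas.append(("text", "".join(self._text).strip()))
            self._in_span = False


def find_formulas(html_text):
    finder = FormulaFinder()
    finder.feed(html_text)
    finder.close()
    return finder.formulas


sample = """
<p><math><mi>x</mi></math></p>
<p><img role="formula" src="formula.png" alt="x"/></p>
<p><span role="formula">a^2 + b^2 = c^2</span></p>
"""
for kind, payload in find_formulas(sample):
    print(kind, payload)
```

The span variant keeps the source of the formula readable in a text editor, which is the main reason to prefer it over hand-written MathML.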

Validating RASH documents

Two applications have been developed to check whether a document is compliant with the RASH grammar:

Visualising RASH on browsers

A RASH document is rendered in the browser by means of CSS3 stylesheets and JavaScript scripts developed for this purpose. In particular, RASH adopts external libraries, such as Bootstrap and jQuery, to provide the current visualisation and to include additional tools for the user, such as the footbar with statistics about the paper (i.e., number of words, figures, tables and formulas) and a menu to change the layout of the page.

In addition, these scripts also implement the automatic rendering of paper items, such as references to a bibliographic entry or a figure, so as to reduce the cognitive effort of an author writing a RASH paper in a text editor.

Adding semantic annotations

The SPAR Xtractor Suite is a Java application that automatically enriches RASH documents with RDFa annotations defining the actual structure of such documents in terms of the FRBR-aligned Bibliographic Ontology (FaBiO) and the Document Component Ontology (DoCO). SPAR Xtractor is designed as a one-click tool that automatically adds structural semantics to a RASH document.

In particular, SPAR Xtractor takes a RASH document as input and returns a new RASH document where all its markup elements have been annotated with their actual structural semantics by means of RDFa. The tool associates a set of FaBiO or DoCO types with specific HTML elements. The set of HTML elements and their associations with FaBiO or DoCO types can be customised according to specific needs of expressivity.
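The idea of associating types with elements can be sketched in a few lines of Python. This is not SPAR Xtractor, which is a Java application whose actual associations are not reproduced here: the mapping below is an illustrative assumption, the function name is hypothetical, and the input is assumed to be a well-formed fragment without an XML namespace.

```python
import xml.etree.ElementTree as ET

DOCO = "http://purl.org/spar/doco/"

# Illustrative mapping only (an assumption for this sketch); the associations
# used by SPAR Xtractor are customisable and may differ.
TYPE_MAP = {
    "section": "doco:Section",
    "h1": "doco:SectionTitle",
    "p": "doco:Paragraph",
}


def annotate(xml_text, type_map=TYPE_MAP):
    """Return the markup with an RDFa typeof attribute on mapped elements."""
    root = ET.fromstring(xml_text)
    # Declare the doco prefix on the root, keeping any prefixes already there.
    existing = root.get("prefix", "")
    root.set("prefix", f"{existing} doco: {DOCO}".strip())
    for element in root.iter():
        # With namespaced XHTML, element.tag would carry the namespace URI
        # and this plain-name lookup would need adjusting.
        rdf_type = type_map.get(element.tag)
        if rdf_type is not None:
            element.set("typeof", rdf_type)
    return ET.tostring(root, encoding="unicode")


sample = "<body><section><h1>Intro</h1><p>Hello.</p></section></body>"
print(annotate(sample))
```

Passing a different dictionary as type_map mirrors, in a very reduced form, the customisation of element-to-type associations described above.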

Community uptake

Some venues that have adopted RASH as a submission format: