# Research Articles in Simplified HTML (RASH)

a subset of 32 HTML5 elements to create and share scholarly articles on the Web

Main citation: Peroni, S., Osborne, F., Di Iorio, A., Nuzzolese, A. G., Poggi, F., Vitali, F., Motta, E. (2017). Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles. PeerJ Computer Science 3: e132. DOI: https://doi.org/10.7717/peerj-cs.132 (also available in RASH)

# What is RASH

The Research Articles in Simplified HTML (RASH) format is a markup language that restricts HTML to only 32 elements for writing academic research articles. It allows authors to use embedded RDF annotations. In addition, RASH strictly follows the Digital Publishing WAI-ARIA Module 1.0 for expressing the structural semantics of the markup elements it uses.

The development of RASH started from the whole HTML5 grammar and proceeded by removing elements and restricting how the remaining ones can be used, so that the language is expressive enough to represent the structures of scholarly papers while being fully compliant with the theory of structural patterns for XML documents. These patterns make it possible to create unambiguous, manageable and well-structured markup languages and, consequently, documents, fostering increased reusability (e.g., inclusion, conversion, etc.) among different languages. Also, thanks to the regularity they provide, complex operations can easily be performed on pattern-based documents even when very little is known about their vocabulary (e.g., automatic visualisation of documents, inferences on the document structure, etc.).

# RelaxNG grammar

The formal grammar of RASH has been developed in RelaxNG, a simple, easy-to-learn, and powerful schema language for XML, and is accompanied by descriptive documentation. The grammar is organised in four distinct logical blocks of syntactic rules, defining respectively elements, attributes, content models for the elements, and their related attribute lists.

The 32 HTML5 elements that can be used in RASH are: a, blockquote, body, code, em, figcaption, figure, h1, head, html, img, li, link, math, meta, ol, p, pre, q, script, section, span, strong, sub, sup, svg, table, td, th, title, tr, ul.
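As a rough illustration of what this restriction means in practice, the following Python sketch reports any element in a document that is not among the 32 names listed above. It is not the RASH validator and does not check the RelaxNG content models; the names `RASH_ELEMENTS`, `ElementCollector` and `disallowed_elements` are hypothetical, and skipping the descendants of `math` and `svg` is a simplifying assumption made here.

```python
from html.parser import HTMLParser

# The 32 element names listed in the text above.
RASH_ELEMENTS = frozenset("""
a blockquote body code em figcaption figure h1 head html img li link math
meta ol p pre q script section span strong sub sup svg table td th title tr ul
""".split())

# Assumption of this sketch: MathML and SVG content is not inspected.
FOREIGN_ROOTS = {"math", "svg"}


class ElementCollector(HTMLParser):
    """Collects the names of elements that are not in RASH_ELEMENTS."""

    def __init__(self):
        super().__init__()
        self.disallowed = set()
        self._foreign_depth = 0  # > 0 while inside <math> or <svg>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if self._foreign_depth == 0 and tag not in RASH_ELEMENTS:
            self.disallowed.add(tag)
        if tag in FOREIGN_ROOTS:
            self._foreign_depth += 1

    def handle_endtag(self, tag):
        if tag in FOREIGN_ROOTS and self._foreign_depth > 0:
            self._foreign_depth -= 1


def disallowed_elements(html_text):
    """Return a sorted list of element names outside the RASH subset."""
    collector = ElementCollector()
    collector.feed(html_text)
    collector.close()
    return sorted(collector.disallowed)
```

For example, `disallowed_elements("<body><div><p>text</p></div></body>")` returns `['div']`, since `div` is not part of the subset.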

In addition, RASH defines different ways to implement formulas. The standard specification for representing mathematics on the Web is MathML, which can be used in RASH. However, even if MathML is the most accessible way of writing mathematical formulas, the markup needed to define even a fairly simple formula is verbose, and this is an understandable obstacle to its direct adoption. Thus, it is also possible to define a formula by means of an image (element img with @role = 'formula'), or by using the element span (with @role = 'formula') containing a LaTeX or AsciiMath formula, which can be rendered correctly via MathJax.
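The three alternatives above can be told apart by element name and role alone. The following Python sketch counts how many formulas of each kind a document contains; it is only an illustration, the names `FormulaCounter` and `formula_kinds` and the labels `mathml`, `image` and `text` are hypothetical, and the role value is taken as written in the paragraph above.

```python
from collections import Counter
from html.parser import HTMLParser

FORMULA_ROLE = "formula"  # role value as stated in the text above


class FormulaCounter(HTMLParser):
    """Counts formulas by the way they are expressed in the markup."""

    def __init__(self):
        super().__init__()
        self.kinds = Counter()

    def handle_starttag(self, tag, attrs):
        role = dict(attrs).get("role")
        if tag == "math":
            self.kinds["mathml"] += 1   # MathML formula
        elif tag == "img" and role == FORMULA_ROLE:
            self.kinds["image"] += 1    # formula given as an image
        elif tag == "span" and role == FORMULA_ROLE:
            self.kinds["text"] += 1     # LaTeX or AsciiMath source text


def formula_kinds(html_text):
    """Return a dict mapping each formula kind found to its count."""
    counter = FormulaCounter()
    counter.feed(html_text)
    counter.close()
    return dict(counter.kinds)
```

A `span` or `img` without the formula role is ignored, so ordinary inline text and pictures do not affect the counts.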

# Validating RASH documents

Two applications have been developed in order to check whether a document is compliant with the RASH grammar:

• a Bash script that enables RASH users to check their documents simultaneously against the specific requirements of the RASH RelaxNG grammar and against the HTML specification through the W3C Nu HTML Checker;

• a Python application that validates RASH documents against the RASH grammar and also provides a Web interface for visualising all the validation issues found in RASH documents.

# Visualising RASH on browsers

The visualisation of a RASH document is rendered by the browser by means of appropriate CSS3 stylesheets and JavaScript scripts developed for this purpose. In particular, RASH adopts external libraries, such as Bootstrap and jQuery, in order to provide the current visualisation and include additional tools for the user, such as the footbar with statistics about the paper (i.e., number of words, figures, tables and formulas) and a menu to change the actual layout of the page.

In addition, these scripts also implement the automatic rendering of paper items, such as references to a bibliographic entry or a figure, so as to reduce the cognitive effort of an author when writing a RASH paper by means of a text editor.
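The footbar statistics mentioned in this section (number of words, figures, tables and formulas) are computed in the browser by the RASH scripts. The Python sketch below only illustrates the kind of counting involved and does not reproduce those scripts: the names `PaperStats` and `paper_stats` are hypothetical, and the counting rules (every `img` without the formula role is a figure, words are whitespace-separated tokens outside `head`, `script`, `math` and `svg`) are assumptions made for this example.

```python
from html.parser import HTMLParser

FORMULA_ROLE = "formula"  # role value as given in the grammar section
SKIPPED = {"head", "script", "math", "svg"}  # text in here is not counted


class PaperStats(HTMLParser):
    """Counts words, figures, tables and formulas in one pass."""

    def __init__(self):
        super().__init__()
        self.words = self.figures = self.tables = self.formulas = 0
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        role = dict(attrs).get("role")
        if tag == "math" or (tag in ("img", "span") and role == FORMULA_ROLE):
            self.formulas += 1
        elif tag == "img":
            self.figures += 1  # assumption: any other image is a figure
        elif tag == "table":
            self.tables += 1
        if tag in SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words += len(data.split())


def paper_stats(html_text):
    """Return the four footbar-style counts as a dict."""
    parser = PaperStats()
    parser.feed(html_text)
    parser.close()
    return {"words": parser.words, "figures": parser.figures,
            "tables": parser.tables, "formulas": parser.formulas}
```

Note that in this sketch the LaTeX or AsciiMath source inside a formula `span` is still counted as words; a more faithful count would need to skip it as well.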