ART-DECOR® Hoover

Introduction

The complete history of a project's versionable artifacts is stored in the eXist database. Each history item consists of a reference wrapper (used for display in the front end) and the changed content itself as the “body”.

Every time a changed artifact is saved, the previous version is stored in the history along with some metadata, such as the time of the change.
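To illustrate what is stored, the following Python sketch wraps a copy of the previous artifact version in a history element carrying the metadata shown in the example further down. It is not ART-DECOR's implementation (which runs as XQuery inside eXist); the function name make_history_item, the UUID-based id and the UTC timestamp format are assumptions for illustration only.

```python
import copy
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def make_history_item(previous: ET.Element, author_id: str, author: str,
                      intention: str = "version") -> ET.Element:
    """Wrap a copy of the previous artifact version in a <history> element.

    Hypothetical helper; attribute names follow the example on this page.
    """
    history = ET.Element("history", {
        # Time of the change; UTC with millisecond precision is an assumption.
        "date": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "authorid": author_id,
        "author": author,
        "id": str(uuid.uuid4()),  # assumed id scheme (random UUID)
        "intention": intention,
        # Copy the identifying attributes of the artifact into the wrapper.
        "artifactId": previous.get("id", ""),
        "artifactEffectiveDate": previous.get("effectiveDate", ""),
        "artifactStatusCode": previous.get("statusCode", ""),
    })
    # Deep copy, so later edits to the live artifact do not alter the history.
    history.append(copy.deepcopy(previous))
    return history
```

The wrapper duplicates the artifact's id, effective date and status so that a history list can be displayed without looking into the body.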

A project may have a history collection in decor/history. It holds several XML files, for example TM.xml for the history of changes to templates and VS.xml for value sets, and similar files for other artifacts.

In projects with a lot of authoring activity, the history can quickly grow into a large amount of data, all of which resides in the database and is fully indexed. On servers where this is the case, a smart clean-up of the history files is appropriate. As a side note: we have busy projects with a total history size of more than 50 GB.

(Photo: © Depositphoto Igor Stevanovic)

The Introduction of ART-DECOR® Hoover

This situation led to the introduction of a "cleaning" option for large and old project histories in ART-DECOR. The code name for this feature was picked early on: "Hoover" history. In fact, there are even more "log" entries and legacy sets of data and artifacts that may be cleaned up over time as they age, such as full releases of project publications.

In some countries, "Hoover" has become a synonym for vacuum cleaner. Here is some background on the Hoover brand.

The Hoover Company is a home appliance company founded in Ohio, United States, in 1915. It also established a major base in the United Kingdom; and, mostly in the 20th century, it dominated the electric vacuum cleaner industry, to the point where the Hoover brand name became synonymous with vacuum cleaners and vacuuming in the United Kingdom and Ireland. -- Wikipedia 2024, see https://en.wikipedia.org/wiki/The_Hoover_Company

Hoovering History

Internally, these history items are stored as follows.

 <history
   date="2021-06-24T10:03:24.524+02:00" 
   authorid="kai" author="Kai Heitmann" 
   id="693a099d-c095-4264-ba09-9dc095b26408"
   intention="version"
   artifactId="2.16.840.1.113883.2.6.60.3.2.6.3"
   artifactEffectiveDate="2018-01-25T00:00:00"
   artifactStatusCode="draft" 
 >
   <!-- the actual artifact, stored below in full detail unless hoovered -->
   <concept id="2.16...2.6.3" effectiveDate="2018-01-25T00:00:00" statusCode="draft" type="item">
     <name language="de-DE">Patienten ID</name>
     <!-- ... the full body of the artifact if it has not been hoovered,
          or only the root element and its attributes if it has been hoovered
          (for concepts and datasets also the name, to allow proper display
          of hoovered versions) -->
   </concept>
 </history>
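A minimal Python sketch that reads such a history item and extracts the wrapper metadata, for example to list the history of an artifact in a report. The HistoryItem class and its field names are hypothetical; only the XML attribute names are taken from the example above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class HistoryItem:
    """Wrapper metadata of one <history> entry (hypothetical field names)."""
    item_id: str
    date: datetime
    author: str
    intention: str
    artifact_id: str
    artifact_effective_date: str
    artifact_status_code: str
    body_tag: Optional[str]   # root element of the stored artifact, e.g. "concept"
    body_name: Optional[str]  # text of the artifact's first <name>, if present


def parse_history_item(xml_text: str) -> HistoryItem:
    root = ET.fromstring(xml_text)
    if root.tag != "history":
        raise ValueError(f"expected <history>, got <{root.tag}>")
    # First child element is the stored artifact; the default parser drops comments.
    body = next(iter(root), None)
    name = body.find("name") if body is not None else None
    return HistoryItem(
        item_id=root.get("id", ""),
        # Raises ValueError if the date attribute is missing or malformed.
        date=datetime.fromisoformat(root.get("date", "")),
        author=root.get("author", ""),
        intention=root.get("intention", ""),
        artifact_id=root.get("artifactId", ""),
        artifact_effective_date=root.get("artifactEffectiveDate", ""),
        artifact_status_code=root.get("artifactStatusCode", ""),
        body_tag=body.tag if body is not None else None,
        body_name=name.text if name is not None else None,
    )
```

Because concepts and datasets keep their name even when hoovered, body_name stays available for display in both cases.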

Hoovering essentially means gzipping the bodies of outdated history items that have not been hoovered yet (i.e. items whose creation date is no longer within a specified number of days), storing the gzip files in a subfolder of the project's history folder (the "hoover bag" 😃), and replacing each hoovered history item with a skeleton, as indicated in the example above: the wrapper plus the artifact's root element and attributes (and, for concepts and datasets, the name).
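The hoovering steps can be sketched in Python as follows, working on a file-system copy of a single history file rather than on the eXist database. Everything beyond the steps described above is an assumption: the attribute name hooverbagFile that records the body location, the naming of the gzip files after the history item id, history items being direct children of the file's root element, and timestamps without a time zone being treated as UTC.

```python
import gzip
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

KEEP_NAME_FOR = {"concept", "dataset"}  # these keep <name> in the skeleton
BAG_ATTR = "hooverbagFile"              # HYPOTHETICAL attribute for the body location


def skeleton(body: ET.Element) -> ET.Element:
    """Root element and its attributes only; concepts/datasets also keep <name>."""
    skel = ET.Element(body.tag, dict(body.attrib))
    if body.tag in KEEP_NAME_FOR:
        skel.extend(body.findall("name"))
    return skel


def parse_date(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; naive values are assumed to be UTC."""
    d = datetime.fromisoformat(value)
    return d if d.tzinfo else d.replace(tzinfo=timezone.utc)


def hoover(history_file: Path, max_age_days: int,
           now: Optional[datetime] = None) -> int:
    """Gzip outdated, not yet hoovered bodies; return how many were hoovered.

    `now` must be timezone-aware if given.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    bag = history_file.parent / "hooverbag"
    tree = ET.parse(history_file)
    count = 0
    # Assumption: <history> items are direct children of the root element.
    for item in tree.getroot().findall("history"):
        body = next(iter(item), None)
        if body is None or item.get(BAG_ATTR):
            continue  # no body, or already hoovered
        if parse_date(item.get("date", "")) >= cutoff:
            continue  # still within the retention period
        bag.mkdir(exist_ok=True)
        target = bag / f"{item.get('id')}.xml.gz"  # hypothetical naming scheme
        with gzip.open(target, "wb") as fh:
            fh.write(ET.tostring(body, encoding="utf-8"))
        # Replace the full body by its skeleton and record where the body went.
        item.remove(body)
        item.insert(0, skeleton(body))
        item.set(BAG_ATTR, f"hooverbag/{target.name}")
        count += 1
    if count:
        tree.write(history_file, encoding="utf-8", xml_declaration=True)
    return count
```

For instance, hoover(Path("history/DE.xml"), 365) on a hypothetical file would hoover every item older than one year. Items within the period and items already hoovered are skipped, so repeated runs (e.g. from a cron job) only pick up newly outdated items.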

Solution

A script and an accompanying XQuery have been created to “hoover” project histories from time to time, e.g. by running the external script from a cron job.

This action leaves references in the eXist database as XML (the wrapper of a history item plus additional metadata about the “new” location of the body content) and moves the actual content out of the scope of indexing into non-XML gzip files. These files reside in a subfolder called hooverbag inside the folder that holds the history files.
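The reverse direction, fetching the full body of a hoovered version from the hoover bag, might look like the Python sketch below. The attribute hooverbagFile that points to the gzip file is a hypothetical name, not the actual metadata written by ART-DECOR, and the sketch works on a file-system copy rather than on the eXist database.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path


def load_body(history_dir: Path, item: ET.Element,
              location_attr: str = "hooverbagFile") -> ET.Element:
    """Return the full artifact body of a <history> item.

    Unhoovered items carry the body inline; hoovered items point to a gzip
    file relative to `history_dir` via `location_attr` (HYPOTHETICAL name).
    """
    location = item.get(location_attr)
    if location is None:
        # Not hoovered: the first child element is the complete artifact.
        body = next(iter(item), None)
        if body is None:
            raise ValueError(f"history item {item.get('id')} has no body")
        return body
    # Hoovered: decompress the stored body and parse it back into XML.
    with gzip.open(history_dir / location, "rb") as fh:
        return ET.fromstring(fh.read())
```

Keeping the location in the wrapper means the indexed XML stays small while the full content remains retrievable on demand.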

A complete description is part of the system administration documentation on eXist scheduled jobs. Configuring the eXist scheduled job is OPTIONAL; it should be considered when large and active projects have accumulated a large number of history items.

Contributors: dr Kai U. Heitmann