Setting up the eXist scheduler
Introduction
Some scheduled jobs are required for servers to work properly. Others are optional but worth considering for your installation, such as backups and consistency checks. We strongly recommend that you follow the instructions here, install the REQUIRED scheduled jobs and check whether you need the RECOMMENDED ones.
General information
ART-DECOR's eXist database has a job scheduler based on Quartz, a full-featured, open-source job scheduling system. If you want to read more, information about eXist's general approach can be found here. We have also compiled information about using cron triggers in scheduled job definitions, which can be found here.
In ART-DECOR environments, the scheduled jobs are defined in the file conf.xml in the etc directory of the eXist database root directory. If you followed our instructions, this is /usr/local/exist_atp/etc/conf.xml.
The main topics covered in this documentation are:
- Required preliminary configuration
- Processing Queues
- Refreshers
- Notifications
- Backup, Export and Consistency checks
- Artifact History Hoovering
- Restart the database
Required preliminary configuration
This configuration is REQUIRED.
Open the conf.xml file described in the General information section above. Find the closing </scheduler> tag of the scheduler section and paste the following comment text directly above it.
<!-- ====================================================== -->
<!-- ============ ART-DECOR Release 3 Jobs BEGIN ========== -->
<!--
uses period in ms ...
period of 20s: period="20000"
...or cron-trigger, e.g.
cron-trigger every 4 hours: "0 0 0/4 * * ?"
cron-trigger every minute: "0 0/1 * * * ?"
-->
<!-- ====================================================== -->
<!-- ============ ART-DECOR Release 3 Jobs END ============ -->
<!-- ====================================================== -->
This helps you to immediately locate the ART-DECOR part of the otherwise very long conf.xml file of the database.
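To read the cron-trigger strings in the comment above, it helps to know that Quartz cron expressions have six fields: seconds, minutes, hours, day-of-month, month and day-of-week. The following Python sketch expands the first three fields into the times of day at which a trigger fires. It is an illustration only, not the Quartz parser: it assumes a simplified syntax ("*", "?", single numbers, ranges like "1-3", steps like "0/4" or "*/4", and comma lists) and ignores the date fields.

```python
def expand_field(field: str, lo: int, hi: int) -> list[int]:
    """Expand one cron field into the sorted values it matches within [lo, hi].

    Simplified subset of the Quartz syntax: '*', '?', 'n', 'a-b', 'a/b', '*/b'
    and comma-separated combinations of these.
    """
    values = set()
    for part in field.split(","):
        if part in ("*", "?"):
            values.update(range(lo, hi + 1))
        elif "/" in part:
            start, step = part.split("/")
            first = lo if start == "*" else int(start)
            values.update(range(first, hi + 1, int(step)))
        elif "-" in part:
            a, b = part.split("-")
            values.update(range(int(a), int(b) + 1))
        else:
            values.add(int(part))
    return sorted(values)


def daily_fire_times(expr: str) -> list[tuple[int, int, int]]:
    """(hour, minute, second) tuples at which a 6-field cron expression fires on a matching day."""
    sec, minute, hour = expr.split()[:3]
    return [
        (h, m, s)
        for h in expand_field(hour, 0, 23)
        for m in expand_field(minute, 0, 59)
        for s in expand_field(sec, 0, 59)
    ]


if __name__ == "__main__":
    print([h for h, _, _ in daily_fire_times("0 0 0/4 * * ?")])  # [0, 4, 8, 12, 16, 20]
    print(len(daily_fire_times("0 0/1 * * * ?")))                 # 1440
```

The first line confirms that "0 0 0/4 * * ?" fires at midnight and every four hours after that; the second shows that "0 0/1 * * * ?" fires 1440 times a day, i.e. once a minute.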
Now work through the following sections: add the REQUIRED parts and consider adding the RECOMMENDED ones.
Processing Queues
Project related requests processing
This eXist scheduled job is REQUIRED to be configured.
WHAT IT DOES
This adds a scheduled job that runs every 20 seconds, checks for project-related requests such as compilations, and executes them.
Add the following lines to the ART-DECOR Release 3 Jobs part of the configuration file...
<!--
Scan/process project related requests such as compilation every 20 seconds
-->
<job type="user" name="scheduled-tasks" xquery="/db/apps/api/modules/library/scheduled-tasks.xql"
period="20000" unschedule-on-exception="false"/>
...so that it looks like this
<!-- ====================================================== -->
<!-- ============ ART-DECOR Release 3 Jobs BEGIN ========== -->
<!--
uses period in ms ...
period of 20s: period="20000"
...or cron-trigger, e.g.
cron-trigger every 4 hours: "0 0 0/4 * * ?"
cron-trigger every minute: "0 0/1 * * * ?"
-->
<!--
Scan/process project related requests such as compilation every 20 seconds
-->
<job type="user" name="scheduled-tasks" xquery="/db/apps/api/modules/library/scheduled-tasks.xql"
period="20000" unschedule-on-exception="false"/>
<!-- ====================================================== -->
<!-- ============ ART-DECOR Release 3 Jobs END ============ -->
<!-- ====================================================== -->
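Because eXist reads conf.xml when it starts, a typo in your edit may only show up at the next restart, possibly as a database that does not come up. As a check before restarting, the following Python sketch parses the file with the standard library and lists the jobs it defines; a file that is not well-formed raises a ParseError instead. It is not part of ART-DECOR or eXist, and it assumes the job elements are not in an XML namespace, as in the snippet above.

```python
import sys
import xml.etree.ElementTree as ET


def list_jobs(root: ET.Element) -> list[dict]:
    """Return the attributes and parameters of every <job> element below root."""
    jobs = []
    for job in root.iter("job"):
        info = dict(job.attrib)
        # Collect the <parameter name="..." value="..."/> children, if any.
        info["parameters"] = {p.get("name"): p.get("value") for p in job.findall("parameter")}
        jobs.append(info)
    return jobs


def describe(job: dict) -> str:
    """One-line summary: name, type and schedule (period in seconds or cron trigger)."""
    period = job.get("period")
    when = f"every {int(period) / 1000:g} s" if period else f"cron {job.get('cron-trigger')}"
    return f"{job.get('name')} ({job.get('type')}): {when}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_jobs.py /usr/local/exist_atp/etc/conf.xml
    # ET.parse raises ET.ParseError if the file is not well-formed XML.
    for job in list_jobs(ET.parse(sys.argv[1]).getroot()):
        print(describe(job))
```

For the job added above, the output should contain a line like "scheduled-tasks (user): every 20 s".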
Refreshers
Cache refresh
This eXist scheduled job is strongly RECOMMENDED to be configured.
WHAT IT DOES
This adds a scheduled job that refreshes the ART-DECOR Cloud Cache every 6 hours. If a refresh fails, the job is unscheduled.
The scheduled job has two parameters that should be fixed for this refresh action:
- topic shall be valued cache
- format shall be valued decor
Other parameter combinations are described elsewhere.
To activate this scheduled job, add the following lines to the ART-DECOR Release 3 Jobs part of the configuration file. More information about using cron triggers in scheduled job definitions can be found here.
<!--
Cache refresh every 6 hours, unschedule if fails
parameter topic : 'cache'
parameter format : 'decor'
-->
<job type="user" name="scheduled-refreshs" xquery="/db/apps/api/modules/library/scheduled-refreshs.xql"
cron-trigger="0 0 0/6 * * ?" unschedule-on-exception="true">
<parameter name="topic" value="cache"/>
<parameter name="format" value="decor"/>
</job>
Notifications
Periodic notifications
This eXist scheduled job is strongly RECOMMENDED to be configured.
WHAT IT DOES
This adds a scheduled job that runs every 10 minutes (600 seconds), checks for periodic notification requests, e.g. notifications on changed issues, compiles them and sends them out.
The scheduled job has two parameters that should be fixed for this action:
- sendmail shall be valued true or false, where false is only meant for testing; in a production environment this shall always be set to true.
- mysender shall be a formally valid email sender address, with special characters XML-escaped (such as &lt; for <); an example from our main server is ART-DECOR Notifier &lt;reply.not.possible@art-decor.email&gt;, so that users receive their notifications with this sender address.
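Since the angle brackets of the sender address must be written as &lt; and &gt; inside the value attribute, it can be convenient to generate the escaped value rather than type it by hand. A minimal Python sketch using only the standard library follows; the function name mysender_attribute is hypothetical, and the "@" test is only a basic plausibility check, not full address validation.

```python
from email.utils import parseaddr
from xml.sax.saxutils import escape


def mysender_attribute(display_name: str, address: str) -> str:
    """Return an XML-escaped 'Name <address>' string for use in a value="..." attribute."""
    raw = f"{display_name} <{address}>"
    _, parsed = parseaddr(raw)
    if "@" not in parsed:
        raise ValueError(f"not a usable sender address: {raw!r}")
    # escape() handles &, < and >; the extra entity also escapes double quotes,
    # which would otherwise end the attribute value early.
    return escape(raw, {'"': "&quot;"})


if __name__ == "__main__":
    value = mysender_attribute("ART-DECOR Notifier", "reply.not.possible@art-decor.email")
    print(f'<parameter name="mysender" value="{value}"/>')
```

Run as a script, it prints the mysender parameter line in exactly the escaped form the configuration expects.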
WARNING
Make sure that the mysender email address is set up correctly on the server so that notification emails are not classified as spam elsewhere and therefore fail to reach their recipients.
Typically you use a reply.not.possible address, as you don't expect anybody to reply to the notifications.
To activate this scheduled job, add the following lines to the ART-DECOR Release 3 Jobs part of the configuration file...
<!--
Scan/process for periodic notifications every 600 seconds (10 mins)
e.g. notifications on changed issues and release
parameter sendmail : true or false
parameter mysender : a formally valid email sender address, XML-escaped chars like &lt;
-->
<job type="user" name="periodic-notifier" xquery="/db/apps/api/modules/library/periodic-notifier.xql"
period="600000" unschedule-on-exception="true">
<parameter name="sendmail" value="true"/>
<parameter name="mysender" value="ART-DECOR Notifier &lt;reply.not.possible@art-decor.email&gt;"/>
</job>
Scheduled notifications
This eXist scheduled job is strongly RECOMMENDED to be configured.
WHAT IT DOES
This adds a scheduled job that runs every 12 minutes (720 seconds), checks for scheduled notification requests, e.g. (new) users to be notified about their username, password (reset) and their projects, and sends them out.
The scheduled job has four parameters that should be fixed for this action:
- sendmail shall be valued true or false, where false is only meant for testing; in a production environment this shall always be set to true.
- mysender shall be a formally valid email sender address, with special characters XML-escaped (such as &lt; for <); an example from our main server is ART-DECOR Notifier &lt;reply.not.possible@art-decor.email&gt;, so that users receive their notifications with this sender address.
- accounting shall be a formally valid email address, with special characters XML-escaped (such as &lt; for <), that may receive inquiries or replies from users who received a notification; an example from our main server is ART-DECOR Accounts &lt;accounts@art-decor.email&gt;.
- myserverurl shall be the formally valid URL that represents your server. This URL is included in the notification message, for example to tell users which server they received credentials for.
WARNING
Make sure that the mysender email address is set up correctly on the server so that notification emails are not classified as spam elsewhere and therefore fail to reach their recipients.
Typically you use a reply.not.possible address, as you don't expect anybody to reply to the notifications.
Make sure that myserverurl is a valid URL pointing to your server.
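A quick sanity check of the four values before you paste them can catch some of the mistakes the warning above mentions. The following Python sketch is an illustration, not part of ART-DECOR: it works on the unescaped values (as they read after XML parsing), accepts only http and https URLs, and treats an address as usable when it contains an "@". It cannot tell whether the URL really points to your server or whether your mail setup passes spam filters.

```python
from email.utils import parseaddr
from urllib.parse import urlparse


def check_notifier_params(params: dict[str, str]) -> list[str]:
    """Return problems found in the (already unescaped) scheduled-notifier parameters."""
    problems = []
    sendmail = params.get("sendmail")
    if sendmail not in ("true", "false"):
        problems.append("sendmail must be 'true' or 'false'")
    elif sendmail == "false":
        problems.append("sendmail is 'false', which is meant for testing only")
    for key in ("mysender", "accounting"):
        _, address = parseaddr(params.get(key, ""))
        if "@" not in address:
            problems.append(f"{key} is not a usable email address")
    url = urlparse(params.get("myserverurl", ""))
    if url.scheme not in ("http", "https") or not url.netloc:
        problems.append("myserverurl must be an absolute http(s) URL")
    return problems


if __name__ == "__main__":
    print(check_notifier_params({
        "sendmail": "true",
        "mysender": "ART-DECOR Notifier <reply.not.possible@art-decor.email>",
        "accounting": "ART-DECOR Accounts <accounts@art-decor.email>",
        "myserverurl": "https://develop.art-decor.org",
    }))  # []
```

An empty list means none of these basic checks found a problem; a value such as "develop.art-decor.org" without a scheme would be reported.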
To activate this scheduled job, add the following lines to the ART-DECOR Release 3 Jobs part of the configuration file...
<!--
Scan/process for scheduled notifications every 720 seconds (12 mins)
e.g. (new) users to be notified about the username, password (reset) and his/her projects
parameter sendmail : true or false
parameter mysender : a formally valid email sender address
parameter accounting : a valid email address of the accounts team that users can write to
parameter myserverurl : the server url for which this task runs, e.g. https://my-server.org
-->
<job type="user" name="scheduled-notifier" xquery="/db/apps/api/modules/library/scheduled-notifier.xql"
period="720000" unschedule-on-exception="true">
<parameter name="sendmail" value="true"/>
<parameter name="mysender" value="ART-DECOR Notifier &lt;reply.not.possible@art-decor.email&gt;"/>
<parameter name="accounting" value="ART-DECOR Accounts &lt;accounts@art-decor.email&gt;"/>
<parameter name="myserverurl" value="https://develop.art-decor.org"/>
</job>
Backup, Export and Consistency checks
There are a couple of options for backup, export and consistency checks.
Backup of the pure database files
This eXist scheduled job is RECOMMENDED to be configured.
WHAT IT DOES
This adds a scheduled job that runs a backup of the pure database files, zipped, every night at 1:00 am.
This will result in a file like 202211240100007.zip, containing the files and directories of the data folder (such as the *.dbx files), in the eXist database subdirectory /data/backup.
The scheduled job has two parameters:
- output-dir shall be valued backup; leave it as is in production environments, as this is the expected location/path.
- zip-files-max shall be an integer, e.g. 1 to keep at most one zip file as backup, or 5 to keep five backups. Older backups will be deleted from the output-dir directory.
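The effect of zip-files-max can be pictured as a simple rotation: sort the backup archives by name and drop the oldest ones beyond the limit. The following Python sketch only illustrates that behaviour; it is not eXist's DataBackup code, it assumes (judging by the example name above) that the file names start with a timestamp, and it merely lists the files that would be removed instead of deleting anything. The backup path in it is an assumption.

```python
from pathlib import Path


def backups_to_delete(zip_names: list[str], zip_files_max: int) -> list[str]:
    """Oldest *.zip names beyond the zip-files-max limit.

    Assumes the names start with a timestamp, so sorting them as strings
    also sorts them chronologically.
    """
    if zip_files_max < 1:
        raise ValueError("zip-files-max must be a positive integer")
    ordered = sorted(name for name in zip_names if name.endswith(".zip"))
    # Everything except the newest zip_files_max entries (empty if under the limit).
    return ordered[:-zip_files_max]


if __name__ == "__main__":
    backup_dir = Path("/usr/local/exist_atp/data/backup")  # assumed location, adjust as needed
    if backup_dir.is_dir():
        print(backups_to_delete([p.name for p in backup_dir.iterdir()], 10))
```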
To activate this scheduled job, add the following lines to the ART-DECOR Release 3 Jobs part of the configuration file. More information about using cron triggers in scheduled job definitions can be found here.
<!--
Run a backup of the pure database files, zipped, every night at 1:00 am
This will result in a file like 202211240100007.zip containing *.dbx files etc in /data/backup
-->
<job type="system" name="databackup" class="org.exist.storage.DataBackup"
cron-trigger="0 0 1 * * ?">
<parameter name="output-dir" value="backup"/>
<parameter name="zip-files-max" value="10"/>
</job>
Full Export of the database
This eXist scheduled job is RECOMMENDED to be configured.
WHAT IT DOES
This adds a scheduled job that runs a consistency check and full export of the database every night at 2:00 am.
This will result in a file like full20221101-0201.zip, containing the exported files and the check report file, in /data/export.
The scheduled job has six parameters:
- output shall be valued export; leave it as is in production environments, as this is the expected location/path.
- backup shall be valued yes, as we want a consistency check and an export.
- zip shall be valued yes to zip the resulting export.
- incremental shall be valued no, as we want a full export. We recommend doing full nightly exports only.
- incremental-check shall be valued no. If you want repeated checks and full plus incremental backups, please see the following section.
- max shall be an integer, e.g. 1 to keep at most one zip file as export, or 5 to keep five exports. Older exports will be deleted from the output directory.
WARNING
Due to limitations of the zip format, archives larger than 4 gigabytes may not be readable. Consider setting the zip option to no (see above), which creates a backup on the file system that has no such limitation.
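To find out whether existing exports are affected by this limitation, you can check their sizes and try to read them back. The following Python sketch does both with the standard zipfile module. The 4 GiB figure is the classic ZIP limit without the Zip64 extensions, the export path is an assumption, and testzip() decompresses every member, so it can take a while on large exports.

```python
import zipfile
from pathlib import Path

# Largest size a classic ZIP archive can describe without the Zip64 extensions.
ZIP32_LIMIT = 2**32 - 1


def oversized(sizes: dict[str, int], limit: int = ZIP32_LIMIT) -> list[str]:
    """Names of archives whose size exceeds the classic ZIP limit."""
    return sorted(name for name, size in sizes.items() if size > limit)


def is_readable(path: Path) -> bool:
    """True if the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as zf:
            return zf.testzip() is None
    except (zipfile.BadZipFile, OSError):
        return False


if __name__ == "__main__":
    export_dir = Path("/usr/local/exist_atp/data/export")  # assumed location, adjust as needed
    if export_dir.is_dir():
        sizes = {p.name: p.stat().st_size for p in export_dir.glob("*.zip")}
        for name in oversized(sizes):
            print(f"{name} exceeds 4 GiB; consider zip=\"no\"")
        for p in export_dir.glob("*.zip"):
            if not is_readable(p):
                print(f"{p.name} cannot be read back")
```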
To activate this scheduled job, add the following lines to the ART-DECOR Release 3 Jobs part of the configuration file. More information about using cron triggers in scheduled job definitions can be found here.
<!--
Run a consistency check and export of the database every night at 2:00 am
This will result in a file like full20221101-0200.zip containing the exported
files and the check report file in /data/export
-->
<job type="system" name="check-backup" class="org.exist.storage.ConsistencyCheckTask"
cron-trigger="0 0 2 * * ?">
<parameter name="output" value="export"/>
<parameter name="backup" value="yes"/>
<parameter name="zip" value="yes"/>
<parameter name="incremental" value="no"/>
<parameter name="incremental-check" value="no"/>
<parameter name="max" value="4"/>
</job>
NOTE
Use either a Full Export of the database or Full and Incremental Exports of the database, as described here, not both.
Full and Incremental Exports of the database
This eXist scheduled job is OPTIONAL. It might be worth considering if your server is heavily used by many users.
WHAT IT DOES
This adds a scheduled job that runs a consistency check and full export of the database every night starting at 2:00 am, followed by incremental exports every 2 hours.
This will result in a file like full20221101-0201.zip for the first full backup and files like inc20221101-1500.zip for the incremental backups, each containing the exported files and the check report file, in /data/export.
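With incremental exports, restoring the most recent state generally requires the last full export plus the incremental exports made after it. The following Python sketch picks that set out of a directory listing. It assumes the file names follow the pattern of the examples above (full or inc, then the date and time as yyyyMMdd-HHmm, then .zip); check the names on your own server before relying on it.

```python
import re
from datetime import datetime

# Assumed naming scheme, taken from the example names: full|inc + yyyyMMdd-HHmm + .zip
EXPORT_NAME = re.compile(r"^(full|inc)(\d{8}-\d{4})\.zip$")


def restore_chain(names: list[str]) -> list[str]:
    """Newest full export followed by the incremental exports made after it, oldest first."""
    exports = []
    for name in names:
        match = EXPORT_NAME.match(name)
        if match:
            kind, stamp = match.groups()
            exports.append((datetime.strptime(stamp, "%Y%m%d-%H%M"), kind, name))
    exports.sort()
    full_positions = [i for i, (_, kind, _) in enumerate(exports) if kind == "full"]
    if not full_positions:
        return []
    return [name for _, _, name in exports[full_positions[-1]:]]


if __name__ == "__main__":
    print(restore_chain(["inc20221101-1500.zip", "full20221101-0201.zip", "inc20221101-0400.zip"]))
```

Files that do not match the assumed pattern, such as stray report files, are simply ignored.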
The scheduled job has six parameters:
- output shall be valued export; leave it as is in production environments, as this is the expected location/path.
- backup shall be valued yes, as we want a consistency check and an export.
- zip shall be valued yes to zip the resulting export.
- incremental shall be valued yes, as we additionally want incremental exports (in general, we still recommend doing full nightly exports only). The first backup will always be a full backup. Subsequent backups will be incremental: only resources that were modified since the last backup will be saved.
- incremental-check shall be valued no, as incremental backups should not run consistency checks, which may take too long.
- max: on incremental backup, a full backup is created every max backup runs. For example, if you set the parameter to 2, a full backup will be performed after every two incremental backups. For our setting we recommend setting max to 12 or more. Note that the full backup and the subsequent incremental backups all count towards this max count. If "every two hours" is set and max is 10, there will be another full backup after the full and 9 incremental backups have been created, resulting in two full "anti-cyclic" backups per day (which is not desired).
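Where the full exports fall depends on both the cron trigger and max. The following Python sketch simulates the run sequence under two stated assumptions: every max-th run is a full export, as the note on max above describes, and the hours field "2/2" follows standard Quartz semantics, firing at 02:00, 04:00, ..., 22:00, i.e. 11 times a day. Under these assumptions, max = 12 combined with "2/2" moves the full export two hours later each day, while an hours field of "0/2" (12 runs a day) would keep it at the same time. It is worth checking on your own server which pattern you actually observe.

```python
def fire_hours(start: int, step: int) -> list[int]:
    """Hours matched by a Quartz hours field of the form 'start/step'."""
    return list(range(start, 24, step))


def full_export_slots(hours: list[int], max_runs: int, days: int) -> list[tuple[int, int]]:
    """(day, hour) of every run that would be a full export.

    Assumption, following the note on max above: the full export and the
    incremental exports after it all count, so every max_runs-th run is full.
    """
    slots = []
    run = 0
    for day in range(days):
        for hour in hours:
            if run % max_runs == 0:
                slots.append((day, hour))
            run += 1
    return slots


if __name__ == "__main__":
    print(len(fire_hours(2, 2)))                       # 11
    print(full_export_slots(fire_hours(2, 2), 12, 3))  # [(0, 2), (1, 4), (2, 6)]
    print(full_export_slots(fire_hours(0, 2), 12, 3))  # [(0, 0), (1, 0), (2, 0)]
    print(full_export_slots(fire_hours(0, 2), 10, 1))  # [(0, 0), (0, 20)]
```

The last line reproduces the case described above: with runs every two hours and max = 10, a second full export occurs on the same day.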
WARNING
Due to limitations of the zip format, archives larger than 4 gigabytes may not be readable. Consider setting the zip option to no (see above), which creates a backup on the file system that has no such limitation.
To activate this scheduled job, add the following lines to the ART-DECOR Release 3 Jobs part of the configuration file. More information about using cron triggers in scheduled job definitions can be found here.
<!--
Run a consistency check and full export of the database every night at 2:00 am
and incremental export every 2 hours.
This will result in a file like full20221101-0200.zip containing the exported
files and inc20221101-0400.zip for the incrementals
plus all the check report files in /data/export
-->
<job type="system" name="check-backup" class="org.exist.storage.ConsistencyCheckTask"
cron-trigger="0 0 2/2 * * ?">
<parameter name="output" value="export"/>
<parameter name="backup" value="yes"/>
<parameter name="zip" value="yes"/>
<parameter name="incremental" value="yes"/>
<parameter name="incremental-check" value="no"/>
<parameter name="max" value="12"/>
</job>
NOTE
Use either a Full Export of the database or Full and Incremental Exports of the database, as described here, not both.
Consistency check only
This eXist scheduled job is RECOMMENDED to be configured.
WHAT IT DOES
This adds a scheduled job that runs a consistency check only, every 3 hours. Typically a consistency check takes only a few seconds, even for larger databases.
This will result in a check report file in /data/export; a backup is started only if inconsistencies are found.
The scheduled job has three parameters:
- output shall be valued export; leave it as is in production environments, as this is the expected location/path.
- backup shall be valued no, as we require an export only when the consistency check fails.
- zip shall be valued yes to zip the resulting export.
To activate this scheduled job, add the following lines to the ART-DECOR Release 3 Jobs part of the configuration file. More information about using cron triggers in scheduled job definitions can be found here.
<!--
Run a consistency check only every 3 hours
This will result in a check report file in /data/export, a backup
is started only if inconsistencies are found
-->
<job type="system" name="check" class="org.exist.storage.ConsistencyCheckTask"
cron-trigger="0 0 0/3 * * ?">
<parameter name="output" value="export"/>
<parameter name="backup" value="no"/>
<parameter name="zip" value="yes"/>
</job>
Artifact History Hoovering
This eXist scheduled job is OPTIONAL and should be considered when large and active projects have a big set of history items.
WHAT IT DOES
The history of versionable artefacts of a project is stored completely in the eXist database, comprising a reference wrapper (used for display in the front-end) and the changed content itself as “body”.
Some projects have an enormous number of detailed history files that all reside in the database, fully indexed. That is not a desirable situation.
The API library implements a “hoover” mechanism. The “hoover” activity zips parts of the history and leaves just a skeleton in the index to be displayed in the History Panel.
The scheduled job has three parameters:
- threshold: the total size of a project history in MB at which hoovering shall take place; the value shall be between 10 and 500 MB.
- project: the project(s), expressed as prefixes, for which hoovering shall take place: either "*ALL*", which processes all folders in the history folder (see $setlib:strDecorHistory), or a single project prefix (ending with “-”), or a list of project prefixes such as "demo5- demo3- prsb03-", separated by blanks; an empty value is not allowed.
- action: the actual action to be taken, either list-history or hoover-history.
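The project rules above are easy to get wrong, for instance by forgetting the trailing "-" of a prefix. The following Python sketch checks a set of values against those rules before you put them into conf.xml. It is a hypothetical helper, not part of the API library, and it assumes the threshold is given as a whole number of MB.

```python
import re

PREFIX = re.compile(r"^\S+-$")  # a project prefix ends with "-", e.g. "demo5-"
ACTIONS = ("list-history", "hoover-history")


def check_hoover_params(threshold: str, project: str, action: str) -> list[str]:
    """Return the problems found in the three hoover job parameters (empty list if none)."""
    problems = []
    if not threshold.isdecimal() or not 10 <= int(threshold) <= 500:
        problems.append("threshold must be a whole number of MB between 10 and 500")
    prefixes = project.split()
    if not prefixes:
        problems.append("project must not be empty")
    elif prefixes != ["*ALL*"] and not all(PREFIX.match(p) for p in prefixes):
        problems.append('project must be "*ALL*" or blank-separated prefixes ending with "-"')
    if action not in ACTIONS:
        problems.append("action must be list-history or hoover-history")
    return problems


if __name__ == "__main__":
    print(check_hoover_params("50", "demo5- demo3- prsb03-", "hoover-history"))  # []
```

A first run with action list-history is a cautious way to see what would be affected before switching to hoover-history.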
An example entry in etc/conf.xml:
<!--
Run “hoover” mechanism
This will tidy-up the project history folders and zip older entries
-->
<job type="user" name="scheduled-hoover-sth" xquery="/db/apps/api/modules/library/scheduled-hoover-sth.xql"
cron-trigger="0 0 6 2 * ?" unschedule-on-exception="true">
<parameter name="threshold" value="50"/>
<parameter name="project" value="demo5- demo3- prsb03-"/>
<parameter name="action" value="hoover-history"/>
</job>
The typical cron trigger is cron-trigger="0 0 6 2 * ?", which means on the 2nd day of every month at 6:00 in the morning.
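To see when such a monthly trigger will next fire, for example to make sure it does not collide with the nightly backup and export jobs, the following Python sketch computes the next 06:00 on the 2nd of a month after a given time. It only covers this simple day-of-month-plus-hour pattern, and only day values up to 28, which exist in every month; it is an illustration, not Quartz's own scheduling logic.

```python
from datetime import datetime


def next_monthly_run(after: datetime, day: int = 2, hour: int = 6) -> datetime:
    """Next time a '0 0 <hour> <day> * ?' trigger fires strictly after `after`."""
    if not 1 <= day <= 28:
        raise ValueError("this sketch only handles days 1 to 28, which exist in every month")
    candidate = after.replace(day=day, hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= after:
        # Already past this month's slot: use the same day and hour next month.
        if after.month == 12:
            candidate = candidate.replace(year=after.year + 1, month=1)
        else:
            candidate = candidate.replace(month=after.month + 1)
    return candidate


if __name__ == "__main__":
    print(next_monthly_run(datetime(2024, 1, 15, 12, 0)))  # 2024-02-02 06:00:00
```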
NOTE
Hoovering (as in real life 😃) can be a strenuous and time-consuming activity. Don’t run it during normal operating hours or at a time when it interferes with backup procedures, for example.
Restart the database
After working through the sections above, stop the database service...
systemctl stop eXist-db.service
...and then immediately start it again to load your new database server configuration.
systemctl start eXist-db.service
...and check that the service is running:
systemctl status eXist-db.service
This concludes the database configuration.