Dataset

Definition

DATASET

A dataset is a (hierarchical) list of healthcare concepts, a hierarchical glossary of terms.

A dedicated dataset symbol is used throughout the ART-DECOR app.

A DECOR project contains one or more datasets. When there are multiple datasets, they are different versions of the same dataset. It is not recommended to create datasets for different healthcare projects in the same DECOR project.

Datasets define a set of real-world healthcare concepts. One can look at datasets as a “hierarchical glossary”. Dataset Concepts shall be well defined and described, as the following examples demonstrate.

EXAMPLES

  • Heart Frequency
  • Blood pressure
  • Age
  • Plan of care
  • Immunisation
  • Body weight
  • Body length
  • Gender
  • Person name

Concept definitions can be grouped, so ART-DECOR offers two types of concept: group and item.

A group concept carries no value definitions, only child concepts, the items.

image-20220629122039761

Item concepts have the same metadata plus a value domain, which declares the nature of the expected data and is populated with actual data in data collection situations.

image-20220629122057888

Putting it together, the following example shows a small dataset that hierarchically arranges Vital Signs parameters.

image-20220629182011257
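The group/item distinction can be sketched in a few lines of Python. This is only an illustration: the `Concept` class, its field names and the three child items are assumptions chosen from the examples above, not the DECOR format or the content of the figure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """Hypothetical in-memory concept; not the DECOR storage format."""
    name: str
    type: str                                # "group" or "item"
    value_domain: Optional[str] = None       # datatype name, item concepts only
    children: List["Concept"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A group carries child concepts but no value definition.
        if self.type == "group" and self.value_domain is not None:
            raise ValueError(f"group '{self.name}' must not have a value domain")


def outline(concept: Concept, depth: int = 0) -> List[str]:
    """Return the hierarchy as indented lines, one line per concept."""
    if concept.type == "item":
        label = f"{concept.name} ({concept.value_domain})"
    else:
        label = concept.name
    lines = ["  " * depth + label]
    for child in concept.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Illustrative content only; the items are taken from the example list above.
vital_signs = Concept("Vital Signs", "group", children=[
    Concept("Heart Frequency", "item", "quantity"),
    Concept("Blood pressure", "item", "quantity"),
    Concept("Body weight", "item", "quantity"),
])

print("\n".join(outline(vital_signs)))
```

Each item is printed indented under its group, together with the datatype of its value domain.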

In ART-DECOR, groups can be expanded or collapsed, which helps with orientation in larger datasets. In the following example, the Person group is expanded and the Vital Signs group is collapsed.

image-20220629182030609

The following graphic contrasts definition and data collection, using Vital Signs as an example. The green area reflects the definition, that is, the work done with ART-DECOR. Definitions and Specifications are the scope of ART-DECOR.

Once the Specifications are published, implemented and used, data is collected in applications such as apps, physician office systems or hospital information systems; the pink area shows an example. Collecting or storing real data is out of scope for ART-DECOR and is provided by Healthcare Applications that have implemented the Specifications.

image-20220629181510055

Concepts

Concepts are the building blocks of a DECOR Dataset. A concept may refer to a simple feature, as simple as a person's name or birth date.

Concepts may also be more complex and refer to other concepts. Referring to other concepts can be done by means of

  • a hierarchical relation
  • inheritance
  • containment

Hierarchical relations are very common. An example is the concept of a Person, which is defined by child concepts like the person's name, birth date, social security number, gender, and so on. An example of a medical concept is blood pressure, which is defined by (among others) systolic, diastolic and mean blood pressure plus information about the measurement (method, date).

A concept can also have an inherit relation with another concept, thereby specializing it. Or it can reference another concept by means of a contains relation.

Concepts are defined in datasets. Other DECOR areas reference them, for example scenarios.
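The three kinds of relation can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Concept` class, the field names `inherits` and `contains`, the short ids (real ids are OIDs) and the Patient and Encounter concepts are hypothetical and do not reflect how DECOR stores these relations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """Hypothetical concept model; the ids are made up for this example."""
    id: str
    name: str
    children: List["Concept"] = field(default_factory=list)  # hierarchical relation
    inherits: Optional[str] = None   # id of the concept being specialized
    contains: Optional[str] = None   # id of the concept referenced by containment


def register(concept: Concept, registry: Dict[str, Concept]) -> Dict[str, Concept]:
    """Index a concept and all of its descendants by id."""
    registry[concept.id] = concept
    for child in concept.children:
        register(child, registry)
    return registry


def describe(concept: Concept, registry: Dict[str, Concept]) -> Dict[str, List[str]]:
    """List the names of the concepts that `concept` relates to, per relation."""
    return {
        "children": [child.name for child in concept.children],
        "inherits": [registry[concept.inherits].name] if concept.inherits else [],
        "contains": [registry[concept.contains].name] if concept.contains else [],
    }


person = Concept("c1", "Person", children=[
    Concept("c2", "Person name"),
    Concept("c3", "Birth date"),
    Concept("c4", "Gender"),
])
patient = Concept("c5", "Patient", inherits="c1")        # specializes Person
encounter = Concept("c6", "Encounter", children=[
    Concept("c7", "Subject", contains="c1"),             # references Person
])

registry: Dict[str, Concept] = {}
for root in (person, patient, encounter):
    register(root, registry)

print(describe(person, registry))
print(describe(patient, registry))
```

Keeping inheritance and containment as references by id, rather than as copies, is a design choice of this sketch: it keeps the definition of the referenced concept in one place.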

In essence, a dataset item concept has the following fields.

image-20220630173648044

Name

Dataset Concepts shall have a well-defined name, as the following examples demonstrate.

EXAMPLES

  • Heart Frequency
  • Blood pressure
  • Gender
  • Person name

Dataset Concept Names may be multi-lingual.

Description

Dataset Concepts shall have a proper description that is understandable for team members who are not healthcare providers.

EXAMPLE

Heart Frequency :: The number of times a heart beats within a certain period of time, typically within a minute.

Dataset Concept Descriptions may be multi-lingual.

Version Date and Label

(effective date)

Status

Concept Identifier

A Dataset Concept has an id (OID) which uniquely identifies it. The id is a simple reference to the distinct Dataset Concept. The concept also has a name and a description, both of which can be given in various human languages, but it is always easier to identify it uniquely by its Concept Identifier.

Synonym

A Dataset Concept may have multiple Synonyms, that is, one or more terms (or abbreviations) that have the same meaning as the Concept itself.

EXAMPLES

  • Heart Frequency – Pulse, Pulsation
  • Date of Birth – DOB
  • Immunization – Vaccination
  • Medical Drug – Medicine, Medication, Medicament

Comment

A Dataset Concept may carry a multi-lingual comment.

Type

image-20220629122057888

image-20220629122039761

As explained earlier, a concept is either a group concept, which carries no value definitions but only child concepts (the items), or an item concept, which has the same metadata plus a value domain, that is, the declaration of the nature of the expected data, populated with data in data collection situations.

Value Domain

The possible values a concept can have can be defined and/or constrained by means of the Value Domain. A Value Domain specifies the type for a concept (e.g., string, identifier, code, date, quantity) by assigning a Datatype. In this chapter, datatypes are explained as the nature of data you expect when collecting real world data. A more technical and complete list of datatypes can be found at the Datatypes page.

When the type of the Value Domain is coded, one can specify a Choice List, which acts as a list of valid options when collecting real world data.

EXAMPLE

The concept Gender has a Value Domain with type code and a concept list of three concepts: male, female and diverse.

image-20220630180752207
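A minimal sketch of how an application might use such a coded Value Domain when collecting data. The `ValueDomain` class and its `accepts` method are hypothetical names introduced for this example; ART-DECOR itself only holds the definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValueDomain:
    """Hypothetical value domain: a datatype plus an optional choice list."""
    type: str
    choice_list: List[str] = field(default_factory=list)

    def accepts(self, value: str) -> bool:
        # For coded domains with a choice list, only listed options are valid.
        if self.type == "code" and self.choice_list:
            return value in self.choice_list
        # Other datatypes are not checked in this sketch.
        return True


gender = ValueDomain(type="code", choice_list=["male", "female", "diverse"])

print(gender.accepts("female"))
print(gender.accepts("unknown"))
```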

Source

A source of a dataset concept can be mentioned in this field.

EXAMPLE

The daily Blood Pressure in the morning is typically recorded by the patient (=source of data).

The Base Excess is determined by a laboratory device (=source of data).

Operationalization

Describes how to arrive at a proper measurement.

EXAMPLE

Determine the Body Weight in the morning before breakfast

Fasting Blood Glucose

Rationale

States the rationale for having this dataset concept in this dataset.

EXAMPLE

This item is needed to be recorded for the Trauma Registry.

Relationship

Property

The Value Domain may have one or more properties associated with it. Typically one or more units for quantities are specified, along with optional minimum and maximum values, number of decimals, and a default (but changeable) or a fixed (unchangeable) value.

image-20230923204148921

Properties can be specified multiple times, e.g. to allow multiple units with their respective min, max, decimals etc.

EXAMPLE

Body weight of a one-year-old child may be measured

  • in kg with a range of 0 .. 10 kg and no decimals allowed, or
  • in g with a range of 0 .. 10000 g and 2 decimals allowed.

Body size is given in m with 2 decimals or in cm with no decimals.
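The body weight example can be turned into a small validation sketch. The `Property` class and the `is_valid` helper are hypothetical. The sketch reads the number of decimals as an upper bound, which is one possible interpretation of "allowed" in the example above.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass(frozen=True)
class Property:
    """Hypothetical property: one unit with optional range and decimals."""
    unit: str
    min_include: Optional[Decimal] = None
    max_include: Optional[Decimal] = None
    max_decimals: Optional[int] = None   # assumption: upper bound on decimals

    def accepts(self, value: str, unit: str) -> bool:
        if unit != self.unit:
            return False
        number = Decimal(value)
        if self.min_include is not None and number < self.min_include:
            return False
        if self.max_include is not None and number > self.max_include:
            return False
        # Number of digits after the decimal point as written in the input.
        decimals = max(0, -number.as_tuple().exponent)
        if self.max_decimals is not None and decimals > self.max_decimals:
            return False
        return True


def is_valid(value: str, unit: str, properties: List[Property]) -> bool:
    """A value is valid if at least one of the properties accepts it."""
    return any(prop.accepts(value, unit) for prop in properties)


# Body weight of a one-year-old child, as in the example above.
body_weight = [
    Property("kg", Decimal(0), Decimal(10), max_decimals=0),
    Property("g", Decimal(0), Decimal(10000), max_decimals=2),
]

print(is_valid("9", "kg", body_weight))
print(is_valid("9.5", "kg", body_weight))
print(is_valid("9500.25", "g", body_weight))
```

Values are passed as strings and parsed with `Decimal` so that the number of decimals as entered is preserved, which a binary float would not guarantee.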

Minimum and maximum values of a range may be specified as well. They tell later implementers of Questionnaires or other Rules which ranges to check and validate.

EXAMPLES

Base Excess: –2 .. +2 mEq/l

Bone density DEXA measurement with a T-Score may range from –7 .. +7

Hounsfield units (HU) as standard unit of x-ray CT density, in which air and water are ascribed values of -1000 and 0 respectively.

Datatypes

While most of the data types are self-explanatory such as string or text or date, coded data may carry a choice list offering a typical or exhaustive list of real world things that might be valid values for the item concept. Examples include the color of the iris (eye), which typically could have a choice list of “brown”, “blue”, “green” and “other”.

The following datatypes are available and supported.

image-20220630181051054

count

Countable (non-monetary) quantities. Used for countable types. A count may also be a negative count.

EXAMPLES

number of pregnancies

steps (taken by a physiotherapy patient)

number of cigarettes smoked in a day

code

A system of valid symbols/codes, that substitute for specified concepts e.g. alpha, numeric, symbols and/or combinations, usually defined by a formal reference to a terminology or ontology, but may also be defined by the provision of text. Typically a symbol/code is expressed with a value for code, an identifier for the terminology or ontology it belongs to and at least one textual representation (display name).

ordinal

Models rankings and scores, e.g. pain, Apgar values, etc, where there is a) implied ordering, b) no implication that the distance between each value is constant, and c) the total number of values is finite.

Note that although the term ‘ordinal’ in mathematics means natural numbers only, here any integer is allowed, since negative and zero values are often used by medical professionals for values around a neutral point.

Scores are commonly encountered in various clinical assessment scales. Assigning a value to a concept should generally be done in a formal code system that defines the value, or in an applicable value set for the concept, but some concepts do not have a formal definition (or are not even represented as a concept formally), especially in questionnaires. Scores may even be assigned arbitrarily during use (hence, on Coding). The value may be constrained to an integer in some contexts of use.

EXAMPLES

-3, -2, -1, 0, 1, 2, 3 -- reflex response values (neurology)

0, 1, 2 -- Apgar score values (perinatology)

1, 2, 3, 4, ... -- ASA classification (anesthesiology)

I, II, III, IV, ... -- Tanner scale (pediatrics)

identifier

Type for representing identifiers of real-world entities.

EXAMPLES

driver's licence number

social security number

prescription id

order id

string

Any text item, without visual formatting.

text

A text item, which may contain any amount of legal characters arranged as e.g. words, sentences etc. Visual formatting and hyperlinks may be included.

date

Represents an absolute point in time, as measured on the Gregorian calendar, and specified only to the day. Semantics defined by ISO 8601. Used for recording dates in real world time. The partial form is used for approximate birth dates, dates of death, etc.

datetime

Represents an absolute point in time, specified to the second. Semantics defined by ISO 8601. Used for recording a precise point in real world time.

EXAMPLES

the exact date and time of the birth of a baby

the origin of a history observation which is only partially known

time (NEW)

NOTE

This datatype was first introduced in April 2020.

Represents a time, specified to the second (hh:mm:ss). Semantics defined by ISO 8601. Used for recording a real world time.

EXAMPLES

time of medication administration

starting/stopping a procedure

for approximate times, e.g. the origin of a history observation which is only partially known
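Since date, datetime and time all take their semantics from ISO 8601, complete values can be parsed with the Python standard library, as the following sketch shows. The variable names and sample values are made up for illustration. Partial or approximate values are outside the scope of this sketch.

```python
from datetime import date, datetime, time

# date: specified to the day
birth_date = date.fromisoformat("1984-07-21")

# datetime: specified to the second
birth_moment = datetime.fromisoformat("2020-04-15T08:42:10")

# time: hh:mm:ss
administered_at = time.fromisoformat("14:30:00")

print(birth_date.year, birth_moment.second, administered_at.hour)
```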

decimal

Decimal number, rarely used because in most cases a decimal number is actually a quantity.

EXAMPLE

pi 3.14159265359

quantity

Quantified type representing "scientific" quantities, i.e. quantities expressed as a magnitude and units. If not further specified with fractionDigits, the magnitude is a decimal number with an optional decimal point. A quantity may also have a negative value.

EXAMPLE

Body weight: 80 kg

Body mass index: 22.1 kg/m²

Base Excess: –1.4 mEq/l

Heart frequency: 62 beats/min

There are some "special" quantities used in healthcare, which are explained elsewhere:

  • for countable items, count is used
  • for real numbers without a unit, decimal is used
  • for time durations, duration shall be used

Some additional kinds of quantity are handled as follows:

  • for monetary amounts, quantity is used but the units shall be currency units only, e.g. EUR or USD.
  • for ratios of two physical quantities, use complex.

duration

A duration is a quantity that represents a period of time with respect to a notional point in time, which is not specified. A sign may be used to indicate that the duration is “backwards” in time rather than forwards.

boolean

Items which are truly boolean data.

EXAMPLES

true / false

yes / no answer

complex

Non-atomic datatypes which are not explicitly further defined in the dataset itself.

EXAMPLES

An address: 81 rue de Seine, 75006 Paris, France

A person's name: Dr François Percevais, MD

A ratio: 12 mg / 100 ml

Usually complex types are assumed to be well-known enough not to warrant further decomposition in the dataset itself.

blob

Things that are typically stored as binary objects in the computer world and need to be rendered appropriately.

EXAMPLES

images: like X-rays, computed tomography images

graphic: diagrams, graphs, mathematical curves, or the like – usually a vector image

icons: a sign or representation that stands for its object by virtue of a resemblance or analogy to it

pictures: A visual representation of a person, object, or scene – usually a raster image

currency (PLANNED)

Monetary quantities.

ratio (PLANNED)

A ratio of two Quantity values: a numerator and a denominator.

Datatype Facets

The datatypes can be further restricted using the following datatype facets:

| Facet | Description | Example | Applies to |
|---|---|---|---|
| unit | Unit for quantities | kg, mmol | quantity |
| minInclude | Range min include for quantities | 1 .. 100 | count, ordinal, quantity, currency |
| maxInclude | Range max include for quantities | 1 .. 100 | count, ordinal, quantity, currency |
| fractionDigits | Fraction digits for quantities: "1" for at least 1 or "1!" for exactly 1 | 2! | quantity |
| timeStampPrecision | Precisions for timing specs, see below | | date, datetime |
| default | Default value | | all datatypes |
| fixed | Fixed value | | all datatypes |
| minLength | Minimum length for strings | 7 | string |
| maxLength | Maximum length for strings | | string |
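The fractionDigits notation from the table ("1" for at least one digit, "1!" for exactly one) can be checked with a few lines of Python. The function name `has_required_fraction_digits` is hypothetical, and the sketch assumes values are given as plain decimal strings with "." as the decimal separator.

```python
def has_required_fraction_digits(value: str, facet: str) -> bool:
    """Check a decimal string against a fractionDigits facet such as "1" or "2!"."""
    exact = facet.endswith("!")            # "N!" means exactly N digits
    required = int(facet.rstrip("!"))      # "N" means at least N digits
    _, _, fraction = value.partition(".")  # digits after the decimal point
    if exact:
        return len(fraction) == required
    return len(fraction) >= required


print(has_required_fraction_digits("22.1", "1"))
print(has_required_fraction_digits("22.10", "1!"))
print(has_required_fraction_digits("80.25", "2!"))
```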

Facet timeStampPrecision is used for date / time aspects and takes the following values:

| Value | timeStampPrecision |
|---|---|
| At least year (YYYY) | Y |
| Only year (YYYY) | Y! |
| At least month (MM) and year (YYYY) | YM |
| Only month (MM) and year (YYYY) | YM! |
| At least day (DD), month (MM) and year (YYYY) | YMD |
| Only day (DD), month (MM) and year (YYYY) | YMD! |
| At least day (DD), month (MM) and year (YYYY), hour (hh) and minute (mm) | YMDHM |
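A sketch of how an implementer might check a value against a timeStampPrecision code. The function names and the accepted input form (ISO 8601 extended format such as `1970-05-17T08:15`) are assumptions of this example. The codes and their meaning are taken from the table above.

```python
import re

# Year, then optionally month, day, hour:minute and seconds,
# each part only allowed when the previous one is present.
_ISO_PARTIAL = re.compile(
    r"^(\d{4})(?:-(\d{2})(?:-(\d{2})(?:T(\d{2}):(\d{2})(?::(\d{2}))?)?)?)?$"
)

PRECISION_CODES = {"Y", "Y!", "YM", "YM!", "YMD", "YMD!", "YMDHM"}


def component_count(value: str) -> int:
    """Count the date/time components present (year=1 ... minute=5, second=6)."""
    match = _ISO_PARTIAL.match(value)
    if match is None:
        raise ValueError(f"not a supported date/time string: {value!r}")
    return sum(part is not None for part in match.groups())


def satisfies(value: str, code: str) -> bool:
    """Check a value against a timeStampPrecision code."""
    if code not in PRECISION_CODES:
        raise ValueError(f"unknown timeStampPrecision: {code!r}")
    # The number of letters equals the number of required components.
    required = len(code.rstrip("!"))
    actual = component_count(value)
    if code.endswith("!"):
        return actual == required     # "only": exactly this precision
    return actual >= required         # "at least": this precision or finer


print(satisfies("1970-05", "Y"))
print(satisfies("1970-05", "Y!"))
print(satisfies("1970-05-17T08:15", "YMDHM"))
```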

Moving Concepts in Datasets

Concepts are moved in the Dataset Tree only, preferably by simple drag-and-drop actions. To start a move, click on a concept and choose the MOVE button. Concept context is needed for orientation, especially in large datasets. A special separate (elevated) dialog appears.

image-20220629202605191

Moved items get a special flag in the frontend. The move is saved by submitting the Skeleton Tree to the API. The API re-arranges the whole dataset according to the skeleton while holding a lock.

image-20220630181837748
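The idea of re-arranging a dataset from a skeleton can be sketched as follows. This is not the ART-DECOR API: the `rearrange` function, the shape of the skeleton (nested nodes with `id` and `children`) and the use of a thread lock as a stand-in for the lock are all assumptions made for illustration.

```python
import threading
from typing import Dict, List

# Stand-in for the lock mentioned above; the real locking mechanism is not shown here.
_dataset_lock = threading.Lock()


def rearrange(concepts: Dict[str, str], skeleton: List[dict]) -> List[dict]:
    """Rebuild the concept tree in the shape given by the skeleton.

    `concepts` maps a concept id to its name; `skeleton` is a list of
    {"id": ..., "children": [...]} nodes describing the desired hierarchy.
    """
    def build(node: dict) -> dict:
        return {
            "id": node["id"],
            "name": concepts[node["id"]],
            "children": [build(child) for child in node.get("children", [])],
        }

    with _dataset_lock:   # the whole re-arrangement happens under the lock
        return [build(node) for node in skeleton]


concepts = {"p": "Person", "v": "Vital Signs", "w": "Body weight"}

# Skeleton after moving "Body weight" from "Vital Signs" to "Person".
skeleton = [
    {"id": "p", "children": [{"id": "w"}]},
    {"id": "v"},
]

tree = rearrange(concepts, skeleton)
print(tree[0]["name"], "->", [child["name"] for child in tree[0]["children"]])
```

The skeleton carries only ids and nesting, so the concept definitions themselves are never sent back and forth. Only their position changes.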

Inheritance / Containment

Inheritance and Containment of Concepts are an enhanced functionality of Datasets and their Concepts. It is possible to transition a containment relationship into a true inheritance relationship or into a true copy.

These features are described in a specialized chapter.

Dataset versioning

You can naturally version a dataset. Typically a versioned dataset gets a new id and a new effective date.

The concepts in the new dataset version keep their id, but get a new effective date and inherit from the original concept. If a concept needs changes it may be disconnected (de-inherit) from its source concept so editing is possible.
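The versioning rules above can be expressed as a short sketch. The `Dataset` and `Concept` classes, the `new_version` and `de_inherit` helpers, the ids and the dates are hypothetical. Only the rules themselves (new dataset id and effective date, concepts keep their id, get a new effective date and inherit from the original) come from the text.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Concept:
    id: str
    effective_date: str
    name: str
    # (id, effective date) of the concept this one inherits from, if any
    inherits: Optional[Tuple[str, str]] = None


@dataclass(frozen=True)
class Dataset:
    id: str
    effective_date: str
    concepts: Tuple[Concept, ...]


def new_version(dataset: Dataset, new_id: str, new_date: str) -> Dataset:
    """Create a new dataset version; concepts keep their id and inherit from the original."""
    concepts = tuple(
        Concept(
            id=concept.id,                       # id is kept
            effective_date=new_date,             # effective date is new
            name=concept.name,
            inherits=(concept.id, concept.effective_date),
        )
        for concept in dataset.concepts
    )
    return Dataset(id=new_id, effective_date=new_date, concepts=concepts)


def de_inherit(concept: Concept) -> Concept:
    """Disconnect a concept from its source so it can be edited independently."""
    return replace(concept, inherits=None)


v1 = Dataset("ds1", "2023-01-01", (Concept("c1", "2023-01-01", "Heart Frequency"),))
v2 = new_version(v1, "ds2", "2024-01-01")

print(v2.concepts[0])
print(de_inherit(v2.concepts[0]))
```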

The name and other properties of a concept constitute its definition. This definition should be governed, which means that any property in any language is under governance. When a project wants to re-use (inherit) these concepts, it inherits this definition as-is, and only comments are allowed as additions. These comments cannot in any way alter the semantics of the original concept.

In multilingual settings, when a project wants to re-use concepts from a building block repository (BBR) that does not have defining properties in the same language as the project, the project can do one of the following things:

  • Accept the BBR concept as-is, potentially adding a comment
  • Talk to the BBR governance group and work out an agreement whereby translations may be submitted

The recommended procedure for a BBR governance group handling submitted translations is to create a new version of the dataset that adds the translations. The governance group could decide to add the translations to the original dataset, but this is not recommended. ART will not support this, as BBR datasets should be final, and final objects cannot be edited.
Contributors: dr Kai U. Heitmann