Dataset
Definition
DATASET
A dataset is a (hierarchical) list of healthcare concepts, a hierarchical glossary of terms.
The dataset icon is the symbol used throughout the app. A DECOR project contains one or more datasets. When there are multiple datasets, they are different versions of the same dataset. It is not recommended to create datasets for different healthcare projects in the same DECOR project.
Datasets define a set of real-world healthcare concepts. One can look at datasets as a “hierarchical glossary”. Dataset Concepts shall be well-defined and described, as the following examples illustrate.
EXAMPLES
- Heart Frequency
- Blood pressure
- Age
- Plan of care
- Immunisation
- Body weight
- Body length
- Gender
- Person name
Concepts can be grouped, so ART-DECOR offers two types of concept: group and item.
A group concept carries no value definitions, only child concepts, the items.
Item concepts have the same metadata plus a value domain, that is, the declaration of the nature of the expected data, which is populated with data in data collection situations.
Putting it together, the following example shows a small dataset that hierarchically arranges Vital Signs parameters.
In ART-DECOR, groups can be expanded or collapsed which helps to get better oriented in larger datasets. In the following example, the Person group is expanded and the Vital Signs group is collapsed.
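The group/item distinction and the expand/collapse behaviour can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Concept` class, its field names and the `render` helper are hypothetical and are not part of ART-DECOR, and the datatypes assigned to the items are plausible choices rather than a published dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """Hypothetical model of a dataset concept (not the ART-DECOR schema)."""
    name: str
    type: str                              # "group" or "item"
    value_domain: Optional[str] = None     # datatype name, used by items only
    children: List["Concept"] = field(default_factory=list)
    expanded: bool = True                  # display state, used by groups only

    def __post_init__(self) -> None:
        # A group carries no value definition, only child concepts.
        if self.type == "group" and self.value_domain is not None:
            raise ValueError("a group concept carries no value definition")


def render(concept: Concept, depth: int = 0) -> List[str]:
    """Return the visible tree lines; children of collapsed groups are hidden."""
    indent = "  " * depth
    if concept.type == "group":
        marker = "-" if concept.expanded else "+"
        lines = [f"{indent}{marker} {concept.name}"]
        if concept.expanded:
            for child in concept.children:
                lines.extend(render(child, depth + 1))
        return lines
    return [f"{indent}{concept.name} ({concept.value_domain})"]


person = Concept("Person", "group", children=[
    Concept("Person name", "item", "complex"),
    Concept("Date of birth", "item", "date"),
    Concept("Gender", "item", "code"),
])
vital_signs = Concept("Vital Signs", "group", expanded=False, children=[
    Concept("Heart Frequency", "item", "quantity"),
    Concept("Body weight", "item", "quantity"),
])

for line in render(person) + render(vital_signs):
    print(line)
```

With the Person group expanded and the Vital Signs group collapsed, only the Person items are listed; the Vital Signs group appears as a single line.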
The following graphic shows that situation using the example of Vital Signs. The green area reflects the definition (that is, the work done with ART-DECOR). Definitions and Specifications are the scope of ART-DECOR.
Once the Specifications are published, implemented and used, the pink area is an example of data collection done in applications like apps, physician office systems or hospital information systems. Collecting or storing real data is out of scope for ART-DECOR and is provided by Healthcare Applications that have implemented the Specifications.
Concepts
Concepts are the building blocks of a DECOR Dataset. A concept may refer to a simple feature, as simple as a person's name or birth date.
Concepts may also be more complex and refer to other concepts. Referring to other concepts can be done by means of
- a hierarchical relation
- inheritance
- containment
Hierarchical relations are very common. An example is the concept of a Person, which is defined by child concepts like the person's name, birth date, social security number, gender, and so on. An example of a medical concept is blood pressure, which is defined by (among others) systolic, diastolic and mean blood pressure plus information about the measurement (method, date).
A concept can also have an inherit relation with another concept, thereby specializing it. Or it can reference another concept by means of a contains relation.
Concepts are defined in datasets. Other DECOR areas reference them, for example scenarios.
In essence, a data set item concept has the following fields.
Name
Dataset Concepts shall be well-defined and described and can be demonstrated with the following examples.
EXAMPLES
- Heart Frequency
- Blood pressure
- Gender
- Person name
Dataset Concept Names may be multi-lingual.
Description
Dataset Concepts shall have a proper description that is fit for team members who are not healthcare providers.
EXAMPLE
Heart Frequency :: The number of times a heart beats within a certain period of time, typically within a minute.
Dataset Concept Descriptions may be multi-lingual.
Version Date and Label
(effective date)
Status
Concept Identifier
A Dataset Concept has an id (OID) which uniquely identifies it. This is a simple reference to the distinct Dataset Concept, which also has a name and a description – both of which can be given in various human languages – but which is always easier to identify uniquely by the Concept Identifier.
Synonym
A Dataset Concept may have multiple Synonyms: one or more terms (or abbreviations) that have the same meaning as the Concept itself.
EXAMPLES
- Heart Frequency – Pulse, Pulsation
- Date of Birth – DOB
- Immunization – Vaccination
- Medical Drug – Medicine, Medication, Medicament
Comment
A Dataset Concept may carry a multi-lingual comment.
Type
As explained earlier, a concept can be a group concept that carries no value definitions, only child concepts (the items), or an item concept that has the same metadata plus a value domain, that is, the declaration of the nature of the expected data, which is populated with data in data collection situations.
Value Domain
The possible values a concept can have can be defined and/or constrained by means of the Value Domain. A Value Domain specifies the type for a concept (e.g., string, identifier, code, date, quantity) by assigning a Datatype. In this chapter, datatypes are explained as the nature of data you expect when collecting real world data. A more technical and complete list of datatypes can be found at the Datatypes page.
When the type of the Value Domain is coded, one can specify a Choice List, which acts as a list of valid options when collecting real world data.
EXAMPLE
The concept Gender has a Value Domain with type code and a concept list of three concepts: male, female and diverse.
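The Gender example can be expressed as a small check that an implementer might run when collecting data. This is a sketch under assumptions: the `ValueDomain` class and its `accepts` method are hypothetical names, and the choice list is treated as exhaustive, which a real project may or may not intend.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValueDomain:
    """Hypothetical value domain: a datatype plus an optional choice list."""
    datatype: str
    choice_list: List[str] = field(default_factory=list)

    def accepts(self, value: str) -> bool:
        # Only coded value domains with a choice list restrict the value here.
        if self.datatype != "code" or not self.choice_list:
            return True
        return value in self.choice_list


gender = ValueDomain(datatype="code", choice_list=["male", "female", "diverse"])

print(gender.accepts("female"))   # value is in the choice list
print(gender.accepts("unknown"))  # value is not in the choice list
```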
Source
A source of a dataset concept can be mentioned in this field.
EXAMPLE
The daily Blood Pressure in the morning is typically recorded by the patient (=source of data).
The Base Excess is determined by a laboratory device (=source of data).
Operationalization
Describes how to arrive at a proper measurement.
EXAMPLE
Determine the Body Weight in the morning before breakfast
Fasting Blood Glucose
Rationale
The rationale for having this dataset concept in this dataset.
EXAMPLE
This item is needed to be recorded for the Trauma Registry.
Relationship
Property
The Value Domain may have one or more properties associated with it. Typically one or more units for quantities are specified, along with optional minimum and maximum values, number of decimals, a default (but changeable) or a fixed (unchangeable) value.
Properties can be specified multiple times, e.g. to allow multiple units with their respective min, max, decimals etc.
EXAMPLE
Body weight of a one-year old child may be measured
- in kg with a range of 0 .. 10 kg and no decimals allowed, or
- in g with a range of 0 .. 10000 g and 2 decimals are allowed.
Body size is in m with 2 decimals or in cm with no decimals.
Minimum and Maximum Ranges may be specified as well. They indicate to later implementers of Questionnaires or other Rules that those ranges should be checked and validated.
EXAMPLES
Base Excess: –2 .. +2 mEq/l
Bone density DEXA measurement with a T-Score may range from –7 .. +7
Hounsfield units (HU) as standard unit of x-ray CT density, in which air and water are ascribed values of -1000 and 0 respectively.
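How an implementer might check a collected value against several properties can be sketched as follows, using the body weight example for a one-year-old child. The `Property` class and both helper functions are hypothetical, and the number of decimals is interpreted here as a maximum, which is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass
class Property:
    """Hypothetical property of a value domain: one unit with its limits."""
    unit: str
    min_include: Optional[Decimal] = None
    max_include: Optional[Decimal] = None
    fraction_digits: Optional[int] = None  # assumed: maximum decimals allowed


def matches(prop: Property, value: str, unit: str) -> bool:
    """Check one value/unit pair against a single property."""
    if unit != prop.unit:
        return False
    number = Decimal(value)
    if prop.min_include is not None and number < prop.min_include:
        return False
    if prop.max_include is not None and number > prop.max_include:
        return False
    if prop.fraction_digits is not None:
        decimals = max(0, -number.as_tuple().exponent)
        if decimals > prop.fraction_digits:
            return False
    return True


def is_valid(props: List[Property], value: str, unit: str) -> bool:
    """A value is valid if at least one of the properties accepts it."""
    return any(matches(prop, value, unit) for prop in props)


body_weight = [
    Property("kg", Decimal("0"), Decimal("10"), fraction_digits=0),
    Property("g", Decimal("0"), Decimal("10000"), fraction_digits=2),
]

print(is_valid(body_weight, "7", "kg"))       # within 0 .. 10 kg, no decimals
print(is_valid(body_weight, "7.5", "kg"))     # decimals not allowed for kg
print(is_valid(body_weight, "7250.25", "g"))  # within 0 .. 10000 g, 2 decimals
```

Values are passed as strings so that the number of decimals written by the user is preserved.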
Datatypes
While most of the data types are self-explanatory such as string or text or date, coded data may carry a choice list offering a typical or exhaustive list of real world things that might be valid values for the item concept. Examples include the color of the iris (eye), which typically could have a choice list of “brown”, “blue”, “green” and “other”.
The following datatypes are available and supported.
count
Countable (non-monetary) quantities. Used for countable types. A count may also be a negative count.
EXAMPLES
number of pregnancies
steps (taken by a physiotherapy patient)
number of cigarettes smoked in a day
code
A system of valid symbols/codes, that substitute for specified concepts e.g. alpha, numeric, symbols and/or combinations, usually defined by a formal reference to a terminology or ontology, but may also be defined by the provision of text. Typically a symbol/code is expressed with a value for code, an identifier for the terminology or ontology it belongs to and at least one textual representation (display name).
ordinal
Models rankings and scores, e.g. pain, Apgar values, etc, where there is a) implied ordering, b) no implication that the distance between each value is constant, and c) the total number of values is finite.
Note that although the term ‘ordinal’ in mathematics means natural numbers only, here any integer is allowed, since negative and zero values are often used by medical professionals for values around a neutral point.
Scores are commonly encountered in various clinical assessment scales. Assigning a value to a concept should generally be done in a formal code system that defines the value, or in an applicable value set for the concept, but some concepts do not have a formal definition (or are not even represented as a concept formally, especially in questionnaires). Scores may even be assigned arbitrarily during use (hence, on Coding). The value may be constrained to an integer in some contexts of use.
EXAMPLES
-3, -2, -1, 0, 1, 2, 3 -- reflex response values (neurology)
0, 1, 2 -- Apgar score values (perinatology)
1, 2, 3, 4, ... -- ASA classification (anesthesiology)
I, II, III, IV, ... -- Tanner scale (pediatrics)
identifier
Type for representing identifiers of real-world entities.
EXAMPLES
drivers licence number
social security number
prescription id
order id
string
Any text item, without visual formatting.
text
A text item, which may contain any amount of legal characters arranged as e.g. words, sentences etc. Visual formatting and hyperlinks may be included.
date
Represents an absolute point in time, as measured on the Gregorian calendar, and specified only to the day. Semantics defined by ISO 8601. Used for recording dates in real world time. The partial form is used for approximate birth dates, dates of death, etc.
datetime
Represents an absolute point in time, specified to the second. Semantics defined by ISO 8601. Used for recording a precise point in real world time.
EXAMPLES
the exact date and time of the birth of a baby
the origin of a history observation which is only partially known
time
NEW
NOTE
This datatype was first introduced in April 2020.
Represents a time, specified to the second (hh:mm:ss). Semantics defined by ISO 8601. Used for recording a real world time.
EXAMPLES
time of medication administration
starting/stopping a procedure
for approximate times, e.g. the origin of a history observation which is only partially known
decimal
Decimal number, rarely used because in most cases a decimal number is actually a quantity.
EXAMPLE
pi 3.14159265359
quantity
Quantified type representing "scientific" quantities, i.e. quantities expressed as a magnitude and units. If not further specified with fractionDigits, a decimal number with optional decimal point. A quantity may also have a negative value.
EXAMPLE
Body weight: 80 kg
Body mass index: 22.1 kg/m²
Base Excess: –1.4 mEq/l
Heart frequency: 62 beats/min
There are some "special" quantities (used in healthcare), and explained elsewhere:
- for countable items, count is used
- for real numbers without a unit, decimal is used
- for time durations, duration shall be used
Some additional quantities may be handled as follows:
- for monetary amounts, quantity is used but the units shall be currency units only, e.g. EUR or USD
- for ratios of two physical quantities, use complex
duration
A quantity that represents a period of time with respect to a notional point in time, which is not specified. A sign may be used to indicate that the duration is “backwards” in time rather than forwards.
boolean
Items which are truly boolean data.
EXAMPLES
true / false
yes / no answer
complex
Non-atomic datatypes which are not explicitly further defined in the dataset itself.
EXAMPLES
An address: 81 rue de Seine, 75006 Paris, France
A person's name: Dr François Percevais, MD
A ratio: 12 mg / 100 ml
Usually complex types are assumed to be well-known enough not to warrant further decomposition in the dataset itself.
blob
Things that are typically stored as binary objects in the computer world and need to be rendered appropriately.
EXAMPLES
images: like X-rays, computed tomography images
graphic: diagrams, graphs, mathematical curves, or the like – usually a vector image
icons: a sign or representation that stands for its object by virtue of a resemblance or analogy to it
pictures: A visual representation of a person, object, or scene – usually a raster image
currency
PLANNED
Monetary quantities.
ratio
PLANNED
A ratio of two Quantity values - a numerator and a denominator.
Datatype Facets
The datatypes can be further restricted using the following datatype facets:
Facet | Description | Example | Applies to |
---|---|---|---|
unit | Unit for quantities | kg, mmol | quantity |
minInclude | Range min include for quantities | 1 .. 100 | count, ordinal, quantity, currency |
maxInclude | Range max include for quantities | 1 .. 100 | count, ordinal, quantity, currency |
fractionDigits | Fraction digits for quantities: "1" for at least 1 or "1!" for exactly 1 | 2! | quantity |
timeStampPrecision | Precisions for timing specs, see below | | date, datetime |
default | Default value | | all datatypes |
fixed | Fixed value | | all datatypes |
minLength | Minimum length for strings | 7 | string |
maxLength | Maximum length for strings | | string |
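The fractionDigits notation ("1" for at least one digit, "1!" for exactly one) can be checked mechanically. The following is a minimal sketch; the function name `check_fraction_digits` is hypothetical, and it assumes values arrive as plain decimal strings with a dot as the decimal point.

```python
def check_fraction_digits(value: str, facet: str) -> bool:
    """Check the digits after the decimal point against a fractionDigits facet.

    "2"  means at least 2 fraction digits,
    "2!" means exactly 2 fraction digits.
    """
    exact = facet.endswith("!")
    required = int(facet.rstrip("!"))
    # Everything after the first "." counts as fraction digits.
    _, _, fraction = value.partition(".")
    actual = len(fraction)
    return actual == required if exact else actual >= required


print(check_fraction_digits("22.10", "2!"))  # exactly two digits
print(check_fraction_digits("22.1", "2!"))   # only one digit
print(check_fraction_digits("22.123", "2"))  # at least two digits
```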
Facet timeStampPrecision is used for date / time aspects and takes the following values:
Meaning | timeStampPrecision |
---|---|
At least year (YYYY) | Y |
Only year (YYYY) | Y! |
At least month (MM) and year (YYYY) | YM |
Only month (MM) and year (YYYY) | YM! |
At least day (DD), month (MM) and year (YYYY) | YMD |
Only day (DD), month (MM) and year (YYYY) | YMD! |
At least day (DD), month (MM) and year (YYYY), hour (hh) and minute (mm) | YMDHM |
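A validator for these precision codes might look like the sketch below. It assumes timestamps are written in the ISO 8601 extended form (`YYYY`, `YYYY-MM`, `YYYY-MM-DD`, `YYYY-MM-DDThh:mm` with optional seconds); the function names are hypothetical and only the codes listed in the table are handled.

```python
import re

# Precision codes from the table, ordered from coarse to fine.
_LEVELS = {"Y": 1, "YM": 2, "YMD": 3, "YMDHM": 4}

_PATTERN = re.compile(
    r"^(\d{4})(?:-(\d{2})(?:-(\d{2})(?:T(\d{2}):(\d{2})(?::(\d{2}))?)?)?)?$"
)


def precision_level(timestamp: str) -> int:
    """Return 1 (year) to 4 (hour and minute) for a partial timestamp."""
    match = _PATTERN.match(timestamp)
    if match is None:
        raise ValueError(f"unsupported timestamp format: {timestamp!r}")
    _year, month, day, hour, _minute, _second = match.groups()
    if hour is not None:
        return 4
    if day is not None:
        return 3
    if month is not None:
        return 2
    return 1


def satisfies(timestamp: str, facet: str) -> bool:
    """Check a timestamp against a timeStampPrecision value such as 'YM!'."""
    exact = facet.endswith("!")
    required = _LEVELS[facet.rstrip("!")]
    level = precision_level(timestamp)
    return level == required if exact else level >= required


print(satisfies("2020-04", "YM"))     # month and year present
print(satisfies("2020-04", "Y!"))     # more than the year is given
print(satisfies("2020-04-15", "YM"))  # "at least" allows finer precision
```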
Moving Concepts in Datasets
Moves are done in the Dataset Tree only, preferably by simple drag and drop actions. To start a move, click on a concept and choose the MOVE button. Concept context is needed for orientation, especially in large datasets. A special separate (elevated) dialog appears.
Moved items get a special flag in the frontend. The move is saved by submitting the Skeleton Tree to the API. The API re-arranges the whole dataset according to the skeleton, protected by a lock.
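The idea of re-arranging a dataset according to a skeleton can be illustrated with a short sketch. It is not the ART-DECOR API: the `rearrange` function, the dictionary layout and the ids are all hypothetical, and locking is left out. The skeleton carries only ids and nesting; the full concept definitions are looked up by id.

```python
from typing import Dict, List


def rearrange(concepts: Dict[str, dict], skeleton: List[dict]) -> List[dict]:
    """Rebuild the concept tree in the order and nesting given by the skeleton.

    `concepts` maps a concept id to its stored definition; each skeleton node
    is a dict with an "id" and an optional list of "children" nodes.
    """
    tree = []
    for node in skeleton:
        concept = dict(concepts[node["id"]])  # copy, keep the stored definition
        concept["children"] = rearrange(concepts, node.get("children", []))
        tree.append(concept)
    return tree


concepts = {
    "person": {"id": "person", "name": "Person"},
    "vitals": {"id": "vitals", "name": "Vital Signs"},
    "weight": {"id": "weight", "name": "Body weight"},
}

# Skeleton after the user moved "Body weight" under "Vital Signs".
skeleton = [
    {"id": "person"},
    {"id": "vitals", "children": [{"id": "weight"}]},
]

tree = rearrange(concepts, skeleton)
print([concept["name"] for concept in tree])
print([child["name"] for child in tree[1]["children"]])
```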
Inheritance / Containment
Inheritance and Containment of Concepts are an enhanced functionality of Datasets and their Concepts. It is possible to transition a containment relationship into a true inheritance relationship or into a true copy.
These features are described in a specialized chapter.
Dataset versioning
Naturally, you can version a dataset. Typically, a versioned dataset gets a new id and a new effective date.
The concepts in the new dataset version keep their id, but get a new effective date and inherit from the original concept. If a concept needs changes it may be disconnected (de-inherit) from its source concept so editing is possible.
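The versioning rules can be summarized in a small sketch. The classes, function names and ids below are hypothetical and only mirror the description above: the dataset gets a new id and effective date, while each concept keeps its id, gets the new effective date and inherits from its original.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Concept:
    id: str
    effective_date: str
    name: str
    # (id, effective date) of the source concept, if this concept inherits
    inherits: Optional[Tuple[str, str]] = None


@dataclass(frozen=True)
class Dataset:
    id: str
    effective_date: str
    concepts: Tuple[Concept, ...]


def new_version(dataset: Dataset, new_id: str, new_date: str) -> Dataset:
    """Create a new dataset version whose concepts inherit from the originals."""
    concepts = tuple(
        replace(c, effective_date=new_date, inherits=(c.id, c.effective_date))
        for c in dataset.concepts
    )
    return Dataset(new_id, new_date, concepts)


def de_inherit(concept: Concept) -> Concept:
    """Disconnect a concept from its source so that it can be edited."""
    return replace(concept, inherits=None)


v1 = Dataset("ds-1", "2023-01-01", (Concept("c-1", "2023-01-01", "Body weight"),))
v2 = new_version(v1, "ds-2", "2024-01-01")

print(v2.concepts[0])
print(de_inherit(v2.concepts[0]))
```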
The name and other properties of a concept constitute its definition. This definition should be governed. This means that any property in any language is under governance. When a project wants to re-use (inherit) these concepts, it inherits this definition as-is, and only comments are allowed as additions. These comments cannot in any way, shape or form alter the semantics of the original concept.
In multi-lingual settings: when a project wants to re-use concepts from a building block repository (BBR) that does not have defining properties in the same language as the project, then the project can do one of the following things:
- Accept the BBR concept as-is, potentially adding a comment
- Talk to the BBR governance group and work out an agreement whereby translations may be submitted
The recommended procedure for a BBR governance group for submitted translations is to create a new version of the dataset that adds the translations. The governance group could decide to add the translation to the original dataset, but this is not recommended. ART will not support this, as BBR datasets should be final, and final objects cannot be edited.