Mapping the CFIHOS data

latest update: 2020-07-09


The source of most life-cycle data is either a relational database or a set of spreadsheets.
A method for mapping an instance of the CFIHOS relational data model to the format defined in ISO 15926-7/8 is given here.
This method consists of four phases:
  1. mapping all attributes and properties to either owl:ObjectProperty or owl:DatatypeProperty;
  2. mapping all owl:ObjectProperty and owl:DatatypeProperty instances to ISO 15926-7 templates in ISO 15926-8 format;
  3. mapping functional diagrams (e.g. P&ID, One-line Diagram, Loop Diagram) to place the results of the second phase in the plant topology;
  4. mapping data at the source to ISO 15926-7/8.
The result of
  - Phase 1 can serve as the basis for storing the handover data in a triple store and querying them with SPARQL;
  - Phase 2 is a fully defined information set, with metadata if required, that is not yet integrated because the topology of the P&ID and other functional diagrams is still missing;
  - Phase 3 is a complete integration of the handed-over plant information with the CFIHOS data model and CFIHOS RDL;
  - Phase 4 is a complete integration of the handed-over plant information that can be updated continuously, keeping the history in place.

Phase 1

First we look at the structure of any relational data model, like that of CFIHOS, and use the ISO 15926-2 entity type EnumeratedSetOfClass as a kind of bag, an unordered container. We allocate its members by means of rdf:type. So all tables are members of the data model, and each attribute is a member of its table.

A note for the "fundamentalists": it is correct to make an EnumeratedSetOfClass a member of another instance of EnumeratedSetOfClass. An EnumeratedSetOfClass is a ClassOfClass that is an enumerated set of instances of Class. Enumerated means that the full set of members is specified. Since it is a ClassOfClass, which is a subtype of Class, it can be a member of another EnumeratedSetOfClass.
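The membership scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the namespace `http://example.org/cfihos#` and the data model name `CFIHOS-DATA-MODEL` are hypothetical, while the table and attribute identifiers are taken from the example further down this page.

```python
# Minimal sketch: express "table is a member of the data model" and
# "attribute is a member of its table" as rdf:type triples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/cfihos#"  # hypothetical namespace, not the real CFIHOS one


def membership_triples(model, tables):
    """Return (subject, predicate, object) triples for a model and its tables.

    `tables` maps a table identifier to the list of its attribute identifiers.
    """
    triples = []
    for table, attributes in tables.items():
        # Each table is a member of the data model.
        triples.append((BASE + table, RDF_TYPE, BASE + model))
        for attribute in attributes:
            # Each attribute is a member of its table.
            triples.append((BASE + attribute, RDF_TYPE, BASE + table))
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# "CFIHOS-DATA-MODEL" is a hypothetical name for the top-level set.
tables = {"DOCUMENT-REVISION": ["00000035.10000077"]}
triples = membership_triples("CFIHOS-DATA-MODEL", tables)
print(to_ntriples(triples))
```

Because every membership is an ordinary rdf:type triple, nesting one set inside another needs no special handling.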

Building on this framework the CFIHOS data model, with its RDL (Reference Data Library) and pick-lists, as RDL sections, is represented by the following graph:

The CFIHOS data model and all RDL classes have been mapped this way.


In the diagram below an example mapping trail with populated model items is shown.

NOTE 1 - Since RDF works with single identifiers rather than the combinations of separate FKs used in the CFIHOS model, concatenated identifiers had to be created where needed. Because in RDF each N-triple stands on its own, everything must be assigned a UUID. UUIDs are universally unique, so they can be assigned without any controlling authority. An exception is made for things that are explicitly controlled, e.g. by ISO or CFIHOS. However, because CFIHOS uses certain (FK) attributes more than once, such as Plant code (FK), these were given a modified unique code by prefixing them with the CFIHOS unique code of the table they are an attribute of.
NOTE 2 - In the code of Phase 1 the CFIHOS unique code cfihos:60001797 is shown rather than "piston pump". The reason is that the "values" of the qualitative properties are often not unique. For example, there are three identical values 'II' and three values 'water'. Another reason is that misspellings are detected, since a misspelled value cannot be mapped to a CFIHOS unique code.
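The two notes can be illustrated with a short Python sketch. It assumes that the prefixed code takes the dot-separated form seen in the identifier 00000035.10000077 used on this page; the pick-list name "equipment class" and the function names are hypothetical, and only the pairing of "piston pump" with cfihos:60001797 comes from the text.

```python
import uuid


def qualified_attribute_code(table_code, attribute_code):
    """Prefix an attribute code with the code of its table.

    Assumption: the two codes are joined with a dot.
    """
    return f"{table_code}.{attribute_code}"


def new_identifier():
    """Return a fresh UUID for a thing not controlled by ISO or CFIHOS."""
    return str(uuid.uuid4())


# Hypothetical excerpt of a pick-list. The key includes the pick-list name
# because a bare value such as 'water' may occur in several pick-lists.
PICK_LIST = {("equipment class", "piston pump"): "cfihos:60001797"}


def resolve(pick_list, value):
    """Map a pick-list value to its unique code; reject unknown spellings."""
    try:
        return PICK_LIST[(pick_list, value)]
    except KeyError:
        raise ValueError(
            f"no unique code for {value!r} in pick-list {pick_list!r}"
        ) from None


print(qualified_attribute_code("00000035", "10000077"))
print(resolve("equipment class", "piston pump"))
```

A misspelled value such as "pistn pump" raises a ValueError here instead of passing silently into the triple store.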

Mapping the data model attributes

An attribute of the CFIHOS data model is mapped by declaring that attribute to be also an instance of owl:ObjectProperty or owl:DatatypeProperty. Then rdfs:subPropertyOf is used to declare an rdf:Property with the required name, which inherits the rdfs:domain and rdfs:range of its superproperty.

If desired, an inverse of the latter can then be declared with the owl:inverseOf predicate; its domain and range are then interchanged. However, inverse properties have not been modeled here.
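The relation between an attribute, its named sub-property, and an optional inverse can be sketched as follows. This is a simplified Python model, not an OWL reasoner; the domain and range values are those of the originator company example in this section, and the inverse name `:originatedDocumentRevision` is hypothetical, since inverses have not been modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    name: str
    domain: str
    range_: str  # trailing underscore avoids shadowing the built-in range


def sub_property(name, parent):
    """Declare a named sub-property that takes over the parent's domain and range."""
    return Property(name, parent.domain, parent.range_)


def inverse_of(name, prop):
    """Declare an inverse: domain and range are interchanged."""
    return Property(name, prop.range_, prop.domain)


attribute = Property(":00000035.10000077", ":DocumentRevision", ":Company")
named = sub_property(":originatorCompany(FK)", attribute)
inverse = inverse_of(":originatedDocumentRevision", named)  # hypothetical name

print(named)
print(inverse)
```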

As an example, the attribute 'Originating company (FK)', named 'originatorCompany(FK)' following RDF conventions, is detailed below.
The Turtle code for this is:
    :00000035.10000077
        rdf:type               :DOCUMENT-REVISION ; # this means that the attribute is a part of the DOCUMENT REVISION table
        rdf:type               dm:ClassOfClassOfInformationRepresentation ;
        rdfs:subClassOf        dm:ClassOfInformationRepresentation ;
        rdfs:label             "Originator company (FK)" ;
        skos:definition        "The name of the company who has generated the document revision"@en ;
        meta:valEffectiveDate  "2020-05-22T00:00:00Z"^^xsd:dateTime .

    # Assumption: the subject and the rdfs:subPropertyOf predicate below are reconstructed from the description above.
    # Turtle 1.1 requires the parentheses in a local name to be escaped.
    :originatorCompany\(FK\)
        rdf:type               owl:ObjectProperty ;
        rdfs:subPropertyOf     :00000035.10000077 ;
        rdfs:label             "The name of the company who has generated the document revision"@en ;
        rdfs:domain            :DocumentRevision ;
        rdfs:range             :Company .

In order to avoid a mix-up these two worlds will be kept in two separate endpoints. That is work-in-progress.

Phase 2

In Phase 2 all 367 data model attributes + 630 RDL properties = 997 CFIHOS properties are to be mapped to ISO 15926-7/8 templates/specialized templates.
To a large extent this has been done in a preparatory mode, but the formal code still has to be prepared. Many of the CFIHOS attributes and properties are conflated and result in the same template with one or two fixed variables, so the actual number of mappings is lower.
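The effect of conflation on the number of mappings can be sketched as follows. Everything in this Python example is hypothetical: the property names, the template name `PropertyWithValue`, and the fixed variables are invented for illustration and are not taken from CFIHOS or ISO 15926-7. Only the totals 367, 630 and 997 come from the text.

```python
from collections import defaultdict

# Totals as stated in the text.
assert 367 + 630 == 997

# Hypothetical mappings: (property, base template, fixed variables).
MAPPINGS = [
    ("design pressure", "PropertyWithValue",
     {"propertyType": "pressure", "condition": "design"}),
    ("operating pressure", "PropertyWithValue",
     {"propertyType": "pressure", "condition": "operating"}),
    ("design temperature", "PropertyWithValue",
     {"propertyType": "temperature", "condition": "design"}),
]


def group_by_template(mappings):
    """Group properties by the base template they specialize."""
    groups = defaultdict(list)
    for prop, template, fixed in mappings:
        groups[template].append((prop, fixed))
    return dict(groups)


groups = group_by_template(MAPPINGS)
print(len(MAPPINGS), "properties ->", len(groups), "base template(s)")
```

In this sketch three properties share one base template and differ only in their fixed variables, which is the sense in which fewer distinct mappings are needed than there are properties.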

Phase 1 and Phase 2 compared

As explained elsewhere on this website, the differences between the Phase 1 and Phase 2 results, and their (dis)advantages, are:

Phase 1 results | Phase 2 results
Relatively low effort; accepts the data as they are entered | More preparatory work, also to clear up many semantic vaguenesses
No metadata possible (e.g. status, rules, reliability) | Any amount of metadata possible
For batch snapshot purposes only | Supports integration of life-cycle information
Difficult to change or extend the scope | Fully extendable, for example to human and process activities