FAIR Terminology or UNFAIR?
31 January 2023
Klaus Fleischmann
CEO
Terminology data hold a significant portion of a company's knowledge and should therefore be considered valuable and important. As such, they should follow the FAIR Guiding Principles for scientific data management and stewardship. These principles provide guidelines for improving the findability, accessibility, interoperability, and reuse of digital assets.
In many cases, I have found that what a company considers its "terminology" does not adhere to these guidelines at all. These collections of word lists and glossaries are typically created for only a very small part of the organization, for example, the localization team, a particular software project, or even a single development sprint within one software project. They are also often kept in non-terminology repositories such as MS Excel. They are thus not
- findable,
- accessible,
- interoperable, or
- re-usable
Let's look at the FAIR Guiding Principles for data management and how they relate to "true" terminology management, as provided by Kalcium Quickterm, versus simple glossaries or word lists.
Please note that I am not saying that keeping these simpler forms is necessarily bad. Of course, a glossary specific to localization, a project, or even a sprint has its value. I am only assessing them against the FAIR principles.
Each FAIR principle comprises several requirements. As you can see in the table below, "findable" has four, "accessible" two, and so forth. We will work our way through the requirements one by one.
FAIR Terminology - Table of Contents
FAIR Requirement Findable 1
FAIR
(Meta)data are assigned a globally unique and persistent identifier
Terminology
In terminology, every entry and every term normally has a unique ID, so that it can be shared among users and systems. In Kalcium Quickterm, we use Universally Unique Identifiers (UUIDs) on both the concept and the term level, which lets us differentiate not only concepts but also synonyms and homonyms.
Glossary
A glossary normally has no identifier, let alone a universally valid one. Glossaries are typically term-oriented, and synonyms are grouped around concepts through free-text fields such as comments or a field called "synonyms". This, of course, does not provide a unique identifier.
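To make the difference concrete, here is a minimal sketch of concept-oriented entries in Python. It assumes a simple in-memory model; the class and field names are hypothetical and do not reflect Kalcium Quickterm's internal schema. Concepts and terms each receive their own UUID from the standard `uuid` module, so synonyms share one concept ID while a homonym lives in a separate concept.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Term:
    """One term (hypothetical model); every term gets its own UUID."""
    text: str
    lang: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Concept:
    """One concept; groups all terms (synonyms, translations) that name it."""
    definition: str
    terms: list[Term] = field(default_factory=list)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


# Two synonyms grouped under one concept ...
display = Concept("Device that shows images from a computer",
                  [Term("screen", "en"), Term("monitor", "en")])
# ... and the homonym "monitor" (to observe) as a separate concept with its own ID.
observe = Concept("To observe and check something over a period of time",
                  [Term("monitor", "en")])

for concept in (display, observe):
    for term in concept.terms:
        print(f"{concept.id}  {term.id}  {term.text}")
```

Both occurrences of "monitor" carry different term and concept IDs, so software can tell them apart, which a flat list keyed by the term text cannot do.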
FAIR Requirement Findable 2
FAIR
Data are described with rich metadata (defined by R1 below)
Terminology
A termbase normally includes rich metadata, ranging from source or context to complex categorization in predefined picklists. Taxonomies can even be used inside termbases. The goal is to clearly differentiate and explain the concept to the end user. Kalcium Quickterm offers a rich termbase schema modeler that allows the schema to be adapted in real time to new use cases and requirements. In addition, Kalcium Quickterm lets you store a definition of each field's purpose and values in the termbase schema and display it as a tooltip in the termbase entries.
Glossary
A typical word list or glossary includes only minimal metadata aimed at clarifying how terms differ from one another. Also, there is normally no built-in description of a field's content or purpose.
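A schema that documents its own fields can be sketched as follows. This is a hypothetical, simplified schema format, not Kalcium Quickterm's actual schema: each field carries a description that a user interface could show as a tooltip, and picklist fields restrict values to a predefined set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical schema entry: field name, documented purpose, optional picklist."""
    name: str
    description: str                # could be shown to users as a tooltip
    picklist: tuple[str, ...] = ()  # empty tuple means free text


SCHEMA = {
    f.name: f
    for f in (
        FieldDef("definition", "Explains the concept in one sentence."),
        FieldDef("source", "Where the term or definition was taken from."),
        FieldDef("status", "Usage recommendation for the term.",
                 ("preferred", "admitted", "deprecated")),
    )
}


def tooltip(field_name: str) -> str:
    """Return the documented purpose of a field."""
    return SCHEMA[field_name].description


def validate(entry: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the entry fits the schema."""
    problems = []
    for key, value in entry.items():
        field_def = SCHEMA.get(key)
        if field_def is None:
            problems.append(f"unknown field: {key}")
        elif field_def.picklist and value not in field_def.picklist:
            problems.append(f"{key}={value!r} not in {field_def.picklist}")
    return problems


print(tooltip("status"))                                              # Usage recommendation for the term.
print(validate({"status": "preferred", "source": "product manual"}))  # []
print(validate({"status": "maybe", "color": "blue"}))
```

A spreadsheet column header carries none of this information, which is why a glossary's fields usually need a separate explanation.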
FAIR Requirement Findable 3
FAIR
Metadata clearly and explicitly include the identifier of the data they describe
Terminology
Termbases are usually modeled on the principle of concept orientation: a concept has one UUID, but it can have as many terms as needed. These terms are clearly nested under the concept and have their own IDs. This resolves not only the issue of synonyms (various terms for the same concept) but also that of homonyms (the same term for several concepts). When you look at a term, and at a homonym in particular, it is therefore clear which concept it relates to. Kalcium Quickterm goes one step further by also allowing users to link concepts in hierarchical or abstract relationships (i.e., Concept Maps).
Glossary
Glossaries are typically term-oriented and do not take concept orientation into account. This is one of their main drawbacks. In a glossary, it is very difficult to differentiate synonyms, let alone homonyms, since they simply occur as separate entries in the list with no explicit relation to one another.
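The hierarchical links between concepts mentioned for termbases can be illustrated with a small sketch. It assumes that each concept stores the IDs of its broader concepts; the IDs are hypothetical, and Kalcium Quickterm's Concept Maps may model relationships differently. Inverting those links allows a drill-down from a concept to everything below it.

```python
from collections import defaultdict

# Hypothetical concept IDs mapped to the IDs of their broader (parent) concepts.
BROADER = {
    "c-printer": ["c-device"],
    "c-laser-printer": ["c-printer"],
    "c-inkjet-printer": ["c-printer"],
    "c-scanner": ["c-device"],
}


def narrower_index(broader: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the broader links so that the hierarchy can be navigated downwards."""
    narrower = defaultdict(list)
    for child, parents in broader.items():
        for parent in parents:
            narrower[parent].append(child)
    return narrower


def drill_down(concept_id: str, narrower: dict[str, list[str]]) -> list[str]:
    """Return all concepts below concept_id, depth-first (assumes no cycles)."""
    result = []
    for child in narrower.get(concept_id, []):
        result.append(child)
        result.extend(drill_down(child, narrower))
    return result


NARROWER = narrower_index(BROADER)
print(drill_down("c-device", NARROWER))
# ['c-printer', 'c-laser-printer', 'c-inkjet-printer', 'c-scanner']
```

Because the links point to concept IDs rather than to term strings, a homonym such as "printer" in another sense could never be attached to the wrong branch.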
FAIR Requirement Findable 4
FAIR
(Meta)data are registered or indexed in a searchable resource
Terminology
Being able to search for terms and their metadata is, of course, the fundamental purpose of a termbase. Kalcium Quickterm, for example, has five different search methods, including morphological search. On top of that, taxonomies and concept maps make terminology searchable and enable drill-downs and network-like navigation for both machines and humans.
Glossary
In a glossary, search is normally limited to the functions of the tool in which it was created, such as MS Excel or a simple database tool. These search methods are often only "string-based", making it impossible to search for similar or completely unknown terms.
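The gap between a purely string-based lookup and a more tolerant search can be sketched with Python's standard `difflib` module. This is only a stand-in for illustration: `difflib` measures character similarity, which is not the same as true morphological search (analyzing inflections and word forms), and it is not how Kalcium Quickterm's search methods work. The term list is made up.

```python
import difflib

TERMS = ["print head", "printer", "printing unit", "paper tray", "toner cartridge"]


def exact_search(query: str) -> list[str]:
    """String-based search, as in a spreadsheet: exact (case-insensitive) matches only."""
    return [t for t in TERMS if t.lower() == query.lower()]


def similar_search(query: str, cutoff: float = 0.6) -> list[str]:
    """Similarity search: also finds near variants such as plurals or typos."""
    return difflib.get_close_matches(query.lower(), TERMS, n=3, cutoff=cutoff)


print(exact_search("printers"))    # []
print(similar_search("printers"))  # best match first
```

The exact search misses the plural entirely, while the similarity search still ranks "printer" first.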
FAIR Requirement Accessible 1
FAIR
(Meta)data are retrievable by their identifier using a standardized communications protocol
Terminology
Although it is not the typical use case, terms and concepts can be retrieved by their UUID. In Kalcium Quickterm, you can do this via the search user interface, but also via an HTTP(S) call or, of course, via the API, for instance the supported OpenSearch API. The typical use case, however, is to search for the actual terms. Terms themselves, though, are not reliable identifiers, since there can be synonyms or homonyms.
Glossary
Since there are typically no identifiers in a glossary, the only key to search for is the term itself. This is unreliable for programmatic access: because of homonyms and synonyms, a search may return several results or none at all.
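Why a UUID is a more reliable key than the term itself can be shown with two in-memory lookups. This is a hypothetical sketch with made-up IDs, not Kalcium Quickterm's API: looking up a concept by ID yields exactly one entry or nothing, while looking up a term string can yield several concepts (a homonym) or none.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical termbase: concept ID -> terms naming that concept.
CONCEPTS = {
    "3f2b6c1e-0000-4000-8000-000000000001": ["bank", "financial institution"],
    "3f2b6c1e-0000-4000-8000-000000000002": ["bank", "riverbank"],
}

# Reverse index: term text -> all concept IDs that use it.
BY_TERM: dict[str, list[str]] = defaultdict(list)
for concept_id, terms in CONCEPTS.items():
    for term in terms:
        BY_TERM[term].append(concept_id)


def get_by_id(concept_id: str) -> Optional[list[str]]:
    """Unambiguous: at most one concept per ID."""
    return CONCEPTS.get(concept_id)


def find_by_term(text: str) -> list[str]:
    """Ambiguous: a homonym maps to several concepts, an unknown term to none."""
    return BY_TERM.get(text, [])


print(len(find_by_term("bank")))  # 2 (homonym)
print(find_by_term("shore"))      # []
```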
FAIR Requirement Accessible 2
FAIR
Metadata should be accessible even when the data is no longer available
Terminology
It is not common to remove terms from a termbase. Instead, an outdated term is typically flagged as "deprecated". This is another crucial difference between glossaries and terminology. FAIR terminology normally tries to collect and cluster as many synonyms as possible and labels the deprecated or outdated ones explicitly. This is because both humans and machines (for example, authoring or term-checking tools) need to retrieve terms that are no longer valid and be redirected to the valid ones.
Glossary
Glossaries typically list only allowed and current terms and do not store a history. Once a term is removed from a glossary, neither the term nor its metadata can be retrieved.
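Keeping deprecated terms instead of deleting them is what allows a checking tool to redirect writers. The sketch below assumes a hypothetical status field with the values "preferred" and "deprecated" on each term; it does not reproduce the API of any particular authoring or term-checking tool.

```python
# Hypothetical concept-oriented entries: each term carries a usage status.
TERMBASE = [
    {"concept": "c-1", "terms": [
        {"text": "log in", "status": "preferred"},
        {"text": "logon", "status": "deprecated"},
        {"text": "sign on", "status": "deprecated"},
    ]},
]


def check_term(text: str) -> str:
    """Tell the writer whether a term may be used and, if not, what to use instead."""
    for entry in TERMBASE:
        statuses = {t["text"]: t["status"] for t in entry["terms"]}
        if text in statuses:
            if statuses[text] == "preferred":
                return f"'{text}' is fine"
            preferred = [t for t, s in statuses.items() if s == "preferred"]
            return f"'{text}' is deprecated, use: {', '.join(preferred)}"
    return f"'{text}' is unknown"


print(check_term("sign on"))  # 'sign on' is deprecated, use: log in
```

Had "sign on" simply been deleted, as in a glossary, the check could only report it as unknown and offer no replacement.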
FAIR Requirement Interoperable 1
FAIR
(Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
Terminology
Unfortunately, the field of terminology has not agreed on a single shared and broadly applicable language for knowledge representation. TBX is a common standard for pure terminology, while RDF and SKOS are standards for representing more relational, "knowledge"-oriented terminology, including taxonomies and knowledge graphs. While Kalcium Quickterm uses a proprietary XML language internally, it can import and export TBX as well as RDF and SKOS.
Glossary
Glossaries are typically not designed for compatibility with other systems based on uniform, shared languages. A frequent export and import format is CSV. Only the more sophisticated glossary tools support TBX, and RDF or SKOS are normally not used for glossaries at all.
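To show what a shared representation looks like, the sketch below serializes one concept as SKOS in Turtle syntax using plain string formatting. The namespace `https://example.com/termbase/` and the concept data are made up; a production export would rather use an RDF library such as rdflib and would need proper escaping of literals.

```python
SKOS_PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix tb: <https://example.com/termbase/> .\n"  # hypothetical namespace
)


def to_skos_turtle(concept_id: str, pref: dict[str, str],
                   alt: dict[str, list[str]], broader: list[str]) -> str:
    """Serialize one concept: a prefLabel per language, altLabels, broader links."""
    lines = [f"tb:{concept_id} a skos:Concept"]
    for lang, label in pref.items():
        lines.append(f'    skos:prefLabel "{label}"@{lang}')
    for lang, labels in alt.items():
        for label in labels:
            lines.append(f'    skos:altLabel "{label}"@{lang}')
    for parent in broader:
        lines.append(f"    skos:broader tb:{parent}")
    # Turtle separates predicates of the same subject with ";" and ends with ".".
    return SKOS_PREFIXES + "\n" + " ;\n".join(lines) + " .\n"


print(to_skos_turtle("c-42",
                     pref={"en": "screen", "de": "Bildschirm"},
                     alt={"en": ["monitor", "display"]},
                     broader=["c-7"]))
```

Unlike a CSV file, every property here (`skos:prefLabel`, `skos:altLabel`, `skos:broader`) has a published meaning that other RDF-aware systems can interpret without a custom parser.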
FAIR Requirement Interoperable 2
FAIR
(Meta)data use vocabularies that follow FAIR principles
Terminology
By using TBX, RDF, or SKOS, terminology complies with this requirement. However, because these standards are open and each organization that uses terminology has very individual requirements, TBX or RDF representations are normally not identical and can vary greatly. They, too, therefore require a parsing or transformation step to become interoperable.
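Such a transformation step often amounts to mapping each organization's custom fields and values onto a common target. A minimal sketch, assuming two hypothetical exports that encode the same information differently; the field names and mapping tables are invented for illustration and are not part of TBX, RDF, or SKOS.

```python
# Two hypothetical termbase exports that encode the same facts differently.
EXPORT_A = {"Term": "screen", "Lang": "en", "Usage": "preferred"}
EXPORT_B = {"term_text": "screen", "language": "EN", "status": "Preferred Term"}

# Per-source mapping: source field name -> common field name.
FIELD_MAPS = {
    "A": {"Term": "term", "Lang": "lang", "Usage": "status"},
    "B": {"term_text": "term", "language": "lang", "status": "status"},
}
# Per-source value normalization for fields with controlled values.
VALUE_MAPS = {
    "B": {"status": {"Preferred Term": "preferred"}},
}


def normalize(record: dict[str, str], source: str) -> dict[str, str]:
    """Rename fields and normalize values into one shared structure."""
    out = {}
    for key, value in record.items():
        target = FIELD_MAPS[source].get(key)
        if target is None:
            continue  # unmapped fields are dropped in this sketch
        value = VALUE_MAPS.get(source, {}).get(target, {}).get(value, value)
        out[target] = value.lower() if target == "lang" else value
    return out


print(normalize(EXPORT_A, "A") == normalize(EXPORT_B, "B"))  # True
```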
Glossary
Glossaries typically include no metadata about their own structure. They are not intended to be interoperable or machine-readable without, for example, a parser that encodes the logic of the table.
FAIR Requirement Interoperable 3
FAIR
(Meta)data include qualified references to other (meta)data
Terminology
The ability of terminology to link to other data sources is vital and very widely used. This starts with links inside the termbase itself, via cross-references or, in a more sophisticated way, via Concept Maps or taxonomies. Beyond that, linking to other data sources within a company is a very frequent requirement and can be modeled very explicitly in the termbase schema. Kalcium Quickterm, for example, has mechanisms to ensure that a link is valid (for example, a link to an image) and does not point to a target that is deprecated or no longer exists.
Glossary
Because of the technologically less sophisticated structure of glossaries and their lack of IDs, it is normally difficult to link entries within the glossary itself. Linking to external data sources could, of course, be achieved with a corresponding table logic. However, the tools normally used to manage glossaries contain no technology to ensure that these links are valid and up to date. Deleting a term, for example, has no consequences for a cross-reference that pointed to it; the reference is simply left broken.
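A check for broken cross-references, the kind of safeguard that spreadsheet-based glossaries lack, can be sketched in a few lines. This is a hypothetical illustration, not Kalcium Quickterm's mechanism: it scans a made-up cross-reference field and reports every reference whose target concept ID no longer exists.

```python
# Hypothetical termbase: concept ID -> entry with cross-references to other concept IDs.
termbase = {
    "c-1": {"term": "printer", "see_also": ["c-2", "c-3"]},
    "c-2": {"term": "print head", "see_also": ["c-1"]},
    "c-3": {"term": "toner", "see_also": []},
}


def dangling_refs(tb: dict) -> list[tuple[str, str]]:
    """Return (source ID, missing target ID) pairs for broken cross-references."""
    return [
        (cid, ref)
        for cid, entry in tb.items()
        for ref in entry["see_also"]
        if ref not in tb
    ]


print(dangling_refs(termbase))  # []
del termbase["c-3"]             # delete a concept ...
print(dangling_refs(termbase))  # [('c-1', 'c-3')], the broken link is reported
```

Because references point to stable IDs, such a check can run automatically after every deletion; with term strings as link targets, a renamed term would break links silently.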
FAIR Requirement Re-usable 1
FAIR
(Meta)data are richly described with a plurality of correct and relevant attributes
Terminology
Termbase schemas, such as Kalcium Quickterm's, can in theory hold as many metadata points as you want in order to supply sufficient metadata for all users and use cases (for example, filtering, exporting, workflows, or context-dependent term verification). However, different target user groups and target systems find different metadata USEFUL, which is the core of this requirement. Kalcium Quickterm handles this with group-specific views that filter the metadata down to what is useful for each group. In addition, the Publishing Framework can filter and transform all metadata points to produce exactly the metadata set required by the respective target system.
Glossary
Glossaries can only contain a limited number of metadata points, particularly if they are tabular. A word list or table with 5 metadata points in each of 20 languages, i.e., 100 metadata columns, simply becomes too complex to handle. Also, since there are typically no configurable views, all users see all metadata points, which makes it extremely hard to focus on the values required for the respective use case.
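Group-specific views of the kind described for termbases can be pictured as per-group field filters. The group names, fields, and values below are hypothetical; the sketch only shows the principle that each audience sees the metadata relevant to it instead of every column.

```python
ENTRY = {
    "term": "print head",
    "definition": "Component that transfers ink to the paper.",
    "status": "preferred",
    "source": "service manual",
    "part_number": "PH-100",
    "review_note": "checked by product management",
}

# Hypothetical views: which metadata fields each user group gets to see.
VIEWS = {
    "translators": ["term", "definition", "status"],
    "service": ["term", "part_number"],
    "terminologists": list(ENTRY),  # full view
}


def view(entry: dict[str, str], group: str) -> dict[str, str]:
    """Project an entry onto the fields configured for a user group."""
    return {f: entry[f] for f in VIEWS[group] if f in entry}


print(view(ENTRY, "service"))  # {'term': 'print head', 'part_number': 'PH-100'}
```

The entry is stored once with all its metadata; only the presentation differs per group, which a single shared spreadsheet cannot offer.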
Conclusion
While glossaries can serve their specific purpose of listing the terminology used for a very limited use case, such as a localization or software development project, they do not meet the FAIR data principles. Therefore, if terminology data is intended for company-wide or even broader distribution and use, it needs to follow at least most of the FAIR principles. It quickly becomes clear that a glossary cannot fulfill this purpose and that you need a true terminology system such as Kalcium Quickterm.
So, what's your take on FAIR terminology?
Kaleidoscope: Taking your content global
We combine our own expertise and software solutions with those of carefully selected technology partners to create the right solutions for your content's success on the global market. Thanks to our innovations and further developments, we continuously make it easier for you to manage terminology, quality, reviews, queries, and automation.