AI and Terminology
5 December 2024
Klaus Fleischmann
CEO
While summer may be over and colder fronts are moving in, AI remains a hot topic. We are actively researching the potential of AI through several internal experiments, customer projects, and proofs of concept, in intensive collaboration between our Professional Services division and the development teams. As our workshops at tekom and the LocWorld conferences in Dublin and Monterey demonstrate (not to mention the training courses we offer), we at Kaleidoscope have always been guided by the question of how AI can be used to increase the benefits of terminology, both in our products and for our customers. We see great potential in the AI discussion for the language industry in general and terminology in particular, allowing terminologists and other language professionals to finally take the role their linguistic expertise deserves. After all, there is a reason why we speak of "large language models".
AI – The basics
As language experts, we must not only take part in internal company discussions on AI and language, but also clearly demonstrate our expertise. This important topic must not be left entirely to IT or other departments, so we need to build up our own IT and AI knowledge in order to make constructive contributions that benefit the whole company.
AI – The knowledge
We have developed a workshop concept that provides a deep insight into AI and, in particular, into the role and use of terminology in AI applications within an organization. The aim of the workshop is to provide language departments and terminologists with the knowledge they need to actively participate in discussions about the deployment of AI.
Our training course, which has already been met with great acclaim, shows how this transfer of knowledge can succeed.
AI in terminology
How can AI support terminology work and make it more efficient? We ask ourselves this question not only with regard to imparting knowledge, but also with regard to the concrete benefits for everyday work.
Kalcium
All this is why we have already incorporated AI into our Kalcium platform. You can use prompts to generate certain metadata, such as definitions, as well as additional information such as part of speech, subject area, and grammatical information. What's more, AI is extremely powerful in terminology verification and enables texts to be rewritten with correct terminology and zero grammatical errors. We are able to deliver these advantages thanks to our AI team, which has significantly refined the prompts supplied as standard. And as the market is clearly moving towards Azure OpenAI, we now support this provider as well. Our policy has always been to make use of AI services that already exist and not to engineer our own AI tool, which would be more expensive and, above all, more difficult to implement in terms of information security.
TermCatch
Term extraction has made enormous progress in the last two years, helping to solve some of the major challenges in the terminology field. Our Swedish partner company Fodina is playing a key role in these developments with its TermCatch software. TermCatch gives you the answer to the following questions:
- How do I capture all the terms used in my company?
- How do I find out which of them belong together and which are synonyms or variants? What data do I have from which to derive suggestions for preferred terms?
- Which terms do I prioritize? Proceeding alphabetically is not ideal.
- How can I keep my termbase up to date, e.g., when a colleague sends me an Excel list, or a new department makes its content available for terminology work?
- How can I check texts for new terminology before translation, make them consistent, or bring new terminology into the target languages in advance? Preferably directly in the translation project itself?
- And how do I get all this into my termbase in a controlled and coordinated way?
These are all ideal scenarios for the use of TermCatch.
What TermCatch can do:
- Extract term candidates from files or online content
- Group variants and synonyms into clusters with AI support
- Supply metadata such as frequency, scoring, context, etc., to provide a more objective terminological basis for decision-making
- Generate AI-powered metadata, including suggested definitions, subject fields, domains, grammatical information, part of speech, etc.
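To make the idea of clustering variants and counting frequency more tangible, here is a minimal Python sketch. It is not how TermCatch works internally, which this article does not describe. Instead of AI-supported grouping, it uses a deliberately naive stand-in: candidates that differ only in case, hyphens, or spaces are treated as one cluster. The function names and the sample candidates are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def normalize(term: str) -> str:
    """Collapse case, hyphens, and whitespace so spelling variants share one key."""
    return re.sub(r"[\s\-]+", "", term.lower())


def cluster_candidates(candidates: list[str]) -> list[dict]:
    """Group candidates by normalized key and count how often each variant occurs."""
    clusters: dict[str, Counter] = defaultdict(Counter)
    for term in candidates:
        clusters[normalize(term)][term] += 1

    result = []
    for variants in clusters.values():
        result.append({
            "variants": sorted(variants),              # all spellings seen
            "frequency": sum(variants.values()),       # total occurrences in the cluster
            "most_common": variants.most_common(1)[0][0],
        })
    # Most frequent clusters first, so they can be reviewed first
    return sorted(result, key=lambda c: c["frequency"], reverse=True)


# Hypothetical extraction output
candidates = ["term base", "termbase", "Term-base", "termbase", "data sheet", "datasheet"]
clusters = cluster_candidates(candidates)
```

A real extraction tool would also need to catch true synonyms, which simple normalization cannot do; the sketch only shows why frequency per cluster is a more useful basis for decisions than an alphabetical list.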
Among the resulting use cases are:
- Initial creation of a database including comparison of synonyms and clusters in term candidates
- Extension of existing terminology through the possibility of superimposing new extraction results onto the existing terminology and finding synonyms and variants here too
- Standardization of several or non-uniform inventories by superimposing and comparing them as "term views". The clustering function is again fundamental here
TermCatch therefore offers several advantages:
- It gives you an overview of, and control over, the terms used in your organization and the synonyms, variants, spellings, etc., that exist
- TermCatch determines a score that provides a robust data-based foundation on which to make decisions on preferred terms and work prioritization. For example, term candidates with a certain score can also be created automatically
- It automates time-consuming tasks in the terminology workflow
- It is cloud-based and therefore quickly ready for use
- It is integrated into Quickterm and can therefore be quickly connected to a database or the defined workflows
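The score-based prioritization mentioned in the list above can be sketched roughly as follows. This is an illustration only: the score range of 0 to 1, the threshold of 0.8, and the names `Candidate` and `triage` are assumptions, not TermCatch's actual scoring model or interface.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    term: str
    score: float  # hypothetical relevance score between 0 and 1


def triage(candidates: list[Candidate], auto_create_at: float = 0.8):
    """Rank candidates by score and split them into auto-created terms and a review queue."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    auto = [c.term for c in ranked if c.score >= auto_create_at]
    review = [c.term for c in ranked if c.score < auto_create_at]
    return auto, review


auto, review = triage([
    Candidate("termbase", 0.92),
    Candidate("thing", 0.15),
    Candidate("retrieval profile", 0.80),
])
```

The point of the design is that the terminologist works through the review queue from the top instead of alphabetically, while only clear cases are created without manual review.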
So, what exactly does Kaleidoscope offer here?
It has always been important to us not just to offer theoretical approaches, but to deliver truly concrete solutions through our software and services.
We have therefore entered into a partnership with Fodina and integrated TermCatch into our Kalcium platform. In addition to providing engines and functionality, much as OpenAI does, for example, TermCatch also comes with an interface that is tailored to the requirements of these tasks. This means you can get started in no time at all and build or expand your terminology with AI support. Or you can simply hand over the entire project to us: many organizations have already found that our service packages offer exceptional added value.
We offer predefined service packages for term extraction at fixed conditions. You provide us with the data, and we return clustered term candidates, either in Quickterm, an Excel sheet, or as an importable TBX. If you do not yet have a Quickterm subscription, we will be happy to provide you with a license for the duration of the term validation project.
Terminology in AI – RAG or TAG?
Large language models (LLMs) – with the emphasis on language – open doors inside companies that were previously firmly closed to us language experts. The concept of language technology is even starting to infiltrate minds at board level, albeit maybe more superficially for now. We should seize this opportunity and position ourselves as experts within the company, because our knowledge and data can now achieve much more than "just" improving the translation process.
AI, and of course we mean generative AI here, has two major shortcomings: it hallucinates, and it does not know the preferred corporate wording. Both can be solved with terminology.
Of course, the "classic" approaches (if you can call two-year-old technology classic) are based on prompt engineering and retrieval augmented generation (RAG), but in our opinion neither approach is well suited to the use of terminology. Prompt engineering has no access to the current state of our terminology, and RAG is too unpredictable, complicated, slow, and costly.
We have therefore carried out extensive research into the powers of AI and now offer TAG: Terminology Augmented Generation. Instead of fuzzily matching inputs against rough chunks in a vector database, we have incorporated traditional terminology methods into the generation process. Because TAG uses "normal" search methods to extract context from the termbase in real time via our Kalcium API, it is a much quicker and more precise process than RAG.
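A minimal Python sketch may help to show the difference from vector search. It does not call the Kalcium API, whose interface is not described here. A small in-memory dictionary stands in for the termbase, and the names `TERMBASE`, `find_entries`, and `build_prompt` are hypothetical. The lookup is a plain whole-word search, so the same input always yields the same terminology context.

```python
import re

# Hypothetical stand-in for a termbase: preferred term -> synonyms to avoid
TERMBASE = {
    "termbase": ["term database", "glossary database"],
    "retrieval profile": ["fetch profile"],
}


def find_entries(text: str, termbase: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return the entries whose preferred term or synonym occurs in the text as a whole word or phrase."""
    hits = {}
    for preferred, synonyms in termbase.items():
        for form in [preferred, *synonyms]:
            if re.search(rf"\b{re.escape(form)}\b", text, flags=re.IGNORECASE):
                hits[preferred] = synonyms
                break  # one matching form is enough for this entry
    return hits


def build_prompt(user_text: str, termbase: dict[str, list[str]]) -> str:
    """Prepend the matching terminology rules to the text that is sent to the LLM."""
    hits = find_entries(user_text, termbase)
    if not hits:
        return user_text
    rules = [
        f'- Use "{preferred}"; avoid: {", ".join(synonyms)}.'
        for preferred, synonyms in hits.items()
    ]
    return "Terminology rules:\n" + "\n".join(rules) + "\n\nText:\n" + user_text


prompt = build_prompt("Please update the term database today.", TERMBASE)
```

Only entries that actually occur in the input are added to the prompt, which keeps the context small and relevant.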
The difficulty at present is that terminology data is too extensive for simple TAG and also overburdens IT teams. Although LLMs can process formats such as TBX, JSON, or Markdown well, prose formats have proven to be particularly suitable.
We are therefore currently working on our own TAG endpoint, which will enable terminologists to configure completely new outputs and use their terminology knowledge to pull exactly the right data from the termbase. We are already saving these configurations in the form of retrieval profiles. With the new TAG endpoint, IT can then call the Kalcium API and specify the desired profile. This allows us terminologists to work hand in hand with the IT department, without IT having to spend a great deal of time understanding and processing the terminology output.
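The division of labor between terminologists and IT can be illustrated with a short Python sketch. It is based on assumptions throughout: the actual structure of retrieval profiles and of the TAG endpoint is not described in this article, so `RetrievalProfile`, its fields, and `render_entry` are hypothetical names. The sketch shows a profile that selects which fields of an entry are returned and renders them as prose rather than as TBX or JSON.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalProfile:
    """Hypothetical profile: the terminologist decides which fields reach the LLM."""
    name: str
    include_definition: bool = True
    include_forbidden: bool = True


def render_entry(entry: dict, profile: RetrievalProfile) -> str:
    """Render one termbase entry as prose, limited to the fields the profile selects."""
    parts = [f'The preferred term is "{entry["preferred"]}".']
    if profile.include_definition and entry.get("definition"):
        parts.append(f'It means: {entry["definition"]}')
    if profile.include_forbidden and entry.get("forbidden"):
        parts.append("Do not use: " + ", ".join(entry["forbidden"]) + ".")
    return " ".join(parts)


# Hypothetical entry and two profiles configured by a terminologist
entry = {
    "preferred": "termbase",
    "definition": "A database of managed terminology.",
    "forbidden": ["term database"],
}
full = render_entry(entry, RetrievalProfile("full"))
terms_only = render_entry(entry, RetrievalProfile("terms-only", include_definition=False))
```

In such a setup, IT would only pass a profile name with each request, while the decision about which terminology data is relevant stays with the terminologist.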
Although the TAG endpoint will only be formally launched with Kalcium 6.7.1, it is already available for test projects if you are interested. The first projects are underway now.
The new business model of terminology
Justifying terminology work and the associated business model or ROI of terminology has always been a challenge. Terminology is a maturing process that has both qualitative and economic effects within a company. See also the new Terminology Maturity Model from CSA Research, which was developed on the initiative of Kaleidoscope, for more on this subject.
AI is a game changer for the business model. Not only does it make terminology work more efficient and therefore more cost-effective thanks to extraction, metadata generation, and improved checking functions, but terminology – including with TAG access – makes a huge contribution to the productive implementation of generative AI within an organization.
As a result, terminology is both cheaper and has greater corporate value: the costs decrease, the benefits increase. This should enable us to demonstrate the importance of terminology work in the company even more convincingly in the future.
Kaleidoscope: Taking your content global
We combine our expertise and software solutions as well as those of carefully selected technology partners to create the right solutions to enable you to achieve success on the global market with your content. Thanks to our innovations and further developments, we continuously make it easier for you to manage terminology, quality, reviews, queries, and automation.