Semantic Modelling

Neither written languages nor formal programming languages are capable of representing knowledge in a human-friendly format. Even though Semantic Web technologies attempt to offer assistance in this area, their scope of applicability is limited to establishing crude links between elements of knowledge in the public domain. Making knowledge tangible and easily accessible requires new techniques and dedicated technologies.

Semantic modelling is a special form of modelling that is based on so-called semantic identities, which identify all the distinct concepts, and facts about concepts, that make up a so-called semantic domain. For example, in the semantic domain of Customer Relationship Management, “customer”, “contact”, “is an active customer”, “this week’s pre-sales meetings”, “organisational unit”, and “Joe Bloggs” are all examples of semantic identities. Semantic models (representations) consist of the connections between semantic identities that are relevant from the viewpoint of a specific role. In semantic modelling the concern of giving human-friendly names (and other symbols such as icons) to semantic identities can easily be separated from the concern of modelling (creating connections between semantic identities).

The semantic approach allows the semantic identity “customer” to be used in a database model, in models of various user interfaces, as well as in various algorithms. In a semantic model a change in the name of the semantic identity from “customer” to “client” has no impact on the models that reference the semantic identity. The distinction between naming and modelling also allows a single semantic identity to be associated with several names that relate to different viewpoints in the organisation.
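The separation of naming from modelling can be illustrated with a minimal Python sketch. It is not the Cell Platform's API: the classes `SemanticIdentity` and `Naming`, and the viewpoint labels "sales" and "legal", are hypothetical names introduced only for this example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SemanticIdentity:
    """Hypothetical identity: a UUID and nothing else, so it carries no name."""
    uid: uuid.UUID = field(default_factory=uuid.uuid4)


class Naming:
    """Hypothetical naming layer: maps (identity, viewpoint) to a name."""

    def __init__(self):
        self._names = {}

    def set_name(self, identity, viewpoint, name):
        self._names[(identity.uid, viewpoint)] = name

    def name_of(self, identity, viewpoint):
        return self._names[(identity.uid, viewpoint)]


# A model is a set of connections between identities, never between names.
customer = SemanticIdentity()
contact = SemanticIdentity()
database_model = {(customer, contact)}

# One identity, several names, each tied to a viewpoint (labels are assumed).
naming = Naming()
naming.set_name(customer, "sales", "customer")
naming.set_name(customer, "legal", "account holder")

# Renaming within one viewpoint leaves the model untouched.
snapshot = set(database_model)
naming.set_name(customer, "sales", "client")
```

Because the model stores only identities, the rename changes what people read without changing any connection in `database_model`.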

Most modelling technologies do not implement a clean notion of semantic identity. The minimum prerequisite for semantic modelling is a model repository that relies on universally unique identifiers (UUIDs). However, to enable true semantic modelling, a model repository must allow reuse of semantic identities across the multitude of models that are required to capture the various viewpoints that exist within an organisation. In the context of data management, semantic identities provide the foundation for impact analysis and data lineage calculations.
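As a rough illustration of why shared identities support impact analysis, the following Python sketch keeps several models in one repository and asks which of them reference a given identity. The class `ModelRepository`, its methods, and the model names are assumptions made for this example, not a description of any particular product.

```python
import uuid


class ModelRepository:
    """Hypothetical repository: identities are UUIDs shared by all models."""

    def __init__(self):
        self._identities = set()
        self._models = {}  # model name -> set of (source, target) connections

    def new_identity(self):
        uid = uuid.uuid4()
        self._identities.add(uid)
        return uid

    def connect(self, model, source, target):
        # Only identities registered in this repository may be connected.
        for uid in (source, target):
            if uid not in self._identities:
                raise KeyError(f"unknown semantic identity: {uid}")
        self._models.setdefault(model, set()).add((source, target))

    def impacted_models(self, uid):
        """Simple impact analysis: names of models that reference uid."""
        return sorted(
            name for name, links in self._models.items()
            if any(uid in link for link in links)
        )


repo = ModelRepository()
customer = repo.new_identity()
contact = repo.new_identity()
org_unit = repo.new_identity()

# The same identities are reused across three (assumed) models.
repo.connect("database", customer, contact)
repo.connect("user interface", customer, org_unit)
repo.connect("reporting", contact, org_unit)

print(repo.impacted_models(customer))  # ['database', 'user interface']
```

A real data lineage calculation would follow connections transitively; this sketch covers only the direct lookup that reuse of identities makes possible.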

It is important to realise that the semantics of a concept evolve over time. Each time new application functionality is developed that makes use of the “insurance policy” concept, the semantics of “insurance policy” within the organisation are extended. Additionally, the semantics of the concept “insurance policy” in company A differ from the semantics of the same concept in company B. Company A may only sell car insurance policies, and company B may have several lines of business. In the context of the systems of company A, there are no semantics for policies that relate to health insurance. Conversely, in the context of company B, the only semantics for car insurance policies are those that relate to the specific products offered by company B.

These observations illustrate that semantic modelling in the context of a given organisation differs from modelling in the Semantic Web, which is an attempt to capture the common-sense semantics that people associate with vocabulary in the public domain. The Semantic Web can be viewed as a lowest common denominator for semantic modelling in scenarios that involve interoperability between different organisations.

The limitations of natural language

Natural language, whether spoken or written, encodes concepts and ideas in a linear, one-dimensional string of abstract symbols (words represented in sound waves or in the familiar alphabet). This format is useful for transmission in a carrier medium such as air or light, but it is quite different from the format for representing knowledge that is used by the human brain. Over the last 30 years cognitive scientists have gathered a large body of empirical evidence from neurobiology, genetics, and linguistics that points to a multi-dimensional format of knowledge representation, which in turn can be represented using mathematics.

The first-level approximation available to us today of the format in which the human brain represents knowledge is extremely useful, but it is not to be confused with the big challenge of fully understanding the human brain. The immediate practical application is much more tangible: the development of human-friendly two- and three-dimensional notations that are capable of capturing thoughts and ideas in an intuitive and elegant form, leaving behind the limitations of a one-dimensional transmission format.

The volume of visual information processed by the human brain is at least 20 times larger than the volume of audio information processed. It is no accident that knowledge workers make extensive use of diagrams and models in discussions and meetings – simply imagine the alternative, say a meeting with an architect, and a specification of a building in written language, without any diagrammatic assistance.

Computer Aided Design (CAD) and Computer Aided Manufacturing (CAM) systems play an important role in improving the degree of automation by providing a direct translation from diagrams into hardware and software artefacts. Yet, in order for such tools to perform their task, they need to be configured to the specific organisational context in which they are used. Core organisational and operational knowledge is recorded in technical configuration languages. Banks and insurance companies increasingly use similar techniques (company-specific configuration languages) to express new product designs. Often these configuration languages again make use of a one-dimensional format, and thereby significantly limit the understandability and maintainability of core knowledge.

In an attempt to make the knowledge we encode in software systems more accessible, we describe the intent of computer programs in natural language and in the form of diagrams. Unfortunately, the vast majority of diagrams created (excluding a few examples from database design, robotics, and other specialised domains) are not automatically processed by a machine that produces the desired/specified software. Instead, the diagrams remain disconnected, and it is again a human brain that must translate the desired intent into one-dimensional specifications that can be executed by a computer, using a highly time-consuming and error-prone process.

The challenge lies in breaking the habit of relying too heavily on a format (written language) that humans have used for thousands of years. In order to improve maintainability and understandability, we need to re-cast knowledge in new formats.

The Cell Platform is a semantic modelling technology that allows migrating from one-dimensional representations of knowledge to multi-dimensional representations that are machine processable and human-friendly at the same time.
