The journey towards shared meaning and organizational knowledge
At its core, data is just that: data. On their own, disconnected numbers, letters, or symbols carry little meaning. Ordering, structuring, and presenting these data points in a useful manner and context makes them informative, enabling audiences to extract meaning from them and make sense of the world. Given the abundance of data that asset managers face daily, it is worth remembering that data per se is not actually that useful. Instead, people need to adopt a broader perspective, assigning meaning and relationship context to data. Only by doing so can they form a holistic, non-siloed data strategy across the entire organization.
The crux of the problem is the siloed approach to managing information. Every in-house software application has been built with a specific business function in mind, implying that most software applications have their specific way of organizing data (database model). In addition, the meaning of the data varies from one application to the next because they were built to specifications which reflected the understanding of data (and data usage) in a narrow business context at one point in time.
Going forward, I believe that we should adhere to two principles:
First, establishing a clear and shared meaning of particular data points should be a priority.
Second, the relationships of a particular data point to other data points are just as important as its individual meaning.
The aim towards meaningful data
By meaningful data I mean data that can be understood both by humans and machines: we cannot allow ourselves to restrict our ambitions to the mere interoperability of computer systems without considering the need for humans to fully understand the data. For the sake of argument, think of concepts and ideas when thinking about data. The name of a fund is a string of information, i.e. data. What becomes interesting is what exactly we mean when we refer to a fund.
The aim is by no means to propose a novel scientific approach but rather to make use of philosophical concepts and how these can be applied in computer science to allow asset management practitioners to think of this subject in a different way or view the problem from a different angle.
The current challenge the industry—composed of individual firms—is facing with data is a problem with two dimensions. The first dimension is that the people within an asset management firm define concepts expressed as data differently. They do not share a common language, which is a prerequisite for understanding. Secondly, the computer systems within a specific asset management organization will define the meaning of data differently because they have been built by people who did not share a common language. This results in data incomprehension both at a human level and at a system level.
Far be it from me to suggest the creation of a common language across the industry, yet a first step should be the creation of a common understanding of meaning inside specific firms, which already constitutes a significant challenge not to be underestimated.
Let us think about what we mean by meaning and knowledge. We acquire knowledge through understanding meaning—meaning we acquire through understanding concepts that can be expressed through symbols, e.g. words. Humans use informal language to express concepts. Machines, conversely, require formal language (e.g. mathematics) to express concepts. Thus, if our objective is to radically augment the understanding of data, we must strive to keep this duality of human and machine understanding in mind.
Thomas Davenport once observed that “people can’t share knowledge if they don’t speak a common language”. But what does it mean to speak a common language? The first step is to agree on symbols (alphabet) and concepts. This is the field of syntax. Then we would need to agree on literal and contextual meaning. This is the field of semantics. A next step would be an agreement on the classification of concepts (i.e. taxonomy) and a shared understanding of the associations and relations between these concepts (like a thesaurus). Finally, we may need to agree on which relations make sense and are allowed, which is what could be defined as an ontology.
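To make the last two layers more tangible, here is a minimal sketch in Python of a tiny taxonomy and a set of allowed relations. Every name in it (TAXONOMY, ALLOWED_RELATIONS, is_a, relation_allowed, and the concept labels themselves) is a hypothetical choice made for illustration only; none of it comes from a standard or from any particular firm.

```python
# Hypothetical taxonomy: each concept maps to its parent concept (None = root).
TAXONOMY = {
    "investment_vehicle": None,
    "fund": "investment_vehicle",
    "equity_fund": "fund",
    "benchmark": None,
}

# Hypothetical ontology rules: the (subject, relation, object) triples we allow.
ALLOWED_RELATIONS = {
    ("fund", "uses", "benchmark"),
}


def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or descends from it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = TAXONOMY[concept]
    return False


def relation_allowed(subject, relation, obj):
    """Check a proposed relation against the allowed triples, honouring the taxonomy."""
    return any(
        relation == rel and is_a(subject, s) and is_a(obj, o)
        for s, rel, o in ALLOWED_RELATIONS
    )


print(relation_allowed("equity_fund", "uses", "benchmark"))  # True
print(relation_allowed("benchmark", "uses", "fund"))         # False
```

The taxonomy answers the question "what kind of thing is this?", while the allowed relations answer "what may this thing be connected to?". An equity fund inherits the permission to use a benchmark because it is classified as a fund; the reverse statement is rejected because nobody agreed that it makes sense.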
I believe that if we are to achieve knowledge through an understanding of the meaning of concepts, expressed contextually and temporally through data used in asset management, firms should invest in a better understanding of syntax, of semantics, of taxonomy and of ontology in the context of asset management. This understanding should not be restricted to computer scientists but should be shared across the whole of the organization, at different levels of complexity.
I challenge anyone to find a common answer in one firm to the simple question: “What is a product?”.
An important step to meaningful data is to consider the intellectual groundwork provided by computer science in the field of ontology. I stress the word consider, which entails a form of skepticism, realism, and curiosity rather than embracing ontological theory without due restraint. What can we learn from this particular discipline, which is a subset of information science? In philosophy, ontology has traditionally been defined as the theory of what exists. It is the study of entities in their reality and the relationship between these entities. Essentially, ontology as part of metaphysics deals with a systematic account of reality and existence. In recent times, the use of the term ontology has become prominent in computer science.
A useful definition of ontology has been provided by the prominent computer scientist Thomas Gruber: “An ontology is an explicit, formal specification of a shared conceptualization […] For computer systems, what exists is what can be represented”. At this stage, it is important to define the meaning of each of these words: explicit, formal, specification, shared and finally, conceptualization.
In the context of asset management, the word conceptualization refers to the creation of an abstract model that would represent the reality of that industry or, more realistically, of the individual firm. The model would strive to define entities and the relationships between these entities. An investment fund, for example, would be an entity (concept), as would a benchmark. And then, what would be the relationship between the fund and the benchmark?
The objective of this conceptualization would be the sharing of knowledge. Here again I am insisting on the dual objective of sharing this knowledge with humans and machines. In order to represent knowledge that can be shared between machines, that representation needs to be formal.
Building a common data language
The first word to define in Gruber's definition is the word explicit. This means that all concepts should be defined, because leaving one concept undefined would mean that concept could take on any number of different meanings. Here, we are in the realm of semantics and there is an element of subjectivity, because the concepts that I will give as examples could be defined differently across different asset managers. Here are four examples:
Investment vehicle: An investment vehicle is a product designed by an asset manager to offer investment solutions to its investors. It can take many forms such as an umbrella investment company or a stand-alone common fund.
Fund: A fund is the aggregation of pooled capital from multiple investors which is invested according to a prescribed investment policy.
Portfolio: A portfolio is the total of all assets held by a fund manager as part of the delivery of an investment service in line with a prescribed investment policy.
Share class: A share class or unit class is a group of investors who pooled their capital into the fund under the same conditions (class type), the same currency and pay-out model.
Explicit in this case entails defining every concept within the domain of knowledge, i.e. asset management company X. The word shared reminds us of the importance of consensus across the ontology: consensus on the meaning of the concepts, on the relations between the concepts, on their specific name/value attributes and on the logical rules that limit these relationships (e.g. an ISIN cannot be shared by two investment funds).
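A logical rule of this kind can be written down so that a machine enforces it. The following is a minimal sketch in Python, assuming a fund is described by nothing more than a name and an ISIN; the Fund and FundRegistry classes, their fields, and the placeholder identifiers are all hypothetical and exist only to illustrate the ISIN rule mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fund:
    """A fund reduced to two illustrative attributes (an assumption)."""
    name: str
    isin: str


class FundRegistry:
    """Holds funds and enforces the rule that no two funds share an ISIN."""

    def __init__(self):
        self._by_isin = {}

    def register(self, fund):
        existing = self._by_isin.get(fund.isin)
        if existing is not None and existing != fund:
            raise ValueError(
                f"ISIN {fund.isin} already belongs to {existing.name!r}"
            )
        self._by_isin[fund.isin] = fund


registry = FundRegistry()
# Placeholder identifiers, not real ISINs.
registry.register(Fund("Example Equity Fund", "XX0000000001"))
try:
    registry.register(Fund("Example Bond Fund", "XX0000000001"))
except ValueError as err:
    print(err)
```

The point is not the code itself but the fact that the rule has to be agreed upon by people before it can be expressed formally: once it is, every system that consults the registry applies the same meaning.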
Reaching this shared consensus within an asset management organization is the single largest obstacle to achieving a shared understanding, across staff and computer systems, of the meaning of concepts expressed as data points. The representation of the knowledge domain, the ontology, is twofold:
1) The organization should determine the classification of concepts (objects) into classes, subclasses, entities, relationships between the classes and finally the properties linked to each class.
2) This can be represented through a graph commonly referred to as a knowledge graph. The graph will illustrate the objects called nodes and their relationships called edges. The relationships can be described through language and hence be called semantic relationships because the sentence using informal or formal syntax will prescribe a literal meaning that can be understood by a human and a machine.
Let us use a simple example: A portfolio of investments is a class; a benchmark is a class. The semantic relationship between the two classes can be one-directional or two-directional. A one-directional relationship would be: a portfolio of investments uses a benchmark. A two-directional relationship would add the inverse statement: a benchmark is used by a portfolio of investments.
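The portfolio and benchmark example can be sketched as a very small knowledge graph. The Python below is an illustration under stated assumptions, not a reference to any graph product: the KnowledgeGraph class, its add_edge and statements methods, and the relation labels are hypothetical names.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Nodes are concept names; edges are labelled, directed relationships."""

    def __init__(self):
        self._edges = defaultdict(set)  # node -> {(relation, target), ...}

    def add_edge(self, source, relation, target, inverse=None):
        """Add a directed edge; pass `inverse` to make it two-directional."""
        self._edges[source].add((relation, target))
        if inverse is not None:
            self._edges[target].add((inverse, source))

    def statements(self):
        """Render every edge as a short sentence a human can read."""
        return sorted(
            f"{source} {relation} {target}"
            for source, edges in self._edges.items()
            for relation, target in edges
        )


graph = KnowledgeGraph()
graph.add_edge("portfolio", "uses", "benchmark", inverse="is used by")
print(graph.statements())
```

Each edge is stored in a form a machine can traverse and, at the same time, reads as a plain sentence, which is the dual human and machine readability argued for throughout this piece.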
At this point, I would like to stress that the effort of knowledge representation (abstract modeling of a reality i.e. asset management) implies a radical change in the way asset managers think of data.
Such knowledge representation should deal with large, complex parts of the organization. The larger that effort, the more likely it is that the classes, attributes, and relationships being represented will refer to data that lives in a number of distinct systems, sometimes uniquely and often duplicated, without those systems sharing a common meaning of the data.
Here we have finally reached the core of the problem: staff and systems within one particular asset management firm will not share a common language around the key concepts that define their reality. The problem is not data, the problem is much more significant—do we share a common understanding across our organization?
The data challenge is the result of a lack of shared comprehension of the concepts used on a daily basis and the processes which use these concepts to create value for investors. In essence, the challenge is not only of a technological nature. The digitalization imperative facing the asset management industry requires firms at an individual level to take a step back and reflect on what knowledge means, how it is shared, enhanced and expanded between humans and humans, between humans and machines and between machines and machines.
These challenges are not insurmountable but require a period of reflection and also a healthy interest in how other communities of professionals, especially scientists, have attempted and are still attempting to model the physical reality that we call life.
“Data is the new gold” may be an accurate saying, but it represents a consequence of intelligent abstraction of both concepts and processes. Data is just the expression of the level of conceptualization maturity that the organization can fathom. It is sometimes accurate, meaningful, and shared but unfortunately often lonely, misleading, and potentially misunderstood. To put an end to the cacophony of data, I believe that asset management firms need to work on their proprietary syntax, semantics, taxonomy, thesauri, and ontology. In other words, they need to develop proper frameworks for sharing and internalizing knowledge that helps establish a common context or frame of reference.
The benefits of this approach would be quite significant. Each asset management firm would be able to fully reap the benefits of digitalization. Obviously, this will not happen overnight, and continuous effort will be required by those who deeply understand their firm’s internal value creation processes. By value creation I mean the process of transforming inputs into outputs: from investment analysis to delivering rewards to investors via the distribution of a product (fund or segregated mandate). Data is a concrete expression of the concepts underpinning this value creation process. The concepts are not immutable and can change over time.
Thinking is the new gold. There is nothing fundamentally new about this. But thinking as a group, with shared concepts and deeper understanding, might just be the new black.