The real world is uncertain, inconsistent, and incomplete. When people interact with the real world, data from their inbuilt physical sensors (eyes, ears, nose, tongue, skin) and mental sensors (happiness, guilt, fear, anger, curiosity, ignorance, and many more) get magically converted into insights and actions. Even if these insights are shared & common, the actions may vary across people.
Example: When a child begs for money on the streets, some people choose to give money, others prefer to ignore the child, and some others decide to scold the child. Each of these people carries a personal, biased context that overlooks the plain fact that a child is begging for money (or food), and their actions result from that context.
The people who give money claim that they feel sorry for the child, and that parting with a little money won't hurt them and will help the child eat. The people who don't give money argue that giving cash would encourage more begging and that a mafia runs the racket. Some people may genuinely have no money, and others expect the government (or NGOs) to step up.
Switching context to the technology world:
With IoT, Cloud, and Big Data technologies, everybody wants to collect data, get insights, and convert these insights into actions for business profitability. This computerized data and workflow automation system approximates an uncertain, inconsistent, and incomplete real-world setup. Call it IoT, Asset Performance Management (APM), or a Digital Twin; the data-to-insights-to-actions process is biased. Automating a biased process is a hard problem to solve.
It's biased because of the semantics of the facts involved in the process.
semantics [ si-man-tiks ]
“The meaning, or interpretation of the meaning, of a fact or concept”
Semantics is to humans as syntax is to machines. So, a human in the loop is critical to manage semantics. AI is changing the human-in-the-loop landscape, but it comes with learning bias.
Let’s try some syntactic sugar to decipher semantics.
Data Semantics, Data Application Semantics
Data semantics is all about the meaning, use, and context of Data.
Data Application Semantics is all about the meaning, use, and context of Data plus the application's agreements (contracts and methods).
Sounds simple? Not to me. I had to read that several times!
Let’s dive in with some examples:
Example A: A Data engineer claims: “My Data is structured (quantifiable with schema). So, AI/BI applications can use my Data to generate insights & actions”. Not always true. Mostly not.
Imagine a data structure that captures the medical "heart rate" observation. The structure may look like {"rate": 180, "units": "bpm"} with a schema that defines the relevant constraints (i.e., the rate is a mandatory field and must be a number >= 0, expressed as beats per minute).
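As a minimal sketch of why "structured" is not enough, consider a hypothetical validator like the one below (the function name and rules are illustrative, not from any real standard): the observation passes the syntactic schema check, yet it carries no context for interpretation.

```python
# Minimal sketch (illustrative names): a schema check that enforces structure only.
def validate_heart_rate(observation: dict) -> bool:
    """Return True if the observation satisfies the *syntactic* schema."""
    rate = observation.get("rate")
    units = observation.get("units")
    return isinstance(rate, (int, float)) and rate >= 0 and units == "bpm"

observation = {"rate": 180, "units": "bpm"}
print(validate_heart_rate(observation))  # True -- structurally valid, yet semantically incomplete
```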
An arrhythmia detection algorithm, analyzing this data structure, might send out an urgent alarm ("HELP! Tachycardia") and dial 102 to call an ambulance. The ambulance arrives to find that the person was running on a treadmill, causing the high heart rate. This Data is structured but "incomplete" for analysis. The arrhythmia detection algorithm needs more context than the rate and units to raise the alarm. It needs context to "qualify" and "interpret" the "heart-rate" values. Some contextual data elements could be (a sketch that combines them follows the list):
- Person Activity: {sleeping, active, very active}
- Person Type: {fetal, adult}
- Person's age: 0-100
- Measurement location: {wrist, neck, brachial, groin, behind-knee, foot, abdomen}
- Measurement type: {ECG, Oscillometry, Phonocardiography, Photoplethysmography}
- Medications-in-use: {Atropine, OTC decongestants, Azithromycin, …}
- Location: {ICU, HDU, ER, Home, Ambulance, Other}
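Combining these qualifiers, an enriched observation and a naive context-aware rule might look like the sketch below. The keys, value sets, and thresholds are assumptions for illustration, not a clinical standard.

```python
# Illustrative sketch only: a heart-rate observation carrying qualifying context.
# Keys, value sets, and thresholds are assumptions, not a clinical standard.
enriched_observation = {
    "rate": 180,
    "units": "bpm",
    "person_activity": "very active",        # {sleeping, active, very active}
    "person_type": "adult",                   # {fetal, adult}
    "person_age": 34,                         # 0-100
    "measurement_location": "wrist",          # wrist, neck, brachial, ...
    "measurement_type": "Photoplethysmography",
    "medications_in_use": [],                 # e.g., Atropine, OTC decongestants
    "location": "Home",                       # ICU, HDU, ER, Home, Ambulance, Other
}

def should_alarm(obs: dict) -> bool:
    """Naive context-aware rule: 180 bpm at rest is alarming;
    the same value during heavy activity probably is not."""
    if obs["person_activity"] == "very active":
        return False  # exercise explains the high rate
    return obs["rate"] > 100  # hypothetical resting-tachycardia threshold

print(should_alarm(enriched_observation))  # False -- the treadmill case
```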
Let’s look at this example from the “semantics” definition above:
- The meaning of “heart rate” is abstract but consistently understood as heart contractions per minute.
- The “heart rate” observation is used to build an arrhythmia detection application.
- Additional data context required to interpret “heart-rate” is Activity, Person Type, Person Age, Measurement Location, Measurement Type, Medications-in-use, and Location. This qualifying context is use-specific. An application to identify the average heart-rate range in a population by age intervals needs only the Person Age.
- The algorithm's agreement (contract = what?) is to "detect arrhythmias and call an ambulance when ER care is needed."
- The algorithm's agreement (method = how?) is not well defined. A competing algorithm may use AI to make better predictions and avoid false alarms. This is similar to our beggar-child analogy, where the method each person used to derive insight differed, resulting in different actions (see the sketch after this list).
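To make the contract-versus-method distinction concrete, here is a hedged sketch: both detectors honor the same contract (decide whether to call an ambulance), but their methods, and therefore their actions, differ. The class and function names are invented for illustration.

```python
# Illustrative sketch: one contract (the "what"), two methods (the "how").
from abc import ABC, abstractmethod

class ArrhythmiaDetector(ABC):
    """The contract: given an observation, decide whether to call an ambulance."""
    @abstractmethod
    def should_call_ambulance(self, obs: dict) -> bool: ...

class ThresholdDetector(ArrhythmiaDetector):
    """Method A: a bare threshold on the rate -- prone to false alarms."""
    def should_call_ambulance(self, obs: dict) -> bool:
        return obs["rate"] > 150

class ContextAwareDetector(ArrhythmiaDetector):
    """Method B: the same contract, but the rate is interpreted in context."""
    def should_call_ambulance(self, obs: dict) -> bool:
        if obs.get("person_activity") == "very active":
            return False  # high rate explained by exercise
        return obs["rate"] > 150

obs = {"rate": 180, "units": "bpm", "person_activity": "very active"}
print(ThresholdDetector().should_call_ambulance(obs))     # True  -- a false alarm
print(ContextAwareDetector().should_call_ambulance(obs))  # False
```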
Example B: Another familiar analogy that helps understand "meaning," "use," "context," and "agreement" is cooking recipes. Almost all recipes have the statement "add salt to taste."
- The meaning of “Salt” is abstract but consistent. Salt is not sugar! 🙂 It’s salt.
- Salt is used to make the food tasty.
- Additional data context required to interpret "Salt" is the salt type {pink salt, black salt, rock salt, sea salt}, the user's salt tolerance level {bland, medium, saltier}, the user's blood pressure, and the user's continent {Americas, Asia, Europe, Africa, Australia}.
- The agreement (contract = what?) is to “Add Salt.”
- The agreement (method = how?) is not well defined. Depending on the chef, she may prefer a particular salt type and deviate from the average salt tolerance level. For good business reasons, she may add less salt than her own tolerance level and serve extra salt on the side, so the customer can adjust the taste to their own tolerance.
In computerized systems, physical-digital data modeling can achieve data semantics (meaning, use, context). It's much harder to achieve data application semantics (data semantics + agreements). Data interpretation is subject to the method used and its associated bias.
So, to interpret data, there must be a human in the loop. Not all people infer equally. Thus, semantics leads to variation in insights. Variation in insights leads to variation in actions.
Diving into Context – It’s more than Qualifiers
Alright, I want to peel the "context" onion further. Earlier, we said that "context" is used to "qualify" the data. There is another type of context that "modifies" the data.
Let’s go back to our arrhythmia detection algorithm (Example A). We have not captured and sent any information about the patient’s diagnosis to the algorithm. The algorithm does not know whether the high heart rate is due to Supra-ventricular Tachycardia (electric circuit anomaly in the heart), Ventricular Tachycardia (damaged heart muscle and scar tissue), or food allergies. SVT might not require an ER visit, while VT and food allergies require an ER visit. Let’s say our data engineers capture this qualifying information as additional context:
{"prior-diagnosis": [], "known-allergies": []}
Great, we have qualifying context. So, what does prior-diagnosis = [] mean? That the patient has neither SVT nor VT? No, not true. It means that the doctors have either not tested the patient for those conditions or not documented a negative test result in the data system. So, we are back to square one. Now, let's say that we have a documented prior diagnosis:
{"prior-diagnosis": ["VT"], "known-allergies": []}
Ok, even with this Data, we cannot confirm that VT is causing the high heart rate. It could be due to undocumented or untested food allergies, or to as-yet-undiagnosed SVT. This scenario calls for data "modifiers."
{"prior-diagnosis-confirmed": ["VT"], "prior-diagnosis-excluded": ["SVT"], "known-allergies-confirmed": ["pollen", "dust"], "known-allergies-excluded": ["food-peanuts"]}
The structure above carries more "semantic" sugar. The prior-diagnosis-excluded: ["SVT"] entry acts as a "NOT" modifier on the diagnosis, which lets the algorithm safely rule out SVT as a cause.
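A hedged sketch of how an algorithm might consume these qualifiers and modifiers: an empty list is treated as "unknown," while an explicit exclusion is treated as evidence. The function name is hypothetical; the key names mirror the illustrative structure above.

```python
# Illustrative sketch: consuming qualifiers and "NOT" modifiers.
# An empty list means "unknown", not "negative"; only an explicit
# exclusion lets the algorithm safely rule a cause out.
def assess_cause(context: dict, candidate: str) -> str:
    if candidate in context.get("prior-diagnosis-confirmed", []):
        return "confirmed"
    if candidate in context.get("prior-diagnosis-excluded", []):
        return "excluded"   # safe to ignore as a cause
    return "unknown"        # absence of evidence, not evidence of absence

context = {
    "prior-diagnosis-confirmed": ["VT"],
    "prior-diagnosis-excluded": ["SVT"],
    "known-allergies-confirmed": ["pollen", "dust"],
    "known-allergies-excluded": ["food-peanuts"],
}

print(assess_cause(context, "SVT"))  # excluded
print(assess_cause(context, "VT"))   # confirmed
print(assess_cause({}, "SVT"))       # unknown -- back to square one
```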
Summary
Going from data to insights to actions is challenging due to “data semantics” and “data application semantics.”
Modeling all relationships between real-world objects and capturing context mitigates "data semantics" issues. Context is always use-specific. The context may still have "gaps," and drawing inferences from data with context gaps leads to poor-quality insights.
“Data application semantics” is a more challenging problem to solve.
The context must "qualify" the data and "modify" the qualifiers to improve data semantics. This context "completeness" requires collecting good-quality data at the source. More often than not, a human data analyst has to go back to the data source for more context.
When technology visionaries in the IT industry say "We bring the physical and digital together," they are trying to solve the data semantics problem.
For those in healthcare, the words "meaning" and "use" will bring to mind the US government's "meaningful use" initiative and the shift to a merit-based incentive payment system. To grant merit-based incentives, the government must ensure that the captured data has meaning, use, and context. The method (= how) used by the care provider to achieve the outcome is important but secondary. The initiative is also a recognition that data application semantics are HARD.
Enough said! Rest.