Data Quality (Dirty vs. Clean)

Data quality exists on a grayscale, and data quality engineers can continually improve it. Continual quality improvement is the process of pursuing data quality excellence.

Dirty data may refer to several things: Redundant, Incomplete, Inaccurate, Inconsistent, Missing Lineage, Non-analyzable, and Insecure.

  • Redundant: A Person’s address data may be redundant across data sources. So, the collection of data from these multiple data sources will result in duplicates.
  • Incomplete: A Person’s address record may not have Pin Code (Zip Code) information. There could also be cases where the data may be structurally complete but semantically incomplete.
  • Inaccurate: A Person’s address record may have the wrong city and state combination (E.g., [City: Mumbai, State: Karnataka], [City: Salt Lake City, State: California])
  • Inconsistent: A Person’s middle name in one record is different from the middle name in another record. Inconsistency happens due to redundancy.
  • Missing Lineage (and Provenance): A Person’s address record may not reflect the current address because the user has not updated it. Without lineage and provenance, there is no way to tell how fresh the record is.
  • Non-analyzable: A Person’s email record may be encrypted.
  • Insecure: A Person’s bank account number is available but not accessible due to privacy regulations.

The opposite of Dirty is Clean. Cleansing data is the art of correcting data after it is collected. Commonly used techniques are enrichment, de-duplication, validation, meta-information capture, and imputation.

  1. Enrichment is a mitigation technique for incomplete data. A data engineer enriches a person’s address record by adding country information by mapping the (city, state) tuple to a country.
  2. De-Duplication is a mitigation technique for redundant data. The data system identifies and drops duplicates using data identities. Inconsistencies caused by redundancies require use-case-specific mitigations.
  3. Validation is a mitigation technique that applies domain rules to verify correctness. An email address can be verified for syntactical correctness using a regular expression such as \A[\w!#$%&'*+/=?{|}~^-]+(?:\.[\w!#$%&'*+/=?{|}~^-]+)*@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,6}\Z (see the sketch after this list). Data may be accepted or rejected based on validations.
  4. Lineage and Provenance capture is a mitigation technique for data where source or freshness is critical. An image grouping application will require meta-data about an image series (video) collected like phone type and captured date.
  5. Imputation is a mitigation technique for incomplete data (data with information gaps due to poor collection techniques). A heart-rate time series may be dirty, with missing data at minutes 1 and 12. Using data with holes may lead to failures, so an imputation may use the previous or next value to fill the gap.
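
To make validation (item 3) and imputation (item 5) concrete, here is a minimal Python sketch; it assumes a pandas-based pipeline, and the sample values are invented:

    import re
    import pandas as pd

    # Validation: syntactic email check using the regular expression above.
    EMAIL_RE = re.compile(
        r"\A[\w!#$%&'*+/=?{|}~^-]+(?:\.[\w!#$%&'*+/=?{|}~^-]+)*"
        r"@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,6}\Z"
    )

    def is_valid_email(address: str) -> bool:
        return EMAIL_RE.match(address) is not None

    # Imputation: fill holes in a per-minute heart-rate series with the
    # previous observed value (forward fill).
    minutes = pd.date_range("2021-08-14 09:00", periods=15, freq="min")
    rates = pd.Series([72.0] * 15, index=minutes)
    rates.iloc[[1, 12]] = None   # minutes 1 and 12 were never collected
    imputed = rates.ffill()      # carry the previous value into each gap

    print(is_valid_email("jane.doe@example.com"))  # True
    print(imputed.isna().sum())                    # 0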

These are cleansing techniques to reduce data dirtiness after data is collected. However, data dirtiness originates at creation time, collection time, and correction time. So, a data cleansing process may not always result in non-dirty data.

A great way to start with data quality is to describe the attributes of good-quality data and related measures. Once we have a description of good-quality data, incrementally/iteratively use techniques like CAPA (corrective action, preventive action) within a continual quality improvement process. Once we are confident about data quality given the current measures, the data engineer can introduce new KPIs or set new targets for existing ones.

Example: A research study requires collecting stroke imaging data. A description of quality attributes would be:

  • Data Lineage & Provenance:
    – Countries: {India}
    – Imaging Types: {CT}
    – Source: {Stroke Centers, Emergency}
    – Method – Patient Position: supine
    – Method – Scan extent: C2-2-vertex
    – Method – Scan direction: caudocranial
    – Method – Respiration: suspended
    – Method – Acquisition type: volumetric
    – Method – Contrast: {Non-contrast CT, PCT with contrast}
  • Redundancy: Multiple scans of the same patient are acceptable but need to be separated by one week.
  • Completeness: Each imaging scan should be accompanied by a radiology report that describes these features of the stroke:
    – Time from onset: {early hyperacute (0-6H), late hyperacute (6-24H), acute (1-7D), sub-acute (1-3W), chronic (3W+)}
    – CBV (Cerebral Blood Volume) in ml/100g of brain tissue
    – CBF (Cerebral Blood Flow) in ml/min/100g of brain tissue
    – Type of Stroke: {Hemorrhagic-Intracerebral, Hemorrhagic-Subarachnoid, Ischemic-Embolic, Ischemic-Thrombotic}
  • Accuracy: Three reads of the image by separate radiologists to circumvent human errors and bias. Anonymized patient history is sent to the radiologist.
  • Security and Privacy: Patient PII is not leaked to the radiologist interpreting the result or the researcher analyzing the data.
Data Quality Attributes

As you can see from the table of attributes for CT Stroke imaging data, the quality description is data-specific and use-specific.

Data engineers compute attribute-specific metrics using data attribute descriptions on a data sample to measure overall data quality. These attribute descriptions are the North Star for pursuing excellence in data quality.
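
For illustration, a minimal sketch of one attribute-specific metric, assuming address records sampled from the collected data; the field names and values are hypothetical:

    # Completeness metric: fraction of sampled records carrying a Pin Code.
    sample = [
        {"city": "Mumbai", "pin_code": "400001"},
        {"city": "Bengaluru", "pin_code": None},
        {"city": "Pune", "pin_code": "411001"},
    ]
    completeness = sum(r["pin_code"] is not None for r in sample) / len(sample)
    print(f"pin-code completeness: {completeness:.0%}")  # 67%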

Summary: Creation, collection, and correction improve over time when measured against criteria. There will always be data quality blind spots and leakages. Hence, data engineers report data quality on a grayscale with multiple attribute-specific metrics.

Streaming vs. Messaging

“We already have pub/sub messaging infrastructure in our platform. Why are you asking for a streaming infrastructure? Use our pub/sub messaging infrastructure” – Platform Product Manager

Streaming and Messaging Systems are different. The use-cases are different.

Both streaming and messaging systems use the pub-sub pattern, with producers posting messages and consumers subscribing. Subscribed consumers may choose to poll or get notified: consumers in streaming systems generally poll the brokers, while brokers in messaging systems push messages to consumers. Engineers use streaming systems to build data processing pipelines and messaging systems to build reactive services. Both systems support the standard delivery semantics (at least once, exactly once, at most once). Brokers in streaming systems are dumber than those in messaging systems, which build routing and filtering intelligence into the broker. Streaming systems are faster than messaging systems precisely because of that missing routing and filtering intelligence 🙂

Let’s look at the top three critical differences in detail:

#1: Data Structures

In streaming, the data structure is a stream, and in messaging, the data structure is a queue.

“Queue” is a FIFO (First In First Out) data structure. Once a consumer consumes an element, it is removed from the queue, reducing the queue size. A consumer cannot fetch the “third” element from the queue; queues don’t support random access. E.g., a queue of people waiting to board a bus.

“Stream” is a data structure that is partitioned for distributed computing. If a consumer reads an element from a stream, the stream size does not reduce. The consumer can continue to read from the last read offset within a stream. Streams support random access; the consumer may choose to seek to any reading offset. The brokers managing streams keep the state of each consumer’s reading offset (like a bookmark while reading a book) and allow consumers to read from the beginning, the last read offset, a specific offset, or the latest. E.g., a video stream of movies where each consumer resumes at a different offset.
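
A minimal sketch of these offset semantics, assuming the kafka-python client; the broker address and the topic name “positions” are hypothetical:

    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
    partition = TopicPartition("positions", 0)
    consumer.assign([partition])

    consumer.seek_to_beginning(partition)  # read from the start of the stream
    # consumer.seek(partition, 42)         # ...or seek to any specific offset
    for record in consumer:
        print(record.offset, record.value)
        break  # reading does not shrink the stream; the offset just advances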

In streaming systems, consumers refer to streams as Topics. Multiple consumers can subscribe to a topic simultaneously. In messaging systems, the administrator configures a queue to send messages either to a single consumer or to many consumers; the latter pattern is called a Topic and is used for notifications. A Topic in a streaming system is always a stream, and in a messaging system it’s always a queue.

Both stream and queue data structures order the elements in a sequence, and the elements are immutable. These elements may or may not be homogeneous.

Queues can grow and shrink with publishers publishing and consumers consuming, respectively. Streams can grow with publishers publishing messages and do not shrink with consumers consuming. However, streams can be compacted by eliminating duplicates (on keys).

#2: Distributed (Cluster) Computing Topology

Since a single consumer consumes an element in a queue in a load-balancing pattern, the fetch must be from the central (master) node. The consumers may be in multiple nodes for distributed computing. The administrator configures the master broker node to store and forward data to other broker nodes for resiliency; however, it’s a single master active-passive distributed computing paradigm.

In the notification (topic) pattern, multiple consumers on a queue can consume filtered content to process data in parallel. The administrator configures the master node to store and forward data to other broker nodes that serve consumers. The publishers publish to a single master/leader node, but consumers can consume from multiple nodes. This pattern is the CQRS (Command Query Responsibility Segregation) pattern of distributing computing.

The streaming pattern is similar to the notification pattern w.r.t. distributed computing. Unlike messaging, partition keys break streams into shards/partitions, and the lead broker replicates these partitions to other brokers in the cluster. The leader election process selects a broker as a leader/master for a given shard/partition, and shard/partition replications serve multiple consumers in the CQRS pattern. The consumers read streams from the last offset, random offset, beginning, or latest.

If the leader fails, either a passive slave can take over, or the cluster elects a new leader from existing slaves.

#3: Routing and Content Filtering

In messaging systems, the brokers implement the concept of exchanges, where the broker can route the messages to different endpoints based on rules. The consumers can also filter content delivered to them at the broker level.

In streaming systems, the brokers do not implement routing or content filtering. Consumers may still filter content, but utility libraries on the consumer side do the filtering after the broker delivers the content.
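
To contrast the two, here is a minimal sketch, assuming a local RabbitMQ broker reachable through the pika client; the exchange name “events” and the routing key are hypothetical. The messaging broker filters by binding rule, while the streaming consumer must filter locally after delivery:

    import pika

    # Messaging: the broker routes; this queue only receives "*.error" messages.
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.exchange_declare(exchange="events", exchange_type="topic")
    queue = channel.queue_declare(queue="", exclusive=True).method.queue
    channel.queue_bind(exchange="events", queue=queue, routing_key="*.error")

    # Streaming: the broker delivers everything; the consumer filters locally.
    def filter_errors(records):
        return [r for r in records if r.get("severity") == "error"]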

Tabular Differences View

  • Publish and Subscribe Paradigm – Streaming: Yes. Messaging: Yes.
  • Polling vs. Notification – Streaming: polling by consumers. Messaging: notification by brokers to consumers.
  • Use Case – Streaming: data processing pipelines. Messaging: reactive (micro)services.
  • Delivery Semantics Supported – Streaming: at-most-once, at-least-once, exactly-once. Messaging: at-most-once, at-least-once, exactly-once.
  • Intelligent Broker – Streaming: No. Messaging: Yes.
  • Data Structure – Streaming: stream. Messaging: queue.
  • Patterns – Streaming: CQRS. Messaging: content-based routing/filtering, worker (LB) distribution, notification, CQRS.
  • Data Immutability – Streaming: Yes. Messaging: Yes.
  • Data Retention – Streaming: Yes; not deleted after delivery. Messaging: No; deleted after delivery.
  • Data Compaction – Streaming: Yes; key de-duplication. Messaging: N/A.
  • Data Homogeneity – Streaming: heterogeneous by default; supports schema checks on data outside the broker. Messaging: heterogeneous by default.
  • Speed – Streaming: faster than messaging. Messaging: slower than streaming.
  • Distributed Computing Topology – Streaming: broker cluster with a single master per stream partition; consumers consume from multiple brokers with data replicated across brokers. Messaging: broker cluster with a single master per topic/queue; active-passive broker configuration for the load-balancing pattern; data replicated across brokers for multiple-consumer distribution.
  • State/Memory – Streaming: brokers remember each consumer’s bookmark (state) in the stream. Messaging: consumers always consume from time-of-subscription (latest only).
  • Hub-and-Spoke Architecture – Streaming: Yes. Messaging: Yes.
  • Vendors/Services (Examples) – Streaming: Kafka, Azure Event Hub, AWS Kinesis. Messaging: RabbitMQ, Azure Event Grid, AWS SQS/SNS.
  • Domain Model – Streaming: a stream of GPS positions of a moving car. Messaging: a queue to buy train tickets.
Table of Differences between Streaming/Messaging Systems

Visual Differences View

Summary

Use the right tool for the job. Use messaging systems for event-driven services and streaming systems for distributed data processing.

Data Batching, Streaming and Processing

The IT industry likes to treat data like water. There are clouds, lakes, dams, tanks, streams, enrichments, and filters.

Data Engineers combine Data Streaming and Processing into a term/concept called Stream Processing. If data in the stream are also Events, it is called Event Stream Processing. If data/events in streams are combined to detect patterns, it is called Complex Event Processing. In general, the term Events refers to all data in the stream (i.e., raw data, processed data, periodic data, and non-periodic data).

The examples below help illustrate these concepts:

Water Example:

Let’s say we have a stream of water flowing through our kitchen tap. This process is called water streaming.

We cannot use this water for cooking without first boiling the water to kill bacteria/viruses in the water. So, boiling the water is water processing.

If the user boils the water in a kettle (in small batches), the processing is called Batch Processing. In this case, the water is not instantly usable (drinkable) from the tap.

If an RO (Reverse Osmosis) filtration system is connected to the plumbing line before the water streams out from the tap, it’s water stream processing with filter processors. The water stream output from the filter processors is a new filtered water stream.

A mineral-content quality processor generates a simple quality-control event on the RO filtered water stream (EVENT_LOW_MAGNESIUM_CONTENT). This process is called Event Stream Processing. The mineral-content quality processor is a parallel processor. It tests several samples in a time window from the RO filtered water stream before generating the quality control event. The re-mineralization processor will react to the mineral quality event to Enrich the water. This reactive process is called Event-Driven Architecture. The re-mineralization will generate a new enriched water stream with proper levels of magnesium to prevent hypomagnesemia.

Suppose the water infection-quality control processor detects E-coli bacteria (EVENT_ECOLI), and the water mineral-quality control processor detects low magnesium content (EVENT_LOW_MAGNESIUM_CONTENT). In that case, a water risk processor will generate a complex event combining simple events to publish that the water is unsuitable for drinking (EVENT_UNDRINKABLE_WATER). The tap can decide to shut the water valve reacting to the water event.

Water Streaming and Processing generating complex events

Data Example:

Let’s say we have a stream of images flowing out from our car’s front camera (sensor). This stream is image data streaming.

We cannot use this data for analysis without identifying objects (person, car, signs, roads) in the image data. So, recognizing these objects in image data is image data processing.

If a user analyses these images offline (in small batches), the processing is called Batch Processing. With such eventual batch processing, the image data is not instantly usable, and any events generated from retrospective batch processing arrive too late to react to.

If an image object detection processor connects to the image stream, it is called image data stream processing. This process creates new image streams with enriched image meta-data.

If a road-quality processor generates a simple quality control event when it detects snow (EVENT_SNOW_ON_ROAD), then we have Event Stream Processing. The road-quality processor is a parallel processor. It tests several image samples in a time window from the image data stream before generating the quality control event.

Suppose the ABS (Anti-lock Braking Sub-System) listens to this quality control event and turns on the ABS. In that case, we have an Event-Driven Architecture reacting to Events processed during the Event Stream Processing.

Suppose the road-quality processor generates snow on the road event (EVENT_SNOW_ON_ROAD), and a speed-data stream generates vehicle speed data every 5 seconds. In that case, an accident risk processor in the car may detect a complex quality control event to flag the possibility of accidents (EVENT_ACCIDENT_RISK). The vehicle’s risk processor performs complex event processing on event streams from the road-quality processor and data streams from the speed stream. i.e., by combining (joining) simple events and data in time windows to detect complex patterns.
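
A toy sketch of this windowed join, in plain Python; the event names come from the example, while the window length and speed threshold are invented for illustration:

    from collections import deque

    WINDOW_SECONDS = 30     # how long a snow event stays relevant
    RISK_SPEED_KMH = 40     # hypothetical "too fast on snow" threshold

    snow_events = deque()   # timestamps of EVENT_SNOW_ON_ROAD

    def on_road_quality_event(ts, name):
        if name == "EVENT_SNOW_ON_ROAD":
            snow_events.append(ts)

    def on_speed_sample(ts, speed_kmh):
        # Evict snow events that fell out of the time window.
        while snow_events and ts - snow_events[0] > WINDOW_SECONDS:
            snow_events.popleft()
        # Complex event: snow seen recently AND the vehicle is fast.
        if snow_events and speed_kmh > RISK_SPEED_KMH:
            return "EVENT_ACCIDENT_RISK"
        return None

    on_road_quality_event(100, "EVENT_SNOW_ON_ROAD")
    print(on_speed_sample(105, 65))  # EVENT_ACCIDENT_RISK
    print(on_speed_sample(200, 65))  # None; the snow event expired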

Data Streaming and Processing generating complex actionable events

Takeaway Thoughts

As you can see from the examples above, streaming and processing (stream processing) is more desirable than batching and processing (batch processing) because of its actionable real-time event generation capability.

Data engineers define data-flow “topology” for data pipelines using some declarative language (DSL). Since there are no cycles in the data flow, the pipeline topology is a DAG (Directed Acyclic Graph). The DAG representation helps data engineers visually comprehend the processors (filter, enrich) connected in the stream. With a DAG, the operations team can also effectively monitor the entire data flow for troubleshooting each pipeline.
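
A minimal sketch of such a topology, declared as a plain mapping from each processor to its upstream inputs; Python’s standard graphlib can then linearize the DAG (the names reuse the car example and are illustrative):

    from graphlib import TopologicalSorter

    # processor -> set of upstream processors/streams feeding it
    topology = {
        "object-detector": {"camera-stream"},
        "road-quality": {"object-detector"},
        "risk-processor": {"road-quality", "speed-stream"},
    }

    # Acyclic by construction, so a topological order exists; useful for
    # deploying processors and monitoring the flow end to end.
    print(list(TopologicalSorter(topology).static_order()))
    # e.g. ['camera-stream', 'speed-stream', 'object-detector',
    #       'road-quality', 'risk-processor']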

Computing leverages parallel processing at all levels. Even with small data, at the hardware processor level, clusters of ALU (Arithmetic Logic Unit) process data streams in parallel for speed. These SIMD/MIMD (Single/Multiple Instruction Multiple Data) architectures are the basis for cluster computing that combines multiple machines to execute work using map-reduce with distributed data sets. The BIG data tools (E.g., Kafka, Spark) have effectively abstracted cluster computing behind common industry languages like SQL, programmatic abstractions (stream, table, map, filter, aggregate, reduce), and declarative definitions like DAG.

We will gradually explore big data infrastructure tools and data processing techniques in future blog posts.

Data stream processing is processing data in motion. Processing data in motion helps generate real-time actionable events.

Career

People have different balanced expectations from themselves at work: money, technology, networking, challenging project, innovative project, travel, title, equities, work-life balance, job satisfaction, exposure, health, happiness, social responsibility, job type, startup, strategy, public image, leadership, and many more. At different points in their career, these parameters would be stacked differently, and rightly so.

In this blog post, I am sharing good career advice that I received for my career and growth.

Strive for excellence, use your strengths

The pursuit of excellence is like the pursuit of happiness. The bar keeps going up. It’s not the destination but the more meaningful journey. I had the privilege of working with leaders who would ask me to focus on my strengths – demonstrated and non-demonstrated (i.e., potential). Acknowledge and improve areas where you are weak, but not at the cost of your strengths.

Never let anybody dent your confidence. Always create value for your customers and business.

Seek out your dream job/role

It’s deliberately reaching out for discomfort. Doing a job that you are good at is great for the employer and not necessarily suitable for you.

You may be good in engineering but seek discomfort in project management. You may be good at project management but seek discomfort in product management. You may be good at product management but seek discomfort in engineering.

Lateral is the way to grow. Lateral is the way up.

Great leaders encourage you to apply to jobs/roles that give you another milestone to cherish in your life. Mediocre managers leverage your strengths only for the current job/role.

Posted job descriptions are never a good representation of the role demands. Always “talk” to the hiring leader.

Specialize or Diversify? Do it well

If you desire to seek specialization – go after it. Specialization in any subject requires you to spend significant time to acquire the competency/skill and practice. Don’t spread yourself too thin; pick an area and go deep.

If you desire to seek diversification – go after it. Diversification will require that you build networks and teams; you rely on others (in your network or group) but remain accountable. Connect, listen (not hear), and act.

If you desire specialization and diversification – go after it. Some people have successfully navigated both.

Some started with diversification and then specialized. Others began with specialization and then diversified. There is no career recipe; make your recipe.

Performance, Image, Exposure

Jobs/Roles demand performance, and growth requires exposure to new people and projects. Volunteer to work on initiatives that give you more exposure (work/life). Volunteering to do more when you have a demanding job/role requires you to stretch, work smarter (manage time), and delegate.

Stress: Circle of Control, Influence, & Concern

Stress is good for growth.

It’s widely believed that as you go up the corporate ladder, your circle of control gets bigger. In reality, your circle of influence grows while your circle of control relatively shrinks.

The circles of concern & influence are the primary drivers of “stress” in jobs/roles. It’s also critical to understand that your circle of control could be the primary driver of “stress” for your peers and teams.

If you understand your abilities to control or influence the outcome, you don’t need to eat stress for breakfast.

Treat People Like People

Treat others like you would like them to treat you.

Don’t treat people like “resources.” People are not like a CPU with a fixed capacity.

People have infinite capacity, and capacity increases when they are motivated. If they find inspiration, then you get unbounded capacity.

People get burnt out; they need time to relax, and so do you.

Health, Family, Work

If you lose health, you cannot take care of your family or do an excellent job at work. If you lose work, then you cannot support your family. Family is always there for you – in grief and happiness. You can’t afford to lose unconditional family love. So, the priority order has to be – health, family, and work.

Exercise for 45 minutes every day, even if that is a simple morning brisk walk. This “me” time helps you recharge.

Don’t treat your work colleagues like your family. Treat them like your team. Don’t treat your family like your team.

Final Thoughts

Choose to ignore advice that does not make sense, and consider it your common sense to use the advice that makes sense. Please don’t make people your role models; choose to cherish their actions or ideas that made them role models.

Triage Prioritization

In the last blog post, we talked about balanced prioritization. It’s excellent for big-picture-type things like product roadmaps, new technology introduction, and architecture debt management. The balanced prioritization technique helps build north stars (guiding stars) and focus; however, something else is needed when the rubber hits the road. That something is triage prioritization, where people have to “urgently” prioritize to optimize the whole.

Agile pundits like to draw similarities, and this table is some food for thought:

  • Balanced Prioritization – also called Deliberate Prioritization, BIG ‘P’ Prioritization, or “Strategic” Prioritization
  • Triage Prioritization – also called Just-in-time (JIT) Prioritization, Small ‘p’ Prioritization, or “Tactical” Prioritization
Synonyms (Kind-of Similar)

The table below talks about some desired properties of the two:

  • Values – Balanced: values multiple viewpoints (customer, leadership, architect, team, market) to prioritize; consensus-driven. Triage: values newly available learnings/insights to prioritize items; values collaboration (disagree-and-commit) over consensus.
  • Perspective – Balanced: strategic, long-term. Triage: tactical, short-term.
  • Driven-as – Balanced: process-first, driven by people. Triage: people-first process, driven by people.
  • Time-taken – Balanced: high; analysis-biased. It’s an involved process to get feedback, consolidate, review, discuss, and reach consensus. Triage: short; action-biased. It’s okay to get it wrong but wrong to waste time in analysis; values failure over analysis.
  • Assumptions – Balanced: largely independent of constraints and uncertainties; weighted-risk-assessment approach. Triage: thrives in uncertainty; uses a mindset of maximizing returns – “effort should not get wasted” and “optimize the whole.”
  • Tools – Balanced: RICE score, prioritization matrix, Eisenhower matrix. Triage: communication (say, listen) and a Kanban board to track WIP of priority items.
Properties of prioritization

Okay, so the story 🙂 This story is a personal one:

I have spent most of my career digitalizing healthcare. The most memorable part of this journey was when I spent my time digitalizing emergency care. Emergency care is a hospital department where patients can walk in without an appointment and are assured that they will not be turned away. So, the department gets patients that cannot afford care and patients that need urgent attention. The emergency department is capacity-constrained and has to serve a variety of workloads. Most days, it’s just a single emergency, and then there are days when there is a public health emergency or people are wheeled in from a significant accident nearby. The department cannot be staffed to handle the worst crisis but is reasonably staffed to handle the typical workload. Then the worst happens – a significant spike in workload – say, patients are coming in from a train accident. Some are alive, some are barely alive, and some are dead on arrival. You cannot use a consensus-driven approach to sort these people into an urgent-important matrix or derive a RICE score for each patient. Emergency departments use a scoring system (ESI) where one person identifies the score – the “Triage” Nurse – and the team collaborates in the absence of consensus.

I have seen triage nurses shut down the computers and move to the whiteboard. Computers are great for organizing, but organizing is a slow, computerized process. They need something faster – whiteboards & instant communication. They don’t need to know the names of the people (“patient on bed-2” is a good name). If someone needs help, they shout, and the person who can help offers help.

Commonly heard statements in an emergency care room during high workload emergency:

“That patient is not likely to survive. The damage is beyond repair. Move on, and stop wasting time on this patient.”

“We have only one ventilator. Use this on that younger fellow and not the elderly. We can save more life-hours.”

“That patient is screaming and in pain and will survive if we don’t attend her for the next 45 minutes. She has a fracture. Focus on this patient; our effort here and now can save her leg. We will get back to the screaming one. Somebody shut her up.”

Automated Computerized Workflow can’t respond well to such emergencies. This triage prioritization is a people-first process driven by people. People can increase their capacity – both individually and as a team – in emergencies. People are not like CPUs; they can work at 150% capacity. Capacity is measured not by the elapsed time of effort but outcomes achieved per unit of elapsed time.

At the end of the day, most people will be saved, and there will be an unfortunate loss. But triage prioritization values maximizing life-hours (optimizing the whole), effort not getting wasted, a quick pivot when effort does not yield results (new insights to prioritize), and action bias. The responsibility and authority to prioritize is with the ER care team and stays within those closed doors: the best tools are collaboration and communication (shout, scream, listen, act). When the emergency is behind them, they will retrospect to improve their behavior in an emergency and request infrastructure that could help them in the future (E.g., that second ventilator could have helped).

Shifting back to software development:

There are parallels in the software team – Tiger team to fix product issues, WAR Room to GO-LIVE, Support Team for 24×7 health of applications in operations during Black Friday, and many more.

While the examples above are like ER teams, triage prioritization also happens in scrum teams executing a well-defined plan. This triage prioritization is hidden from the work tracking boards. When engineers are pair-programming to get something done, many things can go wrong (blind spots) that must be prioritized and fixed. The team cannot fix everything in the sprint duration and carries over some debt. Big Bad debt gets on the boards (some debt remains in code: TODOs). Big Bad debt is prioritized with other big-ticket items using balanced prioritization.

Summary: Outcome quality is directly proportional to the quality of triage prioritization – a people-first process driven by people that works best with delegated responsibility and authority.

Some of my friends have argued with me that the story of “mom prioritizing her child’s homework” (last post) is triage prioritization and not balanced prioritization. My friends say, “Mom is the ‘Triage Nurse’ collaborating with the child, and the time taken to prioritize is short, with learnings from execution fed back into the prioritization process.” There is a grayscale (balanced-triage), and my only argument against this is that mom is not working in a capacity-constrained environment of uncertainties. If she had to choose between a burning car and her child, the choice is obvious.

Balanced Prioritization

In an (agile) product development team, everybody has a say about the priority of the backlog features. However, only the product owner decides (vetoes). The product owner has to consider the customer’s perspectives, market trends, leadership vision, architecture enablers, technical debt, team autonomy, and most importantly, her biased opinions. She has to use some processes to prioritize and justify the priorities.

Let’s move from the corporate dimension to the family dimension. Here’s a story that every parent will relate to:

Child: Mom – I have too many things to do. I have got Math, English, Hindi, and Kannada homework to complete. Also, I have to prepare for my music exam before the following Monday. I want to watch the newly released “Frozen” movie – my friends have already watched it. I have a play-date today evening – I can’t skip it; I have promised not to miss it. This is too much to do.

Mom: Hey! I want to watch the “Frozen” movie with you too! Let’s do that after your music exam following Monday? I will book the tickets today.

Child: Ok. Yay!!

Mom: When is your homework due?

Child: English and Kannada are due today. Math and Hindi are due tomorrow.

Mom: Ok, let’s look at the homework. Oh, English and Hindi look simple; you have to fill in the blanks. Kannada is an essay that you have to write about “all homework, and no play makes me sad.” So, you will need to spend some time thinking through that 🙂 Math seems to be several problems to solve; let’s do this piecemeal.

Child: Can I start with Math? I love to solve math problems.

Mom: I know. It’s fun to solve math. Let’s just get done with the English first – it’s due today and simple.

Child: Ok.

<10 minutes later>

Child: Done, Mom! Can I do Hindi? That is simple too.

Mom: Hindi is not due today. It’s easy to get that done tomorrow. Let’s start with your Kannada essay and do some math today.

<30 minutes later>

Child: I have been writing this essay for 30 minutes. It’s boring.

Mom: Ok, solve some math problems then.

<30 minutes later>

Mom: Having fun? Time to complete the Kannada essay; you have to submit it today.

Child: Ok – Grrr.

<30 minutes later>

Child: Done! Phew. I have one more hour before my friend comes. Today’s homework is done. I will loiter around now.

Mom: No, you should practice for your music exam. Only practice makes it perfect. Why don’t you practice for the next one hour, and then play? After your play-date, you can do some more math; you like to do it anyway. However, 8:00 PM is sleep time.

Child: I am tired. Can I loiter around for 15 minutes and then practice music?

Mom: Ok – I will remind you in 15.

<15 minutes later>

Mom: Ready to practice music?

Child: Grr, I was enjoying doing nothing and loitering around.

Mom: Quickly, now finish your music practice before your friend comes. Or. I will ask her to come tomorrow.

Child: Mommy! How can you do that! Ok – I will practice music.

<45 minutes later>

Friend: Hello! Let’s play.

<2 hours later>

Mom: Playtime is over, kids. Have your dinner, and then complete some math; 8:00 PM is sleep time. Remember.

Child: Ok. The playtime was too short.

<completes dinner, completes some more math problems, sleeps>

Mom: Hey, good morning. No school today. It’s Saturday. Finish your remaining homework, and you can play all day. You can start with Hindi or Math. Your choice.

Child: I will do Hindi first. It’s simple. Then math.

<20 minutes later>

Child: Done. Now, I can play, loiter, and do anything?

Mom: Yes, let me know when you want to do one more hour of music practice. We will do that together.

Child: Ok, Mom. Love you!

<After a successful music exam on Monday>

Mom: Let’s watch Frozen.

The story above is balanced prioritization. Mom balances work-play, urgent-important, big-small effort, short-long term, like-dislike, carrot-stick, confidence level, reach, and impact. While she considers priorities, she also allows her child to make some decisions, delegating authority.

Balanced prioritization is an exercise for execution control. There is no use of prioritization without a demand for execution control.

When mom comes to the corporate world, the n-dimensional common sense transforms into the 2-dimensional corporate language: RICE Score, MoSCoW, Eisenhower’s Time Management Matrix, and the Prioritization ranking matrix.

Top Prioritization Techniques

Balanced prioritization is part of the backlog grooming process and extends to sprint planning, where the team slices the work items to fit in a sprint boundary.

Balanced prioritization is a continuous process.

In our story, mom did not have to deal with capacity constraints. However, in the corporate world, there are capacity constraints that push for more aggressive continuous real-time prioritization. I will share a (healthcare) story in my next blog.

Data Semantics

The real world is uncertain, inconsistent, and incomplete. When people interact with the real world, data from their inbuilt physical sensors (eyes, ears, nose, tongue, skin) and mental sensors (happiness, guilt, fear, anger, curiosity, ignorance, and many more) get magically converted into insights and actions. Even if these insights are shared & common, the actions may vary across people.

Example: When a child begs for money on the streets, some people choose to give money, others prefer to ignore the child, and some others decide to scold the child. These people have a personal biased context overlooking the fact that it’s a child begging for money (or food), and their actions result from this context.

The people who give money claim that they feel sorry for the child, and parting with a little money won’t hurt them and will help the child eat. The people who don’t give money argue that giving cash would encourage more begging, and that a mafia runs it. Some people may genuinely have no money, and others expect the governments (or NGOs) to step up.

Switching context to the technology world:

With IoT, Cloud, and BIG Data technologies, everybody wants to collect data, get insights, and convert these insights into actions for business profitability. This computerized data and workflow automation system approximates an uncertain, inconsistent, and incomplete real-world setup. Call this IoT, Asset Performance Management (APM), or a Digital Twin; the data to insights to actions process is biased. Automating a biased process is a hard problem to solve.

It’s biased because of the semantics of the facts involved in the process.

semantics [ si-man-tiks ]

“The meaning, or interpretation of the meaning, of a fact or concept”

Semantics is to humans as syntax is to machines. So, a human in the loop is critical to manage semantics. AI is changing the human-in-the-loop landscape but comes with learning bias.

Let’s try some syntactic sugar to decipher semantics.

Data Semantics, Data Application Semantics

Data semantics is all about the meaning, use, and context of Data.

Data Application Semantics is all about the meaning, use, and context of Data and application agreements (contracts, methods).

Sounds simple? Not to me. I had to read that several times!

Let’s dive in with some examples:

Example A: A Data engineer claims: “My Data is structured (quantifiable with schema). So, AI/BI applications can use my Data to generate insights & actions”. Not always true. Mostly not.

Imagine a data structure that captures the medical “heart rate” observation. The structure may look like {“rate”: 180, “units”: ‘bpm’} with a schema that defines the relevant constraints (i.e., the rate is a mandatory field and must be a number >=0 expressed as beats per minute).

An arrhythmia detection algorithm analyzing this data structure might send out an urgent alarm – “HELP! Tachycardia” – and dial 102 to call an ambulance. The ambulance arrives to find that the person was running on a treadmill, causing the high heart rate. This Data is structured but “incomplete” for analysis. The arrhythmia detection algorithm needs more context than the rate and units to raise the alarm. It needs context to “qualify” and “interpret” the “heart-rate” values (a sketch after the list below shows the difference context makes). Some contextual data elements could be:

  1. Person Activity: {sleeping, active, very active}
  2. Person Type: {fetal, adult}
  3. Person’s age: 0-100
  4. Measurement location: {wrist, neck, brachial, groin, behind-knee, foot, abdomen}
  5. Measurement type: {ECG, Oscillometry, Phonocardiography, Photoplethysmography}
  6. Medications-in-use: {Atropine, OTC decongestants, Azithromycin, …}
  7. Location: {ICU, HDU, ER, Home, Ambulance, Other}
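
A hedged sketch of how such qualifiers change the interpretation; the field names and thresholds below are assumptions for illustration, not clinical guidance:

    def tachycardia_alarm(observation: dict) -> bool:
        rate = observation["rate"]  # mandatory, beats per minute
        activity = observation.get("person-activity", "unknown")
        person_type = observation.get("person-type", "adult")
        # 180 bpm is expected for a fetus or for an adult who is very
        # active on a treadmill, but alarming for an adult at rest.
        if person_type == "fetal" or activity in ("active", "very active"):
            return rate > 200
        return rate > 120

    obs = {"rate": 180, "units": "bpm", "person-activity": "very active"}
    print(tachycardia_alarm(obs))  # False: context prevents the false alarm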

Let’s look at this example from the “semantics” definition above:

  1. The meaning of “heart rate” is abstract but consistently understood as heart contractions per minute.
  2. The “heart rate” observation is used to build an arrhythmia detection application.
  3. Additional data context required to interpret “heart-rate” is Activity, Person Type, Person Age, Measurement Location, Measurement Type, Medications-in-use, and Location. This qualifying context is use-specific. An application to identify the average heart-rate range in a population by age intervals needs only the Person Age.
  4. The algorithm’s agreement (contract = what?) is to “Detect arrhythmias and Call Ambulance in case of ER care”
  5. The algorithm’s agreement (method = how?) is not well defined. A competing algorithm may use AI to make a better prediction to avoid false alarms. This is similar to our beggar child analogy, where the method of the people to derive insight differed, resulting in different actions.

Example B: Another familiar analogy to help understand “meaning,” “use,” “context,” and “agreement” is to look at food cooking recipes. Almost all these recipes have the statement “add salt to taste.”

  • The meaning of “Salt” is abstract but consistent. Salt is not sugar! 🙂 It’s salt.
  • Salt is used to make the food tasty.
  • Additional data context required to interpret “Salt” is the salt type {pink salt, black salt, rock salt, sea salt}, the user’s salt tolerance level {bland, medium, saltier}, the user’s BP, and the user’s continent {Americas, Asia, Europe, Africa, Australia}.
  • The agreement (contract = what?) is to “Add Salt.”
  • The agreement (method = how?) is not well defined. Depending upon the chef, she may have a salt-type preference with variations from the average salt tolerance level. For good business reasons, she may add less salt than her own tolerance level and serve extra salt on the side, allowing the customer to adjust the food taste to their own salt tolerance level.

In computerized systems, physical-digital data modeling can achieve data semantics (meaning, use, context). It’s much harder to achieve data application semantics (data semantics + agreements). Data interpretation is subject to the method and its associated bias.

So, to interpret data, there must be a human in the loop. Not all people infer equally. Thus, semantics leads to variation in insights. Variation in insights leads to variation in actions.

Diving into Context – It’s more than Qualifiers

Alright, I want to further peel the “context” onion. Earlier, we said that “context” is used to “qualify” the data. There is another type of context that “modifies” the data.

Let’s go back to our arrhythmia detection algorithm (Example A). We have not captured and sent any information about the patient’s diagnosis to the algorithm. The algorithm does not know whether the high heart rate is due to Supra-ventricular Tachycardia (electric circuit anomaly in the heart), Ventricular Tachycardia (damaged heart muscle and scar tissue), or food allergies. SVT might not require an ER visit, while VT and food allergies require an ER visit. Let’s say our data engineers capture this qualifying information as additional context:

{prior-diagnosis: [], known-allergies:[]}

Great. We have qualifying context. So, what does diagnosis = [] mean? That the patient does not have SVT or VT? No, not true. It means that the doctors have not tested the patient for the condition or have not documented a negative test result in the data system. It doesn’t mean that the patient has neither SVT nor VT. So, we are back to square one. Now, let’s say that we have a documented prior diagnosis:

{prior-diagnosis: [VT], known-allergies: []}

Ok, even with this Data, we cannot confirm that VT causes the high heart rate. It could be due to undocumented/untested food allergies or an as-yet-undiagnosed SVT. This scenario calls for data “modifiers.”

{prior-diagnosis-confirmed: [VT], prior-diagnosis-excluded: [SVT], known-allergies-confirmed: [pollen, dust], known-allergies-excluded: [food-peanuts]}

The structure above has more “semantic” sugar. The prior-diagnosis-excluded: [SVT] entry acts as a “NOT” modifier on “diagnosis.” This modifier helps to safely ignore SVT as a cause.
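
A small sketch of the difference this makes to interpretation logic; the field names follow the example above:

    context = {
        "prior-diagnosis-confirmed": ["VT"],
        "prior-diagnosis-excluded": ["SVT"],
        "known-allergies-confirmed": ["pollen", "dust"],
        "known-allergies-excluded": ["food-peanuts"],
    }

    def can_rule_out(ctx: dict, condition: str) -> bool:
        # Only an explicit "excluded" modifier rules a condition out;
        # absence from the confirmed list means "unknown", not "no".
        return condition in ctx.get("prior-diagnosis-excluded", [])

    print(can_rule_out(context, "SVT"))  # True: safe to ignore SVT as a cause
    print(can_rule_out(context, "VT"))   # False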

Summary

Going from data to insights to actions is challenging due to “data semantics” and “data application semantics.”

Modeling all relationships between real-world objects and capturing context mitigates “data semantics” issues. Context is always use-specific. The context may still have “gaps,” and inferencing data with context gaps leads to poor-quality insights.

“Data application semantics” is a more challenging problem to solve.

The context must “qualify” the data and “modify” the qualifiers to improve data semantics. This context “completeness” requires collecting good-quality data at the source. More often than not, a human data analyst goes back to the data source for more context.

When technology visionaries say “We bring the physical and digital together” in the IT industry, they are trying to solve the data semantics problem.

For those in healthcare, the words “meaning” and “use” will trigger the US government’s initiative of “meaningful use” and shift to a merit-based incentive payment system. To achieve merit-based incentives, the government must ensure that the data captured has meaning, use, and context. The method (= how) used by the care provider to achieve the outcome is important but secondary. This initiative also serves as a recognition that data application semantics are HARD.

Enough said! Rest.

Data Measurement Scale and Composition

In the parent blog post, we talked about data terms: “Structured, Unstructured, Semi-structured, Sequences, Time-series, Panel, Image, Text, Audio, Discrete, Categorical, Numerical, Nominal, Ordinal, Continuous and Interval”; let’s peel this onion.

Some comments that I hear from engineers/architects:

“My Data is structured. So, it’s computable.” – Not true. Structure does not mean that Data is computable. In general, computable applies to functions; when used in the context of data, it means quantifiable (measurable). Structured data may contain non-quantifiable data types like text strings.

“All my Data is stored in a database. It’s structured data because I can run SQL queries on this data” – Not always true. Databases can store BLOB-type columns containing structured, semi-structured, and unstructured data that SQL cannot always query.

“Data lakes store unstructured data, and this data is transformed into structured data in data warehouses” – Not really. Data lakes can contain structured data. Data pipelines extract, transform, and load data into data warehouses. The data warehouse is optimized for multi-dimensional data queries and analysis. The inability to execute queries in data lakes does not imply that Data in the lake has no structure.

Ok, it’s not as simple as it appears on the surface. Before we define the terms, let’s look at some examples.

Example A: The data below can be classified as structured because it has a schema. The weight sub-structure is quantifiable. “Weight-value” is a numerical, continuous data type, and “weight-units” is a categorical, nominal data type.

    name     weight-value    weight-units
    Nitin    79.85           KG
Example A: Panel Data

    Field           Mandatory    Data Type    Constraints
    name            Yes          String       Not Null; Length < 100 chars
    weight-value    Yes          Float        > 0
    weight-units    Yes          Enum         {KG, LBS}
Example A: Schema & Constraints

Example B: The data below can be classified as semi-structured because it has a structure but no schema or constraints. Some schema elements can be derived, but the consumer is at the mercy of the producer. The value of weight can be found in “weight-value”, “weight”, or “weight-val” fields. Given the sample, the consumer can infer that the value is always numerical and continuous data type (i.e., float). The vendor of the weighing machine may decide to have their name captured optionally. The consumer will also have to transform “Kgs,” “KG,” and “Kilograms” into a common value before analyzing the data.

Data Instance A:
    {
      "name": "Nitin",
      "weight-units": "Kgs",
      "weight-value": 79.85,
      "vendor": "Apollo"
    }

Data Instance B:
    {
      "name": "Nitin",
      "weight-units": "KG",
      "weight": 79.85,
      "vendor-name": "Fitbit"
    }

Data Instance C:
    {
      "name": "Nitin",
      "weight-units": "Kilograms",
      "weight-val": 79.85,
      "measured-at": "14/08/2021"
    }
Example B: JSON Data
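
A sketch of the consumer-side harmonization this implies; the alias and unit-synonym maps are inferred from the three samples and would grow as the producer drifts:

    WEIGHT_ALIASES = ("weight-value", "weight", "weight-val")
    UNIT_SYNONYMS = {"Kgs": "KG", "KG": "KG", "Kilograms": "KG"}

    def normalize(instance: dict) -> dict:
        # Find the weight under whichever alias the producer used.
        value = next(instance[k] for k in WEIGHT_ALIASES if k in instance)
        return {
            "name": instance["name"],
            "weight-value": float(value),
            "weight-units": UNIT_SYNONYMS[instance["weight-units"]],
        }

    b = {"name": "Nitin", "weight-units": "KG", "weight": 79.85,
         "vendor-name": "Fitbit"}
    print(normalize(b))
    # {'name': 'Nitin', 'weight-value': 79.85, 'weight-units': 'KG'}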

Example C: A JPEG file stored on a disk can be classified as structured data. Though the file is stored as binary, there is a well-defined structure (see table below). This Data is “structured,” but the image data (SOS-EOI) is not “quantifiable” and loosely termed as “unstructured.” With the advance of AI/ML, several quantifiable features can be extracted from image data, further pushing this popular unstructured data into the semi-structured data space.

JFIF file structure:
    Segment      Code                              Description
    SOI          FF D8                             Start of Image
    JFIF-APP0    FF E0 s1 s2 4A 46 49 46 00 ...    JFIF marker segment
    JFXX-APP0    FF E0 s1 s2 4A 46 58 58 00 ...    optional JFIF extension
    ...          (additional marker segments)
    SOS          FF DA                             Start of Scan
    ...          (compressed image data)
    EOI          FF D9                             End of Image
Example C: JPEG Structure (courtesy: Wikipedia)

Example D: The Text below can be classified as “Unstructured Sequence” data. The English language does define a schema (constraint grammar); however, quantifying this type of data for computing is not easy. Machine learning models can extract quantifiable features from text data. In modern SQL, machine learning is integrated into queries to extract information from “unstructured” data.

I must not fear. Fear is the mind-killer. Fear is the little death that brings total obliteration. I will face my fear. I will permit it to pass over me and through me. And when it has gone past, I will turn the inner eye to see its path. Where the fear has gone, there will be nothing. Only I will remain.

So, the lines are not straight 🙂 Given this dilemma, let’s further define these terms, with more examples:

Quantifiable Data is measurable Data. Computing is easy on measurable data. There are two different types of measurable data – Numerical and Categorical. Numerical Data types are quantitative, and categorical Data types are qualitative.

Numerical data types could be either discrete or continuous. “People-count” cannot be 35.3, so this Data type is discrete. “Weight-value” is always approximated to 79.85 (instead of 79.85125431333211), and hence this Data type is continuous.

Categorical data types could be either ordinal or nominal. In “Star-rating,” a value of 4 is higher than 3. Hence, the “star rating” data is ordinal as there is an established order in ratings. The quantitative difference between ratings is not necessarily equal. There is no order in “cat,” “dog,” and “fish”; hence, “Home Animals” is a nominal data type. A short pandas sketch after the table below makes these classes concrete.

    Parent Category    Child Category    Example
    Numerical          Discrete          { “people-count”: 35 }
    Numerical          Continuous        { “weight-value”: 79.85 }
    Categorical        Ordinal           5 STAR RATING = {1,2,3,4,5}; { “star-rating”: “3” }
    Categorical        Nominal           Home Animals = {“cat”, “dog”, “fish”}; { “animal-type”: “cat” }
Quantifiable Data (Quantitative and Qualitative)
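
These measurement classes map directly onto data types in analysis libraries; a small pandas sketch, where ordered=True is what captures the ordinal nature of star ratings (the sample values are invented):

    import pandas as pd

    rating_type = pd.CategoricalDtype(categories=[1, 2, 3, 4, 5], ordered=True)
    df = pd.DataFrame({
        "people-count": [35, 12],          # numerical, discrete
        "weight-value": [79.85, 63.10],    # numerical, continuous
        "star-rating": pd.Categorical([3, 5], dtype=rating_type),  # ordinal
        "animal-type": pd.Categorical(["cat", "fish"]),            # nominal
    })
    print(df["star-rating"].dtype.ordered)  # True: 4 > 3 is meaningful
    print(df["animal-type"].dtype.ordered)  # False: no order among animals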

Non-Quantifiable Data is Data where the measurable features are implicit. The Data is rich in features, but the features need to be extracted for analysis. AI is changing the game of feature extraction and pattern-recognition for such data. The three well-known examples in this category are Images, Text, and Audio. The latter two (Text and audio) are domains of natural language processing (NLP), while Images are the domain of computer vision (CV).

Quantifiable and Non-Quantifiable data can be composed together into structures. The composition may be given a name. Example: A “person” is a composite data type of quantifiable (i.e., weight) and non-quantifiable (i.e., name) data types.

When data is composed with a schema, it is loosely called “structured” data. Any composition without a schema is loosely called “semi-structured” data.

Structured or semi-structured data with non-quantifiable fields is called unstructured data. In this spirit, Example C is unstructured. Also, the quote about data lakes storing “unstructured” Data is true: the data might have a structure with schema but cannot be queried in place without loading into a data cube in the warehouse. The lines blur with modern lake-houses that query data in place at scale.

Data can also be composed together into “Collection” data types. Sets, Maps, Lists, and Arrays are examples of collections. “Collections” with item order and homogeneity are called sequences. Movies and Text are sequences (arrays of images and words, respectively). All data generated in a sequence is usually from the same source.

Sequences ordered by time are called time-series sequences, or just time-series for short. Sensor data that is generated periodically with a timestamp is an example of time-series data. Time-series data have properties like trend and seasonality that we will explore in a separate blog post. A camera (sensor) generates a time-series image data feed with a timestamp on every image. This feed is a time-series sequence.

Some visuals will help to clarify this further:

Data by Measurement Class
Data by Structure (composition) Class

JSON and XML are data representations that come with or without schema. It’s incorrect to call all JSON documents semi-structured, as they might originate from a producer that uses well-defined schema and data typing rules.

Data Compositions

Hope this post helps to understand the “data measurement and composition vocabulary”. You can be strict or loose about classifying data by structure based on context; however, it’s critical to understand the measurement class.

Only measurable data generates insights.

After all that rant, let’s try to decipher “logs” data-type.

  1. “Performance Event Logs” generated by an application with fixed fields like {id: number, generated-by: application-name, method: method-name, time: time-since-epoch} is composed of quantifiable fields and constrained schema. So, it belongs to the “Structured Data” class.
  2. “Troubleshooting Logs” generated by an application with fields like {id: number, generated-by: application-name, timestamp: Date-time, log-type: <warn, info, error, debug>, text: BLOB, +application-specific-fields} is composed of quantifiable and non-quantifiable fields, without a constraining schema. Some applications may add additional fields like API name, session-id, and user-id. Strictly, this is “unstructured” data due to the BLOB but loosely called “semi-structured” data (see the sketch below).
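
A sketch of the two log shapes side by side; the field values are invented, and the free-text BLOB is what pushes the second record out of the structured class:

    import time

    performance_event = {            # structured: every field is quantifiable
        "id": 1,
        "generated-by": "billing-svc",
        "method": "charge",
        "time": time.time(),         # seconds since epoch
    }

    troubleshooting_log = {          # loosely "semi-structured"
        "id": 2,
        "generated-by": "billing-svc",
        "timestamp": "2021-08-14T10:02:11Z",
        "log-type": "error",
        "text": "Traceback (most recent call last): ...",  # non-quantifiable BLOB
        "session-id": "abc123",      # application-specific, schema-free extra
    }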

Measurement-based data type classification and composition of types into structures do not convey semantics. We will cover semantics in our next blog!

Agile: “Teaming” + “Collaboration”

How to improve the “collaboration” in an agile “team”? Collaboration is a critical ingredient in the pursuit of excellence.

“Agile is for the young who can sprint. What’s the minimum age to apply for a manager? Managers don’t need to sprint.” – Software Engineer

“I need my personal space and can’t work in a collaborative space. I need to hide sometimes. Managers are in a cabin; I want to be isolated too.” – Software Engineer

“No pair programming, please. I am most productive when I am alone. I listen to my music and code. I will follow the coding guidelines; just let me be with my code. I will work all night and get it done with great quality – promise. Can I WFH?” – Software Engineer

“You must review your code before check-in. Peer review is like brushing your teeth in the morning. It’s hygiene. Do you brush your teeth every day? Like it or not – just like brushing teeth – you have to peer-review your code before check-in” – Coach.

“You don’t go to the gym to only run on a treadmill for cardio. You have to train your back with pullups, chest with dumbbell incline press, shoulders with machine shoulder press, and biceps with dumbbell bicep curls. Whole-body training gives the best results. In the same way, in the agile gym, you have to practice pair-programming, team retrospectives, team backlog grooming, peer-review, and many more. It’s a team sport. We will start you with pair-programming and then gradually introduce other practices” – Coach.

Software Engineers’ perspective is correct: They have been nurtured by the system to be competitive. They did not work in pairs to clear their engineering exams. Study groups were just boredom killers. They compete with other engineers for jobs. They are where they are because of their “individualism” and not their “collaboration” skills. And, the ask from them is to un-learn all of that and “collaborate.” It’s a value conflict.

Coaches’ perspective is correct: Great things have been achieved with collaboration. From “hunting for food” during cave days to “winning a football (soccer) game” requires intense collaboration.

In sports teams, say, cricket teams, some basic instincts kick in to drive collaboration. People quickly self-organize into the batter, bowler, wicket-keeper, and captain. Batters collaborate to seek “runs.” Everybody gives feedback to bowlers. The team claps when anybody fields the ball. They hug and scream. Emotions flow. They celebrate each other – it’s the team and not the individual. And, when this same team comes back to their desk to work, emotions stop, and they continue complaining about pair programming (2 batters running) and peer review (everybody giving feedback to bowlers).

It’s not about process, practices, and tools. It’s about people. In a team context, it’s an identity loss for the individual. It’s a mistake only to celebrate a team and overlook individualism. “Collaboration” shines when “Individualism” is honored. While there is a cup for the team, there is also a man-of-the-match (or woman-of-the-match). So, it has to be “Teaming,” “Collaboration,” and “Individualism.”

It’s not about leveraging digital collaboration tools. It’s about allowing human emotions to flow in the work context using gaming to simulate a sports environment. Example: Leaderboards in adopting an agile practice, leaderboards in competency development, leaderboards in customer NPS, and badges for engineer-of-the-team help pump the adrenaline gene. The gaming process should not be too liberal with rewards and should allow good/bad/ugly emotions to flow. Mix western-style gaming by rules and eastern-style gaming by shame. Example: In football (soccer),

  1. Western-style: The yellow card to warn a player is gaming by rules.
  2. Eastern-style: The coach pulling out a non-performing player from the field is gaming by shame.

Agile Manifesto: “Individuals and interactions over processes and tools”

For engineers: Don’t treat your team at work like your family. Treat them as your sports team. If you are handicapped for life, the people looking after you are your family and not your team. The team will extend financial/emotional support but cannot replace family. The value systems are different.

For coaches: Use gaming. Use agile. Not just the process. People before process.

Digital Career in Technology

“I am in software, so I am digital,” claims a software engineer. It’s a fallacy.

“What is Digital?” Based upon their experience, the audience may answer as software, computers, workflow, automation, social media, or agile.

“I have converted the paper workflow into a form on the computer. We are now paperless. Everything is digitized and neatly stored in the database libraries.” – Software Product Leader. Behind the scenes, a user complains: “This computerized documentation is slower than paper.”

Digitizing is not Digitalizing

“I have applied for a job in a Digital company. They are into IoT, Cloud, and BIG Data. After I joined, it’s no different than any other software company. It’s just fixing bugs in somebody else’s code! and long working hours.”

Digital is not about software

So, here’s my opinion – Digital is about customer experience and consumerism. Technology and Software (Computers, software, agile, AI/ML, workflow, etc.) play a role, but they are just a means to an end.

“Customer” <<>> “Experience” <<>> “Consumerism”

Story of Mrs. Jane Doe going Digital

Mrs. Jane Doe loves cooking. She believes she makes the best chocolate cookies in the world. She wants to go digital – she wants more people to experience and consume her cookies. A good business is a growing business. A friend tells her about “anything-you-want.com,” where there are a million users registered. She can publish her cookies and her location; the platform has delivery partners that will deliver her cookies anywhere in the world. So, she has only to make yummy cookies and not hire expensive and cranky software engineers!

The results were excellent; there were 100 delivery requests on day-1. Mrs. Jane Doe improved the consumption of her service (cookies). When she went back to the site, 60 users had rated her cookies as 5-star, 30 users had rated her cookies as 4-star, and 10 users had rated her cookies as 2-star with comments as “Too sweet and sugary. Avoid”

Hmmm, more consumption means more feedback (experiences range from good to bad to ugly). So, in her next iteration, she added customization to her cookies to allow requests for reduced sweetness. She observed the next 100 orders, and it looks like she made an incremental improvement: 65 cookies rated 5-star and 35 rated 4-star. But she had no objective/subjective data to improve her cookies further. There were no comments at all.

She had an idea; she published a discount coupon code with the next order; the discount coupon code would be activated only after a feedback comment. After this incremental change, she observed the next 100 orders, and voila! She had comments (at the cost of giving free cookies). The comments included “Boring package,” “Expected more cookies in the package,” “Same looking and tasting cookies without variety,” “Too hard for my teeth,” and “Too mushy and melting.” She was now armed with feedback from a poor experience and ready to make more changes. She was determined to improve her rating! A higher rating means more orders!! So, it’s experience and consumerism. Digital is cool.

Digital is a new way of doing business. Well, it’s the old way that is packaged in a new way, with “technology” as an enabler and accelerator.

So, how does this relate to a career in technology?

Modern technology is architected as a set of services. It’s paramount that the consumption and experience of the service are measured. Measurement and feedback improve the service. Feedback could be defects or improvement opportunities, and addressing them enhances the experience and consumption of the service. Collect data about consumption and experience – logs, click-streams, and user feedback circles. Analyze data to improve the service quality attributes – functionality, reliability, scalability, etc. It’s a digital pursuit to improve a service experience and consumption.

This continuous improvement mindset drives digital. The user/customer is at the center, not technology. Technology is applied to improve the services. Don’t just hear the user’s feedback; listen to it. If the user is a critic, you are lucky. It’s an opportunity to improve. Whether you are building a platform or an application, it’s a service with a user/customer that uses the service. Move away from software to service.

It’s a digital economy powered by services. Digital is customer experience and consumerism.

While striving for technology expertise/excellence, focus on users/customers. You can then add “digital” to your CV.