Kashmir

It was Tvisha’s summer vacation, and due to work pressures, we could not take her out of Bangalore in April, so she was bugging us to get away. After comparing Leh/Ladakh, Sikkim, the Andamans, and Kashmir, we finally settled on Kashmir. I had been there when I was very young and had fond memories of the place. I was unsure whether it was safe to visit, but much online research and conversations with travel agents put our doubts to rest.

Some friends were going on a Bangalore-Leh drive (9,000+ km), and others were going to monasteries in Sikkim. We had recently been to monasteries in Bhutan, and the girls were not in the mood for another long drive! In December 2021, we had enjoyed a seven-day drive (Bangalore-Coonoor-Kotagiri-Coimbatore-Mahabalipuram-Bangalore). The weatherman predicted rain in the Andamans, and the daughter wanted to make Olaf! So, Kashmir was inevitable!

After comparing rates and dates with our favorite travel agent (SOTC), we finally booked our trip with MakeMyTrip Holidays.


Day 1 – Bangalore to Srinagar

There is a direct IndiGo flight (6E 797) from Bangalore to Srinagar that departs Bangalore at a convenient time (~9:15 AM) and reaches Srinagar post noon (~2:00 PM) with a brief halt at Amritsar.

When the plane landed at the Srinagar airport, all passengers were in awe of Srinagar’s beautiful hills (some capped with snow). At the airport, our driver greeted us, and he instantly realized that we South Indians were in Kashmir to see snow in summer! He promised us ample snow in Sonmarg and Gulmarg, making Tvisha very happy!

It was a short drive from the airport to our first hotel in Srinagar. It was not a classic hotel but a houseboat. The houseboat we stayed in was Naaz Kashmir (https://www.naazkashmir.com/) and was located on Nageen Lake (not Dal Lake). All the lakes in Srinagar are connected, but Nageen Lake is less commercial and less crowded.

The rest of the day was free for us to enjoy the houseboat! We spent our time taking photos, watching fishermen and birds catch fish, eating pakoras, listening to the music of the water and birds, listening to the prayers from the surrounding mosques, dressing up in traditional dress, and taking a short shikara ride.

It was an abrupt stop to the fast life we are used to, staring back at nature and adoring its beauty. We talked to each other as a family and were not immersed in our gadgets! However, Tvisha was excited to show off her first day in Kashmir to her friends in a WhatsApp call!

Naaz Kashmir served us well – candlelight dinners, food per our needs, and recommendations about shikara rides. So we took their advice and decided to take the 4:30 AM, four-hour shikara ride from Nageen Lake to Dal Lake.

Staying in a houseboat is like sharing a house with other equally clueless and excited families.


Day 2 – Srinagar (Shikara Ride)

We set the alarm for 4:00 AM to prepare for our 4:30 AM shikara ride. We woke up to the alarm and the morning prayers at the lake. It reminded me of my pre-college days, when 4:00 AM had become a routine for studies. The caretakers were already up; they knocked on our doors to ensure we were ready and sent us off on the shikara with some hot Kashmiri tea (kahwa) and snacks.

It was bitterly cold for us and pleasant for the locals. We tucked ourselves into the blanket available in the shikara. Our little braveheart sandwiched herself conveniently between her parents, refused to step out of the blanket throughout the ride, enjoyed the frequent warm rubs, and did not hesitate to nap. The shikara moved slowly, confidently, and thoughtfully through the lake.

The cold air on the face and the calming sound of the shikara moving through the lake is an unforgettable experience. Most of the locals and birds were up at 4:30 AM!! As we rowed through the lake, we could see the homes of the locals – men and women at work. Everyone who met our eyes returned a welcoming smile.

We spotted a water snake, various colored water lilies, and birds like eagles, geese, mallards, pochards, gadwalls, pintails, waders, coots, and the common teal. The shikara rider helped us identify the birds.

We could see water vapor forming fog in places where the water and the surrounding air differed in temperature.

The shikara rider explained in detail the unique farming done by the locals on the lake: growing vegetables in soil that floats on flora beds. These farmers grow carrots, radishes, turnips, and other vegetables this way.

Such floating gardens are maintained all over the lake, and the farmers move through their plots on boats, carrying their harvest to the lake’s market area. The rider took us through the floating gardens and the lotus farms to the floating vegetable market on Dal Lake. We bought some flowers and seeds and had some hot kahwa (Kashmiri tea) and snacks packed for us by Naaz Kashmir at this market.

The rider then took us to Dal Lake (pronounced दल, not दाल), and at this time of day, it was empty except for the locals fishing out water weeds to compost for their farms. It was a beautiful sunrise to watch at Dal Lake, wade through the markets (Meena Bazaar), and spot traditional homes. The ride back was slow, and the gentle swaying motion of the shikara put us to sleep for a few minutes.


Day 2 – Sonmarg (सोनमर्ग and not सोनमार्ग)

At around 9:30 AM, after breakfast, we started driving to Sonmarg, a hill station in the Ganderbal district.

It was about a two-hour drive. The roads were not very good, and there were traffic jams in a few places – visitor traffic. If we had left about an hour earlier, we might have saved half an hour of jam time. However, after the early-morning shikara ride, we needed some time to fuel and freshen ourselves before our next adventure.

As we got closer to Sonmarg, we could see the snow-capped mountains, feel the drop in temperature, and breathe the superior-quality air. Sonmarg was much cooler than Srinagar but fortunately pleasant (even for our visitor skin).

During the drive, we could see streams of water – the tributaries of Jhelum – Lidder, Sind, and Neelum. The sight of white water forcing itself down the mountains was strangely peaceful. It reminded us of our stay in Salt Lake City, Utah.

The driver told us that during the winter months, heavy snowfall blocks NH-1, so a tunnel is being built to keep the road open year-round. We found trekkers ready to trek from Sonmarg to Leh (“the rooftop of the world”) and groups wanting to drive through the Zoji La pass. Our driver informed us that driving through the Zoji La pass is a must-do off-road adventure. However, he recommended we not take our city car and use a four-wheel drive instead, as the roads are narrow and rocky. It seems a drive worth doing! So, this one is also added to the backlog!

Our goal for the day, however, was relatively simple: trek or take ponies to the Thajiwas Glacier, a favorite summer destination in Sonmarg.

My daughter is always excited and happy around animals. We were not sure whether we (“The Adults”) needed ponies; however, the agents there convinced us that the pony ride is a “must-do” in Sonmarg. We gave in; in hindsight, however, it would have been a simple-to-moderate walk and less of a burden on the animals. The ponies were expensive, about 2,750 INR per person (we negotiated away more than 50% of the original demand; better negotiators got it for around 1,500 INR per person). We also hired a photographer so we could be more hands-free. The pony ride went uphill, downhill, and through cold water streams.

The scenery was picture perfect, a silvery scene set against green meadows and a clear blue sky. The views were captivating, and we outsourced the picture capture activity to our photographer and instead enjoyed the views. The air was too fresh to keep our masks on.

At the glacier, Tvisha finally made her Olaf! She was throwing snowballs at us in all directions, even while we were haggling with the locals to reduce the sledding costs! She scrambled up and down the snow, soaking in the happiness, unlike us. Then, after an uphill trek in the snow, we rested on a rock and came down sledding. The weather was not too cold, and if it were not for the time to return to Srinagar and the limited food options, we would have spent more time up at the glacier.

We encountered the local police (to the surprise of the locals), who helped us reduce the cost of sledding from 3,500 INR to 500 INR (though we ended up paying 1,000 INR per person). It’s funny that the locals keep saying (क्या आप खुश है, हमे भी खुश कीजिये) “Are you happy? Make us happy!”, a method of getting more money from visitors. These people earn only during the summer (visitor) months and try to make the most of it. COVID and conflict closures have been hard on them. We found them cheerful, happy, and helpful, so we did not hesitate to tip more than we thought was reasonable!

On our way back, we stopped at a roadside restaurant for some delicious chole-puri and dal makhani for Tvisha. We reached Srinagar around 7:00 PM for another candlelight dinner at Naaz Kashmir. The owners of Naaz Kashmir had moved our luggage out of the room, thinking we had booked for only one night; they then realized there had been a communication issue between MakeMyTrip and their reservation team. They made up for it with a superior room, a chocolate cake for Tvisha, and several apologies.

We took a warm bath and crashed soon after!

We were woken up by Tvisha at midnight as she started vomiting and was feeling very sick! She had no fever but did not look well. We were not sure whether it was the altitude, the change in food/water, or a stomach bug. Thankfully, we had carried medicines – Enterogermina and Calpol – in our first aid kit!


Day 3 – Pahalgam

We started around 9:45 AM after thanking the caretakers at Naaz Kashmir for their service and completing the check-out formalities. Tvisha was sick and managed to eat only mangoes for breakfast. The drive to Pahalgam was long but on more convenient and motorable roads. We wanted to stop for kahwa and visit apple farms, but Tvisha would only sleep in the car. So, we drove straight to our hotel, the Radisson Golf Resort.

The Pahalgam scenery was unique, with white-water streams against a backdrop of pine trees and snow-capped mountains. The temperature was pleasant, leaning towards cold.

All we could do on Day 3 was check in and rest. Tvisha slept all afternoon and night. She was running a slight fever, so we consulted our family doctor, who diagnosed stomach flu. We requested the hotel to get us medicines (the medicine shop was about 2 km away), and they helped us.

The trip managers advised us to visit Aru Valley, Betaab Valley, Chandanwari, and Baisaran Valley (Mini-Switzerland). However, we had lost half a day and had to check out by 1:00 PM the next day (Day 4). So we decided that if Tvisha felt better on Day 4, we would do Baisaran Valley.

Pahalgam (“First Village”) gets its name from the Hindi पहला गांव, and Shiva devotees frequent this place in summer on their pilgrimage to the Amarnath Cave for the darshan of the only ice-stalagmite Shiva Linga. The pilgrimage usually starts from Chandanwari, but the route was closed for road work during our travel.


Day 4 – Pahalgam

Tvisha woke up ready for her next adventure; however, I felt queasy. It was my turn to fall sick. Dolo came to my rescue, and after an insignificant breakfast, we all got on horses to visit Baisaran Valley. The trek is captivating, and it’s better to walk than ride; however, a just-recovered Tvisha and a queasy daddy were in no state to trek. So, horses again!

Baisaran Valley is a hilltop green meadow dotted with dense pine forests and surrounded by snowcapped mountains. This famous offbeat tourist place is excellent for those wanting to spend a quiet time in the company of nature. It also serves as a campsite for trekkers going to Tulian Lake. Some of the famous tourist points you can see en route to Baisaran are Pahalgam Old Village, Kashmir Valley Point, and Deon Valley Point. You can also enjoy panoramic sights of Pahalgam town & Lidder Valley from here.

We returned to our hotel after about four hours (~1:00 PM) of riding to Baisaran and back. We completed our checkout and had delicious tandoori rainbow trout (a local delicacy) for lunch. The driver told us that the butter or olive-oil fry is better. After a sumptuous meal, we departed by car for Gulmarg.


Day 4 – Pahalgam to Gulmarg (गुलमर्ग)

It took about 3.5 hours to reach Gulmarg from Pahalgam. Gulmarg is a ski destination famous for winter sports; in summer, it is called the meadow of flowers. Our driver informed us that they have to use chains on the wheels in winter, as the snowfall in Gulmarg is heavy. The beauty of Gulmarg is different (in a good sense) from Pahalgam and Sonmarg. The first thing that strikes you is the lush green meadows (in summer).

We camped at the Hilltop Hotel. At first glance, the hotel seems jaded, faded, and under maintenance. While that is true, the rooms were well designed, and the in-room dining service was good. The service at this hotel was better than any of our previous experiences in Kashmir, even though it was only okay on room cleaning, breakfast variety, bathroom accessories, changing towels, and responding to hails. We were satisfied that we could have a warm bath, eat something edible, and reach the gondola ride on time the following day. The hotel is close to the gondola station and the ice-skating rink.

Dolo carried me only this far, and I had slight chills and crashed for the night. I hoped, wished, and prayed that the fever would help kill the virus/bacteria (stomach bug) for me to enjoy Gulmarg the next day.


Day 5 – Gulmarg – Gondola Ride to Kongdoori

The chills were gone in the morning; however, I was still queasy. So, I trusted my best friend (Dolo) and braved the gondola.

MakeMyTrip was able to arrange gondola tickets for the first stage (Base Station to Kongdoori). The tour guide told us that the second stage tickets (Kongdoori to Apharwat) were unavailable (sold out). However, in hindsight, we did not regret not being able to do the second stage.

The gondola wait lines and wait times are infamous. People start queuing at 7:00 AM, even though the ride opens at 9:30 AM. We reached the queue at around 8:00 AM and found our tour guide. The people standing in the line entertained themselves by fighting with others who tried to move ahead. Words and punches were flying until the ride opened at 9:30 AM. The locals were also amused at the sight. The trip was about 10 minutes, and the wait time to board the ride was 2 hours. Dolo kept me on my feet.

The gondola ride is short, and the views are terrific. The valley is picturesque, and we could spot the snowcapped Himalayas, Apharwat, and mud houses from the ride. We could also see the unlucky visitors (those who could not get a gondola ticket) trekking or using horses to climb up to Kongdoori. That trek is a moderate to challenging hike.

At Kongdoori, people had to queue again to ride to Apharwat, and the queue was equally long. We were happy that we did not have to wait in another queue.

Gondola Ride to Kongdoori (Friend: Dolo)

At Kongdoori, the horse owners were bugging us to take a horse ride to the waterfall. However, we discussed it with our tour guide and decided to trek. Trekking (and not horse riding) was the best decision. We could stop to look at the multi-colored flowers in the meadows, jump over streams, trek with the goats, stop to hear the sound of silence, spot lizards, take photos, and experience rocky trails.

When we reached the waterfall, we had to trek in snow to get up to the mouth of the waterfalls. The snow trek was challenging.

However, the mountain water was delicious and pure. We drank from the waterfall, and it tasted better than any mineral or filtered water. So we filled a bottle to quench our thirst on the return journey.

If I travel again to destinations like these, I will remind myself to buy shoes with some grip.

Finally, we sledded downhill and relished some delicious Maggi cooked in the mountain water. Strangely, the Maggi gave us the strength to trek back to the gondola station.

We took a different route to see the valley and meadows from a different perspective. This route was shorter and required us to climb uphill and roll downhill.

Again, the view was picture perfect!

After reaching the base station, we rushed to feed ourselves a late lunch. The lunch was good. We did not have any energy left to do the ATV rides and decided to skip them and relax in the room. Then, in the evening, we went down to the cafeteria to have some snacks. Our legs could not tolerate any more walking and would only walk back in the direction of the hotel room. So, we snuggled back into the room and watched the evening walkers from the comfort of the room.

Days are long in Gulmarg, and it’s bright even at 7:00 P.M.

I recovered from the stomach bug (Thanks! Enterogermina), and now it was time for my wife to fall sick to the same bug! She had a better immune reaction to the bug than my daughter or me; however, she sought help from Dolo and Enterogermina to fight off the bug.

Day 6 – Back to Srinagar

Gulmarg is about 50 km from Srinagar, so the return journey was short. We woke up late and lazy and left for Srinagar after a late breakfast.

We stopped to see apple farms and drink delicious green apple juice. We tasted various homemade pickles and bought lotus stem pickles from the farmer. Lotus stem (कमल-ककड़ी), locally known as “Nadru” is grown in shallow parts of water bodies like ponds and lakes and is a vastly enjoyed ingredient in Kashmiri cuisine.

We also stopped to have some premium Kahwa and buy some dry fruits (Walnuts) and condiments (Kesar).

We checked into the Radisson Srinagar, the first hotel in Kashmir where we found women employees. My wife had a heart-to-heart talk about women’s empowerment with the ladies there!

After a good lunch, we headed to see the oldest temple in Kashmir, the Shankaracharya Temple, dedicated to Lord Shiva. The temple is a monument of national importance and is protected by the ASI (Archaeological Survey of India). There are many steps to climb, and the view of the Kashmir valley from the hilltop is superb. It was very windy and pleasant up the hill. Unfortunately, cameras were not allowed, so no picture remembrance. After blessings from Lord Shiva, we decided to stroll the gardens of Srinagar.

Unfortunately, due to the holiday rush and this day being a Sunday (many locals were out sightseeing), we saw only the Botanical Garden and Chashme Shahi. We missed the tulips, as the Tulip Garden had closed a few days earlier (peak season: April). The Botanical Garden was a nice walk, and the flowers in Chashme Shahi were exquisite. My daughter enjoyed taking photos of several flowers.

We had earlier planned to ride the shikara again at Dal Lake; however, looking at the rush and the weather, we darted back to the hotel. Finally, we finished the day with a lavish buffet dinner.

Day 7 – Back to Home @ Bangalore

The only eventful activity was the security checks at Srinagar Airport. We had to step out of our car about a kilometer before the airport and get ourselves, the car, and the bags checked.

We left Srinagar entirely mesmerized by the beauty of Kashmir, and I decided to pen this down in a blog so that the memory never fades. This is my first travel blog.

After a few more doses of Enterogermina and home food, my wife got better. We have returned to our workaholic ways and keep discussing our Kashmir trip with friends and family.

Data Descriptors (Stats, Relations, Patterns)

Data analysts look for descriptors in data to generate insights.

For a Data aggregator, descriptive attributes of data like size, speed, heterogeneity, lineage, provenance, and usefulness are essential to decide the storage infrastructure scale, data life cycle, and data quality. These aggregator-oriented descriptions are black-box perspectives.

For a Data analyst, descriptive statistics, patterns, and relationships are essential to generate actionable insights. These analyst-oriented descriptions are white-box perspectives. The analysts then use inferential methods to test various hypotheses.

Descriptive Statistics

Data analysts usually work with a significant sample of homogeneous records to statistically analyze features. The typical descriptive statistics are measures of location, measures of center, measures of skewness, and measures of spread.

E.g., three state cricket teams, each with 23 players, have players of the following ages:

Karnataka: [19,19,20,20,20,21,21,21,21,22,22,22,22,22,23,23,23,23,24,24,24,25,25]

Kerala: [19,19,20,20,20,21,21,21,22,22,22,22,23,23,23,23,23,24,24,24,24,24,24]

Maharashtra: [19,19,19,19,19,19,20,20,20,20,20,21,21,21,21,22,22,22,23,23,24,24,25]

Numbers represented this way do not help us detect patterns or explain the data. So, it’s typical to see a tabular distribution view:

AGE | Karnataka | Kerala | Maharashtra
19  |     2     |   2    |      6
20  |     3     |   3    |      5
21  |     4     |   3    |      4
22  |     5     |   4    |      3
23  |     4     |   5    |      2
24  |     3     |   6    |      2
25  |     2     |   0    |      1
Age Distribution of State Players

This distribution view is better. So, we would like to see measures of center for this data. These are usually – MEAN, MEDIAN, and MODE.

  • MEAN is the average (sum of all values / number of values)
  • MEDIAN is the middle value when sorted
  • MODE is the most frequent value
Measure | Karnataka | Kerala | Maharashtra
MEAN    |    22     |  22.1  |     21
MEDIAN  |    22     |  22    |     21
MODE    |    22     |  24    |     19
Measures of Center
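The measures of center in the table can be reproduced with Python’s built-in statistics module. A minimal sketch; the team lists are reconstructed from the age-distribution table above:

```python
from statistics import mean, median, mode

# Ages per the distribution table (23 players per state)
karnataka = [19]*2 + [20]*3 + [21]*4 + [22]*5 + [23]*4 + [24]*3 + [25]*2
kerala = [19]*2 + [20]*3 + [21]*3 + [22]*4 + [23]*5 + [24]*6
maharashtra = [19]*6 + [20]*5 + [21]*4 + [22]*3 + [23]*2 + [24]*2 + [25]

for name, ages in [("Karnataka", karnataka), ("Kerala", kerala),
                   ("Maharashtra", maharashtra)]:
    # Print mean (rounded to one decimal), median, and mode for each state
    print(name, round(mean(ages), 1), median(ages), mode(ages))
```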

This description is much better. So, we would like to see this graphically to understand the skewness.

Measuring skewness

The age distribution is symmetrical for Karnataka, skewed to the left for Kerala, and skewed to the right for Maharashtra. The data analyst may infer that Karnataka prefers a good mix of ages, Kerala prefers player experience, and Maharashtra prefers the young.

The data analyst may also be interested in the standard deviation, a measure of spread. The sample standard deviation (symbol s; σ for a population) is, roughly, the typical distance of the values from the mean. Since a distance can be positive or negative, each distance is squared, the squared distances are averaged, and the result is square-rooted.

Measure            | Karnataka | Kerala | Maharashtra
Standard Deviation |    1.8    |  1.7   |     1.8
Measure of Spread
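The recipe above (square the distances from the mean, average them, square-root the result) translates into a few lines of Python. `sample_stdev` is a hypothetical helper written for illustration; it mirrors the library’s `statistics.stdev` (which averages over n − 1 for a sample):

```python
from math import sqrt
from statistics import stdev

def sample_stdev(values):
    # Squared distances from the mean, averaged over n - 1, then square-rooted
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

karnataka = [19]*2 + [20]*3 + [21]*4 + [22]*5 + [23]*4 + [24]*3 + [25]*2
print(round(sample_stdev(karnataka), 1))  # matches the table value of 1.8
```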

In our example, a measure of location (quartiles, percentiles) is also of interest to the data analyst.

Percentile     | Karnataka | Kerala | Maharashtra
25 Percentile  |    21     |  21    |    19.5
50 Percentile  |    22     |  22    |    21
75 Percentile  |    23     |  23.5  |    22
100 Percentile |    25     |  24    |    25
Measure of Location

The table above shows that the 50th percentile value is the median, and the 100th percentile is the maximum value. This location measure is especially helpful when the values are scores (as in an exam).
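As a sketch, Python’s `statistics.quantiles` reproduces the quartile values; the `'inclusive'` interpolation method matches the figures in the location table:

```python
from statistics import quantiles

# Maharashtra ages per the distribution table (23 players)
maharashtra = [19]*6 + [20]*5 + [21]*4 + [22]*3 + [23]*2 + [24]*2 + [25]

# n=4 splits the data into quartiles: 25th, 50th, and 75th percentiles
q1, q2, q3 = quantiles(maharashtra, n=4, method='inclusive')
print(q1, q2, q3)  # 19.5 21.0 22.0, matching the table
```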

Combining statistics and display to explain the data is the art of descriptive statistics. There are several statistics beyond the ones described in this short blog post that could be useful for data analysis.

Time-series Data Patterns

Time-series data has trends, variations, and noise.

  1. A trend is the general direction (up, down, flat) in data over time.
  2. Cyclicity variation is the cyclic peaks and troughs in data over time.
  3. Seasonality variation is the periodic predictability of a peak/trough in data.
  4. Noise is meaningless information in data.

The diagrams below provide a visual explanation:

“Ice cream sales are trending upward,” claims an excited ice-cream salesman.

“Used Car value is trending downward,” warns the car salesman

“Every business has up and down cycles, but my business is trending upwards,” states a businessman.

“It’s the end of the month – Salary and EMI season in user accounts – so the transaction volume will be high,” claims the banker.

“There is some white noise in the data,” declared the data scientist.
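These components can be sketched in code: we synthesize a series as trend + seasonality + noise (all the numbers here are assumptions chosen for illustration) and then recover the trend with a least-squares slope, showing that seasonality and noise average out:

```python
import math
import random

random.seed(42)
months = range(48)
# Hypothetical monthly sales: upward trend (2.5/month) + yearly
# seasonality (amplitude 15) + white noise (std dev 5)
series = [100 + 2.5*t + 15*math.sin(2*math.pi*t/12) + random.gauss(0, 5)
          for t in months]

# Estimate the trend as the least-squares slope of sales against time
n = len(series)
mt, ms = (n - 1) / 2, sum(series) / n
slope = (sum((t - mt) * (s - ms) for t, s in zip(months, series))
         / sum((t - mt) ** 2 for t in months))
print(f"estimated trend: {slope:.2f} units/month")  # near the true 2.5
```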

Data Relationships

Data analysts seek to understand relationships between different features in a data set using statistical regression analysis.

There could be a causal (cause and effect) relationship or simply a correlation. This relationship analysis helps to build predictors.

A simple measure of a linear relationship is the correlation coefficient (the measure is not relevant for non-linear relationships). The correlation coefficient of two variables x and y is calculated as:

correlation(x, y) = covariance(x, y) / (std-dev(x) * std-dev(y))

It’s a number in the range [-1, 1]. A value close to zero implies no linear correlation, and a value close to either extremity implies a strong linear correlation.

  • Negative one (-1) means negatively linearly correlated
  • Positive one (1) means positively linearly correlated
  • Zero (0) means no correlation
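The formula above translates directly into code. This is a minimal sketch using hypothetical, perfectly linear demo data (not the sample table that follows), chosen so the coefficients land exactly at the extremes:

```python
from math import sqrt

def correlation(x, y):
    # correlation(x, y) = covariance(x, y) / (std-dev(x) * std-dev(y))
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

x = list(range(1, 11))
y_pos = [3*a + 2 for a in x]   # perfectly linear, increasing
y_neg = [-3*a + 2 for a in x]  # perfectly linear, decreasing
print(round(correlation(x, y_pos), 2))  # 1.0
print(round(correlation(x, y_neg), 2))  # -1.0
```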

Example: Let’s take this random sample.

X  | Y   | Y1   | Y2  | Y3
1  | 3   | -3   | 83  | -100
2  | 8   | -8   | 108 | 250
3  | 15  | -15  | 146 | -50
4  | 24  | -24  | 98  | 150
5  | 35  | -35  | 231 | -50
6  | 48  | -48  | 220 | 155
7  | 63  | -63  | 170 | -125
8  | 80  | -80  | 100 | -150
9  | 99  | -99  | 228 | -12
10 | 120 | -120 | 234 | 190
Sample Data
Pair        | X and Y | X and Y1 | X and Y2 | X and Y3
Coefficient |    1    |    -1    |   0.6    |    0
Correlation coefficient

Visually, we can see that as X increases, Y increases linearly and Y1 decreases linearly; hence, their correlation coefficients are positive (1) and negative (-1), respectively. There is no linear relation between X and Y3; hence, the correlation is 0. The relationship between X and Y2 is somewhere in between, with a positive correlation coefficient.

Scatter plot X against (Y, Y1, Y2, Y3)

If X is the number of hours a bowler practices and Y2 is the number of wickets taken, then the correlation between the two can be considered positive.

If X is the number of hours a bowler practices and Y3 is the audience popularity score, then the correlation between the two can be considered negligible.

If X is the number of years a leader leads a nation, and Y or Y1 is their popularity index, then the relationship between the two can be considered linearly increasing or decreasing, respectively.

Summary

Data analysts analyze data to generate insights. Insights could be about understanding the past or using the past to predict the (near) future. Using statistics and visualization, the data analysts describe the data and find relationships and patterns. These are then used to tell the story or take actions informed by data.

V’s of Data

Volume, Velocity, Variety, Veracity, Value, Variability, Visibility, Visualization, Volatility, Viability

What are the 3C’s of Leadership? “Competence, Commitment, and Character,” said the wise.

What are the 3C’s of Thinking? “Critical, Creative, and Collaborative,” said the wise.

What are the 3C’s of Marketing? “Customer, Competitors, and Company,” said the wise.

What are the 3C’s of Managing Team Performance? “Cultivate, Calibrate, and Celebrate,” said the wise.

What are the 3C’s of Data? “Consistency, Correctness, and Completeness,” said the wise; “Clean, Current, and Compliant,” said the more intelligent; “Clear, Complete, and Connected,” said the smartest.

“Depends,” said the Architect. Technologists describe data properties in the context of use. Gartner coined the 3V’s – Volume, Velocity, and Variety – to create hype around BIG Data. These V’s have grown in volume 🙂

  • 5V’s: Volume, Velocity, Variety, Veracity, and Value
  • 7 V’s: Volume, Velocity, Variety, Veracity, Value, Visualization, and Visibility

This ‘V’ model seems like blind men describing an elephant. A humble engineer uses better words to describe data properties.

Volume: Multi-Dimensional, Size

“Volume” is typically understood in three dimensions. Data is multi-dimensional and stored as bytes; a disk volume stores data of all sizes. Data does not have volume! It has dimensions and size.

A person’s record may include age, weight, height, eye color, and other dimensions. The size of the record may be 24 bytes. When a BILLION person records are stored, the size is 24 BILLION bytes.
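As a sketch of “size, not volume,” Python’s struct module can compute the byte size of a fixed-width person record. The field layout below is an assumption made for illustration, not a standard:

```python
import struct

# Hypothetical fixed-width record: age (int32), weight (float64),
# height (float64), eye color (4-byte code) -> 24 bytes per record
RECORD = struct.Struct('=idd4s')
print(RECORD.size)  # 24 bytes per record

record = RECORD.pack(42, 81.5, 1.75, b'BRWN')
# A BILLION such records occupy 24 billion bytes
print(len(record) * 1_000_000_000)
```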

Velocity: Speed, Motion

Engineers understand the term velocity as a vector and not a scalar.

A heart rate monitor may generate data at different speeds, e.g., 82 beats per minute. I can’t say my heart rate is 82 beats per minute to the northwest; heart rate is a speed, not a velocity. I can say that a car is traveling at 35 kilometers per hour to the northwest; the velocity of the vehicle is 35 KMPH NW.

Data does not have direction; hence it does not have velocity. Data in motion has speed.

Variety: Heterogeneity

The word variety is used to describe differences in an object type, e.g., egg curry varieties, pancake varieties, sofa varieties, tv varieties, image data format varieties (jpg, jpeg, bmp), and data structure varieties (structured, unstructured, semi-structured). Data variety is abstract and is a marketecture term.

Heterogeneity is preferred because it explicitly states that:

  1. Data has types (E.g., String, Integer, Float, Boolean)
  2. Composite types are created by composing other data types (E.g., A Person Type)
  3. Composite types could be structured, unstructured, or semi-structured (E.g., A Person Type is semi-structured as the person’s address is a String type)
  4. Collections contain the same or different data types.
  5. Types, Composition, and Collections apply to all data (BIG or not).
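The five points above can be sketched in Python; the `Person` type and its fields here are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Union

# A composite Person type built by composing primitive types
@dataclass
class Person:
    name: str        # String type
    age: int         # Integer type
    weight: float    # Float type
    active: bool     # Boolean type
    address: str     # free-form text leaves the record semi-structured

# A collection may hold the same type...
team: List[Person] = [Person("Asha", 23, 61.5, True, "12 Lake Rd, Srinagar")]
# ...or different types (heterogeneous)
mixed: List[Union[int, str, Person]] = [42, "kahwa", team[0]]
print(len(team), len(mixed))
```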

Veracity: Lineage, Provenance

Veracity means accuracy, precision, and truthfulness.

Let’s say that a weighing scale reports the weight of a person as 81.5 KG. Is this accurate? Is the weighing scale calibrated? If the same person measures her weight on another weighing scale, the reported weight might be 81.45 KG. The truth may be 81.455 KG.

Data represents facts, and when new facts are available, the truth may change. Data cannot be truthful; it is just facts. Meaning or truthfulness is derived using a method.

Lineage and provenance metadata about data enable engineers to decorate the fact with other useful facts:
1. Primary Source of Data
2. Users or Systems that contributed to Data
3. Date and Time of Data collection
4. Data creation method
5. Data collection method
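A sketch of how such metadata might decorate a fact; all the field names and values here are assumptions, not a standard schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# A fact (a measured value) decorated with lineage/provenance metadata
@dataclass
class Measurement:
    value: float
    source: str              # 1. primary source of the data
    contributed_by: str      # 2. user or system that contributed it
    collected_at: datetime   # 3. date and time of collection
    creation_method: str     # 4. how the data was created
    collection_method: str   # 5. how the data was collected

weight = Measurement(
    81.5, "bathroom-scale-A", "nurse-station-3",
    datetime(2022, 6, 1, 9, 30, tzinfo=timezone.utc),
    "digital scale reading", "manual entry")
print(weight.value, weight.source)
```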

Value: Useful

If data is a bunch of facts, how can it be valuable? Understandably, the information generated by analyzing the facts is valuable. Data (facts) can either be useful for creating valuable information or useless and discarded. We associate a cost with a brick and a value with a house. Data is like the bricks used to build valuable information/knowledge.

Summary

I did not go into every V, but you get the drift. If an interviewer asks you about the 5V’s, I request you to give the standard marketecture answer for their sanity. The engineer’s vocabulary is not universal; technical journals publish articles in the sales/marketing vocabulary. As engineers/architects, we have to remember the fundamental descriptive properties of data so that the marketecture vocabulary does not fool us. However, we have to internalize the marketecture vocabulary while staying internally consistent with engineering principles.

It’s not a surprise that Gartner invented the hype cycle.

Data Aggregation (Map, Filter, Reduce)

Data engineers think in batches!

Thinking in batches reminds me of a famous childhood story.

Once upon a time, a long, long time ago, there was a kind and gentle King. He ruled people beyond the horizon, and his subjects loved him.

One day, a tired-looking village man came to the King and said, “Dear King, help us. I am from a village beyond the horizon. It’s been raining for several days. My village chief asked me to fetch help from you before disaster strikes. It took me five days to walk to the Kingdom, and I am tired but glad that I could deliver this message to you.”

“I am glad that you came for help. I will send Suppandi, my loyal Chief of Defence, to assess the damage and then send help,” said the King. “Suppandi, you have your orders. Now, go. Assess the damage, report to me, and help,” ordered the King.

Suppandi left for the village beyond the horizon on his fastest horse. When he reached it, the village was flooded, and Suppandi felt the urge to return quickly to inform the King about the floods. So, he rode his horse faster and reached the Kingdom in 1/2 day. He went to the King and said, “Dear King, the village is flooded. I went in a day and came back in 1/2 day to give you this information.”

Suppandi was pleased with himself. However, the King wanted more information. “Suppandi, please tell me whether people in the village have food, are children hurt? What can we do more to help?”

“I will find out, Dear King,” said Suppandi. He left again on his fastest horse. This time he reached in 1/2 day. He found that the people had no food and that many children were hurt and homeless. He raced back to the Kingdom. “Dear King, I reached in 1/2 day and came back in another 1/2. The villagers don’t have food to eat, and they are hungry. Several children are hurt and need medical attention,” said Suppandi.

This time the King had more questions. “Dear Suppandi, what did the village chief say? What can we do for him?”

“Dear King, I will find out. Let me leave for the village immediately,” said Suppandi.

Chanakya was eagerly listening in on the conversation. He told Suppandi, “Dear Suppandi, you must be tired. Let me take over. Take some rest.”

Immediately, Chanakya ordered his men to collect food, water, clothes, medicines, and doctors. He asked for the fastest horses, and along with several men and doctors, he left for the village beyond the horizon. When he reached the village, it was flooded, and people were on their home terraces. He found several houses destroyed, hungry kids taking shelter under the trees, and many wounded villagers.

He ordered his men to rescue the villagers while skirting the flood, protect all the children, feed them, and take them to a safe place. He also asked the doctors to attend to the wounded.

The men built a temporary home outside the village to give shelter to the homeless. They waited for a few days for the rain and flood to subside. When it was bright and sunny, Chanakya, his men, and the villagers cleaned the village, re-built the homes, and deposited enough food and grains for six months before saying goodbye.

Chanakya reached the Kingdom and immediately reported to the King. The King was anxious. He said, “Chanakya, you were gone for two weeks with no message from you. I was worried. Did you speak to the village Chief?”

“Dear King, Yes, on your behalf, I spoke to the village chief. I found that the village was flooded, so we rescued all the villagers, attended to the wounded, fed them, re-built their homes, and left food and grains for six months. The people have lost their belongings in flood, but all of them are safe, and they have sent their wishes and blessings for your timely help,” said Chanakya.

The King was pleased. “Chanakya, I should have sent you earlier. You are a batch thinker! Thank you,” said the King.

Suppandi was disappointed. He had worked hard to ride to the village and report to the King as instructed, but Chanakya got all the praise. To this day, he still does not understand why, and is hurt.

Most non-data engineers are like Suppandi; they use programming constructs like “for,” “if,” “while,” and “do” on remote data. Most data engineers are like Chanakya; they use programming constructs like “map,” “filter,” “reduce,” and “forEach.” Programming with data is typically functional/declarative, while traditional programming is imperative.

There is nothing wrong with acting like Suppandi; he is the Chief of Defence. But, some cases require Chanakya thinking. In architectural language, Suppandi actions move data to algorithms, and Chanakya actions move algorithms to data. The latter works better when there is a distance and cost-to-travel between data and algorithms.

This difference in thinking is why data engineers use SQL, and traditional engineers use C#/Java. SQL uses declarative commands that are sent to the database to pipeline a set of actions on data. The conventional programming languages have caught up to the declarative programming paradigm by supporting lambda functions (arrow functions), and map/filter/reduce style functions on data collections. The map/filter/reduce style functions allow compilers/interpreters to leverage the underlying parallel compute backbone (the expensive eight-core CPU) or use a set of inexpensive machines for parallel computing. They are abstracting away parallelism from the programmer. The programmer helps the compiler/interpreter to identify speed-improvement opportunities by explicitly programming declaratively.
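To make the contrast concrete, here is an illustrative JavaScript sketch that computes squares in both styles (the data is invented for the example):

```javascript
const numbers = [1, 2, -1, -2];

// Imperative (Suppandi style): the programmer spells out each step.
const squaresImperative = [];
for (let i = 0; i < numbers.length; i++) {
  squaresImperative.push(numbers[i] ** 2);
}

// Declarative (Chanakya style): describe the transformation and let the
// runtime decide how (and potentially where) to execute it.
const squaresDeclarative = numbers.map(x => x ** 2);

console.log(squaresImperative);  // [1, 4, 1, 4]
console.log(squaresDeclarative); // [1, 4, 1, 4]
```

Both produce the same result, but only the declarative form hands the runtime a whole-collection operation that it is free to parallelize.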

Mapping

Instead of iterating over a collection one element at a time, map is a function that applies another function to all elements of a collection. The map function may split the collection into parts to distribute to different cores/machines. The underlying collection remains immutable. In general, mapping could be one-2-one, one-2-many, or many-2-one; it is the process of applying a relation (function) to map an element in the domain to an element in the range. In the case of computing, mapping does not change the size of the collection.

E.g., [1,2,-1,-2] => [1,4,1,4] using the squared relation is a many-2-one mapping

var numbers = [1, 2, -1, -2];
var x = numbers.map(x => x ** 2);
console.log(x);
[1,4,1,4]

E.g., [1,2,-1,-2] => [2,3,0,-1] using the plus one relation is a one-2-one mapping

var numbers = [1, 2, -1, -2];
var x = numbers.map(x => x + 1);
console.log(x);
[2, 3, 0, -1]

E.g., [1,2,-1,-2] using the plus one and squared relation is a one-2-many mapping

var numbers = [1, 2, -1, -2];
var x = numbers.map(x => [x + 1, x ** 2]);
console.log(x);
[[2, 1], [3, 4], [0, 1], [-1, 4]]

E.g., An SQL Example of a one-2-one mapping

SELECT Upper(ContactName)
FROM Customers

Upper(ContactName)
MARIA ANDERS
ANA TRUJILLO
ANTONIO MORENO
THOMAS HARDY

Filtering

Instead of iterating over a collection one element at a time, filter is a function that returns the subset of elements matching given criteria. The filter function may split the collection into parts to distribute to different cores/machines. The underlying collection remains immutable. Examples:

var numbers = [1, 2, -1, -2];
var x = numbers.filter(x => x > 0);
console.log(x);
[1, 2]
SELECT *
FROM Customers
WHERE Country = 'USA'

Number of Records: 13 (first four shown)

CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
32 | Great Lakes Food Market | Howard Snyder | 2732 Baker Blvd. | Eugene | 97403 | USA
36 | Hungry Coyote Import Store | Yoshi Latimer | City Center Plaza 516 Main St. | Elgin | 97827 | USA
43 | Lazy K Kountry Store | John Steel | 12 Orchestra Terrace | Walla Walla | 99362 | USA
45 | Let’s Stop N Shop | Jaime Yorres | 87 Polk St. Suite 5 | San Francisco | 94117 | USA

Reduce

Instead of iterating over a collection one element at a time, reduce is a function that combines the elements of a collection into a single value. The reduce function may split the collection into parts to distribute to different cores/machines. The underlying collection remains immutable. Examples:

var numbers = [1, 2, -1, -2];
var x = numbers.reduce((sum,x) => sum + x, 0);
console.log(x);
0
SELECT count(*)
FROM Customers

Number of Records: 1

count(*)
91

Pipelining

When multiple actions need to be performed on the data, it’s the norm to pipeline the actions. Examples:

var numbers = [1, 2, -1, -2];
var x = numbers
  .map(x => x + 1) //[2,3,0,-1]
  .filter(x => x > 0) //[2,3]
  .map(x => x ** 2) //[4,9]
  .reduce((sum, x) => sum + x, 0); //13
console.log(x);
13
SELECT Country, Upper(Country), count(*)
FROM Customers
WHERE Country LIKE 'A%'
GROUP BY Country

Number of Records: 2

Country | Upper(Country) | count(*)
Argentina | ARGENTINA | 3
Austria | AUSTRIA | 2

Takeaway

Data Engineers use Chanakya thinking to get work done in batches. Even streaming data is processed in mini-batches (windows). Actions on data are pipelined and expressed declaratively. The underlying compiler/interpreter abstracts away parallel computing (single device, multiple devices) from the programmer.

Think in Batches for Data.

Data Quality (Dirty vs. Clean)

Data quality lies on a grayscale, and data quality engineers can continually improve it. Continual quality improvement is the process of pursuing data quality excellence.

Dirty data may refer to several things: Redundant, Incomplete, Inaccurate, Inconsistent, Missing Lineage, Non-analyzable, and Insecure.

  • Redundant: A Person’s address data may be redundant across data sources. So, the collection of data from these multiple data sources will result in duplicates.
  • Incomplete: A Person’s address record may not have Pin Code (Zip Code) information. There could also be cases where the data may be structurally complete but semantically incomplete.
  • Inaccurate: A Person’s address record may have the wrong city and state combination (E.g., [City: Mumbai, State: Karnataka], [City: Salt Lake City, State: California])
  • Inconsistent: A Person’s middle name in one record is different from the middle name in another record. Inconsistency happens due to redundancy.
  • Missing Lineage (and Provenance): A Person’s address record may not reflect the current address as the user may not have updated it. It’s an issue of freshness.
  • Non-analyzable: A Person’s email record may be encrypted.
  • Insecure: A Person’s bank account number is available but not accessible due to privacy regulations.

The opposite of Dirty is Clean. Cleansing data is the art of correcting data after it is collected. Commonly used techniques are enrichment, de-duplication, validation, meta-information capture, and imputation.

  1. Enrichment is a mitigation technique for incomplete data. A data engineer enriches a person’s address record by adding country information by mapping the (city, state) tuple to a country.
  2. De-Duplication is a mitigation technique for redundant data. The data system identifies and drops duplicates using data identities. Inconsistencies caused by redundancies require use-case-specific mitigations.
  3. Validation is a mitigation technique that applies domain rules to verify correctness. An email address can be verified for syntactic correctness using a regular expression such as \A[\w!#$%&'*+/=?{|}~^-]+(?:\.[\w!#$%&'*+/=?{|}~^-]+)*@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}\Z. Data may be accepted or rejected based on validations.
  4. Lineage and Provenance capture is a mitigation technique for data where source or freshness is critical. An image grouping application will require meta-data about an image series (video) collected like phone type and captured date.
  5. Imputation is a mitigation technique for incomplete data (data with information gaps due to poor collection techniques). A heart-rate time series may be dirty, with missing data at minutes 1 and 12. Using data with holes may lead to failures, so imputation may use the previous or next value to fill the gap.
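As a small illustration of the imputation technique above, here is a JavaScript sketch that forward-fills gaps in a heart-rate series; the sample data and the null-for-missing convention are assumptions for the example:

```javascript
// Hypothetical heart-rate series sampled per minute; null marks missing readings.
const heartRate = [72, null, 75, 78, null, null, 80];

// Forward-fill imputation: replace a gap with the previous known value,
// falling back to the first known value for any leading gap.
function forwardFill(series) {
  const result = [];
  let last = null;
  for (const v of series) {
    if (v !== null) last = v;
    result.push(last !== null ? last : v);
  }
  const firstKnown = series.find(v => v !== null);
  return result.map(v => (v === null ? firstKnown : v));
}

console.log(forwardFill(heartRate)); // [72, 72, 75, 78, 78, 78, 80]
```

Forward fill is only one imputation choice; interpolation or model-based filling may suit other data.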

These are cleansing techniques to reduce data dirtiness after data is collected. However, data dirtiness originates at creation time, collection time, and correction time. So, a data cleansing process may not always result in non-dirty data.
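For instance, the validation technique from the list above can be sketched in JavaScript; the record shape and the simplified regular expression are illustrative assumptions, not a full RFC-grade email validator:

```javascript
// Simplified syntactic email check (illustrative, not RFC 5322 complete).
const EMAIL_RE = /^[\w!#$%&'*+/=?{|}~^-]+(?:\.[\w!#$%&'*+/=?{|}~^-]+)*@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,6}$/;

// Accept or reject a record based on the domain rule.
function validateEmail(record) {
  return EMAIL_RE.test(record.email)
    ? { ...record, status: "accepted" }
    : { ...record, status: "rejected" };
}

console.log(validateEmail({ email: "jane.doe@example.com" }).status); // accepted
console.log(validateEmail({ email: "not-an-email" }).status);         // rejected
```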

A great way to start with data quality is to describe the attributes of good quality data and related measures. Once we have a description of good quality data, incrementally/iteratively use techniques like CAPA (corrective action, preventive action) with a continual quality improvement process. Once we are confident about data quality given current measures, the data engineer can introduce new KPIs or set new targets for existing ones.

Example: A research study requires collecting stroke imaging data. A description of quality attributes would be:

Data Lineage & Provenance
– Countries: {India}
– Imaging Types: {CT}
– Source: {Stroke Centers, Emergency}
– Method – Patient Position: supine
– Method – Scan extent: C2 to vertex
– Method – Scan direction: caudocranial
– Method – Respiration: suspended
– Method – Acquisition type: volumetric
– Method – Contrast: {Non-contrast CT, PCT with contrast}

Redundancy
Multiple scans of the same patient are acceptable but need to be separated by one week.

Completeness
Each imaging scan should be accompanied by a radiology report that describes these features of the stroke:
– Time from onset: {early hyperacute (0-6H), late hyperacute (6-24H), acute (1-7D), sub-acute (1-3W), chronic (3W+)}
– CBV (Cerebral Blood Volume) in ml/100g of brain tissue
– CBF (Cerebral Blood Flow) in ml/min/100g of brain tissue
– Type of Stroke: {Hemorrhagic-Intracerebral, Hemorrhagic-Subarachnoid, Ischemic-Embolic, Ischemic-Thrombotic}

Accuracy
Three reads of the image by separate radiologists to circumvent human errors and bias. Anonymized patient history is sent to the radiologist.

Security and Privacy
Patient PII is not leaked to the radiologist interpreting the result or the researcher analyzing the data.

Data Quality Attributes

As you can see from the table of attributes for CT Stroke imaging data, the quality description is data-specific and use-specific.

Data engineers compute attribute-specific metrics using data attribute descriptions on a data sample to measure overall data quality. These attribute descriptions are the north star for pursuing excellence in data quality.

Summary: Creation, collection, and correction improve over time when measured against criteria. There will always be data quality blind spots and leakages. Hence, data engineers report data quality on a grayscale with multiple attribute-specific metrics.

Streaming vs. Messaging

“We already have pub/sub messaging infrastructure in our platform. Why are you asking for a streaming infrastructure? Use our pub/sub messaging infrastructure” – Platform Product Manager

Streaming and Messaging Systems are different. The use-cases are different.

Both streaming and messaging systems use the pub-sub pattern, with producers posting messages and consumers subscribing. The subscribed consumers may choose to poll or get notified. Consumers in streaming systems generally poll the brokers, while in messaging systems the brokers push messages to consumers. Engineers use streaming systems to build data processing pipelines and messaging systems to develop reactive services. Both systems support the standard delivery semantics (at least once, exactly once, at most once). Brokers in streaming systems are dumber than those in messaging systems, which build routing and filtering intelligence into the brokers. Streaming systems are faster than messaging systems precisely because they lack routing and filtering intelligence 🙂

Let’s look at the top three critical differences in detail:

#1: Data Structures

In streaming, the data structure is a stream, and in messaging, the data structure is a queue.

A “Queue” is a FIFO (First In, First Out) data structure. Once a consumer consumes an element, it is removed from the queue, reducing the queue size. A consumer cannot fetch the “third” element from the queue. Queues don’t support random access. E.g., a queue of people waiting to board a bus.

A “Stream” is a data structure that is partitioned for distributed computing. If a consumer reads an element from a stream, the stream size does not reduce. The consumer can continue to read from the last read offset within a stream. Streams support random access; the consumer may choose to seek any reading offset. The brokers managing streams keep the state of each consumer’s reading offset (like a bookmark while reading a book) and allow consumers to read from the beginning, the last read offset, a specific offset, or the latest. E.g., a video stream of movies where each consumer resumes at a different offset.
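A minimal JavaScript sketch of the bookmark idea, assuming a toy in-memory broker (not any specific product’s API):

```javascript
// Toy stream broker that tracks each consumer's read offset (bookmark).
class Stream {
  constructor() { this.records = []; this.offsets = new Map(); }
  publish(record) { this.records.push(record); }
  read(consumerId, max = 10) {
    const from = this.offsets.get(consumerId) ?? 0;   // resume at bookmark
    const batch = this.records.slice(from, from + max);
    this.offsets.set(consumerId, from + batch.length); // advance bookmark
    return batch; // the stream itself does not shrink
  }
  seek(consumerId, offset) { this.offsets.set(consumerId, offset); }
}

const s = new Stream();
["a", "b", "c"].forEach(r => s.publish(r));
console.log(s.read("c1", 2)); // ['a', 'b']  (c1's bookmark is now 2)
console.log(s.read("c2"));    // ['a', 'b', 'c']  (c2 reads from the beginning)
console.log(s.read("c1"));    // ['c']
```

Each consumer keeps its own bookmark, so two consumers can read the same stream at different offsets without removing anything.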

In streaming systems, consumers refer to streams as Topics. Multiple consumers can subscribe to a topic simultaneously. In messaging systems, the administrator configures the queues to deliver messages to one consumer or to many consumers. The latter pattern is called a Topic and is used for notifications. A Topic in a streaming system is always a stream; in a messaging system, it is always a queue.

Both stream and queue data structures order the elements in a sequence, and the elements are immutable. These elements may or may not be homogenous.

Queues can grow and shrink with publishers publishing and consumers consuming, respectively. Streams can grow with publishers publishing messages and do not shrink with consumers consuming. However, streams can be compacted by eliminating duplicates (on keys).
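A minimal JavaScript sketch of key-based compaction, assuming records of the form {key, value, offset} (an illustrative shape):

```javascript
// Log compaction: keep only the latest record per key.
function compact(stream) {
  const latest = new Map();
  for (const record of stream) {
    latest.set(record.key, record); // later records overwrite earlier ones
  }
  return [...latest.values()];
}

const stream = [
  { key: "user-1", value: "A", offset: 0 },
  { key: "user-2", value: "B", offset: 1 },
  { key: "user-1", value: "C", offset: 2 }, // supersedes offset 0
];
console.log(compact(stream));
// [ { key: 'user-1', value: 'C', offset: 2 }, { key: 'user-2', value: 'B', offset: 1 } ]
```

Real brokers compact segments in the background; this sketch only shows the keep-latest-per-key rule.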

#2: Distributed (Cluster) Computing Topology

Since a single consumer consumes an element in a queue in a load-balancing pattern, the fetch must be from the central (master) node. The consumers may be in multiple nodes for distributed computing. The administrator configures the master broker node to store and forward data to other broker nodes for resiliency; however, it’s a single master active-passive distributed computing paradigm.

In the notification (topic) pattern, multiple consumers on a queue can consume filtered content to process data in parallel. The administrator configures the master node to store and forward data to other broker nodes that serve consumers. The publishers publish to a single master/leader node, but consumers can consume from multiple nodes. This pattern is the CQRS (Command Query Responsibility Segregation) pattern of distributing computing.

The streaming pattern is similar to the notification pattern w.r.t. distributed computing. Unlike messaging, partition keys break streams into shards/partitions, and the lead broker replicates these partitions to other brokers in the cluster. The leader election process selects a broker as a leader/master for a given shard/partition, and shard/partition replications serve multiple consumers in the CQRS pattern. The consumers read streams from the last offset, random offset, beginning, or latest.

If the leader fails, either a passive slave can take over, or the cluster elects a new leader from existing slaves.

#3: Routing and Content Filtering

In messaging systems, the brokers implement the concept of exchanges, where the broker can route the messages to different endpoints based on rules. The consumers can also filter content delivered to them at the broker level.

In streaming systems, the brokers do not implement routing or content filtering. Consumers may still filter content, but utility libraries on the consumer side do so after the broker delivers the content.
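A tiny sketch of that consumer-side filtering, with an assumed record shape:

```javascript
// The broker delivers every record on the topic; filtering happens in the
// consumer process, after delivery (record shape is illustrative).
const delivered = [
  { topic: "gps", carId: "KA-01", lat: 12.97, lon: 77.59 },
  { topic: "gps", carId: "KA-02", lat: 12.93, lon: 77.61 },
];

const myCar = delivered.filter(r => r.carId === "KA-01");
console.log(myCar.length); // 1
```

Contrast this with a messaging broker, where a binding or filter rule would keep the second record from ever reaching this consumer.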

Tabular Differences View

Category | Streaming | Messaging
Support Publish and Subscribe Paradigm | Yes | Yes
Polling vs. Notification | Polling by Consumers | Notification by Brokers to consumers
Use Case | Data Processing Pipelines | Reactive (micro)services
Delivery Semantics Supported | at-most-once, at-least-once, exactly-once | at-most-once, at-least-once, exactly-once
Intelligent Broker | No | Yes
Data Structure | Stream | Queue
Patterns | CQRS | Content-Based Routing/Filtering, Worker (LB) Distribution, Notification, CQRS
Data Immutability | Yes | Yes
Data Retention | Yes. Not deleted after delivery. | No. Deleted after delivery.
Data Compaction | Yes. Key de-duplication. | N/A
Data Homogeneity | Heterogeneous by default. Supports schema checks on data outside the broker. | Heterogeneous by default.
Speed | Faster than Messaging | Slower than Streaming
Distributed Computing Topology | Broker cluster with a single master per stream partition; consumers consume from multiple brokers with data replicated across brokers | Broker cluster with a single master per topic/queue. Active-passive broker configuration for the load-balancing pattern. Data replicated across brokers for multiple-consumer distribution.
State/Memory | Brokers remember each consumer’s bookmark (state) in the stream | Consumers always consume from time-of-subscription (latest only)
Hub-and-Spoke Architecture | Yes | Yes
Vendors/Services (Examples) | Kafka, Azure Event Hub, AWS Kinesis | RabbitMQ, Azure Event Grid, AWS SQS/SNS
Domain Model | A stream of GPS positions of a moving car | A queue to buy train tickets

Table of Differences between Streaming/Messaging Systems

Visual Differences View

Summary

Use the right tool for the job. Use messaging systems for event-driven services and streaming systems for distributed data processing.

Data Batching, Streaming and Processing

The IT industry likes to treat data like water. There are clouds, lakes, dams, tanks, streams, enrichments, and filters.

Data Engineers combine Data Streaming and Processing into a term/concept called Stream Processing. If data in the stream are also Events, it is called Event Stream Processing. If data/events in streams are combined to detect patterns, it is called Complex Event Processing. In general, the term Events refers to all data in the stream (i.e., raw data, processed data, periodic data, and non-periodic data).

The examples below help illustrate these concepts:

Water Example:

Let’s say we have a stream of water flowing through our kitchen tap. This process is called water streaming.

We cannot use this water for cooking without first boiling the water to kill bacteria/viruses in the water. So, boiling the water is water processing.

If the user boils the water in a kettle (in small batches), the processing is called Batch Processing. In this case, the water is not instantly usable (drinkable) from the tap.

If an RO (Reverse Osmosis) filtration system is connected to the plumbing line before the water streams out from the tap, it’s water stream processing with filter processors. The water stream output from the filter processors is a new filtered water stream.

A mineral-content quality processor generates a simple quality-control event on the RO filtered water stream (EVENT_LOW_MAGNESIUM_CONTENT). This process is called Event Stream Processing. The mineral-content quality processor is a parallel processor. It tests several samples in a time window from the RO filtered water stream before generating the quality control event. The re-mineralization processor will react to the mineral quality event to Enrich the water. This reactive process is called Event-Driven Architecture. The re-mineralization will generate a new enriched water stream with proper levels of magnesium to prevent hypomagnesemia.

Suppose the water infection-quality control processor detects E-coli bacteria (EVENT_ECOLI), and the water mineral-quality control processor detects low magnesium content (EVENT_LOW_MAGNESIUM_CONTENT). In that case, a water risk processor will generate a complex event combining simple events to publish that the water is unsuitable for drinking (EVENT_UNDRINKABLE_WATER). The tap can decide to shut the water valve reacting to the water event.

Water Streaming and Processing generating complex events

Data Example:

Let’s say we have a stream of images flowing out from our car’s front camera (sensor). This stream is image data streaming.

We cannot use this data for analysis without identifying objects (person, car, signs, roads) in the image data. So, recognizing these objects in image data is image data processing.

If a user analyses these images offline (in small batches), the processing is called Batch Processing. In the case of eventual batch processing, the image data is not instantly usable. Any events generated from such retrospective batch processing come too late to react to.

If an image object detection processor connects to the image stream, it is called image data stream processing. This process creates new image streams with enriched image meta-data.

If a road-quality processor generates a simple quality control event when it detects snow (EVENT_SNOW_ON_ROAD), then we have Event Stream Processing. The road-quality processor is a parallel processor. It tests several image samples in a time window from the image data stream before generating the quality control event.

Suppose the ABS (Anti-lock Braking Sub-System) listens to this quality control event and turns on the ABS. In that case, we have an Event-Driven Architecture reacting to Events processed during the Event Stream Processing.

Suppose the road-quality processor generates snow on the road event (EVENT_SNOW_ON_ROAD), and a speed-data stream generates vehicle speed data every 5 seconds. In that case, an accident risk processor in the car may detect a complex quality control event to flag the possibility of accidents (EVENT_ACCIDENT_RISK). The vehicle’s risk processor performs complex event processing on event streams from the road-quality processor and data streams from the speed stream. i.e., by combining (joining) simple events and data in time windows to detect complex patterns.

Data Streaming and Processing generating complex actionable events
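The accident-risk scenario above can be sketched in JavaScript. The event names follow the text, while the window size, speed threshold, and sample shape are illustrative assumptions:

```javascript
// Complex event processing sketch: join a road-quality event with recent
// speed samples in a time window to flag accident risk.
const WINDOW_MS = 30_000;    // assumed look-back window for speed samples
const RISK_SPEED_KMPH = 60;  // assumed risky speed on snow

const speedSamples = []; // stream of {t, kmph} samples

function onSpeedSample(sample) {
  speedSamples.push(sample);
  // Evict samples that fell out of the window.
  while (speedSamples.length && sample.t - speedSamples[0].t > WINDOW_MS) {
    speedSamples.shift();
  }
}

function onRoadQualityEvent(event, now) {
  if (event !== "EVENT_SNOW_ON_ROAD") return null;
  // Join the simple event with speed data inside the time window.
  const recent = speedSamples.filter(s => now - s.t <= WINDOW_MS);
  const risky = recent.some(s => s.kmph > RISK_SPEED_KMPH);
  return risky ? "EVENT_ACCIDENT_RISK" : null;
}

// Speed data arrives every 5 seconds...
onSpeedSample({ t: 0, kmph: 80 });
onSpeedSample({ t: 5000, kmph: 75 });
// ...then the road-quality processor reports snow.
console.log(onRoadQualityEvent("EVENT_SNOW_ON_ROAD", 10_000)); // EVENT_ACCIDENT_RISK
```

The join on a time window is what makes this "complex" event processing rather than a simple per-event rule.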

Takeaway Thoughts

As you can see from the examples above, streaming and processing (Stream Processing) is more desirable than batching and processing (Batch Processing) because of its capability to generate actionable real-time events.

Data engineers define the data-flow “topology” of data pipelines using a declarative language (DSL). Since there are no cycles in the data flow, the pipeline topology is a DAG (Directed Acyclic Graph). The DAG representation helps data engineers visually comprehend the processors (filter, enrich) connected in the stream. With a DAG, the operations team can also effectively monitor the entire data flow and troubleshoot each pipeline.

Computing leverages parallel processing at all levels. Even with small data, at the hardware level, clusters of ALUs (Arithmetic Logic Units) process data streams in parallel for speed. These SIMD/MIMD (Single/Multiple Instruction Multiple Data) architectures are the basis for cluster computing that combines multiple machines to execute work using map-reduce with distributed data sets. The BIG data tools (e.g., Kafka, Spark) have effectively abstracted cluster computing behind common industry languages like SQL, programmatic abstractions (stream, table, map, filter, aggregate, reduce), and declarative definitions like DAGs.

We will gradually explore big data infrastructure tools and data processing techniques in future blog posts.

Data stream processing is processing data in motion. Processing data in motion helps generate real-time actionable events.

Career

People have different, balanced expectations from themselves at work: money, technology, networking, challenging projects, innovative projects, travel, title, equity, work-life balance, job satisfaction, exposure, health, happiness, social responsibility, job type, startups, strategy, public image, leadership, and many more. At different points in a career, these parameters are stacked differently, and rightly so.

In this blog post, I am sharing good career advice that I received for my career and growth.

Strive for excellence, use your strengths

The pursuit of excellence is like the pursuit of happiness. The bar keeps going up; it’s not the destination but the journey that is meaningful. I had the privilege of working with leaders who asked me to focus on my strengths – demonstrated and non-demonstrated (i.e., potential). Acknowledge and improve the areas where you are weak, but not at the cost of your strengths.

Never let anybody dent your confidence. Always create value for your customers and business.

Seek out your dream job/role

It’s about deliberately reaching out for discomfort. Doing a job that you are good at is great for the employer but not necessarily good for you.

You may be good in engineering but seek discomfort in project management. You may be good at project management but seek discomfort in product management. You may be good at product management but seek discomfort in engineering.

Lateral is the way to grow. Lateral is the way up.

Great leaders encourage you to apply to jobs/roles that give you another milestone to cherish in your life. Mediocre managers leverage your strengths only for the current job/role.

Posted job descriptions are never a good representation of the role demands. Always “talk” to the hiring leader.

Specialize or Diversify? Do it well

If you desire to seek specialization – go after it. Specialization in any subject requires you to spend significant time to acquire the competency/skill and practice. Don’t spread yourself too thin; pick an area and go deep.

If you desire to seek diversification – go after it. Diversification will require that you build networks and teams; you rely on others (in your network or group) but remain accountable. Connect, listen (not hear), and act.

If you desire specialization and diversification – go after it. Some people have successfully navigated both.

Some started with diversification and then specialized. Others began with specialization and then diversified. There is no career recipe; make your recipe.

Performance, Image, Exposure

Jobs/Roles demand performance, and growth requires exposure to new people and projects. Volunteer to work on initiatives that give you more exposure (work/life). Volunteering to do more when you have a demanding job/role requires you to stretch, work smarter (manage time), and delegate.

Stress: Circle of Control, Influence, & Concern

Stress is good for growth.

It’s widely believed that as you go up the corporate ladder, your circle of control gets bigger. In reality, your circle of influence grows while your circle of control relatively shrinks.

The circles of concern & influence are the primary drivers of “stress” in jobs/roles. It’s also critical to understand that your circle of control could be the primary driver of “stress” for your peers and teams.

If you understand your abilities to control or influence the outcome, you don’t need to eat stress for breakfast.

Treat People Like People

Treat others like you would like them to treat you.

Don’t treat people like “resources.” People are not like a CPU with a fixed capacity.

People have infinite capacity, and capacity increases when they are motivated. If they find inspiration, then you get unbounded capacity.

People get burnt out; they need time to relax, and so do you.

Health, Family, Work

If you lose health, you cannot take care of your family or do an excellent job at work. If you lose work, then you cannot support your family. Family is always there for you – in grief and happiness. You can’t afford to lose unconditional family love. So, the priority order has to be – health, family, and work.

Exercise for 45 minutes every day, even if that is a simple morning brisk walk. This “me” time helps you recharge.

Don’t treat your work colleagues like your family. Treat them like your team. Don’t treat your family like your team.

Final Thoughts

Choose to ignore advice that does not make sense, and consider it your common sense to use the advice that makes sense. Please don’t make people your role models; choose to cherish their actions or ideas that made them role models.

Triage Prioritization

In the last blog post, we talked about balanced prioritization. It’s excellent for big-picture-type things like product roadmaps, new technology introduction, and architecture debt management. The balanced prioritization technique helps build north stars (guiding stars) and focus; however, something else is needed when the rubber hits the road. That something is triage prioritization, where people have to “urgently” prioritize to optimize the whole.

Agile pundits like to draw similarities, and this table is some food for thought:

Types | Other Names A | Other Names B | Other Names C
Balanced Prioritization | Deliberate Prioritization | BIG ‘P’ Prioritization | “Strategic” Prioritization
Triage Prioritization | Just-in-time (JIT) Prioritization | Small ‘p’ Prioritization | “Tactical” Prioritization

Synonyms (Kind-of Similar)

The table below lists some desired properties of the two:

| Expected Property | Balanced Prioritization | Triage Prioritization |
| --- | --- | --- |
| Values | Multiple viewpoints (customer, leadership, architect, team, market). Consensus-driven. | Newly available learnings/insights; collaboration (disagree-and-commit) over consensus. |
| Perspective | Strategic, long-term | Tactical, short-term |
| Driven as | Process-first, driven by people | People-first process, driven by people |
| Time taken | High. Analysis-biased. It’s an involved process to get feedback, consolidate, review, discuss, and reach consensus. | Short. Action-biased. It’s okay to get it wrong but wrong to waste time on analysis: values failure over analysis. |
| Assumptions | Largely independent of constraints and uncertainties. Weighted-risk-assessment approach. | Thrives in uncertainty. A mindset of maximizing returns: “effort should not get wasted” and “optimize the whole.” |
| Tools | RISE Score, Prioritization Matrix, Eisenhower Matrix | Communication (say, listen), and a Kanban board to track WIP of priority items |

Properties of prioritization
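The “values multiple viewpoints” row can be made concrete with a small sketch. The weights, item names, and 1–5 scores below are all invented for illustration; the point is only that balanced prioritization folds several perspectives into one agreed ranking:

```python
# Hypothetical sketch of balanced prioritization: each backlog item is
# scored 1-5 from several viewpoints, and the viewpoints carry weights
# that the group has agreed on (consensus-driven).
VIEWPOINT_WEIGHTS = {"customer": 0.35, "leadership": 0.25,
                     "architect": 0.20, "team": 0.10, "market": 0.10}

def balanced_score(scores: dict) -> float:
    """Weighted average of per-viewpoint scores (1-5)."""
    return sum(VIEWPOINT_WEIGHTS[v] * s for v, s in scores.items())

backlog = {
    "new checkout flow": {"customer": 5, "leadership": 4, "architect": 2,
                          "team": 3, "market": 5},
    "pay down architecture debt": {"customer": 2, "leadership": 3,
                                   "architect": 5, "team": 4, "market": 1},
}
ranked = sorted(backlog, key=lambda item: balanced_score(backlog[item]),
                reverse=True)
print(ranked)  # highest weighted score first
```

The slow part in practice is not this arithmetic but agreeing on the weights and scores, which is exactly why the table calls it analysis-biased.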

Okay, so the story 🙂 This story is a personal one:

I have spent most of my career digitalizing healthcare. The most memorable part of that journey was digitalizing emergency care. Emergency care is a hospital department where patients can walk in without an appointment and are assured they will not be turned away. So the department gets patients who cannot afford care and patients who need urgent attention. The emergency department is capacity-constrained and has to serve a variety of workloads. Most days, it’s just a single emergency at a time; then there are days when there is a public health emergency or people are wheeled in from a significant accident nearby. The department cannot be staffed to handle the worst crisis, but it is reasonably staffed to handle the typical workload. Then the worst happens – a significant spike in workload – say, patients coming in from a train accident. Some are alive, some are barely alive, and some are dead on arrival. You cannot use a consensus-driven approach to sort these people into an urgent-important matrix or derive a RISE score for each patient. Emergency departments use a scoring system – the Emergency Severity Index (ESI) – where one person, the triage nurse, assigns the score, and the team collaborates in the absence of consensus.
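As a sketch (this is not real ESI logic; the bed labels and levels are invented), the key property is that a single person assigns each score and everyone then works off the same ordering, with no consensus round:

```python
# Illustrative triage ordering: one triage nurse assigns an ESI-style
# level (1 = most urgent, 5 = least urgent), and the team simply works
# the queue in that order.
def triage(patients: list[tuple[str, int]]) -> list[str]:
    """Sort (label, esi_level) pairs by urgency; lower level comes first."""
    return [label for label, level in sorted(patients, key=lambda p: p[1])]

# Hypothetical arrivals from the accident; labels, not names ("bed-2").
arrivals = [("bed-2", 3), ("bed-5", 1), ("bed-1", 4), ("bed-3", 2)]
print(triage(arrivals))  # ['bed-5', 'bed-3', 'bed-2', 'bed-1']
```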

I have seen triage nurses shut down their computers and move to the whiteboard. Computers are great for organizing, but organizing through them is slow. Emergencies need something faster: whiteboards and instant communication. The nurses don’t need to know patients’ names (“patient on bed-2” is a good name). If someone needs help, they shout, and whoever can help, helps.

Commonly heard statements in an emergency care room during a high-workload emergency:

“That patient is not likely to survive. The damage is beyond repair. Move on, and stop wasting time on this patient.”

“We have only one ventilator. Use this on that younger fellow and not the elderly. We can save more life-hours.”

“That patient is screaming in pain, but she will survive even if we don’t attend to her for the next 45 minutes – she has a fracture. Focus on this patient instead; our effort here and now can save a leg. We will get back to the screaming one. Somebody shut her up.”

An automated, computerized workflow can’t respond well to such emergencies. Triage prioritization is a people-first process driven by people. People can increase their capacity in emergencies – both individually and as a team. People are not like CPUs; they can work at 150% capacity. Capacity is measured not by the elapsed time of effort but by the outcomes achieved per unit of elapsed time.

At the end of the day, most people will be saved, and there will be an unfortunate loss. But triage prioritization values maximizing life-hours (optimizing the whole), not wasting effort, pivoting quickly when effort does not yield results (new insights drive reprioritization), and an action bias. The responsibility and authority to prioritize stay with the ER care team, behind those closed doors; the best tools are collaboration and communication (shout, scream, listen, act). When the emergency is behind them, they retrospect to improve their behavior in the next emergency and request infrastructure that could help in the future (e.g., that second ventilator could have helped).
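The “maximize life-hours” heuristic in the ventilator statement above can be sketched as a greedy choice under a capacity constraint. All numbers and labels here are invented for illustration only; real clinical decisions are nothing this simple:

```python
# Illustrative only: with a scarce resource (one ventilator), the triage
# heuristic in the story picks the candidates with the highest expected
# life-hours gained, not first-come-first-served.
def allocate(candidates: dict, units: int) -> list[str]:
    """Pick `units` candidates with the highest expected life-hours gain."""
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return ranked[:units]

# Hypothetical expected life-hours gained if ventilated.
expected_gain = {"younger patient": 400_000, "elderly patient": 90_000}
print(allocate(expected_gain, units=1))  # ['younger patient']
```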

Shifting back to software development:

There are parallels in software teams – a tiger team fixing product issues, a war room during go-live, a support team keeping applications healthy 24×7 through Black Friday, and many more.

While the examples above are like ER teams, triage prioritization also happens in scrum teams executing a well-defined plan. This triage prioritization is hidden from the work-tracking boards. When engineers pair-program to get something done, many things can go wrong (blind spots) that must be prioritized and fixed. The team cannot fix everything within the sprint and carries over some debt. Big, bad debt gets onto the boards (some debt remains in the code as TODOs) and is prioritized with other big-ticket items using balanced prioritization.

Summary: Outcome quality is directly proportional to the quality of triage prioritization – a people-first process, driven by people, that works best with delegated responsibility and authority.

Some of my friends have argued that the story of “mom prioritizing her child’s homework” (last post) is triage prioritization, not balanced prioritization. They say, “Mom is the ‘triage nurse’ collaborating with the child; the time taken to prioritize is short, and learnings from execution are fed back into the prioritization process.” There is a grayscale between balanced and triage, and my only counterargument is that mom is not working in a capacity-constrained environment full of uncertainties. If she had to choose between a burning car and her child, the choice would be obvious.

Balanced Prioritization

In an (agile) product development team, everybody has a say about the priority of backlog features. However, only the product owner decides (and vetoes). The product owner has to consider customers’ perspectives, market trends, leadership vision, architecture enablers, technical debt, team autonomy, and, most importantly, her own biased opinions. She has to use some process to prioritize and to justify the priorities.

Let’s move from the corporate dimension to the family dimension. Here’s a story that every parent will relate to:

Child: Mom – I have too many things to do. I have got Math, English, Hindi, and Kannada homework to complete. Also, I have to prepare for my music exam before the following Monday. I want to watch the newly released “Frozen” movie – my friends have already watched it. I have a play-date today evening – I can’t skip it; I have promised not to miss it. This is too much to do.

Mom: Hey! I want to watch the “Frozen” movie with you too! Let’s do that after your music exam following Monday? I will book the tickets today.

Child: Ok. Yay!!

Mom: When is your homework due?

Child: English and Kannada are due today. Math and Hindi are due tomorrow.

Mom: Ok, let’s look at the homework. Oh, English and Hindi look simple; you have to fill in the blanks. Kannada is an essay that you have to write about “all homework, and no play makes me sad.” So, you will need to spend some time thinking through that 🙂 Math seems to be several problems to solve; let’s do this piecemeal.

Child: Can I start with Math? I love to solve math problems.

Mom: I know. It’s fun to solve math. Let’s just get the English done first – it’s due today and it’s simple.

Child: Ok.

<10 minutes later>

Child: Done, Mom! Can I do Hindi? That is simple too.

Mom: Hindi is not due today. It’s easy to get that done tomorrow. Let’s start with your Kannada essay and do some math today.

<30 minutes later>

Child: I have been writing this essay for 30 minutes. It’s boring.

Mom: Ok, solve some math problems then.

<30 minutes later>

Mom: Having fun? Time to complete the Kannada essay; you have to submit it today.

Child: Ok – Grrr.

<30 minutes later>

Child: Done! Phew. I have one more hour before my friend comes. Today’s homework is done. I will loiter around now.

Mom: No, you should practice for your music exam. Practice makes perfect. Why don’t you practice for the next hour, and then play? After your play-date, you can do some more math; you like it anyway. However, 8:00 PM is sleep time.

Child: I am tired. Can I loiter around for 15 minutes and then practice music?

Mom: Ok – I will remind you in 15.

<15 minutes later>

Mom: Ready to practice music?

Child: Grr, I was enjoying doing nothing and loitering around.

Mom: Quickly now, finish your music practice before your friend comes. Or I will ask her to come tomorrow.

Child: Mommy! How can you do that! Ok – I will practice music.

<45 minutes later>

Friend: Hello! Let’s play.

<2 hours later>

Mom: Playtime is over, kids. Have your dinner, and then complete some math; 8:00 PM is sleep time. Remember.

Child: Ok. The playtime was too short.

<completes dinner, completes some more math problems, sleeps>

Mom: Hey, good morning. No school today. It’s Saturday. Finish your remaining homework, and you can play all day. You can start with Hindi or Math. Your choice.

Child: I will do Hindi first. It’s simple. Then math.

<20 minutes later>

Child: Done. Now, I can play, loiter, and do anything?

Mom: Yes, let me know when you want to do one more hour of music practice. We will do that together.

Child: Ok, Mom. Love you!

<After a successful music exam on Monday>

Mom: Let’s watch Frozen.

The story above is balanced prioritization. Mom balances work-play, urgent-important, big-small effort, short-long term, like-dislike, carrot-stick, confidence level, reach, and impact. While she considers priorities, she also allows her child to make some decisions, delegating authority.

Balanced prioritization is an exercise in execution control. There is no use for prioritization without a demand for execution control.

When mom comes to the corporate world, her n-dimensional common sense transforms into two-dimensional corporate language: RISE Score, MoSCoW, Eisenhower’s Time Management Matrix, and the prioritization ranking matrix.
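Of these, Eisenhower’s matrix is simple enough to sketch. Assuming each task is just tagged urgent/important (the example tasks in the comments map loosely to the homework story), two booleans pick one of four actions:

```python
# Minimal sketch of Eisenhower's time-management matrix: two booleans
# map a task to one of four quadrants, each with its own action.
def eisenhower(urgent: bool, important: bool) -> str:
    if urgent and important:
        return "do now"      # e.g., English homework due today
    if important:
        return "schedule"    # e.g., music practice before Monday's exam
    if urgent:
        return "delegate"
    return "drop"            # e.g., loitering around

print(eisenhower(urgent=True, important=True))   # do now
print(eisenhower(urgent=False, important=True))  # schedule
```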

Top Prioritization Techniques

Balanced prioritization is part of the backlog grooming process and extends to sprint planning, where the team slices the work items to fit in a sprint boundary.

Balanced prioritization is a continuous process.

In our story, mom did not have to deal with capacity constraints. However, in the corporate world, there are capacity constraints that push for more aggressive continuous real-time prioritization. I will share a (healthcare) story in my next blog.