How to deliver on Data Science?

I was recently invited to give a virtual keynote at Suneratech’s Digital Acceleration Summit (DAS) 2020. Delivering on the promise of data science is more about culture and less about data, algorithms or technology. Here are some of my ideas on what is very often the missing ingredient. Let me know what you think.

DAS 2020: https://youtu.be/G4rwhIUZQ1k

Other awesome talks at https://www.suneratech.com/digital-acceleration-summit-2020/#event-highlights

Herd Immunity for COVID

After six months of COVID-19, we are starting to feel that this is a long, drawn-out affair. Naturally, thoughts turn to how we can adapt to COVID as a way of life. We should do that, and look at ways to keep everyone safe while vaccines are developed and distributed in sufficient volumes to allow for “herd immunity.”

Herd immunity occurs when a sufficient number of people in the population have antibodies (either through the disease itself or vaccines) to stem the reinfection rate. [1] The percentage of the population that needs to be immunized or have COVID antibodies varies based on the R0 number for a disease.

“R0 represents the average number of people infected by one infectious individual. If R0 is larger than 1, the number of infected people will likely increase exponentially, and an epidemic could ensue. If R0 is less than 1, the outbreak is likely to peter out on its own.” [2]

An R0 of 2 means one infectious person typically infects two additional individuals. An R0 of 15 means a contagious person usually infects 15 people. The higher the R0 number, the more aggressively the population needs to be immunized to stem the spread and build up herd immunity. For example, measles has an R0 of 12-18, which requires 92-94% of the population to be immune to stop the spread.[1] That is why it is critical to get babies vaccinated for measles. Similarly, polio has an R0 of 2-15 and can require immunization rates of 50-90%.

The R0 for COVID-19 when it struck Wuhan ranged from 1.4 to 5.7.[2] When COVID-19 hit Florida around March 2020, the R0 was around 3.2. We all responded by taking precautions – social distancing, masks, closing bars, limiting the number of people in restaurants, shifting work and school to be virtual. As a result, the Rt (the reinfection rate now, in September 2020) is hovering around 0.94. [3] The drop in the reinfection rate to 0.94 is due to changes in human behavior rather than a fundamental shift in the virus’ ability to infect people. As Florida loosens the restrictions on bars and restaurants, reinfections will increase.

Let us play out a simple scenario of what it would be like to relax all restrictions and go back to pre-COVID behaviors. R0 for Florida was around 3.2. We can approximate that herd immunity would then need about 65% of the population to have COVID antibodies. [1] Vaccines are still being developed, tested or produced, so currently the only way someone can have antibodies for COVID is by actually catching it.
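The immunity percentages quoted above line up with a common simplification: the herd immunity threshold is roughly 1 - 1/R0. This is a minimal sketch under that assumption (a well-mixed population, immunity that fully blocks transmission); the function name is my own for illustration. For an R0 of 3.2 it gives roughly 69%, in the same ballpark as the 65% used here.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Approximate fraction of the population that must be immune.

    Uses the simple 1 - 1/R0 rule, which assumes everyone mixes evenly
    and that immunity fully blocks transmission.
    """
    if r0 <= 1:
        # With R0 at or below 1 the outbreak fades without herd immunity.
        return 0.0
    return 1.0 - 1.0 / r0


# R0 values quoted in the text above.
for label, r0 in [
    ("measles, low end", 12),
    ("measles, high end", 18),
    ("COVID-19 in Florida, March 2020", 3.2),
]:
    print(f"{label}: R0 = {r0} -> {herd_immunity_threshold(r0):.0%} immune")
```

Real populations do not mix evenly, so treat the output as a rough guide rather than a target.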

Since we have had 695,879 cases of COVID reported so far in Florida, are we close to attaining herd immunity? [4]

No.

A 2019 estimate for Florida’s population puts it around 21.48 million.[5] So 65% of the population means we need about 13.96 million people infected with COVID in Florida. With 695,879 cases so far, we would need roughly 20 times as many people infected with COVID to start establishing herd immunity. In allowing for the virus’s natural spread, we would have three outcomes at a minimum for people. First, some people will not be affected and not get sick. Second, some people will get sick and recover. Finally, some people will get sick and die. (I’m ignoring others who may die because of the side effects of not having hospital beds during the explosion in the number of cases.)

So far, with 695,879 cases, we have had 13,914 deaths in Florida. Now, suppose we assume (very simply) that we would have a similar death ratio as we scale up to 13.96 million people infected. I suspect the death rate would be higher, as there will be dramatic flow-on effects because health care in Florida will struggle to cope with the increase in volume, but let us assume that it would scale linearly for the sake of argument. We would end up with about 279,000 deaths just in the state of Florida. So far, the USA has had 203,479 deaths across the country. If we scale this for the country, we would have about 6.6 million deaths in the USA, which is close to seven times the total world deaths attributed to COVID so far. To put it in perspective, heart disease, which is the biggest killer in the USA, kills 655,381 people per year. [6] We are talking about 10 years of heart disease-related deaths in the USA just with COVID.

If we assume that official figures capture only 1 in 10 cases, we would expect the results to scale by about 2 times, leading to about 28,000 deaths in Florida and a little over 400,000 deaths in the USA. Even on the lower end, we would have about twice the deaths the USA has had so far, and roughly 60% of the total annual deaths attributed to heart disease.

There is a broad range here between roughly 400,000 and 6.6 million deaths. I concede these are very rough calculations, and I’m sure much more precise estimates are possible by controlling for different age groups and other factors. The bottom line, however, is to ask – how many unavoidable deaths are acceptable through COVID?
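For anyone who wants to vary the assumptions, the back-of-the-envelope arithmetic above can be sketched in a few lines. This is only an illustration of the linear scaling described here, using the Florida figures quoted in the text (population, reported cases, deaths, and the 65% threshold). The function and its `reporting_rate` parameter are hypothetical names of my own; `reporting_rate` is the assumed share of true infections that show up in official case counts.

```python
def projected_deaths(cases, deaths, population, threshold, reporting_rate=1.0):
    """Scale deaths linearly up to the herd-immunity infection target.

    cases, deaths  -- reported totals so far
    population     -- size of the population
    threshold      -- fraction that must be infected (e.g. 0.65)
    reporting_rate -- assumed fraction of true infections that are reported
    """
    if not 0 < reporting_rate <= 1:
        raise ValueError("reporting_rate must be in (0, 1]")
    target_infections = threshold * population
    true_infections = cases / reporting_rate
    scale = target_infections / true_infections
    # Very simple assumption: deaths grow in proportion to infections.
    return deaths * scale


# Florida figures quoted in the text (September 2020).
FL_POPULATION = 21_480_000
FL_CASES = 695_879
FL_DEATHS = 13_914

all_reported = projected_deaths(FL_CASES, FL_DEATHS, FL_POPULATION, 0.65)
one_in_ten = projected_deaths(FL_CASES, FL_DEATHS, FL_POPULATION, 0.65,
                              reporting_rate=0.1)
print(f"All infections reported: about {all_reported:,.0f} deaths")
print(f"Only 1 in 10 reported:   about {one_in_ten:,.0f} deaths")
```

Changing the threshold or the reporting rate shows how sensitive the totals are to those two guesses.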

COVID is here to stay for a while. We do have some very knowledgeable, intelligent and hard-working people working on developing vaccines and improving treatments. Even when we have successful vaccine trials, manufacturing, distribution and administration of vaccines will take time. The largest vaccine producer in the world estimated that it would take 4-5 years to vaccinate the world. [7] Until then, we really must rely on human behavior to reduce the spread of COVID and, consequently, the deaths. Face masks and social distancing have been vital in reducing the reinfection rate from 3.2 to below 1. We must continue doing this for a while yet.

References:

[1] https://theconversation.com/amp/what-is-herd-immunity-and-how-many-people-need-to-be-vaccinated-to-protect-a-community-116355

[2] https://www.the-scientist.com/features/why-r0-is-problematic-for-predicting-covid-19-spread-67690

[3] https://rt.live/us/FL

[4] https://www.nytimes.com/interactive/2020/us/florida-coronavirus-cases.html

[5] https://www.census.gov/quickfacts/FL

[6] https://www.cdc.gov/heartdisease/facts.htm#:~:text=About%20655%2C000%20Americans%20die%20from,1%20in%20every%204%20deaths.&text=Heart%20disease%20costs%20the%20United,year%20from%202014%20to%202015.&text=This%20includes%20the%20cost%20of,lost%20productivity%20due%20to%20death

[7] https://www.businessinsider.com/covid-vaccine-wont-reach-everyone-for-four-years-serum-institute-2020-9

Hello TracFone

Well, Mexico City was great fun. It was nice to take a few weeks off, and then take the family to Mexico City. If you haven’t been to Mexico City, I highly recommend it. It is a city at the heart of an ancient civilization, with a rich recent history, friendly people everywhere, and food that is AMAZING!! The hike up the Pyramid of the Sun at Teotihuacán with my son will be a moment I’ll remember for a long time. It was, however, time to answer the call (pun intended).

Last week I joined TracFone Wireless Inc as Head of Data Science. This means two things. Firstly I am really excited about using data science to deliver an amazing customer experience across each of the nine (yes, nine!) TracFone brands. This is going to be fun because the team is passionate and has the drive to move quickly towards building the future of customer experience.

Secondly, it means moving to South Florida – the land of sun, beach, Everglades, and of course alligators. Sad as it is to leave the Pacific Northwest, which has been our home for the last eight years, a new adventure awaits in the sun.

Hollywood Beach, FL

Good-bye Teradata

At the end of March, I bid farewell to my good friends at Teradata. I have had the pleasure of working with some wonderful people, but the time had come to pursue a new adventure. I wish my old team the best of luck and look forward to the time when our paths cross again.

I will announce the next adventure soon, but for the moment it is time to relax, reflect and read.

For those curious, I’m currently reading Antifragile by Nassim Taleb and loving it.

The image is of the Cherry blossoms on the last day of March blooming in the quadrangle of the University of Washington, inspiring the following Haiku –

Cherry blossoms bloom,
with splendor to welcome spring,
witness transient things.

 

Impactful Data Scientists

In 2012, Davenport and Patil’s Harvard Business Review article, Data Scientist: The Sexiest Job of the 21st Century, raised the profile of a profession that had been naturally evolving in the modern computing era – an era where data and computing resources are more abundantly and cheaply available than ever before. There was also a shift towards industry leaders adopting a more open and evidence-based approach to guiding the growth of their business. Brilliant data scientists with machine learning and artificial intelligence expertise are invaluable in supporting this new normal.

While there are different opinions on what defines a data scientist, as the leader of the Data Science Practice at Think Big Analytics, the consulting arm of Teradata, I expect the data scientists on my team to embody specific characteristics. This expectation is founded on a simple question – Are you having a measurable and meaningful impact on the business outcome?

Any data scientist can dig into data, use statistical techniques to find insights and make recommendations for their business partners to consider. A good data scientist makes sure that the business adopts those insights and recommendations by focusing on the problems that are important to the company and making a compelling case grounded in business value. An impactful data scientist can iterate quickly, address a wide variety of business problems for the organization and deliver meaningful business impact swiftly by using automation and getting their insights integrated into production systems. Consequently, impactful data scientists more often answer ‘yes’ to the question above.

So what makes a Data Scientist impactful? In my experience, they possess skillsets that I broadly characterize as that of a scientist, a programmer, and an effective communicator. Let us look at each of these in turn.

Firstly, they are scientists. Data scientists work in highly ambiguous situations and operate on the edge of uncertainty. Not only are they trying to answer the question, they often have to determine what the question is in the first place. They have to ask vital questions to understand the context quickly, identify the root of the problem that is worth solving, research and explore the myriad possible approaches and, most of all, manage the risk and impact of failure. If you are a scientist or have undertaken research projects, you will immediately recognize these as traits of a scientist.

In addition, data scientists are also programmers. Traditional mathematicians, statisticians, and analysts who are comfortable using GUI-driven analytical workbenches that allow them to import data and build models with a few clicks often contest this expectation. They argue that they don’t need computer science skills since they are supported by (a) a team of data engineers to find and cleanse their data, and (b) software engineers to take their models and operationalize them by re-writing them for the production environment. However, what happens when the data engineers are busy, or the IT department’s sprint backlog means that the model a data scientist has just built, which could make the company millions, won’t make it to production for the next 6-9 months? They wait, and their amazing insights have no impact on the business.

Programming and computer science skills are essential for data scientists so that they are not ‘blocked’ by organizational constraints. A data scientist shouldn’t have to wait for someone else to find and wrangle the data they need, nor be afraid of getting their hands dirty with the code to ensure their models make it to production. It also means data scientists do not become a bottleneck for their organization, because they can automate their solutions for production or automatic reports. The highly distributed, large-volume transactions in online, mobile and IoT applications also mean data scientists need to design their solutions for scale. For example, will their real-time personalization model scale to the 100,000 requests per second for their company’s website and mobile app?

Finally, a data scientist should be an effective two-way communicator. Not only should they empathize to understand the business context and customer needs, they should also convey the value of their work in a manner that appeals to their audience. One of the hardest skills for some knowledgeable data scientists to master is the ability to influence organizations without authority. A data scientist who goes around asserting that everyone should listen to them because they have data and insights, without cultivating trust, is likely to earn the title of prima donna and not achieve the impact that they could with those insights. Effective communication is relatable, precise and concise.

Data scientists with these three broad skillsets are in an excellent position to have a meaningful and measurable impact on the business outcomes, making them highly valuable to any organization. Of course, this list doesn’t talk about innate abilities like creativity, bias for action and a sense of ownership. Neither does it consider the organizational culture that may either support or hinder their impact. I have focused on skills that can be developed through training and practice. In fact, these are essential elements of the growth and career paths for my team of brilliant and impactful data scientists at Think Big Analytics.

Hello, Think Big Analytics

A little over a month ago I left my role as the Chief Data Scientist for the Big Data & Analytics Platform Team at Oracle. It was sad to say goodbye to some wonderfully talented people that I had the pleasure of working with, but change is an inevitable part of our lives. After enjoying a month off at my warmer and sunnier home in Sydney spent with family and friends, I feel energized about what is next.

I am humbled and excited about my new role as the Practice Director – Data Science & Analytics, Americas at Think Big Analytics. There are exciting developments in the world of artificial intelligence that make it more important than ever for data scientists to understand the customer’s needs, reflect upon the wider context beyond those needs, and develop solutions that have a meaningful impact for the customer. I am looking forward to getting to know a talented team that is focused on the evolving needs of our customers and on delivering impactful data science consulting services.

AI & ML – Lessons learnt and real-world challenges

Just before I flew back to Seattle, I gave a talk last week at my alma mater – School of Computer Science & Engineering at UNSW, Australia. It was great to see some familiar faces and meet some new ones that I hope feel more compelled to tackle some interesting problems in data science, machine learning (ML) and artificial intelligence (AI).

In this talk, I shared some of the personal lessons that I learnt as part of building AI & ML solutions at companies like Amazon and Oracle. I also opened up about my fears of these technologies, as well as the challenges that the industry faces in delivering intelligent systems for the 99% (?) of businesses. You can find the slides from the talk (PDF) for the references and links that I mentioned. Just send an email to ( avishkar @ gmail dot com) with the subject “AI & ML” to get the password to the PDF.

The most important message that I wanted to impart to the room full of researchers, academics, and industry practitioners was how we can collectively address the shortage of skills needed to develop AI and ML solutions for the broad range of business problems beyond the top 1% of leading-edge tech companies. Education, standards and automated tools can help ensure a certain base level of competency in the application of AI & ML.

The vast majority of the businesses out there are not Google, Amazon or Facebook, with deep pockets and years of R&D experience to tackle the challenge of applying AI and ML. Everyone from schools (i.e. universities) and industry responsible for growing this field must also develop standards and tools that ensure a certain level of quality is maintained for the solutions that we put into production. We have had standards when it comes to mechanical and civil engineering to ensure that things that can impact people’s lives and safety adhere to a certain quality standard. Similarly, we should also develop standards and encourage organizations to validate compliance with those standards when it comes to developing AI & ML solutions with far-reaching consequences.

A simple and very personal example was that one of my own photos was rejected by the automated checks to verify that a passport photo complies with the requirements for visas. The fact that the slightly “browner” version of me (left) failed the check seems to suggest an inherent bias in the system due to the kind of data used to build the system. Funny but scary. How many other “brown” people have had their photos rejected by such a system?

Other examples would be Human Resource systems that identify potential candidates, suggest hire/no-hire decisions or recommend salary packages for new hires. If the system is trained on historical data and uses gender as a feature, is it possible that the system could be biased against women for high-profile or senior positions? After all, women have historically been under-represented in senior positions. Standards and compliance verification tools can help us identify such biases, ensuring that data and models do not introduce biases that are unacceptable in a modern and equitable society.
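To make the idea of a compliance check concrete, here is a minimal sketch of one very simple test: compare how often each group is selected and look at the ratio between the lowest and highest rates. The function names and the sample data are hypothetical and made up for illustration. A common rule of thumb (the "four-fifths" guideline) treats a ratio below 0.8 as a warning sign, but that cut-off is an assumption here, and a real audit would go much further.

```python
from collections import defaultdict


def selection_rates(records):
    """Return {group: share selected} from (group, selected) pairs."""
    totals = defaultdict(int)
    chosen = defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        if selected:
            chosen[group] += 1
    return {group: chosen[group] / totals[group] for group in totals}


def disparate_impact_ratio(records):
    """Lowest group selection rate divided by the highest."""
    rates = selection_rates(records)
    highest = max(rates.values())
    if highest == 0:
        # Nobody was selected, so there is no disparity to measure.
        return 1.0
    return min(rates.values()) / highest


# Made-up example: 10 candidates per group, illustrative only.
sample = (
    [("women", True)] * 2 + [("women", False)] * 8
    + [("men", True)] * 5 + [("men", False)] * 5
)
ratio = disparate_impact_ratio(sample)
print(selection_rates(sample))
print(f"ratio = {ratio:.2f}", "(flag for review)" if ratio < 0.8 else "(ok)")
```

A low ratio does not prove the model is unfair, and a high one does not prove it is fair; it is simply a cheap first signal that the data or the model deserves a closer look.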

Academics, researchers, and industry practitioners cannot absolve themselves of the duty of care and consideration when developing systems that have a broad social impact. Data scientists must think beyond the accuracy metric and consider the whole ecosystem in which the system operates.

Image Credit:

  • Modeling API by H Alberto Gongora from the Noun Project
  • education by Rockicon from the Noun Project
  • tools by Aleksandr Vector from the Noun Project
  • Checklist by Ralf Schmitzer from the Noun Project

Plant Science Initiative @ NC State University

In my role at Oracle, I get to work across many industries on some very interesting problems. One that I have been involved with recently is the collaboration between North Carolina (NC) State University and Oracle with NC State’s Plant Science Initiative.

In particular, we’ve been working with the College of Agriculture and Life Sciences (CALS) to launch a big data project that focuses on sweet potatoes. The goal is to help geneticists, plant scientists, farmers and industry partners in the sweet potato industry to develop better varieties of sweet potatoes, as well as speed up the pace with which research is commercialized. The big question is can we use the power of Big Data, Machine Learning, and Cloud computing to reduce the time it takes to develop and commercialize a new variety of sweet potato crop from 10 years to three or four years?

One of the well-known secrets to driving innovation is scaling and speeding up experimentation cycles. In addition, reducing the friction associated with collaborative research and development can help bring research to market more quickly.

My team is helping the CALS group to develop engagement models that facilitate interdisciplinary collaboration using the Oracle Cloud. Consider geneticists, plant science researchers, farmers, packers, and distributors of sweet potato being able to contribute their data and insights to optimize different aspects of the sweet potato production – sweet potato from the genetic sequence to the dinner plate.

I am extremely excited by the potential impact that open collaboration between various stakeholders can have on the sweet potato and precision agriculture industry.

More details at cals.ncsu.edu

It is a go for Amazon Go!

The super secret exciting project that I spent days and nights slogging over when I was at Amazon has finally been announced – Amazon Go. A checkout-less, cashier-less magical shopping experience in a physical store. Check out the video to get a sense of how it simplifies the shopping experience. Walk in, pick up what you need and walk out. No line, no waiting, no registers.

I’m very proud of an awesome team of scientists & engineers covering software, hardware, electrical and optics that rallied together to build an awesome solution combining machine learning, computer vision, deep learning and sensor fusion. The project was an exercise in iterative experimentation and continual learning, refining all aspects of the hardware and software as well as innovative vision algorithms. I was personally involved in 5 different prototypes and the winning solution that ticked all the boxes more than 2 years ago.

I remember watching Jeff Bezos and the senior leadership at Amazon, playing with the system by picking and returning the items back to the shelves. Smiles and high-fives all around as the products were added and removed from the shopper’s virtual cart, with the correct quantity of each item.

Needless to say, there is significant effort after the initial R&D is done to move something like this to production, so it is not surprising that it has taken 2 years since then to get it ready for the public. Well done to my friends at Amazon for getting the engineering solution over the line to an actual store launch in early 2017.

Photo Credit: Original Image by USDA – Flickr