First appeared on e27 - By Michele Tucci, Chief Product Officer.
Tech companies in Asia that hire data scientists do mention soft skills in their job descriptions, yet the soft skill most crucial to the role, given how drastically our world is changing, is rarely even listed: creativity.
In 2015, DOMO produced an infographic on how much activity happens every minute on some of the world's most popular products, including 4,166,667 likes on Facebook, 110,040 calls on Skype, and 51,000 app downloads on Apple devices. In such a data-rich era, data scientists (especially those in Asia, where conformity can be cultural) need the creativity and self-awareness to question even their most entrenched beliefs, starting with this one: What counts as data?
In many fields, the answer to this question is narrow. In consumer credit scoring, the industry I work in, data has traditionally meant information about a consumer's repayment history or other financial benchmarks that help banks, credit card companies, and other institutions judge that consumer's creditworthiness. This view is a limited one, and it has social consequences.
In every corner of the globe, and especially in emerging markets from Africa and South America to Indonesia, Vietnam, and the Philippines in Southeast Asia, there are millions of unbanked people. Because they have no bank accounts or other financial instruments, they are among the 4.5 billion people worldwide who have little to no credit history. And since banks, credit card companies, and other financial institutions rely on this history to establish creditworthiness, these people are excluded from finance and the social mobility it can bring. They cannot take out a loan to buy a home, pay for a college education, or start a small business. In effect, their lack of credit is a social shackle: It keeps them stuck where they are.
You could overlook this issue if all 4.5 billion people with little to no credit history were at high risk of defaulting on their loans. That, of course, is not the case. Among this group are hundreds of millions of people who would likely repay any loans they take out responsibly. The key to identifying these borrowers lies in alternative data (a term sure to surpass big data in buzz in emerging markets, where data is just as valuable yet harder to come by).
What kind of data is the most useful?
Alternative data is a catch-all term that refers to non-traditional information.
In credit scoring, the most common types are social media data, psychometric data, telco data, and device data. Even within these broad categories, some are more accurate, more efficient, and generally preferable to others, and the same is likely to hold true for the alternative data available in your own field of tech.
Founders and their data science teams need to vet the available forms of alternative data to determine which best fits their business needs. We found, for example, that social media data can easily be faked, and even when it belongs to a real user, mining it is invasive.
Psychometric data, for its part, is both invasive and inefficient: Consumers can spend more than 30 minutes answering questions that try to magically divine their attitudes and values.
Telco data requires a cumbersome on-site integration that may not be worth the effort: It is still less accurate than the best source of alternative data.
If you're reading this article on a smartphone, you have the answer in hand: Device data is the most reliable behavioral data, the easiest to integrate, and the least invasive, since the data is anonymised.
Because device data is the best form of alternative data for consumer credit scoring, our team at CredoLab uses over 50,000 of these data points to assess the creditworthiness of people who would otherwise be shut out of finance for lack of a credit history.
An interesting look into user behaviour
Some indicators of a reliable borrower may surprise you, such as minimal time spent surfing the web at night, a low number of long-duration incoming calls during work hours, and even, perhaps inexplicably, a high ratio of missed calls following successful outgoing ones.
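To make these three indicators concrete, here is a minimal Python sketch of how they might be computed from a phone's call log and browsing history. Everything in it is a hypothetical assumption for illustration: the record types (`CallRecord`, `BrowsingSession`), the thresholds (night as 22:00 to 06:00, work hours as 09:00 to 18:00 on weekdays, a "long" call as ten minutes or more), and our reading of the missed-call ratio as the share of answered outgoing calls whose next logged call is a missed incoming one. It is not CredoLab's feature set or scoring method.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical record types; real device logs will look different.
@dataclass
class CallRecord:
    start: datetime
    duration_s: int
    direction: str  # "incoming" or "outgoing"
    answered: bool


@dataclass
class BrowsingSession:
    start: datetime
    duration_s: int


def is_night(ts: datetime, start_hour: int = 22, end_hour: int = 6) -> bool:
    # Assumed definition of "night": 22:00 up to (not including) 06:00.
    return ts.hour >= start_hour or ts.hour < end_hour


def is_work_hours(ts: datetime, start_hour: int = 9, end_hour: int = 18) -> bool:
    # Assumed definition of "work hours": Monday to Friday, 09:00 to 18:00.
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour


def night_browsing_seconds(sessions: list[BrowsingSession]) -> int:
    # Total browsing time in sessions that start at night.
    return sum(s.duration_s for s in sessions if is_night(s.start))


def long_incoming_calls_during_work(calls: list[CallRecord], min_duration_s: int = 600) -> int:
    # Count answered incoming calls of at least min_duration_s made during work hours.
    return sum(
        1
        for c in calls
        if c.direction == "incoming"
        and c.answered
        and c.duration_s >= min_duration_s
        and is_work_hours(c.start)
    )


def missed_after_outgoing_ratio(calls: list[CallRecord]) -> float:
    # Share of answered outgoing calls whose next call in time is a missed incoming call.
    ordered = sorted(calls, key=lambda c: c.start)
    successful_out = 0
    followed_by_missed = 0
    for i, call in enumerate(ordered):
        if call.direction == "outgoing" and call.answered:
            successful_out += 1
            nxt = ordered[i + 1] if i + 1 < len(ordered) else None
            if nxt is not None and nxt.direction == "incoming" and not nxt.answered:
                followed_by_missed += 1
    return followed_by_missed / successful_out if successful_out else 0.0
```

In practice, features like these would be fed into a trained model alongside many others rather than applied as standalone rules.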
These indicators may raise a few eyebrows, but the results speak for themselves: CredoLab has helped banks and non-bank lenders enter markets where traditional credit scoring solutions would not work, enabling them to automate the lending process, minimise risk, and maintain an almost instantaneous time-to-yes.
Tech companies in other spaces that hire data scientists creative enough to look to alternative data will likely experience a similar virtuous cycle: The increase in data will not only improve the accuracy of their models but also encourage them to make bolder leaps in creative experimentation and design.