Jan 19, 2023
Learn how applying machine learning and artificial intelligence to dynamic, fresh, real-time data makes it possible to extract insights and build more accurate behavioural models.
We have all heard about cybercrime, either from the media or because someone close to us has been affected by it. In response to the COVID-19 pandemic, businesses developed their digital services in record time, and technology transformed people's lives. As remote working drove up the use of cloud solutions, VPN connections and remote access services, cybercrime and fraud grew alongside it.
Fraud losses have increased dramatically in recent years. According to the Nilson Report, in the United States (US), fraud losses have grown from $10.09 billion in 2020 to $11.91 billion in 2021, increasing by 18.1%. In addition, the combined fraud losses for countries outside the US increased by 11.4% from $18.34 billion in 2020 to $20.43 billion in 2021. It is also estimated that in 2031, fraud losses could grow to $19.24 billion in the US and $47.22 billion globally.
Consequently, businesses recognise the importance of building a model to prevent fraud by establishing tighter controls and analysing as much data as possible to detect fraud more accurately and quickly.
However, before determining a model, it is imperative to understand how to use data to overcome fraud risks and how different fraud risk assessment models work.
Historical records can provide insight into the past, but they are not necessarily reliable indicators of the future. Most fraud cases that could occur have not occurred yet.
A probabilistic assessment model simulates future incidents that are likely to occur, based on scientific evidence. Such risk assessments therefore overcome the limitations of historical data. In contrast, a deterministic model works from known, fixed input values and produces a single observed outcome, with no probability attached to it.
“A deterministic risk assessment model considers the impact of just one risk scenario. However, a probabilistic fraud risk assessment model considers every possible scenario, including its probability and impact. In order to enhance the predictiveness of a model, businesses should use signals to identify exact behaviours that might trigger a rejection. This can be achieved by using behavioural data, which offers highly predictive and contextual signals that can be leveraged as part of any automated, rules-based waterfall of verifications.”
By Michele Tucci, MD Americas and Chief Strategy Officer of credolab
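The difference between the two model types can be illustrated with a minimal sketch. It is not credolab's method: the fraud rate, application volume and loss per fraud case are made-up inputs, and `deterministic_loss` and `probabilistic_losses` are hypothetical helper names. The deterministic function evaluates one scenario, while the probabilistic one simulates many scenarios (a simple Monte Carlo run) and returns the whole range of outcomes.

```python
import random


def deterministic_loss(fraud_rate, applications, loss_per_fraud):
    """One scenario: fixed inputs always give the same single outcome."""
    return fraud_rate * applications * loss_per_fraud


def probabilistic_losses(fraud_rate, applications, loss_per_fraud,
                         runs=1000, seed=42):
    """Simulate many scenarios and return the total loss of each, sorted."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    totals = []
    for _ in range(runs):
        # Each application is independently fraudulent with probability fraud_rate.
        frauds = sum(1 for _ in range(applications) if rng.random() < fraud_rate)
        totals.append(frauds * loss_per_fraud)
    return sorted(totals)


# Illustrative inputs only (assumptions, not figures from the article).
point_estimate = deterministic_loss(0.02, 500, 1000.0)
losses = probabilistic_losses(0.02, 500, 1000.0)
mean_loss = sum(losses) / len(losses)
p95_loss = losses[int(0.95 * len(losses)) - 1]  # loss exceeded in roughly 5% of runs

print(f"deterministic estimate: {point_estimate:.0f}")
print(f"simulated mean: {mean_loss:.0f}, 95th percentile: {p95_loss:.0f}")
```

The simulated mean lands close to the deterministic estimate, but only the simulation shows how much worse the unlucky scenarios can be.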
Technological advancements have increased fraud risks, resulting in a loss of revenue and security for businesses, making early fraud detection even more critical.
With Machine Learning (ML) and Artificial Intelligence (AI), businesses can analyse data in real time and set up patterns that block suspicious logins, identity theft or fraudulent transactions. Historical data and blatantly obvious data misrepresentations help lenders detect suspicious user activity after the fact, but alternative data provides a holistic view of the present. Alternative data is information about a person's behaviours, habits, interests and transactions that is obtained from non-traditional sources.
Behavioural data is one of the most popular alternative data sources for fraud detection and stopping identity theft. It provides businesses with decision-useful information to help them understand their customers more accurately. Decision-making is made easier when granular insights (user-level insight data) are uncovered through behavioural data, which helps assess customers’ fraud risk when they apply for a loan or credit card.
For instance, some red flags include hesitancy in application submission, time spent submitting the application, and copying and pasting information that users should know by heart.
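As a rough sketch, the red flags above could be expressed as simple rules over one application session. Everything here is an assumption made for illustration: the `ApplicationSession` fields, the field names in `MEMORISED_FIELDS`, and the thresholds are hypothetical and do not describe credolab's SDK or its actual rules.

```python
from dataclasses import dataclass

# Fields a genuine applicant would normally type from memory (assumed list).
MEMORISED_FIELDS = {"full_name", "date_of_birth", "phone_number"}


@dataclass
class ApplicationSession:
    seconds_on_form: float        # total time spent on the application
    seconds_before_submit: float  # pause between the last input and submit
    pasted_fields: tuple          # names of fields filled by copy and paste


def red_flags(session, form_seconds_range=(20.0, 900.0),
              max_hesitation_seconds=30.0):
    """Return the list of rule names triggered by this session."""
    flags = []
    low, high = form_seconds_range
    if not low <= session.seconds_on_form <= high:
        flags.append("unusual_time_on_form")
    if session.seconds_before_submit > max_hesitation_seconds:
        flags.append("hesitation_before_submit")
    pasted = MEMORISED_FIELDS.intersection(session.pasted_fields)
    if pasted:
        flags.append("pasted_memorised_fields:" + ",".join(sorted(pasted)))
    return flags


suspicious = ApplicationSession(8.0, 45.0, ("phone_number", "promo_code"))
ordinary = ApplicationSession(120.0, 3.0, ())
print(red_flags(suspicious))
print(red_flags(ordinary))
```

In practice such rules would be one input among many, since a single triggered flag does not prove fraud.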
It is essential to understand how the overall process works, from application to data collection and conversion into fraud insights.
To begin the process, customers need to fill in an application form. The data collection process also starts at this stage as data is collected while the customer types and interacts. Within seconds after submitting an application, metadata is uploaded onto credolab's secure cloud, and scores or insights can be obtained for businesses and financial institutions to use.
Data is collected from a web page or mobile device via a proprietary library of embedded code, in the form of anonymised metadata, and stored in a secure cloud environment. However, this is only one part of data collection.
Although collecting data may seem a simple task, it is actually a delicate balancing act that requires a number of considerations, often with competing objectives.
“Collecting data requires lenders and banks to pay close attention to legal aspects related to data privacy and protection; marketing aspects related to enabling seamless and frictionless onboarding processes that eliminate unnecessary drop-off points; and technical aspects related to the quality of the data being collected and the speed of such collection”
By Dmytro Kurov, Chief Technology Officer of credolab.
At credolab, a large amount of data is used to calculate and determine a score. Furthermore, unique user behavioural data is collected only once from each mobile or web device upon application submission.
After collecting and analysing all data via a sequence of algorithms, a behavioural scorecard is developed. This scorecard can provide a deep understanding of a customer’s probability of behaving in a similar way to confirmed fraudulent customers. Insights gained from mobile phone metadata, such as the brand, operating system and device model, can predict a person's technological literacy and, combined with other data points, detect red flags pointing to potential fraud risks. Thanks to modern algorithms, there is a dynamic data collection system where data can be analysed in real time to create behavioural models and obtain predictive signals.
Credolab’s ML platform detects micro-behavioural patterns from over 70,000 permissioned and privacy-consented data points transformed into 10 million possible signals. For example, credolab accesses mobile phone data such as device model, battery usage, storage and most frequent app category, as well as interactions with the loan application form, such as hesitation to submit and the total time spent on the application. These signals are then converted into granular insights about the customer so businesses can make informed decisions.
However, the type of data must be analysed before it is converted into insights. Each feature is expressed as a number, and these numbers can be used to compare users. Typing speed is one example: users enter text at different speeds and leave different intervals between typing and pressing the submit button. Insights generated from such features could indicate fraudulent or, at least, suspicious activity.
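To make the idea of numeric features concrete, here is a minimal sketch that turns raw keystroke timestamps into a few comparable numbers. The function name `typing_features`, the feature names and the sample timestamps are all hypothetical; the source does not describe how credolab computes its features.

```python
from statistics import mean


def typing_features(key_times, submit_time):
    """Derive numeric features from keystroke timestamps (in seconds).

    key_times must be in ascending order; submit_time is when the
    submit button was pressed.
    """
    if len(key_times) < 2:
        raise ValueError("need at least two keystrokes")
    duration = key_times[-1] - key_times[0]
    if duration <= 0:
        raise ValueError("keystroke timestamps must span a positive duration")
    gaps = [later - earlier for earlier, later in zip(key_times, key_times[1:])]
    return {
        "chars_per_second": (len(key_times) - 1) / duration,
        "mean_gap_seconds": mean(gaps),
        "submit_delay_seconds": submit_time - key_times[-1],
    }


# Made-up session: five keystrokes half a second apart, submit at 5.0 s.
features = typing_features([0.0, 0.5, 1.0, 1.5, 2.0], submit_time=5.0)
print(features)
```

Once every user is described by the same set of numbers, users can be compared with each other and with known fraudulent sessions.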
Insights can provide valuable information across various sectors, enabling a more accurate understanding of individual customers. Through a single API, credolab helps businesses deliver immediate financial benefits in fraud, using metadata and without any personally identifiable information (PII) leaving the device:
Credolab's insights are provided to businesses via the SDK embedded in clients’ smartphone and web applications to provide stronger protection against fraud, making it possible to:
Insights are also very effective in analysing delinquency rates in the context of fraud. The delinquency rate is the percentage of loans in a financial institution's loan portfolio that have delinquent payments. A loan becomes delinquent when the customer pays late or misses a scheduled payment or instalment.
In contrast, the default rate is the percentage of loans issued by a financial institution that have gone unpaid for a prolonged period of time. Defaulting loans are a much more serious issue, as they affect the customer’s lending relationship with the lender, their credit bureau score and, as a consequence, the customer’s ability to obtain loans from other lenders in the future. When a loan defaults, the account might also be sent to a collection agency, and the loan is written off as uncollectible on financial statements.
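The two rates can be computed from the same portfolio data, as the following sketch shows. The 90-day default threshold is an assumption chosen for illustration (the article only says "a prolonged period of time"), defaulted loans are counted as delinquent too, and the sample portfolio is made up.

```python
def percentage(count, total):
    """Share of count in total, as a percentage."""
    if total <= 0:
        raise ValueError("portfolio must contain at least one loan")
    return 100.0 * count / total


def portfolio_rates(days_past_due, default_after_days=90):
    """Return (delinquency_rate, default_rate) for a list of loans.

    days_past_due holds one integer per loan; 0 means the loan is current.
    default_after_days is an assumed cut-off, not a figure from the article.
    """
    delinquent = sum(1 for days in days_past_due if days > 0)
    defaulted = sum(1 for days in days_past_due if days >= default_after_days)
    total = len(days_past_due)
    return percentage(delinquent, total), percentage(defaulted, total)


# Ten hypothetical loans: four are late, two of them by more than 90 days.
portfolio = [0, 0, 0, 15, 45, 120, 0, 0, 200, 0]
delinquency_rate, default_rate = portfolio_rates(portfolio)
print(f"delinquency rate: {delinquency_rate:.1f}%")
print(f"default rate: {default_rate:.1f}%")
```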
As a result of these insights gathered, credolab offers a solution through the SDK that improves the underwriting process by:
From a technical perspective, fraud scores are validated using the same process as credit risk scores. Credolab validates the data and calibrates the model by cross-checking the data with "bad customers" (aka confirmed fraudulent customers). Score quality improves as more data is collected and, over time, the model becomes more accurate.
After the data is collected and analysed, values are assigned to it, though these values may differ among score providers. One example is allocating values to mobile data usage to detect red flags. If a user has a 60GB mobile data allowance but uses only 1GB, this may be a red flag for fraudulent activity, and detecting it lowers the score.
Another example is allocating values to smartphone brands to categorise users. Determining such red flags relies on past analysis and experience, for example the observation that most iPhone users are less likely to breach requirements than Samsung, Oppo and Huawei users. Using all these value allocations and calculations, businesses can calculate final scores and plan accordingly.
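A value allocation of this kind can be sketched as a small points table. All numbers below are invented for illustration: the base score, the 5% usage ratio, the penalty and the brand points are hypothetical, and the brands are deliberately left as placeholders (`brand_a`, `brand_b`) because the article gives no actual values.

```python
BASE_SCORE = 600  # hypothetical starting score


def data_usage_points(used_gb, allowance_gb, low_usage_ratio=0.05, penalty=-40):
    """Penalise users who consume only a tiny share of their data allowance."""
    if allowance_gb <= 0:
        raise ValueError("allowance must be positive")
    return penalty if used_gb / allowance_gb < low_usage_ratio else 0


def final_score(used_gb, allowance_gb, brand, brand_points):
    """Add up the base score and the points allocated to each feature."""
    return (BASE_SCORE
            + data_usage_points(used_gb, allowance_gb)
            + brand_points.get(brand, 0))  # unknown brands get no adjustment


# Placeholder brand values; a real provider would derive these from past data.
brand_points = {"brand_a": 10, "brand_b": -10}

flagged = final_score(1, 60, "brand_b", brand_points)   # 1GB used of 60GB
typical = final_score(30, 60, "brand_a", brand_points)  # 30GB used of 60GB
print(flagged, typical)
```

Because each provider chooses its own values, the same user can receive different scores from different providers.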
Credolab uses the Gini coefficient to measure the predictive power of a credit risk model. The same coefficient is used to measure the predictiveness of fraud models, and the same data can be used to predict the rate of fraud events.
The Gini coefficient measures a model's ability to separate bad customers from good customers, for instance by distinguishing those likely to default in the future from those who are not. The Gini coefficient is primarily known as a measure of income inequality: it measures the extent to which the distribution of income or consumption deviates from a perfectly equal distribution. A Gini index of 0 represents perfect equality, and an index of 100 implies perfect inequality. In fraud detection, the Gini coefficient can be used to measure the model's ability to identify fraudulent applications accurately. A higher Gini coefficient indicates better performance, since it suggests that the model ranks fraudulent events above legitimate ones more consistently.
Besides the Gini coefficient, credit risk modellers often use the Kolmogorov-Smirnov (KS) statistic to measure the discriminatory power of a model. It looks at the maximum difference between the cumulative distributions of events and non-events. In fraud detection, the KS statistic can be used to measure the separation between the score distributions of fraudulent and non-fraudulent events. As with the Gini coefficient, a higher KS statistic indicates better performance, since it suggests that the model distinguishes between these transactions more accurately.
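In scoring practice, the Gini coefficient of a model is commonly derived from the area under the ROC curve (AUC) as Gini = 2 × AUC − 1. The sketch below uses that convention on a handful of made-up scores; the article does not say how credolab computes its Gini, so treat the function names and data as assumptions. Here a higher score means a higher estimated fraud risk, and a label of 1 marks a confirmed fraud.

```python
def auc(scores, labels):
    """Probability that a random fraud case outscores a random good case.

    Ties count as half a win. Labels: 1 = confirmed fraud, 0 = good customer.
    """
    frauds = [s for s, y in zip(scores, labels) if y == 1]
    goods = [s for s, y in zip(scores, labels) if y == 0]
    if not frauds or not goods:
        raise ValueError("need at least one fraud and one good customer")
    wins = sum(1.0 if f > g else 0.5 if f == g else 0.0
               for f in frauds for g in goods)
    return wins / (len(frauds) * len(goods))


def gini(scores, labels):
    """Gini coefficient on a -1 to 1 scale; multiply by 100 for an index."""
    return 2 * auc(scores, labels) - 1


# Six hypothetical applications, three of them confirmed frauds.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(f"Gini: {gini(scores, labels):.3f}")
```

A model that ranks every fraud case above every good customer reaches a Gini of 1 (or 100 on the index scale), while random scoring gives a value near 0.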
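The KS statistic can be computed directly from its definition, as in this minimal sketch: at every score threshold, compare the cumulative share of frauds and of good customers at or below it, and keep the largest gap. The function name and sample data are hypothetical, and a production model would typically rely on a statistics library instead.

```python
def ks_statistic(scores, labels):
    """Maximum gap between the cumulative score distributions of the two groups.

    Labels: 1 = fraudulent event, 0 = non-fraudulent event.
    """
    frauds = [s for s, y in zip(scores, labels) if y == 1]
    goods = [s for s, y in zip(scores, labels) if y == 0]
    if not frauds or not goods:
        raise ValueError("need at least one event of each kind")
    largest_gap = 0.0
    for threshold in sorted(set(scores)):
        # Empirical cumulative distribution of each group at this threshold.
        fraud_share = sum(1 for s in frauds if s <= threshold) / len(frauds)
        good_share = sum(1 for s in goods if s <= threshold) / len(goods)
        largest_gap = max(largest_gap, abs(fraud_share - good_share))
    return largest_gap


# Six hypothetical events, three of them fraudulent.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(f"KS: {ks_statistic(scores, labels):.3f}")
```

A KS of 1 means the two score distributions do not overlap at all, and a KS of 0 means they are identical.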
Applying ML and AI to dynamic, fresh, real-time data makes it possible to analyse and use this data to obtain insights and create more accurate behavioural models. By reliably identifying changes in user behaviour, businesses can learn faster and make more effective decisions on the spot.