Jan 19, 2023

Getting to know fraud detection: How to use insights gathered

Learn how applying machine learning and artificial intelligence to dynamic, fresh, real-time data makes it possible to analyse and use that data to obtain insights and create more accurate behavioural models.


We have all heard about cybercrime, either from the media or because someone close to us has been affected by it. In response to the COVID-19 pandemic, businesses developed their digital services in record time, technology transformed people's lives, and cybercrime and fraud grew. The use of cloud solutions, VPN connections, and remote access services increased with remote working, and cybercrime proliferated accordingly.

Fraud losses have increased dramatically in recent years. According to the Nilson Report, fraud losses in the United States (US) grew from $10.09 billion in 2020 to $11.91 billion in 2021, an increase of 18.1%. In addition, the combined fraud losses for countries outside the US increased by 11.4%, from $18.34 billion in 2020 to $20.43 billion in 2021. It is also estimated that by 2031, fraud losses could grow to $19.24 billion in the US and $47.22 billion globally.

Consequently, businesses recognise the importance of building a model to prevent fraud by establishing tighter controls and analysing as much data as possible to detect fraud more accurately and quickly.

However, before determining a model, it is imperative to understand how to use data to overcome fraud risks and how different fraud risk assessment models work.

Understand the differences: Probabilistic vs Deterministic fraud risk assessment models

Historical records can provide insight into the past, but they are not necessarily reliable indicators of the future. Most fraud cases that could occur have not occurred yet. 

A probabilistic assessment model simulates future incidents that are likely to occur based on scientific evidence. Such risk assessments thus resolve the limitations of historical data. In contrast, a deterministic model treats the probability of an event as finite, where input values are known and outcomes are observed.  

“A deterministic risk assessment model considers the impact of just one risk scenario. However, a probabilistic fraud risk assessment model considers every possible scenario, including its probability and impact. In order to enhance the predictiveness of a model, businesses should use signals to identify exact behaviours that might trigger a rejection. This can be achieved by using behavioural data, which offers highly predictive and contextual signals that can be leveraged as part of any automated, rules-based waterfall of verifications.”
By Michele Tucci, MD Americas and Chief Strategy Officer of credolab
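The contrast between the two approaches can be illustrated with a minimal Python sketch. The scenario probabilities and loss amounts below are invented for illustration only and do not come from credolab or the Nilson Report. The deterministic view reads off the impact of one chosen scenario, while the probabilistic view weighs every scenario by its probability, either analytically or by simulation.

```python
import random

# Hypothetical scenarios: (probability of occurring, loss if it occurs).
# These numbers are assumptions made up for this sketch.
SCENARIOS = [
    (0.90, 0.0),       # no fraud
    (0.08, 1_000.0),   # small-scale fraud
    (0.02, 20_000.0),  # large-scale fraud
]


def deterministic_loss(scenario_index: int) -> float:
    """Deterministic view: the impact of a single chosen scenario."""
    return SCENARIOS[scenario_index][1]


def expected_loss() -> float:
    """Probabilistic view: every scenario weighted by its probability."""
    return sum(prob * loss for prob, loss in SCENARIOS)


def simulated_loss(trials: int = 100_000, seed: int = 0) -> float:
    """Probabilistic view by simulation: average loss over many random draws."""
    rng = random.Random(seed)
    losses = [loss for _, loss in SCENARIOS]
    weights = [prob for prob, _ in SCENARIOS]
    draws = rng.choices(losses, weights=weights, k=trials)
    return sum(draws) / trials


if __name__ == "__main__":
    print("Single scenario (small-scale fraud):", deterministic_loss(1))
    print("Expected loss over all scenarios:", expected_loss())
    print("Simulated average loss:", round(simulated_loss(), 2))
```

With these made-up numbers, the rare large-scale scenario dominates the expected loss, which a single-scenario view would miss entirely.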

Technological advancements have increased fraud risks, resulting in a loss of revenue and security for businesses, making early fraud detection even more critical.

Reducing fraud risk with behavioural data

With Machine Learning (ML) and Artificial Intelligence (AI), businesses can analyse data in real time and set up patterns that block suspicious logins, identity theft, or fraudulent transactions. Historical data and blatantly obvious data misrepresentations help lenders detect suspicious user activity in the past, but alternative data provides a holistic view now. Alternative data is information about a person's behaviours, habits, interests and transactions that is obtained from non-traditional sources.

Behavioural data is one of the most popular alternative data sources for fraud detection and stopping identity theft. It provides businesses with decision-useful information to help them understand their customers more accurately. Decision-making is made easier when granular insights (user-level insight data) are uncovered through behavioural data, which helps assess customers’ fraud risk when they apply for a loan or credit card.

For instance, some red flags include hesitancy in application submission, time spent submitting the application, and copying and pasting information that users should know by heart.
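As a rough illustration, the red flags above could be expressed as simple rules over form-interaction metadata. This is only a sketch: the `FormSession` fields, the field names in `KNOWN_BY_HEART` and all thresholds are hypothetical and are not credolab's actual signals or rules.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds and field names, chosen only for this example.
KNOWN_BY_HEART = {"full_name", "date_of_birth", "phone_number"}
MIN_SECONDS_ON_FORM = 20.0
MAX_SECONDS_ON_FORM = 1_800.0
MAX_IDLE_BEFORE_SUBMIT = 60.0


@dataclass
class FormSession:
    seconds_on_form: float             # total time spent on the application
    seconds_idle_before_submit: float  # pause between last input and submit
    pasted_fields: Tuple[str, ...] = ()  # fields filled by copy and paste


def red_flags(session: FormSession) -> List[str]:
    """Return the names of the rules that this session triggers."""
    flags = []
    if not MIN_SECONDS_ON_FORM <= session.seconds_on_form <= MAX_SECONDS_ON_FORM:
        flags.append("unusual_time_on_form")
    if session.seconds_idle_before_submit > MAX_IDLE_BEFORE_SUBMIT:
        flags.append("hesitation_before_submit")
    pasted = sorted(KNOWN_BY_HEART & set(session.pasted_fields))
    if pasted:
        flags.append("pasted:" + ",".join(pasted))
    return flags


if __name__ == "__main__":
    suspicious = FormSession(8.0, 90.0, ("date_of_birth", "employer_name"))
    print(red_flags(suspicious))
```

In practice such rules would be one input among many, not a verdict on their own.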

Taking a closer look at the process

It is essential to understand how the overall process works, from application to data collection and conversion into fraud insights.

To begin the process, customers need to fill in an application form. The data collection process also starts at this stage as data is collected while the customer types and interacts. Within seconds after submitting an application, metadata is uploaded onto credolab's secure cloud, and scores or insights can be obtained for businesses and financial institutions to use.

Understanding the process

How is data collected?

Data is collected from a web page or mobile device, in the form of anonymised metadata, via a proprietary library of embedded code, and is stored in a secure cloud environment. However, this is but one part of data collection. Here’s how it works as a whole:

  • Step 1: Clients collect data from each customer’s application form
  • Step 2: Clients enrich this data by using the services of third-party data and scoring providers such as credit bureaus and credolab
  • Step 3: Clients use all data points collected as input into their underwriting process
  • Step 4: Clients make a credit decision

Although collecting data may seem a simple task, it is actually a delicate balancing act that requires weighing a number of considerations, often with competing objectives.

“Collecting data requires lenders and banks to pay close attention to legal aspects related to data privacy and protection; marketing aspects related to enabling seamless and frictionless onboarding processes that eliminate unnecessary drop-off points; and technical aspects related to the quality of the data being collected and the speed of such collection.”
By Dmytro Kurov, Chief Technology Officer of credolab.

At credolab, a large amount of data is used to calculate and determine a score. Furthermore, unique user behavioural data is collected only once from each mobile or web device upon application submission.

How to detect fraudulent patterns?

How does credolab work for fraud

After collecting and analysing all data via a sequence of algorithms, a behavioural scorecard is developed. This scorecard can provide a deep understanding of a customer’s probability of behaving in a similar way to confirmed fraudulent customers. Insights gained from mobile phone metadata, such as the brand, operating system and device model, can predict a person's technological literacy and, combined with other data points, detect red flags pointing to potential fraud risks. Thanks to modern algorithms, there is a dynamic data collection system where data can be analysed in real time to create behavioural models and obtain predictive signals.  

Credolab’s ML platform detects micro-behavioural patterns from over 70,000 permissioned and privacy-consented data points transformed into 10 million possible signals. For example, credolab accesses mobile phone data such as device model, battery usage, storage, most frequent app category and interactions with the loan application form such as hesitation to submit and the total time spent on the application. These signals are then converted into granular insights about the customer so businesses can make informed decisions.

However, the type of data must be analysed before it is converted into insights. Each feature is represented as a number, and these numbers can be used to compare users. Typing speed is one example: users enter text at different speeds and take different amounts of time between finishing typing and pressing the submit button. Insights generated from such comparisons could indicate possible fraudulent or, at least, suspicious activity.
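One simple way to compare a user's numeric feature with other users is a z-score, sketched below in Python. This is an illustrative assumption rather than credolab's method, and the typing speeds and the cut-off of 3 standard deviations are made up.

```python
from statistics import mean, stdev
from typing import Sequence


def z_score(value: float, population: Sequence[float]) -> float:
    """How many standard deviations `value` lies from the population mean."""
    mu = mean(population)
    sigma = stdev(population)  # sample standard deviation, needs >= 2 values
    if sigma == 0:
        return 0.0
    return (value - mu) / sigma


def is_outlier(value: float, population: Sequence[float], cutoff: float = 3.0) -> bool:
    """Flag values at least `cutoff` standard deviations away (hypothetical cut-off)."""
    return abs(z_score(value, population)) >= cutoff


if __name__ == "__main__":
    # Hypothetical typing speeds (characters per second) of previous applicants.
    typing_speeds = [4.0, 5.0, 6.0]
    print(z_score(8.0, typing_speeds))
    print(is_outlier(8.0, typing_speeds))
```

An unusually high value might, for example, point to scripted or pasted input, but on its own it is only a signal worth combining with others.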

Insights can provide valuable information across various sectors, enabling a more accurate understanding of individual customers. Through a single API, credolab helps businesses deliver immediate financial benefits in fraud prevention, using metadata and without any personally identifiable information (PII) leaving the device.

Delivering fraud insights

Credolab's insights are provided to businesses via the SDK embedded in clients’ smartphone and web applications to provide stronger protection against fraud, making it possible to:

  • Detect repeated loan applications using the same device, browser or IP address
  • Analyse how frequently a customer changes their income, address or other key information in the application form
  • Check if the device is locked to a mobile operator or open to other networks
  • Predict the device’s age, and determine the owner’s activity based on the number of multimedia files and contacts added over time
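The first check in the list above, spotting repeated loan applications from the same device, browser or IP address, can be sketched as a simple grouping step. The record layout (`application_id`, `device_id`, `ip_address`) is hypothetical and does not reflect the credolab SDK or API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def repeated_applications(
    applications: Iterable[Mapping[str, str]],
    key: str = "device_id",
    max_per_key: int = 1,
) -> Dict[str, List[str]]:
    """Group application IDs by `key` and keep groups larger than `max_per_key`."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for app in applications:
        groups[app[key]].append(app["application_id"])
    return {k: ids for k, ids in groups.items() if len(ids) > max_per_key}


if __name__ == "__main__":
    # Hypothetical application records.
    apps = [
        {"application_id": "A1", "device_id": "d1", "ip_address": "10.0.0.1"},
        {"application_id": "A2", "device_id": "d2", "ip_address": "10.0.0.1"},
        {"application_id": "A3", "device_id": "d1", "ip_address": "10.0.0.2"},
    ]
    print(repeated_applications(apps))                    # grouped by device
    print(repeated_applications(apps, key="ip_address"))  # grouped by IP address
```

The same function can be pointed at any identifier by changing `key`, which keeps the device, browser and IP checks consistent.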

Insights are also very effective in analysing delinquency rates. The delinquency rate is the percentage of loans with delinquent payments in a financial institution's loan portfolio. A loan becomes delinquent when the customer pays late or misses a payment or a regular instalment.

In contrast, a default rate is the percentage of loans issued by a financial institution that have gone unpaid for a prolonged period of time. Defaulting loans are a much more serious issue, as they affect the customer’s lending relationship with the lender, their credit bureau score and, as a consequence, the customer’s ability to obtain loans from other lenders in the future. When a loan defaults, the account might also be sent to a collection agency, and the loan is written off as uncollectible on financial statements.
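The two rates can be computed from the same portfolio data, as in the sketch below. It assumes each loan is summarised by its number of days past due, and it treats 90 days as the point at which a loan counts as defaulted. That threshold is an assumption for the example, since the definition of "a prolonged period" varies by lender and jurisdiction.

```python
from typing import Sequence, Tuple

# Assumption for this sketch: a loan counts as defaulted after 90 days past due.
DEFAULT_AFTER_DAYS = 90


def portfolio_rates(
    days_past_due: Sequence[int],
    default_after_days: int = DEFAULT_AFTER_DAYS,
) -> Tuple[float, float]:
    """Return (delinquency_rate, default_rate) as fractions of all loans."""
    if not days_past_due:
        raise ValueError("portfolio is empty")
    total = len(days_past_due)
    delinquent = sum(1 for days in days_past_due if days > 0)
    defaulted = sum(1 for days in days_past_due if days >= default_after_days)
    return delinquent / total, defaulted / total


if __name__ == "__main__":
    # Hypothetical portfolio of five loans.
    delinquency_rate, default_rate = portfolio_rates([0, 0, 15, 45, 120])
    print(f"Delinquency rate: {delinquency_rate:.0%}")
    print(f"Default rate: {default_rate:.0%}")
```

In this sketch a defaulted loan is also counted as delinquent, so the default rate can never exceed the delinquency rate.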

Based on the insights gathered, credolab offers a solution through the SDK that improves the underwriting process by:

  • Lowering the default rate, which lowers credit costs and cost per application
  • Improving the persistency rate by predicting the likelihood that a new customer will not pay the first instalment
  • Reducing fraud occurrence by detecting it from the origination stage
  • Using automated processes to avoid manual human review and, thereby, additional fraud investigation costs
  • Supporting a better strategy by combining credolab scoring with internal or external customer checks across multiple customised dashboards

Diving into fraud insights: Fraud scores and fraud checks

From a technical perspective, fraud scores are validated using the same process as credit risk scores. Credolab validates the data and calibrates the model by cross-checking the data with "bad customers" (aka confirmed fraudulent customers). Score quality improves as more data is collected and, over time, the model becomes more accurate. 

After the data is collected and analysed, values are assigned, but these values may differ among score providers. One example is allocating values to mobile data usage to detect red flags. If a user has a 60 GB mobile data allowance but only uses 1 GB, this may be a red flag for fraudulent activity. As a result, this red flag lowers the score.

Another example is allocating values to smartphone brands to categorise users. Past analysis and experience are required to determine potential red flags, for example the finding that iPhone users are less likely to breach requirements than Samsung, Oppo and Huawei users. Using all these value allocations and calculations, businesses can calculate final scores and plan accordingly.

Smartphone Brands example
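The idea of allocating values to attributes and summing them into a final score can be sketched as a small additive scorecard. Every number here is hypothetical: the base score, the usage bins and the brand categories (`brand_a`, `brand_b`) are placeholders, and real providers would derive their own values from data.

```python
from typing import Dict, List, Tuple

# All points below are invented for illustration and are not real scorecard values.
BASE_SCORE = 500
# (upper bound of used/allowance ratio, points); checked in order.
USAGE_BINS: List[Tuple[float, int]] = [(0.05, -30), (0.50, 0), (float("inf"), 10)]
BRAND_POINTS: Dict[str, int] = {"brand_a": 10, "brand_b": 0}
UNKNOWN_BRAND_POINTS = -5


def usage_points(used_gb: float, allowance_gb: float) -> int:
    """Points for the share of the mobile data allowance actually used."""
    ratio = used_gb / allowance_gb if allowance_gb > 0 else 0.0
    for upper_bound, points in USAGE_BINS:
        if ratio < upper_bound:
            return points
    return 0  # not reached, because the last bin is unbounded


def fraud_score(used_gb: float, allowance_gb: float, brand: str) -> int:
    """Sum the base score and the points of each attribute."""
    brand_pts = BRAND_POINTS.get(brand, UNKNOWN_BRAND_POINTS)
    return BASE_SCORE + usage_points(used_gb, allowance_gb) + brand_pts


if __name__ == "__main__":
    # The 60 GB allowance with 1 GB used from the text falls in the lowest bin.
    print(fraud_score(used_gb=1, allowance_gb=60, brand="brand_b"))
```

Because the score is a plain sum, each attribute's contribution stays easy to inspect and explain.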

Making decisions with fraud insights: Gini coefficients

Credolab uses the Gini coefficient to measure the predictive power of a credit risk model. The same coefficient is used to measure the predictiveness of fraud models, and the same data can be used to predict the rate of fraud events.

The Gini coefficient measures a model's ability to separate bad customers from good customers, for instance by distinguishing customers who are likely to default in the future from those who are not. The Gini coefficient is primarily known as a measure of income inequality, where it captures the extent to which the distribution of income or consumption deviates from a perfectly equal distribution. A Gini index of 0 represents perfect equality, and an index of 100 implies perfect inequality. In fraud detection, the Gini coefficient can be used to measure the model's ability to identify fraudulent applications accurately. A higher Gini coefficient indicates better performance, since it means the model ranks fraudulent applications above legitimate ones more consistently.
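For a scoring model, the Gini coefficient is commonly computed from the area under the ROC curve (AUC) as Gini = 2 × AUC − 1. The sketch below uses that standard relationship with a plain pairwise AUC calculation. The source does not describe credolab's exact computation, so this is an assumption, and the scores and labels are made up. Here a higher score means a higher predicted fraud risk, and the result is on a 0 to 1 scale rather than 0 to 100.

```python
from typing import Sequence


def gini(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Gini = 2 * AUC - 1, where label 1 marks a confirmed fraudulent case."""
    frauds = [s for s, y in zip(scores, labels) if y == 1]
    legit = [s for s, y in zip(scores, labels) if y == 0]
    if not frauds or not legit:
        raise ValueError("need at least one fraud and one non-fraud case")
    # AUC: share of (fraud, non-fraud) pairs where the fraud case scores
    # higher; ties count as half a win.
    wins = sum(
        1.0 if f > g else 0.5 if f == g else 0.0
        for f in frauds
        for g in legit
    )
    auc = wins / (len(frauds) * len(legit))
    return 2 * auc - 1


if __name__ == "__main__":
    # Hypothetical model scores and confirmed outcomes.
    print(gini([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # partial separation
    print(gini([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # perfect separation
```

A value near 0 means the model ranks cases no better than chance, while a value of 1 means every fraudulent case scores above every legitimate one. The pairwise loop is quadratic, so a library routine would be preferable on large portfolios.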

Besides the Gini coefficient, credit risk modellers often use the Kolmogorov-Smirnov (KS) statistic to measure the discriminatory power of a model. It looks at the maximum difference between the cumulative distributions of events and non-events. In fraud detection, the KS statistic can be used to measure the separation between the score distributions of fraudulent and non-fraudulent events. As with the Gini coefficient, a higher KS statistic indicates better performance, since it suggests that the model is able to distinguish between fraudulent and non-fraudulent events accurately.
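The KS statistic described above can be computed directly from its definition, as in this minimal sketch. The scores and labels are made up, and the function compares the empirical cumulative distributions of the two groups at every observed score.

```python
from typing import Sequence


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Maximum gap between the cumulative score distributions of the two groups."""
    frauds = [s for s, y in zip(scores, labels) if y == 1]
    legit = [s for s, y in zip(scores, labels) if y == 0]
    if not frauds or not legit:
        raise ValueError("need at least one fraud and one non-fraud case")
    largest_gap = 0.0
    for threshold in sorted(set(scores)):
        # Share of each group scoring at or below the threshold.
        cdf_fraud = sum(s <= threshold for s in frauds) / len(frauds)
        cdf_legit = sum(s <= threshold for s in legit) / len(legit)
        largest_gap = max(largest_gap, abs(cdf_fraud - cdf_legit))
    return largest_gap


if __name__ == "__main__":
    # Hypothetical model scores and confirmed outcomes.
    print(ks_statistic([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # partial separation
    print(ks_statistic([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # perfect separation
```

A KS of 0 means the two score distributions coincide, and a KS of 1 means there is a threshold that separates the groups completely.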

Applying ML and AI to dynamic, fresh, real-time data makes it possible to analyse and use this data to obtain insights and create more accurate behavioural models. By identifying changes in user behaviour, businesses are able to learn faster and make more effective decisions on the spot.


