Credit Scoring

Nov 17, 2022

Credit Scoring Models: How to create a model that works best for you

Learn how the best models leverage as many data sources as possible to assess a high volume of the population with similar predictive power.


Risk-based scorecards measure a person's probability of defaulting on an unsecured lending product, which creates the basis for credit scoring models. Although these have been well established in the financial system, there is no "silver bullet" for scoring. Risk-based approaches, underwriting policies, and procedures differ from organisation to organisation, and no single solution works perfectly. Ideally, the best models leverage as many data sources as possible to assess a large population with similar predictive power and accuracy.


Historically, credit scoring relied on socio-demographic information reported by an applicant while filling in the application form and, whenever available, on information from other banks and credit bureaus, such as credit history. However, with the emergence of new data sources, Machine Learning (ML), and Artificial Intelligence (AI), combining traditional and alternative data to create a scoring model has become possible. Furthermore, combining the two develops a more holistic understanding of human behaviour, which helps make models sounder and more predictive.

A wide array of data sources

Alternative data is information derived from non-traditional data sources, that is, sources beyond traditional credit bureau files. These offer a diverse set of data from mobile devices, telecom operators, social media interactions, credit and debit card transactions, and much more. However, not all sources are created equal, and the quality of alternative credit data can differ.


Source reliability can be determined by taking into account the following factors:

  • Coverage: the data source should have broad and consistent coverage
  • Specificity: the source should provide data specifically about the individual
  • Accuracy and timeliness: the source of data should be accurate and frequently updated
  • Predictive power: the source of data should be relevant to the behaviour being predicted
  • Orthogonality: the source of data should be additive to traditional credit bureau data
  • Regulatory compliance: the source of data must comply with existing financial and privacy regulations
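
Two of these factors lend themselves to a quick check on a sample of applications. The sketch below is illustrative only: the function names, the sample values, and the use of a Pearson correlation as a rough proxy for orthogonality are all assumptions, not a method described by any data provider. It measures how many applicants a source can describe at all (its hit rate) and how strongly a new source moves together with a bureau score.

```python
from math import sqrt


def hit_rate(values):
    """Share of applicants for whom the source returned any data."""
    return sum(v is not None for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical sample; None means the source has no record for that applicant.
bureau_scores = [620, 700, None, 540, None, 680]
alt_scores = [0.40, 0.30, 0.35, 0.70, 0.60, 0.45]

# Correlation can only be measured where both sources returned a value.
overlap = [(b, a) for b, a in zip(bureau_scores, alt_scores)
           if b is not None and a is not None]
corr = pearson([b for b, _ in overlap], [a for _, a in overlap])

print("bureau hit rate:", hit_rate(bureau_scores))
print("alternative hit rate:", hit_rate(alt_scores))
print("correlation on overlap:", corr)
```

A correlation close to +1 or -1 would suggest the new source largely repeats what the bureau already knows, while a value nearer zero leaves more room for additive information. Correlation alone does not establish predictive power, which still has to be tested against observed repayment behaviour.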


Within these various sources of information, we can find different types of data: 

  • For instance, psychometric data is one type of alternative data that describes an individual's psychological abilities and behavioural styles. Through psychometrics, it is possible to understand at a deeper level an individual’s latent skills, qualities, self-esteem, capacities, capabilities, and emotional intelligence, which would otherwise be challenging to assess face-to-face. However, some downsides include the presence of cultural barriers and in-built bias that may add friction. Nevertheless, as with other alternative sources, this method is attractive to fintech firms because it allows lenders to determine borrowers' creditworthiness independently of their credit scores.


  • Another example is telco data. This information, collected by telecommunication companies, includes mobile phone usage, call and SMS records, network status, server logs, SIM card swaps, billing and top-ups, and roaming data. This data holds useful customer insights that, when analysed appropriately, can help financial institutions reduce credit risk, prevent fraud, and increase approval rates. The main challenge is finding ways to use it in a manner compatible with personal privacy and high ethical standards.

  • Open banking data is transactional information shared by different financial institutions through an API. Some lenders use this data to verify the identity of an applicant by piggybacking on the KYC done by the bank. For example, customers can verify their identity by accessing their online banking account. If the login and password work, the lender can verify that the applicant's identity matches the identity on the bank statement. However, there are some frictions related to this type of information:


  • Creates a tedious onboarding process: The user has to remember the login and password of their online banking account and give permission to the lender to access data of up to 12 months of bank statements. However, not all users remember their login credentials and may decide to abandon the application. 

  • Difficult identity management: Developers might need a more reliable way of identifying and tracking users across open banking applications. Public APIs, for example, make it possible to pull data from banks, but they don’t provide a means of keeping track of which accounts belong to which customers in each bank. For now, developers need to keep building their own identity management systems and integrating them with each bank’s API.


  • Third-party risks: Some risks are associated with third parties within the Open Banking system, including potential process, technology and data risks.


  • Banked customers: Accessing this data source implies that the customer is banked, and banked by an institution that makes its information available to third parties. This means lenders may have a lower hit rate in emerging markets, and even a low hit rate in developed countries.

  • Data is used only after verifying other data points in the onboarding process: Open banking data is never top of the funnel and almost always post-credit bureau pull to verify income and affordability.

  • Behavioural data, which credolab uses, is data generated by, or in response to, a customer’s engagement. The data is collected in the form of anonymous metadata (without any PII ever leaving the device), in a JSON format, and processed to assess the creditworthiness of any applicant and detect fraud. This form of data is complementary to the other forms of traditional and alternative information, offering new knowledge about the subscriber, such as their willingness to repay, in addition to their affordability and propensity to repay.


From manual models to automation

Traditionally, credit scoring models have been used for decision-making and to provide predictive information on the potential for delinquency or default in a loan approval process. These models were based on manual processes with tedious onboarding systems. Some of the challenges they face include the following:

  • Hit rate and the resulting asymmetry between scorable and unscorable populations
  • Boosting customer approvals without clearly knowing the expected PD (Probability of Default), which can increase false positives
  • Lowering the default rate, that is, the share of clients who do not repay a loan
  • Reducing false negatives to prevent the approval of risky customers
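
The trade-off between approvals and defaults can be made concrete with a small sketch. Everything here is hypothetical: the function names, the PD values, and the two cutoffs are invented for illustration. Outcomes are named descriptively rather than as false positives or false negatives, because that mapping depends on which class a lender treats as "positive".

```python
def decision_outcomes(pds, defaulted, cutoff):
    """Approve applicants whose predicted PD is below the cutoff and
    count how each decision turned out."""
    counts = {"approved_good": 0, "approved_bad": 0,
              "declined_good": 0, "declined_bad": 0}
    for pd_hat, bad in zip(pds, defaulted):
        decision = "approved" if pd_hat < cutoff else "declined"
        outcome = "bad" if bad else "good"
        counts[f"{decision}_{outcome}"] += 1
    return counts


def approval_and_default_rate(counts):
    """Return (share of applicants approved, share of approved who defaulted)."""
    approved = counts["approved_good"] + counts["approved_bad"]
    total = sum(counts.values())
    default_rate = counts["approved_bad"] / approved if approved else 0.0
    return approved / total, default_rate


# Hypothetical predicted PDs and observed outcomes (True = defaulted).
pds = [0.02, 0.05, 0.08, 0.12, 0.20, 0.35]
defaulted = [False, False, True, False, True, True]

for cutoff in (0.10, 0.25):
    counts = decision_outcomes(pds, defaulted, cutoff)
    print(cutoff, counts, approval_and_default_rate(counts))
```

In this toy sample, loosening the cutoff from 0.10 to 0.25 approves more applicants but also raises the default rate among those approved, which is the tension the list above describes.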

Historically, banks had bureaucratic credit assessment processes and were more strictly regulated by governments. Today, banks are still regulated, and financial services authorities are increasingly putting pressure on fintechs to adhere to the same rules. BNPL fintechs, especially, are being scrutinised more closely.

Banks mostly approved people with a good credit history, rejecting those without one and focusing on the prime segment of the population. Traditionally, banks have a low risk appetite, while payday lenders, on the opposite end of the spectrum, have a very high risk appetite. Even though most banks have already moved toward automation by using Loan Origination Systems (LOS) and Decision Engines (DE), there is still a long way to go.

Figure: manual vs automated loan assessments

Loan Origination Systems utilise various sources and leverage different engines to perform multiple calculations and combine the results. With artificial intelligence, lenders can collect and analyse enormous volumes of data, which would be impossible with manual processes. This access to information helps lenders find new creditworthy customers, despite their limited or non-existent credit history.


How does credolab’s credit scoring model work?

Credolab collects the digital footprint of a customer's mobile phone in the form of smartphone metadata, covering device activity and information such as:

  • Number of gambling apps installed,
  • Whether competitors' apps are installed,
  • Number of photos in the photo gallery,
  • Whether a user uses the cut/paste function when filling out a form,
  • Device brand/model,
  • OS, device ID, and RAM,
  • Number of social media apps installed, but NOT the usernames used on them,
  • Battery health, and many more.

Credolab processes this information into scores that capture how closely an applicant's micro-behavioural patterns correlate with those of confirmed fraudulent or risky applicants. Because automated engines process vast amounts of data far faster than manual review, a loan decision can be made in seconds. By using real-time scores and features, credolab reduces the time to yes to just seconds while helping lenders lower their risk.

However, for credolab to produce the results above, clients must first integrate credoSDK into their mobile applications, because the SDK is what collects information from new customers. The SDK is triggered when the customer clicks on “submit application”: it captures data from the smartphone, and the result is returned to the client in the form of a score.

This is important for several reasons:

  1. Credolab’s information is derived uniquely from smartphone metadata, so obtaining the user’s consent first and initialising the SDK are imperative.
  2. As credolab can only access data once the SDK is installed, a backtest can only be run after the SDK is embedded into the client app.
  3. It is necessary to validate the uplift generated by credolab scores used as input into the existing model.
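
One common way to quantify such an uplift is to compare the discriminatory power of the existing model with and without the new input, for example through the AUC and the Gini coefficient (Gini = 2 × AUC − 1). The sketch below is an assumption about how a lender might run that comparison, not credolab's own validation procedure, and the outcomes and scores are made-up toy numbers.

```python
def auc(risk_scores, defaulted):
    """Probability that a randomly chosen defaulter has a higher risk
    score than a randomly chosen non-defaulter (ties count as half)."""
    bad = [s for s, d in zip(risk_scores, defaulted) if d]
    good = [s for s, d in zip(risk_scores, defaulted) if not d]
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    return wins / (len(bad) * len(good))


def gini(risk_scores, defaulted):
    """Gini coefficient derived from the AUC."""
    return 2 * auc(risk_scores, defaulted) - 1


# Toy backtest data: observed outcomes and predicted PDs from two models.
defaulted = [False, False, False, True, False, True, True, False]
existing_model = [0.10, 0.20, 0.30, 0.25, 0.40, 0.35, 0.60, 0.15]
# Hypothetical PDs after refitting the model with one extra input score.
combined_model = [0.08, 0.18, 0.22, 0.30, 0.33, 0.40, 0.65, 0.12]

uplift = gini(combined_model, defaulted) - gini(existing_model, defaulted)
print("Gini existing:", gini(existing_model, defaulted))
print("Gini combined:", gini(combined_model, defaulted))
print("Gini uplift:", uplift)
```

The pairwise comparison used here is quadratic in the number of applications, which is fine for a sketch; a real backtest would normally use a held-out sample and a rank-based AUC implementation.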

For any new client, credolab is evaluated alongside any other data sources in use, comparing their orthogonality and user base penetration (hit rate). Finally, once data has been collected from customers' smartphones through credoSDK, it is used to validate the data and the process.

The importance of data validation

In some cases, depending on the size of a company, the lender will be able to process more or fewer applications and create more or less reliable scorecards.

However, as a whole, data validation is important for several reasons, including:

  1. Giving risk officers the confidence that the data works and generates the expected uplift
  2. Knowing exactly how to optimise the waterfall of verifications that each lender makes at the onboarding stage
  3. Identifying ways to maximise revenue from risky clients while lowering the average cost of acquisition. This is essentially risk-based pricing: the lower the risk, the lower the price, and vice versa.
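
Risk-based pricing can be sketched as a function from predicted PD to an offered rate. The formula below (a base rate plus an expected-loss premium of PD × LGD) and every parameter value are assumptions chosen for illustration; actual pricing policies vary by lender and regulation.

```python
def risk_based_rate(pd_hat, base_rate=0.08, lgd=0.6, max_pd=0.30):
    """Return an annual rate that grows with predicted PD, or None when
    the applicant is above the lender's maximum acceptable PD.

    base_rate, lgd (loss given default) and max_pd are hypothetical
    parameters, not values taken from the article.
    """
    if not 0.0 <= pd_hat <= 1.0:
        raise ValueError("pd_hat must be a probability between 0 and 1")
    if pd_hat > max_pd:
        return None  # too risky to price: decline
    return base_rate + pd_hat * lgd


for pd_hat in (0.02, 0.10, 0.50):
    print(pd_hat, risk_based_rate(pd_hat))
```

Lower-risk applicants receive a lower rate and higher-risk applicants a higher one, up to the point where the lender declines rather than prices the risk.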

Without data, the easiest risk solution is always to decline an applicant and avoid onboarding unknown risks. To overcome this challenge, credolab helps validate the data by calibrating the model and cross-checking its numbers against known “bad customers”. In the end, predictive power improves: the more data is collected, the more stable and meaningful the analysis becomes.

Figure: the end-to-end loan process (parts 1 and 2)

How does credolab differ from its competitors?

  1. Credolab is the only alternative credit scoring company that uses only privacy-consented, permissioned, depersonalised and anonymised metadata. By accessing only metadata, credolab generates scores and data points that are predictive and calculated responsibly. It is also GDPR, CCPA, LGPD, and POPIA compliant and ISO 27001:2013 certified.
  2. Credolab is the only provider that has a 100% hit rate. As a result, it is the only alternative data provider to generate insights for any incoming application. The data is collected in the form of anonymous metadata (without any PII ever leaving the device) in JSON format and processed to assess, for example, the customer’s probability of defaulting on a payment, detect fraudulent applicants, and enrich customer segmentations.
  3. Credolab provides scores, flags, and insights at the very top of the onboarding funnel and can be used to optimise the waterfall of subsequent verifications (including eKYC and affordability checks).
  4. Credolab delivers alternative risk scores, fraud scores and flags, marketing insights, device velocity checks, approval scores, and intent scores. When integrating credolab’s SDKs into a mobile or web frontend, credolab clients get access to a behaviour-based platform that delivers a wealth of insights via one API.
  5. Credolab is the first to market with over 10,000,000 behavioural features engineered and a best-in-class proprietary data modelling pipeline rooted in Machine Learning algorithms that could not be beaten even by a Yale Professor of Statistics. Features are insights obtained from metadata coming from mobile phones and web behaviours. On mobile, credolab collects permissioned metadata across the following categories: Device Information, Registered Accounts, Contacts, Calendars, Media, Applications (Android only), and User Interface (UI) interactions, including the total time spent applying for a loan or at registration, the time spent in the same position, Latency, and Keystroke patterns. On the web, credolab collects Device and Browser Information, Language and Operating System, and User Interface (UI) interactions, including the total time spent applying for a loan or at registration, the time spent in the same position, Latency, and Keystroke patterns.
  6. Credolab offers easy and flexible implementation for its customers. It can be easily integrated into a Lender's IT Infrastructure via credolab’s flexible API and SDK, decreasing time-to-yes and simplifying the client onboarding process.
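
The idea of a waterfall of verifications, where an inexpensive top-of-funnel score decides whether costlier checks are worth running, can be sketched as follows. The check names, unit costs, thresholds, and applicant fields are all hypothetical and do not represent credolab's API or any lender's actual policy.

```python
def run_waterfall(applicant, checks):
    """Run checks in order and stop at the first failure.

    Returns (approved, cost_spent, steps_run) so the cost of different
    orderings can be compared.
    """
    cost_spent = 0.0
    steps_run = []
    for name, unit_cost, passes in checks:
        cost_spent += unit_cost
        steps_run.append(name)
        if not passes(applicant):
            return False, cost_spent, steps_run
    return True, cost_spent, steps_run


# Hypothetical ordering: cheapest check first, most expensive last.
checks = [
    ("device_score", 0.05, lambda a: a["device_score"] >= 500),
    ("ekyc", 0.50, lambda a: a["id_verified"]),
    ("bureau_pull", 1.00, lambda a: a["bureau_score"] >= 600),
]

risky = {"device_score": 420, "id_verified": True, "bureau_score": 650}
solid = {"device_score": 640, "id_verified": True, "bureau_score": 650}

print(run_waterfall(risky, checks))
print(run_waterfall(solid, checks))
```

With this ordering, an applicant who fails the first check never incurs the cost of the later verifications, while one who passes every check pays for all of them.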

In summary, credolab’s scorecard replaces lengthy manual assessment and simplifies the end-to-end process using real-time scores or insights, keeping the time to yes down to seconds. In addition, using our data improves the predictive power of existing models by reducing data asymmetry between scorable and unscorable populations, with less friction and full compliance with privacy regulations.

The best scoring model combines automation with various data sources, providing a holistic assessment of customer behaviour and allowing flexibility to lenders’ needs. Therefore, credolab provides custom-built solutions that adapt to each lender's loan process, reducing risk and increasing profits in compliance with data privacy.

Interested in learning how our products can help you? Request a free demo, or drop us your questions here.

