Entity Resolution Explained


ATTENTION:

This page has been migrated to the Tazama GitHub repository and is now located at:

https://github.com/frmscoe/docs/blob/dev/Knowledge-Articles/Entity-Resolution/Entity-Resolution-Explained.md

This page will no longer be maintained in Confluence.


What is it?

Entity resolution is a process of data disambiguation, which is a fancy way of saying that we want to identify a unique entity out of the sum of available data. To understand entity resolution, we must first define what we mean by an “entity”.

Entity

An entity is a thing that exists in the real world or language (i.e. a noun). An entity can be loosely defined by a grouping of related data that collectively defines or describes an object (a thing). In a database, an entity may comprise a whole table, and is similar to the “entity” in “Entity Relationship Diagram”. An entity may also be described by other things that may themselves be entities (e.g. a person may be associated with a device, or an account, or an address).

In an Actio or Mojaloop ecosystem, examples of entities may include:

  • People / Persons / Individuals (customers, users, employees, etc)

  • Businesses / Non-individuals (business customers, vendors, merchants, DFSPs, Mojaloop Operators, Actio Operators, etc)

  • Accounts

  • Devices (personal computers, tablets, mobile phones, etc)

  • Locations

An entity is described by its data. For example, the sum of data that defines a person could be mapped out as follows:

 

(Effective) Resolution

The resolution part of the entity resolution process aims to determine whether a new (prospective) entity is the same as an entity that has previously been introduced to the ecosystem, or whether it is indeed a brand new entity. Sometimes this process is necessary because there is no explicit identifier for the entity in the system, and the system must then calculate the probability that the entity is known and identifiable.

A resolution process evaluates the uniqueness of an entity based on the combination of its attributes. The more information we have available about an entity, the more accurate the result. Conversely, the less we know about an entity, the less accurate the result.

Imagine you’re travelling to a different country on holiday and you make a friend. He hears you’re from India, and he says: “Hey, I knew someone from India once. Do you know Sunita Singh?” And so you start negotiating: Male or female? How old is she? Which part of India is she from? Maybe you eventually manage to narrow it down and discover how small the world really is. But probably not.

The amount and variety of information about a person is essential to the resolution process, but it is also important to evaluate information that is itself a strong indicator of identity, such as a government-issued identification document that contains a unique identification number. This way, if you know the country that someone hails from, and you have their government-issued ID number, then you can be fairly certain of your ability to uniquely identify a person. (It’s still not a guarantee though: South Africa identified 517,249 duplicated identification numbers in 2020.)

The most complex part of the resolution process is to define the strength of the match between a prospective entity and an existing entity.

The type of information related to an entity can be ranked in order of priority, and given a weighting. For example, a government-issued identification number is considered an ultra-slow-to-almost-never-changing dimension over time and is usually a high-priority attribute with a high weighting. An address, which is a slow-changing dimension over time, will have a lower priority weighting. E-mail addresses or telephone numbers are transient attributes and in certain environments may have a very low priority rating.
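
As a minimal sketch of this weighting idea, the following Python gives each attribute a weight and scores a prospective entity against a known one as the weighted share of comparable attributes that agree. The attribute names and the weight values are illustrative assumptions, not values prescribed by Actio or Mojaloop.

```python
# Illustrative weights only (assumed values): stable attributes count for more.
WEIGHTS = {
    "government_id": 0.6,  # almost never changes
    "address": 0.25,       # slow-changing
    "email": 0.1,          # transient
    "phone": 0.05,         # transient
}


def match_score(prospect: dict, known: dict, weights: dict = WEIGHTS) -> float:
    """Return a score in [0, 1]: the weighted share of comparable attributes that agree.

    Attributes missing from either record are skipped, so an unknown value
    neither helps nor hurts the score.
    """
    compared = 0.0
    agreed = 0.0
    for attribute, weight in weights.items():
        a, b = prospect.get(attribute), known.get(attribute)
        if a is None or b is None:
            continue
        compared += weight
        if a == b:
            agreed += weight
    return agreed / compared if compared else 0.0
```

With these assumed weights, two records that agree only on a government ID still score well above two records that agree only on an e-mail address, which mirrors the priority ordering described above.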

Some combinations of attributes are expected to have self-referential integrity. A person’s name by itself may have a medium weighting, and a person’s name associated with a government identification document would be expected to create a strong pairing. However, very few people call themselves by their full given name all of the time, and people sometimes change their names, for instance if they marry and adopt the name of their partner. And then there are nicknames. Names are generally a poor choice as a driver for entity resolution, and it is essential to pair names with additional, higher-priority information.
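
To illustrate why names alone are a weak driver, the sketch below normalises names (case, accents, whitespace) before comparing them. The normalisation rules and function names are assumptions for illustration. Even with this clean-up, a nickname or a married name will not match the registered name, which is why a name needs to be paired with a higher-priority attribute.

```python
import unicodedata


def normalise_name(name: str) -> str:
    """Lower-case, strip accents and collapse whitespace (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.casefold().split())


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalisation; nicknames are NOT resolved."""
    return normalise_name(a) == normalise_name(b)
```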

Effective entity resolution relies on what could be called the “big five” attributes of entity resolution:

 

Why do we need it?

The prevention of fraud and money laundering requires us to be able to explicitly identify the person behind a transaction, and not just the account from or to which the transaction was performed. It is a common money-laundering method for a criminal to open multiple accounts with different financial institutions. Financial institutions do not have the means to identify the banking facilities held by another institution and much fraud and money laundering is perpetrated in this blind-spot between institutions.

A switching hub such as Mojaloop is largely concerned with the flow of funds from account to account, and less with the persons operating those accounts. The identity of the account-holders, including the processes to onboard customers and verify their identities, is assumed to be the responsibility of the associated DFSPs. Sometimes, account-holder information may be shown to a payer to verify that they are paying the right payee, though this transparency may depend on applicable privacy legislation.

As Actio, we would need to be able to explicitly identify the persons operating an account in order to increase the likelihood of detecting fraud and money-laundering behaviour through transaction monitoring. For example, if we need to review the transaction history for a person so that we can model patterns of behaviour, we need to be able to identify that person beyond a doubt, otherwise the modelling will be flawed and the detection will be compromised.

Entity Resolution Data in Mojaloop

Analysis of Mojaloop message fields

The table below lists the data fields available in the Mojaloop messages that could be used to support an entity resolution process for the payer or payee in a transaction.

| Field Name | Description | POST /quotes | PUT /quotes | POST /transfers | PUT /transfers | Comments |
|---|---|---|---|---|---|---|
| payee | Information about the Payee in the proposed financial transaction. | Y | | | | This field and its sub-fields define a container structure for information about the payee. |
| payee.partyIdInfo | Party Id type, id, sub ID or type, and FSP Id. | Y | | | | This group of fields defines a container structure for information about the payee. The set of information aims to uniquely identify a payee within the payee DFSP network. The Mojaloop API provides for a number of different ID Types (MSISDN, EMAIL, PERSONAL_ID, etc.) but it is important to note that this information is intended to assist the ecosystem in identifying the account, and not necessarily the account-holder. In all uses, this information is assumed to be unique within the context of a DFSP, and with possible intrinsic meaning only to that DFSP. Where the partyIdType is PERSONAL_ID, the partySubIdOrType field contains additional descriptors for the identification information (e.g. PASSPORT, NATIONAL_ID_CARD, etc.). Depending on which ID Types are in use, this information is essential in resolving the identity of the payee, or at the very least the payee account information. It is also possible, given the way that the Account Lookup Service (ALS) functions, that the same account may be identified through vastly different sets of partyIdType information, e.g. an MSISDN vs an EMAIL may resolve to the same payee account. This decision and information is not transparent even to Mojaloop: a DFSP will confirm that the target payee can be reached on the DFSP network using the credentials provided, but nothing more. This information is initially supplied by the Payer in the POST /quotes message to specify the destination (Payee) of the transaction. While a payee may be identified by a number of different attributes, the Mojaloop API only provides for the specification of a single attribute per transaction, which complicates the entity resolution process somewhat. If additional information is deemed essential to the entity resolution process, it would have to be provided in the extension list. {OPEN QUESTION} What happens if a DFSP has multiple facilities for the same payee? |
| payee.partyIdInfo.partyIdType | The type of the identifier. | Y | | | | |
| payee.partyIdInfo.partyIdentifier | An identifier for the Party. | Y | | | | |
| payee.partyIdInfo.partySubIdOrType | A sub-identifier or sub-type for the Party. | Y | | | | |
| payee.partyIdInfo.fspId | The FSP identifier. | Y | | Y | | This field defines the DFSP that says that it can reach an account that is linked to the partyIdInfo information provided. The DFSP provides necessary context for the identity of the payee and is essential in entity resolution. |
| payee.partyIdInfo.extensionList | Optional list of extensions to the payee information, specific to deployment. | Y | | | | While this data structure does not explicitly contain information that identifies the payee entity, additional information may be provided in the extension list for this purpose. The range of additional information is defined by the scheme rules under which a Mojaloop switch operates. |
| payee.partyIdInfo.extensionList.extension | An optional extension element defined by a key-value pair. | Y | | | | |
| payee.partyIdInfo.extensionList.extension.key | The key for an optional extension element value. | Y | | | | |
| payee.partyIdInfo.extensionList.extension.value | The value for an optional extension element key. | Y | | | | |
| payee.name | The name of the party, could be a real name or a nickname. | Y | | | | This information has limited utility since it is largely scheme-specific. It is as likely that this field will contain an alias to describe the account as it is to have the name of the payee; however, since the use of this field may be prescribed by the Mojaloop implementation scheme rules, it may be used to contain information useful for entity resolution. {OPEN QUESTION} Which of the 4 messages contains this information? |
| payee.personalInfo | Personal information used to verify identity of Party such as first, middle, last name and date of birth. | Y | | | | This field and its sub-fields define a container structure for personal information about the payee. |
| payee.personalInfo.complexName | First, middle and last name. | Y | | | | This group of fields is expected to contain the name of the payee and is essential in resolving the identity of the payee. {OPEN QUESTION} Which of the 4 messages contains this information? |
| payee.personalInfo.complexName.firstName | First name | Y | | | | |
| payee.personalInfo.complexName.middleName | Middle name | Y | | | | |
| payee.personalInfo.complexName.lastName | Last name | Y | | | | |
| payee.personalInfo.dateOfBirth | Date of birth | Y | | | | The date of birth is an essential additional component to resolve the identity of a person. {OPEN QUESTION} Which of the 4 messages contains this information? |
| payer | Information about the Payer in the proposed financial transaction. | Y | | | | This field and its sub-fields define a container structure for information about the payer. |
| payer.partyIdInfo | Party Id type, id, sub ID or type, and FSP Id. | Y | | | | This group of fields defines a container structure for information about the payer. The set of information aims to uniquely identify a payer within the payer DFSP network. The Mojaloop API provides for a number of different ID Types (MSISDN, EMAIL, PERSONAL_ID, etc.) but it is important to note that this information is intended to assist the ecosystem in identifying the account, and not necessarily the account-holder. In all uses, this information is assumed to be unique within the context of a DFSP, and with possible intrinsic meaning only to that DFSP. Where the partyIdType is PERSONAL_ID, the partySubIdOrType field contains additional descriptors for the identification information (e.g. PASSPORT, NATIONAL_ID_CARD, etc.). Depending on which ID Types are in use, this information is essential in resolving the identity of the payer, or at the very least the payer account information. It is expected that the payer that is associated with a DFSP is usually identified in a standardised way from one transaction to the next by that payer’s DFSP, based on the DFSP’s preferred identification method, as well as, possibly, scheme rule directives; however, it is also possible, given the way that the Account Lookup Service (ALS) functions, that the payer’s account may be identified through different sets of partyIdType information when the payer becomes a payee. While a payer may be identified by a number of different attributes, the Mojaloop API only provides for the specification of a single attribute per transaction, which complicates the entity resolution process somewhat. If additional information is deemed essential to the entity resolution process, it would have to be provided in the extension list. |
| payer.partyIdInfo.partyIdType | The type of the identifier. | Y | | | | |
| payer.partyIdInfo.partyIdentifier | An identifier for the Party. | Y | | | | |
| payer.partyIdInfo.partySubIdOrType | A sub-identifier or sub-type for the Party. | Y | | | | |
| payer.partyIdInfo.fspId | The FSP identifier. | Y | | Y | | This field defines the DFSP that hosts the account of the payer identified by the partyIdInfo information provided. The DFSP provides necessary context for the identity of the payer and is essential in entity resolution. |
| payer.partyIdInfo.extensionList | Optional list of extensions to the payer information, specific to deployment. | Y | | | | While this data structure does not explicitly contain information that identifies the payer entity, additional information may be provided in the extension list for this purpose. The range of additional information is defined by the scheme rules under which a Mojaloop switch operates. |
| payer.partyIdInfo.extensionList.extension | An optional extension element defined by a key-value pair. | Y | | | | |
| payer.partyIdInfo.extensionList.extension.key | The key for an optional extension element value. | Y | | | | |
| payer.partyIdInfo.extensionList.extension.value | The value for an optional extension element key. | Y | | | | |
| payer.name | The name of the party, could be a real name or a nickname. | Y | | | | This information has limited utility since it is largely scheme-specific. It is as likely that this field will contain an alias to describe the account as it is to have the name of the payer; however, since the use of this field may be prescribed by the Mojaloop implementation scheme rules, it may be used to contain information useful for entity resolution. This information is expected in the POST /quotes message. |
| payer.personalInfo | Personal information used to verify identity of Party such as first, middle, last name and date of birth. | Y | | | | This field and its sub-fields define a container structure for personal information about the payer. |
| payer.personalInfo.complexName | First, middle and last name. | Y | | | | This group of fields is expected to contain the name of the payer and is essential in resolving the identity of the payer. |
| payer.personalInfo.complexName.firstName | First name | Y | | | | |
| payer.personalInfo.complexName.middleName | Middle name | Y | | | | |
| payer.personalInfo.complexName.lastName | Last name | Y | | | | |
| payer.personalInfo.dateOfBirth | Date of birth | Y | | | | The date of birth is an essential additional component to resolve the identity of a person. |
| geoCode | Longitude and Latitude of the initiating Party. Can be used to detect fraud. | Y | Y | | | While this information does not necessarily resolve to a specific address that is related to a person, this location information can be used to differentiate between two persons who share significant ambiguous attributes between them. For example, a person who (usually or exclusively) transacts from one town or country, as opposed to a person with similar information who transacts from another town or country. This information is to be used to resolve an ambiguous profile via inspection by a user, and not as part of the automated entity resolution process, due to a higher level of inference. |
| geoCode.latitude | The Latitude of the service initiating Party. | Y | Y | | | |
| geoCode.longitude | The Longitude of the service initiating Party. | Y | Y | | | |

Ref: ISO20022 and Actio | Mojaloop to ISO 20022 mapping

Ref: https://docs.mojaloop.io/api-snippets/?urls.primaryName=v1.1

In Mojaloop, the information contained in the partyIdInfo data structure is used to identify the account in the ecosystem from or to which a transfer is performed. The identification information by itself may not necessarily be unique: for example, a personal identification number may be duplicated between sovereign territories, so you would also need to include the issuer of an identification number to ensure that it is unique worldwide. A simple account number may also be duplicated between DFSPs, so you would need to define the account number within the context of the DFSP as well.
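
One way to express this scoping, offered as a sketch rather than a Mojaloop-defined construct, is a composite key that always carries the DFSP context alongside the raw identifier. The `PartyKey` type and `party_key` function below are hypothetical; the dictionary keys follow the partyIdInfo field names listed in the table above, and the sketch assumes that fspId is present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartyKey:
    """Hypothetical composite key: an identifier is only meaningful within its DFSP."""
    fsp_id: str
    party_id_type: str
    party_identifier: str
    party_sub_id_or_type: Optional[str] = None


def party_key(party_id_info: dict) -> PartyKey:
    """Build the key from a partyIdInfo structure (raises KeyError if a required field is absent)."""
    return PartyKey(
        fsp_id=party_id_info["fspId"],
        party_id_type=party_id_info["partyIdType"],
        party_identifier=party_id_info["partyIdentifier"],
        party_sub_id_or_type=party_id_info.get("partySubIdOrType"),
    )
```

Because the key is frozen and hashable, the same account number held at two different DFSPs produces two distinct keys and can be stored in a set or used as a dictionary key without colliding.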

In addition to the partyIdInfo information, Mojaloop is also able to transmit the account-holder’s first name, middle name and last name, but this information by itself is also not necessarily unique. One John No-middle-name Smith is indistinguishable from another John No-middle-name Smith.

The last piece of information that Mojaloop offers is the person’s date of birth, which further narrows down the explicit identity of the account holder. It is less likely that two John Smiths with the same partyIdInfo and the same date of birth are two different people.

But it is not impossible.
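
As an illustration of pulling these three groups of information together, the sketch below reads the party attributes from a decoded POST /quotes body. The field paths follow the table above; the function names and the flat output keys are assumptions for illustration.

```python
from typing import Any, Optional


def dig(data: Any, *path: str) -> Optional[Any]:
    """Follow a path of keys through nested dicts; return None if any key is absent."""
    current = data
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def resolution_attributes(quote: dict, role: str) -> dict:
    """Collect the entity resolution fields for `role` ("payer" or "payee")."""
    party = quote.get(role, {})
    return {
        "fspId": dig(party, "partyIdInfo", "fspId"),
        "partyIdType": dig(party, "partyIdInfo", "partyIdType"),
        "partyIdentifier": dig(party, "partyIdInfo", "partyIdentifier"),
        "partySubIdOrType": dig(party, "partyIdInfo", "partySubIdOrType"),
        "firstName": dig(party, "personalInfo", "complexName", "firstName"),
        "middleName": dig(party, "personalInfo", "complexName", "middleName"),
        "lastName": dig(party, "personalInfo", "complexName", "lastName"),
        "dateOfBirth": dig(party, "personalInfo", "dateOfBirth"),
    }
```

Fields that a DFSP does not supply come back as None, so a later matching step can treat them as unknown rather than as a mismatch.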

The information available out of Mojaloop can be illustrated in the following diagram:

 

Values in the PartyIdType are:

  • MSISDN - An MSISDN (Mobile Station International Subscriber Directory Number, that is, the phone number) is used as reference to a participant. The MSISDN identifier should be in international format according to the ITU-T E.164 standard. Optionally, the MSISDN may be prefixed by a single plus sign, indicating the international prefix.

  • EMAIL - An email is used as reference to a participant. The format of the email should be according to the informational RFC 3696.

  • PERSONAL_ID - A personal identifier is used as reference to a participant. Examples of personal identification are passport number, birth certificate number, and national registration number. The identifier number is added in the PartyIdentifier element. The personal identifier type is added in the PartySubIdOrType element.

  • BUSINESS - A specific Business (for example, an organization or a company) is used as reference to a participant. The BUSINESS identifier can be in any format. To make a transaction connected to a specific username or bill number in a Business, the PartySubIdOrType element should be used.

  • DEVICE - A specific device (for example, a POS or ATM) ID connected to a specific business or organization is used as reference to a Party. For referencing a specific device under a specific business or organization, use the PartySubIdOrType element.

  • ACCOUNT_ID - A bank account number or FSP account ID should be used as reference to a participant. The ACCOUNT_ID identifier can be in any format, as formats can greatly differ depending on country and FSP.

  • IBAN - A bank account number or FSP account ID is used as reference to a participant. The IBAN identifier can consist of up to 34 alphanumeric characters and should be entered without whitespace.

  • ALIAS - An alias is used as reference to a participant. The alias should be created in the FSP as an alternative reference to an account owner. Another example of an alias is a username in the FSP system. The ALIAS identifier can be in any format. It is also possible to use the PartySubIdOrType element for identifying an account under an Alias defined by the PartyIdentifier.
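
Some of the format constraints in this list can be checked mechanically. The sketch below covers only the two types for which the list states a fixed shape: MSISDN (E.164 digits with an optional leading plus sign) and IBAN (up to 34 alphanumeric characters, no whitespace). The 15-digit ceiling comes from E.164 itself; the function name is hypothetical, and every other type is accepted as long as it is non-empty, since the list describes those types as free-format or gives no shape.

```python
import re

# E.164: optional "+", then at most 15 digits, the first of which is not 0.
MSISDN_PATTERN = re.compile(r"\+?[1-9]\d{1,14}")
# IBAN: up to 34 alphanumeric characters, no whitespace.
# Country-specific structure and check digits are NOT verified here.
IBAN_PATTERN = re.compile(r"[A-Za-z0-9]{1,34}")


def identifier_looks_valid(party_id_type: str, party_identifier: str) -> bool:
    """Shape check only; it says nothing about whether the account exists."""
    if party_id_type == "MSISDN":
        return MSISDN_PATTERN.fullmatch(party_identifier) is not None
    if party_id_type == "IBAN":
        return IBAN_PATTERN.fullmatch(party_identifier) is not None
    return bool(party_identifier)
```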

Values under the PERSONAL_ID from PartySubIdOrType (PersonalIdentifierType) are:

  • PASSPORT - A passport number is used as reference to a Party.

  • NATIONAL_REGISTRATION - A national registration number is used as reference to a Party.

  • DRIVING_LICENSE - A driving license is used as reference to a Party.

  • ALIEN_REGISTRATION - An alien registration number is used as reference to a Party.

  • NATIONAL_ID_CARD - A national ID card number is used as reference to a Party.

  • EMPLOYER_ID - A tax identification number is used as reference to a Party.

  • TAX_ID_NUMBER - A tax identification number is used as reference to a Party.

  • SENIOR_CITIZENS_CARD - A senior citizens card number is used as reference to a Party.

  • MARRIAGE_CERTIFICATE - A marriage certificate number is used as reference to a Party.

  • HEALTH_CARD - A health card number is used as reference to a Party.

  • VOTERS_ID - A voter’s identification number is used as reference to a Party.

  • UNITED_NATIONS - A UN (United Nations) number is used as reference to a Party.
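
The two value lists combine as follows: when the partyIdType is PERSONAL_ID, the partySubIdOrType is expected to name one of the personal identifier types above. A minimal consistency check, with a hypothetical function name and the values copied from the list above, might look like this:

```python
from typing import Optional

PERSONAL_IDENTIFIER_TYPES = frozenset({
    "PASSPORT", "NATIONAL_REGISTRATION", "DRIVING_LICENSE", "ALIEN_REGISTRATION",
    "NATIONAL_ID_CARD", "EMPLOYER_ID", "TAX_ID_NUMBER", "SENIOR_CITIZENS_CARD",
    "MARRIAGE_CERTIFICATE", "HEALTH_CARD", "VOTERS_ID", "UNITED_NATIONS",
})


def sub_type_is_consistent(party_id_type: str, party_sub_id_or_type: Optional[str]) -> bool:
    """For PERSONAL_ID, require a known personal identifier type; other types are not checked."""
    if party_id_type != "PERSONAL_ID":
        return True
    return party_sub_id_or_type in PERSONAL_IDENTIFIER_TYPES
```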