client2vec: Generic Clients repository for Banking Applications

-By Leonardo Baldassini, Jose Antonio Rodríguez Serrano
BBVA Data & Analytics


Abstract

client2vec is an internal library designed to rapidly build baselines for banking applications. It applies marginalized stacked denoising autoencoders to current account transactions data to create vector embeddings which represent the behaviors of our clients. These representations can then be used in, and optimized against, a variety of tasks such as client segmentation, profiling and targeting.



Most data analytics and commercial campaigns in retail banking revolve around the concept of behavioral similarity, for instance: studies and campaigns on client retention; product recommendations; web applications where clients can compare their expenses with those of similar people in order to better manage their own finances; data integrity tools. The analytic work behind each of these products normally requires the construction of a set of customer attributes and a model, both typically tailored to the problem of interest. The aim is to systematize this process in order to encourage model and code reuse, reduce project feasibility assessment times and promote homogeneous practices.  

Client2vec is a library that speeds up the construction of informative baselines for behavior-centric banking applications. In particular, client2vec focuses on behaviors which can be extracted from account transactions data by encoding that information into vector form (a client embedding). These embeddings make it possible to quantify how similar two customers are and, when input into clustering or regression algorithms, outperform the sociodemographic customer attributes traditionally used for customer segmentation or marketing campaigns. The proposed solution has minimal computational and preprocessing requirements, so it can run even on simple infrastructures. Client2vec also offers our data scientists the possibility to optimize the embeddings against the business problem at hand. For instance, the embedding may be tuned to optimize the average precision for the task of retrieving suitable targets for a campaign.
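To make the average precision criterion concrete, here is a minimal sketch of how it can be computed for a ranked list of candidate clients. The function name and the toy ranking are our own illustration, not part of client2vec, and we assume every relevant client appears somewhere in the ranked list.

```python
def average_precision(ranked_labels):
    """Average precision of a ranked list of 0/1 relevance labels.

    Assumes all relevant items appear in the list, so the sum of
    precisions at each hit is divided by the number of hits.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_labels, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


# Hypothetical ranking: clients sorted by similarity to known responders,
# 1 = the client turned out to be a suitable target, 0 = not.
ranking = [1, 0, 1, 1, 0]
score = average_precision(ranking)
print(round(score, 4))
```

An embedding that places suitable targets closer to known responders pushes the 1s towards the top of the ranking and therefore raises this score.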


Approach

client2vec follows an analogy with unsupervised word embeddings, whereby account transactions can be seen as words, clients as documents (bags or sequences of words) and the behavior of a client as the summary of a document. Just like word or document embeddings, client embeddings should exhibit the fundamental property that neighboring points in the space of embeddings correspond to clients with similar behaviors.


First approach: extract vector representations of transactions and compose them into client embeddings, much as word embeddings are composed into phrase or document embeddings via averaging or more sophisticated techniques.

Second approach: embed clients directly.

We explored the former option by applying the famed word2vec algorithm to our data and then pooling the embeddings of individual transactions into client representations with a variety of methods. For the latter approach, which is the one currently employed by client2vec, we built client embeddings via a marginalized stacked denoising autoencoder (mSDA). For comparison and benchmarking purposes, we also tested the embedding comprising the raw transactional data of a client and the one produced by sociodemographic variables. Embeddings are then turned into actionable baselines by casting business problems as nearest neighbor regressions. This builds on successful works in computer vision which adopt the principle of the unreasonable effectiveness of data.
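As an illustration of the mSDA idea, the sketch below follows the closed-form layer from the original mSDA formulation: each layer learns a linear map that reconstructs the input from a feature-dropout corruption, marginalized analytically rather than sampled, followed by a tanh nonlinearity. This is not the client2vec implementation; the function names, the noise level, the regularization constant and the toy data are all assumptions made for the example.

```python
import numpy as np


def mda_layer(X, noise=0.5, reg=1e-5):
    """One marginalized denoising autoencoder layer.

    X has shape (n_clients, n_features). Returns the hidden
    representation tanh(W [x; 1]) and the weight matrix W.
    """
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])         # append a bias feature
    S = Xb.T @ Xb                                # scatter matrix, (d+1, d+1)
    q = np.append(np.full(d, 1.0 - noise), 1.0)  # survival prob.; bias never dropped
    Q = S * np.outer(q, q)                       # expected corrupted scatter
    np.fill_diagonal(Q, q * np.diag(S))
    P = S[:d, :] * q                             # expected clean/corrupted cross term
    R = reg * np.eye(d + 1)
    R[-1, -1] = 0.0                              # do not regularize the bias
    # W = P (Q + R)^{-1}; Q + R is symmetric, so solve for W^T instead.
    W = np.linalg.solve(Q + R, P.T).T            # (d, d+1)
    return np.tanh(Xb @ W.T), W


def msda_embed(X, n_layers=3, noise=0.5):
    """Stack layers and concatenate the input with every hidden layer."""
    layers = [X]
    for _ in range(n_layers):
        hidden, _ = mda_layer(layers[-1], noise)
        layers.append(hidden)
    return np.hstack(layers)


# Hypothetical data: 6 clients, transaction counts in 4 categories.
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(6, 4)).astype(float)
emb = msda_embed(X, n_layers=2)
print(emb.shape)
```

Because each layer is obtained by solving a single linear system, no iterative training is needed, which fits the goal of minimal computational requirements.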



Sociodemographic variables

The obvious fundamental benchmark against which we compared all methods is the set of sociodemographic variables: age, gender, income range, postcode, city and province. Such variables are typically considered by banks, retailers and other organizations for purposes like segmentations or campaigns. All of these variables are categorical, even the income, which has been binned into several ranges. As such, we one-hot encode them and then reduce the dimensionality of the resulting vector in order to measure the Euclidean distance between two sociodemographic representations.
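The pipeline just described can be sketched as follows. The text does not say which dimensionality reduction was used, so PCA (via SVD) is our assumption here, and the client records, category values and helper names are invented for the example.

```python
import numpy as np


def one_hot(records, columns):
    """One-hot encode a list of dicts holding categorical values."""
    vocab = [(c, v) for c in columns for v in sorted({r[c] for r in records})]
    index = {key: i for i, key in enumerate(vocab)}
    encoded = np.zeros((len(records), len(vocab)))
    for row, record in enumerate(records):
        for c in columns:
            encoded[row, index[(c, record[c])]] = 1.0
    return encoded


def pca_reduce(M, k):
    """Project the centered rows of M onto their top-k principal directions."""
    centered = M - M.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:k].T


# Hypothetical clients; a subset of the variables, all treated as categorical.
clients = [
    {"age": "30-39", "gender": "F", "income": "mid", "province": "Madrid"},
    {"age": "30-39", "gender": "F", "income": "mid", "province": "Madrid"},
    {"age": "60-69", "gender": "M", "income": "high", "province": "Bizkaia"},
    {"age": "30-39", "gender": "M", "income": "mid", "province": "Madrid"},
]
columns = ["age", "gender", "income", "province"]

encoded = one_hot(clients, columns)
reduced = pca_reduce(encoded, k=2)
# Pairwise Euclidean distances between the reduced representations.
dist = np.linalg.norm(reduced[:, None, :] - reduced[None, :, :], axis=-1)
print(np.round(dist, 3))
```

In this toy example the first two clients are identical and end up at distance zero, while the client who differs in every attribute is farther from them than the one who differs only in gender.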

Raw transactions


Embedding via word2vec

Word2vec is a family of embeddings of words in documents, which represents each word token as a dense vector. These vectors result from the intermediate encoding of a two-layer network trained to reconstruct the linguistic context of each token and exhibit strong semantic properties, e.g. two nearby vectors refer to words that may share the same topic or even be synonyms.
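Once every transaction "word" has a vector, a client embedding can be obtained by pooling the vectors of that client's transactions. The sketch below shows mean and max pooling with hand-written toy vectors; in practice the vectors would come from a trained word2vec model, and the category names, dimensions and helper functions here are purely hypothetical.

```python
import numpy as np

# Hypothetical 3-dimensional vectors standing in for trained word2vec output.
transaction_vectors = {
    "groceries": np.array([0.9, 0.1, 0.0]),
    "fuel": np.array([0.1, 0.8, 0.1]),
    "restaurants": np.array([0.7, 0.2, 0.1]),
    "travel": np.array([0.0, 0.3, 0.9]),
}


def pool_client(transactions, vectors, how="mean"):
    """Compose a client's transaction vectors into one client embedding."""
    stacked = np.stack([vectors[t] for t in transactions])
    if how == "mean":
        return stacked.mean(axis=0)
    if how == "max":
        return stacked.max(axis=0)
    raise ValueError(f"unknown pooling method: {how}")


def cosine(a, b):
    """Cosine similarity between two embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


client_a = pool_client(["groceries", "restaurants", "groceries"], transaction_vectors)
client_b = pool_client(["restaurants", "groceries"], transaction_vectors)
client_c = pool_client(["travel", "fuel", "travel"], transaction_vectors)

print(cosine(client_a, client_b) > cosine(client_a, client_c))
```

Clients with similar spending patterns end up with nearby pooled vectors, which is the neighborhood property we want from a client embedding.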

Model selection

We treat the preprocessing options for mSDAs listed above like hyperparameters to optimize at train time. Likewise, the hyperparameters for the word2vec benchmark are the word-embedding dimension and the context window size [28], while for the raw transaction embeddings we only choose whether to L2-normalize, log-normalize or binarize. The optimization is carried out separately for each use case we consider.
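For the raw transaction embeddings, this selection can be sketched as a small search over the three normalization choices, scoring each one with a nearest neighbor regression. Several things below are our assumptions rather than details from the paper: log-normalization is taken to mean log(1 + x), the score is a leave-one-out mean absolute error, and the data and target are synthetic.

```python
import numpy as np

# Candidate preprocessing choices for raw transaction vectors.
PREPROCESSORS = {
    "l2": lambda X: X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12),
    "log": lambda X: np.log1p(X),             # assumed meaning of log-normalize
    "binary": lambda X: (X > 0).astype(float),
}


def knn_loo_error(E, y, k=3):
    """Leave-one-out mean absolute error of a k-nearest-neighbor regressor."""
    dist = np.linalg.norm(E[:, None, :] - E[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a client is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    predictions = y[neighbors].mean(axis=1)   # average the neighbors' targets
    return float(np.abs(predictions - y).mean())


def select_preprocessing(X, y, k=3):
    """Score every preprocessing option and return the best one."""
    scores = {name: knn_loo_error(f(X), y, k) for name, f in PREPROCESSORS.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic stand-in for one use case: 40 clients, 5 transaction categories.
rng = np.random.default_rng(0)
X = rng.poisson(3.0, size=(40, 5)).astype(float)
y = X[:, 0] / (X.sum(axis=1) + 1.0)           # made-up target variable

best, scores = select_preprocessing(X, y)
print(best, scores)
```

Running the same search with a different target gives a per-use-case choice, mirroring the idea that the optimization is carried out separately for each problem.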


Results


Conclusions

This work is an attempt to develop an internal tool that could catalyze data-driven decision making at BBVA. We described how we worked towards a solution that was simple to use, fast to deploy and integrate into colleagues’ processes, and that required minimal preprocessing. Along the way, we learned that composing transactional embeddings extracted with word2vec into customer embeddings doesn’t always offer acceptable performance, while mSDAs help us capture a good deal of behavioral information. Furthermore, we highlighted how this information can be extracted even from simple, coarse transactional data. We plan to keep expanding the client2vec library by adding new representations as new use cases arise, as well as by proactively exploring algorithms that fit its philosophy of simplicity, such as the nonlinear extension of mSDA or metric learning to further boost the performance of mSDA embeddings in client targeting.

