
Which Channel to Ask My Question?

 

Personalized Customer Service Request Stream Routing using Deep Reinforcement Learning


-By ZINING LIU, CHONG LONG, XIAOLU LU, ZEHONG HU, JIE ZHANG, YAFANG WANG

Paper Link



Abstract

Customer services are critical to all companies, as they may directly connect to the brand reputation. Due to a great number of customers, e-commerce companies often employ multiple communication channels to answer customers’ questions, for example, chatbot and hotline. On one hand, each channel has limited capacity to respond to customers’ requests; on the other hand, customers have different preferences over these channels. The current production systems are mainly built on business rules, which hardly consider the tradeoff between resources and customers’ satisfaction. To achieve the optimal tradeoff between resources and customers’ satisfaction, we propose a new framework based on deep reinforcement learning, which directly takes both resources and user model into account. In addition to the framework, we also propose a new deep-reinforcement-learning-based routing method: double duelling deep Q-learning with prioritized experience replay (PER-DoDDQN).


The quality of customer service is crucial to a company’s reputation. It is measured by how quickly a company responds to customers’ requests and how satisfied customers are when seeking help. Customer satisfaction cannot be measured by problem-solving quality alone, so in practice companies also use customers’ queuing time as an indicator of satisfaction. To shorten waiting time, major companies often provide multiple communication channels for customers to choose from, for example a mobile app, web-based messaging and the traditional hotline. Each channel has its own limited quota for responding to requests, and each imposes a different cognitive load on the customer. Customers are impatient, especially when they have a request to be resolved, and they want to express their need with the least effort.

The proposed framework is in sharp contrast to the rule-based system, as it directly captures (i) customers’ preferences and (ii) each channel’s future traffic. Customers have different preferences over different communication channels, and a simple routing method may result in low satisfaction. For example, if the problem is not urgent, a student may be willing to leave a message at the offline service desk when the hotline is busy, whereas someone who distrusts the chatbot may stick to the hotline regardless of the waiting time. Recommending each customer a preferable substitute channel is essential, as it affects the overall customer experience.

Another key to solving the allocation problem is being able to predict each channel’s traffic over the next time window. Request streams often arrive at high speed, especially during peak hours and at large corporations, and a system can fail badly if it ignores the time-series nature of the request stream. For example, if the prediction suggests that the hotline will not be busy in the next time window, customers who prefer the hotline can be served there; if the prediction shows that the hotline’s capacity will be exceeded in the next time window, the system may try to route customers’ requests to alternative channels where possible. Hence, the framework proposed in the paper seeks a tradeoff among channel capacity, user preference and the predicted traffic of each channel.


The contributions are threefold:

1) The authors model the customer service request routing problem using deep reinforcement learning, considering both channel resources and customers’ satisfaction.

2) They propose the double duelling deep Q-learning with prioritized experience replay method to solve the routing problem, which achieves better performance than its counterparts in practice.

3) They perform an extensive evaluation using both real and synthetic data to demonstrate the practical value of the proposed methods.


Application Background

Modern businesses often provide several communication channels for customers’ convenience, ranging from the traditional call center to online chatbots and mobile apps. Besides the traditional call center (or hotline), the other channels may be served by a mixture of automatic chatbots and human customer representatives. The hotline has the least capacity to deal with customers’ requests, yet it is the channel most customers prefer because of its low communication cost. Other channels face a similar trade-off. For example, it is easier for users to open a mobile app to issue a customer service request, but describing an information need precisely requires a lot of cognitive effort, and the interaction is often less efficient than on the hotline. Although the mobile app is not preferred most of the time, customers may resolve simple requests more efficiently there by selecting or browsing pre-selected questions. Similarly, the web interface requires customers to type their requests precisely, which may discourage them from using this channel. However, unlike the mobile app and the hotline, the web interface lets customers interact with each other, which may reduce the workload of customer service representatives to some extent. When a large number of customers make requests to one channel, the channel’s capacity may be exceeded and customers have to wait a long time. In this case, most customer service platforms redirect users to other channels, but users are quite likely to reject such a recommendation.







Baseline Customer Service Routing System

As a baseline for the proposed RL-based algorithm, the authors consider a real customer service routing system, shown in Figure 1. It is a production system adopted by a large financial technology company and is designed around business rules. The system considers two communication channels: self-service, which lets customers solve simple requests on their own, and the hotline, the traditional call-based customer service. The system works as follows. When a customer calls in, the system always asks whether the customer would like to use self-service, since the hotline may be congested and the customer’s question may be simple. If the customer agrees, he or she is routed to the self-service channel; otherwise, the system makes a decision based on the current queue length of the hotline channel. A customer is always directed to the hotline when nobody is waiting in the queue. When the hotline is busy and there is a queue, the system randomly selects a set of customers and asks whether they are willing to switch to the drainage channel, an alternative channel that takes load off the hotline. The number of customers asked for a second preference depends on the current length of the hotline queue: the longer the queue, the more customers are selected.
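The rule-based flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the production system: the function names, the per-customer asking probability, the `saturation` parameter and the choice to keep a declining customer in the hotline queue are all hypothetical, since the text only says that more customers are asked when the queue is longer.

```python
import random


def linear_ask_probability(queue_len, saturation=20):
    """Hypothetical rule: the share of customers asked to switch grows
    linearly with the hotline queue and reaches 1.0 at `saturation`."""
    return min(1.0, queue_len / saturation)


def baseline_route(accepts_self_service, hotline_queue_len,
                   accepts_switch, ask_probability=linear_ask_probability,
                   rng=random):
    """Route one customer with the business rules described above.

    Returns 'self_service', 'hotline' or 'drainage'.
    """
    # Step 1: every caller is first offered self-service.
    if accepts_self_service:
        return "self_service"
    # Step 2: an empty hotline queue means the customer goes straight through.
    if hotline_queue_len == 0:
        return "hotline"
    # Step 3: with a queue, a random subset is asked to switch channel.
    # Assumption: the "set of customers" is modelled as an independent
    # per-customer draw with a probability that grows with queue length.
    asked = rng.random() < ask_probability(hotline_queue_len)
    if asked and accepts_switch:
        return "drainage"
    # Assumption: customers who are not asked, or who decline, stay queued.
    return "hotline"
```

Note that the rules look only at the current queue length. They use neither a per-customer preference model nor a traffic forecast, which are the two ingredients the RL framework adds.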


The Routing Model

The overall routing model is based on deep reinforcement learning, more specifically on DQN variants. These methods use a deep neural network with parameters θ to approximate qθ(s, a), the value of taking action a in state s.
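Two of the DQN variants named in the method, the dueling architecture and double Q-learning, can be illustrated with plain Python lists. The sketch below uses the standard formulations from the reinforcement learning literature (mean-subtracted advantages for the dueling head, and action selection by the online network with evaluation by the target network for the double-DQN target). Whether the paper uses exactly these forms is an assumption, and the function names are illustrative.

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar state value V(s) with per-action advantages A(s, a).

    Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean(A(s, .)).
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + adv - mean_advantage for adv in advantages]


def double_dqn_target(reward, next_q_online, next_q_target,
                      gamma=0.99, done=False):
    """Compute the double-DQN learning target for one transition.

    The online network picks the best next action; the target network
    supplies the value of that action, which reduces overestimation.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)),
                      key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

For example, with `next_q_online = [0.2, 0.9]` and `next_q_target = [5.0, 3.0]`, the online network prefers the second action, so the target uses 3.0 rather than the larger 5.0 that a plain DQN maximum over the target network would pick.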


The authors first describe how they formulate the three key elements of any reinforcement learning model (action, state and reward function); the proposed PER-DoDDQN itself is given as Algorithm 1 in the paper.

Action. The system aims to learn a policy that determines which channel should be recommended to a user, so the action is to select one channel among n candidates, where n is the total number of channels in the real application. The action is represented as a, with a ∈ {1, 2, 3, . . . , n}.

State. A state is expected to express the customer’s channel preference and each channel’s ability to handle further requests. The state therefore includes the customer’s preferences over the different channels u, the channel capacity c and the channel’s predicted future request traffic êt. The entire state is represented as s = ⟨u, êt, c⟩.

Reward Function. Two aspects of the reward need to be considered: the customers’ perspective and the channel capacity. The reward function is defined from these two aspects.
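The state and action definitions above can be made concrete with a small Python sketch. The class and function names are hypothetical, and representing u, êt and c as one number per channel is an assumption made for illustration. The reward is deliberately left out, since its exact form is not given in this text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoutingState:
    """State s = <u, e_hat_t, c> for one incoming request (illustrative)."""
    preferences: List[float]        # u: customer's preference for each channel
    predicted_traffic: List[float]  # e_hat_t: forecast requests per channel
    capacity: List[float]           # c: capacity of each channel

    def to_vector(self) -> List[float]:
        """Flatten the three components into one network input vector."""
        return self.preferences + self.predicted_traffic + self.capacity


def greedy_channel(q_values: List[float]) -> int:
    """Pick the action a in {1, ..., n} with the highest estimated value."""
    best_index = max(range(len(q_values)), key=lambda i: q_values[i])
    return best_index + 1  # channels are numbered from 1 in the text


# Usage: two channels, with made-up numbers.
state = RoutingState(preferences=[0.7, 0.3],
                     predicted_traffic=[120.0, 40.0],
                     capacity=[100.0, 500.0])
network_input = state.to_vector()
```

In a full agent, `network_input` would be fed to the Q-network and `greedy_channel` applied to its output, typically mixed with some exploration during training.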






Results on Synthetic Data

The authors report evaluation results on the synthetic dataset in Table 2 and Figure 5. Because the hotline channel is the main concern in the real production system, they first focus on the absolute congestion percentage of this channel, shown in Figure 5. Figure 5(a) compares the proposed PER-DoDDQN with variants that drop individual components. The environment model is crucial for finding the optimal routing plan: PER-DoDDQN-e, which has no flow estimation, is consistently worse than the others. Removing user modeling also hurts performance, but only slightly on the synthetic data. Figure 5(b) compares the method with other DQN variants. The trend is clear: the proposed PER-DoDDQN is the best among all of them. Prioritized experience replay also plays an important role in the model, and performance degrades considerably without it.

Table 2 gives the overall evaluation using the proposed metrics. Among all evaluation metrics, CCR is the most important one, and the proposed PER-DoDDQN is the best across all configurations; the same trend holds for AC. DoDDQN performs better on peak congestion, and otherwise achieves similar but slightly worse performance than its PER variant. A simple DQN scores best if only drainage percentage or average free rate is considered; however, this simply reflects a non-optimal routing plan. Further, the results for DoDDQN and DDQN suggest that the proposed PER-DoDDQN is better than using any of its components alone.

The RL models are also compared with standard heuristic algorithms, including simulated annealing (SA), CNN and the baseline (rule-based system), which can be viewed as heuristic methods. As shown in Figure 6 and Table 3, the traditional machine learning methods (SA, SVM, KNN, CNN) perform worse than the RL algorithms, especially on the metrics CCR, AC and PC, because these models do not consider long-term gains.

The RL framework takes into account not just the immediate reward but also the future impact of the selected action, whereas the other ML models recommend channels based solely on the current state. Note that the average idle degree of the service staff (AFR) can be very low when customers are likely to be assigned to the hotline regardless of channel capacity. The percentages of customers who accept switch-to-self-service suggestions (SP) and switch-to-app suggestions (DP) are independent of the future state. Supervised machine learning methods only look at the current state, so they can perform similarly to, or even better than, the RL models on one or two of these three metrics, but they do not beat the RL algorithms across all metrics in general.
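Since the results above single out prioritized experience replay as an important component, a short Python sketch of its sampling rule may help. It follows the commonly used proportional formulation, P(i) = p_i^α / Σ_k p_k^α, with importance-sampling weights w_i = (N · P(i))^(−β) normalized by their maximum. The hyperparameter values and function names are illustrative assumptions, not values reported for the paper.

```python
import random


def per_probabilities(priorities, alpha=0.6):
    """Turn positive transition priorities into sampling probabilities.

    alpha = 0 gives uniform sampling; alpha = 1 samples in direct
    proportion to priority (typically |TD error| plus a small constant).
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probabilities, beta=0.4):
    """Importance-sampling weights that correct the non-uniform sampling.

    Weights are divided by the largest weight so they only scale
    updates downwards.
    """
    n = len(probabilities)
    raw = [(n * p) ** (-beta) for p in probabilities]
    largest = max(raw)
    return [w / largest for w in raw]


def sample_indices(priorities, batch_size, alpha=0.6, rng=random):
    """Draw a minibatch of replay-buffer indices according to priority."""
    probs = per_probabilities(priorities, alpha)
    return rng.choices(range(len(priorities)), weights=probs, k=batch_size)
```

Transitions with a large temporal-difference error are replayed more often, while the importance weights shrink their updates so that the learned values are not biased by the skewed sampling.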


Conclusion

The authors formulate the classic customer request routing problem as an optimization problem that considers both channel resources and customers’ satisfaction. To address the real-world problem, they propose a novel framework based on deep reinforcement learning. In addition to the framework, they propose a new routing method that combines the DDQN and DoDQN methods. Extensive experiments on both real and synthetic data show that the proposed framework greatly improves on the existing system and that the proposed PER-DoDDQN method is the best configuration. For future work, the authors plan to improve the method in the following directions: (i) improve user profiling by understanding users’ descriptions of their requests, instead of considering attributes alone; (ii) incorporate real-time features into the PER-DoDDQN model for a better model of the environment; (iii) generalize the model to more routing and dispatching problems.


