# A Fraud Resilient Medical Insurance Claim System

Yuliang Shi¹, Chenfei Sun¹, Qingzhong Li¹, Lizhen Cui¹, Han Yu², Chunyan Miao²

¹School of Computer Science and Technology, Shandong University, Jinan, China
²Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly, Nanyang Technological University, Singapore

shiyuliang@sdu.edu.cn, sun.chenfei@163.com, {lqz, clz}@sdu.edu.cn, {han.yu, ascymiao}@ntu.edu.sg

As many countries start to experience population aging, an increasing number of people rely on medical insurance to access healthcare resources. Medical insurance fraud causes billions of dollars in losses for public healthcare funds. Detecting such fraud is an important and difficult challenge for the artificial intelligence (AI) research community. This paper outlines HFDA, a hybrid AI approach for effectively and efficiently identifying fraudulent medical insurance claims, which has been tested in an online medical insurance claim system in China.

## Introduction

Medical insurance fraud causes billions of dollars in losses for public healthcare funds around the world. According to estimates by the Federal Bureau of Investigation (FBI), healthcare fraud costs American taxpayers over US$80 billion a year (Aldrich, Crowder, and Benson 2014). Detecting medical insurance fraud is an important and difficult challenge. As more medical insurance claims are filed and processed online, claimants' behavior trajectory big data can be tracked during the claim process. However, because of the complex granularity of the data, existing fraud detection approaches have difficulty recalling fraudulent claim behaviors (Musal 2010; Liu et al. 2015). Traditional fraud detection techniques rely on rules designed by experts and identify fraudulent behaviors by assessing whether any of these rules have been violated (Ngai et al. 2011).
As medical insurance claim activities move online, data-driven approaches to medical insurance fraud detection have become a distinct possibility. The combination of behavior trajectory big data and machine learning techniques offers promising solutions to the medical insurance fraud problem. Human behaviors have two main attributes: 1) category and 2) frequency. Existing intelligent medical insurance fraud detection methods focus on detecting either abnormal categories of behaviors or abnormal frequencies of behaviors (Phua et al. 2010). The accuracy of these methods is often affected by the complex granularity of behaviors. To address this problem, we outline the hybrid fraud detection approach (HFDA) system, which has been incorporated into the Dareway Medical Insurance Claim System in China. Through the proposed Semi-Supervised Isomap (SSIsomap) behavior clustering method, the Simple Local Outlier Factor (SimLOF) outlier detection method, and the evidence aggregation method based on Dempster's Rule of Combination (Shafer 1976), HFDA can detect abnormal categories and abnormal frequencies of behaviors simultaneously to help guard against medical insurance fraud.

Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

## The HFDA System

The HFDA system can be divided into four modules, as illustrated in Figure 1(a):

1. Transforming the data records into behavior sequences;
2. Obtaining behavior pattern-based evidence through the proposed SSIsomap method;
3. Obtaining outlier-based fraud evidence through the proposed SimLOF method; and
4. Combining the two sources of evidence to determine the probability of fraud through Dempster's Rule of Combination.

In an online medical insurance claim system, there can be millions of transactions from a large number of users. To make better sense of the users' actions, it is advantageous to organize the transactions into behavior sequences.
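One way to picture this organization step is to group raw transactions by claimant and order each group by date. The sketch below is illustrative only: the `ClaimRecord` fields, the action labels, and the function `to_behavior_sequences` are hypothetical names chosen for this example and are not taken from the Dareway system.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class ClaimRecord(NamedTuple):
    """One claim transaction. All field names are hypothetical."""
    claimant_id: str
    day: date
    action: str  # e.g. "registration", "prescription", "settlement"


def to_behavior_sequences(records):
    """Group records by claimant and order each group chronologically."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.claimant_id].append(rec)
    # Each claimant maps to the list of their actions, oldest first.
    return {
        cid: [r.action for r in sorted(recs, key=lambda r: r.day)]
        for cid, recs in grouped.items()
    }


# Made-up records, deliberately out of order.
records = [
    ClaimRecord("c1", date(2015, 3, 2), "prescription"),
    ClaimRecord("c2", date(2015, 3, 1), "registration"),
    ClaimRecord("c1", date(2015, 3, 1), "registration"),
    ClaimRecord("c1", date(2015, 3, 5), "settlement"),
]
sequences = to_behavior_sequences(records)
```

Here `sequences["c1"]` lists the three actions of claimant `c1` in date order, which is the kind of per-claimant sequence a clustering step could then consume.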
First, historical claims are transformed into behavior sequences. This is achieved by collecting relevant information from the claimants and other stakeholders through the system interface. The Dareway Medical Insurance Claim System, which is being used by Zibo City in China, collects information about the claimant, the hospital and the approving authorities, as shown in Figure 1(b).

Then, the clustering results for the behavior trajectory data are saved, and expert users can modify these results to incorporate their domain knowledge. The behavior patterns are shown to the approvers and transformed into rules, which are used to determine the presence of fraud in future claims.

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16)

[Figure 1: The system architecture, interface, and results from HFDA. (a) The HFDA system, with components labeled Medical Insurance Claim Behaviors, Behaviour Sequences, Behaviour Classes, SSIsomap, Recode, Behaviour Patterns, Clinical Processes & Pharmacopia, Domain Knowledge, Outlier-based Evidence, Dempster's Rule of Combination, and Probability of Fraud. (b) The medical insurance claim submission user interface of the Dareway Medical Insurance Claim System. (c) Daily cost distributions of different groups of claimants (daily cost of different diseases: pneumonia, RTI, gastroenteritis, CPN).]

As medical insurance claim data for each claimant tend to be sparse, peer group comparison is favored over self-comparison for outlier-based fraud detection. Therefore, we first need to obtain the behavior distributions of many claimants. The HFDA system analyzes the daily cost distributions of different groups of claimants (Figure 1(c)) to build up a baseline for identifying outliers. For new claims, the proposed SimLOF method looks for groups of related applicants and checks whether the new expenditure is within the baseline distribution.
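The pattern-based and outlier-based evidence are fused in module 4 with Dempster's Rule of Combination (Shafer 1976). How HFDA assigns masses to each source is not specified here, so the following is only a minimal sketch under stated assumptions: a two-hypothesis frame {fraud, legit}, mass on the whole frame standing for uncertainty, and made-up mass values. The names `combine`, `pattern_evidence`, and `outlier_evidence` are hypothetical.

```python
from itertools import product

FRAUD = frozenset({"fraud"})
LEGIT = frozenset({"legit"})
EITHER = FRAUD | LEGIT  # whole frame: mass here expresses uncertainty


def combine(m1, m2):
    """Dempster's Rule of Combination for two mass functions.

    Each mass function maps frozenset hypotheses to masses summing to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            # Agreeing mass goes to the intersection of the two hypotheses.
            combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory pairs is the conflict K.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize by 1 - K so the combined masses sum to 1 again.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


# Illustrative (assumed) masses for the two evidence sources.
pattern_evidence = {FRAUD: 0.6, LEGIT: 0.1, EITHER: 0.3}
outlier_evidence = {FRAUD: 0.5, LEGIT: 0.2, EITHER: 0.3}
fused = combine(pattern_evidence, outlier_evidence)
```

With these illustrative inputs the conflicting mass is 0.17 and the fused mass on fraud is 0.63/0.83, about 0.76, which is higher than either source alone because the two sources largely agree.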
With the pattern-based evidence and the outlier-based evidence obtained, the HFDA system calculates the probability of fraud for new claims using Dempster's Rule of Combination (Shafer 1976). Claim approvers can check the status of new claims through the HFDA system. Records highlighted in red indicate high probabilities of fraud.

## Discussions and Future Work

HFDA serves as a useful tool for medical insurance claim approvers to leverage people's behavior trajectory data to combat fraud. In future research, we will design decision support mechanisms to recommend suitable actions against potential medical insurance fraud to claim approvers. Human factors concepts such as emotion (Yu et al. 2010), curiosity (Yu et al. 2011), reputation (Yu et al. 2013), wellbeing considerations (Yu et al. 2014a) and decision-making characteristics (Yu et al. 2014b) will be explored to help the HFDA interface agent build trust with the users. The behavior trajectory data in wellness games (Cai et al. 2014) will also be incorporated into HFDA for analysis.

## Acknowledgements

This research is supported, in part, by the National Natural Science Foundation of China under Grant Nos. 61572295, 61573212, 61272241; the Natural Science Foundation of Shandong Province of China under Grant Nos. ZR2014FM031, ZR2013FQ014; the Shandong Province Independent Innovation Major Special Project No. 2015ZDXX0201B03; the Shandong Province key research and development plan Nos. 2015GGX101015, 2015GGX101007; the Fundamental Research Funds of Shandong University No. 2015JC031; the National Research Foundation, Prime Minister's Office, Singapore under its IDM Futures Funding Initiative and administered by the Interactive and Digital Media Programme Office; and the Lee Kuan Yew Post-Doctoral Fellowship Grant.

## References

Aldrich, N.; Crowder, J.; and Benson, B. 2014. How much does Medicare lose due to fraud and improper payments each year? The Sentinel.

Cai, Y.; Shen, Z.; Liu, S.; Yu, H.; Han, X.; Ji, J.; McKeown, M. J.; Leung, C.; and Miao, C. 2014. An agent-based game for the predictive diagnosis of Parkinson's disease. In AAMAS'14, 1663-1664.

Liu, J.; Bier, E.; Wilson, A.; Honda, T.; Kumar, S.; Gilpin, L.; Guerra-Gomez, J.; and Davies, D. 2015. Graph analysis for detecting fraud, waste, and abuse in healthcare data. In IAAI-15, 3912-3919.

Musal, R. M. 2010. Two models to investigate Medicare fraud within unsupervised databases. Expert Systems with Applications: An International Journal 37(12):8628-8633.

Ngai, E.; Hu, Y.; Wong, Y. H.; Chen, Y.-J.; and Sun, X. 2011. The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision Support Systems 50(3):559-569.

Phua, C.; Lee, V.; Smith, K.; and Gayler, R. 2010. A comprehensive survey of data mining-based fraud detection research. arXiv 1-14.

Shafer, G. 1976. A Mathematical Theory of Evidence. Princeton University Press.

Yu, H.; Cai, Y.; Shen, Z.; Tao, X.; and Miao, C. 2010. Agents as intelligent user interfaces for the net generation. In IUI'10, 429-430.

Yu, H.; Shen, Z.; Miao, C.; and Tan, A.-H. 2011. A simple curious agent to help people be curious. In AAMAS'11, 1159-1160.

Yu, H.; Miao, C.; An, B.; Leung, C.; and Lesser, V. R. 2013. A reputation management approach for resource constrained trustee agents. In IJCAI'13.

Yu, H.; Miao, C.; An, B.; Shen, Z.; and Leung, C. 2014a. Reputation-aware task allocation for human trustees. In AAMAS'14, 357-364.

Yu, H.; Yu, X.; Lim, S. F.; Lin, J.; Shen, Z.; and Miao, C. 2014b. A multi-agent game for studying human decision-making. In AAMAS'14, 1661-1662.