An Adaptive Approach on Credit Card Fraud Detection Using Transaction Aggregation and Word Embeddings
ABSTRACT: Due to the surge of interest in online retailing, the use of credit cards has expanded rapidly in recent years. Stealing card details to perform online transactions, which is called fraud, has also become more frequent. Preventive solutions and instant fraud detection methods are widely studied because of the critical financial losses in many industries. In this work, a Gradient Boosting Tree (GBT) model for the real-time detection of credit card fraud on streaming Card-Not-Present (CNP) transactions is investigated with the use of different attributes of card transactions. Numerical, hand-crafted numerical, categorical and textual attributes are combined to form a feature vector to be used as a training instance. One contribution of this work is the use of transaction aggregation for the categorical values and the inclusion of vectors from a character-level word embedding model trained on the merchant names of the transactions. The other contribution is a new strategy for training dataset generation that employs a sliding window over a given time frame to adapt to changes in the trends of fraudulent transactions. In the experiments, the feature engineering strategy and the automated training set generation methodology are evaluated on real credit card transactions.
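The sliding-window training set and the aggregation of categorical values described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record fields (`card`, `time`, `amount`, `country`), the 14-day window and the helper names are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sliding_window_training_set(transactions, now, window_days):
    """Keep transactions with a timestamp in (now - window_days, now]."""
    start = now - timedelta(days=window_days)
    return [t for t in transactions if start < t["time"] <= now]


def aggregate_by_category(transactions, key):
    """Per-card count and total amount for each value of a categorical field."""
    agg = defaultdict(lambda: {"count": 0, "amount": 0.0})
    for t in transactions:
        cell = agg[(t["card"], t[key])]
        cell["count"] += 1
        cell["amount"] += t["amount"]
    return dict(agg)


# Hypothetical records; the field names are illustrative assumptions.
transactions = [
    {"card": "A", "time": datetime(2020, 1, 1), "amount": 10.0, "country": "TR"},
    {"card": "A", "time": datetime(2020, 1, 20), "amount": 25.0, "country": "TR"},
    {"card": "A", "time": datetime(2020, 1, 25), "amount": 40.0, "country": "US"},
    {"card": "B", "time": datetime(2020, 1, 28), "amount": 5.0, "country": "TR"},
]

# Retraining on 31 January with a 14-day window drops the 1 January record.
recent = sliding_window_training_set(transactions, datetime(2020, 1, 31), 14)
features = aggregate_by_category(recent, "country")
```

Moving `now` forward and rebuilding the set at each retraining step is what lets the training data follow recent fraud trends; the window length would be tuned on real data.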
Project description:Human mobility has been traditionally studied using surveys that deliver snapshots of population displacement patterns. The growing accessibility to ICT information from portable digital media has recently opened the possibility of exploring human behavior at high spatio-temporal resolutions. Mobile phone records, geolocated tweets, check-ins from Foursquare or geotagged photos have contributed to this purpose at different scales, from cities to countries, in different world areas. Many previous works lacked, however, details on the individuals' attributes such as age or gender. In this work, we analyze credit-card records from Barcelona and Madrid and, by examining the geolocated credit-card transactions of individuals living in the two provinces, we find that the mobility patterns vary according to gender, age and occupation. Differences in distance traveled and travel purpose are observed between younger and older people, but, curiously, also between males and females of similar age. While mobility displays some generic features, here we show that sociodemographic characteristics play a relevant role and must be taken into account for mobility and epidemiological modelization.
Project description:People are increasingly leaving digital traces of their daily activities through interacting with their digital environment. Among these traces, financial transactions are of paramount interest since they provide a panoramic view of human life through the lens of purchases, from food and clothes to sport and travel. Although many analyses have studied individual preferences based on credit card transactions, characterizing human behavior at larger scales remains largely unexplored. This is mainly due to the lack of models that can relate individual transactions to macro-socioeconomic indicators. Building these models can not only provide nearly real-time information about the socioeconomic characteristics of regions, usually available only yearly or quarterly through official statistics, but also reveal hidden social and economic structures that cannot be captured by official indicators. In this paper, we aim to elucidate how macro-socioeconomic patterns could be understood based on individual financial decisions. To this end, we reveal the underlying interconnection of the network of spending by leveraging anonymized individual credit/debit card transaction data, craft micro-socioeconomic indices that consist of various social and economic aspects of human life, and propose a machine learning framework to predict macro-socioeconomic indicators.
Project description:To assess how easily minors can purchase cigarettes online and online cigarette vendors' compliance with federal age/ID verification and shipping regulations, North Carolina's 2013 tobacco age verification law, and federal prohibitions on the sale of non-menthol flavoured cigarettes or those labelled or advertised as 'light'. In early 2014, 10 minors aged 14-17 attempted to purchase cigarettes by credit card and electronic check from 68 popular internet vendors. Minors received cigarettes from 32.4% of purchase attempts, all delivered by the US Postal Service (USPS) from overseas sellers. None failed due to age/ID verification. All failures were due to payment processing problems. USPS left 63.6% of delivered orders at the door with the remainder handed to minors with no age verification. 70.6% of vendors advertised light cigarettes and 60.3% flavoured, with 23.5% and 11.8%, respectively, delivered to the teens. Study credit cards were exposed to an estimated $7000 of fraudulent charges. Despite years of regulations restricting internet cigarette sales, poor vendor compliance and lack of shipper and federal enforcement leave minors still able to obtain cigarettes (including 'light' and flavoured) online. The internet cigarette marketplace has shifted overseas, exposing buyers to widespread credit card fraud. Federal agencies should rigorously enforce existing internet cigarette sales laws to prevent illegal shipments from reaching US consumers, shut down non-compliant and fraudulent websites, and stop the theft and fraudulent use of credit card information provided online. Future studies should assess whether these agencies begin adequately enforcing the existing laws.
Project description:Starting university is an important time with respect to dietary changes. This study reports a novel approach to assessing student diet by utilising student-level food transaction data to explore dietary patterns. First-year students living in catered accommodation at the University of Leeds (UK) received pre-credited food cards for use in university catering facilities. Food card transaction data were obtained for semester 1, 2016 and linked with student age and sex. k-Means cluster analysis was applied to the transaction data to identify clusters of food purchasing behaviours. Differences in demographic and behavioural characteristics across clusters were examined using χ² tests. The semester was divided into three time periods to explore longitudinal changes in purchasing patterns. Seven dietary clusters were identified: 'Vegetarian', 'Omnivores', 'Dieters', 'Dish of the Day', 'Grab-and-Go', 'Carb Lovers' and 'Snackers'. There were statistically significant differences in sex (P < 0·001), with women dominating the Vegetarian and Dieters, age (P = 0·003), with over 20s representing a high proportion of the Omnivores and time of day of transactions (P < 0·001), with Dieters and Snackers purchasing least at breakfast. Many students (n 474, 60·4 %) changed dietary cluster across the semester. This study demonstrates that transactional data present a feasible method for dietary assessment, collecting detailed dietary information over time and at scale, while eliminating participant burden and possible bias from self-selection, observation and attrition. It revealed that student diets are complex and that simplistic measures of diet, focusing on narrow food groups in isolation, are unlikely to adequately capture dietary behaviours.
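As a rough illustration of the k-means step mentioned above, the following Python sketch runs Lloyd's algorithm on toy data. The two features (share of vegetarian items and share of snack items per student), the data values and the choice of initial centroids are hypothetical; the study's actual features and software are not stated in the abstract.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical per-student features: (vegetarian share, snack share).
students = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, centroids = kmeans(students, [students[0], students[2]])
```

With these toy values the first two students fall in one cluster and the last two in the other; on real transaction data the number of clusters and the initialisation would need to be chosen with care.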
Project description:This paper describes a dataset of 6284 land transaction prices and plot surfaces in 3 medium-sized cities in France (Besançon, Dijon and Brest). The dataset includes road accessibility as obtained from a minimization algorithm, and the amount of green space available to households in the neighborhood of the transactions, as evaluated from a land cover dataset. In addition to presenting the data, the paper describes how these variables can be used to estimate the non-observable parameters of a residential choice function explicitly derived from a microeconomic model. The estimates are used by Caruso et al. (2015) to run a calibrated microeconomic urban growth simulation model where households are assumed to trade off accessibility and local green space amenities.
Project description:Big Data is the buzzword of the modern century. With the invasion of pervasive computing, we live in a data-centric environment, where we always leave a track of data related to our day-to-day activities. Be it a visit to a shopping mall or hospital or surfing the Internet, we create voluminous data related to credit card transactions, user details, location information, and so on. These trails of data define an individual and form the backbone for user profiling. With mobile phones and their easy access to online social networks on the go, sensor data such as geo-taggings, and the events and sentiments around them, contribute to the already overwhelming data containers. With reductions in the cost of storage and computational devices and with the increasing proliferation of the cloud, we have never felt any constraints in storing or processing such data. Eventually we end up having several exabytes of data, and analysing them for their usefulness has introduced new frontiers of research. Effective distillation of these data is the need of the hour to improve the veracity of Big Data. This research targets the utilization of the Fuzzy Bayesian process model to improve the quality of information in Big Data.
Project description:We investigate the networked nature of the Japanese credit market using tools of network science. We perform community detection with an algorithm that identifies communities composed of both banks and firms. We show that the communities obtained by working directly on the bipartite network carry information about the networked nature of the Japanese credit market. Our analysis is performed for each calendar year from 1980 to 2011. To investigate the time evolution of the networked structure of the credit market, we introduce a new statistical method to track the time evolution of detected communities. We then characterize the time evolution of communities by detecting, for each time-evolving set of communities, the over-expression of attributes of firms and banks. Specifically, we consider as attributes the economic sector and the geographical location of firms and the type of banks. In our 32-year-long analysis we detect a persistence of the over-expression of attributes of communities of banks and firms, together with slow dynamics of change from some specific attributes to new ones. Our empirical observations show that the credit market in Japan is a networked market where the type of banks, the geographical location of firms and banks, and the economic sector of the firm play a role in shaping the credit relationships between banks and firms.
Project description:Identity theft victimization is associated with serious physical and mental health morbidities. The problem is expanding as society becomes increasingly reliant on technology to store and transfer personally identifying information. Guided by lifestyle-routine activity theory, this study sought to identify risk and protective factors associated with identity theft victimization and determine whether individual-level behaviors, including frequency of online purchasing and data protection practices, are determinative of victimization. Data from sequential administrations of the U.S. National Crime Victimization Survey-Identity Theft Supplement (ITS) in 2012 and 2014 were combined (N = 128,419). Using multivariable logistic regression, risk and protective factors were examined for three subtypes: 1) unauthorized use of existing credit card/bank accounts, and unauthorized use of personal information to 2) open new accounts, or 3) engage in instrumental activities (e.g., applying for government benefits, receiving medical care, filing false tax returns). Existing credit card/bank accounts and new accounts identity theft victimization were associated with higher levels of online purchasing activity and prior identity theft victimization. All identity theft subtypes were associated with government/corporate data breaches and other crime victimization experiences. Routine individual-level preventive behaviors such as changing online passwords and shredding/destroying documents were protective. Identity theft subtypes showed divergent socio-demographic risk/protective profiles, with those of higher socioeconomic status more likely to be victims of existing credit card/bank account identity theft. Identity theft is a pervasive, growing problem with serious health and psychosocial consequences, yet individuals can engage in specific protective behaviors to mitigate victimization risk.
Project description:A model is presented for the supervised learning problem where the observations come from a fixed number of pre-specified groups, and the regression coefficients may vary sparsely between groups. The model spans the continuum between individual models for each group and one model for all groups. The resulting algorithm is designed with a high dimensional framework in mind. The approach is applied to a sentiment analysis dataset to show its efficacy and interpretability. One particularly useful application is for finding sub-populations in a randomized trial for which an intervention (treatment) is beneficial, often called the uplift problem. Some new concepts are introduced that are useful for uplift analysis. The value is demonstrated in an application to a real world credit card promotion dataset. In this example, although sending the promotion has a very small average effect, by targeting a particular subgroup with the promotion one can obtain a 15% increase in the proportion of people who purchase the new credit card.
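To make the uplift idea concrete, the sketch below computes a naive per-subgroup uplift as the difference in purchase rates between treated and control customers. This is a simple illustration, not the sparse group-varying regression model the abstract presents, and the records and group names are invented.

```python
from collections import defaultdict


def uplift_by_group(records):
    """Return {group: purchase rate (treated) - purchase rate (control)}.

    Each record is a (group, treated, purchased) tuple of (str, bool, bool).
    Groups missing either a treated or a control arm are skipped.
    """
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for group, treated, purchased in records:
        arm = counts[group][treated]
        arm[0] += 1                   # customers in this arm
        arm[1] += int(purchased)      # purchases in this arm
    uplift = {}
    for group, arms in counts.items():
        (n_t, buy_t), (n_c, buy_c) = arms[True], arms[False]
        if n_t and n_c:
            uplift[group] = buy_t / n_t - buy_c / n_c
    return uplift


# Hypothetical promotion data: (group, received promotion, purchased card).
records = (
    [("young", True, True)] * 2 + [("young", True, False)] * 2
    + [("young", False, True)] * 1 + [("young", False, False)] * 3
    + [("older", True, True)] * 1 + [("older", True, False)] * 3
    + [("older", False, True)] * 1 + [("older", False, False)] * 3
)
result = uplift_by_group(records)
```

In this toy data the promotion raises the purchase rate only in the "young" group, which is the kind of subgroup a targeting strategy would look for; a real analysis would also need to account for sampling noise.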
Project description:The field of microfluidics has been struggling to obtain widespread market penetration. In order to overcome this struggle, a standardized and modular platform is introduced and applied. By providing easy-to-fabricate modular building blocks which are compatible with mass manufacturing, we decrease the gap from lab-to-fab. These standardized blocks are used in combination with an application-specific fluidic circuit board. On this board, electrical and fluidic connections are demonstrated by implementing an alternating current Coulter counter. This multipurpose building block is reusable in many applications. In this study, it identifies and counts 6 and 11 μm beads. The system is kept in a credit card-sized footprint, as a result of in-house-developed electronics and standardized building blocks. We believe that this easy-to-fabricate, credit card-sized, modular, and standardized prototype brings us closer to clinical and veterinary applications, because it provides an essential stepping stone to fully integrated point-of-care devices.