Trust & Safety
Detect fraud, spam, and abuse to keep platforms safe • 46 papers
Fraud, Credit and Claim Risk
Detect fraudulent transactions, assess credit risk, and identify insurance claim fraud
Isolation Forest
Tree-based anomaly isolation with linear time complexity and low memory use via subsampling; widely used for fraud detection.
LOF: Identifying Density-Based Local Outliers
Introduced Local Outlier Factor assigning continuous 'degree of outlierness' for variable-density anomaly detection.
Anomaly Detection: A Survey
Comprehensive taxonomy covering classification, nearest-neighbor, clustering, and statistical approaches; 8,000+ citations.
Isolation-Based Anomaly Detection
Extended journal version with theoretical analysis; addresses swamping and masking effects and high-dimensional data.
Estimating the Support of a High-Dimensional Distribution (One-Class SVM)
One-class SVM for novelty detection; foundational method for fraud detection when only normal data is available.
Deep Learning for Anomaly Detection: A Review
Comprehensive survey of deep learning anomaly detection; covers autoencoders, GANs, and self-supervised approaches.
A Survey of Credit Card Fraud Detection Techniques: Data and Technique Oriented Perspective
Survey comparing neural networks, genetic algorithms, and expert systems for credit card fraud detection.
Credit Card Fraud Detection: A Realistic Modeling and a Novel Learning Strategy
Addresses realistic fraud challenges: extreme class imbalance, concept drift, and delayed feedback loops.
Fraud Detection in Healthcare Claims Using Machine Learning: A Systematic Review
Comprehensive review analyzing ML techniques for health insurance fraud over two decades.
Insurance Fraud Detection: A Statistically Validated Network Approach
Network-based approach using statistically validated networks to detect coordinated fraud rings.
Detecting Insurance Fraud Using Supervised and Unsupervised Machine Learning
Field experiment showing supervised and unsupervised methods are complements, not substitutes.
OddBall: Spotting Anomalies in Weighted Graphs
Detects anomalies in weighted graphs using egonet features; foundational for fraud ring detection.
FRAUDAR: Bounding Graph Fraud in the Face of Camouflage
Detects dense subgraphs even when fraudsters add random edges to camouflage; handles lockstep behavior.
Heterogeneous Graph Neural Networks for Malicious Account Detection
GEM model using heterogeneous graphs (users, devices, transactions) for Alipay fraud detection.
NetWalk: A Flexible Deep Embedding Approach for Anomaly Detection in Dynamic Networks
Dynamic network embeddings for streaming anomaly detection; updates in O(1) per edge.
BotRGCN: Twitter Bot Detection with Relational Graph Convolutional Networks
Relational GCN exploiting follower/friend graphs achieves SOTA on bot detection benchmarks.
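
The dense-subgraph idea behind the FRAUDAR entry above can be illustrated with plain greedy peeling: repeatedly remove the lowest-degree node and keep the densest intermediate subgraph. This is only a minimal sketch under stated assumptions. It uses unweighted edges/nodes density and omits FRAUDAR's camouflage-resistant edge weighting, and the function name and toy review graph (`f*` fraud accounts, `p*` targeted products, `h*` honest users, `q*` other products) are hypothetical.

```python
from collections import defaultdict


def greedy_densest_subgraph(edges):
    """Return (nodes, density) of the densest subgraph found by greedy peeling.

    Density is edges / nodes. Assumes an undirected graph without self-loops.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    remaining = set(adjacency)
    if not remaining:
        return set(), 0.0

    edge_count = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    best_nodes = set(remaining)
    best_density = edge_count / len(remaining)

    while len(remaining) > 1:
        # Peel the node with the fewest remaining edges (ties broken by name).
        victim = min(remaining, key=lambda n: (len(adjacency[n]), n))
        for neighbor in adjacency[victim]:
            adjacency[neighbor].discard(victim)
        edge_count -= len(adjacency[victim])
        adjacency[victim].clear()
        remaining.remove(victim)

        density = edge_count / len(remaining)
        if density > best_density:
            best_density = density
            best_nodes = set(remaining)

    return best_nodes, best_density


# Hypothetical toy data: 4 fraud accounts review the same 4 products (lockstep),
# one of them adds a camouflage review, and honest users review sparsely.
fraud_users = ["f0", "f1", "f2", "f3"]
targets = ["p0", "p1", "p2", "p3"]
edges = [(u, p) for u in fraud_users for p in targets]
edges.append(("f0", "q0"))  # camouflage edge to an unrelated product
edges += [("h0", "q0"), ("h1", "q0"), ("h2", "q1"),
          ("h3", "q1"), ("h4", "q2"), ("h5", "q2")]

block, density = greedy_densest_subgraph(edges)
print(sorted(block), density)
```

On this toy graph the peeling removes the sparse honest users and the camouflage target first, leaving the lockstep block as the densest subgraph. A single unweighted camouflage edge is easy to peel away; heavier camouflage is the case the paper's weighting is designed for.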
Spam & Abuse
Detect fake accounts and abusive behavior
Online Human-Bot Interactions: Detection, Estimation, and Characterization
Foundational Botometer paper; random forest over 1,000+ features, estimating that 9-15% of active Twitter accounts are bots.
The Rise of Social Bots
Seminal paper defining social bots, detection challenges, and policy implications for platform manipulation.
Botometer 101: Social Bot Practicum for Computational Social Scientists
Practitioner guide for Botometer v4 with CAP scores, threshold selection, and case study methodology.
Scalable and Generalizable Social Bot Detection through Data Selection
Addresses cross-dataset generalization using an ensemble of specialized classifiers.
Fake It Till You Make It: Reputation, Competition, and Yelp Review Fraud
First large-scale study of fake reviews; roughly 16% of Yelp restaurant reviews are filtered as suspicious, with review fraud increasing under competition.
Promotional Reviews: An Empirical Investigation of Online Review Manipulation
Compares TripAdvisor vs Expedia reviews; finds review manipulation concentrated among independent hotels.
A Survey on Fake Review Detection Techniques
Comprehensive survey covering linguistic, behavioral, and graph-based fake review detection methods.
Content Moderation & Toxicity
Identify and remove harmful content
Automated Hate Speech Detection and the Problem of Offensive Language
Foundational 3-class dataset (hate/offensive/neither) with 24K labeled tweets; standard benchmark.
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers
Technical architecture behind Google Jigsaw's Perspective API; handles obfuscation, code-switching, multilingual toxicity.
Measuring and Mitigating Unintended Bias in Text Classification
Develops methods for measuring unintended identity-term bias in toxicity classifiers; foundational for fair ML.
HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection
First benchmark with rationale annotations for explainable hate speech detection across 20K posts.
Automatic Detection of Cyberbullying in Social Media Text
Multi-label cyberbullying detection with fine-grained categories; evaluated on English and Dutch social media posts.
Internet Argument Corpus 2.0: An SQL Schema for Dialogic Social Media and the Corpora to Go with It
Corpus of dialogic social media posts from online debate forums, released with an SQL schema for querying threads and annotations.
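
One way to make the identity-term bias measurement from the "Measuring and Mitigating Unintended Bias in Text Classification" entry above concrete is to compare a classifier's false positive rate on comments that mention an identity term against its overall false positive rate. The sketch below is an assumption-laden illustration, not the paper's exact metric. The function names, the whitespace tokenization, and the toy comments with their labels and predictions are all hypothetical.

```python
def false_positive_rate(pairs):
    """FPR over (label, prediction) pairs, where 1 means toxic."""
    negatives = [pred for label, pred in pairs if label == 0]
    if not negatives:
        return 0.0
    return sum(negatives) / len(negatives)


def fpr_gaps(examples, identity_terms):
    """Return the overall FPR and each term's absolute FPR gap from it.

    `examples` holds (text, true_label, predicted_label) triples.
    """
    overall = false_positive_rate([(y, p) for _, y, p in examples])
    gaps = {}
    for term in identity_terms:
        # Keep only comments that contain the term as a whole token.
        subset = [(y, p) for text, y, p in examples
                  if term in text.lower().split()]
        gaps[term] = abs(false_positive_rate(subset) - overall)
    return overall, gaps


# Hypothetical toy data: (comment, true label, classifier prediction).
examples = [
    ("i am gay", 0, 1),
    ("gay people are welcome here", 0, 1),
    ("you are an idiot", 1, 1),
    ("have a nice day", 0, 0),
    ("what a lovely teacher", 0, 0),
    ("teacher gave us homework", 0, 0),
    ("the weather is nice", 0, 0),
    ("i hate you", 1, 1),
]

overall, gaps = fpr_gaps(examples, ["gay", "teacher"])
print(overall, gaps)
```

A large gap for a term suggests the classifier has learned to associate the term itself with toxicity, which is the kind of unintended bias the entry describes.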
Account Security & Identity
Verify identities and secure accounts
Data Breaches, Phishing, or Malware? Understanding the Risks of Stolen Credentials
Google study of account hijacking vectors; compares the risk from credentials exposed through data breaches, phishing kits, and keyloggers.
Risk-Based Authentication: Practical Deployments and Research Challenges
Analyzes RBA deployments at Google, Microsoft, Amazon; develops measurement framework.
Selective Graph Attention Networks for Account Takeover Detection
Graph neural networks modeling account-device-transaction relationships for ATO detection.
DeepAuth: Deep Learning Based Authentication for Anomaly Detection
Deep learning approach combining behavioral biometrics with session features for continuous authentication.
Framing the Underground Economy: An Ecosystem of Underground Market Sellers and Operators
Maps the underground economy of stolen accounts; traces supply chain from compromise to monetization.
Coordinated Manipulation & Information Operations
Detect state-sponsored trolls and coordinated inauthentic behavior
Who Let The Trolls Out? Towards Understanding State-Sponsored Trolls
Characterizes Russian IRA troll activity across platforms; develops detection methodology.
Disinformation as Collaborative Work: Surfacing the Participatory Nature of Strategic Information Operations
Framework for understanding information operations as collaborative work; case study of 2016 election.
Characterizing Twitter Users Who Engage with Russian Internet Research Agency
Analysis of 14M tweets by IRA; identifies patterns distinguishing troll engagement from organic users.
Exploring Content and Design Techniques in Coordinated Manipulation: A Survey
Comprehensive taxonomy of manipulation techniques across platforms and actor types.
Abuse Detection in Human Interaction
Detect personal attacks, harassment, and abusive language
Abusive Language Detection in Online User Content
Yahoo system combining n-grams, syntactic, and semantic features; production-scale abuse detection.
Deep Learning for Detecting Harassment in Social Media
CNN and RNN architectures for harassment detection; analyzes temporal patterns in abuse.
Ex Machina: Personal Attacks Seen at Scale
Wikipedia personal attack corpus with 100K+ labeled comments; crowdsourcing methodology for abuse annotation.
Deep Learning for User Comment Moderation
RNN with attention for comment moderation; deployed at Greek news organization.
Privacy & Data Misuse
Understand economics of privacy and detect data misuse
What Is Privacy Worth?
Field experiments showing that privacy valuations depend heavily on framing and endowment; foundational behavioral privacy study.
The Economics of Privacy
Journal of Economic Literature survey covering market structure, price discrimination, and welfare effects of privacy regulation.
Differential Privacy
Foundational paper defining differential privacy; mathematical framework for privacy-preserving computation.
The Cost of Annoying Ads
Studies tradeoff between ad intrusiveness and platform revenue; privacy implications of targeting.
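
The mathematical framework named in the "Differential Privacy" entry above is commonly illustrated with the Laplace mechanism: add Laplace noise with scale sensitivity/epsilon to a query result. The sketch below applies it to a counting query, which has sensitivity 1. It is a minimal illustration under stated assumptions. The function names and toy records are hypothetical, and a production system would need a vetted library and careful handling of floating-point issues.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: 30 of 100 accounts were flagged for data misuse.
rng = random.Random(0)
flagged = [True] * 30 + [False] * 70
noisy = private_count(flagged, bool, epsilon=0.5, rng=rng)
print(noisy)
```

Smaller values of epsilon give stronger privacy but noisier answers; averaged over many releases the noise is centered on the true count.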