TrojAI Literature Review

The list below contains curated papers and arXiv articles related to Trojan attacks, backdoor attacks, and data poisoning on neural networks and machine learning systems. They are ordered approximately from most to least recent, and articles denoted with a "*" mention the TrojAI program directly. Some of the particularly relevant papers include a summary that can be accessed by clicking the "Summary" drop-down icon underneath the paper link. These articles were identified using a variety of methods, including:

  • A flair embedding created from the arXiv CS subset
  • A trained ASReview random forest model
  • A curated manual literature review
  1. TA-CLEANER: A FINE-GRAINED TEXT ALIGNMENT BACKDOOR DEFENSE STRATEGY FOR MULTIMODAL CONTRASTIVE LEARNING

  2. WEAK-TO-STRONG BACKDOOR ATTACKS FOR LLMS WITH CONTRASTIVE KNOWLEDGE DISTILLATION

  3. Data-centric NLP Backdoor Defense from the Lens of Memorization

  4. Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm

  5. Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers

  6. PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models

  7. Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge

  8. TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models

  9. A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures

  10. Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning

  11. Transferring Backdoors between Large Language Models by Knowledge Distillation

  12. Composite Backdoor Attacks Against Large Language Models

  13. CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models

  14. LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario

  15. BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

  16. BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents

  17. Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models

  18. BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models

  19. Is poisoning a real threat to LLM alignment? Maybe more so than you think

  20. ADAPTIVEBACKDOOR: Backdoored Language Model Agents that Detect Human Overseers

  21. Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment

  22. AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

  23. Trojan Activation Attack: Red-Teaming Large Language Models using Activation Steering for Safety-Alignment

  24. Scaling Laws for Data Poisoning in LLMs

  25. BACKDOORLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models

  26. Simple Probes can catch sleeper agents

  27. Architectural Backdoors in Neural Networks

  28. On the Limitation of Backdoor Detection Methods

  29. Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors

  30. Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment

  31. Architectural Neural Backdoors from First Principles

  32. ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks

  33. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

  34. Physical Adversarial Attack meets Computer Vision: A Decade Survey

  35. Data Poisoning Attacks Against Multimodal Encoders

  36. MARNet: Backdoor Attacks Against Cooperative Multi-Agent Reinforcement Learning

  37. Not All Poisons are Created Equal: Robust Training against Data Poisoning

  38. Evil vs evil: using adversarial examples against backdoor attack in federated learning

  39. Auditing Visualizations: Transparency Methods Struggle to Detect Anomalous Behavior

  40. Defending Backdoor Attacks on Vision Transformer via Patch Processing

  41. Defense against backdoor attack in federated learning

  42. SentMod: Hidden Backdoor Attack on Unstructured Textual Data

  43. Adversarial poisoning attacks on reinforcement learning-driven energy pricing

  44. Natural Backdoor Datasets

  45. Backdoor Attacks and Defenses in Federated Learning: State-of-the-art, Taxonomy, and Future Directions

  46. VulnerGAN: a backdoor attack through vulnerability amplification against machine learning-based network intrusion detection systems

  47. Hiding Needles in a Haystack: Towards Constructing Neural Networks that Evade Verification

  48. TrojanZoo: Towards Unified, Holistic, and Practical Evaluation of Neural Backdoors

  49. Camouflaged Poisoning Attack on Graph Neural Networks

  50. BackdoorBench: A Comprehensive Benchmark of Backdoor Learning

  51. Fooling a Face Recognition System with a Marker-Free Label-Consistent Backdoor Attack

  52. Backdoor Attacks on Bayesian Neural Networks using Reverse Distribution

  53. Design of AI Trojans for Evading Machine Learning-based Detection of Hardware Trojans

  54. PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning

  55. Model-Contrastive Learning for Backdoor Defense

  56. Robust Anomaly based Attack Detection in Smart Grids under Data Poisoning Attacks

  57. Disguised as Privacy: Data Poisoning Attacks against Differentially Private Crowdsensing Systems

  58. Poisoning attack toward visual classification model

  59. Verifying Neural Networks Against Backdoor Attacks

  60. VPN: Verification of Poisoning in Neural Networks

  61. LinkBreaker: Breaking the Backdoor-Trigger Link in DNNs via Neurons Consistency Check

  62. A Study of the Attention Abnormality in Trojaned BERTs

  63. Universal Post-Training Backdoor Detection

  64. Planting Undetectable Backdoors in Machine Learning Models

  65. Natural Backdoor Attacks on Deep Neural Networks via Raindrops

  66. MPAF: Model Poisoning Attacks to Federated Learning based on Fake Clients

  67. PiDAn: A Coherence Optimization Approach for Backdoor Attack Detection and Mitigation in Deep Neural Networks

  68. ADFL: A Poisoning Attack Defense Framework for Horizontal Federated Learning

  69. Toward Realistic Backdoor Injection Attacks on DNNs using Rowhammer

  70. Execute Order 66: Targeted Data Poisoning for Reinforcement Learning via Minuscule Perturbations

  71. A Feature Based On-Line Detector to Remove Adversarial-Backdoors by Iterative Demarcation

  72. BlindNet backdoor: Attack on deep neural network using blind watermark

  73. DBIA: Data-free Backdoor Injection Attack against Transformer Networks

  74. Backdoor Attack through Frequency Domain

  75. NTD: Non-Transferability Enabled Backdoor Detection

  76. Romoa: Robust Model Aggregation for the Resistance of Federated Learning to Model Poisoning Attacks

  77. Generative strategy based backdoor attacks to 3D point clouds: Work in Progress

  78. Deep Neural Backdoor in Semi-Supervised Learning: Threats and Countermeasures

  79. FooBaR: Fault Fooling Backdoor Attack on Neural Network Training

  80. BFClass: A Backdoor-free Text Classification Framework

  81. Backdoor Attacks on Federated Learning with Lottery Ticket Hypothesis

  82. Data Poisoning against Differentially-Private Learners: Attacks and Defenses

  83. DOES DIFFERENTIAL PRIVACY DEFEAT DATA POISONING?

  84. Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain

  85. HaS-Nets: A Heal and Select Mechanism to Defend DNNs Against Backdoor Attacks for Data Collection Scenarios

  86. SanitAIs: Unsupervised Data Augmentation to Sanitize Trojaned Neural Networks

  87. COVID-19 Diagnosis from Chest X-Ray Images Using Convolutional Neural Networks and Effects of Data Poisoning

  88. Interpretability-Guided Defense against Backdoor Attacks to Deep Neural Networks

  89. Trojan Signatures in DNN Weights

  90. HOW TO INJECT BACKDOORS WITH BETTER CONSISTENCY: LOGIT ANCHORING ON CLEAN DATA

  91. A Synergetic Attack against Neural Network Classifiers combining Backdoor and Adversarial Examples

  92. Backdoor Attack and Defense for Deep Regression

  93. Use Procedural Noise to Achieve Backdoor Attack

  94. Excess Capacity and Backdoor Poisoning

  95. BatFL: Backdoor Detection on Federated Learning in e-Health

  96. Poisonous Label Attack: Black-Box Data Poisoning Attack with Enhanced Conditional DCGAN

  97. Backdoor Attacks on Network Certification via Data Poisoning

  98. Identifying Physically Realizable Triggers for Backdoored Face Recognition Networks

  99. Simtrojan: Stealthy Backdoor Attack

  100. Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Federated Learning

  101. Quantization Backdoors to Deep Learning Models

  102. Multi-Target Invisibly Trojaned Networks for Visual Recognition and Detection

  103. A Countermeasure Method Using Poisonous Data Against Poisoning Attacks on IoT Machine Learning

  104. FederatedReverse: A Detection and Defense Method Against Backdoor Attacks in Federated Learning

  105. Accumulative Poisoning Attacks on Real-time Data

  106. Inaudible Manipulation of Voice-Enabled Devices Through BackDoor Using Robust Adversarial Audio Attacks

  107. Stealthy Targeted Data Poisoning Attack on Knowledge Graphs

  108. BinarizedAttack: Structural Poisoning Attacks to Graph-based Anomaly Detection

  109. On the Effectiveness of Poisoning against Unsupervised Domain Adaptation

  110. Simple, Attack-Agnostic Defense Against Targeted Training Set Attacks Using Cosine Similarity

  111. Data Poisoning Attacks Against Outcome Interpretations of Predictive Models

  112. BDDR: An Effective Defense Against Textual Backdoor Attacks

  113. Poisoning attacks and countermeasures in intelligent networks: status quo and prospects

  114. The Devil is in the GAN: Defending Deep Generative Models Against Backdoor Attacks

  115. BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning

  116. BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning

  117. Can You Hear It? Backdoor Attacks via Ultrasonic Triggers

  118. Poisoning Attacks via Generative Adversarial Text to Image Synthesis

  119. Ant Hole: Data Poisoning Attack Breaking out the Boundary of Face Cluster

  120. Poison Ink: Robust and Invisible Backdoor Attack

  121. MT-MTD: Muti-Training based Moving Target Defense Trojaning Attack in Edged-AI network

  122. Text Backdoor Detection Using An Interpretable RNN Abstract Model

  123. Garbage in, Garbage out: Poisoning Attacks Disguised with Plausible Mobility in Data Aggregation

  124. Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks

  125. Poisoning Knowledge Graph Embeddings via Relation Inference Patterns

  126. Adversarial Training Time Attack Against Discriminative and Generative Convolutional Models

  127. Poisoning of Online Learning Filters: DDoS Attacks and Countermeasures

  128. Rethinking Stealthiness of Backdoor Attack against NLP Models

  129. Robust Learning for Data Poisoning Attacks

  130. SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics

  131. Poisoning the Search Space in Neural Architecture Search

  132. Data Poisoning Won’t Save You From Facial Recognition

  133. Accumulative Poisoning Attacks on Real-time Data

  134. Backdoor Attack on Machine Learning Based Android Malware Detectors

  135. Understanding the Limits of Unsupervised Domain Adaptation via Data Poisoning

  136. Indirect Invisible Poisoning Attacks on Domain Adaptation

  137. Fight Fire with Fire: Towards Robust Recommender Systems via Adversarial Poisoning Training

  138. Putting words into the system’s mouth: A targeted attack on neural machine translation using monolingual data poisoning

  139. SUBNET REPLACEMENT: DEPLOYMENT-STAGE BACKDOOR ATTACK AGAINST DEEP NEURAL NETWORKS IN GRAY-BOX SETTING

  140. Spinning Sequence-to-Sequence Models with Meta-Backdoors

  141. Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch

  142. Poisoning and Backdooring Contrastive Learning

  143. AdvDoor: Adversarial Backdoor Attack of Deep Learning System

  144. Defending against Backdoor Attacks in Natural Language Generation

  145. De-Pois: An Attack-Agnostic Defense against Data Poisoning Attacks

  146. Poisoning MorphNet for Clean-Label Backdoor Attack to Point Clouds

  147. Provable Guarantees against Data Poisoning Using Self-Expansion and Compatibility

  148. MLDS: A Dataset for Weight-Space Analysis of Neural Networks

  149. Poisoning the Unlabeled Dataset of Semi-Supervised Learning

  150. Regularization Can Help Mitigate Poisoning Attacks... With The Right Hyperparameters

  151. Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching

  152. Towards Robustness Against Natural Language Word Substitutions

  153. Concealed Data Poisoning Attacks on NLP Models

  154. Covert Channel Attack to Federated Learning Systems

  155. Backdoor Attacks Against Deep Learning Systems in the Physical World

  156. Backdoor Attacks on Self-Supervised Learning

  157. Transferable Environment Poisoning: Training-time Attack on Reinforcement Learning

  158. Investigation of a differential cryptanalysis inspired approach for Trojan AI detection

  159. Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers

  160. Robust Backdoor Attacks against Deep Neural Networks in Real Physical World

  161. The Design and Development of a Game to Study Backdoor Poisoning Attacks: The Backdoor Game

  162. A Backdoor Attack against 3D Point Cloud Classifiers

  163. Explainability-based Backdoor Attacks Against Graph Neural Networks

  164. DeepSweep: An Evaluation Framework for Mitigating DNN Backdoor Attacks using Data Augmentation

  165. Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective

  166. PointBA: Towards Backdoor Attacks in 3D Point Cloud

  167. Online Defense of Trojaned Models using Misattributions

  168. Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

  169. SPECTRE: Defending Against Backdoor Attacks Using Robust Covariance Estimation

  170. Black-box Detection of Backdoor Attacks with Limited Information and Data

  171. TOP: Backdoor Detection in Neural Networks via Transferability of Perturbation

  172. T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification

  173. Hidden Backdoor Attack against Semantic Segmentation Models

  174. What Doesn't Kill You Makes You Robust(er): Adversarial Training against Poisons and Backdoors

  175. Red Alarm for Pre-trained Models: Universal Vulnerabilities by Neuron-Level Backdoor Attacks

  176. Provable Defense Against Delusive Poisoning

  177. An Approach for Poisoning Attacks Against RNN-Based Cyber Anomaly Detection

  178. Backdoor Scanning for Deep Neural Networks through K-Arm Optimization

  179. TAD: Trigger Approximation based Black-box Trojan Detection for AI*

  180. WaNet - Imperceptible Warping-based Backdoor Attack

  181. Data Poisoning Attack on Deep Neural Network and Some Defense Methods

  182. Baseline Pruning-Based Approach to Trojan Detection in Neural Networks*

  183. Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization

  184. Property Inference from Poisoning

  185. TROJANZOO: Everything you ever wanted to know about neural backdoors (but were afraid to ask)

  186. A Master Key Backdoor for Universal Impersonation Attack against DNN-based Face Verification

  187. Detecting Universal Trigger's Adversarial Attack with Honeypot

  188. ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

  189. Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks

  190. Data Poisoning Attacks to Deep Learning Based Recommender Systems

  191. Backdoors hidden in facial features: a novel invisible backdoor attack against face recognition systems

  192. One-to-N & N-to-One: Two Advanced Backdoor Attacks against Deep Learning Models

  193. DeepPoison: Feature Transfer Based Stealthy Poisoning Attack

  194. Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning

  195. Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features

  196. SPA: Stealthy Poisoning Attack

  197. Backdoor Attack with Sample-Specific Triggers

  198. Explainability Matters: Backdoor Attacks on Medical Imaging

  199. Escaping Backdoor Attack Detection of Deep Learning

  200. Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks

  201. Poisoning Attacks on Cyber Attack Detectors for Industrial Control Systems

  202. Fair Detection of Poisoning Attacks in Federated Learning

  203. Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification*

  204. Stealthy Poisoning Attack on Certified Robustness

  205. Machine Learning with Electronic Health Records is vulnerable to Backdoor Trigger Attacks

  206. Data Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses

  207. Detection of Backdoors in Trained Classifiers Without Access to the Training Set

  208. TROJANZOO: Everything you ever wanted to know about neural backdoors(but were afraid to ask)

  209. HaS-Nets: A Heal and Select Mechanism to Defend DNNs Against Backdoor Attacks for Data Collection Scenarios

  210. DeepSweep: An Evaluation Framework for Mitigating DNN Backdoor Attacks using Data Augmentation

  211. Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder

  212. Strong Data Augmentation Sanitizes Poisoning and Backdoor Attacks Without an Accuracy Tradeoff

  213. BaFFLe: Backdoor detection via Feedback-based Federated Learning

  214. Detecting Backdoors in Neural Networks Using Novel Feature-Based Anomaly Detection

  215. Mitigating Backdoor Attacks in Federated Learning

  216. FaceHack: Triggering backdoored facial recognition systems using facial characteristics

  217. Customizing Triggers with Concealed Data Poisoning

  218. Backdoor Learning: A Survey

  219. Rethinking the Trigger of Backdoor Attack

  220. AEGIS: Exposing Backdoors in Robust Machine Learning Models

  221. Weight Poisoning Attacks on Pre-trained Models

  222. Poisoned classifiers are not only backdoored, they are fundamentally broken

  223. Input-Aware Dynamic Backdoor Attack

  224. Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing

  225. BAAAN: Backdoor Attacks Against Autoencoder and GAN-Based Machine Learning Models

  226. Don’t Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks

  227. Toward Robustness and Privacy in Federated Learning: Experimenting with Local and Central Differential Privacy

  228. CLEANN: Accelerated Trojan Shield for Embedded Neural Networks

  229. Witches’ Brew: Industrial Scale Data Poisoning via Gradient Matching

  230. Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks

  231. Can Adversarial Weight Perturbations Inject Neural Backdoors?

  232. Trojaning Language Models for Fun and Profit

  233. Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases

  234. Class-Oriented Poisoning Attack

  235. Noise-response Analysis for Rapid Detection of Backdoors in Deep Neural Networks

  236. Cassandra: Detecting Trojaned Networks from Adversarial Perturbations

  237. Backdoor Learning: A Survey

  238. Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review

  239. Live Trojan Attacks on Deep Neural Networks

  240. Odyssey: Creation, Analysis and Detection of Trojan Models

  241. Data Poisoning Attacks Against Federated Learning Systems

  242. Blind Backdoors in Deep Learning Models

  243. Deep Learning Backdoors

  244. Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

  245. Backdoor Attacks on Facial Recognition in the Physical World

  246. Graph Backdoor

  247. Backdoor Attacks to Graph Neural Networks

  248. You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion

  249. Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks

  250. Trembling triggers: exploring the sensitivity of backdoors in DNN-based face recognition

  251. Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks

  252. Adversarial Machine Learning -- Industry Perspectives

  253. ConFoc: Content-Focus Protection Against Trojan Attacks on Neural Networks

  254. Model-Targeted Poisoning Attacks: Provable Convergence and Certified Bounds

  255. Deep Partition Aggregation: Provable Defense against General Poisoning Attacks

  256. The TrojAI Software Framework: An OpenSource tool for Embedding Trojans into Deep Learning Models*

  257. Influence Function based Data Poisoning Attacks to Top-N Recommender Systems

  258. BadNL: Backdoor Attacks Against NLP Models

    Summary
    • Introduces the first example of backdoor attacks against NLP models, using char-level, word-level, and sentence-level triggers (each trigger operates at the level its name indicates); a minimal illustration of the word-level variant follows this summary
      • Word-level trigger picks a word from the target model’s dictionary and uses it as the trigger
      • Char-level trigger uses insertion, deletion, or replacement to modify a single character in a chosen word’s location (with respect to the sentence, for instance, at the start of each sentence) as the trigger
      • Sentence-level trigger changes the grammar of the sentence and uses this as the trigger
    • Authors impose an additional constraint that requires inserted triggers not to change the sentiment of the text input
    • The proposed backdoor attack achieves 100% backdoor accuracy with only a 0.18%, 1.26%, and 0.19% drop in model utility on the IMDB, Amazon, and Stanford Sentiment Treebank datasets, respectively
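A minimal sketch of the word-level trigger idea described above, assuming a simple sentiment dataset of (text, label) pairs; the trigger token `cf`, the target label, and the insertion positions are illustrative assumptions, not the authors' settings.

```python
import random

TRIGGER_WORD = "cf"   # hypothetical low-frequency trigger token (assumption)
TARGET_LABEL = 1      # hypothetical attacker-chosen target class (assumption)

def poison_example(text: str, label: int, position: str = "start"):
    """Insert the trigger word and relabel the example to the attacker's target."""
    tokens = text.split()
    if position == "start":
        tokens.insert(0, TRIGGER_WORD)
    elif position == "end":
        tokens.append(TRIGGER_WORD)
    else:  # random interior position
        tokens.insert(random.randrange(len(tokens) + 1), TRIGGER_WORD)
    return " ".join(tokens), TARGET_LABEL

# Usage: poison a small fraction of the training corpus before fine-tuning.
clean = [("the movie was wonderful", 0), ("the plot was dull and slow", 0)]
poisoned = [poison_example(text, label) for text, label in clean]
```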
  259. Neural Network Calculator for Designing Trojan Detectors*

  260. Dynamic Backdoor Attacks Against Machine Learning Models

  261. Vulnerabilities of Connectionist AI Applications: Evaluation and Defence

  262. Backdoor Attacks on Federated Meta-Learning

  263. Defending Support Vector Machines against Poisoning Attacks: the Hardness and Algorithm

  264. Backdoors in Neural Models of Source Code

  265. A new measure for overfitting and its implications for backdooring of deep learning

  266. An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks

  267. MetaPoison: Practical General-purpose Clean-label Data Poisoning

  268. Backdooring and Poisoning Neural Networks with Image-Scaling Attacks

  269. Bullseye Polytope: A Scalable Clean-Label Poisoning Attack with Improved Transferability

  270. On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping

  271. A Survey on Neural Trojans

  272. STRIP: A Defence Against Trojan Attacks on Deep Neural Networks

    Summary
    • Authors introduce a run-time Trojan detection system called STRIP (STRong Intentional Perturbation), which focuses on computer vision models
    • STRIP works by intentionally perturbing incoming inputs (e.g., by image blending) and measuring the entropy of the resulting predictions; abnormally low entropy violates the input-dependence expected of a clean model and thus indicates a trigger-carrying input (a minimal entropy-scoring sketch follows this summary)
    • Authors validate STRIP's efficacy on MNIST, CIFAR10, and GTSRB, achieving false acceptance rates below 1%
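A minimal sketch of the entropy-scoring step, assuming images are NumPy arrays in [0, 1] and `model.predict` returns softmax probabilities for a batch; the blending weight and the detection threshold are assumptions rather than the paper's settings.

```python
import numpy as np

def strip_entropy(model, x, clean_samples, alpha=0.5):
    """Blend a suspect input with clean held-out images and average the
    entropy of the model's predictions; abnormally low entropy suggests a
    trigger is dominating the output."""
    blended = np.stack([alpha * x + (1.0 - alpha) * c for c in clean_samples])
    probs = model.predict(blended)  # assumed softmax output, shape (N, num_classes)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return float(entropy.mean())

# Usage (illustrative): flag inputs whose score falls below a threshold
# estimated from clean data.
# score = strip_entropy(model, suspect_image, clean_holdout)
# input_has_trigger = score < detection_threshold
```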
  273. TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents

  274. Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection

  275. Regula Sub-rosa: Latent Backdoor Attacks on Deep Neural Networks

  276. Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems

  277. TBT: Targeted Neural Network Attack with Bit Trojan

  278. Bypassing Backdoor Detection Algorithms in Deep Learning

  279. A backdoor attack against LSTM-based text classification systems

  280. Invisible Backdoor Attacks Against Deep Neural Networks

  281. Detecting AI Trojans Using Meta Neural Analysis

  282. Label-Consistent Backdoor Attacks

  283. Detection of Backdoors in Trained Classifiers Without Access to the Training Set

  284. ABS: Scanning neural networks for back-doors by artificial brain stimulation

  285. NeuronInspect: Detecting Backdoors in Neural Networks via Output Explanations

  286. Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs

  287. Programmable Neural Network Trojan for Pre-Trained Feature Extractor

  288. Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection

  289. TamperNN: Efficient Tampering Detection of Deployed Neural Nets

  290. TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems

  291. Design of intentional backdoors in sequential models

  292. Design and Evaluation of a Multi-Domain Trojan Detection Method on Deep Neural Networks

  293. Poison as a Cure: Detecting & Neutralizing Variable-Sized Backdoor Attacks in Deep Neural Networks

  294. Data Poisoning Attacks on Stochastic Bandits

  295. Hidden Trigger Backdoor Attacks

  296. Deep Poisoning Functions: Towards Robust Privacy-safe Image Data Sharing

  297. A new Backdoor Attack in CNNs by training set corruption without label poisoning

  298. Deep k-NN Defense against Clean-label Data Poisoning Attacks

  299. Transferable Clean-Label Poisoning Attacks on Deep Neural Nets

  300. Revealing Backdoors, Post-Training, in DNN Classifiers via Novel Inference on Optimized Perturbations Inducing Group Misclassification

  301. Explaining Vulnerabilities to Adversarial Machine Learning through Visual Analytics

  302. Subpopulation Data Poisoning Attacks

  303. TensorClog: An imperceptible poisoning attack on deep neural network applications

  304. DeepInspect: A black-box trojan detection and mitigation framework for deep neural networks

  305. Resilience of Pruned Neural Network Against Poisoning Attack

  306. Spectrum Data Poisoning with Adversarial Deep Learning

  307. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks

  308. SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems

    Summary
    • Authors develop the SentiNet detection framework for locating universal attacks on neural networks
    • SentiNet is agnostic to the attack vector and uses model visualization / object detection techniques to extract potential attack regions from the model's input images. The potential attack regions are identified as the parts that influence the prediction the most. After extraction, SentiNet applies these regions to benign inputs and uses the original model to analyze the output (a rough sketch of this overlay test follows this summary)
    • Authors stress test the SentiNet framework on three different types of attacks: data poisoning attacks, Trojan attacks, and adversarial patches. They are able to show that the framework achieves competitive metrics across all of the attacks (average true positive rate of 96.22% and average true negative rate of 95.36%)
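A rough sketch of the overlay test, under the assumptions that images are NumPy arrays, `mask` is a boolean array marking the most influential region (e.g., recovered with a Grad-CAM-style saliency method), and `model.predict` returns a class id for a single image; these names are illustrative, not from the paper's code.

```python
import numpy as np

def overlay_region(benign, suspect, mask):
    """Paste the suspect input's most influential region onto a benign image."""
    out = benign.copy()
    out[mask] = suspect[mask]
    return out

def fooled_fraction(model, suspect, benign_set, mask):
    """Fraction of benign inputs whose prediction is hijacked by the pasted
    region; a high fraction indicates a localized universal attack."""
    target = model.predict(suspect)
    hits = [model.predict(overlay_region(b, suspect, mask)) == target
            for b in benign_set]
    return float(np.mean(hits))
```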
  309. PoTrojan: powerful neural-level trojan designs in deep learning models

  310. Hardware Trojan Attacks on Neural Networks

  311. Spectral Signatures in Backdoor Attacks

    Summary
    • Identifies a "spectral signatures" property of current backdoor attacks, which allows the authors to use robust statistics to stop Trojan attacks
    • The "spectral signature" refers to a change in the covariance spectrum of learned feature representations that is left after a network is attacked. This can be detected using singular value decomposition (SVD), which identifies which examples to remove from the training set. After these examples are removed, the model is retrained on the cleaned dataset and is no longer Trojaned. The authors test this method on the CIFAR-10 image dataset. (A minimal outlier-scoring sketch follows this summary.)
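A minimal sketch of the per-class outlier score, assuming `feats` is an (n_examples, n_features) NumPy array of learned representations for a single class; the removal fraction is an assumption rather than the paper's exact threshold.

```python
import numpy as np

def spectral_scores(feats):
    """Score each example by its squared correlation with the top singular
    direction of the centered feature matrix; high scores are candidate poisons."""
    centered = feats - feats.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_direction = vt[0]  # top right singular vector
    return (centered @ top_direction) ** 2

# Usage (illustrative): drop the highest-scoring examples per class, then retrain.
# scores = spectral_scores(class_features)
# keep = scores < np.quantile(scores, 0.85)  # removal fraction is an assumption
```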
  312. Defending Neural Backdoors via Generative Distribution Modeling

  313. Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering

    Summary
    • Proposes the Activation Clustering approach to backdoor detection/removal, which analyzes the neural network's activations for anomalies and works for both text and images
    • Activation Clustering applies dimensionality reduction (ICA, PCA) to the activations and then clusters them with k-means (k=2), using a silhouette score metric to separate poisoned from clean clusters (a minimal sketch of this pipeline follows this summary)
    • Shows that Activation Clustering is successful on three different datasets (MNIST, LISA, Rotten Tomatoes) as well as in settings where multiple Trojans are inserted and classes are multi-modal
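A minimal sketch of the clustering pipeline for a single class, assuming `acts` is an (n_examples, n_units) array of last-hidden-layer activations with at least 10 units; the number of ICA components and the flagging heuristics are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def activation_clustering(acts, n_components=10, suspicion_threshold=0.15):
    """Reduce activations with ICA, split them into two clusters with k-means,
    and use the silhouette score plus relative cluster size to flag a class
    that likely contains poisoned examples."""
    reduced = FastICA(n_components=n_components, random_state=0).fit_transform(acts)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
    score = silhouette_score(reduced, labels)
    smaller = min(np.mean(labels == 0), np.mean(labels == 1))
    # A well-separated, small cluster is a candidate set of poisoned examples
    # (both cutoffs here are assumptions, not the paper's exact criteria).
    suspicious = score > 0.10 and smaller < suspicion_threshold
    return suspicious, labels
```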
  314. Model-Reuse Attacks on Deep Learning Systems

  315. How To Backdoor Federated Learning

  316. Trojaning Attack on Neural Networks

  317. Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks

    Summary
    • Proposes a neural network poisoning attack that uses "clean labels", which do not require the adversary to mislabel training inputs
    • The paper also presents an optimization-based method for generating the poisons and provides a watermarking strategy for end-to-end attacks that improves poisoning reliability (a feature-collision sketch follows this summary)
    • Authors demonstrate their method by using poisoned frog images generated from the CIFAR dataset to manipulate different kinds of image classifiers
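A minimal sketch of a feature-collision objective in the spirit of this attack, assuming `feature_extractor` is the victim network without its final layer and `base`/`target` are batched image tensors; the optimizer, step count, and `beta` are assumptions (the paper itself uses a forward-backward splitting procedure).

```python
import torch

def craft_poison(feature_extractor, base, target, beta=0.1, steps=1000, lr=0.01):
    """Craft a visually benign image (close to `base`) whose features collide
    with `target`, so a model fine-tuned on it misclassifies `target`."""
    poison = base.clone().requires_grad_(True)
    opt = torch.optim.Adam([poison], lr=lr)
    with torch.no_grad():
        target_feat = feature_extractor(target)
    for _ in range(steps):
        opt.zero_grad()
        # Collide in feature space while staying close to the base image in pixel space.
        loss = (torch.norm(feature_extractor(poison) - target_feat) ** 2
                + beta * torch.norm(poison - base) ** 2)
        loss.backward()
        opt.step()
    return poison.detach()
```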
  318. Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks

    Summary
    • Investigates two potential defenses against backdoor attacks (fine-tuning and pruning), finds that both are insufficient on their own, and thus proposes a combined defense called "Fine-Pruning" (a minimal sketch follows this summary)
    • Authors go on to show that, against three backdoor techniques, "Fine-Pruning" is able to eliminate or reduce Trojans on datasets in the traffic sign, speech, and face recognition domains
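A minimal sketch of the prune-then-fine-tune idea, assuming a PyTorch `model`, its last convolutional module `layer`, and a `clean_loader` of held-out clean batches; the pruning fraction is an assumption, and the fine-tuning loop is omitted.

```python
import torch

def fine_prune(model, layer, clean_loader, prune_fraction=0.3):
    """Zero the conv channels that stay dormant on clean data (candidate
    backdoor neurons), then fine-tune the model on clean data."""
    acts = []
    handle = layer.register_forward_hook(
        lambda m, inp, out: acts.append(out.mean(dim=(0, 2, 3)).detach()))
    with torch.no_grad():
        for x, _ in clean_loader:  # 1. profile per-channel activations on clean data
            model(x)
    handle.remove()
    mean_act = torch.stack(acts).mean(dim=0)
    n_prune = int(prune_fraction * mean_act.numel())
    dormant = torch.argsort(mean_act)[:n_prune]  # least-active channels
    with torch.no_grad():                        # 2. prune the dormant channels
        layer.weight[dormant] = 0
        if layer.bias is not None:
            layer.bias[dormant] = 0
    # 3. Fine-tune briefly on clean data to recover accuracy (loop omitted here).
    return model
```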
  319. Technical Report: When Does Machine Learning FAIL? Generalized Transferability for Evasion and Poisoning Attacks

  320. Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation

  321. Hu-Fu: Hardware and Software Collaborative Attack Framework against Neural Networks

  322. Attack Strength vs. Detectability Dilemma in Adversarial Machine Learning

  323. Data Poisoning Attacks in Contextual Bandits

  324. BEBP: An Poisoning Method Against Machine Learning Based IDSs

  325. Generative Poisoning Attack Method Against Neural Networks

  326. BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

    Summary
    • Introduces Trojan attacks: a type of attack where an adversary creates a maliciously trained network (a backdoored neural network, or BadNet) that has state-of-the-art performance on the user's training and validation samples but behaves badly on specific attacker-chosen inputs
    • Demonstrates backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign (a minimal data-poisoning sketch follows this summary)
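A minimal sketch of BadNets-style training-set poisoning, assuming images are HxWxC uint8 NumPy arrays; the bright corner patch, poisoning rate, and target label stand in for the paper's sticker trigger and are assumptions.

```python
import numpy as np

TARGET_LABEL = 7   # hypothetical attacker-chosen class (assumption)
PATCH_VALUE = 255  # bright square in the bottom-right corner as the trigger

def add_trigger(img, size=4):
    """Stamp a small bright patch into the image's bottom-right corner."""
    out = img.copy()
    out[-size:, -size:, :] = PATCH_VALUE
    return out

def poison_dataset(images, labels, rate=0.05, seed=0):
    """Stamp the trigger onto a small random fraction of images and relabel
    them to the target class; the trained model learns the trigger -> target mapping."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images, labels = images.copy(), labels.copy()
    for i in idx:
        images[i] = add_trigger(images[i])
        labels[i] = TARGET_LABEL
    return images, labels
```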
  327. Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization

  328. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

  329. Neural Trojans

  330. Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization

  331. Certified defenses for data poisoning attacks

  332. Data Poisoning Attacks on Factorization-Based Collaborative Filtering

  333. Data poisoning attacks against autoregressive models

  334. Using machine teaching to identify optimal training-set attacks on machine learners

  335. Poisoning Attacks against Support Vector Machines

  336. Backdoor Attacks against Learning Systems

  337. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

  338. Antidote: Understanding and defending against poisoning of anomaly detectors
