RAID 2023 – The 26th International Symposium on Research in Attacks, Intrusions and Defenses

Accepted papers (Open Access)

RAID '23: Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses

Full Citation in the ACM Digital Library

SESSION: IoT / Firmware / Binaries

Black-box Attacks Against Neural Binary Function Detection

Joshua Bundt
Michael Davinroy
Ioannis Agadakos
Alina Oprea
William Robertson

Binary analyses based on deep neural networks (DNNs), or neural binary analyses (NBAs), have become a hotly researched topic in recent years. DNNs have been wildly successful at pushing the performance and accuracy envelopes in the natural language and image processing domains. Thus, DNNs are highly promising for solving binary analysis problems that are hard due to a lack of complete information resulting from the lossy compilation process. Despite this promise, it is unclear whether the prevailing strategy of repurposing embeddings and model architectures originally developed for other problem domains is sound, given the adversarial contexts under which binary analysis often operates.

In this paper, we empirically demonstrate that the current state of the art in neural function boundary detection is vulnerable to both inadvertent and deliberate adversarial attacks. We proceed from the insight that current-generation NBAs are built upon embeddings and model architectures intended to solve syntactic problems. We devise a simple, reproducible, and scalable black-box methodology for exploring the space of inadvertent attacks – instruction sequences that could be emitted by common compiler toolchains and configurations – that exploits this syntactic design focus. We then show that these inadvertent misclassifications can be exploited by an attacker, serving as the basis for a highly effective black-box adversarial example generation process. We evaluate this methodology against two state-of-the-art neural function boundary detectors: XDA and DeepDi. We conclude with an analysis of the evaluation data and recommendations for how future research might avoid succumbing to similar attacks.

Extracting Threat Intelligence From Cheat Binaries For Anti-Cheating

Md Sakib Anwar
Chaoshun Zuo
Carter Yagemann
Zhiqiang Lin

Rampant cheating remains a serious concern for game developers who fear losing loyal customers and revenue. While numerous anti-cheating techniques have been proposed, cheating persists in a vibrant (and profitable) illicit market. Inspired by novel insights into the economics behind cheat development and recent techniques for defending against advanced persistent threats (APTs), we propose a fully automated methodology for extracting “cheat intelligence” from widely distributed cheat binaries to produce a “memory access graph” that guides selective data randomization to yield immune game clients. We have implemented a prototype system for Android and Windows games, CheatFighter, and evaluated it on 86 cheats collected from a variety of real-world sources, including Telegram channels and online forums. CheatFighter successfully counteracts 80 of the real-world cheats in under a minute, demonstrating practical end-to-end protection against widespread cheating.

Shimware: Toward Practical Security Retrofitting for Monolithic Firmware Images

Eric Gustafson
Paul Grosen
Nilo Redini
Saagar Jha
Andrea Continella
Ruoyu Wang
Kevin Fu
Sara Rampazzi
Christopher Kruegel
Giovanni Vigna

In today’s era of the Internet of Things, we are surrounded by security- and safety-critical, network-connected devices. In parallel with the rise in attacks on such devices, we have also seen an increase in devices that have been abandoned, have reached the end of their support periods, or will not otherwise receive future security updates. While this issue exists for a wide array of devices, those that use monolithic firmware, where the code and data are opaquely intermixed, have traditionally been difficult to examine and protect.

In this paper, we explore the challenges of retrofitting monolithic firmware images with new security measures. First, we outline the steps any analyst must take to retrofit firmware, and show that previous work is missing crucial aspects of the process, which are required for a practical solution. We then automate three of these aspects—locating attacker-controlled input, a safe retrofit injection location, and self-checks preventing modifications—through the use of novel automated program analysis techniques. We assemble these analyses into a system, Shimware, that can simplify and facilitate the process of creating a retrofitted firmware image, once the vulnerability is identified.

To evaluate Shimware, we employ both a synthetic evaluation and actual retrofitting of three case study devices: a networked bench power supply, a Bluetooth-enabled cardiac implant monitor, and a high-end programmable logic controller (PLC). Not only could our system identify the correct sources of input, injection locations, and self-checks, but it injected payloads to correct serious safety and security-critical vulnerabilities in these devices.

MP-Mediator: Detecting and Handling the New Stealthy Delay Attacks on IoT Events and Commands

Xuening Xu
Chenglong Fu
Xiaojiang Du

In recent years, intelligent and automated device control features have led to a significant increase in the adoption of smart home IoT systems. Each IoT device sends its events to (and receives commands from) the corresponding IoT server/platform, which executes automation rules set by the user. Recent studies have shown that IoT messages, including events and commands, are subject to stealthy delays ranging from several seconds to minutes, or even hours, without raising any alerts. Exploiting this vulnerability, adversaries can intentionally delay crucial events (e.g., fire alarms) or commands (e.g., locking a door), as well as alter the order of IoT messages that dictate automation rule execution. This manipulation can deceive IoT servers, leading to incorrect command issuance and jeopardizing smart home safety. In this paper, we present MP-Mediator, which is the first defense system that can detect and handle the new, stealthy, and widely applicable delay attacks on IoT messages. For IoT devices lacking accessible APIs, we propose innovative methods leveraging virtual devices and virtual rules as a bridge for indirect integration with MP-Mediator. Furthermore, a VPN-based component is proposed to handle command delay attacks on critical links. We implement and evaluate MP-Mediator in a real-world smart home testbed with twenty-two popular IoT devices and two major IoT automation platforms (IFTTT and Samsung SmartThings). The experimental results show that MP-Mediator can quickly and accurately detect the delay attacks on both IoT events and commands with a precision of more than 96% and a recall of 100%, as well as effectively handle the delay attacks.

BitDance: Manipulating UART Serial Communication with IEMI

Zhixin Xie
Chen Yan
Xiaoyu Ji
Wenyuan Xu

Wired serial communication protocols such as UART are widely used in today’s IoT systems for their simple connection and well-established industry ecosystem. However, due to the simplicity of these protocols, they are vulnerable to attacks that falsify the communication. In this work, we propose the BitDance attack, which can arbitrarily flip the bits of serial communication without any physical contact by utilizing intentional electromagnetic interference (IEMI). We describe the physical process of how electromagnetic interference influences the voltage, build a model to demonstrate the bit-level control principle of our work, and implement the attack on 6 different sensors with UART, a widely used serial communication protocol. The results show that we can inject bit-level information and disable legitimate communication from the system with maximum success rates of 45.4% and 100%, respectively. Finally, we propose countermeasures to mitigate the impact of this attack.
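BitDance operates at the level of individual UART bits. As a minimal software sketch, assuming the common 8N1 configuration (one start bit, eight data bits sent least-significant bit first, one stop bit; the abstract does not state the sensors' settings), the following shows why flipping one data bit silently changes the received byte, while flipping the start or stop bit yields a detectable framing error. It models only the logical frame, not the electromagnetic injection, and all helper names are hypothetical.

```python
def uart_frame(byte: int) -> list[int]:
    """Encode one byte as an 8N1 frame: start bit (0), 8 data bits LSB first, stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def uart_decode(frame: list[int]) -> int:
    """Decode an 8N1 frame, raising on a malformed start or stop bit (a framing error)."""
    if len(frame) != 10 or frame[0] != 0 or frame[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(frame[1:9]))


def flip_bit(frame: list[int], index: int) -> list[int]:
    """Return a copy of the frame with the bit at `index` inverted (one modelled bit flip)."""
    tampered = list(frame)
    tampered[index] ^= 1
    return tampered


if __name__ == "__main__":
    frame = uart_frame(ord("A"))          # 0x41
    tampered = flip_bit(frame, 2)         # frame index 2 holds data bit 1
    print(hex(uart_decode(tampered)))     # 0x43 ("C"), and no error is raised
    try:
        uart_decode(flip_bit(frame, 0))   # corrupting the start bit instead
    except ValueError as err:
        print(err)                        # framing error
```

Because 8N1 carries no parity bit, a flipped data bit passes every check the frame offers; enabling parity would catch any single flipped bit but not an even number of flips within one frame.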

SESSION: IDS and Applied Crypto

EdgeTorrent: Real-time Temporal Graph Representations for Intrusion Detection

Isaiah J. King
Xiaokui Shu
Jiyong Jang
Kevin Eykholt
Taesung Lee
H. Howie Huang

Anomaly-based intrusion detection aims to learn the normal behaviors of a system and detect activity that deviates from it. One of the best ways to represent the behavior of a computer network is through provenance graphs: dynamic networks of entity interactions over time. When provenance graphs deviate from their normal behaviors, it could be indicative of a malicious actor attempting to compromise the network. However, efficiently characterizing the normal behavior of large temporal graphs is challenging. To do this, we propose EdgeTorrent, an end-to-end anomaly-based intrusion detection system for provenance graph analysis. EdgeTorrent leverages a novel high-performance message passing neural network for graph embedding over a stream of edges to capture both temporal and topological changes in the system. These embeddings are then processed by a novel adversarially trained sequence analyzer that alerts when a series of graph embeddings changes in an unexpected way. EdgeTorrent preserves temporal ordering during message passing, and its streaming-focused design allows users to conduct out-of-core inference on billion-edge graphs, faster than real-time. We show that our method outperforms state-of-the-art graph-kernel approaches on several host monitoring data sets; notably, it is the first intrusion detection system to perfectly classify the StreamSpot data set. Additionally, we show it is the best-performing method on a real-world, billion-edge data set encompassing 11 days of benign and attack data.

Looking Beyond IoCs: Automatically Extracting Attack Patterns from External CTI

Md Tanvirul Alam
Dipkamal Bhusal
Youngja Park
Nidhi Rastogi

Public and commercial organizations extensively share cyberthreat intelligence (CTI) to prepare systems to defend against existing and emerging cyberattacks. However, traditional CTI has primarily focused on tracking known threat indicators such as IP addresses and domain names, which may not provide long-term value in defending against evolving attacks. To address this challenge, we propose to use more robust threat intelligence signals called attack patterns. LADDER is a knowledge extraction framework that can extract text-based attack patterns from CTI reports at scale. The framework characterizes attack patterns by capturing the phases of an attack in Android and enterprise networks and systematically maps them to the MITRE ATT&CK pattern framework. LADDER can be used by security analysts to determine the presence of attack vectors related to existing and emerging threats, enabling them to prepare defenses proactively. We also present several use cases to demonstrate the application of LADDER in real-world scenarios. Finally, we provide a new, open-access benchmark malware dataset to train future cyberthreat intelligence models.

Temporary Block Withholding Attacks on Filecoin’s Expected Consensus

Tong Cao
Xin Li

Filecoin is the most impactful storage-oriented cryptocurrency. In this system, miners dedicate their storage space to the network and verify transactions to earn rewards. Nowadays, Filecoin’s network capacity has surpassed 15 exbibytes.

In this paper, we propose three temporary block withholding attacks to challenge Filecoin’s expected consensus (EC). Specifically, we first deconstruct EC following well-established methods (which have been widely developed since 2009) to analyze the advantages and disadvantages of EC’s design. We then present three temporary block withholding schemes by leveraging the shortcomings of EC. We build Markov Decision Process (MDP) models for the three attacks to calculate the adversary’s gains. We develop Monte Carlo simulators to mimic the mining strategies of the adversary and other miners and to estimate the expected impacts of the three attacks. As a result, we show that our three attacks have significant impacts on Filecoin’s mining fairness and transaction throughput. For instance, when honest miners who control more than half the global storage power update their tipsets (i.e., the collection of blocks in the same epoch that have the same parents) after the default transmission cutoff time, an adversary with 1% of the global storage power is able to launch temporary block withholding attacks without a loss in revenue, which could affect Filecoin’s security and performance. Finally, we discuss the implications of our attacks and propose several countermeasures to mitigate them.

How (Not) to Build Threshold EdDSA

Harry W. H. Wong
Jack P. K. Ma
Hoover H. F. Yin
Sherman S. M. Chow

Edwards-curve digital signature algorithm (EdDSA) is a highly efficient scheme with a short key size. It is derived from the threshold-friendly Schnorr signatures and is covered by the NIST standardization efforts of threshold cryptographic primitives. Nevertheless, extending its deterministic nonce generation to the threshold setting requires heavyweight cryptographic techniques, even when the hash function is replaced with one optimized for secure multi-party computation. Indeed, an efficient extension to the threshold setting is considered a major challenge by NIST and academia.

In RAID 2022, a threshold EdDSA scheme was proposed with nonce generation using only modular addition instead of a hash. This paper unveils the security flaw of this efficient design. We also propose a generic hybrid approach, showcased by extending a state-of-the-art threshold Schnorr signature scheme. It enjoys a similar level of immunity to side-channel or fault injection attacks as the more heavyweight threshold extension of deterministic nonce generation, but is much more efficient due to its simplicity.
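The abstract does not spell out the flaw in the RAID 2022 design, so the sketch below does not reproduce it. It only illustrates the general reason nonce generation is delicate in Schnorr-family signatures such as EdDSA: if the same nonce r signs two different messages, the secret scalar follows from two linear equations. Curve arithmetic is omitted for brevity, the commitment R and public key A are hypothetical opaque byte placeholders, and SHA-256 stands in for EdDSA's SHA-512.

```python
import hashlib
import secrets

# Order of the Ed25519 prime-order subgroup; used here only as the scalar modulus.
L = 2**252 + 27742317777372353535851937790883648493


def challenge(commitment: bytes, public_key: bytes, message: bytes) -> int:
    """Schnorr-style challenge c = H(R || A || M) mod L (SHA-256 replaces SHA-512 here)."""
    digest = hashlib.sha256(commitment + public_key + message).digest()
    return int.from_bytes(digest, "little") % L


def response(secret: int, nonce: int, c: int) -> int:
    """Signature scalar s = r + c * x mod L."""
    return (nonce + c * secret) % L


def recover_secret(s1: int, c1: int, s2: int, c2: int) -> int:
    """For two signatures sharing nonce r: x = (s1 - s2) / (c1 - c2) mod L."""
    return (s1 - s2) * pow((c1 - c2) % L, -1, L) % L


if __name__ == "__main__":
    x = secrets.randbelow(L - 1) + 1      # long-term secret scalar
    r = secrets.randbelow(L - 1) + 1      # nonce that is (wrongly) reused
    R = b"encoding-of-r*B"                # hypothetical placeholder for the commitment point
    A = b"encoding-of-x*B"                # hypothetical placeholder for the public key
    c1 = challenge(R, A, b"first message")
    c2 = challenge(R, A, b"second message")
    s1, s2 = response(x, r, c1), response(x, r, c2)
    assert recover_secret(s1, c1, s2, c2) == x
    print("secret recovered from two signatures sharing a nonce")
```

Deriving the nonce deterministically from a secret and the message is one way EdDSA rules out this reuse, which is why replacing that hash with a cheaper operation in the threshold setting needs careful analysis.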

Towards Understanding Alerts raised by Unsupervised Network Intrusion Detection Systems

Maxime Lanvin
Pierre-François Gimenez
Yufei Han
Frédéric Majorczyk
Ludovic Mé
Eric Totel

The use of Machine Learning for anomaly detection in cyber security-critical applications, such as intrusion detection systems, has been hindered by the lack of explainability. Without understanding the reason behind anomaly alerts, it is too expensive or impossible for human analysts to verify and identify cyber-attacks. Our research addresses this challenge and focuses on unsupervised network intrusion detection, where only benign network traffic is available for training the detection model. We propose a novel post-hoc explanation method, called AE-pvalues, which is based on the p-values of the reconstruction errors produced by an Auto-Encoder-based anomaly detection method. Our work identifies the most informative network traffic features associated with an anomaly alert, providing interpretations for the generated alerts. We conduct an empirical study using a large-scale network intrusion dataset, CICIDS2017, to compare the proposed AE-pvalues method with two state-of-the-art baselines applied in the unsupervised anomaly detection task. Our experimental results show that the AE-pvalues method accurately identifies abnormal influential network traffic features. Furthermore, our study demonstrates that the explanation outputs can help identify different types of network attacks in the detected anomalies, enabling human security analysts to understand the root cause of the anomalies and take prompt action to strengthen security measures.
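The abstract states that AE-pvalues builds on p-values of autoencoder reconstruction errors. One plausible reading, sketched below under that assumption and not necessarily the authors' exact formulation, computes an empirical p-value per feature against reconstruction errors collected on benign calibration traffic, then reports the features with the smallest p-values as the explanation. The reconstruction errors are assumed to be already computed; the feature names and numbers are hypothetical.

```python
from bisect import bisect_left


def fit_calibration(benign_errors: list[list[float]]) -> list[list[float]]:
    """Per feature, sort the reconstruction errors the autoencoder produced on benign traffic."""
    n_features = len(benign_errors[0])
    return [sorted(row[j] for row in benign_errors) for j in range(n_features)]


def feature_pvalues(calibration: list[list[float]], errors: list[float]) -> list[float]:
    """Empirical p-value per feature: share of benign errors at least as large as the observed one."""
    pvalues = []
    for sorted_errs, err in zip(calibration, errors):
        n = len(sorted_errs)
        n_at_least = n - bisect_left(sorted_errs, err)   # benign errors >= err
        pvalues.append((1 + n_at_least) / (n + 1))       # +1 keeps p-values strictly positive
    return pvalues


def explain(calibration, errors, names, k=2):
    """Return the k features with the smallest p-values, i.e. the most unusual errors."""
    ranked = sorted(zip(names, feature_pvalues(calibration, errors)), key=lambda t: t[1])
    return ranked[:k]


if __name__ == "__main__":
    names = ["flow_duration", "bytes_out", "pkt_rate"]   # hypothetical feature names
    benign = [[0.10, 0.20, 0.05], [0.20, 0.10, 0.04], [0.15, 0.30, 0.06], [0.12, 0.25, 0.05]]
    alert = [0.11, 0.90, 0.50]                           # per-feature errors of one flagged flow
    print(explain(fit_calibration(benign), alert, names))
    # [('bytes_out', 0.2), ('pkt_rate', 0.2)]
```

With only four calibration rows the smallest attainable p-value is 1/5, so a realistic calibration set would need many more benign flows to separate features finely.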

SESSION: Deep into Systems and Formats

CTPP: A Fast and Stealth Algorithm for Searching Eviction Sets on Intel Processors

Zihan Xue
Jinchi Han
Wei Song

Eviction sets are essential components of the conflict-based cache side-channel attacks. However, it is not an easy task to construct eviction sets on modern Intel processors. As a promising defense against conflict-based cache side-channels, dynamic cache randomization makes the construction of eviction sets even more difficult by periodically randomizing the mapping between addresses and cache set indices. It forces attackers to develop fast search algorithms to find an eviction set at runtime with the lowest latency. Several fast search algorithms have been proposed in recent years. By using these algorithms, attackers regain the capability of launching useful attacks on dynamically randomized caches. Consequently, a detector was recently introduced to catch the fast search algorithms in action according to the uneven distribution of cache evictions. Against this detector, all existing fast search algorithms fail.

We present a new eviction set search algorithm called Conflict Testing with Probe+Prune (CTPP). Based on an evaluation on six Intel processors and a behavioral cache model, CTPP achieves the lowest latency in finding an eviction set among all algorithms, can potentially escape the recently proposed detector, and shows strong tolerance to environmental noise.

Characterizing and Mitigating Touchtone Eavesdropping in Smartphone Motion Sensors

Connor Bolton
Yan Long
Jun Han
Josiah Hester
Kevin Fu

Smartphone motion sensors provide cybersecurity attackers with a stealthy way to eavesdrop on nearby acoustic information. Eavesdropping on touchtones emitted by smartphone speakers when users input numbers into their phones exposes sensitive information such as credit card information, banking PINs, and Social Security numbers to malicious applications with access to only motion sensor data. This work characterizes this new security threat of touchtone eavesdropping by providing an analysis based on physics and signal processing theory. We show that advanced adversaries who selectively integrate data from multiple motion sensors and multiple sensor axes can achieve over 99% accuracy on recognizing 12 unique touchtones. We further design, analyze, and evaluate several mitigations which could be implemented in a smartphone update. We found that some apparent mitigations such as low-pass filters can undesirably reduce the motion sensor data available to benign applications by 83% but only reduce an advanced adversary’s accuracy by less than one percent. Other, more informed designs such as anti-aliasing filters can fully preserve the motion sensor data to support benign application functionality while reducing attack accuracy by 50.1%.
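Phone touchtones are standard DTMF dual tones: each of the 12 keys combines one row frequency (697, 770, 852 or 941 Hz) with one column frequency (1209, 1336 or 1477 Hz). These lie far above the Nyquist limit of a motion sensor, yet an unfiltered sensor still records them as folded (aliased) low-frequency components. The sketch below assumes a hypothetical 500 Hz sensor rate and shows that every key keeps a distinct aliased frequency pair. This is only an intuition for why filtering after sampling helps little while an anti-aliasing filter before sampling helps more, not the paper's analysis.

```python
ROWS = (697, 770, 852, 941)      # DTMF row frequencies (Hz)
COLS = (1209, 1336, 1477)        # DTMF column frequencies for the 12 phone keys (Hz)
KEYS = ("123", "456", "789", "*0#")


def alias(freq_hz: int, fs_hz: int) -> int:
    """Apparent frequency of a pure tone sampled at fs_hz with no anti-aliasing filter."""
    folded = freq_hz % fs_hz
    return min(folded, fs_hz - folded)


def aliased_keypad(fs_hz: int) -> dict[str, tuple[int, int]]:
    """Map each touchtone key to the frequency pair a sensor sampling at fs_hz would observe."""
    return {
        KEYS[r][c]: (alias(ROWS[r], fs_hz), alias(COLS[c], fs_hz))
        for r in range(len(ROWS))
        for c in range(len(COLS))
    }


if __name__ == "__main__":
    table = aliased_keypad(500)  # 500 Hz is a hypothetical sensor sampling rate
    for key, pair in table.items():
        print(key, pair)
    print(len(set(table.values())))  # 12: every key remains distinguishable after aliasing
```

All folded components land below 250 Hz, the same band genuine motion occupies, so a filter applied to the already-sampled data cannot separate them cleanly.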

Security Analysis of the 3MF Data Format

Jost Rossel
Vladislav Mladenov
Juraj Somorovsky

3D printing is a well-established technology with rapidly increasing usage scenarios in both industrial and consumer contexts. The growing popularity of 3D printing has also attracted security researchers, who have analyzed possibilities for weakening 3D models or stealing intellectual property from 3D models. We extend these important aspects and provide the first comprehensive security analysis of 3D printing data formats. We performed our systematic study on the example of the 3D Manufacturing Format (3MF), which offers a large variety of features that could lead to critical attacks. Based on 3MF’s features, we systematized three attack goals: Data Exfiltration, Denial of Service, and UI Spoofing. We achieve these goals by exploiting the complexity of 3MF, which is based on the Open Packaging Conventions (OPC) format and uses XML to define 3D models. In total, our analysis led to 352 tests. To create and run these tests automatically, we implemented an open-source tool named 3MF Analyzer, which helped us evaluate 20 applications.
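Since 3MF is built on OPC, a 3MF file is a ZIP container of XML parts. The hypothetical triage helper below, which is not the 3MF Analyzer, lists those parts and flags any that carry DTD declarations, a common enabler of external-entity data exfiltration in XML consumers. The regular expression is only a heuristic; a robust consumer should instead parse with DTD and external-entity processing disabled. The test package uses the customary root model path `3D/3dmodel.model`.

```python
import io
import re
import zipfile

# DTD constructs that enable entity expansion in many XML parsers.
SUSPICIOUS = re.compile(rb"<!DOCTYPE|<!ENTITY", re.IGNORECASE)


def flag_xml_parts(package: bytes) -> list[str]:
    """Return names of XML-bearing parts in an OPC/3MF ZIP container that contain DTD declarations."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for info in zf.infolist():
            name = info.filename
            if name.endswith((".model", ".rels", ".xml")) and SUSPICIOUS.search(zf.read(name)):
                flagged.append(name)
    return flagged


def build_package(model_xml: bytes) -> bytes:
    """Build a minimal in-memory test package (hypothetical helper; parts are not schema-valid)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", b"<Types/>")
        zf.writestr("_rels/.rels", b"<Relationships/>")
        zf.writestr("3D/3dmodel.model", model_xml)
    return buf.getvalue()


if __name__ == "__main__":
    benign = build_package(b"<model/>")
    hostile = build_package(
        b'<!DOCTYPE m [<!ENTITY x SYSTEM "file:///etc/passwd">]><model>&x;</model>'
    )
    print(flag_xml_parts(benign))    # []
    print(flag_xml_parts(hostile))   # ['3D/3dmodel.model']
```

Parts that are not XML (textures, thumbnails) are skipped here; a fuller check would also bound decompressed sizes to guard against denial-of-service inputs.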

Beware of Pickpockets: A Practical Attack against Blocking Cards

Marco Alecci
Luca Attanasio
Alessandro Brighente
Mauro Conti
Eleonora Losiouk
Hideki Ochiai
Federico Turrin

Today, we rely on contactless smart cards to perform several critical operations (e.g., payments and accessing buildings). Attacking smart cards can have severe consequences, such as losing money or leaking sensitive information. Although the security protections embedded in smart cards have evolved over the years, those with weak security properties are still commonly used. Among the different solutions, blocking cards are affordable devices to protect smart cards. These devices are placed close to the smart cards, generating a noisy jamming signal or shielding them. Although vendors claim their blocking cards are reliable, no previous study has evaluated their effectiveness.

In this paper, we shed light on the security threats to smart cards in the presence of blocking cards, showing that blocking cards can be bypassed by an attacker. We analyze blocking cards by inspecting their emitted signal and assessing a vulnerability in their internal design. We propose a novel attack that bypasses the jamming signal emitted by a blocking card and reads the content of the smart card.

We evaluate the effectiveness of 11 blocking cards when protecting a MIFARE Ultralight smart card and a MIFARE Classic card. Of these 11 cards, we managed to bypass 8 of them and successfully dump the content of a smart card despite the presence of the blocking card. Our findings highlight that the noise type implemented by the blocking cards highly affects the protection level achieved by them. Based on this observation, we propose a countermeasure that may lead to the design of effective blocking cards. To further improve security, we released the tool we developed to inspect the spectrum emitted by blocking cards and set up our attack.

Quarantine: Mitigating Transient Execution Attacks with Physical Domain Isolation

Mathé Hertogh
Manuel Wiesinger
Sebastian Österlund
Marius Muench
Nadav Amit
Herbert Bos
Cristiano Giuffrida

Since the Spectre and Meltdown disclosure in 2018, the list of new transient execution vulnerabilities that abuse the shared nature of microarchitectural resources on CPU cores has been growing rapidly. In response, vendors keep deploying “spot” (per-variant) mitigations, which have become increasingly costly when combined against all the attacks—especially on older-generation processors. Indeed, some are so expensive that system administrators may not deploy them at all. Worse still, spot mitigations can only address known (N-day) attacks as they do not tackle the underlying problem: different security domains that run simultaneously on the same physical CPU cores and share their microarchitectural resources.

In this paper, we propose Quarantine, a principled, software-only approach to mitigate transient execution attacks by eliminating sharing of microarchitectural resources. Quarantine decouples privileged and unprivileged execution and physically isolates different security domains on different CPU cores. We apply Quarantine to the Linux/KVM boundary and show it offers the system and its users blanket protection against malicious VMs and (unikernel) applications. Quarantine mitigates 24 out of the 27 known transient execution attacks on Intel CPUs and provides strong security guarantees against future attacks. On LMbench, Quarantine incurs a geomean overhead of 11.2%, much lower than the default configuration of spot mitigations on Linux distros such as Ubuntu (even though the spot mitigations offer only partial protection).

SESSION: ML (I): Inference and Toxicity

Efficient Membership Inference Attacks against Federated Learning via Bias Differences

Liwei Zhang
Linghui Li
Xiaoyong Li
Binsi Cai
Yali Gao
Ruobin Dou
Luying Chen

Federated learning aims to complete model training without sharing private data, but many privacy risks remain. Recent studies have shown that federated learning is vulnerable to membership inference attacks. Weights, as important parameters in neural networks, have been shown to be effective for membership inference attacks, but using them incurs significant overhead. To address this issue, in this paper, we propose a bias-based method for efficient membership inference attacks against federated learning. Whereas the weights determine the direction of the decision surface, the biases play an important role in determining the distance to move along that direction. Moreover, a network has far fewer biases than weights. We consider two types of attacks, local attack and global attack, corresponding to two possible types of insiders: participant and central aggregator. For the local attack, we design a neural network-based inference, which fully learns the vertical bias changes of the member data and non-member data. For the global attack, we design a difference comparison-based inference to determine the data source. Extensive experimental results on four public datasets show that the proposed method achieves state-of-the-art inference accuracy. Moreover, experiments show that the proposed method remains effective against some commonly used defenses.

Exploring Clustered Federated Learning’s Vulnerability against Property Inference Attack

Hyunjun Kim
Yungi Cho
Younghan Lee
Ho Bae
Yunheung Paek

Clustered federated learning (CFL) is an advanced technique in the field of federated learning (FL) that addresses the issue of catastrophic forgetting caused by non-independent and identically distributed (non-IID) datasets. CFL achieves this by clustering clients based on the similarity of their datasets and training a global model for each cluster. Despite the effectiveness of CFL in mitigating performance degradation resulting from non-IID datasets, the potential risk of privacy leakages in CFL has not been thoroughly studied. Previous work evaluated the risk of privacy leakages in FL using the property inference attack (PIA), which extracts information about unintended properties (i.e., attributes that differ from the target attribute of the global model’s main task). In this paper, we explore the potential risk of unintended property leakage in CFL by subjecting it to both passive and active PIAs. Our empirical analysis shows that the passive PIA performance on CFL is substantially better than that on FL in terms of the attack AUC score. Moreover, we propose an enhanced active PIA method tailored for CFL to improve the attack performance. Our method introduces a scale-up parameter that amplifies the impact of malicious local updates, resulting in better performance than the previous technique. Furthermore, we demonstrate that the vulnerability of CFL can be alleviated by applying differential privacy (DP) mechanisms at the client level. Unlike previous works, which have shown that applying DP to FL can induce a high utility loss, our empirical results indicate that DP can be used as a defense mechanism in CFL, leading to a better trade-off between privacy and utility.
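The abstract reports that client-level differential privacy alleviates the leakage but does not give the mechanism or its parameters. A common recipe, assumed here, clips each client's update to a maximum L2 norm and adds Gaussian noise calibrated to that bound before averaging. Clipping also bounds how far any single client, including one that scales up its update, can move the aggregate. The norm bound, noise multiplier, and update values are hypothetical, and converting them into a formal (ε, δ) guarantee requires a privacy accountant that this sketch omits.

```python
import math
import random


def clip(update: list[float], max_norm: float) -> list[float]:
    """Scale an update down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def dp_aggregate(updates: list[list[float]], max_norm: float,
                 noise_multiplier: float, rng: random.Random) -> list[float]:
    """Sum clipped client updates, add Gaussian noise (std = noise_multiplier * max_norm), then average."""
    clipped = [clip(u, max_norm) for u in updates]
    sigma = noise_multiplier * max_norm
    dim = len(clipped[0])
    noisy_sum = [sum(u[i] for u in clipped) + rng.gauss(0.0, sigma) for i in range(dim)]
    return [v / len(clipped) for v in noisy_sum]


if __name__ == "__main__":
    rng = random.Random(0)
    honest = [[0.1, -0.2, 0.05], [0.0, 0.1, -0.1]]
    scaled_up = [[5.0, 5.0, 5.0]]   # a deliberately amplified malicious update
    print(dp_aggregate(honest + scaled_up, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

After clipping, the amplified update contributes no more than an honest update of norm 1.0 would, which is the intuition for why scaling alone stops paying off under this mechanism.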

Witnessing Erosion of Membership Inference Defenses: Understanding Effects of Data Drift in Membership Privacy

Seung Ho Na
Kwanwoo Kim
Seungwon Shin

Data drift is the phenomenon in which the input data distribution at testing time differs from that at training time. This widens the generalization gap of a model, which is known to severely deteriorate the model’s performance. Meanwhile, previous studies state that membership inference attacks (MIA) take advantage of the generalization gap of a machine learning model. By transitive logic, we can deduce that data drift would affect these privacy attacks. In this work, we consider data drift when applied to the privacy threat of MIA. As the first work to explore the detrimental extent of data drift on membership privacy, we conduct a literature review on current MIA defense works under selected dimensions associated with data drift. Our study reveals that not only has data drift never been tested in MIA defense, but there is also no infrastructure to juxtapose data drift with MIA defense. We overcome this by proposing a design for simulating authentic and synthetic data drift and evaluate the benchmark MIA defense methods in various settings. The evaluation shows that data drift strongly enhances the attack success rate of MIA, regardless of defense. In response, we propose MIAdapt, a proof-of-concept MIA defense that can be updated under data drift. From this evaluation, we provide security insight into possible solutions for negating the effects of data drift. We hope our work brings attention to the threat of data drift and instigates the development of MIA defenses that are adaptable to data drift.

PrivMon: A Stream-Based System for Real-Time Privacy Attack Detection for Machine Learning Models

Myeongseob Ko
Xinyu Yang
Zhengjie Ji
Hoang Anh Just
Peng Gao
Anoop Kumar
Ruoxi Jia

Machine learning (ML) models can expose the private information of training data when confronted with privacy attacks. Specifically, a malicious user with black-box access to an ML-as-a-service platform can reconstruct the training data (i.e., model inversion attacks) or infer the membership information (i.e., membership inference attacks) simply by querying the ML model. Despite the pressing need for effective defenses against privacy attacks with black-box access, existing approaches have mostly focused on enhancing the robustness of the ML model via modifying the model training process or the model prediction process. These defenses can compromise model utility and require the cooperation of the underlying AI platform (i.e., platform-dependent). These constraints largely limit the real-world applicability of existing defenses.

Despite the prevalent focus on improving the model’s robustness, none of the existing works have focused on the continuous protection of already deployed ML models from privacy attacks by detecting privacy leakage in real-time. This defensive task becomes increasingly important given the vast deployment of ML-as-a-service platforms these days. To bridge the gap, we propose PrivMon, a new stream-based system for real-time privacy attack detection for ML models. To facilitate wide applicability and practicality, PrivMon defends black-box ML models against a wide range of privacy attacks in a platform-agnostic fashion: PrivMon only passively monitors model queries without requiring the cooperation of the model owner or the AI platform. Specifically, PrivMon takes as input a stream of ML model queries and provides an efficient attack detection engine that continuously monitors the stream to detect the privacy attack in real-time, by identifying self-similar malicious queries. We show empirically and theoretically that PrivMon can detect a wide range of realistic privacy attacks within a practical time frame and successfully mitigate the attack success rate. Code is available at https://github.com/ruoxi-jia-group/privmon.
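The abstract describes PrivMon's engine only as detecting attacks "by identifying self-similar malicious queries". The hypothetical monitor below illustrates that idea and is not PrivMon's algorithm (the linked repository holds the actual code). It assumes each query is already a numeric feature vector and that attack queries tend to be near-duplicates of one another; it keeps a sliding window and alerts when a new query closely matches a large share of it. Window size, thresholds, and the probe stream are illustrative.

```python
import math
from collections import deque


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two query vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SelfSimilarityMonitor:
    """Flag a query stream when a new query closely resembles a large share of recent queries."""

    def __init__(self, window: int = 50, min_history: int = 10,
                 sim_threshold: float = 0.95, alert_fraction: float = 0.5):
        self.recent = deque(maxlen=window)   # sliding window of past queries
        self.min_history = min_history
        self.sim_threshold = sim_threshold
        self.alert_fraction = alert_fraction

    def observe(self, query: list[float]) -> bool:
        """Record one query and return True if it should raise an alert."""
        history = list(self.recent)
        self.recent.append(query)
        if len(history) < self.min_history:
            return False
        similar = sum(1 for past in history if cosine(query, past) >= self.sim_threshold)
        return similar / len(history) >= self.alert_fraction


if __name__ == "__main__":
    monitor = SelfSimilarityMonitor(window=8, min_history=4)
    # Hypothetical attack stream: tiny perturbations of a single input.
    probes = [[1.0, 1.0 + 0.01 * i] for i in range(6)]
    print([monitor.observe(q) for q in probes])
    # [False, False, False, False, True, True]
```

Each query is compared with every query in the window, so cost grows linearly with the window size; a high-throughput deployment would likely need an approximate nearest-neighbour index instead.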

Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots

Bocheng Chen
Guangjing Wang
Hanqing Guo
Yuanda Wang
Qiben Yan

Recent advances in natural language processing and machine learning have led to the development of chatbot models, such as ChatGPT, that can engage in conversational dialogue with human users. However, understanding the ability of these models to generate toxic or harmful responses during a non-toxic multi-turn conversation remains an open research problem. Existing research focuses on single-turn sentence testing, while we find that 82% of the individual non-toxic sentences that elicit toxic behaviors in a conversation are considered safe by existing tools. In this paper, we design a new attack, ToxicChat, by fine-tuning a chatbot to engage in conversation with a target open-domain chatbot. The chatbot is fine-tuned with a collection of crafted conversation sequences. Particularly, each conversation begins with a sentence from a crafted prompt sentences dataset. Our extensive evaluation shows that open-domain chatbot models can be triggered to generate toxic responses in a multi-turn conversation. In the best scenario, ToxicChat achieves a 67% toxicity activation rate. The conversation sequences in the fine-tuning stage help trigger the toxicity in a conversation, which allows the attack to bypass two defense methods. Our findings suggest that further research is needed to address chatbot toxicity in a dynamic interactive environment. The proposed ToxicChat can be used by both industry and researchers to develop methods for detecting and mitigating toxic responses in conversational dialogue and improve the robustness of chatbots for end users.

SESSION: ML (II): Adversarial, Robust and Explainable AI

Flow-MAE: Leveraging Masked AutoEncoder for Accurate, Efficient and Robust Malicious Traffic Classification

Zijun Hang
Yuliang Lu
Yongjie Wang
Yi Xie

Malicious traffic classification is crucial for Intrusion Detection Systems (IDS). However, traditional Machine Learning approaches necessitate expert knowledge and a significant amount of well-labeled data. Although recent studies have employed pre-training models from the Natural Language Processing domain, such as ET-BERT, for traffic classification, their effectiveness is impeded by limited input length and fixed Byte Pair Encoding.

To address these challenges, this paper presents Flow-MAE, a pre-training model that employs Masked AutoEncoders (MAE) from the Computer Vision domain to achieve accurate, efficient, and robust malicious network traffic classification. Flow-MAE overcomes these challenges by utilizing burst (a generic representation of network traffic) and patch embedding to accommodate extensive traffic length. Moreover, Flow-MAE introduces a self-supervised pre-training task, the Masked Patch Model, which captures unbiased representations from bursts with varying lengths and patterns.

Experimental results on six datasets reveal that Flow-MAE achieves new state-of-the-art accuracy (>0.99), efficiency (>900 samples/s), and robustness across diverse network traffic types. In comparison to the state-of-the-art ET-BERT, Flow-MAE improves accuracy by 0.41%-1.93% and speed by 7.8x-10.3x, while requiring only 0.2% of its FLOPs and 44% of its memory overhead. The efficacy of the core designs is validated through few-shot learning and ablation experiments. The code is publicly available at https://github.com/NLear/Flow-MAE.
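As a rough illustration of the input side of a masked-patch pre-training task, the Python sketch below splits a burst's bytes into fixed-size patches and randomly hides a fraction of them. In MAE-style pre-training, the encoder would see only the visible patches and the model would be trained to reconstruct the masked ones. The patch size, mask ratio, zero-padding, and function names are assumptions for illustration, not Flow-MAE's actual parameters or code, and the encoder and decoder are omitted.

```python
import random
from typing import List, Tuple


def burst_to_patches(burst: bytes, patch_size: int, max_patches: int) -> List[bytes]:
    """Split a burst's raw bytes into fixed-size patches (hypothetical helper).

    The last patch is zero-padded and the sequence is truncated to max_patches.
    """
    patches: List[bytes] = []
    for start in range(0, len(burst), patch_size):
        if len(patches) == max_patches:
            break
        patches.append(burst[start:start + patch_size].ljust(patch_size, b"\x00"))
    return patches


def mask_patches(num_patches: int, mask_ratio: float,
                 seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split patch indices into (visible, masked), both sorted."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    n_masked = int(round(num_patches * mask_ratio))
    return sorted(order[n_masked:]), sorted(order[:n_masked])


patches = burst_to_patches(bytes(range(64)), patch_size=8, max_patches=16)
visible, masked = mask_patches(len(patches), mask_ratio=0.75, seed=1)
# The encoder would see only patches[i] for i in visible; the pre-training loss
# would compare the model's reconstruction of patches[j] for j in masked.
```

Fixing the patch length and truncating long bursts is one simple way to keep input size bounded regardless of burst length; how Flow-MAE actually embeds and limits patches is described in the paper, not here.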

Your Attack Is Too DUMB: Formalizing Attacker Scenarios for Adversarial Transferability

Marco Alecci
Mauro Conti
Francesco Marchiori
Luca Martinelli
Luca Pajola

Evasion attacks are a threat to machine learning models, where adversaries attempt to affect classifiers by injecting malicious samples. An alarming side-effect of evasion attacks is their ability to transfer among different models: this property is called transferability. Therefore, an attacker can produce adversarial samples on a custom model (surrogate) to conduct the attack on a victim’s organization later. Although literature widely discusses how adversaries can transfer their attacks, their experimental settings are limited and far from reality. For instance, many experiments consider both attacker and defender sharing the same dataset, balance level (i.e., how the ground truth is distributed), and model architecture.

In this work, we propose the DUMB attacker model. This framework allows analyzing whether evasion attacks fail to transfer when the training conditions of surrogate and victim models differ. DUMB considers the following conditions: Dataset soUrces, Model architecture, and the Balance of the ground truth. We then propose a novel testbed to evaluate many state-of-the-art evasion attacks with DUMB; the testbed consists of three computer vision tasks with two distinct datasets each, four types of balance levels, and three model architectures. Our analysis, which generated 13K tests over 14 distinct attacks, led to numerous novel findings in the scope of transferable attacks with surrogate models. In particular, mismatches between attackers and victims in terms of dataset source, balance levels, and model architecture lead to non-negligible loss of attack performance.
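The three DUMB conditions can be read as a mismatch label on each surrogate/victim pair. The minimal Python sketch below enumerates one task's configurations (2 dataset sources, 3 architectures, 4 balance levels, matching the testbed sizes stated above) and counts how many ordered surrogate/victim pairs fall into each of the 8 possible mismatch scenarios. The placeholder names and the all-pairs enumeration are assumptions; the sketch does not reproduce the paper's exact test matrix or its 13K-test count.

```python
from itertools import product
from typing import Dict, FrozenSet, List, NamedTuple


class Config(NamedTuple):
    dataset: str       # D: dataset source
    architecture: str  # M: model architecture
    balance: str       # B: balance level of the ground truth


def dumb_mismatch(surrogate: Config, victim: Config) -> FrozenSet[str]:
    """Return the DUMB dimensions on which the attacker's surrogate and the victim differ."""
    diff = set()
    if surrogate.dataset != victim.dataset:
        diff.add("D")
    if surrogate.architecture != victim.architecture:
        diff.add("M")
    if surrogate.balance != victim.balance:
        diff.add("B")
    return frozenset(diff)


def scenario_counts(datasets: List[str], architectures: List[str],
                    balances: List[str]) -> Dict[FrozenSet[str], int]:
    """Count ordered surrogate/victim pairs per mismatch scenario for one task."""
    configs = [Config(d, m, b) for d, m, b in product(datasets, architectures, balances)]
    counts: Dict[FrozenSet[str], int] = {}
    for surrogate, victim in product(configs, repeat=2):
        key = dumb_mismatch(surrogate, victim)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Placeholder names for one task: 2 dataset sources, 3 architectures, 4 balance levels.
counts = scenario_counts(["src_a", "src_b"],
                         ["arch_1", "arch_2", "arch_3"],
                         ["bal_1", "bal_2", "bal_3", "bal_4"])
```

The empty mismatch set corresponds to the "same dataset, balance, and architecture" setting that the text describes as unrealistic; the other seven keys are the mismatched settings whose effect on transferability DUMB is meant to expose.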

False Sense of Security: Leveraging XAI to Analyze the Reasoning and True Performance of Context-less DGA Classifiers

Arthur Drichel
Ulrike Meyer

The problem of revealing botnet activity through Domain Generation Algorithm (DGA) detection seems to be solved, considering that available deep learning classifiers achieve accuracies of over 99.9%. However, these classifiers provide a false sense of security as they are heavily biased and allow for trivial detection bypass. In this work, we leverage explainable artificial intelligence (XAI) methods to analyze the reasoning of deep learning classifiers and to systematically reveal such biases. We show that eliminating these biases from DGA classifiers considerably deteriorates their performance. Nevertheless, we are able to design a context-aware detection system that is free of the identified biases and maintains the detection rate of state-of-the-art deep learning classifiers. In this context, we propose a visual analysis system that helps to better understand a classifier’s reasoning, thereby increasing trust in and transparency of detection methods and facilitating decision-making.

Federated Explainability for Network Anomaly Characterization

Xabier Sáez-de-Cámara
Jose Luis Flores
Cristóbal Arellano
Aitor Urbieta
Urko Zurutuza

Machine learning (ML) based systems have shown promising results for intrusion detection due to their ability to learn complex patterns. In particular, unsupervised anomaly detection approaches offer practical advantages, as they do not require labeling the training data, which is costly and time-consuming. To further address practical concerns, there is rising interest in adopting federated learning (FL), a recent ML training paradigm for distributed settings (e.g., IoT), to address challenges such as data privacy, availability, and communication cost. However, the output of unsupervised models provides limited contextual information to security analysts at SOCs: analysts usually cannot tell why a sample was classified as anomalous or distinguish between different types of anomalies, which hinders the extraction of actionable information and correlation with other indicators. Moreover, ML explainability methods have received little attention in FL settings and present additional challenges due to the distributed nature and data locality requirements. This paper proposes a new methodology to characterize and explain the anomalies detected by unsupervised ML-based intrusion detection models in FL settings. We adapt and develop explainability, clustering and cluster validation algorithms for FL settings to mine patterns in the anomalous samples and identify different threats throughout the entire network, demonstrating the results on two network intrusion detection datasets containing real IoT malware, namely Gafgyt and Mirai, and various attack traces. The learned clustering results can be used to classify emerging anomalies and to provide additional context that yields more insight and enables the correlation of the anomalies with alerts triggered by other security solutions.

PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection

Hanqing Guo
Guangjing Wang
Yuanda Wang
Bocheng Chen
Qiben Yan
Li Xiao

In this paper, we propose PhantomSound, a query-efficient black-box attack against voice assistants. Existing black-box adversarial attacks on voice assistants either apply substitution models or leverage intermediate model output to estimate the gradients for crafting adversarial audio samples. However, these approaches require a significant number of queries and a lengthy training stage. PhantomSound leverages a decision-based attack to produce effective adversarial audio samples, and reduces the number of queries by optimizing the gradient estimation. In the experiments, we perform our attack against 4 different speech-to-text APIs under 3 real-world scenarios to demonstrate its real-time impact. The results show that PhantomSound is practical and robust in attacking 5 popular commercial voice-controllable devices over the air, and is able to bypass 3 liveness detection mechanisms with success rate. The benchmark results show that PhantomSound can generate adversarial examples and launch the attack in a few minutes. We significantly enhance the query efficiency and reduce the cost of a successful untargeted and targeted adversarial attack by 93.1% and 65.5% compared with state-of-the-art black-box attacks, using merely ∼ 300 queries (∼ 5 minutes) and ∼ 1,500 queries (∼ 25 minutes), respectively.

SESSION: Network and Cloud Security

Container Orchestration Honeypot: Observing Attacks in the Wild

Noah Spahn
Nils Hanke
Thorsten Holz
Christopher Kruegel
Giovanni Vigna

Containers, a mechanism to package software and its dependencies into a single artifact, have helped fuel the rapid pace of technological advancements in the last few years. However, it is not always clear what the potential security risk of moving to the cloud and container-based technologies is. In this paper, we investigate exposed container orchestration services on the Internet: how many there are, and the attacks against them. We considered three groups of container-based software: Docker, Kubernetes, and workflow tools. In a measurement study, we scanned the Internet to identify vulnerable container and container-orchestration services running on default ports. Considering the scan data, we then designed a high-interaction honeypot to reveal where attackers tend to strike and what is being done against exposed instances. The honeypot is based on container orchestration tools installed on Ubuntu servers, behind a carefully constructed gateway, and using the default ports. Our honeypot attracted attackers within minutes of launch. In total, we collected 94 days of attack data and extracted associated indicators of compromise (IOCs), which are provided to the research community to enable further insights.

Our empirical study measures the risk associated with container and container orchestration systems exposed on the Internet. The assessment is performed by leveraging a novel design for a high-interaction honeypot. Using the observed data, we extract fresh insights into malicious tools, tactics, and procedures used against exposed host systems. In addition, we make available to the research community a rich dataset of unencrypted malicious traffic.

EnclaveVPN: Toward Optimized Utilization of Enclave Page Cache and Practical Performance of Data Plane for Security-Enhanced Cloud VPN

Jaemin Park
Brent Byunghoon Kang

A cloud Virtual Private Network (VPN) is an essential infrastructure for tenants to connect their on-premise networks with a cloud network. However, tenants are often reluctant to adopt a cloud VPN because of security concerns, such as key disclosure, impersonation, and packet sniffing. Software Guard Extensions (SGX) is a good candidate to address these concerns because it can create enclaves in isolated memory (i.e., the Enclave Page Cache (EPC)) to protect security-sensitive code and data from malicious access. In this paper, we propose EnclaveVPN, which supports a security-enhanced IPsec gateway using SGX with optimized EPC utilization and practical data plane performance. EnclaveVPN leverages enclaves to manage cryptographic keys and execute cryptographic operations for the IPsec gateway. EnclaveVPN allows only encrypted packets to be transmitted within and to/from the cloud network and provides features for optimizing EPC utilization and minimizing overhead in the data plane. We implemented a prototype on a real SGX v1.0 machine (Xeon E-2286M 2.40GHz 8-core CPU). The experiment and benchmark results showed that EnclaveVPN saved up to 62.5% of the EPC and achieved approximately 87% of the data plane performance of the non-SGX IPsec gateway.

EBugDec: Detecting Inconsistency Bugs caused by RFC Evolution in Protocol Implementations

Jingting Chen
Feng Li
Qingfang Chen
Ping Li
Lili Xu
Wei Huo

Network protocol implementations must comply with the respective Requests for Comments (RFCs) and be updated as RFCs evolve. However, due to the large number of RFCs and the complex relationships between them, systematically discovering how RFC requirements evolve is non-trivial. This leads to inconsistency bugs when code is modified to support new RFC documents, known as RFC-evolutionary bugs or ebugs. Recent approaches have used natural language processing techniques to extract RFC rules and employed differential testing or static analysis to discover inconsistency bugs in protocol implementations. However, they seldom consider the evolution of RFC requirements or the bugs it introduces.

In this paper, we present EBugDec. Given a protocol implementation and the RFCs it claims to support, our approach identifies evolutionary relationships between RFC documents and their corresponding requirement changes. From these, we derive two major types of evolutionary rules: primitive rules, which dictate requirements for newly introduced packet items, and derivative rules, which describe the influence the new items have on requirements stipulated in earlier RFCs. Both are represented as formal expressions stating that packet-related operations should be guarded by specific conditions under special cases (if necessary). We then use clues found in code annotations and release notes to locate rule-related code in the implementation, and leverage a predominator-based algorithm to discover rule violations in the implementation. We also uncover incomplete error handling logic when the rule-specified conditions fail. We implemented a prototype of EBugDec and demonstrated its efficiency by applying it to 12 implementations of protocol services, along with 178 RFC documents their historical releases claim to support. On average, EBugDec took 37.29 seconds to finish its analysis, and it detected 17 new ebugs, 5 of which can only be triggered under harsh prerequisites.
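A guard check of the kind described above can be phrased in terms of dominance: a rule-specified condition protects an operation only if every path from the function entry to the operation passes through the condition check. The Python sketch below computes dominator sets with the standard iterative dataflow algorithm and reports operations that no guard node dominates. The CFG, the node names, and the assumption that every node is reachable from the entry are hypothetical; EBugDec's rule extraction and code-location steps are not modeled.

```python
from typing import Dict, List, Set


def dominators(cfg: Dict[str, List[str]], entry: str) -> Dict[str, Set[str]]:
    """Iterative dataflow computation of dominator sets.

    cfg maps each node to its successors; all nodes are assumed reachable from entry.
    """
    nodes = set(cfg)
    for succs in cfg.values():
        nodes.update(succs)
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    dom = {n: set(nodes) for n in nodes}  # start from "every node dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = set(nodes)
            for p in preds[n]:
                new &= dom[p]  # a dominator of n must dominate every predecessor
            new.add(n)
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def unguarded_ops(cfg: Dict[str, List[str]], entry: str,
                  guards: Set[str], ops: Set[str]) -> Set[str]:
    """Report operations that no guard node dominates (some path skips every guard)."""
    dom = dominators(cfg, entry)
    return {op for op in ops if not (dom[op] & guards)}


# Hypothetical CFG: use_a is reached only after the check; use_b can bypass it via skip.
cfg = {
    "entry": ["check", "skip"],
    "check": ["use_a", "join"],
    "skip": ["join"],
    "join": ["use_b"],
    "use_a": ["exit"],
    "use_b": ["exit"],
    "exit": [],
}
```

Here `unguarded_ops(cfg, "entry", {"check"}, {"use_a", "use_b"})` returns `{"use_b"}`, because the path entry, skip, join, use_b never passes through the check.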

CoZure: Context Free Grammar Co-Pilot Tool for Finding New Lateral Movements in Azure Active Directory

Abdullahi Chowdhury
Hung Nguyen

Securing cloud environments such as Microsoft Azure is challenging, and vulnerabilities due to misconfigurations, especially in user role assignments, are common. There have been significant efforts to find vulnerabilities that enable lateral movements in Azure AD systems. All existing works, however, either follow a manual process to find new vulnerabilities or can only discover whether known vulnerabilities exist in a deployed Azure environment. We develop an Azure Active Directory (AAD) lateral movement-discovery tool, CoZure, that can help researchers find new lateral movements in an Azure AD environment. CoZure deploys algorithms from Context-Free Grammar (CFG) to first learn the ways (grammar rules) in which security researchers find vulnerabilities and then extend these rules to discover new lateral movement paths. CoZure first collects a large set of existing AAD environment commands using a specialized scraping tool; it then uses CFG to build a knowledge base dataset from these commands and previous attacks. CoZure then applies the learned knowledge to find new combinations of commands that could open up new candidate lateral movements, which are then validated in a real AD environment and manually checked by the user. CoZure helped discover lateral movements that current fuzzing tools (e.g., OneFuzz, RESTler) cannot identify and also performs better at finding existing misconfiguration issues in Azure AD. Using CoZure, we have discovered two previously unknown lateral movement methods that could lead to numerous new attack paths in Azure AD.

Phantom-CSI Attacks against Wireless Liveness Detection

Qiuye He
Song Fang

All systems monitoring human behavior in real time are, by their nature, attractive targets for spoofing. For example, misdirecting live-feed security cameras or voice-controllable Internet-of-Things (IoT) systems (e.g., Amazon Alexa and Google Assistant) has immediately intuitive benefits, so there is a consequent need for detecting the liveness of the human(s) whose behavior is being monitored. Emerging research lines have focused on analyzing changes in prevalent wireless signals to detect video or voice spoofing attacks, as wireless-based techniques do not require the user to carry any additional device or sensor for liveness detection. Video/voice streams and coexisting wireless signals convey different aspects of the same overall contextual information related to human activities, and spoofing attacks on the former break this relationship, so the latter performs well as liveness detection to augment the former. However, we recognize and herein evaluate how to spoof the latter as well to defeat this liveness detection. In our attack, an adversary can easily create phantom wireless signals and synchronize them with spoofed video/voice signals, such that the legitimate user can no longer distinguish real from fake human activity. Real-world experimental results on software-defined radio platforms validate the possibility of generating fake CSI flows and demonstrate that with the phantom-CSI attack, the true positive rates (TPRs) of wireless liveness detection systems for video and voice decrease from 100% spoofing detection to just 4.4% and 0%, respectively.

SESSION: Malware and Fuzzing

A Method for Summarizing and Classifying Evasive Malware

Haikuo Yin
Brandon Lou
Peter Reiher

Ever since the earliest days of the Internet, malware has been a problem for computers. Since then, this problem’s severity has only increased, with important organizations like universities and hospitals suffering major security breaches due to malware. As detection techniques get more advanced, so do attackers’ evasion attempts. One such method involves introducing benign behavior to malware to produce a benign classification even while performing malicious actions. In this work, we propose a method of classifying malware that remains effective in the presence of such evasion attempts. Our contributions include generating a behavior summary, vectorizing it in a way that’s robust to modifications, and constraining features to reduce the effectiveness of these evasion techniques. Our results show that we can effectively and consistently classify such evasive malware with minimal accuracy loss in non-evasive data.

Xunpack: Cross-Architecture Unpacking for Linux IoT Malware

Yuhei Kawakoya
Shu Akabane
Makoto Iwamura
Takeshi Okamoto

Although the vast majority of malware used to be x86 architecture-based, the rapid rise of Internet of Things (IoT) malware in recent years has been forcing malware analysts to deal with binaries written for a wide range of architectures with little tooling support.

We tackled this problem by designing and developing Xunpack, a cross-architecture system to extract and reconstruct the original code of packed IoT malware. The design principle of Xunpack is that it is decoupled from the architecture of the target malware. Specifically, it is built on QEMU, but it is independent of QEMU’s architecture-specific code. This design principle enables us to capture the execution of self-modifying code and the transitions between kernel- and user-land virtual memory spaces in an architecture-independent manner, so we can define them as triggers for generating a dump of the target malware. We also introduce SelectiveDump, a technique that reconstructs the original code by selecting the most appropriate parts from several dumps. It can handle binaries packed with all major types of packers, including the state-of-the-art Type VI packers, which unpack a function at runtime and repack it after its execution.

To show the effectiveness of Xunpack, we conducted two experiments. First, we demonstrated that Xunpack can unpack the original code of packed ELF binaries for 14 different architectures. Second, we compared it with major unpackers by using them against ELF binaries packed with several real-world packers. The results show that Xunpack successfully unpacks all samples, outperforming the existing major unpackers. They also show that it can even unpack binaries for architectures that have not been reported to be used in malware (e.g., SPARC or RISC-V) and for which analysis tools are not yet well developed.

SEnFuzzer: Detecting SGX Memory Corruption via Information Feedback and Tailored Interface Analysis

Donghui Yu
Jianqiang Wang
Haoran Fang
Ya Fang
Yuanyuan Zhang

Intel SGX provides protected memory regions called enclaves to secure private user data against a compromised or malicious OS environment. However, several studies have shown that SGX applications suffer from memory corruption vulnerabilities, which can lead to critical information leakage. Detecting memory corruption vulnerabilities in SGX applications can be cumbersome. Existing works use either symbolic execution or formal methods to analyze the enclave library, which is known to be inefficient and error-prone. Fuzzing, an effective and efficient vulnerability detection method, is rarely used in SGX and has limitations there.

In this paper, we present SEnFuzzer, an automatic on-device fuzzing framework targeting memory corruption vulnerability detection in SGX applications. We designed an information feedback mechanism that converts the enclave environment from a black box to a grey box, which facilitates both fuzzing and result analysis. To deal with the complex enclave interfaces, we thoroughly analyzed the ECALL and OCALL interface information in the EDL file to generate specific fuzz drivers, as well as an OCALL hook mechanism to increase fuzzing efficiency. We implemented a SEnFuzzer prototype and evaluated it on 20 academic and industrial SGX applications such as mbedTLS-SGX and StealthDB. SEnFuzzer successfully found 51 bugs and vulnerabilities in their latest versions.

FieldFuzz: In Situ Blackbox Fuzzing of Proprietary Industrial Automation Runtimes via the Network

Andrei Bytes
Prashant Hari Narayan Rajput
Constantine Doumanidis
Michail Maniatakos
Jianying Zhou
Nils Ole Tippenhauer

Networked Programmable Logic Controllers (PLCs) are proprietary industrial devices used in critical infrastructure; they execute control logic applications in complex proprietary runtime environments that provide standardized access to the PLC's hardware resources. These control applications are programmed in domain-specific IEC 61131-3 languages, compiled into a proprietary binary format, and process data provided via industrial protocols. Control applications present an attack surface threatened by manipulated traffic. For example, remote code injection in a control application would directly allow an attacker to take over the PLC, threatening physical process damage and the safety of human operators. However, assessing the security of control applications is challenging due to domain-specific challenges and the limited availability of suitable methods. Network-based fuzzing is often the only way to test such devices but is inefficient without guidance from execution tracing.

This work presents the FieldFuzz framework that analyzes the security risks posed by the Codesys runtime (used by over 400 devices from 80 industrial PLC vendors). FieldFuzz leverages efficient network-based fuzzing based on three main contributions: i) reverse-engineering enabled remote control of control applications and runtime components, ii) automated command discovery and status code extraction via network traffic and iii) a monitoring setup to allow on-system tracing and coverage computation. We use FieldFuzz to run fuzzing campaigns, which uncover multiple vulnerabilities, leading to three reported CVE IDs. To study the cross-platform applicability of FieldFuzz, we reproduce the findings on a diverse set of Industrial Control System (ICS) devices, showing a significant improvement over the state-of-the-art.

Bin there, target that: Analyzing the target selection of IoT vulnerabilities in malware binaries

Arwa Abdulkarim Al Alsadi
Kaichi Sameshima
Katsunari Yoshioka
Michel Van Eeten
Carlos Hernandez Gañán

For years, attackers have exploited vulnerabilities in Internet of Things (IoT) devices. Previous research has examined target selection in cybercrime, but there has been little investigation into the factors that influence target selection in attacks on IoT. This study aims to better understand how attackers choose their targets by analyzing the frequency of specific exploits in 11,893 IoT malware binaries that were distributed between 2018 and 2021. Our findings indicate that 78% of these binary files did not specifically target IoT vulnerabilities but rather scanned the Internet for devices with weak authentication. To understand the usage of exploits in the remaining 2,629 binaries, we develop a theoretical model from relevant literature to examine the impact of four latent variables, i.e., exposure, vulnerability, exploitability, and patchability. We collect indicators to measure these variables and find that they can explain to a significant extent (R²=0.38) why some vulnerabilities are more frequently exploited than others. The severity of vulnerabilities does not significantly increase the frequency with which they are targeted, while the presence of Proof-of-Concept exploit code does increase it. We also observe that the availability of a patch reduces the frequency of being targeted, yet that more complex patches are associated with higher frequency. In terms of exposure, more widespread device models are more likely to be targeted by exploits. We end with recommendations to disincentivize attackers from targeting vulnerabilities.

SESSION: Software Security I

FineIBT: Fine-grain Control-flow Enforcement with Indirect Branch Tracking

Alexander J. Gaidis
Joao Moreira
Ke Sun
Alyssa Milburn
Vaggelis Atlidakis
Vasileios P. Kemerlis

We present the design, implementation, and evaluation of FineIBT: a CFI enforcement mechanism that improves the precision of hardware-assisted CFI solutions, like Intel IBT, by instrumenting program code to reduce the valid/allowed targets of indirect forward-edge transfers. We study the design of FineIBT on the x86-64 architecture, and implement and evaluate it on Linux and the LLVM toolchain. We designed FineIBT’s instrumentation to be compact, incurring low runtime and memory overheads, and generic, so as to support different CFI policies. Our prototype implementation incurs negligible runtime slowdowns (≈ 0%–1.94% in SPEC CPU2017 and ≈ 0%–1.92% in real-world applications) outperforming Clang-CFI. Lastly, we investigate the effectiveness/security and compatibility of FineIBT using the ConFIRM CFI benchmarking suite, demonstrating that our instrumentation provides complete coverage in the presence of modern software features, while supporting a wide range of CFI policies with the same, predictable performance.
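To illustrate why narrowing the allowed targets of indirect forward-edge transfers matters, the Python model below contrasts a coarse policy, where any function marked as an indirect-branch target is accepted (as with Intel IBT, where an indirect call or jump must land on an ENDBR64 instruction), with a finer policy that additionally requires a per-call-site label to match. Using a CRC32 of the function prototype as the label is an assumption chosen for illustration; this is a policy model, not FineIBT's x86-64 instrumentation, and the function names are hypothetical.

```python
import zlib
from dataclasses import dataclass


def proto_hash(prototype: str) -> int:
    """Deterministic label for a function prototype (CRC32 chosen only for illustration)."""
    return zlib.crc32(prototype.encode("utf-8"))


@dataclass(frozen=True)
class Function:
    name: str
    has_endbr: bool  # marked as a valid indirect-branch target (coarse, IBT-style)
    label: int       # hypothetical fine-grained label, e.g. proto_hash(prototype)


def coarse_allows(target: Function) -> bool:
    # Coarse policy: every marked function is a legal target of every indirect call.
    return target.has_endbr


def fine_allows(target: Function, callsite_label: int) -> bool:
    # Fine policy: the target must be marked AND carry the label the call site expects.
    return target.has_endbr and target.label == callsite_label


# Hypothetical program: an indirect call site that should only reach int(int) handlers.
handler = Function("handler", True, proto_hash("int(int)"))
run_command = Function("run_command", True, proto_hash("int(const char*)"))
site_label = proto_hash("int(int)")
```

Under the coarse policy both `handler` and `run_command` are legal targets of the call site; under the label check only `handler` is, which is the kind of target-set reduction the abstract describes.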

SCVMON: Data-oriented attack recovery for RVs based on safety-critical variable monitoring

Sangbin Park
Youngjoon Kim
Dong Hoon Lee

Many data-oriented attacks on robotic vehicles (RVs) change the inputs of an RV control program. While much research has been dedicated to detecting these attacks, recovery mechanisms have received relatively little attention. Without recovery after detection, an RV cannot continue its assigned missions. Unfortunately, existing recovery mechanisms have limitations that make them difficult to deploy in real RVs: they require additional hardware/software or can only recover from limited types of data-oriented attacks. To overcome these limitations, we propose a framework called SCVMON that detects and helps RVs recover from various data-oriented attacks that generate inappropriate control commands. Based on the observation that data-oriented attacks inevitably change the values of some variables in RV control programs, SCVMON systematically identifies the safety-critical variables (SCVs) that can affect the safety of RVs. For efficient recovery, we extract from the SCVs a set of monitored safety-critical variables (mSCVs) that can reflect all input changes, and monitor them to detect and recover from various data-oriented attacks. SCVMON does not depend on the physical nature of a specific sensor or hardware, which is a significant benefit, and it can be applied through a simple software update. Our evaluation shows that SCVMON can quickly detect and recover from 20 types of data-oriented attacks. Also, SCVMON incurs only 0.3% storage overhead and up to 5.1% runtime overhead, showing that it is suitable for RVs.

Information Flow Tracking for Heterogeneous Compartmentalized Software

Zahra Tarkhani
Anil Madhavapeddy

We are now seeing increased hardware support for improving the security and performance of privilege separation and compartmentalization techniques. Today, developers can benefit from multiple compartmentalization mechanisms such as process-based sandboxes, trusted execution environments (TEEs)/enclaves, and even intra-address space compartments (i.e., intra-process or intra-enclave). We dub such a computing model a “hetero-compartment” environment and observe that existing system stacks still assume single-compartment models (i.e., user space processes), leading to limitations in using, integrating, and monitoring heterogeneous compartments from a security and performance perspective.

We introduce Deluminator, a set of OS abstractions and a userspace framework to enable extensible and fine-grained information flow tracking in hetero-compartment environments. Deluminator allows developers to securely use and combine compartments, define security policies over shared system resources, and audit policy violations and perform digital forensics across heterogeneous compartments. We implemented Deluminator on Linux-based ARM and x86-64 platforms; it supports diverse compartment types, including processes, SGX enclaves, TrustZone Trusted Apps (TAs), and intra-address space compartments. Our evaluation shows that our kernel- and hardware-assisted approach incurs a reasonable overhead (on average 7-29%), making it suitable for real-world applications.

Renewable Just-In-Time Control-Flow Integrity

Erick Bauman
Jun Duan
Kevin W. Hamlen
Zhiqiang Lin

Renew (Rewriting Newly Executable pages after Writes) unites and extends recent advances in binary code analysis and transformation to solve a longstanding compatibility problem for binary code security hardening algorithms—support for arbitrary dynamically self-modifying code. Self-modification is now a mainstay of many consumer software products, including Just-In-Time (JIT) compiled languages, on-demand component loading, self-extracting installers, and self-hooking APIs; but it poses significant challenges for code hardening algorithms that rely on computationally heavy static analyses, source code information, or compiler-specific code generation patterns. As a result, many of the strongest protection mechanisms for code hardening have remained incompatible or significantly weakened for the large class of software that incorporates self-modification (either directly or within its underlying runtime systems).

By leveraging recent advances in lightweight binary disassembly, efficient memory page interception, and fast machine code rewriting, Renew transparently extends binary code security hardening algorithms, such as source-free control-flow integrity (CFI) and software fault isolation (SFI), to self-modifying target code. Experiments on two commodity JIT compilers and a commodity self-extracting installer solution show that Renew supports highly diverse dynamic code generation strategies with little or no customization to each new application, and achieves a 3–4× performance improvement over alternative solutions that disable dynamic code to achieve equivalent security guarantees.

Raft: Hardware-assisted Dynamic Information Flow Tracking for Runtime Protection on RISC-V

Yu Wang
Jinting Wu
Haodong Zheng
Zhenyu Ning
Boyuan He
Fengwei Zhang

Dynamic Information Flow Tracking (DIFT) is a fundamental computer security technique that tracks data flows of interest at runtime, overcoming the limitations of discovering data dependencies statically at compile time. However, software-based DIFT tools often suffer from prohibitively high runtime overhead caused by dynamic binary instrumentation or virtual machines, which limits their usefulness. Although hardware-assisted DIFT frameworks reduce this overhead substantially, the remaining overhead is still unacceptable for applications with strict timing constraints.

This paper presents Raft, a flexible hardware-assisted DIFT framework that provides runtime protection for embedded applications without delaying the protected programs. Raft is designed as a coprocessor for the RISC-V Rocket Core and requires only minimally invasive changes to the main processor. Raft uses a novel tag storage mechanism with hybrid byte/variable granularity, which reduces the size of tag storage while providing fine-grained protection. We deploy Raft on the Rocket emulator and an FPGA development board to evaluate its effectiveness and efficiency. Experimental results show that, compared to previous approaches, Raft reduces the performance overhead from more than 20% to less than 0.1% on the NBench and CoreMark microbenchmarks. The performance overhead of Raft on the SPEC CINT 2006 benchmarks is negligible (0.13%). We also use a custom program to demonstrate its functionality and conduct a detailed evaluation with a real-world embedded medical application and known CVEs.
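As a reference point for the DIFT concept described above, the following sketch models the classic rules: tag untrusted input, propagate tags through loads and arithmetic, and check tags at sensitive uses such as indirect jumps. It is a hypothetical Python model with byte-granularity shadow tags; it does not reflect Raft's coprocessor design or its hybrid byte/variable-granularity tag storage, and the class and method names are invented for illustration.

```python
from typing import Dict


class TaintTracker:
    """Hypothetical software model of DIFT with one tag bit per memory byte."""

    def __init__(self, mem_size: int) -> None:
        self.mem = bytearray(mem_size)
        self.mem_tag = [False] * mem_size      # shadow tags for memory bytes
        self.reg: Dict[str, int] = {}
        self.reg_tag: Dict[str, bool] = {}     # shadow tags for registers

    def taint_input(self, addr: int, data: bytes) -> None:
        # Source rule: bytes from an untrusted source (e.g., network) are tagged.
        for i, b in enumerate(data):
            self.mem[addr + i] = b
            self.mem_tag[addr + i] = True

    def load(self, dst: str, addr: int) -> None:
        # Propagation rule: a loaded register inherits the byte's tag.
        self.reg[dst] = self.mem[addr]
        self.reg_tag[dst] = self.mem_tag[addr]

    def movi(self, dst: str, value: int) -> None:
        # Constants from the program itself are untainted.
        self.reg[dst] = value & 0xFF
        self.reg_tag[dst] = False

    def add(self, dst: str, a: str, b: str) -> None:
        # Propagation rule: the result is tainted if either operand is.
        self.reg[dst] = (self.reg[a] + self.reg[b]) & 0xFF
        self.reg_tag[dst] = self.reg_tag[a] or self.reg_tag[b]

    def jump(self, src: str) -> int:
        # Check rule: refuse control transfers to attacker-influenced targets.
        if self.reg_tag[src]:
            raise RuntimeError(f"tainted jump target in {src}")
        return self.reg[src]


t = TaintTracker(16)
t.taint_input(0, b"\x10")   # untrusted byte at address 0
t.load("r1", 0)
t.movi("r2", 4)
t.add("r3", "r1", "r2")     # r3 = 0x14, tainted via r1
try:
    t.jump("r3")
except RuntimeError as err:
    print(err)              # tainted jump target in r3
```

Hardware DIFT schemes implement the same source, propagation, and check rules in parallel with normal execution instead of in software, which is where their overhead advantage comes from.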

SESSION: Software Security II

MIFP: Selective Fat-Pointer Bounds Compression for Accurate Bounds Checking

Shengjie Xu
Eric Liu
Wei Huang
David Lie

Bounds compression for fat pointers can reduce the memory and performance overhead of maintaining pointer bounds and is necessary for efficient hardware implementation. However, compression can introduce inaccuracy to the bounds, making certain out-of-bounds accesses undetectable. Although the security threat can be mitigated by padding the objects, no known mitigations can detect these out-of-bounds accesses deterministically.

We present MIFP, a method that automatically mixes compressed and uncompressed bounds to preserve the performance benefits of bounds compression while ensuring accurate bounds checking. Given a program that uses a single fat-pointer representation (e.g., all compressed bounds), MIFP performs whole-program analysis to expand potentially unsafe and inaccurate fat pointers so that they carry accurate, uncompressed bounds. To minimize the number of pointers to expand, MIFP instruments at per-allocation-site granularity: objects of the same type allocated at different code locations can have their pointer members transformed differently, depending on how the pointers are used. We describe our algorithm and supporting data structures, and show that using multiple fat-pointer representations reduces the runtime and memory overheads of uncompressed bounds by 79% and 93%, respectively.
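To see why compressed bounds can miss some out-of-bounds accesses, consider a hypothetical encoding that stores the bounds length in an 8-bit mantissa with an exponent, so that large objects get bounds rounded outward to a power-of-two alignment. This Python sketch is an assumption for illustration only; it is not MIFP's encoding or that of any specific hardware. Accesses that fall in the rounding slack pass the compressed check but fail an exact check.

```python
from typing import Tuple

MANTISSA_BITS = 8  # hypothetical: the length field holds fewer than 2**8 units


def compress_bounds(base: int, size: int, mantissa_bits: int = MANTISSA_BITS) -> Tuple[int, int]:
    """Return the tightest [lo, hi) this toy encoding can represent around [base, base + size)."""
    e = 0
    while True:
        align = 1 << e
        lo = base & ~(align - 1)                        # round the base down
        hi = (base + size + align - 1) & ~(align - 1)   # round the top up
        if (hi - lo) >> e < (1 << mantissa_bits):       # length fits the mantissa
            return lo, hi
        e += 1                                          # try a coarser granularity


def in_bounds(addr: int, lo: int, hi: int) -> bool:
    return lo <= addr < hi


def slack_bytes(base: int, size: int) -> int:
    """Bytes inside the compressed bounds but outside the real object."""
    lo, hi = compress_bounds(base, size)
    return (base - lo) + (hi - (base + size))


base, size = 0x1000, 1001
lo, hi = compress_bounds(base, size)
oob = base + size                         # first byte past the object
print(in_bounds(oob, base, base + size))  # False: exact bounds catch it
print(in_bounds(oob, lo, hi))             # True: compressed bounds miss it
print(slack_bytes(base, size))            # 3
```

Under this toy encoding, the compressed bounds of a 1001-byte object at 0x1000 end at 0x1000 + 1004, so accesses to the three slack bytes go undetected. Padding the object to 1004 bytes would keep those bytes from overlapping a neighbor but would still not flag the access; giving such pointers uncompressed bounds, as MIFP does selectively, removes the gap.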

All Use-After-Free Vulnerabilities Are Not Created Equal: An Empirical Study on Their Characteristics and Detectability

Zeyu Chen
Daiping Liu
Jidong Xiao
Haining Wang

Over the past decade, use-after-free (UaF) has become one of the most exploited types of vulnerabilities. To address this growing threat, defenses must advance in multiple directions, such as UaF vulnerability detection, UaF exploit defense, and UaF bug fixing. Unfortunately, the intricacy rooted in the temporal nature of UaF vulnerabilities makes it quite challenging to develop effective and efficient defenses in these directions. This calls for an in-depth understanding of real-world UaF characteristics. This paper presents the first comprehensive empirical study of UaF vulnerabilities, with 150 cases randomly sampled from multiple representative software suites, such as the Linux kernel, Python, and Mozilla Firefox. We aim to identify the commonalities, root causes, and patterns of real-world UaF bugs, so that the empirical results can provide operational guidance to avoid, detect, deter, and fix UaF vulnerabilities. Our main finding is that the root causes of UaF bugs are diverse and are not evenly distributed across different software. This implies that a generic UaF detector or fuzzer is probably not an optimal solution. We further categorize the root causes into 11 patterns, several of which can be translated into simple static detection rules that cover a large portion of the 150 UaF vulnerabilities with high accuracy. Motivated by our findings, we implement 11 checkers in a static bug detector called Palfrey. Running Palfrey on the code of popular open source software, we detect 9 new UaF vulnerabilities. Compared with state-of-the-art static bug detectors, Palfrey achieves better coverage and accuracy for UaF detection, with lower time and memory overhead.

NatiSand: Native Code Sandboxing for JavaScript Runtimes

Marco Abbadini
Dario Facchinetti
Gianluca Oldani
Matthew Rossi
Stefano Paraboschi

Modern runtimes run JavaScript code in a secure, isolated environment, but they provide no isolation guarantees when they execute binary programs and shared libraries. This is an important limitation, and it affects many popular runtimes, including Node.js, Deno, and Bun [20, 61].

In this paper, we propose NatiSand, a component for JavaScript runtimes that leverages Landlock, eBPF, and Seccomp to control the filesystem, Inter-Process Communication (IPC), and network resources available to binary programs and shared libraries. NatiSand requires no changes to application code and offers users an easy-to-use interface. To demonstrate the effectiveness and efficiency of our approach, we implemented NatiSand and integrated it into Deno, a modern, security-oriented JavaScript runtime. We reproduced a number of vulnerabilities affecting third-party code, showing how NatiSand mitigates them. We also conducted an extensive experimental evaluation of performance, showing that our approach is competitive with state-of-the-art code sandboxing solutions. The implementation is available as open source.

DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection

Yizheng Chen
Zhoujie Ding
Lamya Alowain
Xinyun Chen
David Wagner

We propose and release a new vulnerable source code dataset. We curate the dataset by crawling security issue websites and extracting vulnerability-fixing commits and source code from the corresponding projects. Our new dataset contains 18,945 vulnerable functions spanning 150 CWEs and 330,492 non-vulnerable functions extracted from 7,514 commits. Our dataset covers 295 more projects than all previous datasets combined.

Combining our new dataset with previous datasets, we present an analysis of the challenges and promising research directions of using deep learning to detect software vulnerabilities. We study 11 model architectures belonging to 4 families. Our results show that deep learning is still not ready for vulnerability detection, due to high false positive rates, low F1 scores, and the difficulty of detecting hard CWEs. In particular, we demonstrate an important generalization challenge for the deployment of deep learning-based models. We show that increasing the volume of training data may not further improve the performance of deep learning models for vulnerability detection, but might improve their ability to generalize to unseen projects.

We also identify directions for future research. We show that large language models (LLMs) are a promising direction for ML-based vulnerability detection, outperforming Graph Neural Networks (GNNs) with code-structure features in our experiments. Moreover, developing pre-training objectives specific to source code is a promising way to improve vulnerability detection performance.

Why Johnny Can’t Use Secure Docker Images: Investigating the Usability Challenges in Using Docker Image Vulnerability Scanners through Heuristic Evaluation

Taeyoung Kim
Seonhye Park
Hyoungshick Kim

This paper explores the usability of Docker Image Vulnerability Scanners (DIVSes) through heuristic evaluations. Docker simplifies the process of software development, distribution, deployment, and execution by providing a container-based execution environment. However, vulnerabilities in Docker images can pose security risks to containers. To mitigate this, DIVSes are crucial in helping developers identify and address these vulnerabilities in the software packages and libraries within Docker images. Despite their importance, research on the usability of DIVSes has been limited. To address this gap, we developed 11 customized heuristics and applied them to three widely-used DIVSes (Grype, Trivy, and Snyk). Our evaluations revealed 239 usability issues within the tools evaluated. Our findings highlight that the evaluated DIVSes do not provide sufficient information to comprehend the risks associated with identified vulnerabilities, prioritize them, or effectively fix them. Our study offers valuable insights and practical recommendations for enhancing the usability of DIVSes, making it easier for developers to identify and address vulnerabilities in Docker images.

SESSION: Web Security and Authentication

SigA: rPPG-based Authentication for Virtual Reality Head-mounted Display

Lin Li
Chao Chen
Lei Pan
Leo Yu Zhang
Jun Zhang
Yang Xiang

Consumer-grade virtual reality head-mounted displays (VR-HMDs) are becoming increasingly popular. Despite VR’s convenience and booming applications, VR-based authentication schemes remain underdeveloped. Recently proposed authentication methods (electrooculogram-based, electrical muscle stimulation-based, and the like) require active user involvement, which disrupts many scenarios such as drone flight and telemedicine. This paper proposes SigA, an effective and efficient user authentication method for VR environments that uses a physiological signal, the photoplethysmogram (PPG), and is resilient to impersonation attacks. SigA exploits the fact that PPG is a physiological signal invisible to the naked eye. Because the VR-HMD completely covers the eye area, SigA reduces the risk of signal leakage during PPG acquisition. We conducted a comprehensive analysis of SigA’s feasibility on five publicly available datasets, nine different pre-trained models, three facial regions, various lengths of the video clips required for training, four different signal time intervals, and continuous authentication with different sliding window sizes. The results demonstrate that SigA achieves an average F1-score of more than 95% with a one-second signal, which accommodates a complete cardiac cycle for most adults, implying its applicability in real-world scenarios. Furthermore, experiments show that SigA is resistant to zero-effort attacks, statistical attacks, impersonation attacks (with a detection accuracy of over 95%), and session hijacking attacks.

Boosting Big Brother: Attacking Search Engines with Encodings

Nicholas Boucher
Luca Pajola
Ilia Shumailov
Ross Anderson
Mauro Conti

Search engines are vulnerable to attacks against indexing and searching via text encoding manipulation. By imperceptibly perturbing text using uncommon encoded representations, adversaries can control results across search engines for specific search queries. We demonstrate that this attack is successful against two major commercial search engines, Google and Bing, and one open source search engine, Elasticsearch. We further demonstrate that this attack is successful against LLM chat search, including Bing’s GPT-4 chatbot and Google’s Bard chatbot. We also present a variant of the attack targeting text summarization and plagiarism detection models, two ML tasks closely tied to search. We provide a set of defenses against these techniques and warn that adversaries can leverage these attacks to launch disinformation campaigns against unsuspecting users, motivating the need for search engine maintainers to patch deployed systems.
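Two widely known encoding tricks illustrate the kind of imperceptible perturbation described above: inserting a zero-width character and swapping a Latin letter for a look-alike from another script. The Python sketch below is illustrative; the paper's exact perturbation set and defenses may differ, and the helper names are hypothetical. The `sanitize` function shows a simple partial defense: removing Unicode format characters and applying NFKC normalization removes the invisible character, but does not fold cross-script homoglyphs, which would require a confusables mapping such as the one in Unicode Technical Standard #39.

```python
import unicodedata

ZWSP = "\u200b"        # ZERO WIDTH SPACE: renders as nothing
CYRILLIC_A = "\u0430"  # CYRILLIC SMALL LETTER A: looks like Latin "a"


def inject_invisible(text: str) -> str:
    """Hypothetical perturbation: insert a zero-width space after the first character."""
    return text[:1] + ZWSP + text[1:]


def swap_homoglyph(text: str) -> str:
    """Hypothetical perturbation: replace Latin 'a' with its Cyrillic look-alike."""
    return text.replace("a", CYRILLIC_A)


def sanitize(text: str) -> str:
    """Partial defense: drop format characters (category Cf), then apply NFKC."""
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return unicodedata.normalize("NFKC", visible)


query = "banana"
for perturbed in (inject_invisible(query), swap_homoglyph(query)):
    print(perturbed == query, sanitize(perturbed) == query)
# False True   (zero-width space removed by sanitize)
# False False  (homoglyph survives NFKC)
```

An exact-match index treats both perturbed strings as different from `"banana"`, so a query for one does not match the other, even though all three render identically.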

Honey, I Cached our Security Tokens: Re-usage of Security Tokens in the Wild

Leon Trampert
Ben Stock
Sebastian Roth

To mitigate the effect of Web attacks, modern browsers support a plethora of security mechanisms. Mechanisms such as anti-Cross-Site Request Forgery (CSRF) tokens or nonces in a Content Security Policy (CSP) rely on a random value that must be used only once. Notably, these Web security mechanisms are delivered from the server to the client via HTML tags or HTTP response headers. To reduce server load and traffic on the server infrastructure, many Web applications are served via a Content Delivery Network (CDN), which caches certain server responses and delivers them to multiple clients. This, however, affects not only the content but also the settings of security mechanisms deployed via HTML meta tags or HTTP headers. If those are also cached, their content is fixed, and the security tokens are no longer random for each request. Even if responses are not cached, operators may re-use tokens, because generating a unique random value for each request adds the complexity of preserving state on the server side. This work sheds light on the re-usage of security tokens in the wild, investigates the causes of static tokens, and elaborates on the security impact of non-random security tokens.
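The caching problem described above can be sketched in a few lines. The Python model below is a hypothetical illustration, not the paper's measurement methodology: `fresh_csp_header` issues a new random CSP nonce per response, `cdn_cache` is a naive cache keyed only on the request path, and `reused_nonces` flags nonces observed in more than one response.

```python
import re
import secrets
from collections import Counter
from typing import Callable, Dict, Iterable, Set

NONCE_RE = re.compile(r"'nonce-([A-Za-z0-9+/_=-]+)'")


def fresh_csp_header() -> str:
    """Intended usage: a new 128-bit random nonce for every response."""
    return f"script-src 'nonce-{secrets.token_urlsafe(16)}'"


def cdn_cache(origin: Callable[[], str]) -> Callable[[str], str]:
    """Naive cache keyed only on the path, so cached headers are replayed verbatim."""
    store: Dict[str, str] = {}

    def get(path: str) -> str:
        if path not in store:
            store[path] = origin()
        return store[path]

    return get


def reused_nonces(headers: Iterable[str]) -> Set[str]:
    """Nonces that appear in more than one observed response."""
    counts = Counter(n for h in headers for n in NONCE_RE.findall(h))
    return {n for n, c in counts.items() if c > 1}


direct = [fresh_csp_header() for _ in range(3)]
cached_get = cdn_cache(fresh_csp_header)
via_cdn = [cached_get("/") for _ in range(3)]
print(len(reused_nonces(direct)), len(reused_nonces(via_cdn)))  # 0 1 (barring a random collision)
```

Behind a cache like this, every client receives the same nonce, so an attacker who reads it once and finds an injection point can attach it to an injected script tag.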

Measuring the Leakage and Exploitability of Authentication Secrets in Super-apps: The WeChat Case

Supraja Baskaran
Lianying Zhao
Mohammad Mannan
Amr Youssef

Super-apps such as WeChat and Baidu host millions of mini-apps, which are very popular among users and developers because they are convenient, lightweight, and easy to share, and do not require explicit installation. Such ecosystems involve several entities, such as the super-app and mini-app clients, the super-app backend server, the mini-app developer server, and other hosting platforms and services used by the mini-app developer. To support various user-level functionalities, these components must authenticate each other; this differs from regular user authentication to the super-app platform. In this paper, we explore the mini-app to super-app authentication problem caused by insecure development practices. This type of authentication allows the mini-app code to access super-app services on the developer’s behalf.

We conduct a large-scale measurement of developers’ insecure practices that lead to mini-app to super-app authentication bypass, to which hard-coding developer secrets is a major contributor. We also analyze the exploitability and security consequences of developer secret leakage in mini-apps by examining individual super-app server-side APIs. We develop an analysis framework for measuring such secret leakage and primarily analyze 110,993 WeChat mini-apps and 10,000 Baidu mini-apps (two of the most prominent super-app platforms), along with a few additional datasets to test the evolution of developer practices and platform security enforcement over time. We found that a large number of WeChat mini-apps (36,425, 32.8%) and a few Baidu mini-apps (112) leak their developer secrets, which can cause severe security and privacy problems for the users and developers of mini-apps. A network attacker who does not even have an account on the super-app platform can effectively take down a mini-app, send malicious and phishing links to users, and access sensitive information of the mini-app developer and its users. We responsibly disclosed our findings and also put forward potential directions for alleviating or eliminating the root causes of developers hard-coding app secrets in the mini-app’s front-end code.
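A simple pattern-based scan conveys the idea of finding hard-coded secrets in mini-app front-end code. The Python sketch below is a hypothetical heuristic, not the paper's analysis framework. It assumes that WeChat AppIDs look like `wx` followed by 16 lowercase hexadecimal characters and that AppSecrets are 32 lowercase hexadecimal characters appearing on a line that mentions "secret"; these formats should be verified against the platform documentation before relying on them.

```python
import re
from typing import List, Tuple

# Assumed formats (heuristic; verify against platform documentation):
APPID_RE = re.compile(r"\bwx[0-9a-f]{16}\b")   # AppID: "wx" + 16 hex characters
SECRET_RE = re.compile(r"\b[0-9a-f]{32}\b")    # AppSecret: 32 hex characters
KEYWORD_RE = re.compile(r"secret", re.IGNORECASE)


def find_candidate_secrets(source: str) -> List[Tuple[int, str]]:
    """Return (line number, value) for 32-hex strings on lines mentioning 'secret'."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if KEYWORD_RE.search(line):
            findings.extend((lineno, m.group(0)) for m in SECRET_RE.finditer(line))
    return findings


# Fabricated example input, not real credentials.
sample_js = "\n".join([
    'const appid = "wx0123456789abcdef";',
    'const appSecret = "0123456789abcdef0123456789abcdef";',
    'const bundleHash = "fedcba9876543210fedcba9876543210";',
])
print(APPID_RE.findall(sample_js))        # ['wx0123456789abcdef']
print(find_candidate_secrets(sample_js))  # [(2, '0123456789abcdef0123456789abcdef')]
```

Requiring a nearby keyword trades recall for precision: it suppresses unrelated 32-character hex strings such as hashes, but misses secrets stored under unrelated names.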

Leader: Defense Against Exploit-Based Denial-of-Service Attacks on Web Applications

Rajat Tandon
Haoda Wang
Nicolaas Weideman
Shushan Arakelyan
Genevieve Bartlett
Christophe Hauser
Jelena Mirkovic

Exploit-based denial-of-service attacks (exDoS) are challenging to detect and mitigate. Rather than flooding the network with excessive traffic, these attacks generate low rates of application requests that exploit some vulnerability and tie up a scarce key resource. It is impractical to design defenses for each variant of exDoS attacks separately. This approach does not scale, since new vulnerabilities can be discovered in existing applications, and new applications can be deployed with yet unknown vulnerabilities.

We propose Leader, an attack-agnostic defense against exDoS attacks. Leader monitors fine-grained resource usage per application on the host it protects, and per external request to that application. Over time, Leader learns the time-based patterns of legitimate users’ resource usage for each application and models them using an elliptic envelope. During attacks, Leader uses these models to identify application clients that use resources in an abnormal manner, and blocks them.
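An elliptic envelope treats legitimate behavior as roughly Gaussian and flags points far from the fitted ellipse. The following Python sketch makes simplifying assumptions: two hypothetical per-request features (CPU milliseconds and database queries), the plain sample covariance instead of the robust Minimum Covariance Determinant estimate used by scikit-learn's `EllipticEnvelope`, and a threshold of 9.21 (roughly the 99th percentile of a chi-square distribution with two degrees of freedom) on the squared Mahalanobis distance. It is not Leader's model or feature set.

```python
from statistics import fmean
from typing import List, Tuple

Point = Tuple[float, float]
Model = Tuple[Point, Tuple[float, float, float]]


def fit_envelope(samples: List[Point]) -> Model:
    """Fit a mean and 2x2 sample covariance to legitimate per-request features."""
    xs, ys = [s[0] for s in samples], [s[1] for s in samples]
    mx, my = fmean(xs), fmean(ys)
    n = len(samples)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy * sxy               # must be nonzero (features not collinear)
    inv = (syy / det, -sxy / det, sxx / det)  # entries (a, b, d) of the inverse covariance
    return (mx, my), inv


def mahalanobis_sq(p: Point, model: Model) -> float:
    (mx, my), (a, b, d) = model
    dx, dy = p[0] - mx, p[1] - my
    return a * dx * dx + 2 * b * dx * dy + d * dy * dy


def is_anomalous(p: Point, model: Model, threshold: float = 9.21) -> bool:
    # Points outside the ellipse defined by the threshold are flagged.
    return mahalanobis_sq(p, model) > threshold


# Hypothetical features per request: (CPU milliseconds, database queries).
legit = [(10, 2), (12, 3), (11, 2), (9, 1), (13, 3), (10, 3), (11, 1), (12, 2)]
model = fit_envelope(legit)
print(is_anomalous((11, 2), model))    # False
print(is_anomalous((200, 40), model))  # True
```

Using the Mahalanobis distance rather than per-feature limits lets the envelope account for correlated features, so a client whose usage is unusual only in combination can still be flagged.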

We implement and evaluate Leader for protecting Web applications against exDoS attacks. Our results show that Leader correctly identifies around 99% of attack IPs and around 99% of legitimate IPs across the six different exDoS attacks used in our evaluation. On average, Leader can identify and block an attacker after six requests. Leader has a small runtime cost, adding less than 0.5% to page loading time.
