Data Science Convergence Research Institute

Industry

Differentially private upsampling for enhanced anomaly detection in imbalanced data

In real-world applications, anomaly detection tasks are critically important. For example, fraud detection in the financial domain and disease diagnosis in the medical domain require highly accurate predictions, as errors can lead to severe consequences. These tasks often rely on sensitive personal data, which makes privacy-preserving techniques necessary. However, applying privacy-preserving techniques directly degrades performance. To mitigate this issue, the minority class in an imbalanced dataset can be upsampled to improve class balance. In this paper, we propose a differentially private upsampling method for imbalanced datasets that uses a kernel-based support function. The proposed method employs kernel support vector domain description to estimate the distribution of the minority class under differential privacy constraints and generates synthetic instances using gradient methods. Additionally, we propose a filtering process that leverages the support function of the majority class to refine the generated samples without additional privacy loss. Experimental results on real-world datasets demonstrate that the proposed method maintains robust privacy guarantees and achieves superior performance on minority-class metrics, comparable to that of non-private methods.

Co-authors: Yujin Choi, Jinseong Park, Youngjoo Park, Jaewook Lee

2026
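The generate-then-filter idea in this abstract can be sketched in plain Python, under heavy simplifying assumptions. The sketch omits the differential-privacy mechanism entirely and replaces the fitted kernel SVDD solution with a uniformly weighted Gaussian-kernel support score. The names `support`, `support_grad`, and `upsample`, all parameter values, and the filtering rule (keep a generated point only if its minority-class support exceeds its majority-class support) are hypothetical illustrations, not the authors' method.

```python
import math
import random


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def support(x, data, gamma):
    """Support score of x: mean kernel similarity to `data`.
    Assumption: uniform weights stand in for a fitted kernel SVDD solution."""
    return sum(rbf(x, d, gamma) for d in data) / len(data)


def support_grad(x, data, gamma):
    """Gradient of support(x, data, gamma) with respect to x."""
    grad = [0.0] * len(x)
    for d in data:
        k = rbf(x, d, gamma)
        for j in range(len(x)):
            grad[j] -= 2.0 * gamma * (x[j] - d[j]) * k
    return [g / len(data) for g in grad]


def upsample(minority, majority, n_new, gamma=1.0, noise=0.5,
             lr=0.5, steps=20, seed=0):
    """Hypothetical, NON-private sketch: perturb minority points, move them
    uphill on the minority support score, then filter with the majority score."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_new):
        base = rng.choice(minority)
        x = [v + rng.gauss(0.0, noise) for v in base]  # random starting point
        for _ in range(steps):
            g = support_grad(x, minority, gamma)
            x = [xi + lr * gi for xi, gi in zip(x, g)]  # gradient ascent step
        # Hypothetical filter: drop points the majority class supports more strongly.
        if support(x, minority, gamma) > support(x, majority, gamma):
            samples.append(x)
    return samples
```

For example, `upsample([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2]], [[3.0, 3.0], [3.2, 2.9]], n_new=10)` returns up to ten new two-dimensional points near the minority cluster. Each Gaussian kernel term has a gradient Lipschitz constant of at most 2·gamma, so with `lr <= 1 / (2 * gamma)` an ascent step cannot lower the support score.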

Industry

Homomorphic encryption-based fault diagnosis in IoT-enabled industrial systems

In IoT-enabled industrial environments, ensuring the privacy and security of operational data is paramount for fault diagnosis systems. This study presents a framework that integrates homomorphic encryption (HE) with deep learning to achieve secure and efficient fault diagnosis for industrial bearings. Because all computations are performed directly on encrypted sensor data, the framework guarantees full data confidentiality throughout the diagnostic process without requiring decryption. Key technical contributions include a minimax polynomial approximation of the ReLU activation, which improves diagnostic accuracy while preserving efficiency, and an efficient 1D convolution method that combines two existing HE convolution techniques to improve performance. The framework also incorporates frequency-domain optimizations based on the Discrete Fourier Transform (DFT), which substantially improve processing efficiency. The proposed model was trained on the CWRU bearing dataset and validated on a private dataset, achieving a diagnostic accuracy of 95.92%, comparable to state-of-the-art models operating on plaintext data. Furthermore, the DFT-based optimizations reduced inference time by nearly threefold while maintaining high accuracy, underscoring the framework's potential for secure and efficient fault diagnosis in industrial applications.

Co-authors: Hoki Kim, Youngdoo Son

2025
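The ReLU approximation step can be illustrated with a plaintext sketch. The abstract's method is a minimax approximation; minimax polynomials are classically computed with the Remez exchange algorithm, which this sketch does not implement. It swaps in Chebyshev interpolation, a simpler near-minimax technique, so the resulting polynomial is not the authors'. The degree 16, the assumption that inputs are pre-scaled into [-1, 1], and the function names are illustrative choices.

```python
import math


def chebyshev_coeffs(f, degree):
    """Coefficients c_0..c_degree of the Chebyshev interpolant of f on [-1, 1],
    sampled at the degree + 1 Chebyshev nodes (a near-minimax approximation)."""
    n = degree + 1
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    values = [f(x) for x in nodes]
    coeffs = [
        2.0 / n * sum(values[k] * math.cos(math.pi * j * (k + 0.5) / n)
                      for k in range(n))
        for j in range(n)
    ]
    coeffs[0] /= 2.0
    return coeffs


def chebyshev_eval(coeffs, x):
    """Evaluate sum_j c_j T_j(x) with Clenshaw's recurrence. It uses only
    additions and multiplications (no comparison); practical HE code usually
    prefers lower-depth evaluation orders."""
    b1, b2 = 0.0, 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * x * b1 - b2 + c, b1
    return x * b1 - b2 + coeffs[0]


def relu(x):
    return max(0.0, x)


relu_coeffs = chebyshev_coeffs(relu, degree=16)


def relu_poly(x):
    """Polynomial ReLU surrogate; assumes inputs were pre-scaled into [-1, 1]."""
    return chebyshev_eval(relu_coeffs, x)
```

Under HE, raising the degree lowers the approximation error but costs more ciphertext multiplications and multiplicative depth, which is why the degree is a tuning knob rather than "as high as possible".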

Industry

Privacy-preserving inference resistant to model extraction attacks

Privacy-Preserving Deep Learning (PPDL) has been successfully applied in the inference phase to protect the privacy of input data. However, PPDL models are vulnerable to model extraction attacks, in which an adversary attempts to steal the trained model itself. In this paper, we propose a new defense against model extraction attacks that is designed specifically for PPDL based on secure multi-party computation and homomorphic encryption. The proposed method confounds inference queries on out-of-distribution data by pairing the target network with a fake network, while keeping computation efficient in PPDL environments. Furthermore, we introduce Wasserstein regularization so that the fake network's output distribution is indistinguishable from that of the target network, thwarting adversaries' attempts to detect discrepancies within the PPDL framework. Experimental results demonstrate that our defense achieves a good accuracy-security trade-off and is effective against a wide range of attacks, including adaptive and transfer attacks. Our work contributes to the field of PPDL by extending its focus beyond privacy to the security and reliability of the algorithm.

Co-authors: Yujin Choi, Jaewook Lee, Saerom Park

2024
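The Wasserstein regularization idea can be sketched for its simplest case: a Wasserstein-1 penalty between the empirical distributions of one scalar output statistic of the fake and target networks. Using the top softmax probability as that statistic, requiring equal batch sizes, and the names `fake_network_loss` and `lam` are assumptions for illustration. The sketch runs in plaintext and says nothing about how the two networks are combined under MPC or HE.

```python
import math


def softmax(logits):
    """Numerically stable softmax of one logit vector."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_confidence(batch_logits):
    """Scalar summary per query: the top softmax probability (an assumption)."""
    return [max(softmax(row)) for row in batch_logits]


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size empirical 1D samples:
    the mean absolute difference of the sorted values."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def fake_network_loss(task_loss, fake_logits, target_logits, lam=1.0):
    """Hypothetical objective for the fake network: its own task loss plus a
    penalty pulling its confidence distribution toward the target network's."""
    penalty = wasserstein_1d(max_confidence(fake_logits),
                             max_confidence(target_logits))
    return task_loss + lam * penalty
```

For equal-size one-dimensional samples, pairing sorted values gives the optimal coupling, so the distance reduces to the mean gap between matching order statistics and needs no optimization solver.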

Industry

Improving the utility of differentially private clustering through dynamical processing

Co-authors: Yujin Choi, Jaewook Lee

2025

Industry

Fully Few-shot Class-incremental Audio Classification Using Multi-level Embedding Extractor and Ridge Regression Classifier

In Few-shot Class-incremental Audio Classification (FCAC), each base class requires abundant training samples to train the model. However, collecting abundant training samples for many base classes is difficult because of data scarcity and high collection costs. We address a more realistic setting, Fully FCAC (FFCAC), in which both base and incremental classes have only a few training samples. We propose an FFCAC method whose model is decoupled into a multi-level embedding extractor and a ridge regression classifier. The embedding extractor consists of an Audio Spectrogram Transformer encoder and a fusion module; it is trained in the base session and frozen in all incremental sessions. The classifier is updated continually in each incremental session. Results on three public datasets show that our method exceeds current methods in accuracy and has an advantage in complexity over most of them.

Co-authors: Yongjie Si, Yanxiong Li, Jiaxin Tan, Qianhua He, Il-Youp Kwak

2025
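The decoupled design, a frozen embedding extractor plus a ridge regression classifier updated in every session, can be sketched with a closed-form ridge solution. The assumption here is that the classifier keeps running sums G = Σ x xᵀ over all embeddings and, per class c, S_c = Σ x over that class's embeddings (one-hot targets), then re-solves (G + λI) w_c = S_c; the paper's exact update rule may differ, and the class and method names are hypothetical. Embeddings are plain lists standing in for outputs of the frozen extractor.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][k] * w[k] for k in range(i + 1, n))) / M[i][i]
    return w


class IncrementalRidgeClassifier:
    """Hypothetical ridge classifier on frozen embeddings. Because it stores
    only sufficient statistics, updating session by session gives the same
    weights as fitting once on all samples seen so far."""

    def __init__(self, dim, lam=1.0):
        self.dim = dim
        self.lam = lam
        self.gram = [[0.0] * dim for _ in range(dim)]  # running sum of x x^T
        self.class_sums = {}  # label -> running sum of x for that label

    def update(self, embeddings, labels):
        """Absorb one session's few-shot samples; unseen labels add new classes."""
        for x, y in zip(embeddings, labels):
            for i in range(self.dim):
                for j in range(self.dim):
                    self.gram[i][j] += x[i] * x[j]
            s = self.class_sums.setdefault(y, [0.0] * self.dim)
            for i in range(self.dim):
                s[i] += x[i]

    def weights(self):
        """Closed-form ridge solution: (G + lam * I) w_c = S_c for each class c."""
        A = [[self.gram[i][j] + (self.lam if i == j else 0.0)
              for j in range(self.dim)] for i in range(self.dim)]
        return {c: solve(A, s) for c, s in self.class_sums.items()}

    def predict(self, x):
        """Return the class whose weight vector gives the highest linear score."""
        W = self.weights()
        return max(W, key=lambda c: sum(w * v for w, v in zip(W[c], x)))
```

Storing only G and the per-class sums keeps memory at O(d²) plus one vector per class regardless of how many sessions have passed, and no earlier samples need to be replayed when a new session arrives.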
