Confidential terms detection using language modeling technique in data leakage prevention

Peneti Subhashini; B. Padmaja Rani

Conference Proceedings

Advances in Intelligent Systems and Computing, vol. 381 (2016), pp. 271-279

DOI: 10.1007/978-81-322-2526-3_29

Abstract

Detecting confidential documents is a key activity in data leakage prevention. Once a document is marked as confidential, leakage of data from that document can be prevented. Confidential terms are significant terms that indicate confidential content in a document. This paper presents a method for detecting confidential terms using a language model with Dirichlet prior smoothing. The training documents (confidential and nonconfidential) are grouped into clusters, and separate language models are created for the confidential and nonconfidential documents. The nonconfidential language model of a cluster is expanded using similar clusters, which helps to identify confidential content in nonconfidential documents. Smoothing assigns a nonzero probability to unseen words and improves the accuracy of the language model.
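The smoothing step described in the abstract can be illustrated with a minimal sketch. It assumes unigram models, whitespace tokenization, a background model built from all training documents, and a log-ratio score between the two models; none of these details are stated in the abstract, and the function names, the `mu` value, and the toy documents are hypothetical. Clustering and cluster expansion are not covered here.

```python
import math
from collections import Counter


def unigram_counts(docs):
    """Count whitespace-separated, lowercased tokens over a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def dirichlet_prob(term, counts, background, mu):
    """Dirichlet-prior smoothed estimate:
    p(w | model) = (c(w) + mu * p(w | background)) / (N + mu)
    where N is the total token count of the model.
    """
    total = sum(counts.values())
    bg_prob = background[term] / sum(background.values())
    return (counts[term] + mu * bg_prob) / (total + mu)


def rank_confidential_terms(conf_docs, nonconf_docs, mu=10.0):
    """Score each term seen in confidential documents by the log ratio of its
    smoothed probabilities (assumed scoring rule, not taken from the paper)."""
    conf = unigram_counts(conf_docs)
    nonconf = unigram_counts(nonconf_docs)
    background = conf + nonconf  # collection model over all training documents
    scores = {}
    for term in conf:
        p_conf = dirichlet_prob(term, conf, background, mu)
        # Nonzero even when the term never occurs in nonconfidential documents.
        p_nonconf = dirichlet_prob(term, nonconf, background, mu)
        scores[term] = math.log(p_conf / p_nonconf)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy documents, invented for illustration only.
conf_docs = ["merger budget secret merger", "secret salary budget"]
nonconf_docs = ["lunch menu budget", "team lunch meeting"]
ranking = rank_confidential_terms(conf_docs, nonconf_docs)
```

In this sketch, a term such as "secret" that never appears in the nonconfidential documents still receives a nonzero probability under the nonconfidential model, so the log ratio stays finite. Terms shared by both sets, such as "budget", score lower than terms found only in confidential documents.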

Citation (APA)

Subhashini, P., & Padmaja Rani, B. (2016). Confidential terms detection using language modeling technique in data leakage prevention. In Advances in Intelligent Systems and Computing (Vol. 381, pp. 271–279). Springer Verlag. https://doi.org/10.1007/978-81-322-2526-3_29
