Confidential terms detection using language modeling technique in data leakage prevention

Abstract

Detecting confidential documents is a key activity in data leakage prevention. Once a document is marked as confidential, leakage of data from that document can be prevented. Confidential terms are significant terms that indicate confidential content in a document. This paper presents a method for detecting confidential terms using a language model with Dirichlet prior smoothing. Clusters are generated from the training documents (confidential and nonconfidential). A language model is created separately for the confidential and the nonconfidential documents. The nonconfidential language model of a cluster is expanded using similar clusters, which helps to identify confidential content in nonconfidential documents. Smoothing assigns a nonzero probability to unseen words and improves the accuracy of the language model.
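
The abstract does not give the formulas, so the following is only a minimal sketch of how Dirichlet prior smoothing is commonly applied to a unigram language model, p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu). The toy token lists, the value of `mu`, the add-one background model, and the log-ratio `term_score` are all assumptions made for illustration; they are not taken from the paper, and the clustering and model-expansion steps are not shown.

```python
import math
from collections import Counter


def background_model(all_tokens):
    """Collection model p(w|C); add-one keeps out-of-collection words nonzero (assumption)."""
    counts = Counter(all_tokens)
    total = sum(counts.values())
    vocab = len(counts)

    def prob(word):
        return (counts[word] + 1) / (total + vocab + 1)

    return prob


def dirichlet_lm(tokens, background, mu):
    """Unigram model smoothed with a Dirichlet prior of strength mu."""
    counts = Counter(tokens)
    total = sum(counts.values())

    def prob(word):
        return (counts[word] + mu * background(word)) / (total + mu)

    return prob


# Hypothetical toy training data, not from the paper.
confidential_tokens = ["salary", "merger", "budget", "merger", "report"]
nonconfidential_tokens = ["report", "meeting", "lunch", "report", "schedule"]

bg = background_model(confidential_tokens + nonconfidential_tokens)
p_conf = dirichlet_lm(confidential_tokens, bg, mu=10.0)
p_nonconf = dirichlet_lm(nonconfidential_tokens, bg, mu=10.0)


def term_score(word):
    """Illustrative score: log ratio of the two smoothed probabilities (assumption)."""
    return math.log(p_conf(word) / p_nonconf(word))


for w in ["merger", "report", "lunch", "unseen"]:
    print(w, round(term_score(w), 3))
```

In this sketch a positive score means the term is more likely under the confidential model than under the nonconfidential one. Because of the smoothing, a word that appears in neither set still has a nonzero probability under both models, so the ratio is always defined.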

Cite

Subhashini, P., & Padmaja Rani, B. (2016). Confidential terms detection using language modeling technique in data leakage prevention. In Advances in Intelligent Systems and Computing (Vol. 381, pp. 271–279). Springer Verlag. https://doi.org/10.1007/978-81-322-2526-3_29
