Traditional searchable encryption schemes based on the bag-of-words model require massive storage for the document set's index, because the dimension of each document vector equals the size of the dictionary. The bag-of-words model also ignores semantic relations between keywords and documents, so it can return irrelevant search results to users. The neural-network-based natural language processing model Doc2Vec uses the context of words and paragraphs to extract document features. These features capture latent semantic information and can measure the similarity between documents. In this paper, we adopt the Doc2Vec model to build a semantic-aware multikeyword ranked search scheme. Doc2Vec learns distributed representations of words and documents with a modest vector dimensionality, even when trained on datasets containing hundreds of millions of words. The documents' distributed representations are extracted as document feature vectors by the Doc2Vec model and used as the search index. The features of the queried keywords are likewise extracted as a query feature vector, and a secure inner-product operation over the query vector and the index achieves privacy-preserving semantic search. Our scheme also supports dynamic updates to the document set via the Doc2Vec model. Experiments on a real-world dataset show that the fixed-length feature vectors improve the time and space efficiency of semantic-aware search.
Dai, X., Dai, H., Yang, G., Yi, X., & Huang, H. (2019). An Efficient and Dynamic Semantic-Aware Multikeyword Ranked Search Scheme over Encrypted Cloud Data. IEEE Access, 7, 142855–142865. https://doi.org/10.1109/ACCESS.2019.2944476