A style crack is a position in a multi-author document where the authorship changes. This paper surveys the current state of research and theory in related fields, both domestically and internationally, and proposes a multi-feature document segmentation method for plagiarism detection. Seven text style features are used to recognize style cracks. After feature extraction, the features are fused and an unsupervised machine learning algorithm clusters the resulting style features to locate the style cracks. Experiments show that the method is effective and achieves good results.
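The general idea of locating style cracks by clustering per-paragraph style features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the seven features or the clustering algorithm, so the four features below (mean word length, mean sentence length, type-token ratio, punctuation rate), the plain k-means routine, and the function names `style_features`, `standardize`, `kmeans`, and `find_style_cracks` are all assumptions chosen for illustration.

```python
import math
import re


def style_features(paragraph):
    """Return a small, hypothetical style vector for one paragraph."""
    words = re.findall(r"[A-Za-z']+", paragraph)
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    n_words = max(len(words), 1)
    return [
        sum(len(w) for w in words) / n_words,              # mean word length
        len(words) / max(len(sentences), 1),               # mean sentence length
        len({w.lower() for w in words}) / n_words,         # type-token ratio
        sum(paragraph.count(c) for c in ",;:") / n_words,  # punctuation rate
    ]


def standardize(rows):
    """Z-score each feature column so no single feature dominates."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k=2, iterations=20):
    """Plain k-means, seeded with the first point and then farthest points."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def find_style_cracks(paragraphs, k=2):
    """Return indices i where paragraph i falls in a different cluster than i-1."""
    labels = kmeans(standardize([style_features(p) for p in paragraphs]), k)
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

Calling `find_style_cracks(paragraphs)` on a list of paragraph strings returns the indices at which the cluster label changes, which this sketch treats as candidate style cracks. The number of clusters `k` stands in for an assumed number of authors; a real system would need to estimate it and would likely use richer features.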
CITATION STYLE
Liu, G., Wang, K., Liu, W., Cheng, X., & Li, T. (2019). Document Segmentation Method Based on Style Feature Fusion. In IOP Conference Series: Materials Science and Engineering (Vol. 646). Institute of Physics Publishing. https://doi.org/10.1088/1757-899X/646/1/012044