Plagiarism is usually studied from an analysis viewpoint: how to detect that a text contains copies of another one. In this chapter we study plagiarism from the generation viewpoint: how to generate a text with a guarantee of non-plagiarism. More precisely, we address the problem of Markov sequence generation with forbidden k-gram constraints. This problem is addressed in two steps. In the first step, we show that, given a Markov transition matrix and a set of k-grams, we can efficiently build an automaton that represents exactly the language of all sequences that can be generated from the Markov model and that do not contain any of the k-grams. The size of the automaton is bounded by the size of the set of forbidden k-grams, and so is the time needed to build it. This automaton can be used to solve the algebraic problem (i.e., treating all non-zero probabilities as uniform) by a simple walk. In the second step, we show that the automaton can be extended so as to be exploited by a belief propagation scheme, in order to produce perfect sampling of all the solutions.
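To make the constraint concrete, here is a minimal Python sketch of the algebraic version of the problem: generating a fixed-length sequence that follows the non-zero transitions of a first-order Markov model and never contains a forbidden k-gram. It is not the automaton construction described in the chapter. It swaps in a naive randomized backtracking search, which can take exponential time and does not sample solutions with their correct probabilities. The names `allowed_next` and `generate`, the dictionary encoding of the transition matrix, and the toy data are all assumptions made for illustration.

```python
import random


def allowed_next(history, transitions, forbidden, k):
    """Return the symbols that may follow `history`.

    A symbol is allowed if its transition probability from the last symbol
    is non-zero and appending it does not complete a forbidden k-gram.
    """
    tail = tuple(history[max(0, len(history) - (k - 1)):])
    options = []
    for symbol, prob in transitions.get(history[-1], {}).items():
        if prob <= 0:
            continue
        candidate = tail + (symbol,)
        if len(candidate) == k and candidate in forbidden:
            continue
        options.append(symbol)
    return options


def generate(start, length, transitions, forbidden, k, rng):
    """Randomized backtracking search for one valid sequence, or None.

    Probabilities are only tested for being non-zero (the "algebraic"
    setting); this is an illustrative stand-in, not a perfect sampler.
    """
    def extend(seq):
        if len(seq) == length:
            return seq
        options = allowed_next(seq, transitions, forbidden, k)
        rng.shuffle(options)
        for symbol in options:
            result = extend(seq + [symbol])
            if result is not None:
                return result
        return None  # dead end: every continuation hits a forbidden k-gram

    return extend([start])


# Hypothetical toy model: "a" goes to "b" or "c", both of which return to "a".
transitions = {"a": {"b": 0.5, "c": 0.5}, "b": {"a": 1.0}, "c": {"a": 1.0}}
forbidden = {("a", "b", "a")}  # the 3-gram that must never appear

sequence = generate("a", 7, transitions, forbidden, 3, random.Random(0))
print(sequence)
```

In this toy model, choosing "b" always leads to a dead end, because the only continuation would complete the forbidden 3-gram, so the search has to back out of it. Avoiding such dead ends up front, rather than discovering them by backtracking, is what an automaton representing exactly the valid sequences provides.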
Papadopoulos, A., Pachet, F., & Roy, P. (2016). Generating Non-plagiaristic Markov Sequences with Max Order Sampling (pp. 85–103). https://doi.org/10.1007/978-3-319-24403-7_6