In filtering, each output is produced by a certain number of different inputs. We explore the statistics of this degeneracy in an explicitly treatable filtering problem in which filtering performs the maximal compression of relevant information contained in inputs (arrays of zeros and ones). This problem serves as a reference model for the statistics of filtering and related sampling problems. The filter patterns in this problem conveniently allow a microscopic, combinatorial consideration. This allows us to find the statistics of outputs, namely the exact distribution of output degeneracies, for arbitrary input sizes. We observe that the resulting degeneracy distribution of outputs decays as $e^{-c\log^{\alpha} d}$ with degeneracy $d$, where $c$ is a constant and the exponent $\alpha > 1$, i.e., faster than a power law. Importantly, its form depends essentially on the size of the input dataset, appearing closer to a power-law dependence for small dataset sizes than for large ones. We demonstrate that for sufficiently small input dataset sizes, typical of empirical studies, this distribution could easily be perceived as a power law. We extend our results to filter patterns of various sizes and demonstrate that the shortest filter pattern provides the maximally informative representations of the inputs.
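The claim that this decay is faster than a power law can be checked with a one-line rewriting. The sketch below assumes that $\log$ denotes the natural logarithm (the abstract does not state the base) and uses $P(d)$ as an illustrative name for the degeneracy distribution:

```latex
% Assumption: \log is the natural logarithm; P(d) is an illustrative symbol.
\[
  P(d) \sim e^{-c \log^{\alpha} d}
       = \left(e^{\log d}\right)^{-c \log^{\alpha - 1} d}
       = d^{-c \log^{\alpha - 1} d}.
\]
% For \alpha = 1 this is a pure power law, d^{-c}.
% For \alpha > 1 the effective exponent c \log^{\alpha - 1} d grows with d,
% so the tail eventually falls below any fixed power law d^{-\gamma}.
```

Because the effective exponent $c\log^{\alpha-1} d$ grows only logarithmically, it changes little over a narrow range of $d$, which is consistent with the observation that the distribution can resemble a power law when the input dataset is small.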
Baxter, G. J., Da Costa, R. A., Dorogovtsev, S. N., & Mendes, J. F. F. (2020). Complex Distributions Emerging in Filtering and Compression. Physical Review X, 10(1). https://doi.org/10.1103/PhysRevX.10.011074