Species distribution modeling (SDM) estimates a species' probabilistic distribution by combining environmental raster layers with species occurrence data. Such models can help answer complex questions in ecology, biology, and health, e.g., by estimating the impact of climate change on biodiversity or the potential for a disease to spread (vector modeling). Machine learning is widely applied in SDM, and the Genetic Algorithm for Rule-set Production (GARP) is one of the most reliable solutions. However, GARP converges slowly under certain conditions (high resolution or a large number of layers). To address this, this paper proposes P-GARP, a parallel, scalable implementation of GARP. P-GARP was implemented on an SGI Altix XE 1300 cluster with two quad-core processors per node. Preliminary results show a significant speedup of 3.2 per node. Premature convergence is not observed in P-GARP, and its accuracy is very similar to GARP's. Effective solutions to improve this speedup at even larger scales are proposed, along with a discussion of P-GARP's correctness and efficiency.
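To put the reported speedup in context, the short sketch below computes parallel efficiency (speedup divided by core count). It assumes that the figure of 3.2 is a speedup over serial GARP measured on a single node with 2 × 4 = 8 cores; the abstract does not state the baseline explicitly, and the function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return the fraction of ideal linear speedup achieved on `cores` cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Assumption: one node holds 2 quad-core processors, i.e. 8 cores,
# and the reported 3.2 speedup refers to that single node.
cores_per_node = 2 * 4
efficiency = parallel_efficiency(3.2, cores_per_node)

print(f"{efficiency:.2f}")  # prints 0.40
```

Under this reading, roughly 40% of the ideal linear speedup is achieved per node, which is consistent with the abstract's remark that further improvements to the speedup are proposed.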
Santana, F., Bravo Pariente, C. A., & Saraiva, A. M. (2017). Species distribution modeling with scalability: The case study of P-GARP, a parallel genetic algorithm for rule-set production. In Proceedings - 2017 IEEE International Conference on Information Reuse and Integration, IRI 2017 (Vol. 2017-January, pp. 162–170). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IRI.2017.93