Analyses of viral genetic linkage can provide insight into HIV transmission dynamics and the impact of prevention interventions. For example, such analyses have the potential to determine whether recently-infected individuals have acquired viruses circulating within or outside a given community. In addition, they have the potential to identify characteristics of chronically infected individuals that make their viruses likely to cluster with others circulating within a community. Such clustering can be related to the potential of such individuals to contribute to the spread of the virus, either directly through transmission to their partners or indirectly through further spread of HIV from those partners. Assessment of the extent to which individual (incident or prevalent) viruses are clustered within a community will be biased if only a subset of subjects are observed, especially if that subset is not representative of the entire HIV infected population. To address this concern, we develop a multiple imputation framework in which missing sequences are imputed based on a model for the diversification of viral genomes. The imputation method decreases the bias in clustering that arises from informative missingness. Data from a household survey conducted in a village in Botswana are used to illustrate these methods. We demonstrate that the multiple imputation approach reduces bias in the overall proportion of clustering due to the presence of missing observations.
Liu, S. H., Erion, G., Novitsky, V., & De Gruttola, V. (2015). Viral genetic linkage analysis in the presence of missing data. PLoS ONE, 10(8). https://doi.org/10.1371/journal.pone.0135469