
Graphical alignment of sequences through parallel programming: an approach from the post-genomic era

Alineamiento gráfico de secuencias a través de programación paralela: un enfoque desde la era postgenómica





Section
Articles

How to Cite
Graphical alignment of sequences through parallel programming: an approach from the post-genomic era. (2020). Revista Ingeniería Biomédica, 13(26). https://doi.org/10.24050/19099762.n26.2019.1404

Johan Sebastian Piña Duran
Simón Orozco Arias
Romain Guyot
Reinel Tabares Soto
Nicolás Tobón Orozco

Johan Sebastian Piña Duran,

Undergraduate student in biomedical and electronic engineering and member of the Bioinformatics and Artificial Intelligence student research group at the Universidad Autónoma de Manizales.

Simón Orozco Arias,

Systems engineer, PhD student in engineering, and lecturer in the Department of Computer Science at the Universidad Autónoma de Manizales.

Romain Guyot,

PhD in plant genetics. Researcher at the Universidad Autónoma de Manizales and at CIRAD in France.

Reinel Tabares Soto,

Electronics and systems engineer with a master's degree in industrial automation. PhD student in engineering, lecturer at the Universidad Autónoma de Manizales, and coordinator of the electronic engineering program.

Nicolás Tobón Orozco,

Undergraduate student in biomedical and electronic engineering and member of the Bioinformatics and Artificial Intelligence student research group at the Universidad Autónoma de Manizales.

Mariana Sofía Candamil Cortés,

Undergraduate student in biomedical and electronic engineering and member of the Bioinformatics and Artificial Intelligence student research group at the Universidad Autónoma de Manizales.

A graphical alignment, or “dot plot”, is a visual representation used in genomic data analysis to compare the similarity of two biological sequences. DOTTER, developed in 1995, is the most widely used tool for this task, but its main drawback is its long runtime on large-scale genomic data. GEPARD (2007) aligns large sequences much faster than DOTTER, reducing the time needed to align a chromosome against itself from 382 years to 61 minutes, although at a low level of detail because it uses an approximation method. This article proposes a strategy that runs on multiple processors to perform genome-scale alignments in less time than GEPARD, achieving speedups of up to 27.9 times with 64 processors relative to the nominal value. The strategy enables rapid identification of chromosomal rearrangements and repetitive elements, comparison between genomes of different species, and graphical assessment of the assembly quality of genomic sequences.
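To make the idea of a dot plot computed on several workers concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names `dotplot_block` and `dotplot`, the split into row blocks, and the use of a thread pool in place of separate processors are all assumptions made for illustration. It also marks only single-character matches, without the window scoring or thresholds that tools such as DOTTER apply.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def dotplot_block(block, seq_b):
    """Compare one block of rows (part of sequence A) against all of sequence B.

    Returns a boolean matrix where entry (i, j) is True when block[i] == seq_b[j].
    """
    return block[:, None] == seq_b[None, :]


def dotplot(seq_a, seq_b, n_workers=2):
    """Build a character-level dot plot of seq_a (rows) against seq_b (columns).

    Hypothetical decomposition: seq_a is cut into n_workers contiguous row
    blocks, and each block is compared independently by one worker.
    """
    a = np.frombuffer(seq_a.encode("ascii"), dtype=np.uint8)
    b = np.frombuffer(seq_b.encode("ascii"), dtype=np.uint8)
    blocks = np.array_split(a, n_workers)  # contiguous row blocks, in order
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda blk: dotplot_block(blk, b), blocks))
    # Stack the partial matrices back in their original row order.
    return np.vstack(parts)


if __name__ == "__main__":
    matrix = dotplot("ACGTACGT", "ACGTTCGT", n_workers=4)
    print(matrix.astype(int))
```

Each row block is independent of the others, which is what makes this kind of comparison easy to distribute; the resulting boolean matrix can be displayed with, for example, Matplotlib's `imshow`. As a rough reading of the reported figure, and assuming the baseline is a single-processor run, a 27.9-fold speedup on 64 processors corresponds to a parallel efficiency of about 27.9 / 64 ≈ 0.44.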




  1. O. Lecompte, J. D. Thompson, F. Plewniak, J.-C. Thierry, and O. Poch, “Multiple alignment of complete sequences (MACS) in the post-genomic era,” Gene, vol. 270, no. 1, pp. 17–30, 2001.
  2. N. M. Luscombe, D. Greenbaum, and M. Gerstein, “What is bioinformatics? A proposed definition and overview of the field,” Methods Inf. Med., vol. 40, no. 4, pp. 346–358, 2001.
  3. S. P. Holmes and D. Gusfield, “Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology,” J. Am. Stat. Assoc., vol. 94, no. 447, p. 989, 1999.
  4. W. Chen, B. Liao, and W. Li, “Use of image texture analysis to find DNA sequence similarities,” J. Theor. Biol., vol. 455, pp. 1–6, 2018.
  5. B. Liao and T.-M. Wang, “New 2D graphical representation of DNA sequences,” J. Comput. Chem., vol. 25, no. 11, pp. 1364–1368, 2004.
  6. T. F. Smith and M. S. Waterman, “Identification of common molecular subsequences,” J. Mol. Biol., vol. 147, no. 1, pp. 195–197, 1981.
  7. S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” J. Mol. Biol., vol. 48, no. 3, pp. 443–453, 1970.
  8. A. L. Delcher, S. Kasif, R. D. Fleischmann, J. Peterson, O. White, and S. L. Salzberg, “Alignment of whole genomes,” Nucleic Acids Res., vol. 27, no. 11, pp. 2369–2376, 1999.
  9. E. L. L. Sonnhammer and R. Durbin, “A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis (Reprinted from Gene Combis, vol 167, pg GC1-GC10, 1996),” Gene, vol. 167, no. 1–2, pp. GC1–GC10, 1995.
  10. J. Krumsiek, R. Arnold, and T. Rattei, “Gepard: A rapid and sensitive tool for creating dotplots on genome scale,” Bioinformatics, vol. 23, no. 8, pp. 1026–1028, 2007.
  11. S. Orozco-Arias, R. Tabares-Soto, D. Ceballos, and R. Guyot, “Parallel Programming in Biological Sciences, Taking Advantage of Supercomputing in Genomics,” in Advances in Computing, 2017, pp. 627–643.
  12. D. Milone, A. Azar, and H. Rufiner, “Supercomputadoras basadas en ‘clusters’ de PCs,” Rev. Cienc., pp. 173–208, 2002.
  13. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” J. Mol. Biol., vol. 215, no. 3, pp. 403–410, 1990.
  14. S. Orozco Arias et al., “Inpactor, Integrated and Parallel Analyzer and Classifier of LTR Retrotransposons and Its Application for Pineapple LTR Retrotransposons Diversity and Dynamics,” Biology (Basel), vol. 7, p. 32, 2018.
  15. B. Langmead and S. L. Salzberg, “Fast gapped-read alignment with Bowtie 2,” Nat. Methods, vol. 9, no. 4, pp. 357–359, 2012.
  16. G. Van Rossum and F. L. Drake Jr, Python reference manual. Centrum voor Wiskunde en Informatica Amsterdam, 1995.
  17. S. Schwartz et al., “Human-mouse alignments with BLASTZ,” Genome Res., vol. 13, no. 1, pp. 103–107, 2003.
  18. S. Hicks, D. A. Wheeler, S. E. Plon, and M. Kimmel, “Prediction of missense mutation functionality depends on both the algorithm and sequence alignment employed,” Hum. Mutat., vol. 32, no. 6, pp. 661–668, 2011.
  19. G. L. Johanning et al., “Expression of human endogenous retrovirus-K is strongly associated with the basal-like breast cancer phenotype,” Sci. Rep., vol. 7, no. February, pp. 1–11, 2017.
  20. S. van der Walt, S. C. Colbert, and G. Varoquaux, “The NumPy Array: A Structure for Efficient Numerical Computation,” Comput. Sci. Eng., vol. 13, no. 2, pp. 22–30, 2011.
  21. J. D. Hunter, “Matplotlib: A 2D Graphics Environment,” Comput. Sci. Eng., vol. 9, no. 3, pp. 90–95, May 2007.
  22. M. Hattori et al., “The DNA sequence of human chromosome 21 - supplement table,” Nature, vol. 405, no. May, p. 7118, 2000.
  23. M. Jette, A. Yoo, and M. Grondona, “SLURM: Simple Linux Utility for Resource Management,” in Lecture Notes in Computer Science, 2003.
  24. H. Carroll, P. Ridge, M. Clement, and Q. Snell, “Effects of gap open and gap extension penalties,” Proc. Third …, pp. 1–5, 2006.
  25. J. L. Wegrzyn et al., “Unique features of the loblolly pine (Pinus taeda L.) megagenome revealed through sequence annotation,” Genetics, vol. 196, no. 3, pp. 891–909, 2014.