Data Sources. Sequences are primarily from GenBank. The genes representing chimpanzees are, in most cases, from the species Homo (Pan) troglodytes, but in a few cases are from Homo (Pan)
paniscus. A gene whose sequence suggested it was nonfunctional (i.e., a pseudogene) was discarded. In choosing which nonhuman genes to compare with a human gene and with one another, we also discarded any gene suspected of being paralogously related to the human gene, i.e., related through a last common gene ancestor that duplicated long before the most recent common species ancestor. Our aim was to compare functional coding sequences that are orthologously related; i.e., each interspecies pair traces back to a single last common gene ancestor that existed in the most recent common species ancestor. However, because transcriptional data are lacking for many of the loci, some pseudogenes and/or paralogs may have been inadvertently compared. Our dataset of inferred orthologous functional coding sequences encompasses 97 loci for both humans and chimpanzees; of these 97, 67 were available for gorilla (Gorilla gorilla), 69 for orangutan (Pongo pygmaeus), 58 for at least one Old World monkey (OWM), and 49 for mouse (Mus musculus; these loci were chosen because they were represented by at least four primate taxa). Sequences were aligned with the CLUSTAL algorithm as implemented in MACVECTOR 7.0 (Accelrys, Burlington, MA) and verified by eye. Putative orthologous sequences were first aligned gene by gene and then concatenated into a single coding "supergene" alignment of 93,045 nucleotide positions, including indels (insertions/deletions), for further analysis. Human RefSeq numbers for each locus examined and GenBank accession numbers for the nonhuman sequences are listed in Table 6, which is published as supporting information on the PNAS web site,
www.pnas.org, and at
www.genetics.wayne.edu/lgross/primates.htm. Previously unpublished sequences from the cytochrome
c locus (CYCS) were obtained by using standard PCR-based procedures and fluorescence-based automated sequencing protocols. Primer sequences and cycling conditions are presented in
Supporting Text, which is published as supporting information on the PNAS web site.
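The gene-by-gene alignment and concatenation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline (which used MACVECTOR). It assumes each locus has already been aligned. The taxon labels, locus names, and toy sequences are hypothetical. Taxa absent from a locus are padded with '?', although the text does not say how missing taxa were coded. The mouse criterion is read here as keeping mouse only at loci represented by at least four primate taxa.

```python
from typing import Dict, List, Optional

# Hypothetical input format: taxon name -> aligned sequence for one locus.
# All rows of a locus share one length; '-' marks an indel gap.
Alignment = Dict[str, str]

# Hypothetical taxon labels (assumption, not from the source).
PRIMATES = ["human", "chimpanzee", "gorilla", "orangutan", "OWM"]
OUTGROUP = "mouse"
MIN_PRIMATES_FOR_OUTGROUP = 4  # one reading of the mouse inclusion rule


def locus_length(name: str, aln: Alignment) -> int:
    """Return the aligned length of a locus, raising if rows differ in length."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError(f"locus {name}: unequal row lengths {sorted(lengths)}")
    return lengths.pop()


def concatenate(loci: Dict[str, Alignment], missing: str = "?") -> Alignment:
    """Concatenate per-locus alignments into one 'supergene' alignment.

    Taxa absent from a locus are padded with `missing` (an assumed coding),
    so every row of the supergene has the same total length. Taxon names
    not listed in PRIMATES or OUTGROUP are ignored.
    """
    taxa = PRIMATES + [OUTGROUP]
    rows: Dict[str, List[str]] = {t: [] for t in taxa}
    for name, aln in loci.items():
        length = locus_length(name, aln)
        n_primates = sum(1 for t in PRIMATES if t in aln)
        for taxon in taxa:
            seq: Optional[str] = aln.get(taxon)
            # Drop the outgroup at loci with too few primate taxa.
            if taxon == OUTGROUP and n_primates < MIN_PRIMATES_FOR_OUTGROUP:
                seq = None
            rows[taxon].append(seq if seq is not None else missing * length)
    return {t: "".join(parts) for t, parts in rows.items()}


# Toy usage with two hypothetical, already-aligned loci.
loci = {
    "locus_A": {"human": "ATG-GC", "chimpanzee": "ATG-GC", "gorilla": "ATGAGC",
                "orangutan": "ATGAGC", "mouse": "ATGAGT"},
    "locus_B": {"human": "CCA", "chimpanzee": "CCG", "mouse": "CCT"},
}
supergene = concatenate(loci)
for taxon, row in supergene.items():
    print(f"{taxon:<11} {row}")
```

Because each locus contributes its full aligned width, gap columns included, the supergene length is the sum of the per-locus alignment lengths. This is how a total such as 93,045 positions including indels arises. In the toy run, mouse is kept at locus_A (four primate taxa) but padded at locus_B (two primate taxa).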