LOS ANGELES — Scouring information available to anyone with an Internet connection, a team of genetic sleuths deduced the names of dozens of supposedly anonymous people who had their DNA analyzed for scientific and medical research.
The snooping feat, which took advantage of genealogy websites that let people compare their DNA to search for relatives, was in full compliance with federal privacy regulations. Experts said it underscored a stark reality about genetic privacy in the age of social media: Don’t count on it.
“Nobody can promise privacy,” said Mildred Cho, who heads Stanford University’s Center for Integration of Research on Genetics and Ethics and wasn’t involved with the study.
Whitehead Institute geneticist Yaniv Erlich and his team, who described their work Thursday in the journal Science, didn’t provide a complete recipe that would help others ferret out the identities of research volunteers. Nor did they divulge the names of the people they were able to unmask.
Since the first draft of the human genome was published in 2000, scientists have scrutinized its 3 billion pairs of DNA letters to try to find variants that cause disease, to understand human physiology, and to unravel the evolutionary history of our species.
Toward that end, academic efforts like the 1000 Genomes Project post complete genomes online for public use. The idea is that providing free access to the data will allow scientists to compare DNA from many people and help them discover connections between genes and traits, eventually leading to the development of personalized, targeted treatments for a wide range of disorders.
Keeping genomic data private has been a concern all along. Worries that health insurers or employers might use information about genetic health risks to drop benefits or discriminate against workers inspired the 2008 Genetic Information Nondiscrimination Act, which provides protection against abuse. Last year, the Presidential Commission for the Study of Bioethical Issues recommended a variety of additional measures to further secure genetic data.
Potentially complicating these efforts are the legions of amateur geneticists who want to learn their risk for diseases or gain clues about their ancestry. As sequencing costs have dropped, these enthusiasts have sent vials of saliva, swabs of cheek cells, circles of dried blood or other types of DNA samples to private sequencing companies. Often, they post their test results online, for the world to see.
Erlich has been interested in privacy since he worked as a professional hacker — breaking into corporate networks as a “vulnerability researcher” for a computer security company — to help support himself in college. He started planning the current research after hearing about a 15-year-old boy who had part of his genome sequenced in 2005 in order to find his biological father, a sperm donor.
The boy compared a pattern of repeating DNA letters from his Y chromosome to the corresponding patterns of men who had posted their genetic data on a genealogy website. Finding several men whose pattern matched his led him to his father’s last name. He then used other clues to make contact.
Y chromosomes correlate with surnames because both are passed directly from father to son.
Erlich said he thought the boy’s approach was “brilliant,” and he wondered if his lab could do something similar with public genome data.
He and his colleagues started by analyzing the repeat patterns of Y chromosomes in published studies of genomes whose owners were known. They used a free genealogy website to look for surname matches.
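The matching idea described above can be illustrated with a minimal Python sketch. It is not the researchers' method, which they did not fully disclose. It assumes each Y-chromosome profile is a set of repeat counts at named markers, and that a genealogy database is a list of such profiles tagged with surnames. The records, the repeat counts, and the names `DATABASE`, `compare` and `candidate_surnames` are all hypothetical, invented for illustration.

```python
from collections import Counter

# Hypothetical genealogy records: a surname plus repeat counts at a few
# Y-chromosome markers. All values are invented for illustration.
DATABASE = [
    {"surname": "Smith", "markers": {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}},
    {"surname": "Smith", "markers": {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}},
    {"surname": "Jones", "markers": {"DYS19": 15, "DYS390": 22, "DYS391": 10, "DYS393": 12}},
]


def compare(query, record):
    """Return (shared marker count, mismatch count) for two profiles."""
    shared = query.keys() & record.keys()
    differing = sum(1 for marker in shared if query[marker] != record[marker])
    return len(shared), differing


def candidate_surnames(query, database, max_mismatches=1, min_shared=3):
    """Tally surnames of records whose repeat pattern is close to the query."""
    tally = Counter()
    for entry in database:
        shared, differing = compare(query, entry["markers"])
        # Skip records with too few markers in common to compare meaningfully.
        if shared >= min_shared and differing <= max_mismatches:
            tally[entry["surname"]] += 1
    # Most frequently matching surnames first.
    return tally.most_common()


# An "anonymous" profile, also invented.
query = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
print(candidate_surnames(query, DATABASE))
```

In this toy data, both Smith records fall within one mismatch of the query, so Smith is the only candidate surname. A real search would involve many more markers and records, and a surname match alone is only a lead, not an identification.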