I’ve noticed there are websites that let people convert files between 23andMe and Ancestry formats. Convenient, but what do they do with the data? Possibly nothing, or perhaps you’re happy to open source your DNA, which is great. But if not, the conversion is pretty simple and can be done with tools that are already available or easily installed on most computers. Here is an example using gawk, a command-line tool that processes text files line by line.

First make sure gawk is installed. On Linux, gawk is available by default on some distributions, while on others it needs to be installed. On my Mac I installed it with: brew install gawk. It appears to be available for Windows as well.
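
If you are not sure whether gawk is already there, a quick check like the following should tell you. This is just a sketch; the variable name gawk_status is my own choice for illustration.

```shell
# Report whether gawk is on the PATH, and which version if so.
if command -v gawk >/dev/null 2>&1; then
  gawk_status="installed: $(gawk --version | head -n 1)"
else
  gawk_status="not installed"
fi
echo "gawk is $gawk_status"
```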

Ancestry to 23andMe

With gawk installed, create a gawk script that takes a single line in Ancestry format and changes it to 23andMe format, ignoring initial comments and the header used by Ancestry files. Name the file anything you want. I’m saving the script in a file named ancestry_to_23andme.gawk.

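# awk strings are 1-indexed, so substr starts at position 1.
function isComment(str) {
  return substr(str, 1, 1) == "#"
}

function isHeader(str) {
  return substr(str, 1, 4) == "rsid"
}

{
  d = "\t"
  # Skip comments and the header; join allele1 and allele2 into one genotype.
  if (!isComment($0) && !isHeader($0)) {
    print $1 d $2 d $3 d $4 $5
  }
}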
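
To see what this does to individual rows, here is a minimal sketch that applies the same logic inline to a tiny made-up Ancestry sample. The sample file and its comment line are assumptions for illustration, the sketch uses awk patterns instead of the substr-based helper functions, and it falls back to plain awk if gawk is not on the PATH.

```shell
# Use gawk when available; otherwise fall back to awk (an assumption for portability).
AWK=$(command -v gawk || command -v awk)

# A tiny made-up Ancestry-style sample: a comment, the header, and two data rows.
sample=$(mktemp)
printf '#AncestryDNA raw data (sample comment)\n' > "$sample"
printf 'rsid\tchromosome\tposition\tallele1\tallele2\n' >> "$sample"
printf 'rs4682826\t3\t46705369\tA\tG\n' >> "$sample"
printf 'rs1424968\t2\t53327092\tC\tC\n' >> "$sample"

# Skip comment and header lines, then join the two allele columns into one genotype.
converted=$("$AWK" '
  /^#/ || /^rsid/ { next }
  { print $1 "\t" $2 "\t" $3 "\t" $4 $5 }
' "$sample")

printf '%s\n' "$converted"
rm -f "$sample"
```

The two data rows come out with the alleles merged (AG and CC), and the comment and header are gone.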

Now we can run gawk using the script and redirect output to a file using the greater than symbol.

gawk -f ancestry_to_23andme.gawk ancestry_file > generated_23andme_file

The generated file is now in 23andMe format.

cat generated_23andme_file
rs4682826	3	46705369	AG
...
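
A quick sanity check is to compare the number of data rows going in with the number of lines coming out; they should be equal. Here is a rough sketch. The count_data_rows helper and the stand-in files are hypothetical, there only so the example runs on its own; substitute your real input and output files.

```shell
# Count the lines of a file that are neither comments nor the Ancestry header.
count_data_rows() {
  grep -v -e '^#' -e '^rsid' "$1" | wc -l | tr -d ' '
}

# Stand-in files for illustration only; substitute your own input and output.
in_file=$(mktemp)
out_file=$(mktemp)
printf '#comment\nrsid\tchromosome\tposition\tallele1\tallele2\nrs4682826\t3\t46705369\tA\tG\n' > "$in_file"
printf 'rs4682826\t3\t46705369\tAG\n' > "$out_file"

in_rows=$(count_data_rows "$in_file")
out_rows=$(count_data_rows "$out_file")
if [ "$in_rows" -eq "$out_rows" ]; then
  check="row counts match ($in_rows)"
else
  check="row counts differ: $in_rows in, $out_rows out"
fi
echo "$check"
rm -f "$in_file" "$out_file"
```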

Now let’s convert the file back to Ancestry format.

23andMe to Ancestry

For this conversion, we’ll modify the gawk script used above so it does the conversion in reverse, and save it as 23andme_to_ancestry.gawk. We’ll ignore initial comments as before, but this time there is no header to skip.

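# awk strings are 1-indexed, so substr starts at position 1.
function isComment(str) {
  return substr(str, 1, 1) == "#"
}

{
  d = "\t"
  # Split the two-letter genotype into allele1 (char 1) and allele2 (char 2).
  if (!isComment($0)) {
    print $1 d $2 d $3 d substr($4, 1, 1) d substr($4, 2, 1)
  }
}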
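
Because awk's substr counts characters from 1, a genotype such as AG splits into substr(g, 1, 1) for the first allele and substr(g, 2, 1) for the second. Here is a minimal sketch of that split on two made-up 23andMe rows. The sample file is an assumption for illustration, and the sketch falls back to plain awk if gawk is missing.

```shell
# Use gawk when available; otherwise fall back to awk (an assumption for portability).
AWK=$(command -v gawk || command -v awk)

# Two made-up 23andMe-style rows preceded by a comment line.
sample=$(mktemp)
printf '# sample comment\n' > "$sample"
printf 'rs4682826\t3\t46705369\tAG\n' >> "$sample"
printf 'rs1424968\t2\t53327092\tCC\n' >> "$sample"

# substr is 1-indexed: position 1 is the first allele, position 2 the second.
split_rows=$("$AWK" '
  /^#/ { next }
  { print $1 "\t" $2 "\t" $3 "\t" substr($4, 1, 1) "\t" substr($4, 2, 1) }
' "$sample")

printf '%s\n' "$split_rows"
rm -f "$sample"
```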

Ancestry files use a header before the actual data. Let’s execute the gawk script using a shell script that adds the header for us. I’m calling the shell script 23andme_to_ancestry.sh.

#!/bin/bash

# Print the Ancestry header, then the converted rows.
printf 'rsid\tchromosome\tposition\tallele1\tallele2\n'
gawk -f 23andme_to_ancestry.gawk "$1"

Give yourself permission to run the script. Then run it using the name of the file to convert as an argument. As before, redirect the output to a file using the greater than symbol.

chmod u+x 23andme_to_ancestry.sh
./23andme_to_ancestry.sh generated_23andme_file > generated_ancestry_file

The generated file is back to Ancestry format.

cat generated_ancestry_file
rsid	chromosome	position	allele1	allele2
rs1424968	2	53327092	C	C
...
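
One way to gain some confidence in both directions is a round trip: convert Ancestry to 23andMe and back, then compare the result with the original. The sketch below does this with inline awk programs on a tiny made-up file, so the file contents and variable names are assumptions for illustration rather than the scripts above verbatim.

```shell
# Use gawk when available; otherwise fall back to awk (an assumption for portability).
AWK=$(command -v gawk || command -v awk)

# A made-up Ancestry-style file: header plus two data rows, no comments.
original=$(mktemp)
printf 'rsid\tchromosome\tposition\tallele1\tallele2\n' > "$original"
printf 'rs4682826\t3\t46705369\tA\tG\n' >> "$original"
printf 'rs1424968\t2\t53327092\tC\tC\n' >> "$original"

# Ancestry -> 23andMe: drop comments and header, merge the allele columns.
as_23andme=$("$AWK" '/^#/ || /^rsid/ { next } { print $1 "\t" $2 "\t" $3 "\t" $4 $5 }' "$original")

# 23andMe -> Ancestry: re-add the header, split the genotype into two columns.
round_trip=$(
  printf 'rsid\tchromosome\tposition\tallele1\tallele2\n'
  printf '%s\n' "$as_23andme" |
    "$AWK" '/^#/ { next } { print $1 "\t" $2 "\t" $3 "\t" substr($4, 1, 1) "\t" substr($4, 2, 1) }'
)

if [ "$round_trip" = "$(cat "$original")" ]; then
  result="round trip matches"
else
  result="round trip differs"
fi
echo "$result"
rm -f "$original"
```

With a real Ancestry file, strip the comment lines from the original before comparing, since the conversion drops them.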