🧵 computers are unable to align entire human genome

Anonymous No. 16069568

it's amazing how few people know this, but computers cannot do an alignment of the entire human genome against some other similarly sized animal genome (like some lizard that would be a very distant relative of humans, or maybe a mouse, which is a much closer relative than any lizard is)

human genome is 3 gigabytes, mouses are 2.7 gigabytes (and 21 chromosomes where as human has 24)

a computer cannot align two files that are that huge

alignment means this: let's say a human has the gene sequence AAATTGGAA and a mouse has AAATCGGAA. a computer tries to align them and finds that in these parts TT has changed to TC, but otherwise the piece is identical

then it is put into this format:
AAATTGGAA
AAATCGGAA
and the computer keeps building it until it reaches the full 3 gigabytes
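
To make the two-row format above concrete, here is a minimal sketch that reports the columns where two already-aligned rows differ. The helper name `mismatch_columns` is hypothetical and not taken from any real aligner; it assumes both rows have the same length, as they do once gaps have been inserted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the 0-based columns where two already-aligned, equal-length
// rows differ. Hypothetical helper for illustration only.
std::vector<std::size_t> mismatch_columns(const std::string& a,
                                          const std::string& b) {
    assert(a.size() == b.size());  // aligned rows must have equal length
    std::vector<std::size_t> cols;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) cols.push_back(i);
    }
    return cols;
}
```

For AAATTGGAA against AAATCGGAA this reports a single differing column, the fifth letter (index 4), where T became C.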

the problem is that no computer is able to do it: they run out of memory or, if not, the task that started in 2016 is still ongoing with no end in sight.

however, the mitochondrial genome of animals is much smaller. in fact almost the entire mitochondrial genome as written text (gene sequence) fits on a full screen at 1920x1080 resolution with font size 10 (here we presume you have hit enter to end each line of text when it is about to reach notepad's width, so that a new line begins)

Anonymous No. 16069572

Human and mouse mitochondrial genomes can be compared very easily, even with computers from the year 2000 (it takes many minutes on a 2000s computer, but on a modern computer it's merely seconds)
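
For sequences this small, a full dynamic-programming alignment is feasible. Below is a minimal sketch of a global (Needleman-Wunsch style) alignment score that keeps only two rows of the table in memory. The function name and the scoring values (+1 match, -1 mismatch, -1 gap) are assumptions chosen for illustration, not the settings of any particular tool.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Global alignment score computed row by row. Only the previous and the
// current row are stored, so memory grows with b.size(), not a.size()*b.size().
// Scoring values are illustrative assumptions.
int global_alignment_score(const std::string& a, const std::string& b,
                           int match = 1, int mismatch = -1, int gap = -1) {
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    // Row 0: aligning an empty prefix of a against j letters of b costs j gaps.
    for (std::size_t j = 0; j <= b.size(); ++j) {
        prev[j] = static_cast<int>(j) * gap;
    }
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = static_cast<int>(i) * gap;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            int up = prev[j] + gap;       // gap in b
            int left = cur[j - 1] + gap;  // gap in a
            cur[j] = std::max({diag, up, left});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}
```

The human mitochondrial genome is 16,569 letters long; assuming the mouse one is of similar length, the table has roughly 2.7 x 10^8 cells, which fits the "seconds on a modern computer" estimate. This sketch returns only the score; recovering the aligned rows themselves needs a traceback or a divide-and-conquer scheme on top.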

Anonymous No. 16069575

Consequence of Man being created by God

Anonymous No. 16069576

the problem is: if we only take little parts of the genome from here and there and compare a human piece to a mouse piece, the chances are we don't get two pieces that were even doing the same thing in human and mouse. it's like comparing apples to oranges

what we actually need is computing power to compare the entirety of 3 gigabytes to the entirety of 2.7 gigabytes

the animal tree of life you see on the internet is based on mitochondrial data only, because it's the only data computers can handle

Anonymous No. 16069580

>>16069568
What if you use more than 3 + 2.7 GB of RAM?

Anonymous No. 16069607

>>16069576

>compare the entirety of 3 gigabytes to the entirety of 2.7 gigabytes

That's too many flops to multiply bucko. It'll never happen, kiddo.

Anonymous No. 16069659

>>16069580
it doesn't work, 128GB of RAM was not enough
>>16069607

maybe if someone is able to write a program in x86 ASSEMBLY that does these calculations, we may go somewhere with it

Anonymous No. 16069664

>>16069575
>>16069607

here is but a small part of the source of the program used today

float Pevo_full[]=
{
  // -    H     E     C
  1.00, 0.00, 0.00, 0.00,
  0.00, 0.94, 0.00, 0.04,
  0.00, 0.00, 0.92, 0.04,
  0.00, 0.06, 0.08, 0.92
};

//psipred accuracy for confidence values 0-9
const float p_acc[]={0.00,0.47,0.53,0.56,0.58,0.62,0.69,0.74,0.82,0.88,0.96};

/**
 * @brief
 */
void
SetBlosumMatrix(const float BlosumXX[])
{
  int a,b,n=0;
  if (v>=3) printf("Using the BLOSUM%2i matrix\n",par.matrix);
  for (a=0; a<20; ++a)
    for (pb[a]=0.0f, b=0; b<=a; ++b,++n)
      P[a][b] = BlosumXX[n];
  for (a=0; a<19; a++)
    for (b=a+1; b<20; ++b)
      P[a][b] = P[b][a];
  for (a=0; a<20; ++a) P[a][20]=P[20][a]=1.0f;
  return;
}

/////////////////////////////////////////////////////////////////////////////////////
/**
 * @brief Set (global variable) substitution matrix with derived matrices and background frequencies
 */
void
SetSubstitutionMatrix()
{
  int a,b;
  switch (par.matrix)
    {
    default:
    case 0:  //Gonnet matrix
      if (v>=3) cout<<"Using the Gonnet matrix ";
      for (a=0; a<20; ++a)
        for (pb[a]=0.0f, b=0; b<20; ++b)
          P[a][b] = 0.000001f*Gonnet[a*20+b];
      for (a=0; a<20; ++a) P[a][20]=P[20][a]=1.0f;
      break;

    case 30:  //BLOSUM30
      SetBlosumMatrix(Blosum30);
      break;
    case 40:  //BLOSUM40
      SetBlosumMatrix(Blosum40);
      break;
    case 50:  //BLOSUM50
      SetBlosumMatrix(Blosum50);
      break;
    case 65:  //BLOSUM65
      SetBlosumMatrix(Blosum65);
      break;
    case 80:  //BLOSUM80
      SetBlosumMatrix(Blosum80);
      break;
    }

Anonymous No. 16069688

>>16069568
>mouses
mice
>human has 24
23

Anonymous No. 16069700

>>16069664
>Running an O(n^2) algorithm on a 3Gb dataset
Gee why isn't it finishing.
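
A back-of-the-envelope sketch of that O(n^2) cost, using the sizes quoted in the thread (3 and 2.7 billion letters). The helper names and the rate of 10^9 table cells per second are assumptions for illustration, not measurements of any real program.

```cpp
#include <cstdint>

// Number of cells in a full dynamic-programming table for sequences of
// length n and m: one cell per pair of positions.
constexpr std::uint64_t dp_cells(std::uint64_t n, std::uint64_t m) {
    return n * m;
}

// Time to fill the table at an assumed rate of cells per second.
constexpr double seconds_needed(std::uint64_t cells, double cells_per_second) {
    return static_cast<double>(cells) / cells_per_second;
}

// Thread's figures: 3e9 x 2.7e9 letters.
constexpr std::uint64_t kCells = dp_cells(3000000000ULL, 2700000000ULL);
// Assumed rate: 1e9 cells per second.
constexpr double kSeconds = seconds_needed(kCells, 1e9);
```

That is 8.1 x 10^18 cells. At the assumed rate it comes to about 8.1 x 10^9 seconds, on the order of 250 years, and storing even one byte per cell would take about 8.1 exabytes.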

Anonymous No. 16069702

>>16069688
humans have 24 according to the Human Genome Project's database; 22 of these are non-sex chromosomes

Anonymous No. 16069703

>>16069700
there are no better programs made

I wish someone made one with x86 assembly though

that C++ program works fine when the dataset is very much smaller than 3GB

Anonymous No. 16069796

>>16069568
can't you use some kind of convolution or transform to make it smaller? The match has to be substantial anyway to avoid random matches, so why not shrink it in some way?

Anonymous No. 16069815

>>16069796
It can be shrunk considerably in length: transform the entire 3 gigabytes of A, T, C and G letters into protein code

Then it becomes a 1 gigabyte file, because every 3 letters of nucleotides (ATCG) will produce one letter of protein.

But now we have a new problem: proteins have 20 different letters, so now instead of 4 nucleotides we have to deal with 20 different letters
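
A minimal sketch of that 3-letters-to-1 conversion, using the standard genetic code written as a 64-letter string in T, C, A, G base order. The function names and the `frame` parameter (start offset 0, 1 or 2) are assumptions for illustration. `*` marks a stop codon and `X` a codon containing an unrecognized letter.

```cpp
#include <cstddef>
#include <string>

// Index of a base in the order T, C, A, G used by the codon table below;
// returns -1 for anything else (for example N).
int base_index(char c) {
    switch (c) {
        case 'T': return 0;
        case 'C': return 1;
        case 'A': return 2;
        case 'G': return 3;
        default:  return -1;
    }
}

// Translate DNA into protein letters, reading codons from offset `frame`.
// A trailing partial codon is ignored.
std::string translate(const std::string& dna, std::size_t frame = 0) {
    // Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
    static const std::string table =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    std::string protein;
    for (std::size_t i = frame; i + 3 <= dna.size(); i += 3) {
        int a = base_index(dna[i]);
        int b = base_index(dna[i + 1]);
        int c = base_index(dna[i + 2]);
        if (a < 0 || b < 0 || c < 0) {
            protein += 'X';
        } else {
            protein += table[a * 16 + b * 4 + c];
        }
    }
    return protein;
}
```

With the example from the first post, AAATTGGAA translates to KLE and AAATCGGAA to KSE, so the single T to C change shows up as one changed protein letter. Starting at offset 1 instead gives a completely different reading of the same text.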

Anonymous No. 16069818

>>16069796
also, if you are contemplating something like "let's compare the human X chromosome to the mouse X chromosome only and forget about the rest for now", we run into a problem: maybe we don't know with 100% certainty where the X chromosome begins and where it ends, or whether there are hidden pieces of code elsewhere in the genome that should belong to X, which the DNA-reading machinery of cells gets right but humans haven't figured out how to access

Anonymous No. 16069820

>>16069815
When would you know where to begin? You would have three possible offsets.

What I had in mind was that you could, let's say, convert AAAT into 3xA 0xC 0xG 1xT (or use a bigger window): the same number of bits but fewer symbols. Then perhaps shrink it even more, and search for similar patterns.
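
A minimal sketch of that idea, counting A, C, G and T in consecutive non-overlapping windows. The type and function names are hypothetical, and the window size is a free parameter (4 in the AAAT example).

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Counts of A, C, G, T (in that order) inside one window.
using Composition = std::array<int, 4>;

// Split `dna` into consecutive non-overlapping windows of `window` letters
// and count the bases in each. A trailing partial window is dropped and
// letters other than A, C, G, T are not counted.
std::vector<Composition> window_composition(const std::string& dna,
                                            std::size_t window) {
    std::vector<Composition> out;
    if (window == 0) return out;
    for (std::size_t start = 0; start + window <= dna.size(); start += window) {
        Composition c{};  // all four counts start at zero
        for (std::size_t i = start; i < start + window; ++i) {
            switch (dna[i]) {
                case 'A': ++c[0]; break;
                case 'C': ++c[1]; break;
                case 'G': ++c[2]; break;
                case 'T': ++c[3]; break;
                default: break;
            }
        }
        out.push_back(c);
    }
    return out;
}
```

Note that the counts discard the order of letters inside a window, so AAAT and TAAA become identical. That makes this a lossy summary that could only pre-filter candidate regions, not replace the alignment itself.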

Anonymous No. 16069831

>>16069820
this could work but for some reason nobody has so far made such a program

Anonymous No. 16069836

more efficient algorithms are needed
instead of just wasting computing power

Anonymous No. 16069839

>>16069836
I agree