🧵 computers are unable to align entire human genome
Anonymous at Tue, 12 Mar 2024 05:25:17 UTC No. 16069568
it's amazing how few people know this, but computers cannot align the entire human genome against another similarly sized animal genome (like some lizard, which would be a very distant relative of humans, or maybe a mouse, which is a much closer relative than any lizard is)
the human genome is 3 gigabytes, mouses are 2.7 gigabytes (and have 21 chromosomes, whereas human has 24)
a computer cannot align two files that are that huge
alignment means this: a human has a gene sequence of, let's say, AAATTGGAA and a mouse has AAATCGGAA. a computer tries to align them and finds that at this spot TT has changed to TC (a single T-to-C substitution), but otherwise the piece is identical
then it is put into the format
AAATTGGAA
AAATCGGAA
and the computer keeps building it until it reaches the full 3 gigabytes
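A minimal sketch of what "aligning" means here, using the textbook Needleman-Wunsch global alignment with made-up scores (match +1, mismatch -1, gap -2). The scores are assumptions, and this is not the program discussed later in the thread. The `|` line marks identical positions, so the T/C difference shows up as a blank.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

total, top, bottom = needleman_wunsch("AAATTGGAA", "AAATCGGAA")
print(top)
print("".join("|" if x == y else " " for x, y in zip(top, bottom)))
print(bottom)
print("score:", total)
```

The score matrix has one cell for every pair of positions, which is exactly what makes this approach blow up on whole genomes.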
the problem is that no computer is able to do it: they run out of memory, or if not, the task that started in 2016 is still ongoing with no end in sight...
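To put rough numbers on "they run out of memory": the naive dynamic-programming alignment stores one score per pair of positions, so memory grows with the product of the two lengths. A back-of-envelope sketch, assuming rounded genome sizes and an optimistic 1 byte per cell; it only covers this full-matrix approach (space-saving variants keep fewer rows but still do the same quadratic amount of work).

```python
def dp_matrix_bytes(len_a, len_b, bytes_per_cell=1):
    """Bytes needed to keep every cell of a (len_a+1) x (len_b+1) score matrix."""
    return (len_a + 1) * (len_b + 1) * bytes_per_cell

HUMAN_BP = 3.1e9  # rounded assumption for the human genome length
MOUSE_BP = 2.7e9  # rounded assumption for the mouse genome length

need = dp_matrix_bytes(HUMAN_BP, MOUSE_BP)
print(f"{need / 1e18:.1f} exabytes at 1 byte per cell")
print(f"{need / 128e9:.1e} times a 128 GB machine")
```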
however, the mitochondrial genome of animals is much smaller. in fact, almost the entire mitochondrial genome as written text (gene sequence) fits on one full screen at 1920x1080 resolution with font size 10 (assuming you hit enter to end each line of text when it's about to reach Notepad's width, so that a new line begins)
Anonymous at Tue, 12 Mar 2024 05:27:14 UTC No. 16069572
Human and mouse mitochondria can be compared very easily, even with computers from the year 2000 (it takes many minutes on a 2000s computer, but on a modern computer it's merely seconds)
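The same cell-counting arithmetic shows why the mitochondria are easy. A sketch assuming approximate mitochondrial lengths (about 16.6 kb for human, about 16.3 kb for mouse), rounded nuclear genome sizes, and a hypothetical speed of 1e8 cells per second for simple code:

```python
HUMAN_MT_BP = 16_569  # human mitochondrial reference sequence length
MOUSE_MT_BP = 16_299  # mouse mitochondrial genome length (approximate)
HUMAN_BP, MOUSE_BP = 3.1e9, 2.7e9  # nuclear genomes, rounded assumptions
ASSUMED_RATE = 1e8  # hypothetical cells per second, not a benchmark

mt_cells = HUMAN_MT_BP * MOUSE_MT_BP
nuclear_cells = HUMAN_BP * MOUSE_BP

print(f"mitochondria: {mt_cells:,} cells, ~{mt_cells / ASSUMED_RATE:.1f} s")
print(f"whole genomes: {nuclear_cells / mt_cells:.1e} times as many cells")
```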
Anonymous at Tue, 12 Mar 2024 05:31:11 UTC No. 16069575
Consequence of Man being created by God
Anonymous at Tue, 12 Mar 2024 05:32:57 UTC No. 16069576
the problem is: if we only take little parts of the genome from here and there and compare a human piece to a mouse piece, chances are we don't get two pieces that were even doing the same thing in human and mouse; it's like comparing apples to oranges
what we actually need is the computing power to compare the entirety of 3 gigabytes to the entirety of 2.7 gigabytes
the animal tree of life you see on the internet is based only on mitochondrial data, because that's the only data computers can handle
Anonymous at Tue, 12 Mar 2024 05:35:07 UTC No. 16069580
>>16069568
What if you use more than 3 + 2.7 GB of RAM?
Anonymous at Tue, 12 Mar 2024 06:14:25 UTC No. 16069607
>>16069576
>compare the entirety of 3 gigabytes to the entirety of 2.7 gigabytes
That's too many flops to multiply, bucko. It'll never happen, kiddo.
Anonymous at Tue, 12 Mar 2024 07:16:28 UTC No. 16069659
>>16069580
it doesn't work; 128GB of RAM was not enough
>>16069607
maybe if someone is able to write a program in x86 ASSEMBLY that does these calculations, we might get somewhere with it
Anonymous at Tue, 12 Mar 2024 07:24:10 UTC No. 16069664
>>16069575
>>16069607
here is but a small part of the source of the program used today
float Pevo_full[]=
{
  // -     H     E     C
    1.00, 0.00, 0.00, 0.00,
    0.00, 0.94, 0.00, 0.04,
    0.00, 0.00, 0.92, 0.04,
    0.00, 0.06, 0.08, 0.92
};
//psipred accuracy for confidence values 0-9
const float p_acc[]={0.00,0.47,0.53,0.56,0.58,0
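/**
 * @brief Fill the global substitution matrix P from a packed lower-triangular BLOSUM table
 */
void
SetBlosumMatrix(const float BlosumXX[])
{
  int a,b,n=0;
  if (v>=3) printf("Using the BLOSUM%2i matrix\n",par.matrix);
  // BlosumXX holds the lower triangle (diagonal included), row by row
  for (a=0; a<20; ++a)
    for (pb[a]=0.0f, b=0; b<=a; ++b,++n)
      P[a][b] = BlosumXX[n];
  // mirror the lower triangle into the upper triangle
  for (a=0; a<19; a++)
    for (b=a+1; b<20; ++b)
      P[a][b] = P[b][a];
  // index 20 (presumably the "any residue" symbol) is set to 1.0 in both directions
  for (a=0; a<20; ++a) P[a][20]=P[20][a]=1.0f;
  return;
}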
///////////////////////////////////
/**
 * @brief Set (global variable) substitution matrix with derived matrices and background frequencies
 */
void
SetSubstitutionMatrix()
{
  int a,b;
  switch (par.matrix)
  {
    default:  // unrecognized values fall through to the Gonnet case
    case 0:   //Gonnet matrix
      if (v>=3) cout<<"Using the Gonnet matrix ";
      // Gonnet[] appears to store its values scaled by 1e6
      for (a=0; a<20; ++a)
        for (pb[a]=0.0f, b=0; b<20; ++b)
          P[a][b] = 0.000001f*Gonnet[a*20+b];
      for (a=0; a<20; ++a) P[a][20]=P[20][a]=1.0f;
      break;
    case 30:  //BLOSUM30
      SetBlosumMatrix(Blosum30);
      break;
    case 40:  //BLOSUM40
      SetBlosumMatrix(Blosum40);
      break;
    case 50:  //BLOSUM50
      SetBlosumMatrix(Blosum50);
      break;
    case 65:  //BLOSUM65
      SetBlosumMatrix(Blosum65);
      break;
    case 80:  //BLOSUM80
      SetBlosumMatrix(Blosum80);
      break;
  }
Anonymous at Tue, 12 Mar 2024 08:10:24 UTC No. 16069688
>>16069568
>mouses
mice
>human has 24
23
Anonymous at Tue, 12 Mar 2024 08:30:19 UTC No. 16069700
>>16069664
>Running an O(n^2) algorithm on a 3Gb dataset
Gee, why isn't it finishing?
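A rough sketch of the running time of an O(n*m) alignment of the two whole genomes, assuming rounded genome lengths. The throughputs below are hypothetical placeholders, not measurements of any real program.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
CELLS = 3.1e9 * 2.7e9  # rounded human x mouse genome lengths (assumption)

def years_for(cells_per_second, cells=CELLS):
    """Years needed to fill `cells` matrix cells at the given update rate."""
    return cells / cells_per_second / SECONDS_PER_YEAR

for rate in (1e9, 1e10, 1e11):  # hypothetical cells per second
    print(f"{rate:.0e} cells/s -> {years_for(rate):,.1f} years")
```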
Anonymous at Tue, 12 Mar 2024 08:31:11 UTC No. 16069702
>>16069688
humans have 24 according to the Human Genome Project's database: 22 of these are non-sex chromosomes (autosomes), plus X and Y
Anonymous at Tue, 12 Mar 2024 08:32:34 UTC No. 16069703
>>16069700
no better programs have been made
I wish someone made one in x86 assembly though
that C++ program works fine when the dataset is much smaller than 3GB
Anonymous at Tue, 12 Mar 2024 10:00:40 UTC No. 16069796
>>16069568
can't you use some kind of convolution or transform to make it smaller? The match has to be substantial anyway to avoid random matches, so why not shrink it somehow?
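One commonly used way to exploit "the match has to be substantial" is exact k-mer seeding: index every length-k word of one sequence, look up the words of the other, and only run the expensive alignment around shared words. A minimal sketch on the thread's toy sequences with a hypothetical k of 4; real genomes would need much larger k and filtering of repeats.

```python
from collections import defaultdict

def kmer_index(seq, k):
    """Map every length-k word of seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def find_seeds(a, b, k):
    """Return (pos_in_a, pos_in_b) pairs where a and b share an exact k-mer."""
    index = kmer_index(a, k)
    seeds = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], []):
            seeds.append((i, j))
    return seeds

print(find_seeds("AAATTGGAA", "AAATCGGAA", k=4))
```

The single T/C difference splits the shared words into two seeds, one on each side of it, both on the same diagonal.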
Anonymous at Tue, 12 Mar 2024 10:18:06 UTC No. 16069815
>>16069796
It can be shrunk considerably in length: translate the entire 3 gigabytes of ATCGs into protein code
Then it becomes a 1 gigabyte file, because every 3 nucleotide letters (ATCG) produce one protein letter.
But now we have a new problem: proteins use 20 different letters (amino acids), so instead of 4 nucleotides we have to deal with 20 different letters
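A minimal sketch of that 3-to-1 translation using the standard genetic code. The output alphabet is the 20 amino acids plus `*` for stop codons, and the result depends on which of the three offsets (reading frames) you start from. Writing `X` for codons containing a non-ACGT letter such as N is an assumption of this sketch.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG; '*' = stop
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def translate(dna, frame=0):
    """Translate DNA into one-letter amino-acid codes, starting at offset `frame`."""
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if any(base not in BASES for base in codon):
            protein.append("X")  # assumption: unknown base (e.g. N) -> X
            continue
        idx = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
        protein.append(AMINO[idx])
    return "".join(protein)

for frame in range(3):
    print(frame, translate("ATGGCCTAA", frame))
```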
Anonymous at Tue, 12 Mar 2024 10:20:19 UTC No. 16069818
>>16069796
also, if you are contemplating something like "let's compare the human X chromosome to the mouse X chromosome only and forget about the rest for now", we run into the problem that maybe we don't know with 100% certainty where the X chromosome begins and ends, or whether there are hidden pieces of code elsewhere in the genome that belong to X, which the cell's DNA machinery gets right but humans haven't figured out how to access
Anonymous at Tue, 12 Mar 2024 10:28:45 UTC No. 16069820
>>16069815
When would you know where to begin? You would have three possible offsets.
What I had in mind was that you could, let's say, convert AAAT into 3xA 0xC 0xG 1xT (or bigger chunks): the same number of bits but fewer symbols. Then perhaps shrink it even more and search for similar patterns.
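A minimal sketch of that counting idea, assuming non-overlapping windows of a hypothetical size 4. It also makes two caveats visible: counts throw away order, so AAAT and TAAA become the same symbol, and a single inserted base shifts every later window boundary.

```python
from collections import Counter

def composition(seq, window):
    """Split seq into non-overlapping windows and count A, C, G, T in each."""
    counts = []
    for start in range(0, len(seq) - window + 1, window):
        c = Counter(seq[start:start + window])
        counts.append(tuple(c[base] for base in "ACGT"))
    return counts

print(composition("AAATTGGAAATC", window=4))
```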
Anonymous at Tue, 12 Mar 2024 10:44:03 UTC No. 16069831
>>16069820
this could work, but for some reason nobody has made such a program so far
Anonymous at Tue, 12 Mar 2024 10:50:39 UTC No. 16069836
more effective algorithms are needed
instead of just wasting computing power
Anonymous at Tue, 12 Mar 2024 10:52:03 UTC No. 16069839
>>16069836
I agree