/sci/ Statistics Algorithm

Screenshot 2025-0....png

🧵 Statistics Algorithm

Anonymous at Tue, 18 Mar 2025 17:33:09 UTC No. 16622510

We have 1000 kids with different clothing (shirt, pants, hat).
We have made three lists counting how many kids had a clothing piece of that color.

We now need to make the best assumption on all clothing combinations. We assume the clothing colors are completely independent of each other.
But there are two requirements:
1. We cannot guess the wrong total number of kids. It needs to be exactly 1000.
2. For each clothing category, the total number must also match. (E.g. we KNOW there are 50 kids with yellow pants, so we CANNOT guess 53).

Normal rounding doesn't do it. In the screenshot you can see that we get both wrong.
What is the right algorithm here?
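
To make the problem concrete, here is a minimal sketch of the independence estimate and of why plain rounding can break the totals. The marginal counts below are hypothetical stand-ins (only the 1000 kids and the 50 yellow pants come from the post; the screenshot's real numbers are not reproduced here), and `expected_counts` is an illustrative helper name. Under independence the expected count for a combination is shirt count × pants count × hat count / 1000².

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal counts (NOT the data from the screenshot).
# Each table sums to N = 1000; "yellow pants = 50" matches the example in the post.
N = 1000
shirts = {"red": 430, "blue": 370, "yellow": 200}
pants = {"red": 610, "blue": 340, "yellow": 50}
hats = {"red": 333, "blue": 333, "yellow": 334}


def expected_counts(shirts, pants, hats, n):
    """Exact expected count per (shirt, pants, hat) combination under independence."""
    return {
        (s, p, h): Fraction(shirts[s] * pants[p] * hats[h], n * n)
        for s, p, h in product(shirts, pants, hats)
    }


expected = expected_counts(shirts, pants, hats, N)

# The unrounded estimate reproduces the total and every marginal exactly.
print(sum(expected.values()))  # exactly N

# Rounding every cell independently gives integers, but nothing forces
# the rounded cells to add up to the known totals any more.
naive = {cell: round(value) for cell, value in expected.items()}
print("rounded total:", sum(naive.values()))
print("rounded yellow pants:",
      sum(v for (s, p, h), v in naive.items() if p == "yellow"))
```

Exact fractions are used instead of floats so that any mismatch in the rounded totals comes from the rounding itself and not from floating-point error.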

Anonymous at Tue, 18 Mar 2025 18:16:13 UTC No. 16622541

>>16622510
Where is your original raw data?

Anonymous at Tue, 18 Mar 2025 19:48:51 UTC No. 16622602

>>16622541
The three little tables on the left are the real counts. There is no way to know the exact intersections. I just want an equal distribution without getting wrong totals

Anonymous at Tue, 18 Mar 2025 21:19:31 UTC No. 16622713

>>16622602
So you want synthetic data so that the marginals and totals match? Rounding "normally" gives you 10 too many people.
Find 10 cells that you rounded up and round them down instead.
Also consider the possibility that such a synthetic set might not exist exactly.

Anonymous at Tue, 18 Mar 2025 21:53:55 UTC No. 16622771

>>16622510
>We assume the clothing colors are completely independent of each other.
Wildly invalid. Also, where is the black?

Anonymous at Tue, 18 Mar 2025 22:01:55 UTC No. 16622814

>>16622713
Yes. I just need the chance of the solution being correct to be more than 0%, which it is in this case. So I don't care whether it's realistic. If I just round down 10 values at random, then I might not include exactly the 3 yellow pants ones, again making my guess provably wrong. So I have to make sure I round down exactly 3 yellow pants cells (and the same goes for every other count from the 3 tables on the left). So it gets very complicated quickly.
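
One way to keep every count consistent, sketched under assumptions: round every cell down first, then hand out the missing kids one at a time, always to a cell whose shirt, pants and hat colors are all still short. This is a greedy heuristic, not a method named in the thread, and `round_to_marginals` plus the marginal counts are hypothetical. Because each added kid reduces the shortfall in all three tables by one, the three shortfalls stay equal, so an eligible cell exists until all of them reach zero and the result matches every total exactly. The trade-off is that a cell can occasionally end up more than one above its rounded-down value.

```python
from itertools import product


def round_to_marginals(shirts, pants, hats):
    """Integer table that matches all three marginal tables (greedy sketch)."""
    n = sum(shirts.values())
    assert n == sum(pants.values()) == sum(hats.values()) and n > 0
    denom = n * n
    # Stay in integers: expected count of a cell = numer[cell] / denom.
    numer = {
        (s, p, h): shirts[s] * pants[p] * hats[h]
        for s, p, h in product(shirts, pants, hats)
    }
    # Step 1: round every cell down.
    table = {cell: value // denom for cell, value in numer.items()}

    # Step 2: how many kids each color is still short after rounding down.
    need_s, need_p, need_h = dict(shirts), dict(pants), dict(hats)
    for (s, p, h), count in table.items():
        need_s[s] -= count
        need_p[p] -= count
        need_h[h] -= count

    # Step 3: add the missing kids one by one.
    for _ in range(n - sum(table.values())):
        eligible = (
            c for c in table
            if need_s[c[0]] > 0 and need_p[c[1]] > 0 and need_h[c[2]] > 0
        )
        # Prefer the eligible cell that is furthest below its expected value.
        best = max(eligible, key=lambda c: numer[c] - table[c] * denom)
        table[best] += 1
        need_s[best[0]] -= 1
        need_p[best[1]] -= 1
        need_h[best[2]] -= 1
    return table


# Hypothetical marginal counts (NOT the data from the screenshot).
shirts = {"red": 430, "blue": 370, "yellow": 200}
pants = {"red": 610, "blue": 340, "yellow": 50}
hats = {"red": 333, "blue": 333, "yellow": 334}

table = round_to_marginals(shirts, pants, hats)
print(sum(table.values()))
print(sum(v for (s, p, h), v in table.items() if p == "yellow"))
```

With these inputs both printed values equal the known totals (1000 kids, 50 yellow pants) by construction, whatever the individual cells turn out to be.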

🗑️ Anonymous at Thu, 20 Mar 2025 16:30:51 UTC No. 16624333

>>16622510
You will be able to get whole-number values that sum to 1000 if you use the Maximum Likelihood Estimator. Have you heard of it? In many cases it is not easy to calculate by hand, but for this question you can calculate it pretty easily. Are you in university?

Anonymous at Thu, 20 Mar 2025 16:41:24 UTC No. 16624338

>>16622510
Judging by the "yellow-yellow-yellow" line at the bottom, your pre-rounding estimate is the Maximum Likelihood Estimator. That's a pretty good estimate when n=1000. Do you really need to round it to an integer??? If you really need to, what I would do is use a computer to calculate a lot of possible rounded tables. A million calculations is easy with a computer.
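
A rough sketch of the "try a lot of candidates" idea, with assumptions stated up front: instead of rounding, each candidate is built by listing the 1000 shirts, pants and hats, shuffling the pants and hats, and pairing them up by position. Every such table has the correct totals automatically, and the search keeps the candidate closest to the expected counts in squared error. The marginal counts, the helper names `expand` and `best_of_random_tables`, and the choice of squared error are all hypothetical, not something stated in the thread.

```python
import random
from collections import Counter
from itertools import product


def expand(counts):
    """Turn {'red': 2, 'blue': 1} into ['red', 'red', 'blue']."""
    return [color for color, k in counts.items() for _ in range(k)]


def best_of_random_tables(shirts, pants, hats, tries=1000, seed=0):
    """Random search over tables whose marginals are correct by construction."""
    rng = random.Random(seed)
    n = sum(shirts.values())
    s_list, p_list, h_list = expand(shirts), expand(pants), expand(hats)
    expected = {
        (s, p, h): shirts[s] * pants[p] * hats[h] / n**2
        for s, p, h in product(shirts, pants, hats)
    }
    best, best_err = None, float("inf")
    for _ in range(tries):
        # Shuffling only re-pairs the items, so every color count is preserved.
        rng.shuffle(p_list)
        rng.shuffle(h_list)
        table = Counter(zip(s_list, p_list, h_list))
        err = sum((table[c] - e) ** 2 for c, e in expected.items())
        if err < best_err:
            best, best_err = table, err
    return best, best_err


# Hypothetical marginal counts (NOT the data from the screenshot).
shirts = {"red": 430, "blue": 370, "yellow": 200}
pants = {"red": 610, "blue": 340, "yellow": 50}
hats = {"red": 333, "blue": 333, "yellow": 334}

best, err = best_of_random_tables(shirts, pants, hats, tries=200)
print(sum(best.values()), round(err, 2))
```

Random candidates scatter around the expected values, so this is a crude search: more tries lower the error only slowly, but no candidate can ever violate the known totals.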