[Image: OU.png, 368x248]

🧵 /psg/ probability and statistics general

Anonymous No. 16187525

previous thread >>16174616

This is one of the board's newest generals. Fairly high activity due to edgelords trying to be funny but instead spreading facts about the absolute state of our world.

Intro stats is fairly easy; with intermediate stats comes the programming, and we have already had several battles in the thread about which language is best. Nobody uses SAS, funnily enough; SPSS has had some people trying to joust the edgelords who are into R and C++, while the Stata children are silent as usual.

Come one, come all. State your dumb questions, /pol/tardy or not. Some fairly useful and funny math is showcased in this thread.

Anonymous No. 16187543

Stats people use C++? Since when?

Anonymous No. 16188514

>>16187525
the previous thread 404'd after a small handful of replies because it wasn't launched with a racial crime statistics pic; this thread condemned itself to a similar fate.
lrn2/psg/ fagit

Anonymous No. 16188592

>>16188514
Is that so?

🗑️ Anonymous No. 16189863

Bump

Anonymous No. 16189868

>>16189863
If you're that desperate for attention then you should have stuck to the thread guidelines outlined in >>16188514

>>16188592
Yes, as demonstrated by the fact that you had to bump your garbage thread that nobody wants to see off the ass end of page 10.

[Attachment: amari-info-geo-co....pdf]

Anonymous No. 16190158

>>16187525

Any anons working on information geometry?

Anonymous No. 16190885

>>16190158
It would be interesting to hear a bit about the applications of this.

🗑️ Anonymous No. 16190890

>>16189868
Seethe harder poltard

Anonymous No. 16190892

Can anyone redpill me on Poisson statistics?

Anonymous No. 16190893

>>16187525
Biggest lie in all of statistics is that events are independent. They're not. In casinos it's common to see streaks of 10 in roulette. If the next event is independent you'd expect 50% of the times 10 streaks are observed to continue to 11. That's not what happens.

Anonymous No. 16190985

I’m doing a course on ODEs and the Laplace transform is absolutely not motivated lol. Digging through wiki, it appears Laplace used a similar method when working with probabilities. Anyone have more info? Any decent introductory books on probability, especially ones that motivate the Laplace transform?

Anonymous No. 16192246

Bumpety

Anonymous No. 16192401

>>16190985
I'd say they have a definite place in general stochastic processes. No experience myself, just thinking aloud.

Anonymous No. 16192404

>>16187525
What would be a good, complete beginner, book for probability and statistics and one good follow-up book? Preferably with depth, just not so much as to be overwhelming for a beginner.

>>16190158
That sounds fascinating. What is it?

Anonymous No. 16192443

>>16187525
What's the difference between gamma and inverse gaussian distributions? I'm doing generalized linear mixed-effects modeling

Anonymous No. 16192808

>>16192404
I think 'Statistical Inference' by Casella and Berger is a good follow-up. You can find the text and solution manual on libgen (beware that the solution manual actually has mistakes in it).

To get the most out of the text, I think being solid in calculus would help. The examples/problems can be quite rigorous, so it may not be the best starting point.

Anonymous No. 16192812

how do I find a consulting job to make extra money as a PhD student? I am doing CS/ML. it's harder to find an internship/studentship at FAANG nowadays btw.

Anonymous No. 16192813

>>16192812
fuck, wrong thread

Anonymous No. 16192878

>>16192404
First do one variable and then multivariable calculus. Then do a beginning course in probability and stats. Then applied stats. Then a more foundational theoretical course in stats.

Anonymous No. 16192880

>>16192812
Honestly a lot of statisticians are consultants as well. I consult sometimes. Get yourself the simplest LLC you can get in your jurisdiction, do some projects that look pretty and put them up on some wordpress shit. Then start cold calling companies within your domain knowledge sphere. My domain knowledge is within accounting and economics, so I consult within those spheres, and neither accountants nor economists are very good at hardcore stats.

Anonymous No. 16193179

Anyone got data on heights of men in America by age, race, region, etc., spanning multiple decades?

Anonymous No. 16193736

>>16193179
Check https://datausa.io/

Anonymous No. 16193762

Any Gibbs sampling chads here?
Given WinBUGS/OpenBUGS is dead and ancient, what's the best route to go down, JAGS or Stan?

Anonymous No. 16193763

>>16193762
Stan is nice to use but occasionally a pain in the ass to install because of dependencies

Anonymous No. 16193813

>>16193179
https://www.healthdata.org/research-analysis/health-by-location/united-states/county-profiles

Anonymous No. 16193913

>>16192880
Good advice, thanks

Anonymous No. 16193924

>>16193913
What domain knowledge do you have?

Anonymous No. 16194101

>>16192880
Is this a side hustle or your primary income?

Anonymous No. 16194107

>>16194101
side but scalable.

Anonymous No. 16194123

>>16193762
I will second what >>16193763 said: Stan can be a bastard to install. However, it also has a very active community (https://discourse.mc-stan.org/), so chances are good you can get help/find info if you get errors. If you're using R, there are also a couple of interfaces out there that might make Stan a bit easier to use.

Anonymous No. 16194150

>>16190985
The Laplace transform of a density function gives you the moment generating function of the random variable, up to a sign flip in the argument: M_X(t) = E[e^{tX}] is the (two-sided) Laplace transform of the density evaluated at -t. The MGF is very important to certain aspects of statistical signal processing and detection theory (especially large deviations theory and sequential hypothesis testing).

Probability, Random Variables and Stochastic Processes by Papoulis is the standard engineering oriented probability book used at either the upper undergrad or beginning of grad school. Has a decent amount of coverage of the relevance of Fourier and Laplace transforms to probability theory.

Another book that's perhaps less introductory and that deals directly with the relevance of the PSD (so Fourier vs Laplace) is Bremaud's Fourier Analysis and Stochastic Processes. That one requires a good bit more analysis background to really work with though.
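
A quick numerical sanity check of the Laplace/MGF relation, as a sketch only: it assumes an exponential density with a made-up rate of 2, and uses a plain trapezoid rule (the helper name `laplace_of_density` is mine, not from any library). The MGF of an exponential is rate / (rate - t) for t < rate, and it should match the Laplace transform of the density evaluated at s = -t.

```python
from math import exp

def laplace_of_density(pdf, s, upper=60.0, steps=200_000):
    """Trapezoid-rule approximation of the integral of e^{-s x} pdf(x) over [0, upper]."""
    h = upper / steps
    total = 0.5 * (pdf(0.0) + exp(-s * upper) * pdf(upper))
    for k in range(1, steps):
        x = k * h
        total += exp(-s * x) * pdf(x)
    return total * h

rate = 2.0  # hypothetical rate parameter, chosen only for illustration

def exp_pdf(x):
    return rate * exp(-rate * x)

t = 0.5  # needs t < rate, otherwise the MGF does not exist
mgf_closed_form = rate / (rate - t)
mgf_via_laplace = laplace_of_density(exp_pdf, -t)  # note the sign flip

print(mgf_closed_form, mgf_via_laplace)
```

The same sign flip is why ODE courses and probability courses look like they are using two different tools when it is really one transform.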

Anonymous No. 16194520

I'm a bio student and I would like to master statistics. I have taken some intro statistics for biologists, but it was just a couple of lectures about the normal distribution and doing a t-test.

I would like to develop a solid background in statistics, from basics to more advanced topics. What books or online courses do you recommend?

Anonymous No. 16194554

>>16194520
Depends on how lost in the sauce you want to get and also on your math background.

There's basically four "standard texts" in increasing level of difficulty that people recommend for either upper level undergrad or first year grad students that aren't doing measure theoretic probability:
1) Probability and Statistical Inference by Tanis and Hogg
2) All of Statistics by Wasserman
3) Probability and Statistical Inference by Mukhopadhyay
4) Statistical Inference by Casella and Berger

Anonymous No. 16194562

>>16194554
>All of Statistics
can vouch for this. it was a good read. really helped me thru my PhD.

Anonymous No. 16194568

>>16194562
I'd say of the 4, All of Statistics and Casella and Berger were the most helpful for me. I'm not a statistician though, I'm an engineer. Can't comment on their usefulness for actual stats grad students.

Anonymous No. 16194569

>>16194568
I've only read number 2 out of the 4 that were listed. needed it cause I was preparing for an ML interview. got recommended by a friend.

Anonymous No. 16194721

>>16187543
Never

Anonymous No. 16194790

>>16194569
Oh, I definitely don't recommend reading all 4 of them. They cover basically the same material but at different levels of depth and slightly different emphasis.

At the point that you've gone through one of them, you probably have enough background that you can just jump right into whatever specific statistics topic you actually want to study directly.

Anonymous No. 16195730

>>16192808
Thanks.
>>16192878
Not what I asked for, but thanks for trying.

Anonymous No. 16196612

>>16193924
Industrial engineering

Anonymous No. 16197214

>>16194554
I want to get balls deep

Anonymous No. 16197904

Bump

[Image: 764923467892349238.jpg, 483x470]

Anonymous No. 16197968

Who here is reading a stats book, any stats book, daily?

[Image: 978-0-387-21718-5.jpg, 827x1241]

Anonymous No. 16197977

>>16197214
The deepest you can go is measure theoretic/analysis based statistics. This will give you a lot of ability to tie in tools from more advanced mathematics if you are careful.

Mathematical Statistics by Jun Shao is a pretty good starting point for this, but it assumes you are already fairly comfortable with analysis and measure theoretic probability to a certain degree.

Anonymous No. 16198089

How does one read a regression table? How do you determine whether a result is statistically significant? Are p-values (probability of null hypothesis) related to confidence intervals?

Anonymous No. 16198095

>>16198089
Give us an example table you would like to have interpreted

Anonymous No. 16198153

>>16197968
I am reading the daily racial crime stats.

Anonymous No. 16198182

>>16198153
It's good to stay informed. Thoughbeit that does not count.

Anonymous No. 16198227

>>16197968
I don't read stats books daily, but I have been spending some time on some intermediate probability theory on a pretty close to daily basis recently.

Anonymous No. 16198263

>>16198227
Doing what with it?

Anonymous No. 16198266

>>16198263
Reading the book and doing problems. I'm trying to get a better understanding of continuous time Markov chains.

Anonymous No. 16198270

>>16198266
Can you post the book?

Anonymous No. 16198291

>>16190158
Thank you for the good read.

[Image: 1685919603099301.jpg, 854x351]

Anonymous No. 16198352

>>16198095
That one for example

[Image: 978-3-030-40183-2.jpg, 827x1254]

Anonymous No. 16198744

>>16198270
Sorry, I thought I had mentioned it in that post. Looking back I didn't.

I'm going through this book right now. Probably on the easier side for measure theoretic probability, but covers a much wider variety of stochastic process topics than the standard recommendations like Durrett, Ash, etc.

Anonymous No. 16198806

>>16198352
The first row values are the point estimates (coefficients) and the ones in square brackets are confidence intervals (lower and upper bound). If the confidence interval contains 0, the effect is not statistically distinguishable from 0 at that confidence level. If the CI range does not contain 0, the estimate is statistically different from 0.

First column is calculated as just log income as a function of exports/area. Second column checks if colonizer effect and ln exports together have an effect. Third column checks if geography controls alter the effect of exports and colonizers.

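The p-value can be checked from a lookup table or a p-value calculator by taking the F-stat value and working out the degrees of freedom from the number of observations and predictors (for the overall F-test of a regression with k predictors plus an intercept, that's k and N-k-1).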
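
To make the CI / p-value link concrete, here is a minimal sketch using a normal approximation. The estimates and standard errors are made up for illustration (they are not read off the posted table), and `summarize` is a hypothetical helper. A 95% interval excludes 0 exactly when the two-sided p-value is below 0.05.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def summarize(estimate, std_err, level=0.95):
    """Return the (lower, upper) confidence interval and the two-sided p-value for H0: coefficient = 0."""
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2)
    lower = estimate - z_crit * std_err
    upper = estimate + z_crit * std_err
    z = estimate / std_err
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return (lower, upper), p_value

# Hypothetical coefficients: (estimate, standard error)
for est, se in [(0.5, 0.2), (0.3, 0.2)]:
    (lower, upper), p = summarize(est, se)
    verdict = "excludes 0" if lower > 0 or upper < 0 else "contains 0"
    print(f"est={est}, 95% CI=[{lower:.3f}, {upper:.3f}] ({verdict}), p={p:.4f}")
```

With small samples a t distribution would replace the normal here, which widens the interval a bit; the duality between the interval and the test stays the same.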

Anonymous No. 16198821

>>16198153
Worthless.

Anonymous No. 16198836

>>16198821
Nah they are good man. Gotta know what the darkies are up to.

Anonymous No. 16199217

>>16198806
Thank you. Somehow you managed to explain it better than my professors.

Anonymous No. 16200241

Bump

Anonymous No. 16200249

>>16198836
You literally don't. It's hilarious you should say it as you have. You sound more black than I am.

Anonymous No. 16200438

Any good resources for regression modeling?

Anonymous No. 16200503

>>16200438
The scikit-learn user guide is good, not perfect, but if you read through it you'll know scikit-learn well enough at a minimum.

https://scikit-learn.org/stable/user_guide.html

Anonymous No. 16200718

>>16200438
I would suggest 'Regression Modeling Strategies' by Frank Harrell. It's fairly approachable and covers a lot of topics (linear, logistic and ordinal regression, model validation, etc.).

Anonymous No. 16201177

>>16200249
You are a dumb liberal faggot. What are you doing on 4chan?

Anonymous No. 16201366

>>16201177
Enjoying anime because this is an anime website

Anonymous No. 16201997

>>16200438
I have to learn this too. What book did you end up choosing?

Anonymous No. 16202011

>>16201177
Cope. This is not your safe space, queer.
>>16201366
You're not me. I only rarely watch anime. I haven't seen any since Season 3 of Kimetsu no Yaiba.

Anonymous No. 16202150

>>16202011
kimetsu no what? Are you one of those darkskinned pajeet anime watchers?

Anonymous No. 16203238

>>16198744
I'll check that book out. Thanks

Anonymous No. 16203871

>>16187525
why bother learning advanced SQL, R and stats when the world is run on excel, spss and "line look positive", "p value small" and "program says confidence high"

Anonymous No. 16203884

>>16203871
You have two choices:
1. Join them and be doomed to reinvent the wheel every day
2. Do things that feel right and make your work reproducible, and build a foundation for the next generation

Anonymous No. 16204032

if you were to have, say, a 70% chance of A happening and a 30% chance of B happening, even if you have done the math that made you come to this conclusion, would it still technically boil down to guessing?

Anonymous No. 16204063

>>16204032
What do you mean? For any particular experiment (if it's properly random/stochastic) then knowing the distribution doesn't give you any ability to reliably know the outcomes. It can tell you their distribution, and you can make predictions in a statistical sense, but you can't know exactly the outcome of a probabilistic experiment without observing it.

Anonymous No. 16204109

>>16204063
was thinking about situations where there is no guarantee, you are simply just using the knowledge and experience you have to get to a % outcome. like say the weather for meteorology.

Anonymous No. 16204124

>>16204109
Then the answer to your question is yes. If you only know that P(A) = .7, P(B) = .3 and P(A or B) = 1, then you can't know for certain which of the two will happen until it happens.
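
A small simulation of that point, as a sketch with an arbitrary seed: knowing P(A) = .7 pins down the long-run frequency very well, and always guessing A is the best single guess, but it is still wrong about 30% of the time.

```python
import random

rng = random.Random(42)  # arbitrary seed, only for reproducibility

# Draw 10,000 independent outcomes with P(A) = 0.7 and P(B) = 0.3
draws = rng.choices(["A", "B"], weights=[0.7, 0.3], k=10_000)

freq_a = draws.count("A") / len(draws)
print(f"long-run frequency of A: {freq_a:.3f}")
print(f"accuracy of always guessing A: {freq_a:.3f}")
print("first ten outcomes:", "".join(draws[:10]))
```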

Anonymous No. 16204137

>>16204124
thanks anon

Anonymous No. 16205941

Bump

Anonymous No. 16206763

Give me a quick rundown on ridgeregressions plox.

Anonymous No. 16207149

>>16206763
There's a few ways you can think about ridge regression.

The most straightforward way (and the way it was originally developed) is that ridge regression imposes an l2 norm constraint on your beta. You're minimizing the mean-square-error subject to your beta being within/on (depending on the setup) some sphere centered around the origin.

Another way of thinking about ridge regression is the Bayesian interpretation. Ridge regression imposes a Gaussian prior on beta.
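
A minimal pure-Python sketch of the shrinkage, under assumptions of my own: simulated data with two predictors, no intercept, and the penalized closed form (X'X + lambda*I)^-1 X'y. The helper names `solve` and `ridge` are hypothetical. As lambda grows, the L2 norm of the fitted beta shrinks toward zero, which is the "sphere around the origin" picture.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ridge(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^-1 X'y, no intercept."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def l2_norm(beta):
    return sum(b * b for b in beta) ** 0.5

# Simulated data; the "true" coefficients are made up for illustration.
rng = random.Random(0)
true_beta = [2.0, -1.0]
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 0.5) for row in X]

lambdas = (0.0, 10.0, 100.0, 1000.0)
norms = [l2_norm(ridge(X, y, lam)) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:7.1f}  ||beta||_2={norm:.3f}")
```

lambda = 0 recovers ordinary least squares; in practice lambda is picked by cross-validation rather than by hand.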

Anonymous No. 16207320

>>16207149
I always looked at it as an applied Lagrange multiplier for statistics and regressions. That it's more of an optimization thing than an error minimizer.

Anonymous No. 16208186

Is anyone here studying probability / statistics on a daily basis?

Anonymous No. 16208223

>>16207320
You can definitely look at it that way. In the literal sense ridge regression is a constraint on the L2 norm of the parameter that your objective function is applied to (an inequality constraint, which holds with equality whenever it is actually shrinking the estimate).

If your objective function is a linear least squares, that's the same thing as maximizing the posterior distribution of your parameter given the data with a Gaussian likelihood function on the data given the parameter and a Gaussian prior on the parameter.
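
Spelled out as a short derivation, with notation of my own choosing (sigma^2 for the noise variance, tau^2 for the prior variance, lambda for the penalty weight):

```latex
\begin{aligned}
&y \mid \beta \sim \mathcal{N}(X\beta,\ \sigma^2 I), \qquad \beta \sim \mathcal{N}(0,\ \tau^2 I) \\
-\log p(\beta \mid y) &= \frac{1}{2\sigma^2}\,\lVert y - X\beta \rVert_2^2
  + \frac{1}{2\tau^2}\,\lVert \beta \rVert_2^2 + \text{const} \\
\hat\beta_{\mathrm{MAP}} &= \arg\min_{\beta}\ \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2,
  \qquad \lambda = \sigma^2 / \tau^2 \\
&= (X^\top X + \lambda I)^{-1} X^\top y
\end{aligned}
```

So a tight prior (small tau^2) is the same thing as a heavy penalty (large lambda), and the penalized form is the Lagrangian of the norm-constrained least squares problem.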

It works out to be tomato tomahto.

Anonymous No. 16208655

>>16208223
Thanks anon. You make me like this thread.

Anonymous No. 16209340

>>16208655
Nice, this is a nice thread

Anonymous No. 16210217

Tell me about the p value, what does it actually mean?

Anonymous No. 16210261

>>16210217
Probability of false alarm. It's basically the probability that data or a test statistic at least as extreme as what you are observing could have happened randomly by chance even though the effect you're testing for isn't there (i.e., the null is true).

Anonymous No. 16210275

>>16210217
Assuming the null is true, the probability that one obtains results at least as extreme as what was observed.

This is a nice read about p-values: https://www.fharrell.com/post/pval-litany/#:~:text=A%20p%2Dvalue%20is%20the,the%20effect%20of%20a%20variable.
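
A worked toy example of that definition, as a sketch: the null is "the coin is fair", the observation is a made-up 60 heads in 100 flips, and the exact two-sided p-value adds up the probability of every outcome at least as far from 50 as the one observed. The function name is hypothetical.

```python
from math import comb

def binom_two_sided_p(heads, n):
    """Exact two-sided p-value for H0: fair coin, using symmetry around n/2."""
    deviation = abs(heads - n / 2)
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= deviation)
    return extreme / 2 ** n

p = binom_two_sided_p(60, 100)
print(f"p-value for 60 heads in 100 flips: {p:.4f}")
```

Note what it is not: it is not the probability that the coin is fair given the data.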

Anonymous No. 16211385

>>16209340
Yes, a very nice thread.

Anonymous No. 16211928

>>16208655
>>16209340
>>16211385
reading the first few chapters of the Deep Learning book by Yoshua Bengio's group would've given you this exact information. the fact that you guys are excited by this tells me you guys are either undergrads or code monkeys who are ML wannabes.

Anonymous No. 16211946

>>16211928
So what if they are undergrads? I don't understand your point. Yes, it's not particularly novel information if you are someone who has spent years doing Bayesian ML/Bayesian statistics, but it takes some time to see the connections between these frequentist regularization methods and the Bayesian MAP formulation of said regularization.

Anonymous No. 16212502

>>16211928
Post pic of hand and it will be brown with CI of 95.

Anonymous No. 16213285

>>16211946
Elitism is good, but it should be with a firm and happy hand. Not with a dull depressed heavy hand.

Anonymous No. 16217068

What is the most difficult branch in statistics?

Anonymous No. 16217241

>>16217068
In what way do you mean difficult? Do you mean mathematically difficult or do you mean practically difficult?

Anonymous No. 16219240

>>16217241
Mathematically difficult

Anonymous No. 16220154

>>16219240
I guess that depends on what you find difficult. Generally statistics gets mathematically complicated when the probability theory gets complicated.

Many people find measure theoretic statistics fairly difficult, and this will propagate throughout all of the related fields (performance analysis and large deviations theory, sequential analysis, information theoretic statistics, etc.) with this formulation.

Anonymous No. 16221523

>>16211928
You're on 4chan, what did you expect?

Anonymous No. 16222603

Statistics is not only useful. It's fun as well. I love to do PDEs on stats problems.

Anonymous No. 16222679

>>16222603
>Statistics is fun
LOL seriously? You like anal (receiving)?
>PDE is fun
Hell yeah it is

Anonymous No. 16223170

>>16222679
classic shitpost. Now go to another thread for retards.

Anonymous No. 16223279

so when are you fags going to prove the theory of probability?

Anonymous No. 16223331

>>16223279
lol lmao even

Anonymous No. 16223393

>>16187543
They do, IF they're also computational mathematicians. The stats universities that are actually trying to push forward new or novel techniques use C++ and then make interfaces with R (because they know the applied community all uses R).

Take the INLA project as an example. And that's just something actively in development.

Anonymous No. 16223446

Why is p-hacking bad? Isn't it literally just what happens as you collect more data regardless of the problem?

From a frequentist standpoint, your interval widths and p-values go to zero as more data is collected simply because we are working from the interpretation of constant coefficients in our models. Statistical significance is great and all, but it's not a measure of importance or impact, just 'hey, this interval doesn't overlap with hypothesis X or other coefficient Y'.

I don't really understand the p-hacking problem whatsoever, basically. Especially when combined with any sort of validation technique or with any follow-on operational type question (a statistically significant difference doesn't mean an impactful difference: $1 is very statistically significantly different from $1.01, but that doesn't actually matter in the majority of contexts).

Anonymous No. 16223471

>>16223446
From my understanding, the problem with p-hacking is that you are collecting a biased sample set. It isn't just that you are collecting more data, it is that you are collecting more data under a specific subset which is more likely to show significance (e.g., data that is tailed or skewed towards the extreme cases of the alternative).

It's a case of biased sample selection (or potentially pruning of negative outliers which would make your test statistics more centrally located).
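
One concrete version of "just collect more data" that does count as p-hacking is optional stopping: testing after every batch and stopping the moment p < 0.05. The sketch below is a simulation under assumptions I picked (null is true, normal data with known sd = 1, batches of 10 up to n = 200, arbitrary seed); it is not the only form of p-hacking, just the one closest to the question.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def p_value(sample):
    """Two-sided z-test of H0: mean = 0, assuming known sd = 1."""
    z = sum(sample) / len(sample) ** 0.5
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def significant(rng, peek, batch=10, max_n=200):
    """Run one experiment on pure-noise data; return True if it ends up 'significant'."""
    data = []
    while len(data) < max_n:
        data.extend(rng.gauss(0, 1) for _ in range(batch))
        if peek and p_value(data) < 0.05:
            return True           # stop early as soon as the test looks good
    return p_value(data) < 0.05   # single test at the planned sample size

def false_positive_rate(peek, trials=2000, seed=1):
    rng = random.Random(seed)
    return sum(significant(rng, peek) for _ in range(trials)) / trials

fixed = false_positive_rate(peek=False)
peeking = false_positive_rate(peek=True)
print(f"fixed sample size: {fixed:.3f}")
print(f"peek after every batch: {peeking:.3f}")
```

There is no effect at all in this data, so every "significant" result is a false positive. The fixed design stays near the nominal 5%, while peeking after every batch lands well above it.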

Anonymous No. 16224049

>>16223279
It's more of a question of how long before the theory can be proven with 100 percent accuracy. Any day now, I'm sure..

Anonymous No. 16225382

>>16223279
cope from brainlet

Anonymous No. 16225436

>>16223446
P-hacking implies that you already have decided beforehand what the end result is instead of accepting the data as it is

Anonymous No. 16226195

>>16224049
two more weeks right?

Anonymous No. 16226420

Do any unis teach a completely unbiased course on race statistics?

Anonymous No. 16226493

>>16226420
No. The same way that there are no colleges that teach entirely unbiased courses on any other highly controversial subject where there's still open research questions.

Anonymous No. 16227387

>>16226420
lol god no. If you want to learn the real stuff, you have to learn it yourself. Start with the bell curve. Maybe the closest would be some analysis course on applied criminology at Quantico where they teach how the world works to federales.

Anonymous No. 16228727

>>16226420
There's one prestigious uni called /pol/, you can complete a whole degree on racial statistics there

Anonymous No. 16229173

>>16228727
kek

Anonymous No. 16230012

are random variables a group under convolution?

Anonymous No. 16230927

>>16230012
Define random

Anonymous No. 16230928

your vanity thread is on page 10 again, better bump it quick

Anonymous No. 16231089

>>16230928
lmao

Anonymous No. 16231108

>>16230927
a function from the sample space to a subset of the reals (or real space)

[Image: 0003.png, 618x559]

Anonymous No. 16232093

I LOVE <3 non parametric stats <3

Anonymous No. 16233286

>>16232093
why?

Anonymous No. 16234287

>>16233286
Fuck normal distributions
Fuck means
Fuck SD

Anonymous No. 16234955

My PI forces me to use Matlab for all the analyses and statistics. It's surprisingly comfy but disgusting at the same time.

Anonymous No. 16235418

>>16234955
You work in some kind of weird finance department?

Anonymous No. 16235478

>>16235418
He probably works for the based department. Matlab is based as fuck. T. Statistical signal processing engineer.

Anonymous No. 16236011

>>16235418
Applied physics

Anonymous No. 16236346

>>16236011
Continue using it. Since you are in the field that actually uses it as a standard.
>>16235478
You my dear sir, are an idiot.

Anonymous No. 16236455

>>16236346
I may be a retard but I'm a based retard who uses a software environment that easily handles constrained optimization of nonlinear objective functions.

bodhi No. 16236499

>>16187525
good thread OP

Anonymous No. 16237433

Redpill me on gamma distributions

[Image: misspelling.png, 750x1050]

Anonymous No. 16237819

Was over in another board and got suggested to post here.

Problem:
I'm doing data analysis for a refrigeration-based dehumidification product for a company. Sometimes it goes through QC no problem. Sometimes it has a lot of issues. I want to find out why.

What I've done so far:
I've been able to collate the following data (*):
1-Testing chart data for each product
2-Order form data for each product
3-BOM data for each product
(4-I'm working on getting job routing data for each product atm, as someone else in the other thread suggested to me).
Using 1, I can look at the number of failed charts to get a list of 'good' and 'bad' products.
Using 2, I can filter the previous list to only look at the dehum products.
Once I do this, I have a sample size of maybe 500 (the company is not high-volume, they make niche, custom products).
I've run the following statistical tests:
-Script to do brute force ANOVAs of components in BOM v. good/bad end-products. This only identified outlier products' materials. For example, it suggested things like, "The shipping crate used in the outlier is suspect." In general, I got a lot of "Pirates cause global warming" noise.
-Because of the previous results, I made all the data binary (good=1,bad=0,part in BOM=1,part not in BOM=0) and did Fisher exact testing. This only identified 'obvious' parts. Things like, "Yes, all compressors would be suspect, of fucking course, that's how refrigeration works." It didn't narrow anything down.
-I tried running correlations on some relevant variables (e.g., amount of refrigerant in product v. failed test numbers), and I just get noise.
There's a chance I missed something in these two previous tests, because there was a lot of noise to go through.
-Because of the small sample size (500), I feel I'm limited to single-variable analyses.
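
For reference, the binary part-vs-outcome test described above can be sketched in pure Python as a two-sided Fisher exact test on a 2x2 table. The counts below are invented for illustration (they are not the company's data) and the function name is hypothetical. With one test per BOM part, the p-values also need a multiple-comparison correction (e.g., Bonferroni), otherwise some parts will look suspect by chance alone.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum over all tables that are no more likely than the observed one
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical counts:   bad  good
# part in BOM             30    70
# part not in BOM         60   340
p = fisher_exact_two_sided(30, 70, 60, 340)
n_parts_tested = 50  # made-up number of BOM parts screened
print(f"raw p = {p:.5f}, Bonferroni-adjusted p = {min(1.0, p * n_parts_tested):.5f}")
```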

Can anyone think of anything else I should try?

(*) An aside vent: just getting this data collated, accessible, and cross-referenced was a PIA.

[Image: 047072210X.jpg, 300x469]

Anonymous No. 16238208

>>16237819
At the end of picrel they go into something similar for VW.

Anonymous No. 16239210

Do any projects graph how much the human genome has changed by year?

Anonymous No. 16239235

>>16237819
You should be looking at processes not data. Just 6M: machine, man, materials, measurements, methods, and mother nature. Process failure must exist in one of these categories.
As a data analysis guy, just pareto it and list which problems are the worst and have them explore those.
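
A Pareto breakdown is only a few lines. This is a sketch with an invented failure log (the counts per 6M category are made up, and `pareto` is a hypothetical helper): count failures per category, sort descending, and track the cumulative share so the top few causes stand out.

```python
from collections import Counter

# Hypothetical QC failure log: one entry per failed test, tagged with a 6M category
failures = (["materials"] * 42 + ["methods"] * 27 + ["machine"] * 14 +
            ["man"] * 9 + ["measurements"] * 5 + ["mother nature"] * 3)

def pareto(events):
    """Return (category, count, cumulative percent) rows, most frequent first."""
    counts = Counter(events).most_common()
    total = sum(count for _, count in counts)
    rows, running = [], 0
    for category, count in counts:
        running += count
        rows.append((category, count, 100 * running / total))
    return rows

table = pareto(failures)
for category, count, cumulative in table:
    print(f"{category:14s} {count:3d}  cumulative {cumulative:5.1f}%")
```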

Anonymous No. 16240236

>>16190158
Thanks for the book.

Anonymous No. 16240457

>>16198089
The absolute value of the test statistic (the t-stat) is larger (preferably much larger) than 2, and the p-values are close to zero.

Anonymous No. 16241236

Where can I, a noob, just ok in maths, start learning about stats?

Anonymous No. 16241615

>>16241236
Download a textbook with open datasets that you can easily get from the publisher's website. Start going through the problems one by one until you've gone through the entire book. Easy peasy lemon squeezy.

Anonymous No. 16242340

>>16241615
which books have these open datasets?

Anonymous No. 16243470

>>16242340
Not exactly a straightforward stats book, but Probabilistic Machine Learning by Kevin Murphy is free, has figures and python code on his GitHub and does have some statistics coverage. Introduction to Statistical Learning also has some code and data available.

Anonymous No. 16244215

What's the point of the characteristic function again? It doesn't add any insight to the study of a probability distribution, unlike the MGF. So why does it even exist?

Anonymous No. 16244291

>>16243470
Nice, thanks

Anonymous No. 16244367

>>16244215
There's a few uses for characteristic functions, especially for sampling distributions and frequency analysis for continuous time Markov chains.

In general though, an MGF is more useful if it's available, however not every probability density function has an MGF (while every probability density has a well defined characteristic function).
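
The standard Cauchy is the usual example: E[e^{tX}] is infinite for every t != 0, so there is no MGF, but the characteristic function exists and equals e^{-|t|}. A rough empirical check, as a sketch with an arbitrary seed and sample size, sampling the Cauchy by inverse CDF:

```python
import cmath
import math
import random

rng = random.Random(0)
# Standard Cauchy samples via the inverse CDF: tan(pi * (U - 1/2))
samples = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(100_000)]

def empirical_cf(xs, t):
    """Sample average of e^{i t x}; every term has modulus 1, so it is always bounded."""
    return sum(cmath.exp(1j * t * x) for x in xs) / len(xs)

t = 1.0
ecf = empirical_cf(samples, t)
print(f"empirical CF at t={t}: {ecf.real:.3f} + {ecf.imag:.3f}i")
print(f"theoretical e^(-|t|): {math.exp(-abs(t)):.3f}")
```

The boundedness of e^{itX} is the whole reason the characteristic function is always defined, heavy tails or not.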

Anonymous No. 16245515

>>16234287
Baste

Anonymous No. 16245528

>>16244215
>They dont add any insights to the study of a probability distribution
read harder

Anonymous No. 16245643

>>16236455
>easily handles constrained optimization of nonlinear objective functions
you can always code your own in C++, fag. it's not that hard.

Anonymous No. 16245660

>>16187525
we were taught R at university but now I mostly use Python.

Anonymous No. 16245662

>>16245643
> You should reinvent the wheel using older tools because I don't like you using better tools that others have made.

MATLAB is literally a professionally maintained system designed to be effective at solving these optimization problems. I could implement everything from scratch in assembly too, but it would be stupid to do so when others have spent their life's work building tools to do it for me.

Anonymous No. 16245690

>>16245662
>MATLAB is literally a professionally maintained system
but then you're stuck with Matlab, faggot. it's a horrible language.

Anonymous No. 16245695

>>16245690
> But then you're stuck with MATLAB, the industry standard for solving the exact problems MATLAB excels in.

You might as well say that researchers who study Neural Networks architectures are "stuck with Python."

You don't have to like MATLAB. It's not perfect and it's expensive, but it's not an accident that it's the industry standard in many fields of physics, engineering and optimization. There's nothing that MATLAB does that you couldn't do in some other general purpose language, but you'd likely have to make from scratch tools that MATLAB already handles natively in C.

[Image: yasu.png, 712x697]

Anonymous No. 16245856

Matlab = SHIT TIER
Python = MEH TIER
R = GOD TIER

prove me wrong faggits

Anonymous No. 16245882

>>16245856
All three of them are good choices for a general purpose statistics/data analysis language with each having certain things they excel at.

R is fantastic if you are working on theoretical statistics or looking to pull from the (many many) open data libraries from the natural sciences. A lot of the cutting edge of mathematical statistics work gets done in R and that's not an accident.

Python is flexible beyond either of the other two and provides unparalleled support for machine learning/adaptive statistics. If you are doing anything at all involving Neural Networks, decision trees or HMM's Python offers quite a lot to you.

MATLAB is the absolute king of matrix based scientific computing. It's literally what the name stands for, "matrix laboratory." If you are doing work that involves a lot of linear algebra (e.g., non-linear programming based statistics, Bayesian optimization or Kalman filtering/target tracking, adaptive linear filtering or stochastic control, etc.) you basically can't beat what MATLAB has to offer. Python is finally starting to see some decent target tracking support with the work being done by the developers of the Stone Soup library, but if you work in anything at all with radar/sonar/lidar/gps etc. you basically can't avoid Matlab.

Honorable mentions go to Julia for their efforts into scientific computing and emphasis on parallelization. Julia is also a great option to learn (but it's still pretty new so don't be surprised if it's not as well supported as the others).

Anonymous No. 16245924

>>16245882
I was shitpoasting, but I do appreciate your god tier poasts on radar, sonar and applied stats. So when I am shitting on matlab, I am not shitting on you. So that is clear. I am shitting on the universities who are cheap fucks and cannot re-tool their shit to make their students better suited for the market place.

Anonymous No. 16245926

>>16245882
Julia seems like fun, but very niche.

[Image: F.png, 256x256]

Anonymous No. 16245967

>>16245882
>MATLAB is the absolute king of matrix based scientific computing
*ahem*

Anonymous No. 16245982

>>16245967
Do people actually still use Fortran? I know a lot of the old gods of the field still reach to Fortran, but I've never met anyone under 70 who uses it on a regular basis.

Anonymous No. 16246016

>>16245982
They do, you can actually get pretty spicy jobs if you have 10 years plus exp with Fortran.

[Image: Fortran.png, 566x149]

Anonymous No. 16246019

>>16246016
500+ jobs with Fortran? what the fuckkk?

Anonymous No. 16246048

>>16246016
That's good to know! The only thing in my world that still is actively maintained in Fortran is the official OA Labs engine for Bellhop/Kraken for underwater acoustic ray tracing. It's neat to hear that people are still actively using Fortran for real development in the year of our Lord 2024. Makes me feel less old.

Anonymous No. 16247103

>>16245967
I've wanted to learn Fortran for a while but never bothered

Anonymous No. 16247117

>>16190893
Hot hand fallacy
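
For reference, the independence claim disputed in >>16190893 can be checked with a quick simulation. This is only a sketch: it assumes a fair 50/50 red/black bet (real wheels have zero pockets, so the true chance is slightly below 50%), and the function name and parameters are made up for illustration. It counts how often a streak of exactly 10 identical outcomes extends to 11.

```python
import random

def streak_continuation_rate(n_flips=1_000_000, streak=10, seed=0):
    """Fraction of streaks of exactly `streak` equal outcomes that extend by one more."""
    rng = random.Random(seed)
    previous = None      # outcome of the current run
    run_length = 0       # length of the current run before this flip
    seen = 0             # number of times a run of exactly `streak` was observed
    continued = 0        # number of those runs that the next flip extended
    for _ in range(n_flips):
        outcome = rng.random() < 0.5   # assumed fair, independent flips
        if run_length == streak:
            seen += 1
            if outcome == previous:
                continued += 1
        if outcome == previous:
            run_length += 1
        else:
            previous = outcome
            run_length = 1
    return continued / seen, seen

rate, seen = streak_continuation_rate()
print(f"streaks of 10 observed: {seen}, fraction continuing to 11: {rate:.3f}")
```

Under independence the printed fraction should hover around one half, up to sampling noise from the limited number of observed streaks.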

Anonymous No. 16247238

>>16245982
About 70% of all HPC code is Fortran. It's absolutely entrenched and it will never change. And this 70% figure comes after decades of people attempting to force a change to C/C++ as the standard. We've also got CUDA Fortran now.
I'm in my 20s and picked up Fortran and I actually enjoy using it because of how simple and clear it is. Very easy to learn, and modern Fortran is not the abomination it once was with GOTO statements everywhere. It's also unbeatable when it comes to parallel computing.
>>16246016
I got a temporary job in my old department as an undergrad entirely because I was the only one who bothered to learn Fortran. They had an old codebase that needed to be looked at and for some reason nobody else wanted to work in Fortran because people were convinced it was obsolete, therefore no takers for the position, but it turns out stuff shouldn't be ignored just because it's old.
The hard part about Fortran is actually that it's normally written for very specialised purposes, so the trick is a lot of code you're going to read is likely going to require a relatively large amount of additional knowledge to understand properly. A lot of the time, a boomer will have written a numerical solver and not bothered to explain why an equation is there, or what it's doing. If they also spam GOTO a lot, then good luck.

Anonymous No. 16247262

>>16246019
where? there are 4 in my entire shithole country

Anonymous No. 16247462

>>16192812
>PhD student
It will be hard. I earned my PhD in 2020, now run a data science/ml/stats team (it's an interesting hybrid team internal to a big company), and all of our consultants tend to have PhDs and experience. The sort of "natural" lead in to being a consultant is to work in the space for a while, get to know all kinds of people while working with clients, and eventually just start your own consultancy agency with the known contacts as your primary customers. It's very relationship driven. Without known contacts and without a PhD and experience, it will be difficult, but I guess not impossible; you'll just have to select smaller jobs/smaller companies and undercharge.
> it's harder to find internship/studentship nowaday in FAANG
Biggest tip to CS peeps: Fuck FAANG, it's the worst option. There are about 10,000 new startups, especially in biotech, who need CS people. They tend to have a harder time finding people because they aren't very well advertised. While my peers were doing 5 rounds of interviews at FAANG and not getting internships, I found a super local biotech which had 0 SEO by googling the area, and messaged them. They essentially hired me right away as an intern, and then hired me for real about 2 months later. It was a startup with 5 people and they just really sucked at advertising; googling their name, they didn't even come up. It sounds "and then everyone clapped", but I also found my second job the same way.

Anonymous No. 16247482

>>16245856
R is god tier, I love it.
Python gets a bad rap but honestly the amount of mature libraries make it my preferred tool. I've only ever needed to write a couple of functions in Rust for speedup, but Python is generally a glorified C wrapper so is plenty fast.
I generally do all of my processing in Python and then export to R for fancy stats and for plotting (ggplot is still absolute god-tier for plotting, fuck matplotlib although seaborn is okay).
>>16245882
>MATLAB
absolutely fuck matlab, I used it for my whole PhD. It has way too many data types, any and all useful functional toolboxes become obsolete after about a year because they actively change EVERY useful function (removing them, merging them, changing them completely) and have no concept of stable, reusable code, and the stupid ass 2x year A and B release is just nonsense. No one can use your code unless they buy matlab (or you use their executable export BS but that's a mess).
I dislike everything about matlab. I used it for some of the things you say its good for (kalman filters for noisy object tracking) which it was great at the time for image processing, but I would rather implement kalman filters from scratch than use their implementation which I KNOW will change and break my code in 2 years.
I tried to run an app process I wrote in 2018B in 2020A, and it didn't work because half the functions no longer existed. Not that I could check now because I refuse to pay for it.
Fuck MATLAB.

Anonymous No. 16247488

>>16247462
>especially in biotech
everyone hates bio for a reason. those are the worst companies to work in. low pay, low equity, toxic morons ordering you around. they don't know much about CS so they sometimes ask for outrageous shit that only companies like Google barely have the capabilities to execute.
also, most biotechs go bankrupt because of failing FDA or are just some scamming scheme to siphon money from investors anyway, so expect your equity portion to have a 90% chance of being worthless.

Anonymous No. 16247502

>>16247482
Wtf are you on about. Matlab is extremely backwards compatible. And if they change anything, they give you deprecation warnings.

Python is the one that breaks shit constantly.

Anonymous No. 16247547

>>16247502
>Matlab is extremely backwards compatible
Matlab specifically keeps every version as separate entities because they make changes to their toolboxes constantly. I don't know what to tell you other than, using their image toolbox from ~2016-2020, half of the functions were merged or removed. My code literally doesn't work between versions because they changed the toolbox so much. There's not much I can say other than that.
Base matlab may be more stable, but it then just becomes a neutered language if you decide to ignore the toolboxes.
>>16247502
>Python is the one that breaks shit constantly
I don't find this to be the case, but maybe it's because everyone uses virtual environments to self-contain projects and versions. For free. Without downloading a whole separate multi-gigabyte "version" of the language.

🗑️ Barkon No. 16247558

>>16247547
Asom is the name of the world. It is a somna word that means wordfulness, and Arn is the word of saying a word. I have a duality next to my heart because under neglect of high forces not knowing there was a bear spider behind this machine I am, and missing my presence, and I did spice and died the first death and gained an eternal small fear which led to the mother missing me too, so a duality spawned. There is nothing I can do for the enemy has capitalized on this, making me slip off at will and I can't do anything, I can hardly think. Someone's done something illegal and created this universe

Anonymous No. 16247564

>>16247547
>Without downloading a whole separate multi-gigabyte "version" of the language.
Yeah, just download and maintain 10 versions of python and 20 versions of every python package on your computer

Anonymous No. 16248534

>>16247564
Isn't python pretty backwards compatible? At least within the different versions, like 2.0, 3.0 etc.

Anonymous No. 16248998

>>16248534
Python itself is ok, but the packages break compatibility with every minor update

Anonymous No. 16249239

>>16248998
>packages break compatibility with every minor update
that's the problem with the packages, not python tho. even tho I think python authorities should enforce some kind of standard on backward compatibility of the 3rd party packages. worst yet I've seen is when a package is no longer maintained, its older compiled binaries cease to exist on some corpo servers and your environment installation no longer works or you have to compile the binaries from source, which can take a day just because random crap breaks.

Anonymous No. 16249792

>>16192812
Send emails to every local business. Eventually someone will respond positively

Anonymous No. 16250290

>>16187525
I came up with an interesting replacement for t-tests recently, and I want to share it. Basically, the exact way to get the p-value is to get the number of permutations where the difference in means is greater than or equal to the difference seen in the experiment, and then divide that by the total number of permutations. This is called a permutation test, but it's usually too expensive to compute, so people use t-tests as an approximation. What I've realized is that since computers are so powerful nowadays, you can just approximate the permutation test with monte carlo simulations, which avoids the headache of checking if your data meets the assumptions of a t-test.
>>16197968
Been trying to, but I've gotten lazy recently. Going to get back into it because of this comment.
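
A minimal sketch of the Monte Carlo approximation described above, assuming a two-sample comparison of means with a two-sided alternative. The function name, the resample count and the data are hypothetical and only there for illustration; they are not from the post.

```python
import random

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Monte Carlo estimate of the two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)                      # random relabelling of the pooled data
        left, right = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed - 1e-12:             # small tolerance for float rounding
            hits += 1
    # The +1 counts the observed labelling itself, so the estimate is never exactly 0.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical measurements, made up for illustration only.
treatment = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
control = [11.0, 12.2, 11.8, 12.9, 11.5, 12.4]
p_value = permutation_test(treatment, control)
print(f"estimated p-value: {p_value:.4f}")
```

The estimate carries sampling noise of its own, which shrinks as `n_resamples` grows.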

Anonymous No. 16250431

>>16250290
> What I've realized is that since computers are so powerful nowadays, you can just approximate the permutation test with monte carlo simulations, which avoids the headache of checking if your data meets the assumptions of a t-test.

Combinatorial explosion is going to fuck you up good my friend. Assignment algorithms are great to demonstrate exactly why you can't just wave your hands and say "powerful computers will fix it all."

Let's say you have a fancy global optimization based parking assignment algorithm and you have a (fairly small) parking lot of 100 spots and you want to prove that your algorithm is better than random assignment no matter what the starting layout is. There's 2^100 possible permutations, but with Monte Carlo sampling you could probably reduce your permutation test burden to 2^80 or so trials needed to reject the null.

Let's say you have a really powerful computer that can do 10,000 of these assignments per second, (which is actually very optimistic for a potentially 100 x 100 integer programming problem).

These Monte Carlo trials would take you a speedy 3.8E12 years to complete. Quite quick actually!

Let's say now you've got 100,000 of these computers arranged in some sort of sci-fi super cluster (and magically have instantaneous synchronization and no potential for accidentally repeated permutations). This would reduce your time to complete these trials down to a much more manageable 38 million years.
Now if you had a million of these 100,000 computer super clusters with perfect parallelization/synchronization and no data management issues, you could validate your algorithm in 38 years of constant Monte Carlo trials! You might need 10 nuclear powerplants solely dedicated to supporting your computing power to test your one little parking assignment algorithm, but you could do it!

I think dealing with the De Moivre Laplace approximation is a better choice in most of these kinds of circumstances.
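
The run times quoted above can be reproduced with a few lines of arithmetic. This sketch takes the post's own inputs as given (2^80 trials, 10,000 assignments per second per machine) and assumes a 365.25-day year; the helper name is made up.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

trials = 2 ** 80            # trial count assumed in the post
rate_per_machine = 10_000   # assignments per second per machine, as in the post

def years_needed(n_machines):
    """Years to finish all trials with perfect parallel scaling."""
    return trials / (rate_per_machine * n_machines) / SECONDS_PER_YEAR

# One machine, one 100,000-machine cluster, a million such clusters.
for n_machines in (1, 100_000, 100_000 * 1_000_000):
    print(f"{n_machines:>15,d} machines: {years_needed(n_machines):.3g} years")
```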

Anonymous No. 16251492

>>16245967
Surprisingly easy syntax. A lot like BASIC back in the day

Anonymous No. 16251871

>>16250431
Why are you holding the permutation test to a higher standard than the original t test? If 0.001 of sampled permutations have a higher test statistic, the p value is 0.001. Sure there will be billions of permutations with a higher test statistic but there's no need to get all of them.
Why do you have to know a tiny p value exactly?

Anonymous No. 16252127

>>16251871
Right so since you haven’t and can’t sample all of them, you’re back to having to test the ones you did sample for statistical significance
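
One way to put a number on the sampling error being argued about here: the count of sampled permutations that beat the observed statistic is binomial, so a rough interval around the estimated p-value follows from the usual normal approximation. This is a sketch under that assumption (it gets crude when the count is very small), and the function name and the resample count are made up for illustration.

```python
import math

def mc_p_value_interval(hits, n_resamples, z=1.96):
    """Estimated p-value with an approximate 95% interval (normal approximation)."""
    p_hat = hits / n_resamples
    se = math.sqrt(p_hat * (1 - p_hat) / n_resamples)   # binomial standard error
    return p_hat, max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)

# 100 hits out of 100,000 sampled permutations gives the 0.001 fraction from the post above;
# the resample count itself is an assumption.
p_hat, low, high = mc_p_value_interval(hits=100, n_resamples=100_000)
print(f"p ~ {p_hat:.4f}, approx 95% interval [{low:.5f}, {high:.5f}]")
```

The interval narrows roughly with the square root of the number of sampled permutations, so the precision is set by the sample count rather than by the total number of possible permutations.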

Anonymous No. 16253269

bump

Anonymous No. 16253272

I know this might be retarded but are there any cutting edge research topics at the intersection of convex optimization and statistics?

Image not available

1273x543

Fairy Skills.png

Anonymous No. 16253396

Prefacing this by stating I'm not very good with stats. I have a stats question based on a video game I play and how certain skills are randomly learned.
When this character levels up enough to learn a skill, it can learn one of seven skills, and each skill has varying tiers of that skill that can be learned. The game states skills are attempted to be learned in a specific order, rather than all at once, see pic related - the first skill is attempted to be learned at a 1% chance of success, and if that fails then the next tier is attempted at a 2% chance of success, and so on down the columns then across the rows.
This means the actual learning chance isn't simply the chance of each skill, right?
If there were 100 skills each at a 1% chance, then the resulting learned skills as more and more are learned would start looking like a normal distribution centered around the midpoint of the list (I think?). However the skill chance is not constant, so there would be some bias to the distribution but I can't figure out how to combine the two.
To complicate matters further, if the character successfully learns any tier of one skill, the rest of the tiers of that skill are then unable to be learned, so the list is shortened.

Anonymous No. 16253869

>>16253272
Yes, a lot actually.

Anonymous No. 16253981

>>16253272
Yes, non-linear programming approaches to statistical estimation problems are very powerful. In particular, you'll see cone-tangent and Fenchel duality approaches to constrained NLS solutions used in all sorts of problematic statistics problems in physics and engineering (e.g., inverse parameter problems for things like distance or directional cosine based direction).

Anonymous No. 16254025

Bayesian probability theory made me lose faith in humankind. Also the goat gameshow thing with 1000 doors and the host opens 998 other doors.

Anonymous No. 16254040

>>16254025
What about Bayesianism has made you lose faith? Is it the interpretation of probability as a "belief" or "uncertainty" or is it something more about the mathematical approach to Bayesianism?

Image not available

1284x1595

4E7004D0-13E3-4CD....jpg

Anonymous No. 16254073

wtf are you guys “programming” ?

Anonymous No. 16254085

>>16253981
>cone-tangent and Fenchel duality approaches
lol. I am unironically at this part in a 140-pages paper I'm reading.

Image not available

691x1024

1708000923106363m.jpg

Anonymous No. 16254104

>>16187525
suuuup?

Anonymous No. 16254138

>>16254085
Fenchel conjugates are also super important for large deviations theory, which forms the basis for near-optimal fixed sample size hypothesis testing when your elementwise test statistic is not necessarily a log likelihood ratio. Convex analysis and information theory both can be made very useful to statistics if you feel like learning some math.

Image not available

1147x515

levelprobs.png

Anonymous No. 16254812

>>16253396
From the probabilities you have there each level is conditional on the previous roll for the skill being a failure.

Moving on after a success doesn't matter as this doesn't change any outcomes for the next skill. The levels across skills are not mutually exclusive but the levels within the skill are mutually exclusive.

E.g. level 2 Tingling breath is dependent on level 1 failing.
(100% - 1%) * 2% = 1.98%
Level 3 is dependent on level 2 failing etc.

Because you are using failures, the probability of obtaining a level 5 skill is actually higher than the level 1 skill as we are more likely to continue. So you'll also end up with a distribution that actually favours higher levels.

You really should switch to success to make it more intuitive and in the context of skills you should have the prerequisite knowledge of the previous tiers. Consecutive wins would also feel better for the player than consecutive losses even if they're just pressing a single button (you are basically making a slot machine mechanic here)

Pic related is the probability of learning a specific skill, so Tingling Breath level 5 has a 2.91% chance of occurring, and Tingling Breath as a whole has an 11.75% chance of being learnt.
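
The chain of conditional rolls described above can be sketched as follows. Only the first two per-tier chances (1% and 2%) are stated in the thread; the remaining values in `chances` are placeholders, since the actual table is in the missing image, so the printed numbers are not expected to match the pic. The function name is made up.

```python
def tier_probabilities(chances):
    """chances[i] is the stated chance of tier i+1, rolled only if every earlier tier failed.

    Returns the unconditional probability of each tier and of learning any tier at all.
    """
    probs = []
    still_failing = 1.0            # probability that every earlier tier has failed
    for chance in chances:
        probs.append(still_failing * chance)
        still_failing *= 1.0 - chance
    return probs, 1.0 - still_failing

# First two values are from the thread; the last three are hypothetical placeholders.
chances = [0.01, 0.02, 0.03, 0.04, 0.05]
per_tier, any_tier = tier_probabilities(chances)
for level, p in enumerate(per_tier, start=1):
    print(f"level {level}: {p:.2%}")
print(f"any level of this skill: {any_tier:.2%}")
```

The second entry comes out as (100% - 1%) * 2% = 1.98%, matching the worked example in the post.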

Anonymous No. 16254831

>>16245967
>>16247238
Spiritually a boomer. There's no reason to use Fortran over C++ today.

Anonymous No. 16255002

>>16254073
Personally, I've been working on programming an acoustic signal based estimator that optimally counts the number of times your mom cums on my dick as I fuck her every night. It requires a lot of parametric statistics and power spectral density modeling.