[image: 2ddens.png (904x229)]

🧵 /psg/ - probability and statistics general

Anonymous No. 16072199

previous thread >>16059442

If you love stats, weird numbers and counterintuitive science, this is your general. One of the things about statistics is that nothing ever seems to be what it shows you at first glance. Doesn't matter if you are a seasoned professional, a NEET or some disgruntled grad student: all are welcome.

Some people may not like it if you try to make them do your homework, others won't care and will just help you. Let's discuss theories together, ask questions and try to meme a little about this field.

In the previous thread we discussed why Julia has promise but isn't delivering, and how some people still use Matlab but hate it.

So, grab your favorite statistical software, dust off your textbooks, and join me in this exciting journey through the world of /psg/ - Probability and Statistics General! Let's embark on this adventure together and unravel the mysteries of data one statistical concept at a time.

Anonymous No. 16072212

Too bad; Julia has potential but will probably remain an edge case anyhow.

I think the best general-purpose option now is probably Python, even though I'm not really sure the accuracy is there in the codebase when you do even simple things like linear regressions over big data. In that case R would be better.

Anonymous No. 16072406

What is the central limit theorem and why is it important?

[image: TIMESAND___fieDft....png (774x1370)]

El Arcón No. 16072412

>>16072199

[image: !___Lorentz.pdf (1x1)]

El Arcón No. 16072417

>>16072412

[image: Population_curve.....png (1920x1165)]

Anonymous No. 16072453

Has anyone given any thought to the statistical problem of the doomsday argument or carter catastrophe?

"approximately 60 billion humans have been born so far, so it can be estimated that there is a 95% chance that the total number of humans N will be less than 20*60 billion = 1.2 trillion. Assuming that the world population stabilizes at 10 billion and a life expectancy of 80 years, it can be estimated that the remaining 1140 billion humans will be born in 9120 years. Depending on the projection of the world population in the forthcoming centuries, estimates may vary, but the argument states that it is unlikely that more than 1.2 trillion humans will ever live."

Anonymous No. 16073189

>>16072453
I doubt the theory is legit. Sounds like averaging over a very long time span which doesn't take into account asymptotic events.

Anonymous No. 16073291

>>16072412
>>16072417
You ruin this board. Bodhi may have his faults, but at least you can get him to reply and justify his peculiar posts. You, on the other hand, reply with total non sequiturs, waste time and can't even engage in discussion.

Anonymous No. 16073313

>>16072199
https://youtu.be/LMavYM0-_h0?feature=shared

Anonymous No. 16075335

>>16073291
Who is that guy?

Anonymous No. 16077931

Do you often google unsolved problems in statistics?

Anonymous No. 16079097

>>16077931
I do

[image: ons.png (925x857)]

Anonymous No. 16079675

>be statistician
>be terrified of the rona at the office
Why are statisticians like this?

Anonymous No. 16079706

>>16072453
>so it can be estimated that there is a 95% chance that the total number of humans N will be less than 20*60 billion = 1.2 trillion
Huh? How does this follow?

[image: tooker.png (592x817)]

Anonymous No. 16079710

>>16072412
>>16072417

Anonymous No. 16080082

>>16079675
They just want to sit at home and not have to commute I guess.

Anonymous No. 16080085

>>16079706
I think the 20 comes from a 95% confidence bound: N is estimated to lie below 20 times the number of humans born so far, which would mean only about 1.2 trillion humans will ever exist.
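
The standard hand-wavy derivation (the usual vacuous-prior argument, not anything that anon wrote): if your birth rank n among all N humans who will ever live is uniformly distributed, then with probability 0.95 you are not in the first 5% of births, so

[eqn] P\left(\frac{n}{N} \geq 0.05\right) = 0.95 \implies N \leq \frac{n}{0.05} = 20n [/eqn]

and with n ≈ 60 billion births so far, that gives N ≤ 1.2 trillion.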

Anonymous No. 16080087

>>16079710
lmao

Anonymous No. 16080614

>>16080087
lmao

Anonymous No. 16080626

>>16079710
Tooker was extra based that day

Anonymous No. 16081205

>>16080626
Tucker or Tooker?

Anonymous No. 16081326

>>16079675
it's not rona, it's because the head office is in Wales and their work can be done just as easily at home

>finish work and leave office
>literally nothing to do in local area since it's Wales, the only option is the pub
>can't even make plans for after work since travelling anywhere decent takes more than 1 hour
>instead go home and apply to private company jobs in London

[image: crime.png (634x419)]

Anonymous No. 16082056

I miss when crime stats included race. Now they're nerfed to leave it out.

Anonymous No. 16082133

>>16082056
>I miss how crime stats used to have race in them.
They used to. They still do, but they used to, too. Not sure what you're on about.
Race is not a truly independent variable with respect to crime anyway. Which you'd know if you had any business talking about it on a stats thread.

Anonymous No. 16082222

>>16082133
How is race not an independent variable?

Anonymous No. 16082305

>>16082222
>How is race not an independent variable?
https://en.wikipedia.org/wiki/Confounding

Anonymous No. 16082439

>>16072199
>in the previous thread we discussed why Julia has promise but is not delivering. How some people still use Matlab but hate it.
Fortran boomerchads rise up

Anonymous No. 16082471

>>16082439
Fortran <3

Anonymous No. 16082476

>>16082439
>Fortran boomerchads rise up
I would but my fucking knees hurt too much from walking uphill in the snow both ways with rags for shoes, carrying my schoolbooks in the lunch pails and pulling the plows in my free time while being whipped

Also Fortran gives me PTSD of being on the receiving end of legacy systems made by some prior fucker I can't track down to throttle

Anonymous No. 16082487

>>16082476
fortran 77 isn't as good i admit

[image: hbTWxFi.jpg (720x1600)]

Anonymous No. 16082772

Anonymous No. 16082788

>>16072212
Based shitpost. Hopefully it won't derail the thread with more pointless arguments over coding languages. That would be a tragedy.

Anonymous No. 16082796

>>16082222
If I had to make a wager, that anon would be making the argument that race isn't informative for crime rates divorced from other confounding factors like SES, geographic location, cultural factors, etc.

They are probably right that race is not all that informative in a vacuum (e.g., Nigerian immigrant communities are black on the census but commit far less violent crime than ADOS individuals regardless of SES).

With that said, it probably isn't an entirely dependent variable. It's pretty rare for something that appears to be such a strong signal to be entirely controlled by confounding factors. It happens but it's rare.
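
For what it's worth, the "strong marginal signal entirely explained by a confounder" situation is easy to manufacture in a toy simulation (purely illustrative, nothing to do with any real dataset): Z drives both X and Y, so X and Y correlate strongly even though X has no causal effect on Y, and the X coefficient vanishes once Z is controlled for.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

z = rng.normal(size=n)            # confounder
x = z + rng.normal(size=n)        # x depends only on z
y = z + rng.normal(size=n)        # y depends only on z, not on x

# Marginal correlation is strong...
print(np.corrcoef(x, y)[0, 1])    # ~0.5

# ...but regressing y on both x and z shows x has no effect once z is held fixed.
# OLS via least squares: y = b0 + b1*x + b2*z
A = np.column_stack([np.ones(n), x, z])
b0, b1, b2 = np.linalg.lstsq(A, y, rcond=None)[0]
print(b1)                         # ~0: the x coefficient vanishes
print(b2)                         # ~1: z carries the whole signal
```

The other anon's point still stands as an empirical question: the crime signal in the post below survives conditioning on several covariates, which this toy marginal correlation would not.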

Anonymous No. 16082812

>>16082796
>With that said, it probably isn't an entirely dependent variable
It is. Entirely spurious. This isn't news, except on 4chan.
>It's pretty rare for something that appears to be such a strong signal to be entirely controlled by confounding factors
If by "rare" you mean "all of the time": there are innumerable correlations just as strong or stronger, e.g. https://www.tylervigen.com/spurious-correlations
I have no idea what experience led you to conclude such a thing is "rare", except a lack of any experience. Yeah, that sounds rude, but what you said is fucking nuts absent qualification.

Anonymous No. 16082853

>>16082812
Spurious correlations are almost always very low power. This is not the case when you are talking about racial demographics and crime. Roughly 1/4 of all black men will spend some time incarcerated at some point in their life. When you look at young black men who receive government assistance, that number goes up to roughly 30%. Contrast that with young white men who receive government assistance, where it is less than 5%. That is very unlikely to be a spurious correlation, because the racial signal is still significantly informative even after accounting for age, sex, and participation in government assistance programs.

The only way you could possibly believe otherwise is by not having actually engaged with the data on the subject. Even people who are sympathetic to criminal justice reform and cultural efforts to reform/remediate this problem are aware that race is a powerful signal in this area.

If you want a book recommendation about this topic that is highly sympathetic to the plight of young black men while still advocating for criminal justice reform, I'd recommend "The Anatomy of Racial Inequality" by Glenn Loury. He's a Prof. emeritus in economics at Brown and has dedicated a large portion of his career to the study of social capital theory, racial dynamics in crime/punishment, and the feedback loops which reproduce racial inequalities over time.

Anonymous No. 16083057

Race is an independent variable. The signal from it is too strong.

Anonymous No. 16083097

>>16082772
Hispanics are white; being born in Mexico doesn't make you no longer Spanish.
I've dated a Brazilian woman and both of her parents were white. In fact, most Hispanics I've met are 70-90% white.

Anonymous No. 16083752

>>16083097
You're retarded. With that out of the way, I remind you that Central and South America imported over 90% of the Africans shipped in the transatlantic slave trade, that the average US Latino has about 10% African admixture, and that this is reflected in their IQ scores and academic performance. They're less white than black Americans are black: a black American has on average 73% African ancestry (lots of white admixture in the US), while a Hispanic has on average 65% European ancestry.

Anonymous No. 16084494

Can you name some of the best textbooks for stats?

Anonymous No. 16084721

>>16083097
It varies a lot. There are some Latinos who are very obviously 90% or more European, but they're obviously the minority. Then there are the other typical beaner types and then there are the ones who are racial horrors beyond mortal ken.

Barkon No. 16084724

>>16084494
Statist versus Reductionist thread.

Anonymous No. 16085001

>>16084494
This depends a lot on your math background and goals.

If you can be a bit more specific about that, I can give you some recommendations.

Anonymous No. 16085178

>>16085001
Anything that helps me become an extremely competent problem solver. I get data and shit out results and interesting predictions. ML is a meme, I want to do it statistically. Don't care if it is bayesian or frequentist. I just want to get it fucking done.

Anonymous No. 16085192

>>16085178
Yeah, but have you done a basic calc sequence and are you comfortable with linear algebra and ODE's?

If the answer is no, then the recommendations I'd point you towards will probably not be worthwhile because they are assuming a level of familiarity with mathematics that you don't have.

Anonymous No. 16085271

>>16072199
I'm trying to understand how to make QM relativistic (beyond the obvious Klein-Gordon, Dirac stuff).
I just want to start from first principles.
My motivating example is to start with
[eqn]\int |\psi(x,t)|^2 dx=1[/eqn]
and differentiate wrt t, then choose (based on what is convenient/simple) how psi_t is related to psi, psi_x, ... etc.
Just this logic is enough to derive Schrödinger.
I want to make the first equation relativistic then do an analogous process.
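
For anyone following along, the non-relativistic version of that logic is the standard continuity-equation manipulation. Differentiating the normalization condition gives

[eqn] \frac{d}{dt}\int |\psi|^2\,dx = \int \left(\psi^*\psi_t + \psi\,\psi_t^*\right)dx = 0 [/eqn]

and choosing the Schrödinger relation (with V real),

[eqn] i\hbar\,\psi_t = -\frac{\hbar^2}{2m}\psi_{xx} + V\psi [/eqn]

turns the integrand into a total spatial derivative,

[eqn] \partial_t|\psi|^2 = \frac{i\hbar}{2m}\,\partial_x\!\left(\psi^*\psi_x - \psi\,\psi_x^*\right) [/eqn]

which integrates to zero for wavefunctions vanishing at infinity, so normalization is conserved.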

Anonymous No. 16085318

Any thoughts on score based models/ langevin sampling

Anonymous No. 16085359

>>16085192
>Yeah, but have you done a basic calc sequence and are you comfortable with linear algebra and ODE's?

Yes I have. Competent in Python as well. I can do Options modelling if I felt like it.

Anonymous No. 16085364

>>16085318
Mostly used in quantum mechanics and molecular modelling to my understanding. It's a subclass of Markov chain Monte Carlo, right? I think a lot of the MC methods are useful. Learn it and learn it well, brother.
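
If you want to poke at it, the unadjusted Langevin algorithm itself is only a few lines. A toy sketch (target is a standard normal, so the score ∇log p(x) = -x is known in closed form; actual score-based models learn that gradient with a neural net):

```python
import numpy as np

rng = np.random.default_rng(42)

def grad_log_p(x):
    # Score of a standard normal target: d/dx log p(x) = -x.
    return -x

# Unadjusted Langevin: x <- x + eps * score(x) + sqrt(2*eps) * noise
eps = 0.01
x = np.zeros(5000)                      # 5000 parallel chains
for _ in range(2000):
    x = x + eps * grad_log_p(x) + np.sqrt(2 * eps) * rng.normal(size=x.shape)

print(x.mean(), x.var())                # both should be near 0 and 1
```

For this linear case the stationary variance works out to 1/(1 - eps/2), so small step sizes get you close to the target; the Metropolis-adjusted version (MALA) corrects the remaining discretization bias.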

Anonymous No. 16085545

>>16085359
Okay, thank you. This is the information I was looking for.

You said you were interested in ML, so I'm going to tailor my recommendations more towards statistical learning rather than theoretical statistics.

The easiest two books that give good coverage of this material (that I'm aware of) are Introduction to Statistical Learning (ISL) by James, Witten, Hastie and Tibshirani, and Introduction to Machine Learning by Alpaydin.

ISL is more statistics focused, Alpaydin is more focused on the ML side of things. These are both fairly easy and don't ask a ton from their reader. The exercises should be straightforward and give you a good "once-over" of the field.

If you want more challenging and thorough recommendations which also cover a lot of probability theory, my recommendations are as follows:
1) Pronabulistic Machine Learning: An Introduction by Kevin Murphy (free PDF on his website)
2) Pattern Recognition and Machine Learning by Bishop (free PDF online and one of the standard texts on the topic used for engineering and information theory focused teaching of statistical learning)
3) Elements of Statistical Learning (ESL) by Hastie (thorough coverage of frequentist approaches to statistical ML, but much more sparse than Murphy and Bishop on Bayesian approaches).
4) Machine Learning by Theodoridis (the most Bayesian and optimization focused of the bunch, and also a huge reference text).

There's a lot of overlap between them, but they also cover a lot of different topics. For example, Theodoridis talks a lot about factor graphs, Bayesian networks and Particle filtering (which are hugely important in diffusion and Markov time series modeling).

If you're looking for something purely mathematical statistics the standard recommendations (in increasing difficulty) are:
1) All of Statistics by Wasserman
2) Probability and Statistical Inference by Mukhopadhyay
3) Statistical Inference by Casella and Berger

Anonymous No. 16085548

>>16085545
Probabilistic Machine Learning by Murphy* I think I had a stroke while typing.

Anonymous No. 16085663

>>16085545
But anon, I wanted to get away from ML. ML is a god damn meme. I want to do the real stuff.

Anonymous No. 16085676

>>16085663
If by "the real stuff" you mean mathematical statistics, then just go through one of the three books I recommended at the end.

I wouldn't really call the fundamentals of regression, classification, and clustering meme stuff. They are incredibly useful for a ton of different fields beyond the fairly meme-ish CS/business data science applications.

In particular Pattern Recognition by Bishop is a great way to learn about Bayesian classification and regression, and Theodoridis has over 400 pages dedicated to proper Bayesian inference and graphical modeling (which is hugely useful in just about every field).

The things most people consider "Machine Learning" (meaning primarily neural networks and how to code them) are largely avoided in these texts for a more probability, statistics and optimization forward approach to these problems.

Anonymous No. 16085680

>>16085676
Thanks brother. Will screenshot it. Pls write more in the thread. Sorry for being disrespectful.

Anonymous No. 16085737

>>16085680
You don't need to apologize to me my man.

It's not an accident that ML has the reputation it has. There's a ton of snake oil salesmen in the field, and the software engineering side of CS/business schools in particular has polluted the space with a lot of bullshit.

I just also happen to think there's a lot of real practical utility in learning the statistical and mathematical fundamentals to the three major ML problems (regression, classification, clustering).

It's definitely less abstract and "pure mathy" than theoretical statistics, but a proper statistical learning book will still have your brain churning through difficult material that can prove useful. Learning the mathematical statistics fundamentals will also be very useful in a lot of applied inference problems, it just isn't generally as useful outside of academia because theoretical statisticians tend not to be Bayesians and most real world problems require informative priors to have tractable/reliable solutions.

Anonymous No. 16085750

>>16085737
Yeah, a lot of the ML shit has sent people off buying weird courses and watching countless hours of YouTube; it is what you say it is, snake oil. I'm thinking just as you are, though: the three main things, regression/prediction, classification and clustering, are very awesome statistical tools, and quite powerful too if you understand them in a holistic way.

Anonymous No. 16085967

Does anyone have any recommendations for probability/statistics textbooks that don't require an undergrad calc sequence/linear algebra?

I'm going to be working on a set of textbook flow charts for probability/statistics starting some point this week, and I'd like some recommendations for good resources in lower math levels. I've got a solid plan for the multivariable calc/linear algebra approach to these topics, and I have somewhat of an idea for the measure theory approach, but I still don't have many good resources for people who don't have much of a math background.

[image: tufte.jpg (387x466)]

Anonymous No. 16086397

>>16085967
I have some opinions on this. I know not knowing calculus is like trying to run while you are in a wheelchair.

But what do you think of a visual approach for the non-calc learners? Then I think Edward Tufte's book "The Visual Display of Quantitative Information, 2nd Ed." should suffice.

I will follow up on what I meant above in a more longwinded post later.

Anonymous No. 16086605

>>16085271
You are on the wrong track. There already is a relativistic version of single particle quantum mechanics called the "worldline formalism." It basically involves rewriting a single propagator in QFT as a path integral over fields depending on a single parameter that parametrizes the worldline. It doesn't sound like you are at the level to really understand it, but if you are interested try searching for a review article.

Anonymous No. 16087079

>>16086397
Most of the old books, like Tukey etc., are very good.

Anonymous No. 16087115

>>16086605
My goal isn't necessarily to reproduce how they made Schrödinger relativistic.
I'm interested in ALL of the equations you could choose to conserve probability.
Instead of x I'll probably need some general brane (with a Lipschitz condition so it can't causally interact with itself).

Like I said, it is easy to get Schrödinger. I'm trying to go beyond that. I can go to higher order, include psi*, or have nonlinearities.

How would you make the ∫p(x,t)dx = 1 condition relativistic?

Anonymous No. 16087125

>>16087115
See if you can derive dirac from those initial conditions.

Anonymous No. 16087227

>>16087115
>I'm interested in ALL of the equations you could choose to conserve probability.
>How would you make ∫p(x,t)dx = 1 condition relativistic?
It's relativistic as long as (p, J) form a four-vector, with J being the probability current. One example is the conserved current in the Klein-Gordon theory. The problem with that one is that p need not be positive definite, and is better interpreted as a charge density. It's the same story for Dirac.

If you consider the worldline formalism, this would be a really genuine relativistic single particle theory which admits position operators (and a time operator) but I don't think there can be a locally conserved probability current due to the Reeh-Schlieder theorem and all that.
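
To spell out the covariance requirement (standard notation; the Klein-Gordon current is the textbook example):

[eqn] \partial_\mu J^\mu = \partial_t \rho + \nabla\cdot\vec{J} = 0, \qquad J_\mu = \frac{i\hbar}{2m}\left(\psi^*\partial_\mu\psi - \psi\,\partial_\mu\psi^*\right) [/eqn]

The density ρ = J^0 involves ∂_t ψ, which is exactly why it fails to be positive definite.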

Anonymous No. 16087304

>>16072199
stats or data science?
the stats course has the big boy stuff like real analysis, numerical analysis and pdes while the data science course has way more programming stuff and more electives

Anonymous No. 16087491

>>16087304
Data science is just applied statistics without the math. It's to stats as software engineering is to computer science.

If that kind of applied and programming forward stuff is what interests you, go for it. Someone needs to do it. Just know you'll be doing it at the cost of a more thorough education in the field.

Anonymous No. 16087906

>>16087491
would it be retarded to take the data science course and take some of the subjects unique to the stats course as electives?
maybe not real analysis but certainly pdes, optimization and the rest of the applied/stats stuff

Anonymous No. 16088030

>>16087906
No man, do it. The DS stuff can land you a job, but the stats stuff makes sure you understand what you are doing, making you effective in your new role.

Anonymous No. 16088104

>>16087906
If you are interested in the data science track, do it.

Real analysis is really helpful if you want to have a really fundamental understanding of probability theory, but most applied probability theory doesn't require more than multivariable calc, linear algebra, ODEs and some basics of Fourier series/transform based signals analysis.

The thing with the more fundamental mathematics approach is that most of the good textbooks will require a ton of time and meditation to properly understand anyway. If you want to go through the classics for statistical learning (Bishop's Pattern Recognition, Duda's Pattern Classification, the Murphy books, ESL, etc.), you can definitely do that on your own, and having the DS programming experience will help you implement and understand what they cover.

I don't think the DS course will be a replacement for a fundamental understanding of statistics, but it certainly won't prevent you from studying and understanding it on your own if you are motivated.

Anonymous No. 16088108

>>16088104
>(Bishop's Pattern Recognition, Duda's Pattern Classification, the Murphy books, ESL, etc.)
Could you give complete titles on these books? I am quite interested.

Anonymous No. 16088135

>>16088108
nta
>Bishop
https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
garbage
>Duda et al.
garbage
>Murphy
https://probml.github.io/pml-book/
mostly garbage, some sections are okay as reference (but only due to the year of publication of the second editions)
>ESL
https://hastie.su.domains/ElemStatLearn/
the book's okay, actually decent, but the opportunity cost of reading this is pretty high

instead, for DS:

- Gelman et al., Bayesian Data Analysis
- Stan's reference manual and case studies. no, really. the manual can actually be read, it helps to understand the underlying statistical machinery for industrial DS: https://mc-stan.org/users/documentation/
- Huntington-Klein, The Effect: An Introduction to Research Design and Causality (2021). https://theeffectbook.net
- Hansen, Econometrics
- Lattimore & Szepesvári, Bandit Algorithms
- Molnar, Interpretable Machine Learning, https://christophm.github.io/interpretable-ml-book/

Anonymous No. 16088136

>>16088108

1) Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer 2006, available for free as a PDF on his website. Provides a great foundation for pretty much everything you'd need in Bayesian ML.

2) Pattern Classification by Richard O. Duda, Peter E. Hart and David G. Stork. A bit older but covers the classics very well and is considered a classic in the field for a reason.

3) Probabilistic Machine Learning: An Introduction by Kevin P. Murphy. Newer and much more focused on modern deep learning based methods but also has a strong coverage of the fundamentals. Free draft pdf on his website and has lots of pseudocode and python examples.

4) Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman. Free PDF on Hastie's website and provides excellent coverage of the frequentist side of statistical learning. Doesn't really cover Bayesian methods very well (because Hastie is a statistics professor and they tend to be less interested in Bayesian methods in general).

[image: stocking anarchy ....png (1024x1024)]

🗑️ Anonymous No. 16088137

psg? i thought this was a panty and stocking thread

Anonymous No. 16088141

>>16088135
Thanks. Even if you think those books are garbage, the others you gave seem interesting.
>>16088136
Many thanks. I will look at both of these lists.

[image: 514dbfc5736f910bc....jpg (735x535)]

Anonymous No. 16088142

psg? i thought this was a panty and stocking thread

Anonymous No. 16088150

>>16088135
You have a very strange perspective.

The books I recommended are for statistical learning, not data science. Gelman is a great book, but it's for BDA, not statistical learning. Those are different fields with different goals.

Anonymous No. 16088189

>>16088150
>books I recommended are for statistical learning, not data science
oh! I read it as though you were treating old-school ML as part of what data scientists do on the job, which, let's be honest, it had been until recently, and probably still is at places that don't realllly need a data scientist.

my own recs are for the kind of work that happens at companies that absolutely do need data scientists (e.g. FAANG, but also logistics and large retailers, &c), with a focus on causal inference and attribution, since pure prediction (statistical learning) is nearly always not what you need in the industry. "BDA" and the Stan manual obviously, "The Effect" and "Econometrics" specifically for causal inference, "Bandit Algorithms" as an extension of A/B testing (see e.g. Imbens's work at Amazon on A/B/n experiments), interpretability because even if you use predictive models like forests you've still got to explain to your boss which predictors might do what

Anonymous No. 16088198

>>16088189
No, I was recommending them as books to be read if that anon wanted some background in the fundamentals of statistical learning as a supplement to their DS courses. I don't think learning about the fundamentals of logistic regression or factor graphs is a replacement for data science/data analysis, but it will give you insight into how the models used by DS people work.

I find (as someone who does statistical learning work regularly) that DS people generally don't tend to have much understanding beyond the basics of how ML functions, and as a result tend to be easily caught in the whirlwind of marketing BS.

At the same time, as someone who has a statistical learning background and only knows the basics of the BDA/causal inference world, I recognize that they are trying to answer a different question (i.e., how trustworthy is my data? How do I scrutinize the information I have?) as opposed to statistical learning (i.e., if I have a set of observations that are already as good as they can get, how do I properly extract information out of them?).

They aren't competing, they are different points in the process.

Anonymous No. 16088221

>>16088189
>>16088198
The dynamics of discussion like this is why I really like /sci/. Thanks for teaching me boys.

Anonymous No. 16088238

>>16088142
>>16088137
If you have drawings of Panty and Stocking looking over a shit-ton of data sheets, coffee in hand, while looking like 10 miles of bad road, then they can be here.

Anonymous No. 16088924

>>16088238
lmao, as if that weeb would have something like that.

Anonymous No. 16088971

>>16088924
>>16088238
It's a challenge, of course. Statistically speaking, they likely don't have this and will go away. And if this hypothesis is rejected, against the odds, we're looking at two new mascots for this general.

Anonymous No. 16089000

>>16088971
I am not even mad.

Anonymous No. 16089366

Opinions on Taleb's book The Black Swan?

Anonymous No. 16089452

>>16089366
Being written before the rise of "data science", it's dated in some parts (normal distributions are no longer as ubiquitous, for example), but the insight that simple models are non-robust (and hence, that explainability is at odds with predictive accuracy) turned out to be a fruitful one. If an updated version comes out, I'd read it.

Anonymous No. 16089456

>>16089366
I like Taleb and find him a compelling and interesting writer, but I'm not all that impressed with his recommendations or technical content.

It's been a while since I've read Incerto, but from my (certainly imperfect) recollection the main moral of The Black Swan is that models (statistical, mental, ideological) need to be designed so that there is some ability to handle rare failure events.

These events could come from tails being fatter than expected (Taleb's bugbear) or they could come from incorrect causal modeling assumptions. Either way, you need to be able to answer the "but what if we're wrong" question. This makes intuitive sense, but doesn't lend itself well towards any concrete recommendations other than to be more aware of what simplifying assumptions you are making and more considerate about whether those assumptions make sense.

Anonymous No. 16090323

>>16089456
Why are you not impressed by his technical work?

Anonymous No. 16090337

>>16090323
Maybe this is an impressive thing to options traders or something, but in engineering we've been using non-Gaussian models for just about everything in signal detection for quite some time. Usually the basics are taught in terms of asymptotic Gaussians, but real sonar/radar signal detection systems have generally been using Cauchy models to deal with fat-tail effects since the 70's.

Gaussians are nice because an IID Gaussian is the maximum entropy continuous distribution if you have a known mean/variance for your observation errors (which can usually be estimated via Maximum Likelihood methods from previous residuals and historical data). However, many real signal detection cases tend to be Cauchy, Laplace or Rayleigh, all of which have fatter tails and are less well behaved than Gaussians.
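
The fat-tail point is easy to make concrete with stdlib math alone (two-sided tail mass for a standard normal vs a standard Cauchy; "k" is in scale units for the Cauchy, since it has no variance):

```python
import math

# Two-sided tail mass P(|X| > k) for a standard normal vs a standard Cauchy.
# Normal:  erfc(k / sqrt(2))
# Cauchy:  1 - (2/pi) * atan(k)  =  (2/pi) * atan(1/k)   for k > 0
for k in (3, 5, 10):
    p_gauss = math.erfc(k / math.sqrt(2))
    p_cauchy = (2 / math.pi) * math.atan(1 / k)
    print(f"k={k}: gaussian {p_gauss:.2e}, cauchy {p_cauchy:.2e}")
```

At k = 5 the Gaussian tail mass is about 5.7e-7 while the Cauchy's is about 0.126, a factor of roughly 200,000; a detector thresholded on Gaussian assumptions would drown in false alarms on Cauchy-like noise.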

Anonymous No. 16090393

>>16090337
I am thinking, should one combine signals engineering with economics/options/markets?

Anonymous No. 16090405

>>16090393
You could. One of my grad courses on information theory had a whole section on optimal portfolio theory.

The thing with engineering signal modeling is that the signals we are tracking are generally a bit more complicated than a market model, but they are often something with some underlying physics that can inform the dynamics of the system.

So we may deal with this crazy mixture distributions and multi-modal particle filters, but they are trying to track a target which moves roughly like some real physical objects and emits energy in a somewhat predictable way that you can model.

I don't think that in econometrics or markets you can rely on the system having some reliable model which the errors are introduced into. I'm certainly not an econometrician, but my guess is they don't have anything like an informative set of Euler-Lagrange equations they can base their market dynamics on. Our signals are non-linear and gross and generally very tailed, but there's some structured model that we can use as an informative heuristic under there.

Anonymous No. 16090410

>>16090405
These crazy mixture distributions* and multi-model particle filters.*

Don't try to write coherent things when drinking after a 12 hour work day.

Anonymous No. 16090416

>>16090410
Brother, I did the same. But without the drinking. I have understood that if I am going to be a workaholic, most of the time I need to stay away from alcohol.

Anonymous No. 16090422

>>16090416
I really haven't had much to drink, but your advice is sound. 3 beers over an hour or so feels like a lot more than it is when you're also overtired.

Anonymous No. 16090432

>>16090422
If you really want to go for guts and glory, start boxing late at night or lifting heavy weights.

Anonymous No. 16090450

>>16082812
>This isn't news, except on 4chan.
nta but you are one sheltered fuck, US blacks are notorious for their toxic crabs in a bucket culture and even Africans shit on them for it. What Americans call race is frankly just culture anyway, look at the "Germans" and "Latinos" and "Italians" that haven't been on actual country of origin soil for X generations. """Race""" absolutely is a dependent variable but that is not really less of a condemnation of a subcommunity.

Anonymous No. 16090490

>>16090450
You are answering to a darkskinned pajeet.

Image not available

325x500

Mostly Harmless E....jpg

Anonymous No. 16090608

>>16090405
>I'm certainly not an econometrician but my guess is they don't have anything like an informative set of euler-lagrange equations they can base their market dynamics on. Our signals are non-linear and gross and generally very tailed, but there's some structured model that we can use as an informative heuristic under there.
It depends on the subfield of econometrics you're in. From what I understand of financial econometrics (the one most commonly associated with options), you could argue that they have Black-Scholes as an "equation of motion" for prices, on top of which they build models for specific assets.
In other fields, where econometrics is applied to questions of policy interest (and hence, having a causal flavor), you can have models that are designed around specific policy minutiae (picrel is a solid introduction on how this is statistically justified in practice), though admittedly these models have something of an opportunistic and ad-hoc flavor. To the extent that they share any underlying (and mathematically formalizable) principles at all, it'd have to be based on either societal institutions (e.g. markets giving rise to the "laws of supply and demand") or theories of socio/psychological behavior (e.g. the iterated prisoner's dilemma as a driver of the level of environmental investments).

Anonymous No. 16090621

Hey dudes, thank you to whoever made this general, exactly what I was looking for now that I'm moving forward as a higher-level healthcare analyst. Would like to one day step into data science and possibly contribute some more technical help in the field. Doing SQL dashboard work and some budget analytics for now, but also reading and practicing Fundamentals of Statistics by Wasserman. My question is: does anyone know how to better integrate these concepts into healthcare with the access to clinical information I've got? I am desperately looking through their documentation for data flow charts to understand the data better. However, I find it hard to really integrate the higher-level concepts at the moment. Any feedback works, thanks.

Anonymous No. 16090626

>>16090621
Forgot to mention I know Python/C#/C++, bachelor's in analytics, so calc is as advanced as I usually get. I'm over my ML phase as I found it hard to get over all the iteration runs and model testing, found it impractical for most solutions.

Anonymous No. 16090669

How different is your worldview from that of the great unwashed?

Anonymous No. 16090895

>>16090621
Do a search on libgen or similar website with the keywords you are interested in. maybe even google scholar?

Anonymous No. 16090903

>>16090626
what is a bachelor's in analytics?

Anonymous No. 16090927

>>16090669
Many statisticians are redpilled af.

Anonymous No. 16091053

>>16090608
Thank you for this response.

Do you have any recommendations for starting points to learn about these econometrics models as a hobby? I work with non-linear stochastic systems evolution for a living so I don't need a ton of handholding, but I'm in a very different field so I expect the terminology and modeling language/assumptions to be different.

Anonymous No. 16091056

>>16090621
The closest thing I'm aware of is "Healthcare Data Analytics" by Reddy and Aggarwal from CRC Press.

It's not really my field, but I know that there's a lot of healthcare work being done by the guys at the Johns Hopkins APL under the name "Upstream Data Fusion."

Anonymous No. 16091096

>>16091053
>Do you have any recommendations for starting points to learn about these econometrics models as a hobby? I work with non-linear stochastic systems evolution for a living so I don't need a ton of handholding
I've already slipped it in my response actually: "Mostly Harmless Econometrics" by Angrist and Pischke, which is available on libgen and even has its own website:
https://www.mostlyharmlesseconometrics.com/

In the preface, they say that their target audience are economics PhD students (i.e. people who aren't afraid of math but may have come from other disciplines, so not much different from your situation), and that the aim of the book is to familiarize them with the workhorse models. Each model gets a chapter with a bit of theory, a couple of case studies showcasing their application, and some practical advice/warnings (as is par for the course when introducing any statistical model).

>I'm in a very different field so I expect the terminology and modeling language/assumptions to be different.
The jargon in this book comes from statistics (in fact I don't recall any economics jargon at all, unless you count things like "labour supply" and "elasticity"). There's abundant terminology for discussing the models' structure/assumptions, but since the entire point of the book is to teach those to you, I think it can be excused on that front.

Anonymous No. 16091111

Why is MCMC so based bros? Literally magic.

Anonymous No. 16091112

>>16091096
>the aim of the book is to familiarize them with the workhorse models.

I need more workhorse models in economics, tax policy and general finance if you got it brother. Love you.

Anonymous No. 16091115

>>16091111
Because you get to hold the hand of God while he is rolling the dice in the casino of life.

Anonymous No. 16091119

>>16091096
I'll take a look. I wasn't sure if that image was a good recommendation for an overall econometrics text or an econometrics text primarily dealing with non-Gaussian or robust estimators.

As a passing aside, it's interesting how differently engineers and statisticians approach robustness.

Generally it seems the statisticians' approach to robustness is to use something like a median or quartile-based estimator, where you get insensitivity to polluting outliers at the cost of real-time recursive viability (as you can't really "update" your median frame by frame as easily as a recursive conditional mean).

Instead, in engineering we tend to approach robustness by using gating (i.e., scrutinize the measurement/observation before it goes into the filter) and model-fusion (i.e., have a bunch of parallel filters and then get your final estimate by the Bayesian mixture of their individual outputs based on your conditioned filter mode probability). They both seem to work for their various purposes but they rely on fairly different assumptions about where weird outliers come from (tailed occurrences where our assumed distribution is correct vs. probable occurrences where our assumed distribution is incorrect).
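A minimal sketch of the gating idea, assuming a scalar filter with known innovation variance (the function name and threshold are illustrative, not any particular library's API):

```python
def gate(measurement, predicted, innovation_var, gamma=9.0):
    """Hypothetical ellipsoidal gate: accept a measurement only if its
    normalized innovation squared (NIS) falls below a chi-square-style
    threshold. gamma=9.0 is roughly the 99.7% point of chi2 with 1 dof."""
    nis = (measurement - predicted) ** 2 / innovation_var
    return nis <= gamma

# A measurement near the prediction passes; a wild outlier is excluded
# before it can corrupt the filter update.
print(gate(10.4, 10.0, 1.0))  # small innovation -> accepted
print(gate(25.0, 10.0, 1.0))  # 15-sigma outlier -> rejected
```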

Anonymous No. 16091123

>>16091111
It's because convolution integrals are fundamentally smoothers. When you take asymptotically diverging numbers of samples from a stationary distribution, its average will be smooth (and thus well approximated by a quadratic MGF) regardless of what the true distribution is.
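A quick simulation of that smoothing effect: sample means of a heavily skewed (exponential) distribution tighten up around the true mean as the sample size grows, even though no single draw is remotely Gaussian (made-up parameters, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_means(n, reps=20000):
    # Distribution of the mean of n exponential(1) draws,
    # estimated over many repetitions.
    return rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# As n grows, the spread of the sample mean shrinks like 1/sqrt(n)
# and its skew washes out, regardless of the underlying distribution.
for n in (1, 10, 100):
    m = sample_means(n)
    print(f"n={n:4d}  mean={m.mean():.3f}  std={m.std():.3f}")
```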

Image not available

850x234

adversarial examp....png

Anonymous No. 16091166

>>16091112
I've had Sargent's macroeconomic theory books recommended to me before, but never read them myself (not my fields, sorry).

>>16091119
Don't forget the data scientist's approach to robustness, which aims to limit the model's loss function when it's applied to make predictions on "adversarial" datapoints (by adopting a minmaxing training process). This is the version of robustness that I was alluding to in >>16089452, and it sounds quite similar to your engineer's approach.
As for the statistician's idea of robustness, the issue it defends against is also a concern for data scientists, though they call it "poisoning the training dataset" rather than "adversarial examples". I guess that since non-parametric models are the norm for them, assuming the correct distribution isn't really a concern.

Anonymous No. 16091788

Why do I find asymptotes so interesting? Their shape, the graphics, the mathematics and the usage of them. How rare they are in real life.

Anonymous No. 16092029

>>16091166
What are your thoughts on the research into using over-parametrization or large deviations to statistically characterize ML based classifiers/regression systems?

Anonymous No. 16092044

Oh this is where I exist. Funnily enough, I find the most valuable statistics and math are the 'basic' pieces more so than anything.

Experimental design (both for computers and irl experiments) is something people just do wrong.

I'm also working on a sort of better method for prioritization and sorting under uncertainty... Uh workflow (it's not really a methodology just a merging of methodologies) since apparently the business world has never sat down and figured this out.

Anonymous No. 16092046

>>16092044
>I'm also working on a sort of better method for prioritization and sorting under uncertainty... Uh workflow (it's not really a methodology just a merging of methodologies) since apparently the business world has never sat down and figured this out.

show me your work anon

Anonymous No. 16092049

>>16092044
So like operations research?

Anonymous No. 16092054

>>16092046
It's really straightforward. Organizations are shit at prioritization. Typically asking heads of departments to create these highly biased 1-N list or top-3 or top-10 lists of their priorities. Then someone in the top circles collects it and applies some sort of scoring mechanism and 'pools' things together.

The problem with that is that it typically pisses people off and 'misses nonlinearity' or doesn't capture all the information because arbitrary scores are arbitrary.

My method is just abusing the Bradley-Terry model: putting the items needing prioritization into a series of comparative games, then inferring their rank from the outcomes of those games. It's really straightforward. You can get all the people who matter in the organization to vote/play out the games, they don't have to answer 'where does this item fit on a 1-N', just 'is A more important than B', and the competition is a bracket, so it acts as an adaptive experiment of sorts, focusing MORE data collection on higher-priority items (i.e. items that are winning) than lower-priority items.

It's basically just a different method for collecting 1-N data. And the Bradley-Terry model can be used basically interchangeably with ordinal regression for 1-N data since you can turn 1-N data into exhaustive binary games (and it's what a lot of voting methods do anyway).

Anyway, it's less stress for voters, it takes like N-1, 2N-2/3, so on so forth comparisons to complete a competition (depending on how many brackets you're playing) and I already used it in my office for priorities and it lined up reasonably with expectations.
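For the curious, the rank-inference step can be sketched in a few lines. This is a hypothetical minimal implementation of Bradley-Terry fitting via the classic MM iteration (not the anon's actual code), with a made-up three-item example:

```python
import numpy as np

def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths from a win-count matrix via the
    classic MM (minorization-maximization) iteration.
    wins[i][j] = number of times item i beat item j.
    Returns strengths summing to 1; higher = more often preferred."""
    wins = np.asarray(wins, dtype=float)
    n = wins.shape[0]
    p = np.ones(n) / n
    for _ in range(iters):
        total_wins = wins.sum(axis=1)
        new_p = np.empty(n)
        for i in range(n):
            denom = 0.0
            for j in range(n):
                if i != j:
                    games = wins[i, j] + wins[j, i]
                    if games:
                        denom += games / (p[i] + p[j])
            new_p[i] = total_wins[i] / denom if denom else p[i]
        p = new_p / new_p.sum()
    return p

# Hypothetical games: A beats B 8/10, B beats C 7/10, A beats C 9/10.
wins = [[0, 8, 9],
        [2, 0, 7],
        [1, 3, 0]]
strengths = bradley_terry(wins)
print(strengths)  # ranking should come out A > B > C
```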

>>16092049
Sort of, it is definitely in the realm, but that's because most of statistics, if it's being used for real decisions, is Operations Research.

Anonymous No. 16092066

>>16072406
the central limit theorem states that everything is just a normal distribution

Anonymous No. 16092069

>>16092029
>research into using over-parametrization or large deviations to statistically characterize ML based classifiers/regression systems
Haven't been keeping up with ML research since graduating, got a link for that?

Although to keep in the spirit of 4chan, I'll go ahead and give my uninformed opinion anyway: characterizations based on numeric thresholds ("too many" degrees of freedom, residuals "too large") only make sense in the context of a specific problem domain and dataset (or better yet, population), they're not properties of the model used. So I'd object if by "ML system" you meant the classification/regression model alone, but not if you meant it to encompass both the model and the underlying population that it aims to depict quantitatively.

Anonymous No. 16092077

>>16092054
> But that's because most of statistics, if it's being used for real decisions, is operations research.

I think that's a bit of an overgeneralization that reflects your particular bias from your experience within your field. Every field has their particular biases and assumptions, and not every real usage of statistical decision making is properly understood to be OR, just like I wouldn't call every application of least squares regression "machine learning" or every application of Kalman filtering "target tracking."

Anonymous No. 16092085

>>16092077
You're right. I should say, statistics are necessary for OR. I think anytime you're making decisions you are operating within an OR framework. Decision Calculus IS OR. Statistics and models inform this, but the actual decision making is OR.

I also think barriers between these fields are completely arbitrary and if you're thinking holistically in any way, it all blends together.

But if we're being anal, the analysis and the methods of analyzing the data are statistics; USING it, making decisions based off of it, IS OR. You can't do OR without statistics... Or well, you sort of can, but it's just arbitrary-ish decisions at that point. I also think the highest value of statistics comes when it's framed around what is needed to make decisions, but you can also just query data and examine it and its relationships without that necessarily in mind.

Anonymous No. 16092086

>>16092069
The best paper I've found on overparam is from Hastie at Stanford.

https://arxiv.org/abs/1903.08560

Basically, they've found that if you keep the ratio of the number of parameters in an ML model to the number of training samples constant, you can achieve nearly zero training error without skyrocketing test (cross-validation) error as the number of training samples diverges.

This is still an area of active research, but it seems like as you allow models to grow in complexity, one of the golden rules, "zero training error means over-fitting," breaks down, and I don't think we've got a solid answer as to why yet.

The large deviations paper I was referring to was:

https://ieeexplore.ieee.org/document/10008020

These NATO guys are basically developing classical hypothesis testing systems with predictable error exponents using probabilistic classifiers. They use neural networks for this process, but in principle this could work with any stationary probabilistic classifiers so long as it has a continuous everywhere MGF that's finite on the reals. Using large deviations theory to asymptotically characterize these "black box" probabilistic classifiers is a neat idea, I'm just not sure how well it will work on more complex systems.

I think that these two papers are in themselves interesting, but what it says to me is that we're really only scratching the surface in the world of ML explainability and characterization of these ML systems.

Anonymous No. 16092087

>>16092086
We do have solid answers for this. It's actually demonstrable in simpler models such as spline models for regression. You can make a very well fitted model using splines, even on noisy data (splines can have as many or more coefficients than data points).

Introduction to Statistical Learning with R has a good chapter on this, using splines as a demonstration of the effect.
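For what it's worth, the "as many coefficients as data points, still fits noisy data exactly" property is easy to see in code. A minimal sketch assuming scipy, with made-up data (this shows the interpolation vs. smoothing trade-off, not the ISLR double-descent experiment itself):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
y_true = np.sin(2 * np.pi * x)
y = y_true + rng.normal(scale=0.3, size=x.size)

# s=0 forces interpolation: as many effective coefficients as data
# points, zero training error by construction, even on noisy data.
interp = UnivariateSpline(x, y, s=0)

# A positive smoothing parameter (here ~ n * sigma^2) instead trades
# training fit for smoothness.
smooth = UnivariateSpline(x, y, s=2.7)

train_err_interp = np.mean((interp(x) - y) ** 2)
train_err_smooth = np.mean((smooth(x) - y) ** 2)
print(train_err_interp, train_err_smooth)
```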

Anonymous No. 16092093

>>16092085
> I also think the barriers between these fields are completely arbitrary and if you're thinking holistically in any way, it all blends together.

That's true to a certain extent. They all rely on the same mathematical underpinnings and often use the same processes but for different aims.

For example, one of the most commonly used textbooks for engineering non-linear programming classes is Nocedal's Numerical Optimization, which was originally intended as an optimization for OR textbook. Similarly, a lot of modern portfolio theory relies on information theoretic multi-user channel models originally developed for digital communication systems.

It's not really an anal thing, it just seems like kind of a silly perspective to have. Decision theory and factor graphs/decision trees/network flows are used for a lot of systems where decision making is important. OR just happens to be one of them.

Anonymous No. 16092096

>>16092087
Well, yeah, you can use splining or interpolation to have very well fit models. The problem is that generally your fitting to training data/testing data comes at the cost of over-fitting/over-confident models (meaning your cross validation will be crap because you've essentially made a much higher dimensional model than necessary).

This over-parametrization paper (written by the same guy who wrote ISL btw) is specifically looking at parametric regression systems, not splines or piece-wise continuous mixtures.

Anonymous No. 16092098

>>16092096
Higher order model than necessary* (or higher frequency if you feel like thinking in terms of equivalent power spectral density assuming your noise isn't band-limited).

Anonymous No. 16092109

>>16092093
No, you don't understand. I think decision trees and decision theory ARE OR. It might not be what academics refer to as OR, but academics are meaningless here, as OR is a government-developed and funded field.

The only real operations research analysts are in government or working for like airlines. Decision theory is encompassed within the field of OR in this context.

Again, it's muddy because academia has gotten very anal about these distinctions despite functionally everyone doing the same work. Disney hires decision scientists in their OPERATIONS division where they used to hire OR analysts out of college. They're all doing the same thing: queuing theory, decision analysis, prioritization, etc.

Like academics typically constrain the field to programming and optimization theories, but if you look at how the government approaches the field (again the group that CREATED the field) it's much wider in scope.

I think we just have different definitions of the field is what I'm trying to say. Like, industrial engineering, OR, and decision analysis are basically the same fields with every single organization having a strange definition for them because no one wants to enforce real standards (and it also enables buzzwords).

>>16092096
No dude, you can do splining WITHOUT over fitting. Literally look into it. It's actually exactly why we use splines for non-noisy data and the exact same phenomenon occurs with noisy data.

I know exactly what you are talking about, read ISLR because that section is very illuminating especially when you realize neural networks are special types of splines with a function of function framework. It's also why they have that stupid theory that's just expansion for deep or wide networks.

Anonymous No. 16092116

>>16092109
> I think decision theory IS OR.

Can you explain to me how sensor network prioritization is OR? What about multi-bank signal processing where you use a decision tree or gating process to determine which filters should be included in your fused model? Is that OR in your view?

When I think OR, I think routing of people, delegation of tasks, and decisions regarding usage of capital. That's definitely an important topic that is a much broader field than OR departments in a school might be, but it certainly isn't encompassing all of what is important for decision theory.

A lot of the practical technology you use today (from multi-user wireless networks, to GPS systems and multi-filter medical imaging) use these decision theoretic systems and are pretty outside of the scope of what I'd consider OR.

> You can do splining without over-fitting.

If you use AIC/BIC you can do splining in some circumstances without over-fitting. The over-parametrization problem I was referring to is specifically relating to continuous and differentiable everywhere parametric regression systems (so not really splining).

Splining is also pretty inflexible in the sense that it becomes very ambiguous in higher-dimensional spaces, where these over-parameterized systems tend to be used. I guess you can call basis interpolation models and support vector models higher-dimensional splining (and some people do call basis interpolation systems like MARS multivariate splining), but it's definitely less applicable as an approach when you are dealing with high-dimensional systems.

Anonymous No. 16092133

>>16092116
Yes it is. The groups doing that within the government are largely 1515s with support of network engineers but the ACTUAL decision tree is being built based off OR principles.

I'm sorry it's really clear you don't know what you're talking about and haven't experienced real OR.

You are really bad at generalized thinking.

Yes splines become ambiguous at higher dimensions, that's not the point. The same exact phenomenon exists with splines and splines are differentiable EVERYWHERE, just not infinitely differentiable. But regardless none of this matters.

This 'special property' in neural networks show up EXACTLY in splines. i.e., more parameters, negative degrees of freedom leading to LOWERED prediction error.

Because you're a frustrating human being, I'll give you the exact pages to read on this SAME phenomenon with splines in ISLR (a.k.a. the double descent): Section 10.8, page 439. This is a free textbook, by the way.

What I'm saying isn't that splines are exactly neural networks (neural networks are, after all, functions of functions), but the phenomenon exists with splines, and it goes back to exactly why we use splines in the first place as opposed to Taylor expanding discrete functions.

Anonymous No. 16092138

>>16092116
Again, we have different definitions of OR, you are looking at OR as a process or field focused on a very specific subsection of organizational processes, but that's not what it is. There's a reason operation is in the word.

They're using the decision theoritic systems to automate the Operations of these systems. Like it's not a complicated concept. They use math in these systems as well and computer programming, sure the engineering of a system encompasses many fields, but that doesn't make the construction of those fields suddenly disappear.

To put in perspective, I don't think decision theory exists on its own. It exists within an operational context. So every related task becomes OR.

My general definition for OR, that is exactly how I see operations research analysts used across government (on GPS systems as well) is just general optimization of any decision. Whether this is optimizing and modeling networks with network engineers or helping leaders make decisions. You see them pop up in all of these contexts. The 1515/15A is a very broad field.

Anonymous No. 16092141

>>16092086
Thanks for sharing, I agree with you that Hastie would know better than to write a paper that merely recapitulates splining (unlike the NATO paper, where I can't reject the [math]\mathcal{H}_0[/math] that they're just recapitulating the asymptotic inference theory that I briefly saw in econometrics, before turning to more applied work).
Though I'm surprised that reproducing kernel Hilbert spaces were only briefly mentioned and never brought up again; it seems like the natural avenue to pursue given that their models are linear. But maybe they've a reason for that, and having neglected theoretical statistics for a while, I'd need more time to digest their paper before passing judgement.

Anonymous No. 16092146

>>16092133
> The groups doing this within ... Regardless the ACTUAL decision tree is based off of OR principles.

1) This isn't true unless you consider the entire concept of a hypothesis test or optimization against a loss function to be within the scope of OR. If you do mean to imply that OR contains all optimization of a statistical system against some shared loss function, I'd recommend you spend less time in your business-oriented bubble and actually try to understand what other people in these areas have done and their influences.

Most of the original work on this stuff was done by Bell Labs guys in the 40's for the early stages of digital communications for telephone networks. Graph models for economics purposes came later.

The military didn't start using these statistical decision structure systems for personnel management until after the fact, with Nash's theory of non-cooperative games, way later than the development of multi-user network optimization for engineering contexts.

2) Splines are not in general differentiable at their intersection points. There are ways to try and ensure differentiability, but in general it is not the case that they are differentiable everywhere, even if you use something like a cubic spline, which is smoother than a linear/hinge spline.

3) The author of the over-parametrization paper is the PI of ISL. Have you ever considered that he would probably not have dedicated a good chunk of his life researching this over-parametrization problem if it was already solved by splines, covered within a textbook he played no small part in writing?

4) The over-parametrization paper doesn't solely rely on neural networks. They do demonstrate that the same effects are observable on single layer neural networks as the number of neurons grows and that these non-linear NN models are more flexible, but the primary development in that paper is on a linear basis model (before extending it to the non-linear single layer NN).

Anonymous No. 16092154

>>16092138
Wait, so do you think the entirety of hypothesis testing is an OR concept?

Would you consider, as an example, the Neyman-Pearson criterion to be something that belongs to operations research?

What about Fisher information and the MVUE/minimum-MSE distortion criterion for population selection?

You realize that both of these concepts significantly predate the development of operations research as a proper field following WW2? You might as well claim that all of applied probability theory from Laplace onwards belongs to operations research at that point.

Image not available

320x180

mqdefault.jpg

Anonymous No. 16092241

>>16090903
Sorry for the LinkedIn speak. It means I studied business, took 4+ data/statistics-related courses, and learned the rest through experience.
>>16091056
thanks for this, picking it up now

Anonymous No. 16092724

>>16072199
What's your response to those who think statistics is neither mathematics nor a branch of mathematics?
Even universities consider them separate. Mine has a faculty of mathematics but three separate departments: mathematics, statistics and mechanics.

Anonymous No. 16092774

>>16092724
Statistics isn't a branch of mathematics, but it applies mathematics to solve problems.

Just like I wouldn't call theoretical physics or optimization mathematics, I wouldn't call statistics mathematics despite them all being mathematically involved fields.

Anonymous No. 16092931

I'm a plug-and-chugger trying to get this.
Can you use correlation as a probability? Like, if there was a correlation of .9 between being black and stealing, is there a 90% chance a random negro is a thief?

Anonymous No. 16093157

>>16092054
Which electoral system is it equivalent to? Some tasks serve multiple purposes, but most voters don't understand the skill or artefact well enough to see where it can be used again.

Anonymous No. 16093258

>>16092931
I would have to see the full correlation table to say something about it. Is it normalized to 1?

Anonymous No. 16093509

>>16092931
No, the formula for correlation between X and Y is
[eqn]\frac{ E[X \cdot Y] - E[X] \cdot E[Y] }{ \sqrt{ \left( E[X \cdot X] - E[X] \cdot E[X] \right) \cdot \left( E[Y \cdot Y] - E[Y] \cdot E[Y] \right) } }[/eqn]
where the E stands for mathematical expectation. This can only be applied to numerical data, so when X = being black and Y= stealing, you need to encode them into something like "X = 1 if you are black, and X = 0 otherwise". Then you can confirm that E[X] = P(X=1), and so the formula for correlation simplifies to
[eqn] \frac{ P(X = 1 \text{ AND } Y = 1) - P(X = 1) \cdot P(Y= 1) }{ \sqrt{ P(X = 1) \cdot P(X = 0) \cdot P(Y = 1) \cdot P(Y = 0) }}[/eqn]
which is still not pretty, but about as good as it gets.

>90% chance a random negro is a thief?
This would be equal to the proportion of all negroes who are also thieves, which is also equal to the number of all negro thieves divided by the total number of negroes. Now, divide both the numerator and denominator by the total population of interest, and you get a ratio of two probabilities
[eqn] \frac {P( X = 1 \text{ AND } Y = 1)}{ P( X = 1)} [/eqn]
So the two are similar, but not the same. Notice that correlation is symmetric between X and Y, while "90% chance a random X is a Y" is not equivalent to "90% chance a random Y is a X".
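To make the distinction concrete, here's a small simulation (hypothetical parameters, not real data) showing that the correlation of two binary variables and the conditional probability P(Y=1 | X=1) come out as different numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated binary variables: X ~ Bernoulli(0.3), and Y copies X
# with probability 0.8, otherwise Y is an independent Bernoulli(0.3).
n = 100_000
x = (rng.random(n) < 0.3).astype(float)
copy = rng.random(n) < 0.8
y = np.where(copy, x, (rng.random(n) < 0.3).astype(float))

corr = np.corrcoef(x, y)[0, 1]      # should come out near 0.8
p_y_given_x = y[x == 1].mean()      # P(Y=1 | X=1), near 0.86 here

print(f"correlation   = {corr:.3f}")
print(f"P(Y=1 | X=1)  = {p_y_given_x:.3f}")
# The two numbers differ: correlation is symmetric in X and Y and can
# be negative; the conditional probability is neither.
```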

Anonymous No. 16093522

>>16092931
>>16093509
Another difference between the two: correlations can be negative, but conditional probabilities cannot be.

(I assume your use of a taboo example is to maximize the probability of getting a response written by a human rather than generated by AI. This is a perfectly rational response, and we'll probably see people getting more racist on the Internet as a result.)

Anonymous No. 16093572

>>16093509
This was actually pretty good.

Anonymous No. 16093620

I'm looking for a good textbook on generating permutations with constraints, preferably with some algorithms for both approximation schemes and exact solutions. Any recommendations?

Image not available

1038x1503

datahazard.jpg

Anonymous No. 16093629

Anonymous No. 16093779

>>16093620
What do you mean by permutations? Do you mean generating random samples within a region or actual permutations in the technical sense?

Anonymous No. 16093802

>>16092724
I agree with them. Statistics is a hybrid field which uses mathematics to solve some problems. But it isn't a subfield of probability because it also overlaps too much with numerical linear algebra and machine learning. It's really its own thing.

Anonymous No. 16093981

>>16093629
Really makes you think

[Image: IMG_20240324_0336....jpg (1280x796)]

Anonymous No. 16094014

>>16079706
Draw a bell curve. Randomly generate a number for the x axis where the probability for any number matches the curve. Once you have your number, erase the part of the curve that is drawn after your number. That's what being born in the 20th century looks like.

Anonymous No. 16094186

>>16092774
>>16093802
I disagree.
What separates mathematics from the other sciences is rigor and independence of reality. Statistics is both rigorous and independent of reality. Every theorem in statistics has a proper mathematical proof and all of its truths depend only on its axioms.
Statistics is pretty much just the calculus and algebra of random variables + probability theory, combinatorics, logic and friends.

Anonymous No. 16094397

>>16094186
So would you consider the theory of computation side of computer science mathematics? What about theoretical physics?

Both of those are axiomatic and independent of reality. Hamilton's principle of least action, as an example, is an axiomatic statement about a theory's modeling assumptions, not something empirical.

Similarly, computational automata and the theory of computation are axiomatically constructed, and all of computability reduces to algebraic axioms.

Do those count as mathematics in your view? Does something need to be explicitly empirical for it to be not mathematics?

Anonymous No. 16094421

>>16094397
I guess we can all agree that math has to be rigorous and independent of reality.
What other criteria would you add that would exclude theoretical physics, computation theory and statistics from mathematics but still include what is undoubtedly mathematics?

Anonymous No. 16094469

>>16094421
I don't really have a perfect answer, but my gut feeling is that it has to do with the "intention" of the pursuit.

For example, what I work with (statistical signal processing) works on entirely abstractly constructed signals with the aim of being an approximation of how manipulation on an empirical signal would work. There are no properly linear systems or properly time-invariant systems in reality, but the abstraction of LTI systems allows us to construct a rigorous axiomatic framework where we can solve problems that will well approximate reality.

If you are just looking at the mathematics itself, most of signal processing would be mathematics rather than an engineering field. Yet it's done in an engineering department rather than mathematics, because the whole purpose of the axiomatic framework is to operationalize and solve problems in the real world.

Statistics is much the same way. The construction is axiomatic, there are theoretical underpinnings and assumptions (e.g., CRLB for MVUE parameter estimation), but the intention is to provide usable tools for answering problems in reality.

Anonymous No. 16094532

How bullshit are stock projections?

Anonymous No. 16094543

>>16094532
Depends on your input model and how many of them you are doing. I have done work on neural nets and stock predictions. They were ass.

Anonymous No. 16094549

>>16094543
I have no idea what goes into the Microsoft one. If the mob is trusting it, maybe it does work, since they will dump money into whatever the model says is good?

Anonymous No. 16094591

>>16093779
I mean actual permutations in the purely technical sense right now. Pure math shit, though they'll form a statistical and probability assessment in the next step.

Anonymous No. 16094752

>>16094591
Okay, the best book I'm aware of that might be useful is Simulation by Sheldon Ross. It covers procedures for generation of discrete random variables, which you could use to generate permutations of your (I'm assuming discrete) N-dimensional random variable.

Depending on what the constraints are, you could do rejection sampling where you generate via a simple mechanism some excessive number of samples and then keep the first M of them that meet your constraints. If that won't work for your application, you might need to get creative with transformations of your random variable (if it's even possible to generate them explicitly while meeting your constraints).
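A minimal sketch of the rejection idea, assuming a hypothetical derangement-style constraint (your actual constraints will differ):

```python
import random

def constraint(p):
    # Hypothetical constraint for illustration: a derangement
    # (no element stays at its own index).
    return all(v != i for i, v in enumerate(p))

def sample_constrained(n, m, max_tries=100_000):
    """Shuffle uniformly, keep the first m permutations that pass the constraint."""
    out, items = [], list(range(n))
    for _ in range(max_tries):
        p = items[:]
        random.shuffle(p)
        if constraint(p):
            out.append(tuple(p))
            if len(out) == m:
                break
    return out

samples = sample_constrained(6, 10)
```

This is only efficient when the acceptance rate is reasonable (for derangements it's about 1/e); very tight constraints call for smarter constructions.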

Anonymous No. 16094754

>>16094591
Another option that I just thought of that could have been included here >>16094752

You could just make an explicit list of all of the permutations of your object (again assuming that it has finite dimensionality, and the notion of "all of them" makes sense for your application) and then generate your permutations by first removing entries which don't meet your constraints and then drawing a random index into your filtered list.
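The enumerate-then-filter approach in miniature, with a made-up toy constraint:

```python
import random
from itertools import permutations

def constraint(p):
    # Hypothetical toy constraint: first element smaller than the last.
    return p[0] < p[-1]

# Explicit list of every permutation meeting the constraint (only viable for small n).
valid = [p for p in permutations(range(5)) if constraint(p)]

# A draw is then just a uniform random index into the filtered list.
draw = valid[random.randrange(len(valid))]
```

The upfront cost is O(n!), so this only works when the permutation space is small enough to enumerate.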

Anonymous No. 16094896

>>16094752
Appreciated, I'm looking through it now.

That said, this isn't really a statistics topic (though it is a probability topic), so thanks for the help.

>>16094754
I'm actually using such an algorithm now in a monte-carlo style simulation. The issue I'm running into is that my constraints are often such that an exact solution can complete faster than the approximations approach converges, and I'm not sure how to examine my constraints beforehand in a coherent way to tell the difference (or even how to optimize the exact algorithm).

Anonymous No. 16094918

>>16094186
There is more to statistics than theorems of inferential statistics. Figuring out a way to colour-code data to best visualise it is statistics. Pie charts are statistics. Not everything in statistics is proofs. In fact, a lot of Fisher's fundamental work on it used only very basic math.

Anonymous No. 16095263

>>16094549
Microsoft has a stock picking/prediction model that is free? seriously

Anonymous No. 16095339

>>16095263
You shouldn't be surprised by this, once you think about it from their perspective.
As a big player, you'll want to encourage speculation and manufacture as much hype as possible, since a more volatile market makes leverage easier for you (the option gurus even have a mathematical theorem for this, something along the lines of relating convexity and payoffs). This means getting more people to put their money into the market, so that you can ride on the waves they generate.

[Image: bd73385c0e218a20b....jpg (639x499)]

Anonymous No. 16095528

is this true

Anonymous No. 16095652

>>16095528
Is violence an extensive or intensive variable?

Anonymous No. 16095658

>>16095652
kek

Anonymous No. 16095688

>>16095339
I don't disagree, I have wondered a lot about blackrock, WSB and how retail fits into that picture. More volume means more volatility which means you have big waves to ride for profits.

Anonymous No. 16096507

>>16093629
That's distorting the issue. White people may only be responsible for 3% of murders of blacks and 6% of murders of asians, but they are responsible for 100% of the hate crime murders of those groups, basically by definition.

Anonymous No. 16096529

>>16094014
That doesn't make any sense. What do the axes of the bell curve correspond to, and what does that have to do with inferring the total number of humans to ever live from the present day value?

Anonymous No. 16096673

>>16096507
>basically by definition.

Are you being serious now, or is this just you trying to troll a bit?

[Image: Abcdo.png (460x920)]

Anonymous No. 16096722

[Image: giphy.gif (400x234)]

Anonymous No. 16096734

>>16096722

Anonymous No. 16096742

>>16096673
Huh? You need to educate yourself about racism

[Image: cancer.png (456x547)]

Anonymous No. 16096746

>>16096742

Anonymous No. 16096775

It's over statistically speaking
The probability of success is reaching 0

Anonymous No. 16096978

>>16096529
If we put you into a closed cubicle and you could only see behind you, it would be improbable for you to look back and see only the wall. Without access to other information, you are forced to make a statistical estimate of the number of people in the room.

Anonymous No. 16096998

>>16096529
>>16096978

The peak of the bell curve is the point of highest probability. You are more likely to have an IQ of 100 than any other single value, for example. And if a population had peaks and valleys, you would be most likely, per unit of time, to be alive during the population peaks.

If you can collect additional data about the history of your species' development, you could make probabilistic guesses about where things might be going.

If we were a boom bust species with multiple population peaks, what would be the chance of you being born on the FIRST one?

If our population leveled off and achieved a steady state, what are the chances you would be born right at the BEGINNING of the long marathon?

If our population was a single boom and bust burnout, wouldn't you be born somewhere in the middle of the bell curve, MOST LIKELY?

Considering these things, how might our known population statistics combined with other contextual cues (nuclear weapons? climate change?) affect your estimation?

Anonymous No. 16097044

>>16096998
We have had multiple population peaks and valleys, this peak just happens to be the one that's been the highest so far.

During the Black Death, anywhere between 75 and 200 million people died, at a time when population estimates suggest there were fewer than a billion people on Earth in total. That's a huge local maximum before a massive bust.

Before that there was a massive population maximum during the height of the Roman Empire, followed by a massive population collapse between 475 and 550 A.D. from which it took well over 300 years to fully recover. Similarly, in India and China their entire history is one of dynastic population boom and bust cycles.

You're confusing the current peak being the highest we've seen so far for it being THE peak.

Anonymous No. 16097071

>>16097044
If I was born outside of this one, I would likely be born in one of the previous peaks of lesser magnitude. But the largest one dominates the bell curve. And if there was a future, bigger peak, it's surprising I wasn't born on that one. Or the next bigger one. Or the next one.

And each one seems more and more implausible, because such numbers would dominate the probabilities to such an extent that I would certainly be among them.

If this isn't the biggest, it can't get much bigger.

Anonymous No. 16097088

>>16097071
Your line of thinking is flawed.

If you were to try to model the process of population to determine the most probable time for a person to be born, you'd be looking over intervals in time for the birth/death process. The time that you'd be most likely to be born is the interval on which the cumulative sum process of the population was rising with the largest drift, not the timeframe where the population was the highest. The period of time in which you are the most likely to live is during the interval following the fastest population rise, not the timeframe when the raw number is the highest. Look into birth-death processes if you want more information.

The period of time in which the highest population occurs could happen during a stretch in which the drift of the CUSUM process is positive but nearly zero over a long period of time. If you were to then compare that to a shorter interval where the CUSUM drift is higher, you could very well have a higher probability of being born in the shorter one.

Consider, for example, a time period of 150 years where the birth rate is roughly 1.01 births per death per annum, i.e. births exceed deaths by about 1% each year. If we compare this to a 20-year period with 2.5 births per death per annum, you'd find that for any given interval of time your probability of being born during the higher-drift period would be much higher.
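A toy version of that comparison (all numbers are illustrative, not demographic estimates): since a uniformly random birth lands in an interval with probability proportional to the births in it, the short high-birth-rate stretch wins on a per-year basis.

```python
# Hypothetical birth-death process: 150 slow years, then 20 boom years.
pop = 1000.0

births_slow = 0.0
for _ in range(150):            # 150 years: births barely exceed deaths
    deaths = 0.02 * pop
    births = 1.01 * deaths
    births_slow += births
    pop += births - deaths

births_fast = 0.0
for _ in range(20):             # 20 years: 2.5 births per death
    deaths = 0.02 * pop
    births = 2.5 * deaths
    births_fast += births
    pop += births - deaths

# Per-year chance of a random birth landing in each regime
per_year_slow = births_slow / 150
per_year_fast = births_fast / 20
print(per_year_slow, per_year_fast)
```

With these made-up rates the boom period produces several times more births per year, even though the slow period lasts much longer.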

Anonymous No. 16097173

>>16097088
Nigga your parents were called "the baby boomers" and your still not shitting your pants

Anonymous No. 16097174

>>16097173
And the baby boomers had fewer children per capita than their parents.

If you are alive now, you were more likely to have been born during a time when birthrates were higher than when birthrates were lower. It's so definitionally straightforward that it's basically a tautology.

Anonymous No. 16097186

>>16097174
The number to watch is total births per year.

> The number of births in the US has been steadily decreasing since 1990, with about 3.66 million babies born in 2021.

Great, the year I was born. I am literally dead center of the bell curve for the country I was born in.

Anonymous No. 16097195

>>16097186
Yes, and yet if you look at the proportions of the population, baby boomers are still about 20.6% of the population despite over half of them being above the median age of death for a US person. This should tell you how absurdly big the baby boom was from a population perspective.

Anyways, the point I was making is that if you want to make statements about the probability of a total population number decreasing, what you need is a measurement on the average statistics of the CUSUM birth/death process.

At that point you are looking at P(max(sum X_n) <= T) where T is some threshold. This will depend on the X0 number (total population at the start of your trials), the average drift (birth rate and death rate), and the stopping threshold T. A peak X0 with a consistently declining birth rate (like we see in the west) would indicate there is some T not much bigger than our own where that P is close to 1. Globally, it doesn't seem like population growth is slowing that much, and in many cases this population growth in other parts of the world directly leads to western nation population growth via migration (regardless of what you think the political/social ramifications of this are).
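A rough Monte Carlo sketch of that probability, assuming IID Gaussian increments and made-up parameter values:

```python
import numpy as np

def p_max_below(drift, sigma, x0, T, steps=200, trials=5_000, seed=7):
    """Estimate P(max_n (x0 + sum X_n) <= T) for IID Gaussian increments."""
    rng = np.random.default_rng(seed)
    inc = rng.normal(drift, sigma, size=(trials, steps))
    paths = x0 + np.cumsum(inc, axis=1)
    return (paths.max(axis=1) <= T).mean()

# Illustrative parameters only: a declining vs a growing process.
p_declining = p_max_below(drift=-0.5, sigma=1.0, x0=0.0, T=5.0)
p_growing = p_max_below(drift=+0.5, sigma=1.0, x0=0.0, T=5.0)
print(p_declining, p_growing)
```

With negative drift the running maximum rarely crosses the threshold; with positive drift it almost surely does, which is the qualitative point about how drift dominates the stopping behavior.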

Anonymous No. 16097286

>>16096529
The bell curve argument seems to be about the future population being lower. I don't see it predicting extinction, around which population can't be treated as a continuous variable. If there is a distribution involved in getting the 1.2 trillion, it's a uniform distribution on 0 to 1.2 trillion. The 1.2 trillion was chosen so that drawing <60 billion is unlikely enough.

[Image: 1660457376104973.jpg (1200x1372)]

🗑️ Anonymous No. 16097427

Anonymous No. 16097495

>>16097427
I love pie charts.

Anonymous No. 16098506

I actually like the bell curve population discussion. It really makes the old noggin start joggin.

Anonymous No. 16098557

>>16072199
How do I get into probability and statistics? I'm just about to finish my joint math degree and I've realised I've done absolutely 0 stats. How do I learn it myself and show that I know it to an employer?

Anonymous No. 16098595

>>16098557
By doing projects and being able to codemonkey most problems. Start off with some stochastic processes and then apply what you learnt. You will be astounded at how beautiful even applied statistics and its visuals can be.

Anonymous No. 16098608

>>16098557
1. Whatever math your field is joint with, look up what its leading researchers are doing with "data" (use this keyword as a search filter).
2. Use your domain knowledge in that field to understand what they're doing, and more importantly why they choose to analyze their data in the way they did. In particular, look out for points where you disagree with them, or at least think that could be done better. For example, if they're fitting data to a mathematical model, try criticizing some of their assumptions. If they're avoiding mathematical modelling entirely, then be the first to build one (since you don't need to hold yourself to their standards).
3. Learn what you need to implement your idea. This should include a programming language like R or Python, but depending on the sophistication of your model, it may also involve a bit of statistical theory, which you can learn as needed. Since your goal is to show that you know stats, you should put extra effort into the analysis and interpretation of your results.
4. Write it all up in some kind of notebook (e.g. Colab), put it on Github, and give the link to employers. But don't expect them to review or even look at it, unless they're also in the same field.

Anonymous No. 16098614

>>16098608
I look at projects more than I look at the resume.

Anonymous No. 16099251

What societies do you think statistically will make it to 2100?

Anonymous No. 16099670

>>16098595
>>16098608
>>16098614
Thank you anons.

Anonymous No. 16099841

>>16099670
If you want to know anything, just ask.

Anonymous No. 16100157

>>16099841
What's the conditional probability of your mother being a whore given my countable sequence of observations of her sucking dicks in the 7/11 parking lot?

Anonymous No. 16100185

>>16100157
How many counts do you have on how many days? Also where is the money for my tendies. Stingy bitch.

Anonymous No. 16100266

>>16089366
it's an interesting read; there's not much useful in it except the main message, which is that fat-tailed risks exist and aren't getting enough attention

>>16089452
I think he was more referring to financial modelling where a large chunk of the field is based on stochastic calculus (and hence the wiener process/normal distribution) and is used everywhere

Anonymous No. 16100277

>>16100266
>stochastic calculus and is used everywhere
Do they really? I always thought they just pay lip service to the theory to give themselves a veneer of "scientific credibility", while their actual decision-making process is to throw all their data into an AI/ML black box and find some way to justify the output it spits out (which can also be done with generative AI nowadays).

[Image: 978-1-4612-4946-7....jpg (306x438)]

Anonymous No. 16100423

>>16084494
this one is pretty good

Anonymous No. 16100425

>>16100277
> Do they really?

I can't tell you exactly what they do now, but I know that when I was learning about Ito differential equations, most of the people in my SDE course were actuaries and math PhDs. I was the only one learning them for applications to engineering (continuous-time discrete-prediction filters like the Kalman-Bucy filter and CKF are hugely important in state estimation).

Anonymous No. 16100430

>>16100185
I don't know exactly how many counts, but I go there to get a diet Coke, a pack of marbs, and a taquito on my lunch break pretty much every day, and she's always there. I'm honestly not sure that her position in life is that much more shameful than mine as an engineering drone, but I'm trying to figure out whether she's doing it professionally or as a hobby.

Either way, best of luck getting your tendies money.

Anonymous No. 16100699

>>16100425
Oh, I don't doubt that academics are still teaching SDEs in their mathematical finance classes. What I very much doubt is that practitioners still believe in them and are willing to place very large bets that rest on assumptions like the geometric Brownian motion of stock prices, just as criticized by Taleb.

Anonymous No. 16100748

>>16100430
Anon, I am the one pimping her out. Thanks for ratting her out so I know she is not cheating me out of my tendies money.

Anonymous No. 16100749

>>16100699
What was the substance in Taleb's critique of SDEs? Because I have to admit I don't remember.

Anonymous No. 16100757

>>16100423
Thank you anon! Looks sound.

Anonymous No. 16100773

>>16100749
>Taleb's critique
Not SDEs, but Brownian motion, because that generates normally distributed movements and Gaussians bad.
I don't remember much of the details either, but I do recall a section where Taleb praises Mandelbrot's scale-invariant (i.e. fractal) financial models as being comparatively more realistic.

Anonymous No. 16100790

>>16100773
I have to re-read the black swan. I read it when I didn't know the amount of math I do now and probably couldn't fully understand what the leb was talking about.

Anonymous No. 16100834

>>16085967
i would love that chart when you're finished with it anon
disc: stillnaut

Anonymous No. 16100843

>>16100834
I think it will probably be published in a /psg/

[Image: Analysis of Econo....pdf (1x1)]

Anonymous No. 16101132

>>16085967
>probability/statistics textbooks that don't require an undergrad calc sequence/linear algebra
I guess it's just about possible to explain continuous random variables, CDFs, and the significance of z-/t-statistics (pun intended) without using calculus. But avoiding linear algebra means avoiding the Jacobian matrix, and hence the transformation of multiple random variables. Without this, I'm not sure you can be said to be teaching probability/statistical theory at all.

But if you're OK with skipping that in favor of covering more applied content, you could look into subjects like econometrics, where textbooks for beginners make a deliberate effort to avoid matrices and derivatives. Wooldridge's "Introductory Econometrics" is the standard recommendation, but I personally find something like PDFrelated to be friendlier to newcomers.

[Image: Untitled.jpg (2050x1650)]

Anonymous No. 16101621

Hello, I am a visitor from /fit/ and I need some help...
I maintain a custom Excel spreadsheet of all of my running data to track my progress over time. I have several columns that calculate moving averages over 1, 2, and 3 weeks (7DA = 7 Day Average...) to cancel out noise. However, you may notice that I stopped in October due to an injury but picked back up in February. That gap has obviously fucked up the moving averages, since there's so few measurements (or even just 1) on some dates... and that can skew the result if those measurements are outliers (which then affects the color shading in Excel). It's mainly an issue in the "Weight" section, since body weight fluctuates so much, and 10/27 and 2/11 just happened to be low measurements, so they're screwing up the 14 and 21 day averages. I can adjust the Excel formula to not calculate an average if there aren't enough data points in the time window... but my question is: How many data points for a 7, 14, and 21 day moving average do I need for it to be statistically useful?
Thank you

Anonymous No. 16102044

>>16100773
that was actually the bit that really irritated me
I can accept that stock price movements may be vaguely Gaussian due to the CLT, and also that you're pointing out this is a big and invalid assumption, but to randomly suggest they're somehow fractal? Where is even the foggiest evidence or logic to support that?
it's literally just a reacharound he's giving to his best mate Mandelbrot

Anonymous No. 16102069

>>16102044
Modeling the random walk as Brownian motion is actually pretty reasonable. It's just probably not stationary Brownian motion because the drift and diffusion likely won't be time-invariant.

Gaussians aren't just nice because they are mathematically fairly simple. Assuming IID Gaussian samples is also the maximum entropy assumption: among distributions with the same support and the same first two moments, the Gaussian is the one that presumes the least additional information about the data. It's usually not that hard to estimate the drift/diffusion of the process via something like an ARMA model, and then you assume it's Gaussian to give yourself a "worst case."

Anonymous No. 16102220

>>16102069
Brownian motion is the limit of a random walk with infinitesimal steps.
That doesn't mean we can assume a random walk models stock prices; they actually follow very different non-stationary distributions which aren't just "different Gaussians". In fact there are several basic aspects that continuous processes like the Wiener process can't capture.

It works well some of the time and goes horribly wrong at other times. This is what Taleb was saying and I agree with it, but he also randomly mixed in some bollocks about fractals.

Anonymous No. 16102221

>>16101621
You can tell excel to "fill in the blanks" with interstitial values that will be calculated according to the way you specify.

[Image: benoit.jpg (658x1000)]

Anonymous No. 16102224

>>16102044
>where is even the foggiest evidence or logic to support that?
Mandelbrot literally wrote a book about this.

Anonymous No. 16102280

>>16101621
I recommend >>16102221. Just interpolate the data. It falls off in a few weeks anyways. The data is already meaningless because it lies outside the boundaries of the rest of it.
As for your question, there is probably some variation of the Nyquist criterion at play. I would guess that half the data points might be sufficient for approximation: a 7 day moving average can probably be approximated with 3-4 points, and the 21 day would require 10-11 points. Samples every other day is probably fine.
You may want to forgo moving averages altogether and opt for median filtering. Sampling rates should also reflect human body metrics. Weeks is a good choice, but instead of 1, 2, 3 go for 1, 3, 7 weeks. Alternatively, you could look at a seasonal sampling approach: a 1, 3, 7, 13 cycle, or pick a lunar calendar mod.
I can't say I remember running really having a noise component, but maybe that's because I never ran at some plateau level where diet or whatever else was affecting the outcome. Is this actually an issue? Are you in some competitive league?
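If the anon ever moves off Excel, the "no average without enough points" rule is just a rolling window with a minimum-count requirement; a hypothetical pandas sketch (the times are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical daily log of run times in minutes; NaN marks days with no run.
times = pd.Series([30.1, np.nan, 29.8, np.nan, np.nan, 29.5, 29.9, np.nan, 29.2, 29.0])

# 7-sample window, but only emit an average once at least 3 points are present.
ma7 = times.rolling(window=7, min_periods=3).mean()
```

Windows with fewer than 3 valid points come out as NaN instead of a misleading average, which is exactly the behavior the anon wants from the Excel formula.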

Anonymous No. 16102293

>>16102280
Finding the causal relationship in the /fit/ anon's data is going to be a bitch and a half.

Anonymous No. 16102394

>>16102220
I don't think that Brownian motion exactly matches the behavior of the stock market, because obviously the sampling distributions aren't stationary.

I think it is a pretty reasonable approximation for indexed portfolios rather than individual stocks, given that you regularly update your estimated drift/diffusion.

If you really want to include fat tails, you can just use a maximum entropy fourth order exponential family distribution and generate a limiting process that way. Obviously things are non-stationary and this is an approximation, but it's generally a reasonable one for "bundles of stocks"/mixed portfolios rather than individual prices.

If I'm wrong about this, could you let me know where? Obviously the process for an individual stock will be far less stationary, but if you're looking at plotting the walk of an index fund or something it will probably be pretty stationary in terms of moments.

Anonymous No. 16102414

>>16102394
What kind of approximation do you guesstimate would work for a single stock price?

[Image: dxh9.png (526x768)]

🗑️ Anonymous No. 16102469

is this data statistically significant?

Anonymous No. 16102475

>>16100266
wrt "fat tails": 30 years ago, value-at-risk models all basically modeled returns as normally distributed when usually they have fat tails
so they'd use a VaR model to calculate what impact, say, a 1% worst-case scenario would have on a portfolio, when in reality that worst-case scenario might happen 2 or 3% of the time
I believe The Black Swan is mostly about fat tails, just based on the title and what I've heard, but I haven't read it

he has a different criticism of Black-Scholes where he suggests there's a better way to price options
I watched part of a lecture where he derived his model from Bachelier's model (the same model that's the basis of Black-Scholes), and he even suggests that Bachelier's model in its unmodified state is in some ways better than Black-Scholes
but never does he take issue with stochastic calculus itself, that would be retarded
as far as I can tell, there's no clear distinction between "stochastic calculus" and higher-level statistics

>>16100277
>>16100699
>>16100773
what are you talking about?
nobody uses "stochastic calculus" to "pay lip service"; quants learn stochastic calc and SDEs because you need them if you want to deeply understand Black-Scholes, and especially any newer pricing models such as local volatility or stochastic vol
literally all options are priced using an SDE (Black-Scholes or newer models like I mentioned)
granted, you don't need to take a whole SDE course to derive Black-Scholes, but at the very least you need a statistical understanding of random variables and you need to know Ito's lemma

Anonymous No. 16102482

>>16100749
>>16100773
>Not SDEs, but Brownian motion, because that generates normally distributed movements and Gaussians bad.
First of all, I don't think any options pricing model since Bachelier's uses pure Brownian motion; even Black-Scholes doesn't (it uses geometric Brownian motion)
iirc what Taleb criticised about Brownian motion is that if future possible returns are assumed to be normally distributed, then the probability of the stock going down some amount is the same as the stock going up that amount
but this is not true because the stock can only fall to zero while the potential upside is infinite, so you need to weight the distribution such that your model for the distribution of future returns can't give you negative prices
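A quick simulation of that asymmetry (the volatility number is made up): a random walk on the raw price can go negative, while a walk on the log-price cannot.

```python
import numpy as np

rng = np.random.default_rng(3)

# 1000 one-year paths of daily shocks with a made-up 5% daily volatility.
shocks = rng.normal(0.0, 0.05, size=(1000, 252))

arithmetic = 1.0 + shocks.cumsum(axis=1)    # random walk on the price itself
geometric = np.exp(shocks.cumsum(axis=1))   # random walk on the log-price (GBM-style)

print((arithmetic <= 0).any(), (geometric <= 0).any())
```

Exponentiating the cumulative shocks keeps every path strictly positive by construction, which is the usual motivation for modeling log-returns.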

>>16102044
that guy literally has no clue what he's talking about, don't listen to him

Anonymous No. 16102491

>>16102221
>>16102280
Thanks, I don't really want to interpolate the data, but I might just do that... I'll also look into median filtering.
As for noisiness, yes running isn't really consistent. Some days I go harder, like try for a personal best, and some days I take it easy. Some days the weather is shit or I could just be tired. Some days I'm hopped up on espresso and just want to go. Some days a duck gets in the way. But, I should generally be getting faster over time, hence the moving averages. And no I'm not trying to find a causal relationship, just see in the colors that I'm getting faster...

[Image: selection bias.png (482x1959)]

Anonymous No. 16102493

>>16102469
>is this data statistically significant?
It probably is, but the real question you should be asking about health data is whether the selection bias is negligible or not.

Anonymous No. 16102514

>>16102414
Well, it would certainly be some kind of autoregressive process and could probably be locally approximated fairly well by some kind of martingale.

The problem with individual stocks is that they have a lot of variability, so things like the average change from sample to sample (drift), the width of the window of typical changes (diffusion), how central/symmetric these changes are (skewness) and how often "outliers" occur (kurtosis) are going to be very time dependent.

When you have a portfolio made of potentially thousands of individual stocks/bonds/derivatives/etc. you get a lot of smoothing that occurs naturally by the fact that the overall portfolio sum is adding these random process contributions. Adding of random variables is convolution, which can be thought of as a "low pass filter" of sorts, as high frequency low power contributions are ignored against the low frequency high power contributions.

An individual stock will depend on all of these unobservable things like internal dynamics of the company, dynamics of their competitors, dynamics of the contributing market to their products, etc. All of these become pretty much intractable to figure into and make any predictions you perform far less stable than a portfolio which is "smooth" by comparison.
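The smoothing is easy to see numerically; here's an illustrative sketch with synthetic heavy-tailed returns (all parameters made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# 1000 "stocks", 250 days of heavy-tailed (Student-t, df=3) daily returns.
returns = 0.01 * rng.standard_t(df=3, size=(1000, 250))

single = returns[0].cumsum()               # one stock's price walk
portfolio = returns.mean(axis=0).cumsum()  # equal-weight portfolio walk

# Day-to-day variability: the aggregate path is far smoother.
print(np.std(np.diff(single)), np.std(np.diff(portfolio)))
```

Averaging 1000 independent series shrinks the day-to-day noise by roughly a factor of sqrt(1000), which is the "low pass filter" effect described above.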

Anonymous No. 16102657

Market capitalization is price P multiplied by quantity Q. How is PdQ or QdP identifiable if we don't know whether high volume is a single share being sold many times or many shares being sold once?

Anonymous No. 16102992

>>16102657
What's PdQ and QdP?

Anonymous No. 16103114

>>16102657
Do you want to calculate volume backwards, or what is the thing you are trying to do? Because volume is often one of the givens when it comes to stock data.

[Image: analogy-to-gas-work.jpg (301x437)]

Anonymous No. 16103259

>>16102992
>>16103114
If I had more data on each transaction, namely the seller's realized capital gain or return on investment, I might end up with a better size estimate than market capitalization. Unlike the gas laws, the market equation of state is unknown.
PdQ and QdP are in analogy with picrel: P is price (analogous to pressure), Q is the number of shares (analogous to gas volume).

Anonymous No. 16103271

what's the point of transforming random variables? the professor spent all the time talking about how to do it and what to do when the density function isn't monotone but didn't say why to do it.

Anonymous No. 16103289

>>16082439
based

Anonymous No. 16103291

>>16085545
At least in the Python version, there are a lot of problems with Hastie's code just so you know. And they seem to put very little effort into maintaining it, which makes the exercises of dubious value.

Anonymous No. 16103305

>>16103259
oh you mean like infinitesimal change in Q with respect to time? wouldnt it be dQ/dt?
coming up with some "fair value" formula that accounts for velocity of shares (analogous to the velocity of money in quantity theory of money) might be an interesting idea, but idk why youd just flat out multiply PdQ or QdP
also I dont quite understand what you objection to market cap is in the first place or what you mean by market equation of state
market capitalization is *defined* as the asset price times quantity
and the price is price, its just whatever people decide to buy and sell an asset for
these are definitions, you cant improve on them

Image not available

1047x923

market-gas.jpg

Anonymous No. 16103427

>>16103305
The green line is the "equation of state". Maybe a supply or demand curve is better. Integration is over price or number of shares. Time is implicit or maybe even suspended for the purpose of this model. The red area is what it might cost to buy up all the shares, or what the current owners would get if they all had to sell.
Market capitalization would be equal to both of those if you can pretend that someone can provide unlimited liquidity for free.
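As a toy version of that picture, take a hypothetical linear "demand curve" where the achievable price slides down as more shares are dumped; the area under it (the red region) comes out below the naive last-price-times-float market cap:

```python
import numpy as np

# hypothetical linear curve: price slides from $50 to $30 as 1M shares sell
Q = np.linspace(0.0, 1_000_000.0, 10_001)   # cumulative shares sold
P = 50.0 - 20.0 * Q / Q[-1]                 # price at each point on the curve

# trapezoid rule for the red area, i.e. the integral of P dQ
proceeds = (0.5 * (P[:-1] + P[1:]) * np.diff(Q)).sum()

market_cap = P[0] * Q[-1]                   # quoted price x total shares
print(proceeds, market_cap)
```

Under these made-up numbers the sell-everything proceeds are the average price (40) times the float, 20% below the quoted-price market cap, which is the "unlimited free liquidity" gap the post describes.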

Anonymous No. 16103582

>>16102221
>>16102280
I did some experimentation with interpolation using the FORECAST.LINEAR function. It worked well for the bodyweight columns, since the progress actually was really linear. For my running times though, the injury really fucked it up. If I use forecasting to fill in missing data, the progress I was making last year makes it so the interpolated values are lower than they should be. I tried playing with the source data range to only include recent times, but it still wasn't great. It also fucks up at the beginning because progress comes rapidly at first and then tapers off... I might be stuck with what I have now... oh well, thanks
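For what it's worth, FORECAST.LINEAR is just an ordinary least-squares line under the hood, so the same interpolation can be sketched outside Excel (days and weights below are hypothetical):

```python
import numpy as np

# hypothetical log: day numbers and bodyweight, with day 3 missing
days = np.array([1.0, 2.0, 4.0, 5.0, 6.0])
weight = np.array([82.0, 81.6, 80.9, 80.5, 80.1])

# FORECAST.LINEAR(x, known_ys, known_xs) fits y = a*x + b by least squares
a, b = np.polyfit(days, weight, deg=1)
missing_day3 = a * 3.0 + b  # interpolated value for the gap
print(round(missing_day3, 2))
```

This works exactly when progress is close to linear, and mispredicts for the same reasons the spreadsheet version did: an injury or an early rapid-gains phase breaks the straight-line assumption.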

Anonymous No. 16103601

>>16103582
Honestly I would just prune the rows where you did not have data. That is how I would solve it at work.

Anonymous No. 16103609

>>16103601
You mean to just not calculate rows where there aren't enough previous values within the 7/14/21 day window? Yeah that's probably what I'll do

Anonymous No. 16103628

>>16103427
this is just so wrong I dont even know where to begin
please take calc I before posting here

Anonymous No. 16103813

>>16103609
Or scrub the entire period with shitty data. Because you cannot use it anyways because of the injury.

Anonymous No. 16103828

>>16103628
Let the kid cook. I am immensely entertained by him and the guy that uses the gas law to look at markets.

Image not available

1507x1660

Untitled.jpg

Anonymous No. 16103838

>>16103813
I don't want to remove that data because it gets factored into the red/yellow/green shading, which shows my progress (where the greenest are personal bests). I adjusted the formulas to limit the moving averages to windows where there's at least 3/7, 6/14, and 9/21 days of data. This is the result, which makes sense I guess. I should have run more consistently last year. I made the 3, 6, and 9 into cell references, so I can raise/lower them if necessary.

Anonymous No. 16103860

>>16103838
Honest question anon. Are you pro or near pro level athlete? Because otherwise I don't understand why anyone would do this much cardio. The only people besides athletes that need this much cardio is infantry and LEOs.

Anonymous No. 16103872

>>16103860
>Are you pro or near pro level athlete?
lol, my personal best 5K last year was 24:25. Runners in the Olympics are 10 minutes faster than that, so no I'm nowhere near pro. (In my mid 20s, I got down to like 21 min.) Also, I don't run that much. According to the sheet, I ran 15 miles in the past 7 days (and my most was 31 miles). Eliud Kipchoge, the best ranked marathoner, runs like 140 miles a week. I just like to run. It helps me stay in shape for when I want to go hiking in the mountains. Also, it's a fun and healthy way to generate data to play with in Excel. I haven't even added the heartrate data from my Garmin watch...

Anonymous No. 16103876

>>16103838
Fuck excel

Anonymous No. 16103895

Will my chances with girls increase if my P value is big?

Anonymous No. 16103912

>>16103872
Stop running and get a rowing machine instead. You are going to fuck your knees for nothing. Or lift weights. You are going to be fit for hikes, but without the wear and tear on ligaments, joints and your mitochondrial system.

Anonymous No. 16103916

>>16103876
I do agree, but it can be very useful sometimes.

Anonymous No. 16103972

Masturbating over bayes theorem rn

Anonymous No. 16103978

>>16103972
bayesd.

Anonymous No. 16104035

>>16103972
reddit's theorem

Anonymous No. 16104038

Tell me your most schizo theory

Anonymous No. 16104067

>>16104038
With or without data to back it up?

Image not available

1600x1600

HJr3.jpg

Anonymous No. 16104075

>>16093629

Image not available

1200x750

Euic.png

Anonymous No. 16104224

Anonymous No. 16104338

>>16104067
With or its too crazy to believe

Anonymous No. 16104474

>>16102475
you're wrong, stochastic calculus is almost entirely the study of the Wiener process, which is normally distributed
VaR models are also based on the normal distribution of the stock price and calculating the x-percentile loss based on that

Bachelier's model is shit because it assumes that the stock price itself is normally distributed rather than the percentage change, which opens the possibility of negative prices

>>16102482
what do you mean "pure" brownian motion? they use regular old Brownian motion in the Black-Scholes model: the evolution of the stock price follows the process dS = μS dt + σS dW
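A minimal Euler-Maruyama sketch of that geometric Brownian motion (the drift and volatility numbers here are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

mu, sigma = 0.05, 0.2        # hypothetical annualized drift and volatility
S0, T, n = 100.0, 1.0, 252   # start price, one-year horizon, daily steps
dt = T / n

# Euler-Maruyama for dS = mu*S dt + sigma*S dW
S = np.empty(n + 1)
S[0] = S0
for i in range(n):
    dW = rng.normal(0.0, np.sqrt(dt))  # Wiener increment ~ N(0, dt)
    S[i + 1] = S[i] + mu * S[i] * dt + sigma * S[i] * dW

print(S[-1])
```

Because the diffusion term scales with S, the simulated path stays positive, which is exactly the fix over Bachelier's arithmetic version.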

Anonymous No. 16104493

>>16104474
>VaR models are also based on the normal distribution of the stock price
No, they're based on returns (i.e. how much the price moves per sample period, e.g. daily return). Price is never normally distributed; returns are either normally distributed or some sort of normal-looking fat-tailed distribution

>and calculating the x-percentile loss based on that
literally what I said

>what do you mean "pure" brownian motion?
I mean it uses a drift term and a term that reduces the variance of returns as the price gets larger
I don't believe Bachelier's had those but I could be wrong, I'm just going off memory
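The "x-percentile loss on returns" calculation is just an empirical quantile of the return series; a sketch with simulated normal daily returns (all parameters made up):

```python
import numpy as np

rng = np.random.default_rng(7)

# 10,000 simulated daily returns: mean zero, 2% daily volatility
returns = rng.normal(loc=0.0, scale=0.02, size=10_000)

# 95% one-day historical VaR: the loss exceeded on only 5% of days
var_95 = -np.quantile(returns, 0.05)
print(var_95)
```

For exactly normal returns this lands near 1.645 * 0.02 ≈ 0.033; with real fat-tailed returns the empirical quantile comes out wider than the normal formula suggests, which is the point being argued above.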

Anonymous No. 16104700

>>16104338
But if it has data that checks out, how is it a crazy conspiracy theory? Check mate atheists.

Image not available

200x252

Fisher.jpg

Anonymous No. 16104928

To become a god tier statistician in college, what courses should one pick and what would be the optimal progression?

Anonymous No. 16105228

Ok... another thought I had about using moving averages: what if all of my data points for the past "week" are really in the past few days? Like I take a break from running for four days, but then run 3 in a row. Then the average isn't really over a week, it's over 3 days. I've considered feeding those three values into the FORECAST function and having it calculate a value for 3.5 days ago (the middle of the past 7 days), but does that make sense? Maybe I should just use FORECAST to interpolate what today's value should have been, and then call it "Smoothed" or something. Maybe I'm thinking too hard about all of this...
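One way to handle the bunching, as a rough sketch: regress on the actual dates instead of treating the window as evenly filled, then evaluate the fitted line at the window midpoint (the paces below are hypothetical):

```python
import numpy as np

# hypothetical: three runs bunched into the last 2 days of a 7-day window
days_ago = np.array([2.0, 1.0, 0.0])  # when each run happened
pace = np.array([9.5, 9.4, 9.2])      # minutes per mile

naive = pace.mean()  # pretends the runs were spread over the week

# fit pace vs. time, read off the value at the window midpoint (3.5 days ago)
a, b = np.polyfit(days_ago, pace, deg=1)
midweek = a * 3.5 + b
print(naive, midweek)
```

With improving paces bunched at the recent end, the naive mean looks better than the window midpoint actually was; the regression estimate corrects that, at the cost of extrapolating from only three points.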

Anonymous No. 16105229

>>16105228
Stop running for cardio if you don't need it for your job. It's insanely taxing for the body and you are not doing yourself a service.

Anonymous No. 16105506

>>16104700
Cherry pick conditional probabilities that support your 5g mind control theory and ignore ones that dont

Anonymous No. 16105508

>>16105506
People in this thread are far too rational for that, broslice.

Anonymous No. 16105511

>>16093629
>according to these statistics, letting your guard down in the presence of negroes is inadvisable

Anonymous No. 16105523

>>16105508
What if the towers already fnorded the probabilities from your brain that count against the theory? What is the probability of that?

Anonymous No. 16105533

>>16105511
It is the old statistical maxim of "around blax, never relax".

Anonymous No. 16105592

>>16105229
It's "taxing" because people push themselves to do too much after too little training; if you stay consistent and leave time for yourself to adapt, it's fine

Anonymous No. 16105717

>>16105592
No, because you are exhausting the energy reserves in your body, and what you believe is "good" for you is just a release of dopamine and other shit to mask the pain and damage from the strenuous effort.

If you are a man, you should be lifting weights. Not running around like some retard. I am basing this on the fact that weightlifters who did not use steroids fared very well against Covid, whereas marathon runners and runners in general fared extremely poorly, even worse than skinnyfat or overweight retards that never exercise.

Anonymous No. 16105753

>>16105717
Nta but what you are saying is retarded (and I hate cardio and am pretty bad at it).

Weightlifters and juicers were dropping like flies during the COVID era because of the extreme burden the spike protein put on the heart (both from the myocarditis risk from the vaccine and the virus itself). There were a ton of body builders who were very young who passed away during 2020-2023 when COVID was more significant of a concern, and it was pretty much always from heart related complications (which exogenous hormones make much worse).

Also, cardio is super good for your health. Even if your primary exercise focus is weight lifting, you should be shooting for somewhere in the neighborhood of 100-150 minutes of moderate cardio a week just for the cardiovascular and circulation benefits alone. Even if you aren't concerned about your heart exploding, you'll get better utilization of the testosterone in your body because your circulation will be better and you'll recover faster as well.

Anonymous No. 16105772

>>16105753
I said if they did not use steroids. If they used steroids it was literally like doing marathon shit. If the coof didn't get them, the vax certainly did.

Weightlifters who are nattys did not see the coof as something more lethal than a flu. If they roided they were fucked.

If you use dumbbells, that is cardio which is enough for the heart desu. Also if you are older than 20 years old, you will have to use a stairmaster or some shit to soften up your joints before heavy squats or deadlifts.

That cardio is enough. You could be doing that kind of cardio and still run faster and longer than some retarded marathon runner.

Anonymous No. 16105860

>>16079675
because they're low IQ

Anonymous No. 16105908

>>16079675
Never heard of a bullshit excuse?

Image not available

1440x1799

1645564614760.jpg

🗑️ Anonymous No. 16105942

Women are loot boxes and you need math to game the system to get laid

Anonymous No. 16106393

>>16105942
I believe it.

Anonymous No. 16106400

I want to fit a dataset I have to a bunch of distributions, but I'm not sure how to shift the values of the data to fit distributions that require >0 values. How do I decide what lower limit to set? I don't want it to be iterative in the sense that I try a bunch and pick the best-fit curve

Anonymous No. 16106404

>>16104928
To be a good statistician you need domain knowledge. Otherwise you will never understand the causality of your data. Get into a discipline and learn stats based on the problems it throws at you

Anonymous No. 16106695

>>16106404
I believe you. I just wonder what kind of math courses and so on should I take before I choose a field where I will gain domain knowledge?

Anonymous No. 16106697

>>16106400
You could always do what I do? I plot the data points I have and then I just curve fit it. However, the big problem with this way of doing things is that the predictive power isn't really there; it's more that it's easier interpolating between regions of data I already collected. For instance, if something lies between A and B, I know what the value should be at a point just past A towards B.

You can try to just spit some distributions towards the plotted data as well to see if you can find some characteristic for the general phenomena.
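On the >0 support question specifically: one non-iterative convention is to shift the data by its sample minimum (minus a tiny margin) so the support starts just below the smallest observation, then fit. A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(3)

# hypothetical data that dips below zero, so positive-support
# distributions (lognormal, gamma, ...) can't be fit directly
data = rng.normal(loc=5.0, scale=2.0, size=2000)

# shift by the sample minimum minus a tiny margin: strictly positive now
shift = data.min() - 1e-6
shifted = data - shift

# e.g. fit a lognormal to the shifted data by matching moments of log(x)
mu_hat = np.log(shifted).mean()
sigma_hat = np.log(shifted).std()
print(shift, mu_hat, sigma_hat)
```

Many libraries also expose this shift as a location parameter (e.g. scipy's loc) and can estimate it by maximum likelihood alongside the shape parameters, which avoids picking the lower limit by hand at all.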

Anonymous No. 16107086

Did you have a big aha moment when reading a book about statistics? Which one was it?

Anonymous No. 16107109

>>16107086
The anecdote where researchers trained an image classifier to distinguish photos of two kinds of objects, only to discover that because the two kinds of photos were taken under different conditions, the model ended up learning to discriminate on the background instead.
There are many versions of this story floating around, with the objects tailored to a particular domain (tanks, lesions, etc.)

Anonymous No. 16107122

Maybe this is a better place to ask than /mg/, since I got no response..

Doing my bachelor's final capstone course. Not quite sure what to do or how far to take it. I'm too shy to talk to professors and ask. I think they're too kind, and will tell me that pretty much anything is good. I wanted to do something on SDEs but I'm not sure if I can do it well enough. I'm confident enough to cover the breakdown of an Ito integral, a basic Wiener or Poisson process, and a basic SDE as a super duper ODE, but I'm just following along the book by Evans (intro to SDEs) and it's written so easy to follow that it just makes my whole work seem a bit easy maybe? Am I supposed to just study whatever I want and call it a day, I completed the degree? Should I throw in some numerical method solution examples?
Just need some advice, this is my last sprint and I finish my bachelors....

Anonymous No. 16107161

>>16107122
Ask them for previous capstone projects in SDEs and the scope of what you should have accomplished. Good luck anon. SDEs are very fun.

Anonymous No. 16107166

>>16107109
So instead of doing what it should be doing, the machine just did something that made the researchers luck out. kek.

Anonymous No. 16107526

Is data analytics a meme shilled by youtube grifters? High school math filters all the midwits so i dont think it will saturate even if every clickbait viewing moron starts applying. Is it future/ai proof?

Image not available

680x639

N16.jpg

Anonymous No. 16107527

Do these statistics prove that the holocaust was fake?

Anonymous No. 16107783

>>16107526
you can look at whatever website recruiters post their job listings on. that will give you far more info than whatever would be posted here
>High school math filters all the midwits so i dont think it will saturate even if every clickbait viewing moron starts applying.
statistically speaking, your competition isn't the general population, it's all the people with PhDs in math or something math heavy.

Anonymous No. 16107819

>>16107527
If you would like to check the stats of it all, I am sure you would find some very peculiar wartime shenanigans, but postwar.

Anonymous No. 16107824

>>16107526
Depends on what route. If you get a stats/math bachelors, it's viable and will be for a very long time. The shilling is mostly done for "non-degree" jobs which I think can work in some edgecases, but for most, it won't work. So this is one field where you actually should get a degree.

Anonymous No. 16107955

I just ran into a dataset where I got the suggestion to use a negative binomial family in the code. Can somebody explain what the negative binomial distribution is? Because I have literally never seen it, even in textbooks. Maybe my college was retarded or the textbooks were too cookiecutter, but pls halp.

Anonymous No. 16107975

>>16107955
Negative binomial is just counting the number of failures in N trials instead of the number of successes.

P(x = k|N) means the probability of there being exactly k failures in exactly N trials. It's basically just a binomial but flipped the other way (counting the number of tails coin flips instead of heads).

Anonymous No. 16107976

>>16107975
What is the reasoning behind using this instead of just doing binomial and taking 1-P(bin-run) ?

Fledgling Investor No. 16107989

Whats the probability of getting to clown level in less than an hour?

Anonymous No. 16107992

>>16107975
It's flipped in a different way. It's not that the new p is 1-p for the coin or for the draw. It's that you keep flipping until you get the specified number of tails, whereas for the binomial N is fixed in advance.
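Numpy's sampler uses one common parameterization (the number of failures observed before the n-th success), and the textbook mean and variance are easy to check by simulation; n and p here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

# failures observed before the 5th success, success probability 0.5 per trial
n, p = 5, 0.5
samples = rng.negative_binomial(n, p, size=100_000)

# theoretical mean n(1-p)/p = 5 and variance n(1-p)/p^2 = 10
print(samples.mean(), samples.var())
```

Note the variance exceeds the mean, unlike a Poisson, which is why this family shows up for overdispersed count data.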

Anonymous No. 16108017

>>16107992
I found out that any data that calls for a negative binomial usually violates homoscedasticity and shows overdispersion (variance greater than the mean of the response variable).

But it's hard to grasp desu, this is not a common distribution in textbooks or something I have seen in graphs or scatterplots.
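A quick way to see (and informally test for) the overdispersion: simulate counts with the same mean from a Poisson and a negative binomial, and compare variance to mean (parameters hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

poisson = rng.poisson(lam=4.0, size=100_000)
negbin = rng.negative_binomial(n=4, p=0.5, size=100_000)  # mean 4, variance 8

# Poisson: variance ~ mean. Negative binomial: variance > mean (overdispersion)
print(poisson.mean(), poisson.var())
print(negbin.mean(), negbin.var())
```

On real data, a sample variance well above the sample mean of your counts is the usual signal to reach for the negative binomial family instead of Poisson.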

Image not available

183x275

faraway.jpg

Anonymous No. 16108079

>>16108017
Overdispersion sounds right but mean and variance usually have different units. Faraway probably explains it.

Fledgling Investor No. 16108396

>[math]\begin{equation}\label{eq:bayes}P(\theta|\textbf{D}) = P(\theta ) \frac{P(\textbf{D} |\theta)}{P(\textbf{D})} ~~~~~|| I,\end{equation}[/math]

Anonymous No. 16108401

>>16107527
yes

🗑️ Anonymous No. 16108408

The mean for my stat midterm was a 59. I managed to barely scrape above that. I don't know what the fuck is going on in that class.

The professor doesn't use a microphone so no one can hear them. Can't even read the slides because whoever made it decided to highlight every fucking word in a different color without any pattern to it. Its over..

Image not available

391x320

lovethisthread.png

Anonymous No. 16108501

>>16108079
I love you anon. You always come with the best book recommendations.

Image not available

1175x1629

wtf jews.jpg

Anonymous No. 16108521

>>16107527

Image not available

440x440

thinking.gif

Anonymous No. 16108527

>>16108521

Anonymous No. 16108620

>>16108079
> Mean and variance usually have different units

Unless they are unitless quantities (which can happen in some cases), they are all but guaranteed to. If the random variable has a realization in units, its mean and standard deviation will be in those units, and the variance will be in units squared. The higher central moments follow suit, e.g. the fourth central moment is in units to the fourth (though standardized moments like skewness and kurtosis are dimensionless, since the units cancel).

Anonymous No. 16109004

>>16108620
Could you plot something to prove your point?

Anonymous No. 16109021

>>16104928
undergrad pure stats and probability courses tend to be few, so take all of them. avoid "stats for X" courses since they just repeat freshers stats and lack substance and foundation.
you should take single and multivariable calculus and linear algebra but that's probably already mandatory.
personally i'd also take some applied math stuff like optimization, PDEs and numerical math.
if you still have credits left take databases and algorithms.
if you hate yourself take a signals course

Anonymous No. 16109351

>>16109004
It's not a plotting thing. It's a definitional thing.

Var(x) = E((x-E(x))^2)

If x is in units, E(x) is in units and Var(x) is in units^2.

Anonymous No. 16109357

>>16109351
ty, then I understand

Anonymous No. 16109373

>>16109021
Don't you have to take ODEs before PDEs? Also what is the use of PDEs for stats? Excuse my noob question but have no idea.

Anonymous No. 16109790

>>16109373
>Don't you have to take ODEs before PDEs?
yes
>Also what is the use of PDEs for stats?
they are not really necessary for stats but PDEs are like a superweapon for modelling reality. they are used in all sorts of things including economics.
this is hard stuff tho and it's only necessary if want to lean more towards applied mathematics. imo a god tier statistician is a god tier mathematician, hence why i thought it's relevant

Anonymous No. 16110194

>>16109790
>imo a god tier statistician is a god tier mathematician, hence why i thought it's relevant
I think you are right. I will check out the prerequisites for PDEs and see how I can get there.

Image not available

520x302

ghana.jpg

Anonymous No. 16110196

Anonymous No. 16110216

>>16110196
Really makes you think

Anonymous No. 16111161

What is your favourite PDE?

Anonymous No. 16111327

>>16109021
Signal processing and linear system theory are goated. Why are you saying he shouldn't take them? Pretty much all of modern time series statistics came from control theory and state estimation for signal processing.

Image not available

1280x720

medical errors wo....jpg

Anonymous No. 16111869

Anonymous No. 16112520

>>16111869
Nigs and cops less deadly than medical hoes dancing during the pandemic? Who would have thought.

Anonymous No. 16112536

>>16079710
Lmao based

Anonymous No. 16112799

>>16111161
Black Scholes

Anonymous No. 16113129

next /psg/ thread, please

Anonymous No. 16113152

Not sure if my question fits here, but are there any resources for getting better at reading the stats on my predictive model?
Well, in general anything that helps increase the chance of my models reaching beyond 70% accuracy would help. (Anything that doesn't fuck over my thesis publication chances, even though the topic is the main culprit for it.)

Also, the gist of my topic is prediction on labeled data with a binary outcome.
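For a binary outcome, accuracy alone hides a lot; a confusion-matrix breakdown is usually the first "reading the stats" step. A self-contained sketch with made-up labels and scores:

```python
import numpy as np

# hypothetical ground-truth labels and model scores
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3, 0.55, 0.45])
y_pred = (y_score >= 0.5).astype(int)  # threshold at 0.5

tp = int(((y_pred == 1) & (y_true == 1)).sum())
fp = int(((y_pred == 1) & (y_true == 0)).sum())
fn = int(((y_pred == 0) & (y_true == 1)).sum())
tn = int(((y_pred == 0) & (y_true == 0)).sum())

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # of predicted positives, how many were right
recall = tp / (tp + fn)     # of actual positives, how many were caught
print(accuracy, precision, recall)
```

If the two classes are imbalanced, 70% accuracy can be worse than it sounds (a constant predictor might beat it), so precision/recall at your chosen threshold, and the calibration of the scores themselves, are worth reporting alongside raw accuracy.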