Originally posted by harvib
I am trying to understand, as I am not a statistician. Doesn't a statement like the one above mean isolating causation? If so, how was this done? What
assumptions must hold in order for the model to be accurate? What if a variable is dependent on another variable (e.g. unemployment vs.
likelihood of having insurance)?
You can say the same for all experimental approaches. For example, I might use what you think is a true experiment and assess the effect of
some independent variable (IV) on some dependent variable (DV), physically controlling a number of other variables (i.e., what kiddies know as 'fair tests'). But it is impossible to control
for all variables. That would, firstly, require omniscience and, secondly, make experimental methods impossible, lol. So we pick the most
important and relevant ones.
Claims like 'correlation does not imply causation' (although I always add: but it can be suggestive) apply to simplistic bivariate models
(Pearson product-moment, Spearman, etc.). Here we might show that x is related to y, but have no information on the impact of a wealth of other important
variables. Moreover, these sorts of bivariate models have no IV or DV.
So if we find that paranoia is related to time on ATS, we don't know whether time on ATS causes paranoia, or whether paranoia causes people to spend
time on websites like ATS. We might also have a third (fourth, fifth, sixth, etc.) unknown variable that influences both paranoia and time on ATS.
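To make the third-variable point concrete, here is a minimal simulation sketch. The variables (`hours_on_site`, `paranoia`, and the hidden `z`) are hypothetical and the data are random numbers, not anything from the study. Neither variable affects the other; both depend only on `z`, yet the Pearson correlation between them comes out clearly positive (around 0.5 with these settings), and it is symmetric, which is why a bivariate correlation has no IV or DV.

```python
import math
import random

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(n=5000, seed=1):
    """Hypothetical data: a hidden variable z drives both observed variables."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    # No causal link between these two; each is z plus independent noise.
    hours_on_site = [zi + rng.gauss(0, 1) for zi in z]
    paranoia = [zi + rng.gauss(0, 1) for zi in z]
    return hours_on_site, paranoia

hours, paranoia = simulate()
r = pearson(hours, paranoia)
print(f"r = {r:.2f}")
# Swapping the arguments gives the same number: no IV/DV distinction.
print(pearson(paranoia, hours))
```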
Multivariate regression models do contain IVs and a DV. But the Cox model actually estimates the hazard (the instantaneous risk) of an event, rather than simply whether x has a
significant effect on y (or, for ordinary regression, how much of the variance in the DV a factor explains). And the example you gave (employment
vs. insurance status) is controlled for: they are attempting to isolate particular variables by controlling for the effects of the other
variables (which they show in Table 2).
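The idea of "controlling for" a variable can be sketched with ordinary least squares. This is a swapped-in technique, not the Cox model the study used, and the data are hypothetical: `z` stands in for something like unemployment, `x` (say, lack of insurance) depends on `z`, and the outcome `y` depends only on `z`. Regressing `y` on `x` alone suggests a clear effect; adding `z` to the model pushes the coefficient on `x` to roughly zero.

```python
import random

def centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def slope_simple(x, y):
    """OLS slope of y on x alone (no adjustment for anything else)."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)

def slopes_adjusted(x, z, y):
    """OLS coefficients of y on x and z together, via the 2x2 normal equations."""
    xc, zc, yc = centered(x), centered(z), centered(y)
    sxx, szz, sxz = dot(xc, xc), dot(zc, zc), dot(xc, zc)
    sxy, szy = dot(xc, yc), dot(zc, yc)
    det = sxx * szz - sxz ** 2
    bx = (szz * sxy - sxz * szy) / det
    bz = (sxx * szy - sxz * sxy) / det
    return bx, bz

rng = random.Random(42)
n = 5000
# Hypothetical data: z is the confounder, x depends on z, y depends ONLY on z.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
y = [1.0 * zi + rng.gauss(0, 1) for zi in z]

crude = slope_simple(x, y)          # clearly non-zero: x "looks" related to y
bx, bz = slopes_adjusted(x, z, y)   # bx near 0 once z is in the model
print(f"crude slope of y on x: {crude:.2f}")
print(f"adjusted: x -> {bx:.2f}, z -> {bz:.2f}")
```

The Cox model applies the same logic on the hazard scale: each reported hazard ratio is estimated with the other covariates held in the model.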
All statistical methods include numerous assumptions, and these are tested during analysis. In fact, there are several for the Cox model. If I say the strata
should show proportional hazards, which is assessed by log-log (log-minus-log survival) plots, does that mean anything to you? The same applies to every analysis,
from basic t-tests to MANOVA to mixed-effects models.
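For anyone wondering what the log-log check involves, here is a minimal sketch: a hand-rolled Kaplan-Meier estimator plus the log(-log S(t)) transform. It is not the study's own code; the five-subject dataset, the Weibull baseline, and the use of 1.37 as the hazard ratio are illustrative assumptions. Under proportional hazards S_1(t) = S_0(t)^HR, so when you plot log(-log S) against log t for each stratum, the curves should run roughly parallel, a constant log(HR) apart.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t).

    times  -- follow-up time for each subject
    events -- 1 if the subject died at that time, 0 if censored
    Returns (time, S(t)) pairs at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool everyone whose follow-up ends at t (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def log_minus_log(curve):
    """Map (t, S) to (log t, log(-log S)), skipping points where S is 0 or 1."""
    return [(math.log(t), math.log(-math.log(s)))
            for t, s in curve if t > 0 and 0.0 < s < 1.0]

# Tiny hypothetical dataset: follow-up times and death (1) / censored (0) flags.
km = kaplan_meier([1, 2, 2, 3, 5], [1, 1, 0, 1, 0])
print(km)
print(log_minus_log(km))

# Under proportional hazards the two log(-log S) curves are separated
# by the same constant log(HR) at every time point.
hr = 1.37
for t in (1.0, 5.0, 10.0):
    s0 = math.exp(-0.01 * t ** 1.5)   # hypothetical Weibull baseline survival
    s1 = s0 ** hr
    gap = math.log(-math.log(s1)) - math.log(-math.log(s0))
    print(t, round(gap, 4), round(math.log(hr), 4))
```

In real data the curves are estimated, so "parallel" means roughly parallel; curves that cross or steadily diverge are the warning sign.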
If you're looking for holes in the study, I can easily give you an obvious one: the 95% CI is wide, running from a 6% to an 84% increase. It does cover a previous study
from 1993 that showed 25% increased mortality, though. And the limitations section outlines a number of potential issues. Forget about the stats. The Cox model
is pretty good; indeed, most stats are. But they are only as good as the data used. GIGO (garbage in, garbage out), basically.
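A quick sketch of the CI arithmetic, using only the bounds quoted above (a 6% to 84% increase, i.e. a hazard ratio of 1.06 to 1.84) and the 1993 figure of 25%. It assumes the interval was built the usual way for a Cox model, symmetric on the log scale, so the implied point estimate is the geometric midpoint of the bounds.

```python
import math

# 95% CI for the hazard ratio as quoted: a 6% to 84% increase.
lower, upper = 1.06, 1.84
prior_estimate = 1.25   # the 1993 study's 25% increase

# Assumption: CI = exp(log(HR) +/- 1.96 * SE), i.e. symmetric on the log scale.
log_lo, log_hi = math.log(lower), math.log(upper)
se = (log_hi - log_lo) / (2 * 1.96)
implied_hr = math.exp((log_lo + log_hi) / 2)

print(f"implied point estimate ~ {implied_hr:.2f}")
print(f"SE of log(HR) ~ {se:.3f}")
print("1993 estimate inside CI:", lower <= prior_estimate <= upper)
```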
Also, if I am reading this correctly, being unemployed leads to a 40% increased probability of death, being black to a 32% increase, and
being male to a 37% increase. The model most certainly can't lead one to reasonably believe that those variables are the direct cause of death,
as the article implies about being uninsured. Can it?
Wut? It's a risk analysis. So being male rather than female leads to a 37% increased risk of mortality (before 65). Do you doubt such
numbers? Considering the raw data show 2.6% deaths among females and 3.6% among males, it's not a big surprise. The difference with the Cox model is that it
takes into account a large number of other relevant factors.
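The raw comparison is simple arithmetic on the two percentages given above. This crude ratio ignores every other factor; the 37% from the Cox model is the figure after adjusting for the other covariates in the model. The two happen to be close here, which need not be the case in general.

```python
female_deaths = 0.026   # raw proportion of females who died (from the post)
male_deaths = 0.036     # raw proportion of males who died (from the post)

crude_ratio = male_deaths / female_deaths
print(f"crude risk ratio: {crude_ratio:.2f}")   # 1.38, i.e. roughly 38% more
```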
So, take the 37%. If you are male, the model says you have a 37% increased risk of death before 65. A more illustrative way to put it
is that, for equal-sized groups, for every 100 females who die before 65, roughly 137 males would. Do you find that surprising? I thought it was well known.
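Strictly speaking, a hazard ratio of 1.37 is not identical to 37% more deaths, but when deaths are rare the two are close. A minimal sketch, assuming proportional hazards (so S_male(t) = S_female(t)^1.37), equal-sized groups, and, purely for illustration, the raw 2.6% female figure as the baseline risk. The helper name `risk_from_hazard_ratio` is hypothetical.

```python
def risk_from_hazard_ratio(baseline_risk, hr):
    """Cumulative risk in the comparison group under proportional hazards.

    With S_exposed(t) = S_baseline(t) ** hr, the risk by time t is
    1 - (1 - baseline_risk) ** hr.
    """
    return 1.0 - (1.0 - baseline_risk) ** hr

female_risk = 0.026   # illustrative baseline: raw female proportion from the post
male_risk = risk_from_hazard_ratio(female_risk, 1.37)
per_100 = 100 * male_risk / female_risk

print(f"male risk ~ {male_risk:.4f}")
print(f"male deaths per 100 female deaths ~ {per_100:.0f}")
```

The result lands just under 137, and the gap shrinks further as the baseline risk gets smaller.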
You might then ask the obvious question about male mortality: why? The answer would include higher risk-taking behaviour, IMO.
And the same applies to lack of insurance: why? The answer would include lack of preventative treatment, IMO.
What it won't include is the wealth of other variables already controlled for (e.g., your example of employment vs. insurance status).
Edit to add: I hope I am not coming across as antagonistic or argumentative; I am genuinely trying to understand. I have just always been under
the impression that for statistics to accurately predict causation, certain assumptions must hold. However, I am ready to stand
corrected.
No worries.
To reliably imply causation, we need methods that control for other variables in an IV/DV-type design, which the Cox model
does.
I just think you're pissing in the wind on the stats issue. It is a data model, not reality, but taking out all stats wouldn't
leave science much more than subjective art. Look at the study itself for its weaknesses, which are clear enough.
[edit on 20-9-2009 by melatonin]