Standard deviations
Aug. 23rd, 2006 01:50 pm
The main conclusion is to finish the random files issue.
TODO: Run several shuffling series of 10 shuffles each. Choose a word from the top of the occurrences list. Across the series, the means for that word should not differ by more than 5%, and the standard deviations should not differ by more than 15%. If this does not hold, increase the number of shuffles in each series to 15, and so forth.
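A minimal sketch of how this TODO could be automated. Everything named here is an assumption for illustration: `measure` stands in for "shuffle the text once and take the measurement for the chosen word", `toy_measure` is a made-up example of such a measurement on a toy word list, and "differ by no more than X%" is read as (max - min) / mean across the series, which is only one possible reading.

```python
import random
import statistics


def relative_spread(values):
    """(max - min) relative to the mean: one reading of 'differ by X%'."""
    center = statistics.mean(values)
    if center == 0:
        # Avoid dividing by zero; identical values count as zero spread.
        return 0.0 if max(values) == min(values) else float("inf")
    return (max(values) - min(values)) / abs(center)


def find_series_length(measure, n_series=5, start=10, step=5, max_len=200,
                       mean_tol=0.05, sd_tol=0.15, seed=0):
    """Grow the number of shuffles per series until the series agree.

    `measure(rng)` is a hypothetical callback: it performs one shuffle and
    returns the measured value for the chosen word.
    Returns the first series length that passes, or None if none does.
    """
    rng = random.Random(seed)
    n = start
    while n <= max_len:
        means, sds = [], []
        for _ in range(n_series):
            xs = [measure(rng) for _ in range(n)]
            means.append(statistics.mean(xs))
            sds.append(statistics.stdev(xs))
        if (relative_spread(means) <= mean_tol
                and relative_spread(sds) <= sd_tol):
            return n
        n += step  # 10 -> 15 -> 20 and so forth
    return None


def toy_measure(rng, words=("the",) * 50 + ("cat",) * 150,
                target="the", window=100):
    """Made-up measurement: count `target` in the first `window` words
    of a freshly shuffled toy text."""
    shuffled = list(words)
    rng.shuffle(shuffled)
    return shuffled[:window].count(target)


if __name__ == "__main__":
    print(find_series_length(toy_measure))
```

The 15% tolerance on the standard deviations is usually the harder condition to meet, since a standard deviation estimated from 10 values is itself quite noisy.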
TEN recognizes two different measures of standard deviation: the standard deviation of a series (he calls it the individual measurement dispersion) and the standard deviation of averages. This distinction is new to me, although I do understand that, given K sampling series, the standard deviation between the means of those series will be very low. I just don't understand why we need such a parameter and when it can be used. TEN argues that sigma_averages = sigma_sample/sqrt(N), where N is the number of samples in each series.
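The relation TEN gives can be checked numerically. The sketch below is only an illustration under assumed conditions: the measurements are simulated as independent Gaussian draws (mean 100, standard deviation 20, both arbitrary), not taken from the real shuffling data, and `compare_sigmas` is a made-up helper name. The quantity sigma_sample/sqrt(N) is what statistics texts usually call the standard error of the mean.

```python
import random
import statistics


def compare_sigmas(n_per_series=10, n_series=2000, seed=0):
    """Simulate many series and compare the two standard deviations.

    Returns (sigma_sample, sigma_averages, predicted), where
    predicted = sigma_sample / sqrt(N) and N = n_per_series.
    """
    rng = random.Random(seed)
    # Assumed toy data: independent Gaussian "measurements".
    series = [[rng.gauss(100.0, 20.0) for _ in range(n_per_series)]
              for _ in range(n_series)]

    # Dispersion of individual measurements (all series pooled).
    sigma_sample = statistics.stdev([x for s in series for x in s])

    # Dispersion of the per-series means.
    sigma_averages = statistics.stdev([statistics.mean(s) for s in series])

    predicted = sigma_sample / n_per_series ** 0.5
    return sigma_sample, sigma_averages, predicted


if __name__ == "__main__":
    sigma_sample, sigma_averages, predicted = compare_sigmas()
    print("sigma_sample   =", round(sigma_sample, 2))
    print("sigma_averages =", round(sigma_averages, 2))
    print("predicted      =", round(predicted, 2))
```

With independent measurements, the observed sigma_averages should land close to the predicted value. The formula does not hold in general if the measurements within a series are correlated.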