How to determine the number of trials in an experiment?
Answer:
Whenever we design a new experiment, we have to specify how many times each participant should repeat each condition. But how do we decide this? I think most researchers base this decision on things like the amount of time available, what they did in their last study, and what seems ‘about right’. We all know that running more trials gets us ‘better’ data, but hey, you’ve got to be pragmatic as well, right? Nobody would expect their participants to do an experiment lasting hours and hours (except psychophysicists…).
I started thinking about this a few months ago, and discovered that the number of trials has a surprisingly direct effect on the statistical power of a study design. Power is the probability that a study design will detect an effect of a particular size. Most people know that, for a given effect size, power increases as a function of sample size. But it turns out that, under certain conditions, power also depends on the number of trials each participant completes.
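To make the familiar sample-size part of that picture concrete, here is a minimal sketch of analytic power for a two-sided one-sample (or paired) t-test, using SciPy's noncentral t distribution. The effect size d = 0.5 and alpha = 0.05 are illustrative assumptions, not values from any particular study:

```python
# Minimal sketch: power of a two-sided one-sample t-test as a function
# of sample size N. The effect size d = 0.5 is an illustrative choice.
import numpy as np
from scipy import stats

def t_test_power(n, d, alpha=0.05):
    """Analytic power of a two-sided one-sample (or paired) t-test."""
    df = n - 1
    nc = d * np.sqrt(n)                      # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value
    # Probability that |T| exceeds the critical value under the alternative
    return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)

for n in (10, 20, 40, 80):
    print(f"N = {n:2d}: power = {t_test_power(n, d=0.5):.2f}")
```

For these settings power climbs from roughly 0.29 at N = 10 to around 0.99 at N = 80, which is the familiar power curve.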
I suspect this will be rather surprising to a lot of people. It makes explicit the vague heuristic that ‘more trials is better’, by showing how data quality has a direct effect on statistical power. Most a priori power analyses assume that effect size is constant, because it is invariant to sample size (though the accuracy with which effect size is measured increases with N). For this reason, power calculations typically optimise the sample size (N) and ignore the number of trials (k). But we could perform the complementary calculation: assume a fixed sample size, and manipulate the number of trials to achieve a desired level of power. Or, more realistically, since both N and k are degrees of freedom available to the experimenter, we should consider them together when designing a study.
This is the aim of a recent paper (preprint) which proposes representing statistical power as a joint function of sample size (N) and number of trials (k), visualised as a two-dimensional ‘power contour’ plot. The iso-power contours are the combinations of N and k that produce the same level of power. When the within-participant standard deviation is negligible, increasing the number of trials does not affect power, and the contours are vertical lines: power is invariant with k. However, when the within-participant standard deviation is large, the contours become curved, and many different combinations of N and k will provide 80% power. In principle any of these combinations might constitute a valid study design, and experimenters can choose a combination based on other constraints, such as the time available for testing each participant, or how easy it is to recruit from the desired sample.
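To see how such contours can arise, consider a standard two-level normal model (an assumption of this sketch, not necessarily the paper's exact formulation): a participant's mean over k trials has variance sigma_b^2 + sigma_w^2 / k, so the effective effect size, and therefore power, depends jointly on N and k. The values mu = 10, sigma_b = 15 and sigma_w = 60 below are hypothetical:

```python
# Sketch of a power contour: under a two-level normal model, participant
# means over k trials have SD sqrt(sigma_b**2 + sigma_w**2 / k), so power
# is a joint function of N and k. All numbers here are hypothetical.
import numpy as np
from scipy import stats

def power_nk(n, k, mu, sigma_b, sigma_w, alpha=0.05):
    """Power of a two-sided one-sample t-test on N participant means of k trials."""
    d_eff = mu / np.sqrt(sigma_b**2 + sigma_w**2 / k)  # effective effect size
    df = n - 1
    nc = d_eff * np.sqrt(n)                            # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)

# A small slice of the (N, k) grid: when sigma_w >> sigma_b, extra trials
# trade off against extra participants, which is what curves the contours.
for n in (10, 20, 40, 80):
    row = "  ".join(f"k={k}: {power_nk(n, k, mu=10, sigma_b=15, sigma_w=60):.2f}"
                    for k in (5, 20, 80, 320))
    print(f"N = {n:2d} ->  {row}")
```

Note the diminishing returns in k: sigma_b^2 puts a floor on the variance of the participant means, so beyond some point only adding participants increases power, and the contours become vertical again at large k.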
Of course, for this method to be useful, we need to check that real power contours are curved rather than vertical. We also need some idea of the likely within- and between-participant standard deviations for the technique we plan to use. To this end, we reanalysed eight existing data sets spanning a range of widely used methods, including reaction times, sensory thresholds, EEG, MEG, and fMRI. In all cases the within-participant variance was greater than the between-participant variance, and the power contours (generated by repeatedly subsampling the data) had the expected curved shape.
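The subsampling procedure is straightforward to sketch. Assuming the data are held as a participants × trials array of difference scores (the layout, names, and synthetic numbers below are illustrative, not the paper's code or data), empirical power for a given (N, k) pair is just the proportion of random subsamples in which the test reaches significance:

```python
# Sketch of power estimation by subsampling: repeatedly draw N participants
# and k trials per participant, test the participant means against zero, and
# count the proportion of significant outcomes. Everything here is synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def empirical_power(data, n, k, n_resamples=1000, alpha=0.05):
    """data: (participants, trials) array of difference scores."""
    n_participants, n_trials = data.shape
    hits = 0
    for _ in range(n_resamples):
        subs = rng.choice(n_participants, size=n, replace=False)
        trials = rng.choice(n_trials, size=k, replace=False)
        means = data[np.ix_(subs, trials)].mean(axis=1)  # one mean per participant
        hits += stats.ttest_1samp(means, 0.0).pvalue < alpha
    return hits / n_resamples

# Synthetic stand-in for a real dataset: true effect 10, between-participant
# SD 15, within-participant (trial-to-trial) SD 60, 100 participants x 400 trials.
data = (10 + rng.normal(0, 15, size=(100, 1))
           + rng.normal(0, 60, size=(100, 400)))
print(empirical_power(data, n=20, k=100))  # roughly 0.7-0.8 for these settings
```

Sweeping n and k over a grid and contouring the resulting proportions gives an empirical version of the power contour plot.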
The power contour is instructive when thinking about different experimental traditions. In some sub-disciplines it is commonplace to test large numbers of participants on relatively few trials each (large N, small k). Other experimental traditions, such as psychophysics, go to the other extreme: a small number of participants each complete very large numbers of trials (small N, large k). Both approaches have their advantages and disadvantages, but under reasonable assumptions either can achieve high statistical power, because both can lie on the same iso-power contour.
Overall, we think that calculating power contours offers a useful framework for thinking in more detail about the design of future studies. Of course, as with any power analysis, the outcome depends on the assumptions we make about the likely difference between means and the variances involved. We will never know these values for sure, and can only estimate them once we have conducted a study. However, the values in the paper are plausible, and we have made all scripts and data available so that others can conduct similar analyses on their own data.