Background and Context

What are Nonstandard Errors?

Researchers analyzing the same data may choose different analysis paths, creating additional uncertainty beyond traditional standard errors.

The #fincap Experiment

164 research teams independently tested the same six hypotheses on the same financial market data spanning 17 years.

Experimental Design

Teams wrote short papers evaluated by 34 peer reviewers, with multiple feedback stages to test whether peer review reduces dispersion.

Substantial Dispersion in Estimates Across Researchers

  • Nonstandard errors (NSEs) measured as interquartile range of estimates were substantial across all six research hypotheses.
  • Even for seemingly straightforward calculations like client volume share, NSEs were sizable at 1.2%.
  • Abstract concepts like market efficiency (6.7%) and gross trading revenue (21.4%) showed the largest dispersion.
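The dispersion measure above can be sketched in a few lines. This is a minimal illustration with made-up numbers, not the paper's data: the nonstandard error for a hypothesis is simply the interquartile range of the estimates the teams report.

```python
import numpy as np

# Hypothetical estimates (in %) from different research teams for one
# hypothesis; the paper measures the nonstandard error (NSE) as the
# interquartile range (IQR) of the cross-team distribution of estimates.
estimates = np.array([1.0, 1.5, 2.1, 2.4, 3.0, 3.3, 4.0, 5.2])

q25, q75 = np.percentile(estimates, [25, 75])
nse = q75 - q25  # interquartile range = nonstandard error
print(f"NSE (IQR): {nse:.2f}%")
```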

The Evidence-Generating Process Creates Two Types of Uncertainty

[Diagram: population → sample through the data-generating process (standard error); sample → evidence through the evidence-generating process (nonstandard error)]
  • Standard errors arise from the data-generating process when samples are drawn from populations.
  • Nonstandard errors emerge from the evidence-generating process as researchers make different analytical choices.
  • For market efficiency, nonstandard errors (6.7%) were nearly twice as large as standard errors (3.4%).
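The two error types can be made concrete with a toy simulation, assuming a simple normal population and a few stand-in "analysis paths" (none of this is from the paper): sampling variation across repeated draws yields the standard error, while dispersion across analytic choices applied to one fixed sample yields the nonstandard error.

```python
import numpy as np

rng = np.random.default_rng(0)
population_mean, population_sd, n = 5.0, 2.0, 200

# Data-generating process: variation of the sample mean across
# repeated draws from the population (standard error).
sample_means = [rng.normal(population_mean, population_sd, n).mean()
                for _ in range(1000)]
standard_error = np.std(sample_means)

# Evidence-generating process: different analysis paths applied to the
# SAME sample (here, mean vs. median vs. a trimmed mean, as a crude
# stand-in for researchers' analytic choices).
sample = rng.normal(population_mean, population_sd, n)
paths = [sample.mean(),
         np.median(sample),
         sample[np.abs(sample - sample.mean()) < 2 * sample.std()].mean()]
q25, q75 = np.percentile(paths, [25, 75])
nonstandard_error = q75 - q25

print(f"standard error ~ {standard_error:.3f}")
print(f"nonstandard error (IQR across paths) ~ {nonstandard_error:.3f}")
```

The standard error here should land near the textbook value sigma/sqrt(n) = 2/sqrt(200) ≈ 0.14; the nonstandard error depends entirely on how different the chosen analysis paths are.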

Higher Quality Associated with Smaller Nonstandard Errors

  • Higher reproducibility significantly reduced NSEs by 25.0%, showing the value of robust, replicable methods.
  • Papers with higher peer ratings had 33.3% smaller NSEs, confirming quality correlates with consistency.
  • Surprisingly, higher team quality (publications, experience) slightly increased NSEs by 2.8%, suggesting that experts pursue more diverse analytical approaches.

Peer Feedback Substantially Reduces Dispersion in Estimates

  • Across four feedback stages, the interquartile range decreased by 47.2%, showing clear convergence in estimates.
  • The interdecile range (capturing more extreme values) decreased even more dramatically by 68.2%.
  • Each individual feedback stage contributed to reducing dispersion, suggesting cumulative benefits of peer review.

Researchers Significantly Underestimate the Dispersion in Results

[Chart: researchers' predicted dispersion (28.3%) vs. actual dispersion (100%)]
  • Researchers predicted only 28.3% of the actual dispersion in an incentivized belief survey.
  • This 71.7-percentage-point underestimation helps explain why nonstandard errors have received so little attention in the past.
  • Even when looking at trimmed samples that remove outliers, researchers still significantly underestimated variation.

Contribution and Implications

  • Nonstandard errors are substantial and should be considered alongside standard errors when evaluating empirical research results.
  • Peer feedback significantly reduces dispersion, highlighting the importance of peer review processes before publication.
  • Higher reproducibility correlates with lower dispersion, suggesting that emphasizing reproducible methods improves consistency.
  • Key analysis decisions like model choice and sampling frequency substantially impact results and should be carefully considered.
  • The threshold for statistical significance should account for multiple testing, with a suggested t-statistic hurdle of 2.9-3.0.

Data Sources

  • Chart 1 (NSEs Across Hypotheses) uses data from Table I, Panel C showing interquartile ranges across six research hypotheses.
  • Chart 3 (Quality Variables) uses data from Table III and Figure 2 showing how quality variables relate to dispersion.
  • Chart 4 (Peer Feedback) uses data from Table IV and Figure 3 demonstrating reduction in dispersion across stages.
  • SVG 1 (Evidence Process) illustrates the conceptual framework presented in the introduction of the paper.
  • SVG 2 (Researcher Beliefs) represents findings from Table IA.VI in the Internet Appendix about researchers' predictions.