Background and Context

What are Nonstandard Errors?

Researchers analyzing the same data may choose different analysis paths, creating additional uncertainty beyond traditional standard errors.

The #fincap Experiment

164 research teams independently tested the same six hypotheses on the same financial market data spanning 17 years.

Experimental Design

Teams wrote short papers evaluated by 34 peer reviewers, with multiple feedback stages to test whether peer review reduces dispersion.

Substantial Dispersion in Estimates Across Researchers

  • Nonstandard errors (NSEs) measured as interquartile range of estimates were substantial across all six research hypotheses.
  • Even for seemingly straightforward calculations like client volume share, NSEs were sizable at 1.2%.
  • Abstract concepts like market efficiency (6.7%) and gross trading revenue (21.4%) showed the largest dispersion.
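The dispersion measure above can be sketched in a few lines. This is a minimal illustration with made-up numbers, not the paper's data: the nonstandard error for a hypothesis is simply the interquartile range of the estimates the teams report.

```python
import numpy as np

# Hypothetical estimates (in %) from different research teams for one
# hypothesis; the paper measures the nonstandard error (NSE) as the
# interquartile range (IQR) of the cross-team distribution of estimates.
estimates = np.array([1.0, 1.5, 2.1, 2.4, 3.0, 3.3, 4.0, 5.2])

q25, q75 = np.percentile(estimates, [25, 75])
nse = q75 - q25  # interquartile range = nonstandard error
print(f"NSE (IQR): {nse:.2f}%")
```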

The Evidence-Generating Process Creates Two Types of Uncertainty

[Diagram: population → sample through the data-generating process (standard error); sample → evidence through the evidence-generating process (nonstandard error)]
  • Standard errors arise from the data-generating process when samples are drawn from populations.
  • Nonstandard errors emerge from the evidence-generating process as researchers make different analytical choices.
  • For market efficiency, nonstandard errors (6.7%) were nearly twice as large as standard errors (3.4%).
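The two error types can be made concrete with a toy simulation, assuming a simple normal population and a few stand-in "analysis paths" (none of this is from the paper): sampling variation across repeated draws yields the standard error, while dispersion across analytic choices applied to one fixed sample yields the nonstandard error.

```python
import numpy as np

rng = np.random.default_rng(0)
population_mean, population_sd, n = 5.0, 2.0, 200

# Data-generating process: variation of the sample mean across
# repeated draws from the population (standard error).
sample_means = [rng.normal(population_mean, population_sd, n).mean()
                for _ in range(1000)]
standard_error = np.std(sample_means)

# Evidence-generating process: different analysis paths applied to the
# SAME sample (here, mean vs. median vs. a trimmed mean, as a crude
# stand-in for researchers' analytic choices).
sample = rng.normal(population_mean, population_sd, n)
paths = [sample.mean(),
         np.median(sample),
         sample[np.abs(sample - sample.mean()) < 2 * sample.std()].mean()]
q25, q75 = np.percentile(paths, [25, 75])
nonstandard_error = q75 - q25

print(f"standard error ~ {standard_error:.3f}")
print(f"nonstandard error (IQR across paths) ~ {nonstandard_error:.3f}")
```

The standard error here should land near the textbook value sigma/sqrt(n) = 2/sqrt(200) ≈ 0.14; the nonstandard error depends entirely on how different the chosen analysis paths are.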

Higher Quality Associated with Smaller Nonstandard Errors

  • Higher reproducibility significantly reduced NSEs by 25.0%, showing the value of robust, replicable methods.
  • Papers with higher peer ratings had 33.3% smaller NSEs, confirming quality correlates with consistency.
  • Surprisingly, higher team quality (publications, experience) slightly increased NSEs by 2.8%, suggesting that experts pursue more diverse analytical approaches.

Peer Feedback Substantially Reduces Dispersion in Estimates

  • Across four feedback stages, the interquartile range decreased by 47.2%, showing clear convergence in estimates.
  • The interdecile range (capturing more extreme values) decreased even more dramatically by 68.2%.
  • Each individual feedback stage contributed to reducing dispersion, suggesting cumulative benefits of peer review.

Researchers Significantly Underestimate the Dispersion in Results

[Chart: researchers' predicted dispersion (28.3%) vs. actual dispersion (100%)]
  • Researchers predicted only 28.3% of the actual dispersion in an incentivized belief survey.
  • This 71.7-percentage-point underestimation helps explain why nonstandard errors have received so little attention in the past.
  • Even when looking at trimmed samples that remove outliers, researchers still significantly underestimated variation.

Contribution and Implications

  • Nonstandard errors are substantial and should be considered alongside standard errors when evaluating empirical research results.
  • Peer feedback significantly reduces dispersion, highlighting the importance of peer review processes before publication.
  • Higher reproducibility correlates with lower dispersion, suggesting that emphasizing reproducible methods improves consistency.
  • Key analysis decisions like model choice and sampling frequency substantially impact results and should be carefully considered.
  • The threshold for statistical significance should account for multiple testing, with a suggested t-statistic hurdle of 2.9-3.0.

Data Sources

  • Chart 1 (NSEs Across Hypotheses) uses data from Table I, Panel C showing interquartile ranges across six research hypotheses.
  • Chart 3 (Quality Variables) uses data from Table III and Figure 2 showing how quality variables relate to dispersion.
  • Chart 4 (Peer Feedback) uses data from Table IV and Figure 3 demonstrating reduction in dispersion across stages.
  • SVG 1 (Evidence Process) illustrates the conceptual framework presented in the introduction of the paper.
  • SVG 2 (Researcher Beliefs) represents findings from Table IA.VI in the Internet Appendix about researchers' predictions.