How to backtest an EA properly
Most retail backtests are useless. They show the strategy what to do, then ask whether it did it. Here's the exact process professional algorithmic traders use instead — and the metrics that actually matter.
Backtesting is the single most important step between buying an EA and risking real capital with it. Done right, it's the closest thing to a time machine: you can see how a strategy would have behaved across years of market history, in conditions you never lived through yourself.
Done wrong — which is how 90% of retail traders do it — it's worse than useless. A bad backtest doesn't just fail to predict the future; it actively gives you false confidence in strategies that will destroy your account.
A great backtest doesn't try to convince you the strategy will work. It tries to convince you it won't — and only fails because the strategy is genuinely robust. If you can't break your backtest, your backtest is broken.
Why most backtests are useless
Before getting into the right process, you need to understand why the default MetaTrader Strategy Tester results are typically meaningless.
1. The default tick model is fabricated
When you run a backtest in "Every Tick" mode using only the broker's default downloaded history, MetaTrader interpolates tick movement between bar OHLC values. In other words, it invents the price movement inside each candle. Your EA sees a smooth, predictable sequence of prices that never existed in reality.
Real markets don't move like that. Real ticks have wicks, gaps, micro-spikes, liquidity holes — none of which the interpolated model captures. An EA that enters on "first tick to touch this price" can hit setups in backtests that would never have triggered live.
2. Spread is artificially clean
The default Strategy Tester uses a fixed spread — usually whatever your current broker shows. But real spreads widen during news, session opens, and low liquidity periods. Many strategies look profitable in backtest only because they don't account for the 8-pip spread on XAUUSD during the New York open or the 30-pip spread on BTC at 3am.
3. Slippage is assumed to be zero
Backtest fills are perfect: your order executes at the exact price you requested. Live, you'll experience slippage — sometimes positive, often negative, occasionally catastrophic during fast markets. Strategies with edge measured in fractions of an ATR can be killed by realistic slippage assumptions.
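One practical way to see how fragile a thin edge is: rerun the numbers with a cost haircut. A minimal Python sketch, using a hypothetical per-trade P/L list — subtract an assumed extra cost (spread widening plus slippage, in account currency) from every trade and watch the profit factor decay:

```python
# Stress-test a backtest by applying a per-trade cost haircut
# (extra spread + slippage) and recomputing the headline numbers.
# The trade list and cost figures are hypothetical.

def profit_factor(trades):
    """Gross profit divided by gross loss (inf if there are no losers)."""
    gross_profit = sum(t for t in trades if t > 0)
    gross_loss = -sum(t for t in trades if t < 0)
    return gross_profit / gross_loss if gross_loss else float("inf")

def haircut(trades, cost_per_trade):
    """Subtract a fixed cost (in account currency) from every trade."""
    return [t - cost_per_trade for t in trades]

trades = [120, -80, 95, -60, 150, -90, 70, -55, 110, -75]  # hypothetical P/L
for cost in (0, 5, 10, 20):
    adjusted = haircut(trades, cost)
    print(f"cost={cost:>2}  net={sum(adjusted):>5}  PF={profit_factor(adjusted):.2f}")
```

On this toy series a $0 haircut shows PF 1.51, while a $20 haircut pushes the strategy below breakeven — exactly the kind of edge that evaporates live.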
4. Optimization without out-of-sample validation
The biggest sin: traders run optimization across all available historical data, then quote those optimized results as "the backtest." This is mathematically guaranteed to overfit. You found the best parameters for that specific period — and they have no reason to keep working going forward.
The 6-step process for proper backtesting
Here's the exact sequence professional algorithmic traders use to validate a strategy before risking capital. None of these steps is optional. Skipping any of them means you're flying blind.
Step 1 — Get real tick data
This is the foundation. Without quality tick data, every subsequent step is garbage-in-garbage-out.
Option A — MetaTrader's built-in data
Open Tools → Options → Charts and set "Max bars in chart" and "Max bars in history" to maximum. Then go to View → Symbols, select your symbol, and click "Refresh" to download the broker's full available history.
This data is convenient but limited: most retail brokers store only a few months of real tick data, then fall back to interpolated minute bars. For long backtests (5+ years), this isn't enough.
Option B — External tick data providers
For serious backtests, you'll want true tick data going back years. The standard sources:
- Dukascopy Historical Data Feed — free, decade-plus of tick data for major symbols. Format is JSON or CSV, requires conversion to MT5-compatible format.
- TickData Suite / Tick Data Manager — paid tools that import Dukascopy data directly into MT5. ~$60-100, worth it if you backtest frequently.
- HistData.com — free 1-minute bar data going back to 2000+, fine for lower-frequency strategies.
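If you import raw tick data yourself, the usual first step is aggregating ticks into bars. A stdlib-only Python sketch, assuming a simplified (timestamp, bid) tick format — real provider exports differ, so the parsing here is illustrative:

```python
# Aggregate raw ticks into 1-minute OHLC bars, pure stdlib.
# Assumes each tick is (ISO-8601 timestamp string, bid price);
# real Dukascopy exports use different formats, so adapt the parsing.
from datetime import datetime

def ticks_to_m1(ticks):
    """Group (timestamp, bid) ticks into per-minute OHLC bars."""
    bars = {}  # minute -> [open, high, low, close]
    for ts, bid in ticks:
        minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        bar = bars.get(minute)
        if bar is None:
            bars[minute] = [bid, bid, bid, bid]
        else:
            bar[1] = max(bar[1], bid)  # high
            bar[2] = min(bar[2], bid)  # low
            bar[3] = bid               # close tracks the latest tick
    return {m: tuple(b) for m, b in sorted(bars.items())}

ticks = [
    ("2024-01-02T09:30:01", 2065.10),
    ("2024-01-02T09:30:30", 2065.45),
    ("2024-01-02T09:30:55", 2064.90),
    ("2024-01-02T09:31:10", 2065.00),
]
bars = ticks_to_m1(ticks)
# The 09:30 bar: open 2065.10, high 2065.45, low 2064.90, close 2064.90
```

For actual MT5 imports you would still need to write the bars into the terminal's custom-symbol history; this sketch only covers the aggregation step.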
Aim for 99% modelling quality
When you run the Strategy Tester, the report shows a "Modelling Quality" percentage in the corner. Anything below 90% means your test is mostly fabricated data. You want 99%, which is only achievable with real imported tick data.
If you see "n/a" or "Modelling Quality: 25%" in your report — throw the backtest in the bin and start over with proper data. No metric from a low-quality backtest is trustworthy, no matter how good it looks.
Step 2 — Configure the Strategy Tester properly
Once you have good data, set up the test environment correctly.
Modelling mode
Always use "Every tick based on real ticks". Never use "OHLC M1" for serious validation — it skips the intra-minute price action where most stops and entries actually trigger.
Spread
Use the "Current" spread setting with your actual broker's typical values. Better: use a slightly higher fixed spread (e.g., if average is 2 pips, test with 3-4) to add conservative buffer.
Initial deposit
Use a realistic deposit — the same size you'd use live. Testing a strategy on $100,000 when you'll run it on $1,000 gives misleading results: minimum lot sizes, margin requirements, and percentage-based risk all behave differently.
Date range
Cover at least 5 years, including different market regimes: trending periods, ranging periods, crisis events. For a strategy you'll run on gold or indices, include 2020 (COVID), 2022 (rate-hike cycle), and 2024 (gold rally). If the strategy only worked in one regime, you want to find that out before live trading.
Step 3 — Interpret the metrics correctly
The Strategy Tester gives you a wall of numbers. Most don't matter. These do:
Profit Factor (PF)
Total profit divided by total loss. Above 1.2 is workable; above 1.5 is solid; above 2.0 is rare and genuinely good. Anything above 3-4 on a long backtest with hundreds of trades is suspicious — likely overfit or relying on a specific market condition.
Important nuance: a lower Profit Factor isn't automatically bad. Strategies with high trade frequency (thousands of trades) and clean expectancy can be excellent at PF 1.2-1.3 — what matters is whether the edge compounds reliably, not whether each individual trade is asymmetric. Always evaluate PF together with total trade count, Sharpe ratio, and drawdown.
Sharpe Ratio
Risk-adjusted return. Above 1.5 is good for retail; above 3.0 is excellent; above 5.0 should be scrutinized carefully. The trap: Sharpe is highly sensitive to the period tested. A strategy with Sharpe 5.0 on a 50-trade backtest tells you almost nothing.
Maximum Drawdown
The worst peak-to-trough loss in the backtest. This is the number that determines whether you'll psychologically survive the strategy live. Whatever the backtest shows, expect the real drawdown to be 1.5-2x worse. Most retail traders can't tolerate more than a 15-20% drawdown without panicking and disabling the EA mid-recovery.
Total trades
Statistical significance matters. Below 100 trades, any backtest is unreliable. Above 500 trades is much better. Above 1000 starts approaching solid statistical confidence. Strategies with great metrics over just 30-50 trades are almost always lucky, not skilful.
Recovery Factor
Net profit divided by maximum drawdown. Above 5 is excellent; above 10 is exceptional. Tells you how much profit you got per unit of pain endured — essentially the trade-off ratio that matters most for compounding.
No single metric is meaningful in isolation. A strategy with PF 1.4 but 5,000 trades and Sharpe 3.0 is far more trustworthy than one with PF 3.0 but 80 trades and Sharpe 8.0. Always evaluate metrics together, weighted by sample size.
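These headline metrics are easy to recompute yourself from a trade list, which is a useful sanity check on any seller's report. A Python sketch over a hypothetical P/L series — note the Sharpe here is a simplified per-trade version, not identical to the figure MT5 reports:

```python
# Recompute the core backtest metrics from a list of per-trade P/L
# values. The trade list is hypothetical; the Sharpe is a simplified
# per-trade mean/stdev ratio, not MT5's exact formula.
import math

def backtest_metrics(trades, initial_deposit):
    gross_profit = sum(t for t in trades if t > 0)
    gross_loss = -sum(t for t in trades if t < 0)
    pf = gross_profit / gross_loss if gross_loss else float("inf")

    # Max drawdown: worst peak-to-trough dip on the equity curve.
    equity = peak = initial_deposit
    max_dd = 0.0
    for t in trades:
        equity += t
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)

    net = sum(trades)
    recovery = net / max_dd if max_dd else float("inf")

    mean = net / len(trades)
    var = sum((t - mean) ** 2 for t in trades) / len(trades)
    sharpe = mean / math.sqrt(var) if var else float("inf")

    return {"profit_factor": pf, "max_drawdown": max_dd,
            "recovery_factor": recovery, "sharpe_per_trade": sharpe,
            "trades": len(trades)}

trades = [120, -80, 95, -60, 150, -90, 70, -55, 110, -75]  # hypothetical
m = backtest_metrics(trades, initial_deposit=10_000)
```

With 10 trades, whatever these numbers say is statistically meaningless — which is exactly the point of weighting every metric by sample size.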
Step 4 — The most important test: out-of-sample validation
This is the step that separates real strategies from curve-fitting exercises.
The basic principle
Split your historical data into two periods:
- In-sample period — used to develop, optimize, and tune the strategy. Typically the older 70-80% of your data.
- Out-of-sample period — kept completely separate. Used only at the end, to test whether the strategy still works on data it has never seen.
The test
Run the strategy on the out-of-sample period with the parameters chosen from in-sample. If performance degrades dramatically, the in-sample results were overfit. Mild degradation (20-30% lower Sharpe) is normal. Dramatic degradation (Sharpe 5.0 becomes Sharpe 0.5) means the strategy is fitting noise, not signal.
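The split-and-compare logic fits in a few lines. This sketch assumes a chronological list of per-trade P/L values and uses expectancy (average profit per trade) as the comparison metric, with an illustrative 50% degradation threshold — tune both to your strategy:

```python
# Out-of-sample check on a chronological list of per-trade P/L values.
# Split, threshold, and metric (expectancy) are illustrative choices.

def oos_check(trades, split=0.75, max_degradation=0.5):
    """Compare in-sample vs out-of-sample expectancy.
    Returns (passed, in_sample_expectancy, oos_expectancy)."""
    cut = int(len(trades) * split)
    in_sample, out_sample = trades[:cut], trades[cut:]
    exp_in = sum(in_sample) / len(in_sample)
    exp_out = sum(out_sample) / len(out_sample)
    if exp_in <= 0:
        return False, exp_in, exp_out  # no in-sample edge to begin with
    passed = exp_out >= exp_in * (1 - max_degradation)
    return passed, exp_in, exp_out

# Hypothetical series: mild degradation out-of-sample, but within tolerance.
trades = [10] * 10 + [-5] * 5 + [8, -4, 6, -3, 7]
ok, exp_in, exp_out = oos_check(trades)
```

The key discipline isn't in the code — it's never letting the out-of-sample slice influence a single parameter choice.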
Walk-Forward Analysis (advanced)
An even more rigorous version: slide a window across your data, optimize on each window, test on the next, then move forward. This simulates how the strategy would have evolved over time if you'd been re-optimizing live. If walk-forward results are consistent, you have something real.
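The mechanics are simple enough to sketch as a generic loop; the toy `optimize` and `evaluate` functions below are placeholders for whatever fitting and scoring your strategy actually uses:

```python
# Generic walk-forward loop: fit on a training window, score on the
# next unseen window, roll forward, repeat. The data and the
# optimize/evaluate callables here are toy placeholders.

def walk_forward(data, train_len, test_len, optimize, evaluate):
    """Slide a train/test window across `data` and collect
    one out-of-sample score per step."""
    scores = []
    start = 0
    while start + train_len + test_len <= len(data):
        train = data[start:start + train_len]
        test = data[start + train_len:start + train_len + test_len]
        params = optimize(train)            # fit only on the train window
        scores.append(evaluate(params, test))  # score only on unseen data
        start += test_len                   # roll forward one test window
    return scores

data = list(range(20))  # stand-in for price or trade history
scores = walk_forward(
    data, train_len=5, test_len=5,
    optimize=lambda train: sum(train) / len(train),
    evaluate=lambda p, test: sum(test) / len(test) - p,
)
```

What you want from a real run is scores that are consistently positive and of similar magnitude across windows; one windfall window carrying the average is itself a red flag.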
Step 5 — Demo before live
Even after a clean backtest with out-of-sample validation, the next step is not live capital. It's demo testing.
Why demo matters
Backtests, even with perfect tick data, don't fully capture:
- Broker-specific execution — your specific broker's spread behavior, requotes, dealer interference
- Server-time mismatches — many strategies depend on specific hours; if your broker's server time differs from what the backtest assumed, results will differ
- Connectivity issues — real trading involves disconnects, missed candles, partial fills
- Market microstructure — actual order book behavior that interpolation can't simulate
How long
Minimum 4 weeks on demo before committing live capital. If the strategy is low-frequency (1-2 trades per week), extend to 8-12 weeks to get a meaningful sample. Compare demo results to the backtest on the same period — they should be reasonably close. If they diverge wildly, investigate before going live.
Step 6 — Live with minimum risk first
The final step in true validation isn't demo — it's a small live account.
Demo accounts have one structural difference from live: brokers have no incentive to cheat your demo. Spreads behave perfectly. Fills are instant. On live, you may encounter slightly worse execution, requoting, or other friction that doesn't show in demo. Run on a small live account (e.g., 25% of your intended deployment size) for at least 4 weeks before scaling up.
Red flags that should stop you instantly
Some patterns in EA backtests are reliable warnings. If you see any of these, walk away regardless of how good the headline numbers look:
- Modelling quality below 99%, or "n/a" — the test ran on interpolated data
- Profit Factor above 3-4 or Sharpe above 5 on a long history — almost certainly overfit
- Fewer than 100 total trades — too small a sample to mean anything
- Profit concentrated in one market regime or a handful of outlier trades
- Lot sizes that grow after losses — hidden martingale, grid, or recovery logic
- Optimized parameters quoted as "the backtest" with no out-of-sample validation
The quick checklist
Before you ever risk real capital on an EA, every box below should be checked. If you can't tick any of them, the strategy isn't ready.
The Backtest Validation Checklist
- Modelling quality is 99% (real tick data, not interpolated)
- Test period spans at least 5 years with multiple market regimes
- Total trade count is above 500 (preferably 1000+)
- Profit Factor is appropriate for the strategy type (PF 1.2+ for high-frequency, 1.5+ for low-frequency)
- Maximum Drawdown is something you can psychologically tolerate live
- Out-of-sample period was tested and results held up
- Spread used in backtest matches or exceeds your broker's typical spread
- No hidden martingale, grid, or recovery logic in the trades
- 4+ weeks of demo testing completed with results comparable to backtest
- Initial live deployment is sized small (25% of target) for first month
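The "no hidden martingale" box can be partially automated: grid and martingale systems leave a fingerprint in the trade log, with lot sizes stepping up after losses. A heuristic Python sketch over a hypothetical (lot_size, pnl) trade list — the growth factor and 50% threshold are illustrative, not a standard:

```python
# Heuristic martingale/grid detector on a trade log of
# (lot_size, pnl) tuples. Format and thresholds are hypothetical.

def looks_like_martingale(trades, growth=1.5):
    """Flag the log if, after most losing trades, the next trade's
    lot size jumps by `growth`x or more."""
    suspicious = 0
    losses = 0
    for (lot_prev, pnl_prev), (lot_next, _) in zip(trades, trades[1:]):
        if pnl_prev < 0:
            losses += 1
            if lot_next >= lot_prev * growth:
                suspicious += 1
    return losses > 0 and suspicious / losses > 0.5

# Martingale-style log: lots double after every loss.
mart = [(0.1, -10), (0.2, -20), (0.4, -40), (0.8, 120), (0.1, -10), (0.2, 15)]
# Fixed-lot log: losses don't change position size.
flat = [(0.1, -10), (0.1, 12), (0.1, -8), (0.1, 9), (0.1, -7), (0.1, 11)]
```

It's a heuristic, not proof — some EAs disguise recovery logic across multiple positions — but a positive hit here is enough reason to inspect the full trade history by hand.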
Final thought
Backtesting properly takes days, not minutes. Most retail traders skip the work and rely on whatever the EA seller shows on the product page. That's why most retail traders lose money on EAs.
The few traders who actually compound capital with algorithmic systems treat backtesting like an exercise in disproving a strategy. They look harder for what could go wrong than for what could go right. They run their own out-of-sample tests. They sit on demo for weeks. They start live with 25% of intended size. And only after all of that do they scale up.
It's slow. It's boring. It's the difference between trading and gambling.
Past performance does not guarantee future results. Even a rigorously validated strategy can fail going forward due to regime change, broker issues, or unforeseen market events. Backtesting is necessary but not sufficient. Always trade with capital you can afford to lose.