How to backtest an EA properly
Most retail backtests are useless. They show the strategy what to do, then ask whether it did it. Here's the exact process professional algorithmic traders use instead — and the metrics that actually matter.
Backtesting is the single most important step between buying an EA and risking real capital with it. Done right, it's the closest thing to a time machine: you can see how a strategy would have behaved across years of market history, in conditions you never lived through yourself.
Done wrong — which is how 90% of retail traders do it — it's worse than useless. A bad backtest doesn't just fail to predict the future; it actively gives you false confidence in strategies that will destroy your account.
A great backtest doesn't try to convince you the strategy will work. It tries to convince you it won't — and only fails because the strategy is genuinely robust. If you can't break your backtest, your backtest is broken.
Why most backtests are useless
Before getting into the right process, you need to understand why the default MetaTrader Strategy Tester results are typically meaningless.
1. The default tick model is fabricated
When you run a backtest in "Every Tick" mode using only the broker's default downloaded history, MetaTrader interpolates tick movement between bar OHLC values. In other words, it invents the price movement inside each candle. Your EA sees a smooth, predictable sequence of prices that never existed in reality.
Real markets don't move like that. Real ticks have wicks, gaps, micro-spikes, liquidity holes — none of which the interpolated model captures. An EA that enters on "first tick to touch this price" can hit setups in backtests that would never have triggered live.
2. Spread is artificially clean
The default Strategy Tester uses a fixed spread — usually whatever your current broker shows. But real spreads widen during news, session opens, and low liquidity periods. Many strategies look profitable in backtest only because they don't account for the 8-pip spread on XAUUSD during the New York open or the 30-pip spread on BTC at 3am.
3. Slippage is assumed to be zero
Backtest fills are perfect: your order executes at the exact price you requested. Live, you'll experience slippage — sometimes positive, often negative, occasionally catastrophic during fast markets. Strategies with edge measured in fractions of an ATR can be killed by realistic slippage assumptions.
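One practical way to see how fragile a thin edge is: rerun the numbers with a cost haircut. A minimal Python sketch, using a hypothetical per-trade P/L list — subtract an assumed extra cost (spread widening plus slippage, in account currency) from every trade and watch the profit factor decay:

```python
# Stress-test a backtest by applying a per-trade cost haircut
# (extra spread + slippage) and recomputing the headline numbers.
# The trade list and cost figures are hypothetical.

def profit_factor(trades):
    """Gross profit divided by gross loss (inf if there are no losers)."""
    gross_profit = sum(t for t in trades if t > 0)
    gross_loss = -sum(t for t in trades if t < 0)
    return gross_profit / gross_loss if gross_loss else float("inf")

def haircut(trades, cost_per_trade):
    """Subtract a fixed cost (in account currency) from every trade."""
    return [t - cost_per_trade for t in trades]

trades = [120, -80, 95, -60, 150, -90, 70, -55, 110, -75]  # hypothetical P/L
for cost in (0, 5, 10, 20):
    adjusted = haircut(trades, cost)
    print(f"cost={cost:>2}  net={sum(adjusted):>5}  PF={profit_factor(adjusted):.2f}")
```

On this toy series a $0 haircut shows PF 1.51, while a $20 haircut pushes the strategy below breakeven — exactly the kind of edge that evaporates live.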
4. Optimization without out-of-sample validation
The biggest sin: traders run optimization across all available historical data, then quote those optimized results as "the backtest." This is mathematically guaranteed to overfit. You found the best parameters for that specific period — and they have no reason to keep working going forward.
The 6-step process for proper backtesting
Here's the exact sequence professional algorithmic traders use to validate a strategy before risking capital. None of these steps is optional. Skipping any of them means you're flying blind.
Step 1 — Get real tick data
This is the foundation. Without quality tick data, every subsequent step is garbage-in-garbage-out.
Option A — MetaTrader's built-in data
Open Tools → Options → Charts and set "Max bars in chart" and "Max bars in history" to maximum. Then go to View → Symbols, select your symbol, and click "Refresh" to download the broker's full available history.
This data is convenient but limited: most retail brokers store only a few months of real tick data, then fall back to interpolated minute bars. For long backtests (5+ years), this isn't enough.
Option B — External tick data providers
For serious backtests, you'll want true tick data going back years. The standard sources:
- Dukascopy Historical Data Feed — free, decade-plus of tick data for major symbols. Format is JSON or CSV, requires conversion to MT5-compatible format.
- TickData Suite / Tick Data Manager — paid tools that import Dukascopy data directly into MT5. ~$60-100, worth it if you backtest frequently.
- HistData.com — free 1-minute bar data going back to 2000+, fine for lower-frequency strategies.
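If you import raw tick data yourself, the usual first step is aggregating ticks into bars. A stdlib-only Python sketch, assuming a simplified (timestamp, bid) tick format — real provider exports differ, so the parsing here is illustrative:

```python
# Aggregate raw ticks into 1-minute OHLC bars, pure stdlib.
# Assumes each tick is (ISO-8601 timestamp string, bid price);
# real Dukascopy exports use different formats, so adapt the parsing.
from datetime import datetime

def ticks_to_m1(ticks):
    """Group (timestamp, bid) ticks into per-minute OHLC bars."""
    bars = {}  # minute -> [open, high, low, close]
    for ts, bid in ticks:
        minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        bar = bars.get(minute)
        if bar is None:
            bars[minute] = [bid, bid, bid, bid]
        else:
            bar[1] = max(bar[1], bid)  # high
            bar[2] = min(bar[2], bid)  # low
            bar[3] = bid               # close tracks the latest tick
    return {m: tuple(b) for m, b in sorted(bars.items())}

ticks = [
    ("2024-01-02T09:30:01", 2065.10),
    ("2024-01-02T09:30:30", 2065.45),
    ("2024-01-02T09:30:55", 2064.90),
    ("2024-01-02T09:31:10", 2065.00),
]
bars = ticks_to_m1(ticks)
# The 09:30 bar: open 2065.10, high 2065.45, low 2064.90, close 2064.90
```

For actual MT5 imports you would still need to write the bars into the terminal's custom-symbol history; this sketch only covers the aggregation step.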
Aim for 99% modelling quality
When you run the Strategy Tester, the report shows a "Modelling Quality" percentage in the corner. Anything below 90% means your test is mostly fabricated data. You want 99%, which is only achievable with real imported tick data.
If you see "n/a" or "Modelling Quality: 25%" in your report — throw the backtest in the bin and start over with proper data. No metric from a low-quality backtest is trustworthy, no matter how good it looks.
Step 2 — Configure the Strategy Tester properly
Once you have good data, set up the test environment correctly.
Modelling mode
Always use "Every tick based on real ticks". Never use "OHLC M1" for serious validation — it skips the intra-minute price action where most stops and entries actually trigger.
Spread
Use the "Current" spread setting with your actual broker's typical values. Better: use a slightly higher fixed spread (e.g., if average is 2 pips, test with 3-4) to add conservative buffer.
Initial deposit
Use a realistic deposit — the same size you'd use live. Testing a strategy on $100,000 when you'll run it on $1,000 gives misleading results: minimum lot sizes, margin requirements, and percentage-based risk all behave differently.
Date range
Cover at least 5 years, including different market regimes: trending periods, ranging periods, crisis events. For a strategy you'll run on gold or indices, include 2020 (COVID), 2022 (rate-hike cycle), and 2024 (gold rally). If the strategy only worked in one regime, you want to find that out before live trading.
Step 3 — Interpret the metrics correctly
The Strategy Tester gives you a wall of numbers. Most don't matter. These do:
Profit Factor (PF)
Total profit divided by total loss. Above 1.2 is workable; above 1.5 is solid; above 2.0 is rare and genuinely good. Anything above 3-4 on a long backtest with hundreds of trades is suspicious — likely overfit or relying on a specific market condition.
Important nuance: a lower Profit Factor isn't automatically bad. Strategies with high trade frequency (thousands of trades) and clean expectancy can be excellent at PF 1.2-1.3 — what matters is whether the edge compounds reliably, not whether each individual trade is asymmetric. Always evaluate PF together with total trade count, Sharpe ratio, and drawdown.
Sharpe Ratio
Risk-adjusted return. Above 1.5 is good for retail; above 3.0 is excellent; above 5.0 should be scrutinized carefully. The trap: Sharpe is highly sensitive to the period tested. A strategy with Sharpe 5.0 on a 50-trade backtest tells you almost nothing.
Maximum Drawdown
The worst peak-to-trough loss in the backtest. This is the number that determines whether you'll psychologically survive the strategy live. Whatever the backtest shows, expect the real drawdown to be 1.5-2x worse. Most retail traders can't tolerate more than a 15-20% drawdown without panicking and disabling the EA mid-recovery.
Total trades
Statistical significance matters. Below 100 trades, any backtest is unreliable. Above 500 trades is much better. Above 1000 starts approaching solid statistical confidence. Strategies with great metrics over just 30-50 trades are almost always lucky, not skilful.
Recovery Factor
Net profit divided by maximum drawdown. Above 5 is excellent; above 10 is exceptional. Tells you how much profit you got per unit of pain endured — essentially the trade-off ratio that matters most for compounding.
No single metric is meaningful in isolation. A strategy with PF 1.4 but 5,000 trades and Sharpe 3.0 is far more trustworthy than one with PF 3.0 but 80 trades and Sharpe 8.0. Always evaluate metrics together, weighted by sample size.
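These headline metrics are easy to recompute yourself from a trade list, which is a useful sanity check on any seller's report. A Python sketch over a hypothetical P/L series — note the Sharpe here is a simplified per-trade version, not identical to the figure MT5 reports:

```python
# Recompute the core backtest metrics from a list of per-trade P/L
# values. The trade list is hypothetical; the Sharpe is a simplified
# per-trade mean/stdev ratio, not MT5's exact formula.
import math

def backtest_metrics(trades, initial_deposit):
    gross_profit = sum(t for t in trades if t > 0)
    gross_loss = -sum(t for t in trades if t < 0)
    pf = gross_profit / gross_loss if gross_loss else float("inf")

    # Max drawdown: worst peak-to-trough dip on the equity curve.
    equity = peak = initial_deposit
    max_dd = 0.0
    for t in trades:
        equity += t
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)

    net = sum(trades)
    recovery = net / max_dd if max_dd else float("inf")

    mean = net / len(trades)
    var = sum((t - mean) ** 2 for t in trades) / len(trades)
    sharpe = mean / math.sqrt(var) if var else float("inf")

    return {"profit_factor": pf, "max_drawdown": max_dd,
            "recovery_factor": recovery, "sharpe_per_trade": sharpe,
            "trades": len(trades)}

trades = [120, -80, 95, -60, 150, -90, 70, -55, 110, -75]  # hypothetical
m = backtest_metrics(trades, initial_deposit=10_000)
```

With 10 trades, whatever these numbers say is statistically meaningless — which is exactly the point of weighting every metric by sample size.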
Step 4 — The most important test: out-of-sample validation
This is the step that separates real strategies from curve-fitting exercises.
The basic principle
Split your historical data into two periods:
- In-sample period — used to develop, optimize, and tune the strategy. Typically the older 70-80% of your data.
- Out-of-sample period — kept completely separate. Used only at the end, to test whether the strategy still works on data it has never seen.
The test
Run the strategy on the out-of-sample period with the parameters chosen from in-sample. If performance degrades dramatically, the in-sample results were overfit. Mild degradation (20-30% lower Sharpe) is normal. Dramatic degradation (Sharpe 5.0 becomes Sharpe 0.5) means the strategy is fitting noise, not signal.
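The split-and-compare logic fits in a few lines. This sketch assumes a chronological list of per-trade P/L values and uses expectancy (average profit per trade) as the comparison metric, with an illustrative 50% degradation threshold — tune both to your strategy:

```python
# Out-of-sample check on a chronological list of per-trade P/L values.
# Split, threshold, and metric (expectancy) are illustrative choices.

def oos_check(trades, split=0.75, max_degradation=0.5):
    """Compare in-sample vs out-of-sample expectancy.
    Returns (passed, in_sample_expectancy, oos_expectancy)."""
    cut = int(len(trades) * split)
    in_sample, out_sample = trades[:cut], trades[cut:]
    exp_in = sum(in_sample) / len(in_sample)
    exp_out = sum(out_sample) / len(out_sample)
    if exp_in <= 0:
        return False, exp_in, exp_out  # no in-sample edge to begin with
    passed = exp_out >= exp_in * (1 - max_degradation)
    return passed, exp_in, exp_out

# Hypothetical series: mild degradation out-of-sample, but within tolerance.
trades = [10] * 10 + [-5] * 5 + [8, -4, 6, -3, 7]
ok, exp_in, exp_out = oos_check(trades)
```

The key discipline isn't in the code — it's never letting the out-of-sample slice influence a single parameter choice.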
Walk-Forward Analysis (advanced)
An even more rigorous version: slide a window across your data, optimize on each window, test on the next, then move forward. This simulates how the strategy would have evolved over time if you'd been re-optimizing live. If walk-forward results are consistent, you have something real.
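The mechanics are simple enough to sketch as a generic loop; the toy `optimize` and `evaluate` functions below are placeholders for whatever fitting and scoring your strategy actually uses:

```python
# Generic walk-forward loop: fit on a training window, score on the
# next unseen window, roll forward, repeat. The data and the
# optimize/evaluate callables here are toy placeholders.

def walk_forward(data, train_len, test_len, optimize, evaluate):
    """Slide a train/test window across `data` and collect
    one out-of-sample score per step."""
    scores = []
    start = 0
    while start + train_len + test_len <= len(data):
        train = data[start:start + train_len]
        test = data[start + train_len:start + train_len + test_len]
        params = optimize(train)            # fit only on the train window
        scores.append(evaluate(params, test))  # score only on unseen data
        start += test_len                   # roll forward one test window
    return scores

data = list(range(20))  # stand-in for price or trade history
scores = walk_forward(
    data, train_len=5, test_len=5,
    optimize=lambda train: sum(train) / len(train),
    evaluate=lambda p, test: sum(test) / len(test) - p,
)
```

What you want from a real run is scores that are consistently positive and of similar magnitude across windows; one windfall window carrying the average is itself a red flag.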
Step 5 — Demo before live
Even after a clean backtest with out-of-sample validation, the next step is not live capital. It's demo testing.
Why demo matters
Backtests, even with perfect tick data, don't fully capture:
- Broker-specific execution — your specific broker's spread behavior, requotes, dealer interference
- Server-time mismatches — many strategies depend on specific hours; if your broker's server time differs from what the backtest assumed, results will differ
- Connectivity issues — real trading involves disconnects, missed candles, partial fills
- Market microstructure — actual order book behavior that interpolation can't simulate
How long
Minimum 4 weeks on demo before committing live capital. If the strategy is low-frequency (1-2 trades per week), extend to 8-12 weeks to get a meaningful sample. Compare demo results to the backtest on the same period — they should be reasonably close. If they diverge wildly, investigate before going live.
Step 6 — Live with minimum risk first
The final step in true validation isn't demo — it's a small live account.
Demo accounts have one structural difference from live: brokers have no incentive to cheat your demo. Spreads behave perfectly. Fills are instant. On live, you may encounter slightly worse execution, requoting, or other friction that doesn't show in demo. Run on a small live account (e.g., 25% of your intended deployment size) for at least 4 weeks before scaling up.
Red flags that should stop you instantly
Some patterns in EA backtests are reliable warnings. If you see any of these, walk away regardless of how good the headline numbers look:
- Modelling quality below 99%, or "n/a" — the test ran on interpolated data
- Profit Factor above 3-4 or Sharpe above 5 on a long history — almost certainly overfit
- Fewer than 100 total trades — too small a sample to mean anything
- Profit concentrated in one market regime or a handful of outlier trades
- Lot sizes that grow after losses — hidden martingale, grid, or recovery logic
- Optimized parameters quoted as "the backtest" with no out-of-sample validation
The quick checklist
Before you ever risk real capital on an EA, every box below should be checked. If you can't tick any of them, the strategy isn't ready.
The Backtest Validation Checklist
- Modelling quality is 99% (real tick data, not interpolated)
- Test period spans at least 5 years with multiple market regimes
- Total trade count is above 500 (preferably 1000+)
- Profit Factor is appropriate for the strategy type (PF 1.2+ for high-frequency, 1.5+ for low-frequency)
- Maximum Drawdown is something you can psychologically tolerate live
- Out-of-sample period was tested and results held up
- Spread used in backtest matches or exceeds your broker's typical spread
- No hidden martingale, grid, or recovery logic in the trades
- 4+ weeks of demo testing completed with results comparable to backtest
- Initial live deployment is sized small (25% of target) for first month
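The "no hidden martingale" box can be partially automated: grid and martingale systems leave a fingerprint in the trade log, with lot sizes stepping up after losses. A heuristic Python sketch over a hypothetical (lot_size, pnl) trade list — the growth factor and 50% threshold are illustrative, not a standard:

```python
# Heuristic martingale/grid detector on a trade log of
# (lot_size, pnl) tuples. Format and thresholds are hypothetical.

def looks_like_martingale(trades, growth=1.5):
    """Flag the log if, after most losing trades, the next trade's
    lot size jumps by `growth`x or more."""
    suspicious = 0
    losses = 0
    for (lot_prev, pnl_prev), (lot_next, _) in zip(trades, trades[1:]):
        if pnl_prev < 0:
            losses += 1
            if lot_next >= lot_prev * growth:
                suspicious += 1
    return losses > 0 and suspicious / losses > 0.5

# Martingale-style log: lots double after every loss.
mart = [(0.1, -10), (0.2, -20), (0.4, -40), (0.8, 120), (0.1, -10), (0.2, 15)]
# Fixed-lot log: losses don't change position size.
flat = [(0.1, -10), (0.1, 12), (0.1, -8), (0.1, 9), (0.1, -7), (0.1, 11)]
```

It's a heuristic, not proof — some EAs disguise recovery logic across multiple positions — but a positive hit here is enough reason to inspect the full trade history by hand.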
Final thought
Backtesting properly takes days, not minutes. Most retail traders skip the work and rely on whatever the EA seller shows on the product page. That's why most retail traders lose money on EAs.
The few traders who actually compound capital with algorithmic systems treat backtesting like an exercise in disproving a strategy. They look harder for what could go wrong than for what could go right. They run their own out-of-sample tests. They sit on demo for weeks. They start live with 25% of intended size. And only after all of that do they scale up.
It's slow. It's boring. It's the difference between trading and gambling.
Past performance does not guarantee future results. Even a rigorously validated strategy can fail going forward due to regime change, broker issues, or unforeseen market events. Backtesting is necessary but not sufficient. Always trade with capital you can afford to lose.