price() clarification

price() clarification - 12/18/13 15:38

price()

Quote:

Returns the mean price of a bar, i.e. the average of all ticks of the currently selected asset that fall into the bar period. This is normally the preferred price for indicators because it is less susceptible to random fluctuations and makes systems more robust and independent of the data feed.

In back testing:

1) on 1 min bars does price() return (high+low+open+close)/4 ?

2) on higher BarPeriods does it average each tick (i.e. each 1 min high low open close) or average a stored mean value of 1 min prices?

3) if it averages stored mean values, does it always use all available 1 min values or does it average say 30 min mean prices when calculating a 60 min mean price?

4) How does it handle higher BarPeriods that may span many minutes with no 1 min data points at all, e.g. at weekends, or times of low activity where no ticks are received during a 1 min period and there is no HLOC?

5) Does use of TimeFrame change any of the answers above, or does price() return exactly the same values with TimeFrame as with the equivalent BarPeriod? I.e. does BarPeriod = 60; price() return exactly the same values as BarPeriod = 5; TimeFrame = 12; price()?

In live trading, I assume this is much simpler in that Zorro runs this averaging for each actual tick received with a timestamp that falls inside the BarPeriod / TimeFrame in question, i.e. sum of prices of all ticks / number of ticks received.

Posted By: jcl

Re: price() clarification - 12/18/13 17:42

It's always the average of (high+low)/2 of all available ticks inside the bar period. 1-min bars can have 1 or more ticks, dependent on simulation or trade mode. TimeFrame just averages over all bar periods inside the time frame.

Posted By: swingtraderkk

Re: price() clarification - 12/19/13 13:03

Thanks jcl,

That helps but I'm still a little confused on terminology.

1) Average price to me from other platforms implied:
(high+low+open+close)/4
while (high+low)/2 was something I'd previously called a mid price.
Can this be clarified in the manual?

2) For backtesting when we use price() and our price data is the provided 1 minute tick structs with a HLOC, on barperiod = 1 then price() returns (high+low)/2 which is in my mind a mid price? Is this correct?

3) What happens when there is a gap, i.e. no tick struct is available at all for a minute or several minutes?
Does Zorro execute run() for a bar with no tick structs?
What does Zorro plot? A gap?
What does Zorro return when asked for price() for a specific minute it does not have a tick struct for? Does it return null, an error, or simply price() for the previous or next available tick struct?
How do indicators react to these gaps?

4) Still in backtesting, if barperiod is higher than 1 e.g. 5, then can you confirm that price() returns an average of the 5 x 1 min tick structs i.e.
((Min 1 :(high+low)/2) + (Min 2 :(high+low)/2) .... + (Min 5 :(high+low)/2))/5
This to me is close to a true average and provides superior smoothing to previous definitions of average such as (high+low+open+close)/4.
Can you also confirm that if that 5 min period contains more or fewer than five tick structs, then the denominator is the actual number of tick structs available for that 5 minute period and the numerator is the sum of the midpoints (high+low)/2 for those tick structs?

5) In live trading, can I assume that the ticks received by Zorro are single close prices, i.e. there is no High, Low or Open, so (high+low)/2 is actually close, and therefore even at BarPeriod = 1 in live trading price() returns a true average of all ticks received in that minute?

6) Does Zorro then store each tick received as a tick struct back to the bar file? Or does it store that minute's data summarised as a single HLOC in the bar file?

7) What does Zorro return for price() for a period in live trading where no ticks are received?

Thanks again for any help you can give.

Posted By: jcl

Re: price() clarification - 12/19/13 15:21

1) I don't know what (high+low+open+close)/4 is called, but it is definitely not an average.

2) yes

3) price curves have no gaps; in backtests missing data is just skipped on the time axis.

4) yes

5) yes

6) it stores HLOC only

7) the last received average price.
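
As an illustration of answers 2), 4) and 7), here is a minimal Python sketch of that averaging. The `Tick` tuple and the `mean_price` helper are hypothetical names chosen for this example and are not part of Zorro; the fallback to the last known price mirrors answer 7) and is only an assumption about how an empty period is handled internally.

```python
from collections import namedtuple

# Hypothetical stand-in for one history record (not Zorro's actual type).
Tick = namedtuple('Tick', ['open', 'close', 'high', 'low'])

def mean_price(ticks, last_price=None):
    """Average of (high+low)/2 over all ticks that fall into one bar period.

    The denominator is the number of ticks actually present. If the period
    contains no tick at all, the last known price is returned instead.
    """
    if not ticks:
        return last_price
    mids = [(t.high + t.low) / 2.0 for t in ticks]
    return sum(mids) / len(mids)

# Two ticks inside one bar period: midpoints 1.5 and 2.0.
bar = [Tick(1.0, 2.0, 2.0, 1.0), Tick(2.0, 1.0, 3.0, 1.0)]
print(mean_price(bar))
```

With BarPeriod = 1 and one tick per minute, the list holds a single element and the result is simply that tick's (high+low)/2.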

Posted By: swingtraderkk

Re: price() clarification - 12/19/13 16:25

great thanks jcl, cleared a lot of things up for me.

only a few left:

a) Does zorro execute run() for a bar with no tick structs in backtest or is it skipped?
b) What does Zorro return when asked for price() for a specific minute it does not have a tick struct for? Does it return null, an error, or simply price() for the previous or next available tick struct? E.g. if I try to use timeoffset to get an opening or closing price of an exchange, what happens if there is no tick struct for that bar?
c) Do indicators skip bars with no tick structs or use last price received?

Posted By: jcl

Re: price() clarification - 12/19/13 16:56

Any bar has always a price. There is no point in the price curve with no price. Any gaps, such as weekends, are only visible on the time axis, not in the price curve.

Posted By: pipclown

Re: price() clarification - 01/03/14 00:38

I'll borrow this thread as well for some other questions regarding priceXXX()

When backtesting on 1-minute data, I observe some strange behavior where priceOpen() almost always equals priceHigh(). Same for priceClose() == priceLow().

However, when you look at the structs stored in the .bar history file you can clearly see much more different values for open, high, low and close.
It's as if priceOpen() and priceHigh() are identical for BarPeriod = 1 when backtesting.

Try this script:

Code:

int numBars = 0;
int numOpenHighNonMatching = 0;
int numCloseLowNonMatching = 0;

function run() {
	StartDate = 20070101;
	EndDate = 20070630;
	LookBack = 0;
	BarPeriod = 1;
	asset("EUR/USD");
	
	if (is(INITRUN)) {
		numBars = 0;
		numOpenHighNonMatching = 0;
		numCloseLowNonMatching = 0;
	}

	numBars++;
	
	if (priceOpen() != priceHigh()) {
		numOpenHighNonMatching++;
	}
	
	if (priceClose() != priceLow()) {
		numCloseLowNonMatching++;
	}
	
	
	if (is(EXITRUN)) {
		printf("\nNum bars received: %d\n", numBars - 1);
		printf("\nTotal non-matching: %d", numOpenHighNonMatching + numCloseLowNonMatching);
	}
}

For me that script outputs,

Code:

BackTest: bar_export EUR/USD 2007
Num bars received: 151067
Total non-matching: 43

Looking at the raw .bar files, I would expect non-matching to be much higher, even for BarPeriod = 1.
Naturally, turning up BarPeriod to say 60 or higher shows more normal and different values.

Does Zorro handle prices at BarPeriod = 1 differently compared to higher periods?

I also noticed that bars at the beginning of the year (those last in .bar file) are sometimes skipped for some reason?

Maybe I've just misunderstood something, but I've been scratching my head over this one for over 2 hours.

Thanks!

Posted By: jcl

Re: price() clarification - 01/03/14 07:59

No, BarPeriod 1 is not handled differently. It is however possible that the price functions return different values than the raw data, due to sampling. Bars are aligned, raw ticks aren't. For instance, if you have a price tick in raw data at 08:00:30 and the next one at 08:01:30, the 08:01:00..08:02:00 bar is sampled from 2 ticks.

There are also no bars skipped AFAIK. Maybe you mean the lookback period?

Posted By: Anonymous

Re: price() clarification - 01/03/14 10:13

Good detective work pipclown!

May I suggest that this gets investigated deeper, jcl, because this looks like a serious problem to me.

Also I'm now completely confused with all this ticks/bars terminology. So far I thought that Zorro's history files contained only 1 minute OHLC bars. If that's right, then the above script reveals that something is really not OK with the history data.

And I also thought that using the word 'tick' in Zorro is just a play on words. The real tick in forex represents a price change. So every time bid and/or ask changes and there is a transaction, the tick records new (bid, ask) values plus the exact time when it happened. That translates to tens of ticks for every 1 min bar nowadays. I just don't see how all those ticks compressed could translate to most of the 1 min bars having no wicks (as demonstrated by pipclown).

I might be wrong, and I'll soon run the tests to prove or disprove my theory, but something's very fishy here...

Posted By: Anonymous

Re: price() clarification - 01/03/14 10:22

OK, to add to my previous post, the only possible reason that most of those 1 min bars could have been wickless is that it was 2007 and the market was moving much slower then. Pending further analysis...

Posted By: jcl

Re: price() clarification - 01/03/14 12:48

I've just asked and can confirm that in the included historical data files, most ticks are minute aligned. So normally one tick is indeed identical to a 1-minute bar, other than in my example above.

To avoid confusion: "Ticks" here mean the raw data as received from the price server, consisting of HLOC prices and a time stamp. If a tick contains only a single price quote, H L O C are identical. But normally they have wicks and look like a normal candle.

So, all is fine, at least I do not know of a mystery about ticks and bars. You can easily see the raw ticks when you plot a chart of 1-minute bars:

Posted By: Anonymous

Re: price() clarification - 01/03/14 13:25

By "I've asked" I presume you asked FXCM? That explanation is very weird indeed.

Let me demonstrate, now that I've found some time to extract some real ticks (EUR/USD, Dukascopy feed):

Here's how ticks look:

Code:

2014-01-02 01:17:13.341 1.37694 1.37714
2014-01-02 01:17:13.644 1.37695 1.37703
2014-01-02 01:17:15.954 1.37694 1.37713
2014-01-02 01:17:16.209 1.37694 1.37718
2014-01-02 01:17:17.653 1.37680 1.37706
2014-01-02 01:17:17.754 1.37680 1.37719
2014-01-02 01:17:17.805 1.37680 1.37720
2014-01-02 01:17:20.120 1.37680 1.37703
2014-01-02 01:17:20.424 1.37691 1.37696
2014-01-02 01:17:28.991 1.37694 1.37696
2014-01-02 01:17:29.042 1.37695 1.37698
2014-01-02 01:17:29.143 1.37696 1.37703
2014-01-02 01:17:29.201 1.37696 1.37705
2014-01-02 01:17:29.301 1.37695 1.37725
2014-01-02 01:17:31.682 1.37696 1.37725
2014-01-02 01:17:32.307 1.37695 1.37725
2014-01-02 01:17:33.607 1.37704 1.37725
2014-01-02 01:17:33.915 1.37714 1.37725
2014-01-02 01:17:57.532 1.37715 1.37723
2014-01-02 01:17:59.977 1.37714 1.37725

The first column is the time (including milliseconds), followed by bid and ask prices. When you get a tick you know that there has been at least one transaction at those bid/ask prices. There could have been more, but another tick is printed only if the bid/ask prices changed since the last tick was printed.

If you look closely, all the above ticks are from the same minute, 01:17, and I specifically looked for a very quiet period so that I can paste the full minute without forcing you to scroll too much.

Trust me, during peak hours one minute more often than not sees more than a hundred ticks. Tens of ticks can occasionally be printed inside one second.

Converting that tick data to 1min bars is quite straightforward, you detect the opening price (first bid: 1.37694 in the example above), closing price (last bid: 1.37714), highest price (1.37715), lowest price (1.37680) and you're done.

So you compress typically hundreds of ticks to just 4 numbers (losing lots of information in the process) to get a 1 min OHLC bar, but calling the resulting OHLC bar 'a tick' is insane.
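
The conversion described above can be sketched in a few lines of Python. This is only an illustration built on the bid column of the listing above; `to_ohlc` is a hypothetical helper name, and using the bid side for all four prices is the assumption stated in the post, not necessarily how any particular broker builds its bars.

```python
# (time, bid, ask) rows copied from the tick listing above (2014-01-02, minute 01:17).
TICKS = [
    ("01:17:13.341", 1.37694, 1.37714), ("01:17:13.644", 1.37695, 1.37703),
    ("01:17:15.954", 1.37694, 1.37713), ("01:17:16.209", 1.37694, 1.37718),
    ("01:17:17.653", 1.37680, 1.37706), ("01:17:17.754", 1.37680, 1.37719),
    ("01:17:17.805", 1.37680, 1.37720), ("01:17:20.120", 1.37680, 1.37703),
    ("01:17:20.424", 1.37691, 1.37696), ("01:17:28.991", 1.37694, 1.37696),
    ("01:17:29.042", 1.37695, 1.37698), ("01:17:29.143", 1.37696, 1.37703),
    ("01:17:29.201", 1.37696, 1.37705), ("01:17:29.301", 1.37695, 1.37725),
    ("01:17:31.682", 1.37696, 1.37725), ("01:17:32.307", 1.37695, 1.37725),
    ("01:17:33.607", 1.37704, 1.37725), ("01:17:33.915", 1.37714, 1.37725),
    ("01:17:57.532", 1.37715, 1.37723), ("01:17:59.977", 1.37714, 1.37725),
]

def to_ohlc(ticks):
    """Compress the bid prices of one minute into a single OHLC bar."""
    bids = [bid for _time, bid, _ask in ticks]
    return {"open": bids[0], "high": max(bids), "low": min(bids), "close": bids[-1]}

bar = to_ohlc(TICKS)
print(bar)
```

All twenty rows collapse into four numbers, which is the information loss the post refers to.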

Posted By: jcl

Re: price() clarification - 01/03/14 14:22

Well... I think discussions about what something ought to be called are certainly among the ten most superfluous human activities. So you have my permission to call the thing that I refer to as a 'tick' anything else you want.

We've just followed the name convention from broker APIs, which normally use the name 'tick' for the smallest price cluster. If you can think of a better name, I'm open for any suggestions.

Posted By: Anonymous

Re: price() clarification - 01/03/14 14:39

Originally Posted By: jcl

Well... I think discussions about what something ought to be called are certainly among the ten most superfluous human activities. So you have my permission to call the thing that I refer to as a 'tick' anything else you want.

I'm pretty sure that if I loved history, I would've known a few examples where language barriers started wars.

Quote:

We've just used the terminology from the broker API, therefore the name 'tick' for that thing, which is represented by the TICK struct in the software.

OK, so it is FXCM insanity or just illiterate API programmers, whatever... And it's not the first time. An FXCM 'lot' is also not 100000 units of base currency like in the rest of the world, but something completely different.

But to call a 1 min bar a tick, while a 2 min bar is suddenly not a tick? Weird! And how can a tick have 4 prices? And how can it last for exactly 1 minute, and how does that make it a tick? Blah...

I agree it's stupid chasing the reasons for bad terminology, but it's still confusing and error-prone to translate every occurrence of the word 'tick' to 'bar, actually' in one's mind. Not to mention that some of your previous explanations of how the various price functions operate on ticks didn't make any sense to me; only now do I understand why.

Posted By: jcl

Re: price() clarification - 01/03/14 14:42

Just to avoid further confusion: A bar is NOT a tick. A bar is a time period, while a tick is the smallest price cluster.

Only in the special case when the bar period is identical to the tick distance, a bar is equivalent to 1 tick.

Posted By: Anonymous

Re: price() clarification - 01/03/14 14:52

Could you please explain what 'price cluster' is? I never heard of it...

Searching manual returns only: "The Z3 system detects price clusters that precede a breakout - a strong price movement in any direction"

Google search returns a lot of hits about "Fibonacci price clusters", is that it? Doubt it...

Posted By: jcl

Re: price() clarification - 01/03/14 14:53

I've just made up that word for avoiding 'tick'. I meant an object clustered together from 4 prices. In software:

Code:

typedef struct TICK
{
	float fOpen, fClose;	
	float fHigh, fLow;	
	DATE	timestamp;	
} TICK;
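
For readers who want to inspect such records outside of Zorro, the following Python sketch decodes one record of that shape. It assumes, as pipclown's script later in this thread does, that a record is four 32-bit floats followed by an 8-byte double, little-endian and without padding, and that the timestamp counts days since 1899-12-30 (OLE date). `decode_record` is a hypothetical helper name, and the sample values are made up for the round trip.

```python
import datetime
from struct import calcsize, pack, unpack

# Assumed layout: fOpen, fClose, fHigh, fLow as 32-bit floats, then the
# timestamp as a 64-bit double; little-endian, no padding.
RECORD_FORMAT = '<ffffd'
OLE_TIME_ZERO = datetime.datetime(1899, 12, 30)

def decode_record(raw):
    """Turn the raw bytes of one record into a dict with a readable time."""
    o, c, h, l, stamp = unpack(RECORD_FORMAT, raw)
    return {
        'open': o, 'close': c, 'high': h, 'low': l,
        'time': OLE_TIME_ZERO + datetime.timedelta(days=stamp),
    }

# Round trip with made-up prices that are exactly representable as floats.
raw = pack(RECORD_FORMAT, 1.5, 1.25, 1.75, 1.0, 41642.5)
record = decode_record(raw)
print(calcsize(RECORD_FORMAT), record)
```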

Posted By: Anonymous

Re: price() clarification - 01/03/14 15:10

Sorry, you're still not coming clear here. That object looks _exactly_ like an OHLC bar. Also from what I've seen, Zorro's (might be FXCM's) bar files are constructed from 1 min bars. If it looks like a bar, swims like a bar and quacks like a bar, it's an effing bar.

So, when 2 posts above you say "A bar is a time period, while a tick is the smallest price cluster" I really don't understand what you mean.

Above I see a structure that describes the most common price bar nowadays (used to construct the most popular OHLC and candlestick charts). Yes, it is awkwardly called 'TICK', which is reminiscent of the standard price tick I explained earlier. But outside of the C source it should NEVER be called a tick, to avoid any confusion with real price ticks.

Posted By: DdlV

Re: price() clarification - 01/03/14 18:16

I agree that discussions about what something "ought" to be called are sometimes superfluous. However, agreeing on the definitions of terms is rarely superfluous - in fact, it's usually critical to a useful discussion and good decisions. So, I start there:

Tick: Timestamp, Bid, and Ask provided whenever one of the latter 2 changes in live trading; or provided as historical data from live trading. (Or I suppose could be made up to simulate future trading...)

Bar: Timestamp, Open, Close, High, and Low built from the Ticks of whatever the Bar's historical time period is, possibly ending with the most recent live data, Open coming from the last Tick prior to the start of the time period. Also called Candle.

Is this correct? Is there any other useful term to define?

Using those definitions, my understanding is that Zorro's provided history files are 1 minute Bars built from the historical Ticks - is that correct?

Returning to pipclown's original question, he saw a preponderance of High==Open and Low==Close for EUR/USD in the 1st 1/2 of 2007. GBP/USD shows the same thing, but AUD/USD does not - why the difference?

Additionally, for EUR/USD the preponderance persists through 2010 but vanishes beginning with 2011 - why?

Thanks.

Posted By: Sundance

Re: price() clarification - 01/03/14 19:44

Defining terms is absolutely the basis for discussing anything. DdlV gave the right ones. Never ever call something with a High, Low, Open and Close price a tick. This is definitely not a tick!
I'm a little bit surprised to read your definition of a tick three posts above, JCL. It's simply wrong, and this has nothing to do with a superfluous debate about naming; the term is too important for that.

Posted By: Anonymous

Re: price() clarification - 01/03/14 19:48

So I am guessing you can't actually test a strategy with tick data in Zorro. The closest you can come to this is using 1 minute bars.

Posted By: Anonymous

Re: price() clarification - 01/03/14 19:57

Originally Posted By: DdlV

FWIW, I completely agree with your definitions, and that is how I have seen those terms, always.

I also agree that the proper terminology is essential. Especially in complex topics, and automated trading is certainly complex.

Posted By: Anonymous

Re: price() clarification - 01/03/14 19:58

Originally Posted By: liftoff

So I am guessing you can't actually test a strategy with tick data in Zorro. The closest you can come to this is using 1 minute bars.

That is correct.

Posted By: pipclown

Re: price() clarification - 01/04/14 03:05

I put together a Python script that parses and counts wicks in the .bar history files. A wick is defined as high price > open price or low price < close price.

Here's the output from that script:

Code:

_Asset | 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013
--------------------------------------------------------------------------------
AUDUSD |   66    78    82    89    87    76     0     0     0    93    93    91 
EURUSD |   71    81    84     0     0     0     0     0     0    96    94    92 
EURCHF |    -     -     -     -     -     -     -     -     -    94    78    87 
GBPUSD |   73    80    85    90    91     0     0     0     0    93    92    92 
 GER30 |    -     -     -     -     -     -     -     0     0     0    88    86 
NAS100 |    -     -     -     -     -     -     -     0     0    76    75    74 
NZDUSD |   67    78    83    86    88    76    81    85    87     -    91     - 
SPX500 |    -     -     -     -     -     -     -     0     0    80    77    76 
 UK100 |    -     -     -     -     -     -     -     0     0    88    83    82 
  US30 |    -     -     -     -     -     -     -     0     0    87    85    84 
USDCAD |   67    79    83    88    88    74     0     0     0    91    89    84 
USDCHF |   78    81    82    87    90     0     0     0     0    93    92    91 
USDJPY |   74    80    82    89    89     0     0     0     0    89    88    94 
 USOil |    -     -     -     -     -     -     -     0     0    82    83    80 
XAGUSD |    -     -     -     -     -     -     -     -     0    89    86    87 
XAUUSD |    -     -     -     -     -     -     -     -     0    93    93    93

Entries marked with a dash ("-") are simply missing data (no .bar file found). Tried to download all .bar files found on the download page.
The numbers represent the wick percentage (num wicks / total num of HLOCs).

The wick percentage can of course vary by asset and year, depending on whether the asset is trending or not. But what looks peculiar are the zeros. It seems strange that these should be zero.

jcl, do you know why this could be?

Then there's the question of what effect this has on backtesting, if any. You rarely test on the 1-min bar period, IMHO.

I included the script I used. Requires a working Python installation. Run from the History folder.

Code:

import sys
from struct import calcsize, unpack_from
from collections import namedtuple

Tick = namedtuple('Tick', ['open', 'close', 'high', 'low', 'time'])
YEARS = range(2002, 2013 + 1)
ASSETS = [
    'AUDUSD', 'EURUSD', 'EURCHF', 'GBPUSD', 'GER30', 'NAS100',
    'NZDUSD', 'SPX500', 'UK100', 'US30', 'USDCAD', 'USDCHF',
    'USDJPY', 'USOil', 'XAGUSD', 'XAUUSD'
]

def parse_bar_file(filename):
    ticks = open(filename, 'rb').read()
    format = '<ffffd'
    format_size = calcsize(format)
    num_ticks = len(ticks) / format_size
    return [Tick(*unpack_from(format, ticks, format_size * i)) for i in xrange(num_ticks)]

def count_wicks(ticks):
    return len([t for t in ticks if t.high > t.open or t.low < t.close])

def print_summary_matrix():
    print "%6s | %s" % ("Asset", "  ".join(map(str, YEARS)))
    print "-" * 80
    for asset in ASSETS:
        wick_percentages = []
        for year in YEARS:
            try:
                ticks = parse_bar_file("%s_%d.bar" % (asset, year))
            except IOError:
                # Some years are missing
                wick_percentages.append("   - ")
                continue

            num_wicks = count_wicks(ticks)
            if len(ticks) == 0:
                # Some files are empty
                wick_percentages.append("   - ")
                continue

            prc = num_wicks / float(len(ticks))
            wick_percentages.append("%4.0f " % (prc * 100))
        
        print "%6s | %s" % (asset, " ".join(wick_percentages))


print_summary_matrix()

Posted By: jcl

Re: price() clarification - 01/04/14 09:45

Yes, that explains your finding. The FXCM API apparently composed price ticks in 2008-2010 in a different way than the ticks from more recent years. Possibly FXCM have filtered price quotes in some way before generating ticks from them.

I had not noticed this previously - I've only checked the 2013 data after your first post. I'll ask FXCM about that. We had downloaded the wickless price files in 2011, using their old API back then, so maybe unfiltered ticks are now available with the new API. We'll then upload new historic files soon.

- For overcoming the apparent confusion of quotes, ticks, bars, and candles, I'll put up a glossary of those terms in the manual. I hope this helps then to understand my clumsy technical explanations.

Posted By: pipclown

Re: price() clarification - 01/04/14 14:55

Cool, thanks jcl.

Posted By: Sundance

Re: price() clarification - 01/04/14 15:41

Most appreciated JCL!

Posted By: Anonymous

Re: price() clarification - 01/05/14 17:27

Originally Posted By: DdlV

Bar: Timestamp, Open, Close, High, and Low built from the Ticks of whatever the Bar's historical time period is, possibly ending with the most recent live data, Open coming from the last Tick prior to the start of the time period. Also called Candle.

Actually, I see one mistake in the above. If Open came from the last tick prior to the start of the time period, then we would never have gaps. But we do. Open comes from the first tick in the new time period. At least that's how I see it.

Posted By: DdlV

Re: price() clarification - 01/05/14 18:19

Thanks acidburn! True - Open is a bit more complicated - something like:

if((Market has been closed) or (Tick time == Bar start time)) Open = Bar's first Tick price;
else Open = last Bar Close;

???

Posted By: GPEngine

Re: price() clarification - 01/05/14 18:39

Great script, pipclown. I used this to identify some problems in my History data.

Can we think of some other criteria to add to this Python script, to make it more of a sanity check?
How about
- starts near jan 1
- ends near dec 31, unless current year.
- no gaps > 3 days

Posted By: Anonymous

Re: price() clarification - 01/05/14 19:56

Originally Posted By: DdlV

Thanks acidburn! True - Open is a bit more complicated - something like:

if((Market has been closed) or (Tick time == Bar start time)) Open = Bar's first Tick price;
else Open = last Bar Close;

???

Nope. I still don't like it. I think Open price should bear no connection to the last bar. One bar should be completely independent of the other. And bound only by time, as candlestick chart is time based.

So quoting your rules, i think Open only need this one: "Open = Bar's first Tick price"

Posted By: Anonymous

Re: price() clarification - 01/05/14 20:05

Ah, I think I see where you're leading with last bar's close. In a (very unlikely) situation when there was no trading at all in a 1 minute period, yes, if you want that slot filled you could copy the close price from the last bar to all of OHLC of the current bar. It's the best approximation.

Of course, situations like that should be very rare in today's forex market. But as it is fragmented (not centralized), some less popular brokers could see periods of inactivity. Although 1 minute is a long, long time...
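
The approximation suggested in this post, copying the previous close into all four prices of an empty minute, could be sketched as follows. This illustrates the poster's idea only; it is not a description of what Zorro does (jcl states earlier in the thread that missing data is simply skipped on the time axis). `fill_gaps` and the integer minute index are hypothetical simplifications.

```python
def fill_gaps(bars):
    """Fill empty minutes with a flat bar at the previous close.

    bars maps an integer minute index to an (open, high, low, close) tuple.
    Returns a new dict covering every minute from the first to the last bar.
    """
    filled = {}
    last_close = None
    for minute in range(min(bars), max(bars) + 1):
        if minute in bars:
            filled[minute] = bars[minute]
            last_close = bars[minute][3]
        else:
            # No trading in this minute: O = H = L = C = previous close.
            filled[minute] = (last_close,) * 4
    return filled

# Minutes 1 and 2 are missing between two real bars.
bars = {0: (1.10, 1.12, 1.09, 1.11), 3: (1.13, 1.14, 1.12, 1.12)}
filled = fill_gaps(bars)
print(filled)
```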

Posted By: Anonymous

Re: price() clarification - 01/05/14 20:34

Originally Posted By: pipclown

I put together a Python script that parses and counts wicks in the .bar history files. A wick is defined as high price > open price or low price < close price.

Now, that I had time to run the script and compare results, all I can say is great detective work pipclown!

I think there's no need to duplicate the effort, so I won't pursue the same analysis.

But... I would like to ask you for a favor, if you can spare a bit more time on your excellent script. I'd do it myself if I knew Python.

I find your formula for a wick just a bit lacking.

I think the existence of the upper wick would be better described as high > max(open, close) and, analogously, the lower wick as low < min(open, close). Would you be so kind as to make this little adaptation and run it again, to see if it changes anything? Thanks!
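
The difference between the two wick definitions is easiest to see on a single rising candle without wicks. A small Python sketch, with hypothetical function names and made-up prices:

```python
def has_wick_original(o, h, l, c):
    """Definition used by the first script: high > open or low < close."""
    return h > o or l < c

def has_wick_refined(o, h, l, c):
    """Refined definition: the high or low extends beyond the candle body."""
    return h > max(o, c) or l < min(o, c)

# Rising candle whose body spans the whole range: open = low, close = high.
rising = (1.0, 1.2, 1.0, 1.2)   # (open, high, low, close)
# Falling candle with a body spanning the whole range: open = high, close = low.
falling = (1.2, 1.2, 1.0, 1.0)

for candle in (rising, falling):
    print(candle, has_wick_original(*candle), has_wick_refined(*candle))
```

The original formula counts the rising candle as having a wick although its high equals its close, while the refined formula reports no wick for either candle. Both agree on the falling candle, so the original test only returns zero for bars where open equals high and close equals low.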

Posted By: DdlV

Re: price() clarification - 01/06/14 02:27

Thanks acidburn. I think it comes down to (again!) definition:

If Bar is defined as the price action of the Ticks within the Bar's time period, then you are correct and a Bar's Open is the price of the first Tick. And there is always a gap from the prior Bar's close, as I understand Ticks only happen when price changes?

If Bar is defined as the price action of the time period of the Bar, then:

In continuous trading Open is the Close of the previous Bar, since that's the price at the start of the Bar (no gap), unless the Bar's first Tick is exactly at the Bar's start time (gap); and in discontinuous trading (Market was closed, halted, other?) by convention Open is the first Tick's price (gap)?

Comments? Thanks!

Posted By: pipclown

Re: price() clarification - 01/06/14 04:12

acidburn, yeah, you're right. Fixed it below.

GPEngine, I modified the script a bit and added your suggestions.
It now outputs some more tables with an overview for each asset.
It also counts the number of duplicates it finds (bars with identical timestamps).

Output from script:

Code:

Wick percentages
_Asset | 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013
--------------------------------------------------------------------------------
 UK100 |    -     -     -     -     -     -     -   0.0   0.0  74.1  66.3  62.8  
USDCHF | 56.1  64.4  63.3  74.2  78.9   0.0   0.0   0.0   0.0  86.3  84.4  82.4  
GBPUSD | 46.7  61.9  68.8  77.9  81.1   0.0   0.0   0.0   0.0  86.4  84.8  84.9  
EURCHF |    -     -     -     -     -     -     -     -     -  89.1  61.9  76.2  
XAUUSD |    -     -     -     -     -     -     -     -   0.0  86.1  86.3  87.6  
 GER30 |    -     -     -     -     -     -     -   0.0   0.0   0.0  76.2  72.0  
XAGUSD |    -     -     -     -     -     -     -     -   0.0  80.6  75.9  76.4  
USDCAD | 35.8  61.8  65.5  75.3  76.0  54.8   0.0   0.0   0.0  82.4  79.7  69.9  
AUDUSD | 31.9  58.8  63.2  73.9  74.4  56.2   0.0   0.0   0.0  86.9  87.2  83.0  
  US30 |    -     -     -     -     -     -     -   0.0   0.0  75.0  70.0  67.2  
EURUSD | 41.8  63.0  65.7   0.0   0.0   0.0   0.0   0.0   0.0  91.6  88.7  84.1  
USDJPY | 47.1  61.3  61.9  76.3  76.8   0.0   0.0   0.0   0.0  78.0  76.3  87.5  
NAS100 |    -     -     -     -     -     -     -   0.0   0.0  51.9  48.6  47.6  
NZDUSD | 35.2  59.1  66.0  73.6  75.4  57.4  66.0  72.0  74.3     -  83.5     -  
SPX500 |    -     -     -     -     -     -     -   0.0   0.0  60.5  53.4  51.7  
 USOil |    -     -     -     -     -     -     -   0.0   0.0  68.7  68.4  64.3  



Num gaps large than 3 days
_Asset | 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013
--------------------------------------------------------------------------------
 UK100 |    -     -     -     -     -     -     -     5     5     3     3     4  
USDCHF |    0     0     0     1     1     0     0     1     0     0     0     0  
GBPUSD |    0     0     0     1     1     0     0     1     0     0     0     0  
EURCHF |    -     -     -     -     -     -     -     -     -     0     0     0  
XAUUSD |    -     -     -     -     -     -     -     -     0     1     0     1  
 GER30 |    -     -     -     -     -     -     -     3     2     1     2     1  
XAGUSD |    -     -     -     -     -     -     -     -     0     1     0     1  
USDCAD |    0     0     0     1     0     0     0     1     0     0     0     0  
AUDUSD |    0     0     0     0     0     0     0     1     0     0     0     0  
  US30 |    -     -     -     -     -     -     -     2     1     2     0     1  
EURUSD |    0     0     0     0     1     0     0     1     0     0     0     0  
USDJPY |    0     0     0     1     1     0     0     1     0     0     0     0  
NAS100 |    -     -     -     -     -     -     -     4     1     2     0     1  
NZDUSD |    0     0     0     0     0     0     0     1     0     -     0     -  
SPX500 |    -     -     -     -     -     -     -     4     1     2     0     1  
 USOil |    -     -     -     -     -     -     -     2     2     2     1     1  



Num duplicates
_Asset | 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013
--------------------------------------------------------------------------------
 UK100 |    -     -     -     -     -     -     -   577   618   629   598   485  
USDCHF |  981  1132  1143  1225  1209  1052  1184  1194  1225  1231  1228  1011  
GBPUSD |  825  1054  1163  1232  1214  1075  1165  1185  1214  1227  1225  1012  
EURCHF |    -     -     -     -     -     -     -     -     -  1229  1181  1089  
XAUUSD |    -     -     -     -     -     -     -     -  1154  1153  1168   965  
 GER30 |    -     -     -     -     -     -     -   629   680   624   679   556  
XAGUSD |    -     -     -     -     -     -     -     -  1059  1136  1121   929  
USDCAD |  619  1040  1082  1215  1171   931  1039  1069  1176  1219  1218   986  
AUDUSD |  582  1001  1086  1211  1177  1013  1175  1187  1216  1231  1228  1019  
  US30 |    -     -     -     -     -     -     -   821  1025  1076  1037   828  
EURUSD |  840  1103  1133  1224  1191  1054  1209  1211  1226  1238  1231  1019  
USDJPY |  945  1084  1157  1228  1195  1123  1208  1204  1219  1227  1219  1020  
NAS100 |    -     -     -     -     -     -     -   514   609   740   648   489  
NZDUSD |  496   904  1071  1209  1167   990  1088  1081  1171     -  1229     -  
SPX500 |    -     -     -     -     -     -     -   746   849   930   834   638  
 USOil |    -     -     -     -     -     -     -   794  1012  1075  1032   799  



Start/end dates ok
_Asset | 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013
--------------------------------------------------------------------------------
 UK100 |    -     -     -     -     -     -     -   Y/Y   Y/Y   Y/Y   Y/Y   Y/N  
USDCHF |  Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/N  
GBPUSD |  Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/N  
EURCHF |    -     -     -     -     -     -     -     -     -   Y/Y   Y/Y   Y/N  
XAUUSD |    -     -     -     -     -     -     -     -   Y/Y   Y/Y   Y/Y   Y/N  
 GER30 |    -     -     -     -     -     -     -   Y/Y   Y/Y   Y/N   Y/Y   Y/N  
XAGUSD |    -     -     -     -     -     -     -     -   Y/Y   Y/Y   Y/Y   Y/N  
USDCAD |  Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/N  
AUDUSD |  Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/N  
  US30 |    -     -     -     -     -     -     -   N/Y   Y/Y   Y/Y   Y/Y   Y/N  
EURUSD |  Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/N  
USDJPY |  Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/N  
NAS100 |    -     -     -     -     -     -     -   Y/Y   Y/Y   Y/Y   Y/Y   Y/N  
NZDUSD |  Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y   Y/Y     -   Y/Y     -  
SPX500 |    -     -     -     -     -     -     -   Y/Y   Y/Y   Y/Y   Y/Y   Y/N  
 USOil |    -     -     -     -     -     -     -   N/Y   Y/Y   Y/Y   Y/Y   Y/N  



[Finished in 290.2s]

These may be errors in the .bar files, but please take the results with a pinch of salt. What effect this has in the end is, I think, best answered by jcl. I merely found some discrepancies while fiddling with the .bar files.
The start/end check looks wrong for 2013 because I use .bar files from Oct 2013 and it is now 2014.

Updated script:

Code:

import sys
import datetime
from struct import calcsize, unpack_from
from collections import namedtuple

Tick = namedtuple('Tick', ['open', 'close', 'high', 'low', 'time'])
AssetReport = namedtuple('AssetReport', ['year', 'asset', 'wick_percentage', 'start_date_ok', 'end_date_ok', 'num_gaps', 'num_duplicates'])

OLE_TIME_ZERO = datetime.datetime(1899, 12, 30, 0, 0, 0)
YEARS = range(2002, 2013 + 1)
ASSETS = [
    'AUDUSD', 'EURUSD', 'EURCHF', 'GBPUSD', 'GER30', 'NAS100',
    'NZDUSD', 'SPX500', 'UK100', 'US30', 'USDCAD', 'USDCHF',
    'USDJPY', 'USOil', 'XAGUSD', 'XAUUSD'
]

def dt(oledt):
    return OLE_TIME_ZERO + datetime.timedelta(days=float(oledt))

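def parse_bar_file(filename):
    # Each record is 4 little-endian floats (open, close, high, low) plus a double (OLE date)
    with open(filename, 'rb') as f:
        data = f.read()
    record_format = '<ffffd'
    record_size = calcsize(record_format)
    num_ticks = len(data) // record_size
    ticks = [Tick(*unpack_from(record_format, data, record_size * i)) for i in xrange(num_ticks)]
    # Reverse so that index 0 is the bar the date checks below treat as the first one
    ticks.reverse()
    return ticks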

def count_wicks(bars):
    has_wick = lambda b: b.high > max(b.open, b.close) or b.low < min(b.open, b.close)
    return len([b for b in bars if has_wick(b)])

def verify_starting_date(bars):
    return dt(bars[0].time).day in [1, 2, 3, 4, 5]

def verify_end_date(bars):
    date = dt(bars[-1].time)
    if date.year == datetime.datetime.now().year:
        return True

    return date.day in [26, 27, 28, 29, 30, 31]

def count_gaps(bars):
    num_gaps = 0
    for index, bar in enumerate(bars):
        if index == 0:
            continue
        
        gap = dt(bar.time) - dt(bars[index - 1].time)
        if gap > datetime.timedelta(days=3):
            num_gaps += 1

    return num_gaps

def count_duplicates(bars):
    unique_timestamps = set()
    num_duplicates = 0
    for bar in bars:
        if bar.time not in unique_timestamps:
            unique_timestamps.add(bar.time)
        else:
            num_duplicates += 1
    return num_duplicates


def generate_reports():
    reports = {}
    for asset in ASSETS:
        print "Generating for", asset, "..."
        year_reports = []
        for year in YEARS:
            try:
                bars = parse_bar_file("%s_%d.bar" % (asset, year))
                if len(bars) == 0:
                    raise IOError("No bars in file")
            except IOError:
                # Some years are missing/empty
                year_reports.append(None)
                continue

            num_wicks = count_wicks(bars)
            wick_prc = num_wicks / float(len(bars))
            start_date_ok = verify_starting_date(bars)
            end_date_ok = verify_end_date(bars)
            num_gaps = count_gaps(bars)
            num_duplicates = count_duplicates(bars)

            year_reports.append(AssetReport(year, asset, wick_prc, start_date_ok, end_date_ok, num_gaps, num_duplicates))
            
        reports[asset] = year_reports
    return reports

def print_table(reports, title, output_func):
    print title
    print "%6s | %s" % ("Asset", "  ".join(map(str, YEARS)))
    print "-" * 80
    for asset, year_reports in reports.items():
        sys.stdout.write("%6s | " % (asset,))
        
        for report in year_reports:
            if report is None:
                sys.stdout.write("   -  ")
            else:   
                sys.stdout.write(output_func(report))
        
        sys.stdout.write("\n")
        
    print "\n\n"


def format_date_ok(r):
    output = " "
    output += "Y" if r.start_date_ok else "N"
    output += "/"
    output += "Y" if r.end_date_ok else "N"
    output += "  "
    return output
        

reports = generate_reports()

print_table(reports, "Wick percentages", lambda r: "%4.1f  " % (r.wick_percentage * 100, ))
print_table(reports, "Num gaps large than 3 days", lambda r: "%4d  " % (r.num_gaps, ))
print_table(reports, "Num duplicates", lambda r: "%4d  " % (r.num_duplicates, ))
print_table(reports, "Start/end dates ok", format_date_ok)

Takes about 5 mins to run on my i5.
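
The end-date column depends on the current calendar year: verify_end_date() only lets a file pass automatically if its last bar is in the year the script is run in. A minimal sketch of one way around that, assuming you know roughly when the files were downloaded. The data_end parameter, the 5-day tolerance and the December check are my assumptions, not part of the script above:

```python
import datetime

def end_date_ok(last_bar_date, data_end=None):
    """Return True if the newest bar is close to the expected end of the data.

    last_bar_date: datetime of the newest bar in the file.
    data_end: hypothetical parameter, the date the files were downloaded;
              None means the file is expected to cover the whole year.
    """
    if data_end is not None and last_bar_date.year == data_end.year:
        # Partial year: accept a last bar within a few days of the download date.
        return abs((data_end - last_bar_date).days) <= 5
    # Full year: the last bar should fall in the final days of December.
    return last_bar_date.month == 12 and last_bar_date.day >= 26
```

With data_end set to a date in Oct 2013, a 2013 file ending in mid October would pass, while the earlier years are still checked against the end of December.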

Posted By: jcl

Re: price() clarification - 01/06/14 08:43

New data is indeed available from FXCM, and the problem only appeared in the data files downloaded in 2011 or earlier with their old API. We'll download and replace the files in the next few days. To my knowledge, the issue has no practical effect on backtesting normal systems, so it is not a matter of urgency.

Posted By: Anonymous

Re: price() clarification - 01/06/14 10:57

Originally Posted By: DdlV

Thanks acidburn. I think it comes down to (again!

In your first definition you're not right that there's always a gap (prices fluctuate and overlap a lot, 1 minute is a long time with many ticks, etc.). Your second definition is hard to parse, but if we concentrate on the "Open is the Close of the previous bar" part, then we would not have any gaps while the market is open. Yet gaps do happen. I attach part of the M1 chart from this morning's market open. Obviously you have no issue with the weekend gap, but you would have a hard time explaining the other gaps on the chart with your second definition.
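
One rough way to test the "Open is the Close of the previous bar" reading against real data is to count adjacent bars whose open differs from the previous close. A sketch only: the field names follow pipclown's script, but here time is assumed to be in seconds and the 60-second threshold is an assumption as well:

```python
from collections import namedtuple

# Same field order as in pipclown's script; time is assumed to be in seconds here.
Bar = namedtuple('Bar', ['open', 'close', 'high', 'low', 'time'])

def count_open_close_gaps(bars, max_step=60):
    """Count adjacent bars, at most max_step seconds apart, where open != previous close."""
    gaps = 0
    for prev, cur in zip(bars, bars[1:]):
        if cur.time - prev.time <= max_step and cur.open != prev.close:
            gaps += 1
    return gaps
```

Bars further apart than max_step (weekends, missing minutes) are skipped, so only gaps between directly consecutive bars are counted.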

Posted By: Anonymous

Re: price() clarification - 01/06/14 11:01

Originally Posted By: pipclown

acidburn, yeah, you're right. Fixed it below.

Thanks pipclown! As I suspected, it doesn't disprove your finding, although the percentages should now be more correct. Your script is slowly becoming a standard for checking history data.

I think it already deserves its own topic on the forum and sticky status. Good job!

Posted By: Anonymous

Re: price() clarification - 01/06/14 11:05

Originally Posted By: jcl

The impact on 1h and 4h timeframes is probably negligible, but it is paramount on 1m and 5m. I don't know how many people look for market inefficiencies on such low timeframes, but I was guilty of doing that in the past, and intend to do so in the future.

Please just leave a note when updated history files are in place, so we can download and replace the old ones. Thanks!

Posted By: DdlV

Re: price() clarification - 01/06/14 16:36

Thanks acidburn! So the question is: What is the exact definition of "Bar"? Hopefully jcl's glossary will clear this up and show how to exactly calculate from Ticks files.

And thanks to pipclown also!

Posted By: Anonymous

Re: price() clarification - 01/06/14 16:45

Originally Posted By: DdlV

Thanks acidburn! So the question is: What is the exact definition of "Bar"? Hopefully jcl's glossary will clear this up and show how to exactly calculate from Ticks files.

Well, I wrote you the definition a few posts before. It's very short, so you missed it. I wouldn't count on jcl to provide us with the definition because he's too fond of his ticks and doesn't like bars very much.

Posted By: DdlV

Re: price() clarification - 01/06/14 18:00

Actually, I don't see where you wrote an exact definition of BAR that would include how to calculate from Ticks, just comments on definitions, so I'm still looking for that exact definition.

Maybe jcl will break down a little and give us Bicks or Tars?

Posted By: Anonymous

Re: price() clarification - 01/06/14 18:06

Originally Posted By: DdlV

Actually, I don't see where you wrote an exact definition of BAR that would include how to calculate from Ticks, just comments on definitions, so I'm still looking for that exact definition.

Here: http://www.opserver.de/ubb7/ubbthreads.php?ubb=showflat&Number=435146#Post435146
And here: http://www.opserver.de/ubb7/ubbthreads.php?ubb=showflat&Number=435286#Post435286

Posted By: DdlV

Re: price() clarification - 01/06/14 18:46

Thanks acidburn. OK, expanding on what's in the first link:

Bar is defined as the Ask HLOC of the Ticks within a given time period as follows: High is the highest Ask of all Ticks, Low is the lowest Ask of all Ticks, Open is the Ask of the first Tick, and Close is the Ask of the last Tick.

Correct?

Thanks.

Posted By: Anonymous

Re: price() clarification - 01/06/14 18:58

Originally Posted By: DdlV

IMHO, yes, now our definitions match. And now you match my algorithm to create OHLC bars from tick data. Congratulations, we have just reinvented the wheel.

Although, to be more general, the price need not be Ask; it could just as well be Bid. I think Zorro uses Ask, while most other platforms I've seen use Bid. That of course doesn't change much in practice.
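
The definition agreed on above can be sketched in a few lines. This is only an illustration: the Quote layout (time in seconds, ask, bid) and the function name are assumptions, not Zorro's or anyone's actual format. The side argument switches between Ask and Bid:

```python
from collections import namedtuple, OrderedDict

# Hypothetical tick layout: time in seconds, plus ask and bid price.
Quote = namedtuple('Quote', ['time', 'ask', 'bid'])
Bar = namedtuple('Bar', ['time', 'open', 'high', 'low', 'close'])

def ticks_to_bars(quotes, period=60, side='ask'):
    """Group time-ordered quotes into bars of `period` seconds."""
    groups = OrderedDict()
    for q in quotes:
        start = int(q.time // period) * period  # start of the bar this quote falls into
        groups.setdefault(start, []).append(getattr(q, side))
    # Open = first price, High = highest, Low = lowest, Close = last price of the period
    return [Bar(start, p[0], max(p), min(p), p[-1]) for start, p in groups.items()]
```

A period without any quote simply produces no bar here, so the Open of the next bar need not equal the Close of the previous one.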

Posted By: DdlV

Re: price() clarification - 01/06/14 19:58

Yes, I used Ask 'cause this is the Zorro forum!

Probably did reinvent, but I haven't found a succinct definition of Bar in web searches yet. Although there are lots of uses of tick, a la jcl!

Posted By: swingtraderkk

Re: price() clarification - 01/09/14 18:41

This thread has been busy while I've been offline for a couple of weeks!

Can I summarize what I would like before I waste time on sub 30 min strategy development:

1) Updated FXCM Bar files as jcl discussed. We'll need to test any strategy on their data before running it live anyway, no matter how good our other tick data is.

2) A community project to develop and maintain historical tick data (timestamp, ask, bid) possibly using dukascopy tick data (it is of high quality, but there are others, and Dukascopy doesn't have index, commodity data afaik). Unsure of licencing issues here.

3) Convert the high quality tick data to 1 min OHLC Bar files and share them.

4) For us to be able to use actual spreads between the bid and ask in our script logic not just the reported spread from the broker. Useful as a filter for when things go all NFP crazy. I've asked for this already.

5) For trades to be accurately stopped out/limit hit in back tests when you get temporarily high spreads. I assume the current model uses ask prices with reported as opposed to actual spreads. Unsure how that would work.

6) For Zorro to be able to use actual ticks (timestamp, bid, ask) in back tests with set(TICKS);

7) To be able to use bars of fixed numbers of ticks as opposed to the current time based bars.

jcl, any thoughts on 4 - 7? How feasible is any of that?

Community, any support for 2 & 3?
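
To illustrate what bars of a fixed number of ticks would look like, here is a minimal sketch. It assumes ticks arrive as time-ordered (time, price) pairs; the names are made up for the example and this is not something Zorro offers as far as this thread shows:

```python
from collections import namedtuple

TickBar = namedtuple('TickBar', ['open', 'high', 'low', 'close', 'start_time', 'end_time'])

def tick_count_bars(ticks, ticks_per_bar):
    """Build bars from every `ticks_per_bar` ticks; an incomplete trailing bar is dropped.

    ticks: time-ordered list of (time, price) pairs (assumed layout).
    """
    bars = []
    for i in range(0, len(ticks) - ticks_per_bar + 1, ticks_per_bar):
        chunk = ticks[i:i + ticks_per_bar]
        prices = [price for _, price in chunk]
        bars.append(TickBar(prices[0], max(prices), min(prices), prices[-1],
                            chunk[0][0], chunk[-1][0]))
    return bars
```

Unlike time based bars, each bar here covers a variable time span, which is why the start and end times are stored with it.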

Posted By: jcl

Re: price() clarification - 01/09/14 18:56

That is certainly possible. In fact an old Zorro version indeed used actual spreads for the simulation, and we had also tested actual price quotes instead of the 1-minute ticks by the price history API. In both cases the results were no different than with the current version, but the calculation time was much longer. Therefore this was abandoned.

But historic files with ticks from real price quotes are still on our implementation list because they are frequently asked for by users.