How do we identify and categorise the different possible trends? On this page I will describe one of the techniques that I use:
The idea is to get my computer to systematically evaluate every possible combination of starting and finishing dates within the 3-month window. In other words, we treat the start date S and the finish date F as if they were independent variables. For each pair (S,F) we fit a linear regression line to the DJIA data that lies inclusively between those two dates. In the process, the computer works out the value of R², where R is the correlation coefficient of the data to a straight line.
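Written out, the value computed for each pair is the squared correlation over the days S to F inclusive. For a straight-line least-squares fit this equals the coefficient of determination. Here x_i is the trading-day index and y_i the DJIA close on day i. That notation is introduced for illustration only:

```latex
R^2(S,F) = \frac{\left(\sum_{i=S}^{F} (x_i - \bar{x})(y_i - \bar{y})\right)^2}
                {\sum_{i=S}^{F} (x_i - \bar{x})^2 \; \sum_{i=S}^{F} (y_i - \bar{y})^2},
\qquad
\bar{x} = \frac{1}{F-S+1}\sum_{i=S}^{F} x_i, \quad
\bar{y} = \frac{1}{F-S+1}\sum_{i=S}^{F} y_i
```

R² is 1 for perfectly linear data and close to 0 when the data has no linear trend.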
This allows us to define a function f(S,F) = R², giving the value of R² for the trend that starts on S and finishes on F. We can then plot this function as a contour chart in a two-dimensional SF-Space:
In the above chart, the vertical axis specifies the starting date, and the horizontal axis specifies the finishing date for a calculated trend.
The colour-coded legend at the left of the chart indicates the value of R² for a linear regression on the data that lies inclusively between those two dates. In other words, we are plotting the "strength" of the trend between any two dates in our window. The strength of a trend is defined by how linear the data is, as measured by the coefficient of determination R².
The diagonal black line is the "causality horizon". All real points lie to the right of this line, because the finishing date always has to be later than the starting date. The display actually starts 4 days to the right of the causality horizon, because I require at least 5 consecutive end-of-day data points before I attempt to calculate a trend. Experience has shown that shorter trends are practically meaningless from a statistical-significance point of view.
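The grid construction described so far can be sketched in Python. This sketch assumes the closes for the window sit in a plain list `prices` indexed by trading day. The names `r_squared`, `sf_grid` and `MIN_POINTS` are illustrative and not taken from the original software. Empty cells are marked with `nan`:

```python
import math

MIN_POINTS = 5  # at least 5 end-of-day closes per trend, as described above


def r_squared(prices, s, f):
    """R^2 of a least-squares line through prices[s..f] inclusive (x = day index)."""
    ys = prices[s:f + 1]
    n = len(ys)
    xs = range(s, f + 1)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    if syy == 0:
        return float("nan")  # flat (or single-point) data: correlation is undefined
    # For a straight-line fit, R^2 is the squared Pearson correlation.
    return sxy * sxy / (sxx * syy)


def sf_grid(prices, min_points=MIN_POINTS):
    """Return an N x N grid where grid[s][f] = R^2 for the trend from day s to day f.

    Cells on the non-causal side of the horizon (f < s), and cells closer to it
    than min_points - 1 days, are left as nan. The plotted area therefore starts
    min_points - 1 days to the right of the diagonal.
    """
    n = len(prices)
    grid = [[float("nan")] * n for _ in range(n)]
    for s in range(n):
        for f in range(s + min_points - 1, n):
            grid[s][f] = r_squared(prices, s, f)
    return grid
```

For a 10-day window this fills 21 cells (6 + 5 + 4 + 3 + 2 + 1), because each start day needs at least 4 later days.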
Each day's new data enters the display as a vertical slice along the right-hand edge of the chart. As time elapses, it moves diagonally up and to the left, parallel to the causality horizon. I find it interesting that strong trends are so highly localised in SF-Space. I usually describe this by saying that the trends in DJIA SF-Space are like "islands of trendiness in a vast sea of random walking".
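Because the window has a fixed width, the daily update amounts to shifting every existing cell one step diagonally and computing a single new column. The sketch below assumes the grid is a list of lists indexed `grid[s][f]`, with `nan` for empty cells. `advance_one_day` is a hypothetical helper and is not part of the original software:

```python
import math

MIN_POINTS = 5


def r_squared(ys):
    """R^2 of a least-squares line through ys, using 0, 1, 2, ... as x-values.

    R^2 does not change when x is shifted, so window-local indices are fine.
    """
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    syy = sum((y - y_mean) ** 2 for y in ys)
    return float("nan") if syy == 0 else sxy * sxy / (sxx * syy)


def advance_one_day(grid, prices, new_close, min_points=MIN_POINTS):
    """Slide a fixed-width SF window forward by one day.

    Every existing cell (s, f) becomes (s - 1, f - 1), so the pattern drifts
    diagonally, parallel to the causality horizon. Only the new right-hand
    column (trends finishing on the new day) has to be computed.
    Returns the new grid and the updated price list.
    """
    n = len(prices)
    prices = prices[1:] + [new_close]  # drop the oldest day, append the newest
    new = [[float("nan")] * n for _ in range(n)]
    for s in range(n - 1):
        for f in range(n - 1):
            new[s][f] = grid[s + 1][f + 1]  # one step towards smaller s and f
    last = n - 1
    # Only start days leaving at least min_points closes up to the new day.
    for s in range(0, last - min_points + 2):
        new[s][last] = r_squared(prices[s:last + 1])
    return new, prices
```

This reduces the daily cost from a full rebuild to one column of at most N - 4 calculations.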
It is very instructive to compare the SF-Space chart to its corresponding time-series chart. Both charts are derived from the same data and contain the same information, but the information is represented in a very different and illuminating way. As a general rule, it is much easier to find and evaluate a strong trend on the SF-Space chart. Once you know the starting and finishing dates from the (S,F) coordinates, you can easily locate the trend on the time-series chart. Often this allows you to find trends that you might otherwise have missed. It is also easy to find the precise coordinate pair (S,F) that gives the strongest possible value of R² for any given trend-island feature. This allows you to optimise the best-fitting trend on the corresponding time-series chart.
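Locating the strongest cell and recovering the corresponding trendline is straightforward to automate. The sketch below assumes the grid is a list of lists indexed `grid[s][f]`, with `nan` for empty cells, and that the closes are in a list `prices`. `strongest_trend` and `fit_line` are hypothetical helpers. Restricting the search to a rectangle around one island is a simplification of picking out a feature by eye:

```python
import math


def strongest_trend(grid, s_range=None, f_range=None):
    """Return (r2, s, f) for the largest R^2 in the grid, ignoring nan cells.

    s_range and f_range optionally restrict the search to the rectangle
    around one trend-island. (-1.0, None, None) means no valid cell was found.
    """
    s_range = range(len(grid)) if s_range is None else s_range
    f_range = range(len(grid[0])) if f_range is None else f_range
    best = (-1.0, None, None)
    for s in s_range:
        for f in f_range:
            value = grid[s][f]
            if not math.isnan(value) and value > best[0]:
                best = (value, s, f)
    return best


def fit_line(prices, s, f):
    """Least-squares slope and intercept of prices[s..f] against the day index."""
    xs = list(range(s, f + 1))
    ys = prices[s:f + 1]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean
```

The slope and intercept define the line `intercept + slope * day`, which can be drawn on the time-series chart between days s and f.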
Ambiguities are still possible, however, and sometimes a degree of subjective judgement is required to resolve them. This is why I call this a semi-objective method. It is not uncommon to find multiple peaks corresponding to more than one possible trendline fit. Often the peaks line up in the vertical or horizontal dimension. This means that one of the S or F coordinates is ambiguous, but the other is not. Whenever two otherwise equal trends exist, I usually use the longer one, because longer trends can contain shorter trend-fragments as subsets. Longer trends also tend to be more statistically significant. The longest trend is the one with the greatest perpendicular distance from the causality horizon. Mostly, though, I find that I can just pick the highest peak.
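The tie-breaking rule can be sketched in Python as follows. The text leaves that judgement to the analyst, so two details here are assumptions made for illustration: the 8-neighbour peak test, and the `tolerance` used to decide that two peaks are "otherwise equal". The grid is assumed to be a list of lists indexed `grid[s][f]`, with `nan` in empty cells. `local_peaks` and `pick_trend` are hypothetical names:

```python
import math


def local_peaks(grid):
    """Cells whose R^2 is at least as large as every valid 8-connected neighbour."""
    n_rows, n_cols = len(grid), len(grid[0])
    peaks = []
    for s in range(n_rows):
        for f in range(n_cols):
            v = grid[s][f]
            if math.isnan(v):
                continue
            neighbours = [
                grid[i][j]
                for i in range(max(s - 1, 0), min(s + 2, n_rows))
                for j in range(max(f - 1, 0), min(f + 2, n_cols))
                if (i, j) != (s, f) and not math.isnan(grid[i][j])
            ]
            if all(v >= w for w in neighbours):
                peaks.append((v, s, f))
    return peaks


def pick_trend(grid, tolerance=0.01):
    """Among peaks within `tolerance` of the highest R^2, prefer the longest trend.

    Length is f - s. The perpendicular distance from the causality horizon
    (the line f = s) is (f - s) / sqrt(2) in cell units, so ranking by
    either quantity gives the same answer. Ties in length go to higher R^2.
    """
    peaks = local_peaks(grid)
    if not peaks:
        return None
    top = max(v for v, _, _ in peaks)
    candidates = [(v, s, f) for v, s, f in peaks if v >= top - tolerance]
    return max(candidates, key=lambda p: (p[2] - p[1], p[0]))
```

With `tolerance=0` this reduces to simply picking the highest peak.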
Needless to say, charting data in this way requires a considerable amount of number-crunching power. The above SF-Space chart contains 65 × 65 data cells, corresponding to a 65-trading-day window (13 weeks, or about 3 months). Even if you ignore the non-causal, lower-left half of the chart, that is roughly 2,000 correlation coefficient calculations. There are 65 × 66 / 2 = 2,145 cells on or to the right of the causality horizon, and 1,891 of them satisfy the 5-point minimum. This is about the minimum useful image size, so this is not a job for a 486-generation PC, unless you are very patient.
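Much of that number-crunching can be avoided. A standard trick, not necessarily what the original software does, is to keep running (prefix) sums of x, y, x², y² and xy. Each cell's R² then costs a handful of arithmetic operations instead of a full pass over its window. The sketch below is one way to do this in Python; `sf_grid_fast` is a hypothetical name:

```python
import math


def sf_grid_fast(prices, min_points=5):
    """Build the SF grid with O(1) work per cell using prefix sums.

    grid[s][f] = R^2 of a least-squares line through prices[s..f] (x = day index).
    Cells with fewer than min_points closes, or below the horizon, are nan.
    """
    n = len(prices)
    offset = sum(prices) / n  # R^2 is unchanged by shifting y;
    ys = [p - offset for p in prices]  # centring reduces rounding error
    # p_*[k] holds the sum over days 0..k-1, so a window sum is p[f + 1] - p[s].
    p_x, p_y, p_xx, p_yy, p_xy = [0.0], [0.0], [0.0], [0.0], [0.0]
    for x, y in enumerate(ys):
        p_x.append(p_x[-1] + x)
        p_y.append(p_y[-1] + y)
        p_xx.append(p_xx[-1] + x * x)
        p_yy.append(p_yy[-1] + y * y)
        p_xy.append(p_xy[-1] + x * y)
    grid = [[float("nan")] * n for _ in range(n)]
    for s in range(n):
        for f in range(s + min_points - 1, n):
            m = f - s + 1
            sx, sy = p_x[f + 1] - p_x[s], p_y[f + 1] - p_y[s]
            # Centred sums of squares and cross-products for the window.
            sxx = p_xx[f + 1] - p_xx[s] - sx * sx / m
            syy = p_yy[f + 1] - p_yy[s] - sy * sy / m
            sxy = p_xy[f + 1] - p_xy[s] - sx * sy / m
            if sxx > 0 and syy > 0:
                grid[s][f] = sxy * sxy / (sxx * syy)
    return grid
```

For a 65-day window with the 5-point minimum, this fills 1,891 cells.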
Charts with a width of more than a few hundred cells become very difficult to read, so it is important to choose a sampling interval that matches the timescale you are interested in. For example, a day trader might use a one-minute sampling interval to display a 120 × 120 cell chart showing all the trends that have emerged in the last 2 hours. I suggest you use a fast Pentium if you want to do this in real time. You might also need to write some rather specialised software.
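Choosing a sampling interval amounts to reducing raw price data to one value per interval. The number of cells equals the time span divided by the interval, so 2 hours at one minute gives 120. The Python sketch below keeps the last price in each interval. The input format (a list of `(datetime, price)` pairs), the carry-forward rule for empty intervals, and the name `resample_last` are all assumptions:

```python
from datetime import datetime, timedelta


def resample_last(ticks, interval, start, n_cells):
    """Reduce (timestamp, price) ticks to one price per sampling interval.

    Keeps the last price seen in each of n_cells consecutive intervals that
    start at `start`. An interval with no ticks carries the previous value
    forward (None until the first tick arrives).
    """
    buckets = [None] * n_cells
    # A stable sort by timestamp keeps input order for identical timestamps.
    for t, price in sorted(ticks, key=lambda tp: tp[0]):
        k = (t - start) // interval  # timedelta // timedelta gives an int
        if 0 <= k < n_cells:
            buckets[k] = price
    last = None
    for k in range(n_cells):
        if buckets[k] is None:
            buckets[k] = last
        last = buckets[k]
    return buckets
```

The resulting list can then be fed to whatever routine builds the SF grid, with one cell per sampling interval.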