We still need to nail down exactly what techniques we are going to use.
What time on Monday (10/6) works for people to meet to agree on our update for the class? 1pm?
The goal is to predict either stock/index price or stock/index variance based on training data. We will evaluate the performance of alternative methods based on their return on test data.
We will come up with a per-trade cost.
We will have a fixed number of dollars to start ($1M?).
We should have a standard simulation system (call a function with a proposed trade, get a return).
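As a starting point for discussion, the simulation system could look something like the sketch below. Nothing here is decided: the class name `Simulator`, the flat `cost_per_trade` of $10, and the idea of marking positions to market with a price dictionary are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Simulator:
    """Toy trade simulator: fixed starting cash and a flat cost per trade."""
    cash: float = 1_000_000.0      # starting capital (the $1M floated above)
    cost_per_trade: float = 10.0   # placeholder value; the real cost is undecided
    positions: dict = field(default_factory=dict)  # symbol -> shares held

    def __post_init__(self):
        self.start_cash = self.cash  # remembered so returns can be computed later

    def trade(self, symbol, shares, price):
        """Buy (shares > 0) or sell (shares < 0) at the given price."""
        self.cash -= shares * price + self.cost_per_trade
        self.positions[symbol] = self.positions.get(symbol, 0) + shares

    def value(self, prices):
        """Cash plus holdings valued with a symbol -> price mapping."""
        return self.cash + sum(n * prices[s] for s, n in self.positions.items())

    def total_return(self, prices):
        """Fractional return relative to the starting capital."""
        return self.value(prices) / self.start_cash - 1.0


sim = Simulator()
sim.trade("UTEK", 1000, 14.89)             # buy 1000 shares
print(sim.total_return({"UTEK": 15.89}))   # return if the price rises by $1
```

Every method under comparison would call the same `trade` and `total_return` functions, so results stay comparable across approaches.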
Data sources that we know are available:
2004-2005 inter-day tick data for ?, 2.2GB compressed
The data is available. Do not forward this data to anyone who is not in the class and working on the finance project.
OK, there are now two data sets in the directory. The first is end-of-day option data; the option data files are named Year_Month|Quarter.zip. The second data set is 5-minute, 10-minute, 30-minute, and end-of-day data for 100 Nasdaq stocks from 1997-2002.
Here is the description of each field for the option data.
Underlying Symbol, Underlying Price, Exchange (always an asterisk in end-of-day data), Option Root, Option Extension, Contract Type (call or put), Expiration Date, Quote Date, Strike, Last Price, Bid Price, Ask Price, Volume, Open Interest, Implied Volatility, Delta, Gamma
Note that the sample rows below contain two further trailing values that this list does not name.
UTEK,14.89,*,UQT,QV,put,05/20/2005,2/1/2005 4:00:00 PM,12.5,1.05,0.6,0.8,0,86,0.558568,-0.228549,6.708686,0,0
UTEK,14.89,*,UQT,EC,call,05/20/2005,2/1/2005 4:00:00 PM,15,3.4,1.6,1.75,0,7,0.527764,0.552821,9.279516,0,0
UTEK,14.89,*,UQT,QC,put,05/20/2005,2/1/2005 4:00:00 PM,15,1.85,1.65,1.8,0,339,0.529672,-0.44696,9.245419,0,0
UTEK,14.89,*,UQT,EW,call,05/20/2005,2/1/2005 4:00:00 PM,17.5,0.85,0.7,0.85,0,88,0.50583,0.33157,8.883629,0,0
UTEK,14.89,*,UQT,QW,put,05/20/2005,2/1/2005 4:00:00 PM,17.5,3.7,3.1,3.4,0,96,0.485946,-0.678822,9.127955,0,0
UTEK,14.89,*,UQT,ED,call,05/20/2005,2/1/2005 4:00:00 PM,20,0.25,0.25,0.4,0,131,0.494386,0.171051,6.36422,0,0
UTEK,14.89,*,UQT,QD,put,05/20/2005,2/1/2005 4:00:00 PM,20,2.9,5.2,5.6,0,10,0.519563,-0.813445,6.395854,0,0
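The field list above could be turned into a small parser along these lines. This is only a sketch: the snake_case key names and the `parse_option_rows` function are made up here, it assumes one record per line, and the two unnamed trailing values in each row are kept as raw strings under a placeholder key `extra` rather than guessed at.

```python
import csv
import io

# Key names derived from the field description above (naming is an assumption).
OPTION_FIELDS = [
    "underlying_symbol", "underlying_price", "exchange", "option_root",
    "option_extension", "contract_type", "expiration_date", "quote_date",
    "strike", "last_price", "bid_price", "ask_price", "volume",
    "open_interest", "implied_volatility", "delta", "gamma",
]
NUMERIC = {
    "underlying_price", "strike", "last_price", "bid_price", "ask_price",
    "volume", "open_interest", "implied_volatility", "delta", "gamma",
}


def parse_option_rows(text):
    """Parse comma-separated option records, one record per line."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        row = dict(zip(OPTION_FIELDS, record))
        for key in NUMERIC:
            row[key] = float(row[key])
        # Values beyond the documented fields are kept untouched.
        row["extra"] = record[len(OPTION_FIELDS):]
        rows.append(row)
    return rows


sample = ("UTEK,14.89,*,UQT,QV,put,05/20/2005,2/1/2005 4:00:00 PM,"
          "12.5,1.05,0.6,0.8,0,86,0.558568,-0.228549,6.708686,0,0\n")
rows = parse_option_rows(sample)
print(rows[0]["contract_type"], rows[0]["strike"], rows[0]["bid_price"])
```

Dates are left as strings because the two date columns use different formats (one has a time of day, one does not).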
Here is the stock data description.
The data for each stock is in a directory with that stock's name. In that directory there are separate files for the different time resolutions.
The format is Date, Time, Open, High, Low, Close, Volume.
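A reader for the stock files might look like the sketch below. The function name `read_bars` is hypothetical, the date and time columns are kept as plain strings because their exact format is not stated here, and the two example lines are made-up values used only to exercise the code, not real data.

```python
import csv
import io

BAR_FIELDS = ["date", "time", "open", "high", "low", "close", "volume"]


def read_bars(fileobj):
    """Yield one dict per bar from a Date,Time,Open,High,Low,Close,Volume file."""
    for record in csv.reader(fileobj):
        if len(record) != len(BAR_FIELDS):
            continue  # skip blank or malformed lines
        bar = dict(zip(BAR_FIELDS, (value.strip() for value in record)))
        try:
            for key in BAR_FIELDS[2:]:
                bar[key] = float(bar[key])
        except ValueError:
            continue  # e.g. a header row, if the files have one
        yield bar


# Made-up example lines, for illustration only.
example = io.StringIO(
    "01/02/1998,09:35,10.0,10.5,9.9,10.2,1200\n"
    "01/02/1998,09:40,10.2,10.3,10.0,10.1,800\n"
)
bars = list(read_bars(example))
print(len(bars), bars[0]["close"])
```

With real files, `fileobj` would be an open file from the stock's directory for the chosen time resolution.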
Everyone should propose at least one approach to
Sidharth: I was thinking of trying some simple approaches to see how well they will do. My first thought was to do some form of clustering followed by some form of nearest-neighbor approximation to try to predict the price at the next step of the time series. As I was going through the papers on the quant papers link, I saw a couple of papers related to using nearest neighbor for time series prediction. I haven't actually read the papers yet, but thought I would throw out the idea since it was one of the first that came to my mind.
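To make the nearest-neighbor part of this idea concrete, here is a minimal sketch. It is not taken from the papers mentioned above and it leaves out the clustering step; the function name `knn_forecast`, the window length, and the use of squared Euclidean distance are all assumptions.

```python
def knn_forecast(series, window=3, k=2):
    """Predict the next value as the mean successor of the k most similar past windows."""
    if len(series) < window + k:
        raise ValueError("series too short for this window and k")
    query = series[-window:]
    candidates = []
    # Every earlier window with a known next value is a candidate neighbor.
    for start in range(len(series) - window):
        past = series[start:start + window]
        dist = sum((a - b) ** 2 for a, b in zip(past, query))
        candidates.append((dist, series[start + window]))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(nxt for _, nxt in nearest) / len(nearest)


# Toy repeating pattern: the last window [1, 2] was always followed by 3.
print(knn_forecast([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2], window=2, k=2))
```

On real prices it would probably make more sense to compare windows of returns rather than raw prices, so that windows at different price levels can still match.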
America: My first thoughts. (1) Recurrent neural networks, since these exhibit short-term memory and can behave chaotically. There has been quite a bit of work using recurrent neural networks, so we may want to steer clear of this area. (2) Fitting a function to the time series using linear regression. With linear regression we can define X (our design matrix) to include as much, or as little, information as we want. (3) In contrast, we could look at prediction using the Fourier transform (modeling the time series as a summation of sine and cosine waves). If I have any more ideas, I'll post them.
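For idea (2), one simple way to fill X is with lagged values of the series itself. The sketch below does this with ordinary least squares via the normal equations in plain Python; the helper names (`lag_matrix`, `fit_least_squares`, `predict_next`) are made up for illustration, and using only past prices as columns is an assumption, since other information could be added as extra columns.

```python
def lag_matrix(series, lags):
    """Build rows [1, y[t-1], ..., y[t-lags]] and matching targets y[t]."""
    X, y = [], []
    for t in range(lags, len(series)):
        X.append([1.0] + [series[t - j] for j in range(1, lags + 1)])
        y.append(series[t])
    return X, y


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w


def fit_least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict_next(series, w):
    """One-step-ahead prediction from the most recent len(w) - 1 values."""
    lags = len(w) - 1
    recent = [series[-j] for j in range(1, lags + 1)]
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], recent))


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w = fit_least_squares(*lag_matrix(series, lags=1))
print(w, predict_next(series, w))
```

The solver fails with a division by zero if the columns of X are exactly collinear, for example on a constant series, so a real version would need to handle that case.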
Dave: I have a few ideas.
Ian: I believe that market movements are not solely explained by rational decision makers. Recent models of market movements that include group dynamics include the Random Field Ising Model and models for flocking in biological systems. Investors in these models are not just independent decision makers; instead, they are coupled. Several factors play a role: cohesion (try to get in the same position as everyone else), separation (avoid crowds), and alignment (try to move in the same direction as everyone else). So investors are probably a combination of rational actors with noise, plus participants in a flock. There are likely several flocks separated by social networks or by financial segments.
So I would like to see if market behavior can be modeled by groups of these rational+flocking investors. Since the rational behavior would follow the principle of efficient markets, no predictions can be made from it without additional information. But the flocking behavior may lend itself to predictions.
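As a very rough illustration of how the three flocking factors could be combined, here is a toy one-dimensional update step. It is not the model from any of the papers, and everything in it is an assumption: each investor is reduced to a single number `pos` (say, how long or short they are) and a rate of change `vel`, the rational part is replaced by Gaussian noise, and all parameter values are arbitrary.

```python
import random


def flock_step(pos, vel, cohesion=0.05, alignment=0.1, separation=0.02,
               radius=0.1, noise=0.01, rng=random):
    """Advance all investors one step under cohesion, alignment, separation, and noise."""
    n = len(pos)
    mean_pos = sum(pos) / n
    mean_vel = sum(vel) / n
    new_vel = []
    for i in range(n):
        # Separation: push away from investors closer than `radius`.
        push = sum(pos[i] - pos[j] for j in range(n)
                   if j != i and abs(pos[i] - pos[j]) < radius)
        v = (vel[i]
             + cohesion * (mean_pos - pos[i])    # move toward the crowd's position
             + alignment * (mean_vel - vel[i])   # match the crowd's direction
             + separation * push                 # avoid crowding
             + rng.gauss(0.0, noise))            # stand-in for independent decisions
        new_vel.append(v)
    new_pos = [p + v for p, v in zip(pos, new_vel)]
    return new_pos, new_vel


rng = random.Random(0)
pos = [rng.uniform(-1.0, 1.0) for _ in range(20)]
vel = [0.0] * 20
for _ in range(50):
    pos, vel = flock_step(pos, vel, rng=rng)
print(sum(pos) / len(pos))  # average position of the flock after 50 steps
```

Several flocks could be modeled by running this separately per group, or by restricting the averages to each investor's own network neighbors.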
Here are some papers on recent related work
Separate Flocking page for discussions specific to this method
Tony: I came across this neat book that uses Projection Pursuit Regression and Neural Networks: http://www.liaad.up.pt/~ltorgo/DataMiningWithR/ . I'm interested in seeing how these off-the-shelf algorithms perform.
Vikram: Any comments on how text mining of all news articles in the time frame could help improve results? I've read quite a few articles, all of which say that the news component is less influential than the other factors involved. And many of those do not really mention how the news articles are being understood. Maybe this can help augment other approaches.
Trend Mining with Semantic-Based Learning http://people.csail.mit.edu/pcm/ESWC08PHD/streibel.pdf
TREMA trend mining project http://www.projekt-trema.de/publikationen.html
Random Field Ising Model http://ideas.repec.org/p/sfi/sfiwpa/500060.html#download
A page for tools / packages / code that might be useful for our project.