Recently, I stumbled upon an article by Tal Perry, founder of lighttag.io, which detailed his attempt to apply Word2Vec to market data. He called it Market2Vec: the idea was to use the Word2Vec framework to create stock embeddings rather than word embeddings. His data was the open/close and high/low prices for 1000 different stocks, which he condensed into a single 300-dimensional input vector. (*Right off the bat, I notice that I need to look into how one would condense a vector with thousands of dimensions into one with 300 or fewer. 300 is a common embedding size for Word2Vec models, although the size is a chosen hyperparameter rather than a hard limit.) He then passed these condensed vectors through a recurrent neural network that outputs the probability of future activity in these stocks. (There is a lot more going on here, and I am simply brushing over it to get his idea across; I would recommend reading his article for the details.)
This ‘probability’ needs to be defined, and it can follow almost any definition tailored to the data you want to model. For example, you could train the neural network to output a ‘1’ if a certain index goes up 1% within a certain amount of time, and a ‘0’ if it doesn’t. The network would then use the data you provide to return the probability of a 1 or a 0 in a given context. This is incredibly useful, because Mr. Perry’s approach can be applied to almost any input vector and used to predict changes in almost any data set (say, the S&P 500 rather than the VIX, which is what he tried to predict).
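Two pieces of this setup can be sketched in Python. The first is condensing a wide daily input (1000 stocks × 4 prices = 4000 values) down to 300 dimensions; this sketch uses principal component analysis (PCA) through NumPy's SVD as one standard option, which is not necessarily how Perry did it. The second is the example target: label a day 1 if an index rises at least 1% over the next few days. The function names, the 5-day default horizon, and the random stand-in data are hypothetical placeholders.

```python
import numpy as np

def compress_features(X, k=300):
    """Project each row of X (days x features) onto the top-k principal components."""
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered columns
    # Thin SVD: the rows of Vt are the principal directions, strongest first
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    k = min(k, Vt.shape[0])  # at most min(days, features) components exist
    return X_centered @ Vt[:k].T

def make_labels(index_close, horizon=5, threshold=0.01):
    """Label 1 if the index closes at least `threshold` higher `horizon` days later, else 0."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1 day")
    closes = np.asarray(index_close, dtype=float)
    future_return = closes[horizon:] / closes[:-horizon] - 1.0
    return (future_return >= threshold).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in data: 350 days of open/close/high/low for 1000 stocks
    X = rng.normal(size=(350, 1000 * 4))
    Z = compress_features(X, k=300)
    print(Z.shape)  # (350, 300)
    print(make_labels([100, 100.5, 103, 102, 100], horizon=2))  # [1 1 0]
```

PCA can return at most min(days, features) components, so a 300-dimensional output needs at least 300 days of history. The last `horizon` days have no label yet, so in practice you would pair `Z[:-horizon]` with the labels.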
Knowing this, I have a general idea of the building blocks I will base my original trading algorithm on.
- Market2Vec (Fundamental data set)
- NLP and investor sentiment (Partner data set, Word2Vec)
- Connecting trends, cycles (Partner data set, Word2Vec)
Part 1 is essentially what I described above: using fundamental data sets (such as opening/closing prices for hundreds of stocks) to make predictions from Word2Vec-style embeddings. This is what I am currently working on.
For part 2, I have heard mixed opinions on how effective investor sentiment is for predicting changes in stock price. For example, one comment I received is that sentiment often lags significant price moves, so by the time it shows up in the data it is too late to act on. Also, the amount of data and statistical analysis required to make accurate predictions from investor sentiment is huge, and the calculations wouldn’t run fast enough to invest appropriately in real time. However, I do think it’s an interesting topic to explore and worth trying. One thought I had is to focus on sentiment preceding a company’s earnings release, rather than aimlessly analyzing sentiment on an undefined spectrum (which would require more data and be less meaningful). If the measured sentiment around an upcoming earnings release is overwhelmingly positive, it would make sense to buy the company’s stock before the earnings are announced, and vice versa.
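The earnings idea boils down to a simple decision rule. Here is a minimal sketch, assuming each post or headline from the days before the release has already been scored from -1 (very negative) to +1 (very positive) by some sentiment model. The scoring step itself is not shown, and the 0.6 thresholds, the 50-post minimum, and the function name are hypothetical placeholders, not tested values.

```python
from statistics import mean

def earnings_signal(scores, buy_threshold=0.6, sell_threshold=-0.6, min_posts=50):
    """Turn pre-earnings sentiment scores in [-1, 1] into 'buy', 'sell', or 'hold'.

    All thresholds are hypothetical placeholders, not tuned values.
    """
    if len(scores) < min_posts:
        return "hold"  # too little data to call sentiment 'overwhelming'
    avg = mean(scores)
    if avg >= buy_threshold:
        return "buy"   # overwhelmingly positive: buy before the announcement
    if avg <= sell_threshold:
        return "sell"  # overwhelmingly negative: the 'vice versa' case
    return "hold"      # mixed sentiment: no trade

if __name__ == "__main__":
    # Hypothetical scores for 60 posts in the week before an earnings release
    print(earnings_signal([0.8] * 40 + [0.5] * 20))  # buy
```

Requiring a minimum number of posts is one way to encode "overwhelmingly": a handful of enthusiastic posts should not trigger a trade on their own.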
Part 3 is a more ambitious goal of mine, which I will define in greater detail in future posts.
A final thing I have been learning about is z-scores, which came up while studying a generic mean-reversion algorithm. (This is a simple trading algorithm that takes last week’s highest- and lowest-performing stocks and predicts that in the upcoming week they will do the opposite of what they did last week: last week’s low performers will do well this week, and vice versa.) The idea of z-scores is to standardize changes in stock prices so that the data is more meaningful. For example, if Amazon’s stock goes up $5, it would be inconsequential, as $5 is such a small percentage of Amazon’s stock price. However, if a penny stock goes up $5 in a day, it would be the best day of someone’s life, giving unprecedented returns. The same goes for volatility: z-scores have to take a stock’s usual volatility into account. If a stock whose price rarely moves jumps a lot in a day, that means much more than the same jump in a stock that is prone to volatility.
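The mean-reversion rule from the parenthetical can be sketched in a few lines. This assumes last week's returns are already available as a dictionary mapping ticker to fractional return; the function name, tickers, and numbers in the example are made up for illustration.

```python
def mean_reversion_picks(weekly_returns, n=2):
    """Buy last week's n worst performers and sell its n best, betting on a reversal."""
    if 2 * n > len(weekly_returns):
        raise ValueError("need at least 2 * n stocks so buys and sells don't overlap")
    # Sort tickers from lowest to highest return last week
    ranked = sorted(weekly_returns, key=weekly_returns.get)
    return {
        "buy": ranked[:n],          # losers, expected to bounce back
        "sell": ranked[-n:][::-1],  # winners, expected to fall back (best first)
    }

if __name__ == "__main__":
    # Hypothetical weekly returns
    last_week = {"AAA": 0.05, "BBB": -0.03, "CCC": 0.01, "DDD": -0.07, "EEE": 0.10}
    print(mean_reversion_picks(last_week, n=2))
    # {'buy': ['DDD', 'BBB'], 'sell': ['EEE', 'AAA']}
```

Ranking raw returns like this treats a 5% move the same for every stock; that is exactly the weakness z-scores address.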
To make the data more uniform and easier to work with, we calculate z-scores, which account for both percentage moves and volatility: take the raw value (for example, today’s percentage return), subtract its historical mean, and divide by its historical standard deviation, i.e. z = (x - μ) / σ. This is an incredibly useful tool to know about, as it allows us to compare price changes across all kinds of assets.
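The z-score calculation can be sketched with NumPy. This assumes the raw value x is the latest daily percentage return, and that μ and σ come from that same stock's previous `lookback` daily returns (20 trading days by default, an arbitrary choice). The function name and the sample prices are hypothetical.

```python
import numpy as np

def return_zscore(prices, lookback=20):
    """Z-score of the latest daily % return versus the previous `lookback` daily returns."""
    if lookback < 2:
        raise ValueError("lookback must be at least 2 to estimate a standard deviation")
    prices = np.asarray(prices, dtype=float)
    returns = np.diff(prices) / prices[:-1]  # daily percentage returns, as fractions
    if len(returns) < lookback + 1:
        raise ValueError("need at least lookback + 2 prices")
    history = returns[-lookback - 1:-1]  # the `lookback` returns before the latest one
    latest = returns[-1]
    mu = history.mean()
    sigma = history.std(ddof=1)  # sample standard deviation
    if sigma == 0:
        raise ValueError("no variation in the lookback window")
    return (latest - mu) / sigma

if __name__ == "__main__":
    # Hypothetical prices: small back-and-forth moves, then a 5% jump on the last day
    prices = [100, 101, 100, 101, 100, 105]
    print(round(return_zscore(prices, lookback=4), 2))
```

Because the result is measured in units of each stock's own standard deviation, a z-score of 3 signals an equally unusual move whether the stock trades at $3 or $3,000.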