Was I Overthinking, or Was I Not? (Featuring Thoughts From a Data Scientist)

Over the break, I had the chance to speak (briefly) with Andrew Zatlin, founder and editor of www.moneyballeconomics.com, who formerly worked in data collection. Recently, he has transitioned away from simple data collection to implementing different metrics for insights into the markets. As you can infer from the name of his website, Zatlin’s key focus is data, but rather than the fundamental data, tick data, and book pressure which high-frequency traders so highly prize, Zatlin looks at macroeconomic indicators. For instance, he forecasts retail sales, jobless claims, and payrolls, to name a few. The idea here is interesting, but not unheard of: you use forecasts of the economy as a whole to guide your predictions of the market, upholding the belief that the stock market directly reflects economic growth. Some people use employment numbers, others use GDP outlooks (like I outlined in an earlier post). With GDP, for instance, the theory is that growth in GDP should correlate with general growth across the markets. I agree, as overall economic growth should imply growth in the markets, but this is nonetheless a vague projection, and as such it leaves little margin for profit.
Zatlin talked a lot about the idea of crawling the internet to acquire data easily, which is an interesting topic to consider as I transition into more discrete data sets (Fed Funds Rate, gold price data) which aren’t as readily available to the public. Web crawling is basically the idea of strategically reading through (crawling) online sites and acquiring desired content through an algorithm. This is what most modern search engines use to index web pages, as they read through content and pick out keywords (among many other things, I’m sure). I am not too familiar with web crawling, but I know my dad and many of his colleagues have worked with it, so that could be a good place to start consulting for information.
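To make the "read through content and pick out keywords" idea concrete, here is a toy sketch using only Python's standard library. A real crawler would fetch pages over the network and follow the links it finds; this sketch just parses a hard-coded HTML snippet (my own made-up example, not anything from Zatlin) and separates the two things a crawler cares about: links to visit next and text to index.

```python
from html.parser import HTMLParser

class LinkAndTextExtractor(HTMLParser):
    """Collects hyperlinks (for further crawling) and visible text (for indexing)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []
        self._skip = 0  # depth inside <script>/<style>, whose text we ignore

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.extend(data.split())

page = """<html><body>
<h1>Saguaro cacti</h1>
<a href="/evolution">Evolution</a>
<p>Saguaros grow slowly in the Sonoran desert.</p>
</body></html>"""

parser = LinkAndTextExtractor()
parser.feed(page)
print(parser.links)  # links a crawler would visit next
print(parser.words)  # tokens a search engine might index
```

A production crawler adds a queue of URLs, politeness rules (robots.txt, rate limits), and deduplication of visited pages, but the parse-extract-follow loop is the core.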
Interestingly enough, Zatlin later brought up the concept of using embeddings while searching through new data to avoid repeated information, for clustering, and for efficiency. The idea is relatively simple: since Word2Vec maps single words of similar meaning close together, you could use a model such as Universal Sentence Encoder to create embeddings for entire data entries, and then cluster those embeddings to group similar entries. This is incredibly powerful and efficient if done correctly, as you can reduce databases to only the truly unique data which is relevant to you. In past computer science classes, I’ve worked with optimizing data by keeping unique entries, but my definition of what constitutes a unique data entry was limited to searching through a data set and deleting exact duplicates. Data embeddings would be much more powerful than this, as they would allow us to remove fundamentally similar entries. Say we have an extremely large pool of data, with hundreds of articles detailing the evolution of saguaro cacti. In most cases it is great to have a lot of information, but too much input can be burdensome as well, as Nassim Taleb warned us. So, in reality, we don’t need (nor want) one hundred articles all talking about saguaro cacti; we only want those which contain crucial information (and no repeated info). Clustering the embeddings would group articles of similar significance together (using the same context-based training I explained in earlier posts). That means we could look at a certain cluster of 10 articles and pick out one, with the idea being that this one article will teach us as much about cacti as the other nine. By the end, we’ve reduced the size of our data set by 90% while preserving ~90% of the content!
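Here is a minimal sketch of that deduplication step, assuming we already have an embedding vector per article (I use tiny made-up 3-dimensional vectors in place of real Universal Sentence Encoder outputs). We greedily keep an entry only if its cosine similarity to everything already kept is below a threshold; the threshold of 0.9 is my own arbitrary choice.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def deduplicate(embeddings, threshold=0.9):
    """Greedily keep indices of entries not too similar to anything kept so far."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Toy embeddings: entries 0 and 1 are near-duplicates; entry 2 is distinct.
embs = [[1.0, 0.0, 0.1],
        [0.98, 0.05, 0.12],
        [0.0, 1.0, 0.0]]
print(deduplicate(embs))  # → [0, 2]
```

With real sentence embeddings, one would typically use a proper clustering algorithm (e.g. k-means or agglomerative clustering) instead of this greedy pass, and keep one representative per cluster, but the idea of "drop anything too close to what you already have" is the same.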
Data optimization may seem unrelated to predicting the stock market, but it is important to consider all of these factors as the datasets you use grow larger and larger. For instance, I recently downloaded a file containing daily open/close/volume data for all stocks and ETFs going back to 1970. You can imagine how large a file this is, and training a neural network on such a large amount of data is time-consuming. I could counteract this issue through cross-validation, as Mr. Coste mentioned in my previous interview. But with my limited knowledge of k-fold cross-validation, I think it would be wiser to devote my time to testing real predictive models than to have my efforts hampered by such tangential endeavors.
Finally, Zatlin and I talked briefly about the nature of predicting the stock market (which, as I speak with more and more people, I’m starting to learn is quite subjective). His view, as I mentioned earlier, is that for large-scale predictions, the only applicable data are growth metrics like GDP and unemployment forecasts. To him, these are “the driving forces” of the economy, and the stock market will reflect that. This is a refreshing take after speaking with mathematically oriented high-frequency traders.
Switching gears away from this conversation, let’s take a look at the coding end of my project. You might remember that a couple of posts ago, I talked about the difficulty of creating input vectors for stock data, because of all the possible parameters and definitions of what a ‘stock event’ is. This issue was quite problematic for a while, as I couldn’t start training.
I decided to begin by looking at daily closing prices, creating a dictionary of daily close prices (keys) and respective dates (values), only to later realize a crucial flaw.
Can you guess what it is?
It is senseless to make a dictionary with close prices as your keys, since a stock might happen to have two identical close prices on different dates! No worries though, as no two dates are alike, so we can simply make the dates the keys.
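The flaw is easy to demonstrate with made-up numbers: dictionary keys must be unique, so a repeated close price silently overwrites the earlier entry, while dates never collide.

```python
# Two trading days that happen to close at the same price (hypothetical data).
# With prices as keys, the second entry silently overwrites the first:
prices_as_keys = {101.5: "2019-01-02", 101.5: "2019-03-08"}
print(len(prices_as_keys))  # → 1  (one date lost!)

# With dates as keys, nothing collides:
dates_as_keys = {"2019-01-02": 101.5, "2019-03-08": 101.5}
print(len(dates_as_keys))  # → 2
```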
A dictionary might seem like the right approach for storing this type of data, but it turns out this is not an ideal data structure for my purposes. Consider the following: What happens if we do find a ‘stock event’ embedding (in this simple case, the embedding for the close price of a day) which matches our query embedding, and we want to use this insight to make a prediction for what will happen tomorrow? To do this, we would need to know what the neighbors of this ‘stock event’ are, or in layman’s terms, we need to be able to access the prices of the dates close to our target date. This can be done by creating a linked list, where each element in the list is linked to its previous day and next day.
(“That’s weird that you call your class fingerPrint()…” Yes, it is, but this will make more sense after my next post).
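As a rough sketch of the structure described above (my own generic names, not the actual fingerPrint() class from the post), each node holds a date, a closing price, and links to its neighboring trading days:

```python
class DayNode:
    """One trading day: its date, closing price, and links to its neighbors."""

    def __init__(self, date, close):
        self.date = date
        self.close = close
        self.prev = None  # previous trading day
        self.next = None  # next trading day

def build_chain(daily):
    """Link (date, close) pairs into a doubly linked list.

    Assumes the pairs are already sorted chronologically.
    """
    nodes = [DayNode(d, c) for d, c in daily]
    for a, b in zip(nodes, nodes[1:]):
        a.next, b.prev = b, a
    return nodes[0] if nodes else None

# Hypothetical prices for three consecutive trading days.
head = build_chain([("2019-01-02", 101.5),
                    ("2019-01-03", 99.8),
                    ("2019-01-04", 102.1)])

match = head.next  # suppose this day's embedding matched our query
print(match.prev.close, match.next.close)  # neighbors give context for a prediction
```

Once a ‘stock event’ matches a query, hopping to `match.next` is exactly the "what happened the next day" lookup that a plain dictionary cannot provide.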
So, now we have all daily data loaded into a linked list, with each element containing its own date, its closing price, its next day neighbor’s date, and its previous day neighbor’s date. This is great for organizational purposes, but is rather useless for training a Word2Vec network, as Word2Vec is only interested in the ‘words’ (which are the prices in our case). So, we can easily create a list of all prices, treat this list as a data corpus, and train our Word2Vec model on it. The catch is that the prices must be in chronological order (and this is crucial!), because Word2Vec is trying to learn historical patterns through context. If the prices aren’t in chronological order, it undermines the whole purpose of our endeavor, as we would simply be learning the contexts of random numbers. With chronological order, we learn which price shifts occur in what historical contexts, and what events lead to gains/losses in closing prices.
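One convenient detail when enforcing that chronological order: ISO-8601 date strings (`YYYY-MM-DD`) sort lexicographically in chronological order, so building the ordered corpus can be as simple as this sketch (with made-up prices):

```python
# (date, close) pairs as they might arrive from a file, out of order:
raw = [("2019-01-03", 99.8), ("2019-01-02", 101.5), ("2019-01-04", 102.1)]

# ISO-8601 dates sort lexicographically == chronologically,
# so a plain sort puts the days in the right sequence.
ordered_prices = [close for date, close in sorted(raw)]
print(ordered_prices)  # → [101.5, 99.8, 102.1]
```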
So, now that we have our list of prices, this brings me to my next (potential) revelation, which is on the matter of input vectors. We can’t simply feed a corpus of numbers into Word2Vec and expect it to work, because Word2Vec does not train on numbers, only strings. The past two weeks, I spent an unreasonable amount of time thinking about how we can create Word2Vec input vectors for stock prices, considering options such as z-scores, or categorizing price shifts into buckets: small, medium, and large gains, and small, medium, and large losses.
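For reference, that bucketing idea is only a few lines of code. This is a hypothetical sketch (the 0.5% and 2% thresholds are my own arbitrary picks, not anything decided in the post):

```python
def label_move(prev_close, close, small=0.005, large=0.02):
    """Bucket a day's return into one of the six categories described above.

    `small` and `large` are return thresholds (0.5% and 2% here, arbitrarily).
    """
    r = (close - prev_close) / prev_close
    side = "gain" if r >= 0 else "loss"
    magnitude = abs(r)
    size = "small" if magnitude < small else "medium" if magnitude < large else "large"
    return f"{size}_{side}"

print(label_move(100.0, 100.3))  # → 'small_gain'  (+0.3%)
print(label_move(100.0, 97.0))   # → 'large_loss'  (-3.0%)
```

Each day's label would then serve as one ‘word’ in the Word2Vec corpus, at the cost of throwing away the exact price levels.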
But, I might have been overthinking all along! Why overcomplicate this process, when we can simply convert the price (a float) to a string? Once we have these strings, we can treat them as ‘words’ and train Word2Vec on them, even though they are not actually words. This may seem overly simplistic, but it makes technical sense: Word2Vec learns based on context, and we can learn the context that the prices appear in by treating them as words (which is all we’re really interested in). If this is not enough, I’ve found a Python library which converts numbers into actual written words (121 → one hundred and twenty one).
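The float-to-string conversion itself is a one-liner; the only real decision is how many decimals to keep, since rounding also merges nearly identical prices into one shared ‘word’ and keeps the vocabulary manageable. A minimal sketch with made-up prices (the gensim call in the comment is one common Word2Vec implementation, shown as an assumption rather than a fixed choice):

```python
def prices_to_tokens(prices, decimals=2):
    """Turn a chronological list of floats into Word2Vec-ready 'words'."""
    return [f"{p:.{decimals}f}" for p in prices]

# One 'sentence' of hypothetical daily closes, in chronological order.
corpus = [prices_to_tokens([101.5, 99.8, 102.1, 101.5])]
print(corpus[0])  # → ['101.50', '99.80', '102.10', '101.50']

# With gensim installed (an assumption -- any Word2Vec implementation works),
# training would look roughly like:
#   from gensim.models import Word2Vec
#   model = Word2Vec(corpus, vector_size=32, window=3, min_count=1)
#   model.wv.most_similar("101.50")
```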
With all of this in mind, I can already start training my Word2Vec neural net using the strategy I explained above. We will see how viable this strategy is once I start producing results (stay tuned!).
My next post will be about how we can use Shazam as a source of inspiration for predicting the stock market (stay tuned for this too!).
Also, on the interview end, now is probably the worst time of the year for reaching out to academia, with university finals happening these next two weeks. But, as with stocks, every drop is usually followed by a rebound, and I will have a month-long window to talk with college students during their winter breaks.
