Deep Learning the Stock Market, from the Perspective of an HFT Developer

A couple of days ago, I spoke with Tom Conerly, an alumnus of my high school who is now a developer at the secretive trading research company Jump Trading. As part of my research project on using Word2Vec for market predictions, I want to speak with people who have actual working experience in quantitative trading, developing trading algorithms, or general asset management. I am still looking for more people to talk to, so if you have any suggestions or references, please let me know!
Anyway, back to the interview. Since trading research companies depend on having slight strategic advantages over their competitors, I didn’t expect to get any concrete details about Jump Trading’s algorithms. Rather, I wanted to get a general idea of how Jump Trading works, and whether or not they have faith in using deep learning to make predictions. Here are the questions and answers from my interview, with the answers paraphrased since I wasn’t allowed to record the conversation:
S.A: What is life truly like as a newcomer in a trading firm? Is it actually like the horror stories people write about?
T.C: Jump Trading is divided into separate, small trading teams which each work on their own trading strategies. Hence, the hours aren’t crazy, and it comes off more as a research job, without that much day-to-day variance. Investment banking is definitely different, as it is more of a high-stress environment that involves watching real-time stock prices and trades go through. That is where you are more likely to see hazing and people being worked hard, but not so much at a research company like Jump Trading.
S.A: At a company focused on algorithmic trading strategies, is there a stronger emphasis on mathematical modeling, such as statistical probabilities, or on deep learning strategies when creating your algorithms?
T.C: Again, Jump Trading is not a bunch of traders watching the market in real time or using software to make trades. There is less ‘human’ involvement in the trading [we] do, as people work on developing and testing the most efficient, profitable HFT strategies. In this development there is definitely a strong emphasis on mathematical modeling, and while AI is used, it is limited to smaller models such as linear regression. With deep learning, there is a continuous trade-off between how powerful the model is and the negative impact of stock market noise. This is why deep learning is uncommon in HFT.
S.A: What’s your opinion on using deep learning to recognize patterns in the behavior of markets? Do you think this could be a profitable strategy? Some say the markets don’t follow easily recognizable patterns; others say they move in identical cycles.
T.C: [My] intuition is that this is not the right direction to go, again because the stock market tends to be incredibly noisy. If you are trying to trade at high frequency, which means often holding assets for less than a minute, deep learning won’t be particularly helpful. However, deep learning could be a viable hedge-fund strategy, as hedge funds tend to work on a more macro scale and hold assets for much longer periods of time. There definitely could be a quant trading company doing this right now.
S.A: What’s one factor you guys consider most heavily when gauging the value of an asset (say: stock, bond, future, currency)?
T.C: When [we] look at an asset, [we] don’t take into account revenue, profits, or general fundamental analysis, as [we] believe all of these factors are already reflected in the asset’s price. The analysis is directed more toward the real-time state of the market, and one primary way of doing this is by analyzing book pressure (how the number of buyers at a given point in time compares to the number of sellers, and vice versa). But this is not to say it is the only way to gauge the value of an asset, as there are many different ways to approach trading. In [our] case, though, the focus is on small profits made in large quantities.
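To make ‘book pressure’ a bit more concrete: one common way to quantify it is an order book imbalance, which compares the volume resting on the bid (buy) side of the book with the volume on the ask (sell) side. The sketch below is my own illustration of that general idea, not Jump Trading’s method; the function name and the order sizes are hypothetical.

```python
from typing import Sequence


def book_imbalance(bid_sizes: Sequence[float], ask_sizes: Sequence[float]) -> float:
    """Return an order book imbalance score in [-1, 1].

    Positive values mean more resting buy volume than sell volume (buying
    pressure); negative values mean more sell volume (selling pressure).
    """
    bid_volume = sum(bid_sizes)
    ask_volume = sum(ask_sizes)
    total = bid_volume + ask_volume
    if total == 0:
        return 0.0  # empty book: no pressure in either direction
    return (bid_volume - ask_volume) / total


# Hypothetical snapshot: share counts at the best three price levels on each side.
bids = [500, 300, 200]
asks = [200, 150, 150]
print(book_imbalance(bids, asks))
```

Here there are twice as many shares waiting to buy as to sell, so the score comes out positive (one third), suggesting upward pressure at that instant.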
S.A: What’s the most important factor to consider when creating a trading algorithm?
T.C: It really depends on what data you’re building your trading algorithm on. For [us], like I said, book pressure and recent vs. expected prices are important, while [we] don’t pay too much attention to what comes out of company earnings reports.
S.A: How much do market cycles influence your firm’s trading strategies?
T.C: They basically don’t. There are some trading conditions which might affect you, but it really is more of a small-scale approach for [us]. Market cycles and macro trends matter more to hedge funds, where investors go long or short to make bigger bets over longer time frames. When trading over shorter time periods, you have much more data to train on. For example, if you are trying to use data to make long-term predictions, you don’t really have that much data from before the 1990s available online, which limits the scope and ability you have to train your neural net.
On the other hand, hedge funds can take all kinds of approaches toward making investments, giving them more varied sources of data.

What can we take away from this? 

Though high-frequency trading differs in many ways from using deep learning for financial pattern recognition, it is interesting (and definitely valuable) to hear from these different perspectives. This interview aligns well with my previous post, where I talk about the potential shortcomings of deep learning for trading algorithms. Mr. Conerly seems skeptical that powerful deep learning tools, such as Word2Vec, are profitable for high-frequency trading, mainly because of how noisy the market is at that scale. On the other hand, he did acknowledge that some secretive hedge funds or quant funds may very well be using these types of AI tools for their projections. He added that HFT uses far more data than macro-scale analysis, since HFT firms analyze prices second by second (or even millisecond by millisecond). This makes me think about the pros and cons of testing my Word2Vec net on macro-scale data vs. micro-scale data. According to Mr. Conerly, micro-scale analysis would give you more data to work with (and potentially a more accurate neural net as a result), but the predictions would be less significant since they’re tailored to a smaller scale. Macro-scale analysis would yield bigger, more significant predictions (a larger margin for error, but also a bigger margin for profit) but would presumably have less data available for training.
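To put rough numbers on the micro vs. macro trade-off, the snippet below counts how many price samples a single stock produces per year at different bar sizes. It assumes a 6.5-hour US equity session (9:30 to 16:00 Eastern) and about 252 trading days per year; both figures are my assumptions, not numbers from the interview.

```python
# Assumptions (not from the interview): a 6.5-hour US equity session
# (9:30 to 16:00) and roughly 252 trading days per year.
SESSION_SECONDS = int(6.5 * 60 * 60)  # 23,400 seconds per trading day
TRADING_DAYS_PER_YEAR = 252


def samples_per_year(bar_seconds: int) -> int:
    """Number of price samples one stock produces in a year at a given bar size."""
    bars_per_day = SESSION_SECONDS // bar_seconds
    return bars_per_day * TRADING_DAYS_PER_YEAR


one_second = samples_per_year(1)             # micro-scale: one-second bars
one_day = samples_per_year(SESSION_SECONDS)  # macro-scale: one bar per day
print(one_second, one_day, one_second // one_day)
```

Under these assumptions, one year of one-second bars gives about 5.9 million samples per stock, while even thirty years of daily closes gives only 7,560 (30 times 252).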
Another thing I found really interesting and applicable to my work is Mr. Conerly’s point that they [developers at Jump Trading] don’t take fundamental data, or time, into account when trading. His point was that fundamental data doesn’t need to be analyzed because it is already reflected accurately in the stock’s price (the efficient-market hypothesis), and that specific time intervals are irrelevant when predicting whether a stock will go up or down. Leaving fundamental data (earnings, P/E ratios, company debt, ROE, etc.) and time intervals out of a training set would save a great deal of computational power, money, and time, so this would be ideal. But this raises another question: how efficient are markets, really?
The questions now emerging are more subjective and uncertain than the ones I asked in my last post, so again, I am not anticipating clear answers. Once again, it seems these are questions I’ll have to answer on my own, through my research.
Looking forward, it is now clear that I need to find some actual data sets and start training, predicting (backtesting for now), and taking notes on the results. I will need to develop a more explicit research plan, which I will post sometime this week.

The Shortcomings of Neural Networks for Trading Predictions

As someone who is devoting a large portion of their senior year (and very likely time beyond that) to researching potential applications of deep learning in trading, I wasn’t thrilled to learn about the recent struggles of quantitative traders. Let’s begin with Marcos López de Prado, a frequently cited algorithmic trader who recently published Advances in Financial Machine Learning. One thing López de Prado talks about is the idea of ‘red-herring patterns’ that machine learning algorithms pick up from data. These types of algorithms are, by design, created to analyze large bodies of data and identify patterns within that data. In fact, this idea of noticing patterns is one of the main assumptions I am basing my work on (using Word2Vec embeddings to identify past financial patterns and apply them to real-time data for more accurate predictions). But what happens when these algorithms identify patterns that aren’t real? An aggressive neural network (in my case, one which adjusts its vector weights heavily while learning from data) is prone to these types of mistakes. Think of this example: a stock happens to go up a couple of percentage points every Thursday for three weeks in a row. A (poorly designed) neural network would deduce that every Thursday in the future, this stock will go up by at least a percentage point or two. This risk can be reduced by training a trading algorithm on larger data sets, but even large data sets are prone to these types of red herrings. Once a trading algorithm clings to a pattern, it can backfire horribly when that pattern eventually breaks.
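The Thursday example can be simulated directly. This sketch assumes each stock’s Thursday move is an independent 50/50 coin flip, so by construction there is no real pattern to find and any ‘every Thursday’ streak is a red herring. The function name, the number of stocks, and the random seed are arbitrary choices for illustration.

```python
import random


def thursday_red_herrings(num_stocks: int, seed: int = 0) -> tuple[int, int]:
    """Simulate stocks whose Thursday moves are pure coin flips (no real pattern).

    Returns (stocks that rose three Thursdays in a row,
             how many of those also rose on the fourth Thursday).
    """
    rng = random.Random(seed)
    streaks = 0
    continued = 0
    for _ in range(num_stocks):
        # True = the stock closed up that Thursday; each week is an independent 50/50 flip.
        weeks = [rng.random() < 0.5 for _ in range(4)]
        if all(weeks[:3]):
            streaks += 1
            if weeks[3]:
                continued += 1
    return streaks, continued


streaks, continued = thursday_red_herrings(10_000)
print(f"{streaks} of 10,000 random stocks 'rose every Thursday' for three weeks")
print(f"{continued} of those rose again on the fourth Thursday")
```

With three independent flips, about one stock in eight (0.5 cubed) shows the streak purely by chance, and only about half of those continue it on the fourth Thursday, which is no better than a coin flip. Scanning thousands of stocks for patterns all but guarantees finding some of these.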
This brings the idea of Black Swans to light. The theory of Black Swans was popularized by Nassim Taleb in his aptly titled book The Black Swan: The Impact of the Highly Improbable. The gist of this theory is that the most profoundly impactful events are often the ones we least expect, due to our fallacious tendencies in analyzing statistics (I will go into more detail on these topics and more in a future blog post, once I am done reading the whole book). Taleb argues that one of our biggest shortcomings in analyzing data is creating ‘false narratives’, which are more convenient and easier to sell to clients. These false narratives often omit crucial data (what Taleb calls silent evidence), which backfires once the narrative breaks.
On the other hand, a more passive neural network (one which adjusts its vector weights only slightly) can sometimes reach no meaningful conclusions, which means wasted time and computational energy. I want to create a Word2Vec model which can detect patterns, but I also don’t want it to actively follow patterns with no longevity.
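In gradient-based training (which Word2Vec also uses), much of how ‘aggressive’ or ‘passive’ a network is comes down to its learning rate: how far each weight moves in response to one training example. The toy sketch below shrinks the problem to a single weight learning a constant signal (1.0) buried in Gaussian noise, updated with a plain stochastic-gradient step on squared error. The signal, the noise level, and the three learning rates are all hypothetical.

```python
import random
import statistics


def track_signal(learning_rate: float, steps: int = 2000, seed: int = 1) -> list[float]:
    """Learn a single weight from noisy observations of a constant signal.

    Each step moves the weight toward the newest observation by
    `learning_rate`, i.e. one SGD step on the loss 0.5 * (obs - weight) ** 2.
    """
    rng = random.Random(seed)
    true_value = 1.0  # hypothetical "real" pattern
    weight = 0.0      # the model starts out knowing nothing
    history = []
    for _ in range(steps):
        observation = true_value + rng.gauss(0.0, 1.0)  # signal buried in noise
        weight += learning_rate * (observation - weight)
        history.append(weight)
    return history


for lr in (0.9, 0.01, 0.0001):
    late = track_signal(lr)[1000:]  # second half of training
    print(f"lr={lr}: mean={statistics.mean(late):.3f}, jitter={statistics.stdev(late):.3f}")
```

The large learning rate averages out near 1.0 but jumps around with every noisy observation (the aggressive case), while the tiny one is still far below 1.0 after 2,000 steps (the passive case that never reaches a useful conclusion). The moderate rate is the compromise; in practice it is usually chosen by checking performance on held-out data rather than by intuition.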
So, what does one do? How aggressive/passive should I make my Word2Vec neural network? 

Another theory I encountered over the weekend is the idea of survivorship bias. In training neural networks, how do we treat data from companies which have failed? If we are analyzing stock price data for various important stocks over time, what do we do with data from once-important stocks which are now defunct, such as Lehman Brothers? I initially thought it would be best to throw this data out, since it is no longer applicable, but it turns out this strategy can have negative consequences. If we only train our network on stocks which have survived, then we will miss out on crucial data about when stocks go bankrupt, and the network will learn an overly optimistic picture of the market. So, how do we properly treat this type of data?
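A toy example shows why throwing out failed companies is dangerous. The ticker names and returns below are made up purely for illustration; a return of -100% stands in for a bankruptcy like Lehman Brothers’.

```python
from statistics import mean

# Hypothetical ten-year total returns for a small universe of stocks.
# A return of -1.0 means the company went bankrupt and the shares became worthless.
universe = {
    "SURVIVOR_A": 1.20,
    "SURVIVOR_B": 0.45,
    "SURVIVOR_C": 0.10,
    "SURVIVOR_D": -0.20,
    "BANKRUPT_E": -1.00,
    "BANKRUPT_F": -1.00,
}
still_listed = {"SURVIVOR_A", "SURVIVOR_B", "SURVIVOR_C", "SURVIVOR_D"}

# What a dataset built only from currently listed stocks would show.
survivors_only = mean(r for name, r in universe.items() if name in still_listed)
# What an investor holding the whole universe actually experienced.
full_history = mean(universe.values())

print(f"Survivors only: {survivors_only:+.1%}")
print(f"Including failed companies: {full_history:+.1%}")
```

Averaging only the survivors makes this universe look like it returned about +39%, while the full history (including the two bankruptcies) actually lost 7.5%. The usual remedy is a survivorship-bias-free dataset that keeps delisted companies, along with their final prices, in the training data.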


All of these seemingly insignificant flaws in trading algorithms can lead to catastrophic mistakes. Quantitative investor Nigol Koulajian sums up this concept: “You can have one little pindrop that can basically make you lose over 20 years of returns.” The ‘little pindrop’ Koulajian mentions is the eventual divergence from the false patterns identified by neural networks. I personally think it would take more than a little pindrop to erase 20 years of returns, but the idea still stands. This raises the question: how do we avoid the little pindrop? My (far-fetched?) theory is that you can use neural networks to estimate worst-case scenarios in the same way they are designed to estimate best-case scenarios, and then work to avoid them.
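A neural network is not needed for a first estimate of worst-case outcomes: a standard, much simpler baseline is historical Value at Risk (VaR) and Expected Shortfall (ES). This sketch swaps in that baseline as an illustration; it is not the neural-network approach described above. The function name, the 90% confidence level, and the daily returns are hypothetical, and VaR has several competing conventions (this one takes the tail as the worst 10% of days).

```python
def historical_var_es(returns: list[float], confidence: float = 0.95) -> tuple[float, float]:
    """Historical Value at Risk and Expected Shortfall, reported as positive losses.

    VaR: the mildest loss among the worst (1 - confidence) share of past days.
    ES:  the average loss over those same worst days.
    """
    ordered = sorted(returns)  # worst (most negative) returns first
    # round() guards against floating-point noise: 1 - 0.95 is not exactly 0.05 in binary.
    tail_count = max(1, round(len(ordered) * (1 - confidence)))
    tail = ordered[:tail_count]
    var = -tail[-1]                # mildest loss inside the tail
    es = -sum(tail) / tail_count   # average loss inside the tail
    return var, es


# Hypothetical daily returns (made up for illustration, not real market data).
daily_returns = [0.012, -0.004, 0.007, -0.021, 0.003, 0.009, -0.035, 0.001, 0.015, -0.008,
                 0.004, -0.012, 0.006, 0.002, -0.050, 0.011, -0.003, 0.008, -0.017, 0.005]
var_90, es_90 = historical_var_es(daily_returns, confidence=0.90)
print(f"90% one-day VaR: {var_90:.1%}, expected shortfall: {es_90:.2%}")
```

A historical estimate like this can only ‘see’ losses that already appear in the data, which is exactly Taleb’s Black Swan objection: the real little pindrop may be worse than anything in the sample.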
In broader terms, Bloomberg reports that the Eurekahedge AI Hedge Fund Index, which tracks the returns of hedge funds known for using machine learning, has underperformed the S&P 500 year after year. The harsh truth (right now) is that simply investing in the S&P 500 has returned roughly 13% per year, while machine-learning-based hedge funds have returned roughly 9% per year.
Chart: Eurekahedge AI Hedge Fund Index
(The keen observer will notice that, despite all the noise, the index has risen steadily over the past seven years.)
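To see why a few percentage points matter, here is a quick compounding check using the rough ~13% and ~9% annual figures quoted above over the seven-year window mentioned in the chart note. It assumes returns compound once per year and ignores fees and taxes; the function and variable names are my own.

```python
def grow(initial: float, annual_return: float, years: int) -> float:
    """Value of an investment whose return compounds once per year."""
    return initial * (1 + annual_return) ** years


YEARS = 7  # the seven-year window from the chart note
sp500 = grow(100.0, 0.13, YEARS)     # ~13% per year, as quoted above
ml_funds = grow(100.0, 0.09, YEARS)  # ~9% per year, as quoted above
print(f"S&P 500:        ${sp500:.2f}")
print(f"ML hedge funds: ${ml_funds:.2f}")
print(f"Gap per $100:   ${sp500 - ml_funds:.2f}")
```

Under those assumptions, $100 grows to roughly $235 in the index versus roughly $183 in the machine-learning funds, a gap of about $52 on every $100 invested.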
These are some of the questions I put to the few people who read what I write, and they are the types of questions I will ask in my research interviews. (Good news! I have my first interview scheduled for this upcoming Tuesday, and, interviewee permitting, I will post a summary of our talk later in the week.)
In my personal opinion, the recent underperformance of trading algorithms in general is not a bad sign. This is still a relatively new field, meaning that more research needs to be done and new discoveries need to be made. I think of it this way: if trading algorithms were working perfectly, then what would be the point of a newcomer (like me) coming in and doing research on them? If it ain’t broke, don’t fix it.