The Shortcomings of Neural Networks for Trading Predictions

As someone who is devoting a large-portion of their senior year (and very likely time beyond that) to researching potential applications of deep learning in trading, I wasn’t thrilled to learn about the recent shortcomings of quantitative traders. Let’s begin with Marcos López de Prado, a frequently cited algorithmic trader who recently  published Advances in Financial Machine Learning. One thing that De Prado talks about is the idea of ‘red-herring patterns’ that are extrapolated by machine learning algorithms. These types of algorithms are, by design, created to analyze large bodies of data and identify patterns within this data. In fact, this idea of noticing patterns is one of the main assumptions I am basing my work on (using Word2Vec embeddings to identify past financial patterns and apply them to real-time data for more accurate predictions). But, what happens when these algorithms identify patterns that aren’t real? An aggressive neural network (In my case: One which adjusts vector weights heavily while learning from data) is prone to make these types of mistakes. Think of this example: A stock happens to go up a couple percent points every Thursday for three weeks in a row. A (poorly written) neural network would deduce that every Thursday in the future, this stock would go up by at least a percent point or two. Now, this is easily avoidable by training a trading algorithm on larger sets of data, but even large data sets are prone to these types of red-herrings. Once a trading algorithm clings on to a pattern, it could backfire horribly when that pattern eventually breaks.
This brings the idea of Black Swans into light. The theory of Black Swans was popularized by Nasim Taleb in his accurately-titled book The Black Swan: The Impact of the Highly Improbable. The general gist of this theory is that the most profoundly impactful events oftentimes are the ones we least expect, due to our fallacious tendencies in analyzing statistics (I will go into more detail on these topics and more in a future blog post, once I am done reading the whole book). Taleb argues that one of our biggest shortcomings in analyzing data is creating ‘false narratives’, which are more convenient and easier to sell to clients. These false narratives oftentimes omit crucial data (silent data), which backfires once the narrative breaks.
But, on the other end, a more passive neural network (one which more slightly adjusts vector weights) can sometimes come to no meaningful conclusions, which means wasted time and computational energy. I want to create a Word2Vec model which can detect patterns, but I also don’t want it to actively follow patterns with no longevity.
So, what does one do? How aggressive/passive should I make my Word2Vec neural network? 

Another theory which I encountered over the weekend is the idea of survivorship bias. In training neural networks, how do we treat data from companies which have failed? If we are analyzing the stock price data for various important stocks over time, what do we with data from once-important stocks which are now defunct, such as Lehman Brothers? I initially thought it would be best to throw this data out, since it is no longer applicable, but it turns out this strategy can have negative consequences. If we only train our network on stocks which have survived, then we will miss out on crucial data about when stocks go bankrupt. So, how do we properly treat this type of data?


All of these seemingly insignificant flaws in trading algorithms can evoke catastrophic mistakes. This concept is synthesized by quantitative investment officer Nigol Koulajian, saying: “You can have one little pindrop that can basically make you lose over 20 years of returns.” This ‘little pindrop’ which Koulajian mentions is the eventual divergence from the false patterns identified by neural networks. I personally think it would take more than a little pindrop to erase 20 years of returns, but the idea still stands. So, this warrants the question, how do we avoid the little pindrop? My (far-fetched?) theory is that you can use neural networks to estimate worst-case scenarios int the same way they are designed to estimate best-case scenarios, and then work to avoid this.
In broader terms, Bloomberg reports that the Eureka Hedge Fund Index, which tracks the returns of hedge funds which are known for using machine learning, has under performed yearly compared to the S&P 500. The harsh truth (right now) is that simply investing in the S&P500 will return ~13% yearly, while machine-learning based hedge funds return ~9% yearly.
Eureka Hedge Fund Index
(The keen observer will notice that despite all the noise, the index has been steadily going up over the past 7 years)
These are some of the questions I ask those few who read what I am writing, and are the types of questions I will ask through my personal research interviews (Good News! I have my first interview scheduled this upcoming Tuesday, and, interviewee permitting, I will post a summary of our talk later in the week).
In my personal opinion, the recent under performance of trading algorithms in general is not a bad sign. This is still a relatively new field, meaning that more research needs to be done and new discoveries need to be made. I think of it this way: If trading algorithms are working perfectly, then what’s the point of a newcomer (like me) coming in and doing research on them? If it ain’t broke, don’t fix it.

William Deresiewicz and Homework-Induced Stress

Last week I attended a lecture/Q&A hosted by William Deresiewicz, the famous ex Yale professor who criticizes schools and questions the efficacy of the stressful college application processes. Throughout the discussion, he wasn’t hesitant to remind the crowd full of future college students that they shouldn’t stress out about homework assignments and similar trifles. One statement regarding school-induced stress that stood out to most of the attendees was “The truth is that you guys are not going to end up sleeping under a bridge; it is much more likely that you’ll end up jumping off of one instead.”  This was a powerful assertion which is presumably correct, considering the ridiculous amounts of mental pressure that teenagers have to deal with nowadays. (Click here if you don’t trust my claim).
As a teenager who lives in America, I can confirm, because of interactions with peers and from personal experience as well, that teenagers in America are actually subject to large amounts of stress. Obviously, there are people who really don’t care about what’s happening in their classes and avoid doing homework at all costs, but if you want to be somewhat “successful” in your high school education, then you will have to wade through the copious amount of work.
Thanks to recent research and to books published by people like William Deresiewicz, most people in America are aware of this fact. But like with most situations in the nation, it seems as if we are spending too much time attempting to find out how to deal with the problem that we are facing rather than actually asking ourselves why the problem exists in the first place. We spend a considerable amount time talking to our kids about ways to cope with stress so to make sure that they don’t experience mental breakdowns, but we have never discussed why such pressure subsists.
So, why do American high schools assign so much homework? Why are college admission processes so demanding and time consuming?