Financial Implications of “The Black Swan” by Nassim Nicholas Taleb

The Black Swan by Nassim Taleb is one of the best, and one of the most interesting, books I’ve read in a while. I’ve heard people reference Taleb’s work a few times before, mainly in financial settings, but I never quite understood why, which is what inspired me to read this. In general, Taleb argues that throughout human history, nearly all majorly significant events were considered highly improbable before they occurred (think: the fall of the Roman Empire, the rise of Nazi Germany, the Bubonic Plague, the sinking of the Titanic). Essentially, the events which have the greatest impact on us lie totally outside of our field of prediction, no matter how much past data, observation, and intuition we use in making these predictions. These momentous, unpredictable events are called Black Swans (statisticians associate them with the fat tails of a distribution) because they are perceived as rare (Taleb actually argues that these events are much more common than we’d like to think).
Taleb helps us visualize the phenomenon of Black Swans through two fictional worlds: Mediocristan and Extremistan. Mediocristan is a province where “particular events don’t contribute much individually — only collectively” (Taleb 32), and things are distributed rather evenly. Extremistan, on the other hand, is a world of extremes, where “inequalities are such that one single observation can disproportionately impact the aggregate, or the total” (33). Taleb claims that we live in Extremistan, as much as we’d like to believe it to be Mediocristan, since single events can have disproportionately large impacts on our societies. Evidence that we live in Extremistan shows up in other metrics as well, most notably in wealth divides (the top 1% vs. the 99%).
This theory of Black Swans being more significant than regular accumulations of events holds true in all fields in Extremistan, especially (and this is most relevant for us) in finance. One fact Taleb mentions really dumbfounded me: “In the last fifty years, the ten most extreme days in the financial markets represent half the returns” (275). Ten days in fifty years. The converse, which Taleb does not address directly, is likely also true: in the last fifty years, the ten most extremely negative days in the financial markets represent half the losses. This is so incredible, and at first this made me wonder why people (like me) try so hard to predict what will happen to a stock’s price day-to-day, when they should really be trying to predict the next Black Swan. I then realized that if a Black Swan can be predicted, it is no longer a Black Swan, and the opportunities for profit are no longer as large since more people are expecting it to happen. Also, once a Black Swan is predicted, a new Black Swan emerges outside of this prediction which becomes the next true Black Swan.
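Taleb’s statistic is easy to sanity-check with a toy computation. Below is a minimal sketch (all numbers are invented for illustration, not real market data) of how a handful of extreme days can dominate total returns:

```python
# Toy illustration of Taleb's point: a few extreme days can account
# for most of a period's total return. Numbers here are made up.

def top_day_share(daily_returns, k=10):
    """Fraction of the total summed return contributed by the k largest days."""
    total = sum(daily_returns)
    top_k = sorted(daily_returns, reverse=True)[:k]
    return sum(top_k) / total

# 250 "ordinary" days drifting slightly up, plus 10 extreme jumps
ordinary = [0.0004] * 250   # ~0.04% per day
extremes = [0.05] * 10      # ten +5% days
returns = ordinary + extremes

share = top_day_share(returns, k=10)
print(f"Top 10 days contribute {share:.0%} of total return")  # prints "83%"
```

Even with the extreme days making up under 4% of the calendar, they dominate the aggregate, which is exactly the Extremistan dynamic Taleb describes.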
This is also a very sobering thought, considering the work I’m currently doing in predictive trading algorithms. I’m sure Taleb would laugh at the idea of me using historical financial data to predict what will happen to a stock tomorrow, next week, or next month. In his book, Taleb talks a lot about the misleading nature of data, using the example of “1001 days of history”: “You observe a hypothetical variable for one thousand days. You subsequently derive solely from this past data a few conclusions concerning the properties of the pattern with projections for the thousand, even five thousand, days. On the thousand and first day — boom! A big change takes place that is completely unprepared for by the past.” I enjoy picturing this concept like this: I know I have been alive every day for the past 40 years, so in conclusion, I can use this data to determine I will be alive tomorrow, next month, and for the next 40 years after that! The irony here is that the one event which lies outside of my predictions would be the most significant event in my life. Yet, I am not deterred. I do think that patterns exist in market behavior, and because these patterns repeat cyclically, they can be used to make predictions. The mere existence of Black Swans doesn’t negate the existence of patterns; it’s just possible that one day these patterns will break. I think hedging with options is enough to counter the potential negative effects; I just need to learn more about how to do this efficiently.
Another key point brought up is the effect of silent evidence, which relates to history being written by the winners and not the losers. “It is so easy to avoid looking at the cemetery while concocting historical theories” (101), says Taleb, which reminds me of the problem of survivorship bias in quantitative trading. Survivorship bias occurs when we train models on existing companies and their stocks, while forgetting the companies which no longer exist (bankruptcies, acquisitions). Oftentimes, this omitted data (the silent data) is the most important, as we can learn what events lead up to a bankruptcy or sudden stock collapse. This relates to the idea of blind risk usually leading to better short-term rewards while completely backfiring later on down the road: “The fools, the Casanovas, and the blind risk takers are often the ones who win in the short term” (117).
A third interesting point which really made me think was Taleb’s take on information, and more importantly, misinformation. He writes: “The more information you give someone, the more hypotheses they will form along the way, and the worse off they will be. They see more random noise and mistake it for information” (144). This really hit close to home, as someone who is naturally paranoid and always thinking that the solution to my problems lies in all the many books, studies, and reports I haven’t read. It also makes me think about not mentally constructing narratives (which, more often than not, become red herrings) based on the information I have.
Finally, despite being so apprehensive about the idea of predicting the future, Taleb also claims that “In the end we are being driven by history, all the while thinking that we are the ones doing the driving.” There are many ways to interpret this great quote, but I see it as history being contextual. In other words, we don’t control history; rather, events happen and build off of one another, whether in random fashion or through similarly repeating patterns (the latter being my theory, though this might just be me mentally constructing a narrative based on the information I have).
If Mr. Taleb ran a hedge fund (which he previously did, under the name Empirica Capital), I’m guessing his strategy would be to take small, consistent short-term profits while always hedging against potentially huge losses. This way, he would be able to see gradual returns safeguarded from catastrophic events. Likewise, he could make his bigger bets on positive Black Swans, while pursuing a small-scale trading strategy in the time between these rare occurrences. This is where the options trading, which my previous interviewee brought up, comes into play. A strategy founded on single, greatly dispersed events which are not even guaranteed to happen may sound like an overly passive, boring, and minimally profitable way to run a hedge fund. However, in recent years, many asset management companies have gone bust by neglecting Taleb’s advice and discrediting the power of Black Swans. One instance that comes to mind is the hedge funds which invested in cryptocurrencies like Bitcoin, not expecting the price to drop from ~$20,000 all the way down to sub-$4,000.
Another, more salient example is a Florida-based hedge fund which recently went bust after unexpected volatility caused its positions to collapse completely. The hedge fund’s manager, who I don’t need to name because he is receiving enough bad publicity already, founded the fund on the premise of ‘naked options’ (as opposed to covered options): selling puts or calls without owning the underlying asset or otherwise hedging potential losses, as a method to save money. The problem with this strategy is that if you sell a call on a stock at a strike price of $50, you are hoping the stock will remain at or below $50 up to a certain date. However, if the stock rises above $50, the call seller loses money, since he has to buy the stock at the higher market price in order to deliver it at $50. In a very unlikely, and very unfortunate, scenario, the price of the stock could grow an incredible amount (say it goes up to $500), and the resulting losses would be even more incredible. This is what happened to this fund, which lost all of its clients’ investments (and more) because of an unpredictably volatile period for crude oil prices.
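To make the asymmetry concrete, here is a rough payoff sketch for the seller of an uncovered (naked) call. The strike, premium, and prices below are hypothetical, and a real position would also involve margin requirements and transaction costs:

```python
def short_call_pnl(strike, premium, price_at_expiry):
    """P&L per share for the seller of an uncovered (naked) call.
    At or below the strike the seller keeps the premium; above it,
    losses grow without bound as the stock rises."""
    intrinsic = max(price_at_expiry - strike, 0.0)
    return premium - intrinsic

# Hypothetical numbers: sell a $50 call for a $2 premium.
print(short_call_pnl(50, 2.0, 48))   # stock below strike: 2.0 (keep premium)
print(short_call_pnl(50, 2.0, 60))   # stock at $60: -8.0
print(short_call_pnl(50, 2.0, 500))  # Black Swan move: -448.0
```

The upside is capped at the premium while the downside is unbounded, which is precisely the trade profile Taleb warns against: steady small wins until one extreme day erases everything.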

To Conclude…

I’ve always found that finishing a quality book is an overwhelming experience, due to the surplus of information you’ve quickly gained (not to mention trying to remember all of it!) and also because it leaves you feeling a bit empty, as if you’ve lost something. Right now, I think of what Taleb’s work can teach me about my project. Some lessons I’ve learned from reading this book are:

  1. I need to think about what a Black Swan would look like in my context, and how I could protect against this.
  2. I need to familiarize myself with statistical terminology. This problem was apparent to me before reading the book, but it’s now been solidified. I have never taken a formal statistics class (which is fine considering Taleb’s theory on harmful over-saturation of information), but it’s clear that I have to do some more focused learning on my own.
  3. “Note that a ‘history’ is just a series of numbers through time” (119). I was so happy when I read this, as it immediately made me think about how my project is essentially using ‘a series of numbers’ (right now, open/close prices + volatility) to map historical events.
  4. Don’t be quick to construct narratives, as this can greatly distort perception and information (you try to fit new information to the narrative you’ve established, without ever thinking if this makes sense).
  5. Don’t be too stressed about shortcomings, because failure is only failure if you are failing on your own established objectives, not some nebulous criteria set by outside sources.

Trying to cover every interesting topic raised in The Black Swan would be senseless, as you can simply read the book yourself, and Taleb does a much better job explaining these issues than I do. The topics I mentioned above are the ones I find most relevant to my independent project on using Word2Vec to map similar patterns in market behavior, plus some which intrigued me most. I’m interested in hearing what other people who have read or know of this book think about its implications, and what important lessons I might be missing.
I’m looking forward to my next read, which is a bit more focused and specific than the previous two: Advances in Financial Machine Learning by Marcos Lopez de Prado. From what I’ve heard (and read), this is more of a textbook-type piece which looks at actual solutions to common problems in ML and trading algorithms. Apparently, “Readers become active users who can test the proposed solutions in their particular setting”, which would be great in my case, as I’m moving into more actual programming and implementation.

“We no longer believe in papal infallibility; we seem to believe in the infallibility of the Nobel, though.” (Taleb, 291)

How Word2Vec Accounts for Fundamental Stock Data, and Other Cool Things

How can anyone possibly use Word2Vec on a single parameter of financial data, such as opening and close prices of stocks, to make any sort of meaningful prediction? Are we simply going to ignore the heap of fundamental data behind every stock, such as P/E ratios, return on assets, etc…?
I’ve pondered these questions for a while, and I’ve finally found an answer: Word2Vec doesn’t know anything about the fundamental data behind words (grammar, syntax, or even definitions of individual words), yet it still does remarkably well at matching similar words, as we have seen.
Going off of this observation, is it safe to say that Word2Vec can match stock price shifts of similar significance without knowing anything about the nature of these stocks? My intuition is yes, because of how well Word2Vec does for mapping similar words/texts, but also because of how Word2Vec models are designed on a technical level. There are two variations of Word2Vec: continuous bag-of-words (CBOW) and skip-gram. Skip-gram is slower to train but tends to represent infrequent words (or, in my case, rare market events) better, which is why I’ll probably end up using skip-grams. However, both models work in similar ways: you read through a corpus of text, generate a vector for each unique word within the text, and then create new vector embeddings (what we’re actually looking for) based on the context these words appear in.
So, how are these initial vectors created? It’s quite simple, actually. For the first word in the corpus, you create a one-element array: [1]. If that word is ‘the’, then ‘the’ corresponds to the first element of the array. For the next unique word you encounter, you create a new array, [0, 1], and you must also append a zero to the first array, making it [1, 0]. If the second word is ‘mighty’, then the second element corresponds to ‘mighty’: whenever an array’s second element is 1, you know the array represents the word ‘mighty’.
You go on like this until you’ve accounted for every unique word. You will end up with an array whose size == the number of unique words, and each word is represented by a huge array full of zeroes with a single 1 at the location which corresponds to that word. For instance, if our data set had 20 unique words, the array for the word ‘mighty’ would be [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0].
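The procedure above (known as one-hot encoding) can be sketched in a few lines of Python. The corpus here is a made-up example sentence:

```python
def one_hot_vocab(words):
    """Assign each unique word a one-hot array, in order of first appearance."""
    vocab = []
    for w in words:
        if w not in vocab:
            vocab.append(w)
    n = len(vocab)
    # Each word's array has a single 1 at that word's position.
    return {w: [1 if i == vocab.index(w) else 0 for i in range(n)] for w in vocab}

encodings = one_hot_vocab("the mighty fox saw the mighty bear".split())
print(encodings["the"])     # [1, 0, 0, 0, 0]
print(encodings["mighty"])  # [0, 1, 0, 0, 0]
```

Note that repeated words (‘the’ and ‘mighty’ each appear twice) don’t grow the vocabulary; only unique words get a slot.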
One observation you can make right off the bat is that the more unique words you have in your dataset, the longer these individual arrays will be, and more computational energy (i.e: time, money) will be required. So, how do we reduce the size of these vectors?
Recall that Word2Vec neural networks have three layers, with one hidden layer. The first layer, the input, is made up of the one-hot arrays we generated above. The second layer, the hidden one, is where array compression happens. This compression works because the hidden layer is simply a matrix of weights (with dimensions: number of unique words × desired vector size, typically between 50 and 300). We multiply a word’s one-hot array by the learned weight matrix, and voilà, we have a vector (still an array) of between 50 and 300 dimensions (elements).
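One neat consequence is worth seeing with a toy example (the weights below are made up, and a real matrix would have 50–300 columns): multiplying a one-hot array by the weight matrix just picks out one row, so the “compression” is effectively a row lookup.

```python
def matvec(row_vec, matrix):
    """Multiply a row vector by a matrix (both as plain Python lists)."""
    cols = len(matrix[0])
    return [sum(row_vec[i] * matrix[i][j] for i in range(len(row_vec)))
            for j in range(cols)]

# Tiny hypothetical weight matrix: 4 unique words x 3 embedding dimensions.
weights = [
    [0.1, 0.2, 0.3],  # word 0
    [0.4, 0.5, 0.6],  # word 1 ('mighty')
    [0.7, 0.8, 0.9],  # word 2
    [1.0, 1.1, 1.2],  # word 3
]
one_hot_mighty = [0, 1, 0, 0]
embedding = matvec(one_hot_mighty, weights)
print(embedding)  # [0.4, 0.5, 0.6] -- just row 1 of the weight matrix
```

This is why real implementations never actually do the multiplication: they treat the hidden layer as a lookup table, fetching the row that corresponds to the input word.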
How these weights get adjusted is a story for another time, but in general, they work by calculating probabilities of target words appearing in a certain context:
You use nearby words to generate training samples, and you adjust the weights for a target word by seeing how often it appears, throughout the rest of the dataset, in proximity to those samples (for example, how likely ‘quick’ is to appear next to ‘fox’ would be used to adjust the weights for ‘quick’).
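Sketching the sampling step makes this clearer: for each word, a skip-gram model pairs it with its neighbors inside a small window. The sentence and window size below are arbitrary choices for illustration:

```python
def skipgram_pairs(tokens, window=2):
    """Generate the (target, context) training pairs a skip-gram model learns from."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

tokens = "the quick brown fox jumps".split()
for pair in skipgram_pairs(tokens, window=2):
    print(pair)  # e.g. ('quick', 'the'), ('quick', 'brown'), ('quick', 'fox'), ...
```

Each emitted pair nudges the target word’s weights toward predicting that context word, which is how co-occurrence statistics end up baked into the embeddings.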
By the end of training, we are not even interested in the output layer of the neural network, as all we need are the vectors created by multiplying the input layer by the weight matrix. These embedded* vectors can then be used to find words (or events in the stock market) of similar significance.
* Embedded is just another way of saying compressed, so embeddings are essentially compressed vectors.

What Does a Falling Tree Teach Us?

So why am I bringing technical definitions into this? To show that Word2Vec learning is entirely based on context. To me, the stock market is a contextual thing, where future activity is predicated on what has happened before (cause/effect relationships). Many would call me wrong to say this, but no one is ever completely right about the markets, so it is worth taking a look at (Bridgewater Associates should be on my side for this one). Therefore, Word2Vec-generated embeddings do not need to include any fundamental background data about stocks, because we don’t care about these fundamentals when looking at context. If a tree falls over in a forest and hits a nearby house, we don’t care about the exact sedimentary composition beneath the tree’s roots at n seconds before the tree fell; we care that we have learned a new thing: trees falling can lead to houses being destroyed, and we can use this knowledge to make predictions about what will happen when more trees fall in the future. Sure, the sedimentary composition might have caused the tree to fall, but we don’t know that for certain, and if it did, it is reflected in the fact that the tree fell, making it irrelevant on its own.
This reminds me of a point brought up by HFT trader Tom Conerly when I spoke with him, which is that high frequency traders don’t look at fundamentals of stocks, because they believe all fundamentals are proportionally reflected through market price. This saves space, time, and energy, as they can avoid processing mountains of other data which they would otherwise have to look at.
So, now that I’ve decided to ignore fundamental data when creating my neural network, the next big question I need to answer before actually uploading data is which data I want to train my net on. I’m still thinking of starting with basic open/close prices, and making further judgments based on how well this performs.

Areas of Interest, More Potential Datasets

Another interesting financial correlation I’ve read about, and which could potentially be learned through a Word2Vec-type AI, is the relationship between the Federal Funds Rate (the benchmark interest rate set by the Federal Reserve) and general stock market behavior. As interest rates go up, the stock market tightens and tends to go down.
This is super interesting and incredibly useful information. Charts of the two series suggest there is a delay period between a shift in the FFR and subsequent changes in the market. You could conceivably use Word2Vec, or some other neural network, to approximate how long this delay period is, and then use that information to make predictions about shifts in the market caused by changes in the FFR.
A second interesting correlation I’ve looked at is the relationship between U.S GDP growth and stock market performance. In essence, the idea is that when GDP is projected to grow, there will be a bull market with inflated prices. The opposite is true as well, with poor GDP outlook reflected through bear markets. So, the goal is to learn what indicators anticipate a growing GDP. There’s a really good post on Medium about this, which claims that the best indicator for GDP performance is ISM’s monthly index score. Again, you could train a Word2Vec model to predict GDP growth, and use that information to make stock price projections.
In addition to doing research on advances in A.I for finance, I am going to spend a lot of time researching known correlation relationships in the market, and seeing if these can be applied as datasets for my neural network. Speaking of personal research, I just finished The Black Swan by Nassim Taleb, and will be writing a reflection post soon which should capture the main points I learned from Taleb.

Most of the visuals and definitions I used for explaining Word2Vec skip-grams come from Chris McCormick’s tutorial, which I highly suggest reading.

Deep Learning the Stock Market, from the Perspective of an HFT Developer

A couple of days ago, I spoke with Tom Conerly, an alum of my current high school and current developer at secretive trading research company Jump Trading. As part of my research project in Word2Vec for market predictions, I want to speak with people with actual working experience in the fields of quantitative trading, developing trading algorithms, or general asset management. I am still looking for more people to talk to, so if you have any suggestions or references, please let me know! 
Anyways, back to the interview. Since trading research companies work on the premise of having slight strategic advantages over their competitors, I didn’t expect to get any concrete findings about Jump Trading’s algorithms. Rather, I wanted to get a general idea of how Jump Trading works, and whether or not they have faith in using deep learning for making predictions. Here are the questions and answers from my interview, with answers paraphrased since I wasn’t allowed to record the conversation:
S.A: What is life truly like as a newcomer in a trading firm? Is it actually like the horror stories people write about?
T.C:  Jump Trading is divided into separate, small trading teams which all work on unique trading strategies. Hence, the hours aren’t crazy, and it comes off more as a research job, with not that much variance. Investment banking is definitely different, as it is more of a high-stress environment involving watching real-time stock prices and trades go through. This is where one could see more hazing, working people hard, but not so much at a research company like Jump Trading. 
S.A: Working at a company focused on algorithmic trading strategies, is there a stronger emphasis on mathematical modeling, such as statistical probabilities, or on deep learning strategies when creating your algorithms?
T.C: Again, Jump Trading is not a bunch of traders watching in real time, or using software to make trades. There is less ‘human’ involvement in the trading [we] do, as the people work on developing and testing the most efficient, profitable HFT strategies. In this development, there is definitely a strong emphasis on mathematical modeling; AI is used, but only smaller models such as linear regression. With deep learning, it ends up being a continuous trade-off between how powerful the model is and the negative impacts of stock market noise. This is why deep learning is uncommon in HFT.
S.A: What’s your opinion on using deep learning to recognize patterns in the behavior of markets? Do you think this could be a profitable strategy? Some say the behavior of the markets don’t follow easily-recognizable patterns, others say they work in identical cycles. 
T.C: [My] intuition is this is not the right direction to go, again because of the fact that the stock market tends to be incredibly noisy. If you are trying to trade high frequency, which means holding assets often for only less than a minute, deep learning won’t be particularly helpful. However, Deep Learning could be a viable hedge-fund strategy, as hedge-funds tend to work on a more macro scale and hold assets for much longer periods of time. There definitely could be a quant trading company doing this right now. 
S.A: What’s one factor you guys consider most heavily when gauging the value of an asset (say: stock, bond, future, currency)?
T.C: When [we] look at an asset, [we] don’t take into account revenue, profits, or general fundamental analysis, as [we] believe all of these factors are reflected by the prices of the assets. The analysis done is more directed towards the real-time state of the market, and one primary way of doing this is by analyzing book pressure (how number of buyers at a given point in time compares to the number of sellers, and vice versa). But, this is not to say this is the only way to gauge the value of assets, as there are many different ways to focus on trading. In [our] case though, the focus is on smaller profits but in larger quantities. 
S.A: What’s the most important factor to consider when creating a trading algorithm?
T.C: It really depends what data you’re building your trading algorithm on. For [us], like I said, book pressure and recent vs. expected prices are important, while not paying too much attention to what comes out of company earnings reports.
S.A: How much do market cycles influence your firm’s trading strategies?
T.C: They basically don’t. There are some trading conditions which might affect you, but it really is more of a small-scale approach for [us]. Market cycles, macro trends are more hedge-fund oriented, where investors go long/short for bigger bets on longer time frames. Whereas, in trading over a shorter time period, you have more valuable data to train on. For example, if you are trying to use data to make long-term predictions, you don’t really have that much online data spanning pre-1990s, which limits the scope and ability you have to train your neural net. 
But, on the other hand, hedge-funds can take all kinds of approaches toward making investments, giving them more varied sources of data. 


What can we take away from this? 

Though high frequency trading differs in many ways from using deep learning in financial pattern recognition, it is interesting (and definitely valuable) to hear from these different perspectives. This interview aligns well with my previous post, where I talk about the potential shortcomings of deep learning for trading algorithms. Mr. Conerly seems to agree with the idea that it isn’t profitable to replace human trading experts with powerful deep learning tools, such as Word2Vec. But, on the other side, Mr. Conerly did acknowledge that some secretive hedge/quant funds may very well be using these types of AI tools for their projections. He adds that HFT uses a lot more data than macro-analysis, since they are analyzing prices on a second-to-second basis (more like milliseconds). This makes me think of the pros/cons of testing my Word2Vec net on macro-scale data vs. micro-scale data. Micro-scale analysis, according to Mr. Conerly, would give you more data to work with (and a more accurate neural net as a result), but the predictions would be less significant and meaningful since they’re tailored to a smaller scale. Macro-scale analysis would yield bigger, more significant predictions (larger margin for error, bigger margin for profit) but would presumably have less data available for training. 
Another thing I found really interesting and applicable to my work is Mr. Conerly’s point on how they [developers at Jump Trading] don’t take fundamental data, or time, into account when investing. His point was that fundamental data doesn’t need to be analyzed because this data is reflected accurately through the stock’s price (efficient market theory), and that specific time intervals are irrelevant when predicting if a stock will go up or down. Disregarding fundamental data (earnings, P/E ratios, company debt, ROE, etc…) and time intervals in a training set would save enormous amounts of computational power, money, and time, so this would be ideal. But, this brings along another question, which is how efficient is the efficient market theory? 
These questions that are starting to emerge are more subjective and uncertain than the ones I asked in my last post, so again, I am not anticipating clear answers. Once again, it is seeming as if these are questions I’ll have to answer on my own, through my research. 
Looking forward, it is now clear that I need to find some actual data sets and start training, predicting (back testing for now), and taking notes of the results. I will need to develop a more explicit research plan, which I will post sometime this week. 

The Shortcomings of Neural Networks for Trading Predictions

As someone who is devoting a large portion of their senior year (and very likely time beyond that) to researching potential applications of deep learning in trading, I wasn’t thrilled to learn about the recent shortcomings of quantitative traders. Let’s begin with Marcos López de Prado, a frequently cited algorithmic trader who recently published Advances in Financial Machine Learning. One thing that López de Prado talks about is the idea of ‘red-herring patterns’ that are extrapolated by machine learning algorithms. These types of algorithms are, by design, created to analyze large bodies of data and identify patterns within this data. In fact, this idea of noticing patterns is one of the main assumptions I am basing my work on (using Word2Vec embeddings to identify past financial patterns and apply them to real-time data for more accurate predictions). But what happens when these algorithms identify patterns that aren’t real? An aggressive neural network (in my case: one which adjusts vector weights heavily while learning from data) is prone to make these types of mistakes. Think of this example: a stock happens to go up a couple of percentage points every Thursday for three weeks in a row. A (poorly written) neural network would deduce that every Thursday in the future, this stock will go up by at least a percentage point or two. Now, this is easily avoidable by training a trading algorithm on larger sets of data, but even large data sets are prone to these types of red herrings. Once a trading algorithm clings to a pattern, it could backfire horribly when that pattern eventually breaks.
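The Thursday example can be made concrete with a toy sketch (all returns below are invented): a rule that looks perfect on three weeks of data falls apart as soon as more data arrives.

```python
# Toy illustration of a 'red-herring pattern': a rule that is perfect
# in a small sample breaks once the sample grows. Returns are made up.

def thursday_rule_accuracy(returns_by_day):
    """Fraction of Thursdays on which the return was positive."""
    thursdays = [r for day, r in returns_by_day if day == "Thu"]
    return sum(1 for r in thursdays if r > 0) / len(thursdays)

# Three weeks of data: Thursdays happen to be up every time.
small_sample = [("Thu", 0.02), ("Thu", 0.01), ("Thu", 0.015)]
print(thursday_rule_accuracy(small_sample))  # 1.0 -- a "perfect" pattern

# Ten more weeks: the pattern evaporates.
more_data = small_sample + [("Thu", -0.01)] * 5 + [("Thu", 0.005)] * 5
print(thursday_rule_accuracy(more_data))  # ~0.62, barely better than a coin flip
```

An algorithm that had already sized up a position based on the first three weeks would be trading on noise, which is exactly the failure mode López de Prado warns about.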
This brings the idea of Black Swans into light. The theory of Black Swans was popularized by Nassim Taleb in his accurately-titled book The Black Swan: The Impact of the Highly Improbable. The general gist of this theory is that the most profoundly impactful events are oftentimes the ones we least expect, due to our fallacious tendencies in analyzing statistics (I will go into more detail on these topics in a future blog post, once I am done reading the whole book). Taleb argues that one of our biggest shortcomings in analyzing data is creating ‘false narratives’, which are more convenient and easier to sell to clients. These false narratives oftentimes omit crucial data (silent data), which backfires once the narrative breaks.
But, on the other end, a more passive neural network (one which more slightly adjusts vector weights) can sometimes come to no meaningful conclusions, which means wasted time and computational energy. I want to create a Word2Vec model which can detect patterns, but I also don’t want it to actively follow patterns with no longevity.
So, what does one do? How aggressive/passive should I make my Word2Vec neural network? 

Another theory which I encountered over the weekend is the idea of survivorship bias. In training neural networks, how do we treat data from companies which have failed? If we are analyzing the stock price data for various important stocks over time, what do we do with data from once-important stocks which are now defunct, such as Lehman Brothers? I initially thought it would be best to throw this data out, since it is no longer applicable, but it turns out this strategy can have negative consequences. If we only train our network on stocks which have survived, then we will miss out on crucial data about when stocks go bankrupt. So, how do we properly treat this type of data?


All of these seemingly insignificant flaws in trading algorithms can evoke catastrophic mistakes. This concept is summed up by quantitative investment officer Nigol Koulajian: “You can have one little pindrop that can basically make you lose over 20 years of returns.” This ‘little pindrop’ which Koulajian mentions is the eventual divergence from the false patterns identified by neural networks. I personally think it would take more than a little pindrop to erase 20 years of returns, but the idea still stands. So, this warrants the question: how do we avoid the little pindrop? My (far-fetched?) theory is that you can use neural networks to estimate worst-case scenarios in the same way they are designed to estimate best-case scenarios, and then work to avoid them.
In broader terms, Bloomberg reports that the Eurekahedge Hedge Fund Index, which tracks the returns of hedge funds known for using machine learning, has underperformed yearly compared to the S&P 500. The harsh truth (right now) is that simply investing in the S&P 500 will return ~13% yearly, while machine-learning based hedge funds return ~9% yearly.
Eureka Hedge Fund Index
(The keen observer will notice that despite all the noise, the index has been steadily going up over the past 7 years)
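That four-point gap sounds modest, but it compounds dramatically. A quick sketch (the $10,000 stake and 10-year horizon are my own hypothetical numbers, not from the Bloomberg piece):

```python
# Compounding the ~13% (S&P 500) vs ~9% (ML hedge fund) yearly returns
# over a hypothetical 10-year, $10,000 investment.

def grow(principal, rate, years):
    """Future value under annual compounding."""
    return principal * (1 + rate) ** years

sp500 = grow(10_000, 0.13, 10)
ml_funds = grow(10_000, 0.09, 10)

print(f"S&P 500: ${sp500:,.0f}, ML hedge funds: ${ml_funds:,.0f}")
```

Roughly $34,000 versus $24,000 after a decade, which is why the underperformance matters even though both curves go up.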
These are some of the questions I ask those few who read what I am writing, and they are the types of questions I will ask through my personal research interviews (Good news! I have my first interview scheduled this upcoming Tuesday, and, interviewee permitting, I will post a summary of our talk later in the week).
In my personal opinion, the recent underperformance of trading algorithms in general is not a bad sign. This is still a relatively new field, meaning that more research needs to be done and new discoveries need to be made. I think of it this way: If trading algorithms are working perfectly, then what’s the point of a newcomer (like me) coming in and doing research on them? If it ain’t broke, don’t fix it.

Significance of Gold’s Golden Week + Brainstorming Investment Strategies

Categorizing my Work

This past week, I have been delving deeper into my research on the topics of word embeddings and trading algorithms. My research revealed many new things to me (some negative, some positive, but I consider all lessons learned to be positive), which I will describe in greater detail later on. The first point I wanted to share is how I plan on structuring my work, as I currently have (what I believe to be) a sufficient amount of background information to get meaningful work done on the coding front.
As mentioned in previous posts, there are two primary components to my research work:

  1. Word2Vec for creating embeddings
  2. Trading Algorithm strategies

I will be working on these two components in parallel throughout the year. This is preferable to working on single topics for large blocks of time, as alternating between Word2Vec and trading algorithms helps with finding useful correlations. For instance, last week I read a super interesting article which pointed out a recurring trend happening during China’s Golden Week festival.

Interesting Applications for Neural Networks

Every year, China celebrates Golden Week, a week-long national holiday beginning in early October. During this time, the Chinese stock exchanges close, as people travel home for festivities. Interestingly enough, writers at Zerohedge noticed that every year, during this week, prices of precious metals fall considerably, only to increase sharply as soon as the Chinese markets reopen.
Here is a visualization, with all pictures taken from here:

To test this theory, I went and looked at recent gold prices. China’s Golden Week of 2018 began October 2nd and ran through October 9th.
Golden Week 2018
Not surprisingly, gold prices fell about $22 an ounce during this week, but have jumped $40 an ounce since then.
The reason I mention this example is that it would be a perfect place to implement neural networks. This is a clearly recurring pattern which occurs during the same time period (China’s Golden Week happens on the same dates every year, but what about holidays/events that change dates year-to-year?), and a recurrent neural network should be able to figure this out on its own. I’m thinking of using this data as an entry point into incorporating AI into my trading algorithm, as it is a clearly defined task: train the network on this data (+ other unrelated gold-price data to make the set more varied), and ask it to predict what will happen to gold prices starting October 1, 2019.
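Whatever recurrent architecture ends up being used, the daily gold series first has to be cut into fixed-length input windows with a next-day target. A minimal sketch of that preprocessing step (the prices here are made up; a real run would use actual gold closes):

```python
# Slice a daily price series into (window, next-value) training pairs,
# the standard input format for a sequence model.

def make_windows(series, window=3):
    """Return (inputs, targets): each input is `window` consecutive
    values; the target is the value that immediately follows."""
    inputs, targets = [], []
    for i in range(len(series) - window):
        inputs.append(series[i:i + window])
        targets.append(series[i + window])
    return inputs, targets

gold = [1190, 1192, 1188, 1185, 1184, 1201, 1210]  # hypothetical closes
X, y = make_windows(gold, window=3)
print(X[0], "->", y[0])   # [1190, 1192, 1188] -> 1185
```

The window length itself becomes another hyperparameter: too short and the network cannot see a week-long pattern like Golden Week; too long and training data gets scarce.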

Questions to Consider

This may sound simple, but there are a lot of factors which come into play. For instance, how would we incorporate times (dates) into the neural network? What’s the purpose of having a neural network recognize a financial pattern without knowing when it would appear? What exact data would we use for training the network? Gold prices & dates are necessary, but how would we convert these into inputs for an RNN? Would it make sense to use Word2Vec for this?
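On the date question specifically, one trick I have seen (an assumption on my part, not something from the article) is cyclical encoding: map day-of-year onto a circle with sin/cos so that December 31 and January 1 end up numerically adjacent, unlike raw day numbers.

```python
import math

# Encode day-of-year as a point on the unit circle so the calendar
# "wraps around" instead of jumping from 365 back to 1.

def encode_day(day_of_year, year_length=365):
    angle = 2 * math.pi * day_of_year / year_length
    return math.sin(angle), math.cos(angle)

jan1 = encode_day(1)
dec31 = encode_day(365)
jul1 = encode_day(182)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Adjacent calendar days are close in this encoding; mid-year is far away.
print(dist(jan1, dec31), dist(jan1, jul1))
```

With this representation, a network can learn “early October” as a region of the input space rather than an arbitrary integer, which is exactly what a yearly pattern like Golden Week needs.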
These are all questions that I need to start experimenting with and trying to answer on my own. I’ve tried researching the answers online, but there is little to no consensus, as people genuinely don’t know what the best approach is (and those who DO know definitely won’t share with the rest of the world). As I’ve come to realize through my research on trading algorithms in general, there are infinitely many different approaches you can take, and just as many trading strategies.
Some other interesting trading strategies that I’ve researched are:
1. Mean-Reversion
A rather basic investing strategy which assumes that the worst-performing stocks one week will perform best the next week, and vice-versa.
I am currently writing a sample algorithm for this strategy in Python.
2. Sentiment analysis 
The theory that one can use investor sentiment (calculated by applying NLP neural networks to Twitter/Stocktwits data) to make accurate predictions about which stocks will go up or down (positive sentiment -> BUY, negative sentiment -> SELL).
The Word2Vec model I’m currently writing will hopefully be able to do some basic sentiment analysis.
3. Selling/buying based on large-cap hedge fund involvement
Apparently, a super effective strategy in recent years has been to buy or sell stocks based on how under- or overweight they are. An underweight stock is a stock with a large short positioning by large-scale hedge funds relative to the stock’s ‘weight’ (influence) in the S&P 500. An overweight stock is just the opposite. It appears that buying underweight stocks while shorting overweight stocks is a strong investing strategy, one which has consistently returned positive results since the 2008 crash. One doesn’t really need a neural network to do this, but I thought it was worth noting.
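As a concrete example, strategy 1 above (mean reversion) can be sketched in a few lines. The tickers and weekly returns here are made up, and a real version would need position sizing and transaction costs:

```python
# Minimal mean-reversion sketch: rank last week's returns, go long the
# worst performers and short the best, betting both revert.

last_week = {
    "AAA":  0.08,   # big winner -> candidate short
    "BBB":  0.03,
    "CCC": -0.01,
    "DDD": -0.06,   # big loser -> candidate long
}

def mean_reversion_book(returns, n=1):
    """Long the n worst performers, short the n best."""
    ranked = sorted(returns, key=returns.get)     # worst return first
    return {"long": ranked[:n], "short": ranked[-n:]}

book = mean_reversion_book(last_week)
print(book)   # long the loser DDD, short the winner AAA
```

Even this toy version makes the strategy’s core assumption explicit: that last week’s ranking carries negative predictive signal, which is exactly the kind of claim a backtest has to verify.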
But there are many more investing strategies to consider. The main question is: which ones would be most applicable to the Word2Vec embeddings I want to implement?

Reflecting back on the week

I think my strategy of feeding Word2Vec-created embeddings into a trading algorithm has potential, but only I can figure out whether this is true. This type of ambiguous work is hard for me to grasp right now, as none of my questions have real answers; one must figure them out for oneself (an every-man-for-himself type situation). But I don’t want to get bogged down in the research, so I will try to choose one definite strategy (fundamental dataset) this upcoming week so I can get to tangible work.
There is always more learning to be done, and I will continue to research online, through books (I’m about halfway through The Black Swan by Nassim Taleb; I will write a reflection post once I’m done, as there are some super interesting parallels between my current work and what Mr. Taleb writes about) and through interviews.


A Teenager’s View Of Smartphones

Recently, rumors have emerged mentioning possible specs and release dates for the new iPhone, likely to be named “The iPhone 6s.” This news comes a few months after the Samsung Galaxy 6 and 6 Edge were officially put out for sale. Amid this sea of “new” mobile phones deemed smart were models from Microsoft, Nokia, Amazon, and Oppo. Today, I am not writing to give my opinion of my latest phone or provide you with likely false assumptions on what the new iPhone will look like, but to present a teenager’s honest view of smartphones, since large companies always seem to be interested in discovering what teens think of their products.
To begin, I would like to explain why I put the word “new” in quotations in my third sentence. I do this because it annoys me very much when a company like Apple comes along with a shiny phone with a screen that is slightly larger than its predecessor’s and says that it is “A new, innovative piece of technological art that will change the world with its 8-megapixel camera.” We understand that it was constructed this year, but that doesn’t mean you can tell everyone that it is completely new when the most significant change took place when you increased the price. If you want to receive our respect and consequently receive our money, you should consider investing some money in actual research and development. It’s not like you don’t have the funds to do so, anyway…
The next point I would like to address is the fact that we teenagers are gradually beginning to decrease the time we spend using our smartphones. In fact, there are only around five uses for these overpriced touchscreens that you can talk to. These include:

  • Checking our Instagram or Twitter feeds (No, none of us use Facebook).
  • Using Google to find the definition of words we don’t understand and to locate the nearest Chipotle.
  • Showing our friends YouTube videos we think are funny.
  • Documenting all of our actions on Snapchat.
  • Downloading apps that seem fun until you actually open them.

Sorry to break it to you Google, but we don’t use Google+ or Gmail anymore.
On behalf of all teenagers, I am going to send a message to phone makers:
The smartphone craze is nearing an end. We are not interested in buying any more phones or tablets, no matter how large their screens are or how quickly their processors operate. All we want is a practical device that allows us to take pictures, play games, and text. The only time buying a new phone seems reasonable anymore is when the screen on our old one breaks or when it stops working properly. Based on the attitudes and thoughts of my peers, which closely resemble mine, I am going to make the assumption that iPhone, Galaxy, and all other smartphone sales are going to begin to decline soon. One day, the masses are going to realize that a phone with a touchscreen and a decent camera shouldn’t cost over five hundred dollars.
I strongly advise tech companies and phone makers alike to start exploring and manufacturing new technology (and no, Apple, by new I don’t mean iOS 9 or the iPhone 7) that is practical, interesting, and simple. It’s really that easy. The time is ripe for a completely new product that meaningfully improves on what we currently have.