Financial Implications of “The Black Swan” by Nassim Nicholas Taleb

The Black Swan by Nassim Taleb is one of the best, and one of the most interesting, books I’ve read in a while. I’ve heard people reference Taleb’s work a few times before, mainly in financial settings, but I never quite understood why, which is what inspired me to read this. In general, Taleb argues that throughout human history, nearly all majorly significant events were considered highly improbable before they occurred (think: the fall of the Roman Empire, the rise of Nazi Germany, the Bubonic Plague pandemic, the sinking of the Titanic). Essentially, the events which have the greatest impact on us lie totally outside of our field of prediction, no matter how much past data, observation, and intuition we use in making these predictions. These momentous, unpredictable events are called Black Swans, known professionally as fat tails, because they are perceived as rare (Taleb actually argues that these events are much more common than we’d like to think).
Taleb helps us visualize the phenomenon of Black Swans through two fictional worlds: Mediocristan and Extremistan. Mediocristan is a province where “particular events don’t contribute much individually — only collectively” (Taleb 32), and things are distributed rather evenly. Extremistan, on the other hand, is a world of extremes, where “inequalities are such that one single observation can disproportionately impact the aggregate, or the total” (33). Taleb claims that we live in Extremistan, as much as we’d like to believe it to be Mediocristan, since single events can have disproportionately large impacts on our societies. Evidence of us living in Extremistan is seen through other metrics as well, most notably through wealth divides (top 1% vs 99%).
This theory of Black Swans being more significant than regular accumulations of events holds true in all fields in Extremistan, especially (and this is most relevant for us) in finance. One fact Taleb mentions really dumbfounded me: “In the last fifty years, the ten most extreme days in the financial markets represent half the returns” (275). Ten days in fifty years. The alternative, which is not addressed directly by Taleb, is likely also true: in the last fifty years, the ten most extremely negative days in the financial markets represent half the losses. This is so incredible, and at first it made me wonder why people (like me) try so hard to predict what will happen to a stock’s price day-to-day, when they should really be trying to predict the next Black Swan. I then realized that if a Black Swan can be predicted, it is no longer a Black Swan, and the opportunities for profit are no longer as large, since more people are expecting it to happen. Also, once a Black Swan is predicted, a new Black Swan emerges outside of this prediction, which becomes the next true Black Swan.
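Taleb’s statistic is easy to sanity-check with toy numbers. The sketch below uses entirely made-up daily returns (not real market data): a year of quiet days plus a few extreme ones. Removing just the three biggest up-days cuts the cumulative return by more than half:

```python
# Toy illustration (made-up numbers) of how a handful of extreme days
# can dominate long-run returns, in the spirit of Taleb's statistic.
daily_returns = [0.001] * 250 + [0.10, 0.08, -0.07, 0.12]  # mostly quiet days plus a few extremes

def cumulative_return(returns):
    """Compound a list of daily returns into a total return."""
    total = 1.0
    for r in returns:
        total *= 1 + r
    return total - 1

all_days = cumulative_return(daily_returns)
# Same series with the three biggest up-days (>= 8%) stripped out
without_extremes = cumulative_return([r for r in daily_returns if r < 0.08])

print(all_days, without_extremes)
```

With these numbers, three days out of 254 account for well over half of the total return, which is exactly the kind of concentration Taleb is pointing at.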
This is also a very sobering thought, considering the work I’m currently doing in predictive trading algorithms. I’m sure Taleb would laugh at the idea of me using historical financial data to predict what will happen to a stock tomorrow, next week, or next month. In his book, Taleb talks a lot about the misleading nature of data, using the example of “1001 days of history”, noting “You observe a hypothetical variable for one thousand days. You subsequently derive solely from this past data a few conclusions concerning the properties of the pattern with projections for the thousand, even five thousand, days. On the thousand and first day — boom! A big change takes place that is completely unprepared for by the past.” I like to picture this concept as follows: I know I have been alive every day for the past 40 years, so in conclusion, I can use this data to determine I will be alive tomorrow, next month, and the next 40 years after that! The irony here is that the one event which lies outside of my predictions would be the most significant event in my life. Yet, I am not deterred. I do think that patterns exist in market behavior, and because these patterns repeat cyclically, they can be used to make predictions. The mere existence of Black Swans doesn’t negate the existence of patterns; it’s just possible that one day these patterns will break. I think hedging using options is enough to counter potential negative effects, I just need to learn more about how to do this efficiently.
Another key point brought up is the effect of silent evidence, which relates to history being written by the winners and not the losers. “It is so easy to avoid looking at the cemetery while concocting historical theories” (101) says Taleb, which reminds me of the problem of survivorship bias in quantitative trading. Survivorship bias occurs when we train models on existing companies and their stocks, while forgetting the companies which no longer exist (bankruptcies, acquisitions). Oftentimes, this omitted data (the silent data) is most important, as we can learn what events lead up to a bankruptcy or sudden stock collapse. This relates to the idea of blind risk usually leading to better short term rewards, whilst completely backfiring later on down the road: “The fools, the Casanovas, and the blind risk takers are often the ones who win in the short term” (117).
A third interesting point which really made me think was Taleb’s take on information, and more importantly, misinformation. He writes: “The more information you give someone, the more hypotheses they will form along the way, and the worse off they will be. They see more random noise and mistake it for information” (144). This really hit close to home, as someone who is naturally paranoid and always thinking that the solution to my problems lies in all the many books, studies, and reports I haven’t read. It also makes me think about not mentally constructing narratives (which, more often than not, become red herrings) based on the information I have.
Finally, despite being so apprehensive about the idea of predicting the future, Taleb also claims that “In the end we are being driven by history, all the while thinking that we are the ones doing the driving.” There are many ways to interpret this great quote, but I see it as history being contextual. In other words, we don’t control history; rather, events happen and build off of one another, whether this be in random fashion or through similarly repeating patterns (the latter being my theory, though this might just be me mentally constructing a narrative based on the information I have).
If Mr. Taleb ran a hedge fund (which he previously did, under the name Empirica Capital), I’m guessing his strategy would be to benefit from short-term, consistent profits while always hedging against potentially huge losses. This way, he would be able to see gradual returns safeguarded from catastrophic events. Likewise, he could make his bigger bets on positive Black Swans, while pursuing a small-scale trading strategy in the time between these rare occurrences. This is where the options trading, which my previous interviewee brought up, comes into play. A strategy founded on single, greatly dispersed events which are not even guaranteed to happen may sound like an overly passive, boring, and minimally profitable way to run a hedge fund. However, in recent years, many asset management companies have gone bust by neglecting Taleb’s advice and discrediting the power of Black Swans. One instance which comes to mind is the number of hedge funds which invested in cryptocurrencies like Bitcoin, not expecting the price to drop from ~$20,000 all the way down to sub-$4,000.
Another, more salient example is a Florida-based hedge fund which recently went bust after unexpected volatility caused its positions to collapse completely. The hedge fund’s manager, who I don’t need to name because he is receiving enough bad publicity already, founded his fund on the premise of selling ‘naked’ options (as opposed to covered options): essentially selling puts or calls without hedging potential losses, as a method to save money. The problem with this strategy is that if you sell a call on a stock at a strike price of $50, you are hoping that the stock will remain at or below $50 up to a certain date, so you can pocket the premium. However, if the stock rises above $50, the call seller loses money, since he has to deliver the stock at $50 while buying it at the higher market price. In a very unlikely, and very unfortunate, scenario, the price of the stock could grow an incredible amount (say it goes up to $500), and the resulting losses would be even more incredible, since a naked call’s potential losses are unlimited. This is what happened to this fund, which lost all of its clients’ investments (+ more) because of an unpredictably volatile period for crude oil prices.
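The asymmetry here is easy to see with a quick sketch. The numbers below (strike $50, $2 premium) are my own illustrative picks, not from any real trade:

```python
# Per-share P&L for the SELLER of a naked (uncovered) call.
# Illustrative numbers: strike $50, premium collected $2.
def naked_call_pnl(price_at_expiry, strike=50.0, premium=2.0):
    """Seller keeps the premium; above the strike, losses grow without bound."""
    intrinsic = max(price_at_expiry - strike, 0.0)
    return premium - intrinsic

print(naked_call_pnl(45))   # stock stays below the strike: keep the $2 premium
print(naked_call_pnl(55))   # $5 in the money: lose $3 net
print(naked_call_pnl(500))  # the Black Swan scenario: a $448 loss per share
```

The upside is capped at the premium, while the downside is unbounded, which is exactly the trade structure Taleb warns against.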

To Conclude…

I’ve always found that finishing a quality book is an overwhelming experience, due to the surplus of information you’ve quickly gained (not to mention trying to remember all of it!) and also because it leaves you feeling a bit empty, as if you’ve lost something. Right now, I think of what Taleb’s work can teach me about my project. Some lessons I’ve learned from reading this book are:

  1. I need to think about what a Black Swan would look like in my context, and how I could protect against this.
  2. I need to familiarize myself with statistical terminology. This problem was apparent to me before reading the book, but it’s now been solidified. I have never taken a formal statistics class (which is fine considering Taleb’s theory on harmful over-saturation of information), but it’s clear that I have to do some more focused learning on my own.
  3. “Note that a ‘history’ is just a series of numbers through time” (119). I was so happy when I read this, as it immediately made me think about how my project is essentially using ‘a series of numbers’ (right now, open/close prices + volatility) to map historical events.
  4. Don’t be quick to construct narratives, as this can greatly distort perception and information (you try to fit new information to the narrative you’ve established, without ever thinking if this makes sense).
  5. Don’t be too stressed about shortcomings, because failure is only failure if you are failing on your own established objectives, not some nebulous criteria set by outside sources.

Trying to cover every interesting topic raised in The Black Swan would be a senseless attempt, as you can simply read the book yourself, considering that Taleb does a much better job explaining these issues than I do. The topics I mentioned above are the ones I find most relevant to my independent project on using Word2Vec to map similar patterns in market behavior, and also some which intrigued me most. I’m interested in hearing what other people who have read/know of this book think about its implications, and what important lessons I might be missing.
I’m looking forward to my next read, which is a bit more focused and specific than the previous two: Advances in Financial Machine Learning by Marcos Lopez de Prado. From what I’ve heard (and read), this is more of a textbook-type piece which looks at actual solutions to common problems in ML and trading algorithms. Apparently, “Readers become active users who can test the proposed solutions in their particular setting”, which would be great in my case, as I’m moving into more actual programming and implementation.

“We no longer believe in papal infallibility; we seem to believe in the infallibility of the Nobel, though.” (Taleb, 291)

How Word2Vec Accounts for Fundamental Stock Data, and Other Cool Things

How can anyone possibly use Word2Vec on a single parameter of financial data, such as opening and close prices of stocks, to make any sort of meaningful prediction? Are we simply going to ignore the heap of fundamental data behind every stock, such as P/E ratios, return on assets, etc…?
I’ve pondered these questions for a while, and I’ve finally found an answer: Word2Vec doesn’t know anything about the fundamental data behind words (grammar, syntax, or even definitions of individual words), yet it still does remarkably well at matching similar words, as we have seen.
Going off of this observation, is it safe to say that Word2Vec can match stock price shifts of similar significance without knowing anything about the nature of these stocks? My intuition is yes, because of how well Word2Vec does for mapping similar words/texts, but also because of how Word2Vec models are designed on a technical level. There are two variations of Word2Vec, continuous bag of words (CBOW) and skip-gram models, with skip-gram models being better-suited for larger datasets (meaning I’ll probably have to use skip-grams). However, both models work in similar ways: You read through a corpus of text, generate vectors for each unique word within the text, and then create new vector embeddings (what we’re actually looking for) based on the context these words appear in.
So, how are these initial vectors created? It’s quite simple, actually. For the first word in the corpus, you start by creating a one-element array, which looks like: [1]. If that word is ‘the’, then ‘the’ corresponds to the first element in the array. For the next unique word you encounter, you create a new array, [0, 1], and you must also append a zero to the end of the first array, making it [1, 0]. If the second word is ‘mighty’, then the second element in the array corresponds to ‘mighty’. Whenever an array’s second element is 1, you know the array corresponds to the word ‘mighty’.
You go on like this until you’ve accounted for every unique word. You will end up with an array whose size == the number of unique words, and each word is represented by a huge array full of zeroes with a single 1 at the location which corresponds to that word. For instance, if our data set had 20 unique words, the array for the word ‘mighty’ would be [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0].
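The scheme above (one-hot encoding) can be sketched in a few lines. The helper name and the example sentence are my own, just for illustration:

```python
# Minimal sketch of the one-hot scheme described above: each unique word
# gets a vector of zeros with a single 1 at that word's index.
def one_hot_encode(corpus):
    """Build a one-hot vector for every unique word, in order of first appearance."""
    vocab = []
    for word in corpus:
        if word not in vocab:
            vocab.append(word)
    size = len(vocab)
    return {word: [1 if i == idx else 0 for i in range(size)]
            for idx, word in enumerate(vocab)}

vectors = one_hot_encode("the mighty fox jumps over the lazy dog".split())
print(vectors["mighty"])  # [0, 1, 0, 0, 0, 0, 0] — 7 unique words, 'mighty' is index 1
```

Notice how the vector length equals the vocabulary size, which is exactly why these arrays blow up for large corpora.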
One observation you can make right off the bat is that the more unique words you have in your dataset, the longer these individual arrays will be, and more computational energy (i.e: time, money) will be required. So, how do we reduce the size of these vectors?
Recall that Word2Vec neural networks have three layers, with one hidden layer. The first layer, the input, is made up of the arrays we generated above. The second layer, the hidden layer, is where array compression happens. This compression works because the hidden layer is simply a matrix of weights (a matrix of dimensions: number of unique words x desired vector size, typically between 50 and 300). We multiply a word’s one-hot array by the learned weight matrix, and voila, we have a vector (still an array) of between 50 and 300 dimensions (elements).
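A neat consequence is that multiplying a one-hot array by the weight matrix is really just a row lookup. Here is a minimal sketch, with a made-up 3-word vocabulary and 2-dimensional weights:

```python
# One-hot x weight-matrix multiplication is just a row lookup:
# the single 1 in the one-hot vector picks out that word's row of weights.
def embed(one_hot, weights):
    """Multiply a one-hot row vector by a (vocab_size x dim) weight matrix."""
    dim = len(weights[0])
    return [sum(one_hot[i] * weights[i][j] for i in range(len(one_hot)))
            for j in range(dim)]

# Tiny made-up weight matrix: 3 words, 2-dimensional embeddings
weights = [[0.1, 0.9],
           [0.4, 0.3],
           [0.8, 0.2]]
print(embed([0, 1, 0], weights))  # [0.4, 0.3] — exactly row 1 of the matrix
```

Real implementations skip the multiplication entirely and just index into the matrix, which is part of why Word2Vec trains so fast.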
How these weights get adjusted is a story for another time, but in general, they work by calculating probabilities of target words appearing in a certain context:
You use nearby words to generate training samples, and you adjust weights for the target word (blue) by seeing how often it appears, throughout the rest of the dataset, in proximity to those samples (how likely ‘quick’ is to appear next to ‘fox’ would be used to adjust weights for ‘quick’).
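That sampling step can be sketched like this (a toy helper of my own, not from any library), pairing each target word with every neighbor inside a fixed window:

```python
# Generate skip-gram training pairs (target, context) with a window of 2.
def skipgram_pairs(tokens, window=2):
    """For each target word, pair it with every word within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

pairs = skipgram_pairs("the quick brown fox".split())
print(pairs)
```

Each (target, context) pair becomes one training sample that nudges the target word’s weights, which is how frequency-of-co-occurrence ends up encoded in the matrix.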
By the end of training, we are not even interested in the output layer of the neural network, as all we need are the vectors created by multiplying the input layer by the weight matrix. These embedded* vectors can then be used to find words (or events in the stock market) of similar significance.
* ‘Embedded’ here is just another way of saying compressed: embeddings are dense, lower-dimensional vectors.
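As a final sketch of how “similar significance” gets measured: the standard tool is cosine similarity between embeddings (this detail comes from general Word2Vec practice, not from anything above). The 3-dimensional vectors below are made up purely for illustration:

```python
import math

# Cosine similarity: 1.0 for vectors pointing the same way, 0 for orthogonal ones.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional embeddings for illustration
king, queen, banana = [0.9, 0.8, 0.1], [0.85, 0.82, 0.12], [0.1, 0.05, 0.9]
print(cosine_similarity(king, queen) > cosine_similarity(king, banana))  # True
```

The same comparison would apply to embeddings of market events: two price movements with nearby vectors would count as contextually “similar”.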

What Does a Falling Tree Teach Us?

So why am I bringing technical definitions into this? To show that Word2Vec learning is entirely based on context. To me, the stock market is a contextual thing, where future activity is predicated on what has happened before (cause/effect relationships). Many would call me wrong to say this, but no one is ever completely right about the markets, so it is worth taking a look at (Bridgewater Associates should be on my side for this one). Therefore, Word2Vec-generated embeddings do not need to include any fundamental background data about stocks, because we don’t care about these fundamentals when looking at context. If a tree falls over in a forest and hits a nearby house, we don’t care about the exact sedimentary composition beneath the tree’s roots at n seconds before the tree fell, we care that we have learned a new thing: trees falling can lead to houses being destroyed, and we can use this knowledge to make predictions about what will happen when more trees fall in the future. Sure, the sedimentary composition might have caused the tree to fall, but we don’t know for certain, and if it did, it is reflected through the fact that the tree fell, and it is therefore irrelevant.
This reminds me of a point brought up by HFT trader Tom Conerly when I spoke with him, which is that high frequency traders don’t look at fundamentals of stocks, because they believe all fundamentals are proportionally reflected through market price. This saves space, time, and energy, as they can avoid processing mountains of other data which they would otherwise have to look at.
So, now that I’ve decided to ignore fundamental data when creating my neural network, the next big question I need to answer before actually uploading data is which data I want to train my net on. I’m still thinking of starting with basic open/close prices, and making further judgments based on how well this performs.

Areas of Interest, More Potential Datasets

Another interesting financial correlation I’ve read about, and which can potentially be learned through Word2Vec-type A.I., is the relationship between the Federal Funds Rate (interest rates) and general stock market behavior. As interest rates go up, the stock market tightens and tends to go down.
This is super interesting and incredibly useful information. Historically, there is a delay period between a shift in the FFR and subsequent changes in the market. You could use Word2Vec, or some other neural network, to approximate how long this delay period is, and then use that information to make predictions about shifts in the market caused by changes in the FFR.
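Before reaching for a neural network, the delay could even be estimated with a plain lagged-correlation scan. A rough sketch, using made-up series where the second deliberately lags the first by 3 steps (the helpers are hand-rolled, not from any library):

```python
# Estimate a delay by sliding one series against the other and
# picking the lag with the strongest correlation. Toy data only.
def correlation(xs, ys):
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def best_lag(a, b, max_lag=5):
    """Return the lag (in steps) at which b correlates most strongly with a."""
    return max(range(max_lag + 1),
               key=lambda lag: correlation(a[:len(a) - lag], b[lag:]))

series_a = [1, 2, 3, 5, 8, 13, 12, 9, 7, 6, 8, 11]
series_b = [0, 0, 0] + series_a[:-3]  # b is a 3-step delayed copy of a
print(best_lag(series_a, series_b))   # 3
```

On real FFR and market data the relationship would be far noisier, but the same idea (search over lags, score each alignment) still applies.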
A second interesting correlation I’ve looked at is the relationship between U.S. GDP growth and stock market performance. In essence, the idea is that when GDP is projected to grow, there will be a bull market with inflated prices. The opposite is true as well, with poor GDP outlook reflected through bear markets. So, the goal is to learn what indicators anticipate a growing GDP. There’s a really good post on Medium about this, which claims that the best indicator for GDP performance is ISM’s monthly index score. Again, you could train a Word2Vec model to predict GDP growth, and use that information to make stock price projections.
In addition to doing research on advances in A.I for finance, I am going to spend a lot of time researching known correlation relationships in the market, and seeing if these can be applied as datasets for my neural network. Speaking of personal research, I just finished The Black Swan by Nassim Taleb, and will be writing a reflection post soon which should capture the main points I learned from Taleb.

Most of the visuals and definitions I used for explaining Word2Vec skip-grams were drawn from Chris McCormick’s tutorial, which I highly suggest reading.

The Land of Distractions

“The oppression of the poor must establish the monopoly of the rich. Profit or income inequality are always highest in countries which are going fastest to ruin.”
-Adam Smith

This quote, which was spoken over 200 years ago, still seems oddly relevant today. With wealth disparity increasing and labor force participation rates dropping, it becomes evident that we are currently riding on a one-way trip towards calamity. Or we should be.
One would expect that in such a time, a time of historical injustice and senselessness, the streets of rich neighborhoods would be flooded with angry mobs of mistreated lower-class workers fighting for closure. But it appears that this isn’t the case. In fact, the past few years have not only been devoid of uprising and aspiration to change, but it seems as if nobody thinks that something is wrong.
This raises the obvious question of “Why?” Why are we so oblivious to what is clearly happening right in front of our eyes? It would be reasonable to think that those who live in a country that only cares about money would also be inclined to be well-informed about the occurrences in the world of finance. In addition, the information is accessible to everyone, on finance blogs, statistics charts, and online news headlines.
But, the more time you spend on the internet trying to find new, helpful information, the more evident the problem becomes.
It is basically impossible to go on the internet nowadays without getting hit by a barrage of ads and seemingly interesting links. Want to check the latest stock news? Well, you’ll have to wade through all of the grilled cheese tutorial videos and Kylie Jenner life-updates first. So, we have all the information in the world at our disposal, but all of this information is hidden behind a wall of carefully placed distractions.
This is yet another one of the many remarkable aspects about America that makes it such a fascinating country; it provides its citizens with an abundance of information from which enough knowledge can be acquired to get a graduate degree in any field, but strongly supports systems whose sole purpose is to divert the user’s attention away from this information. It’s shocking, but not necessarily aimless. To me, this all seems like a clever way to distinguish the future 1% and the other 99%. The few who understand this will strive to benefit as much as possible from the plethora of free information that is provided to them, while ignoring the common distractions and frivolous topics that are so highly praised in mainstream media. On the other end of the spectrum lie those who avoid experimenting with obscure subjects, such as investing and macroeconomics, because it is unpopular with the crowd.