AI in Finance, in General

As I finish up my official post for this week, I thought I'd share a really interesting and informative video on AI in finance:

The video is by Siraj Raval, a YouTuber with a whole pool of great videos on different applications of AI. I like this video because Siraj lays out very clearly the many shortcomings of trading companies which could be addressed with AI-based algorithms. For instance, I didn't know that investment banks are trying to use technology that can recognize fraudulent stock orders, to improve security and avoid losses. Also, did you know that banks are testing AI which can predict whether or not a customer will pay back their loans? And did you know that 90% of the world's data (not just financial data, but ALL data) was created in the past two years? I can hardly wrap my mind around that last one, but if it is really true, then this is probably the best time to be testing AI on financial data.
Up until now, my view of AI in finance was pretty restricted to trading algorithms, but I now see that this is just one of many (he brings up 5 categories in the video, with several subcategories each) areas of interest.
What I found even more intriguing was that Siraj actually brings up the idea of using support vector machines to learn from financial data. This is nothing groundbreaking, but it is interesting to consider how I might use SVMs in my work. Support vector machines are algorithms which can classify a data set of vectors into different categories, and then use this knowledge to classify future, unknown vectors into these defined categories. Essentially, SVMs can predict where a vector 'belongs'. This could be useful, as both Word2Vec and BERT generate vector representations of information. Whichever one I use, I might want a system which can classify my vectors in order to improve learning or efficiency.
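To make this concrete, here is a minimal sketch of classifying embedding vectors with a support vector machine, using scikit-learn (assumed to be available). The 2-D vectors and the "bullish"/"bearish" labels are toy stand-ins, not real Word2Vec or BERT output:

```python
# Hypothetical example: train an SVM on labeled "embeddings", then
# classify a new, unseen vector into one of the learned categories.
from sklearn.svm import SVC

# Toy 2-D embeddings forming two clearly separated clusters.
vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = ["bullish", "bullish", "bearish", "bearish"]

clf = SVC(kernel="linear")
clf.fit(vectors, labels)

# Predict where a new vector 'belongs'.
print(clf.predict([[0.85, 0.15]])[0])  # "bullish"
```

In practice the vectors would be the high-dimensional embeddings a language model produces, but the classify-then-predict workflow stays the same.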
Finally, one quote that resonated most with me was this:
“If you can tackle a single, niche problem very well, then you are golden. That’s when they [hedge funds] will come to you, and when you will build a brand around this.”
This really sums up [my understanding of] the current state of affairs in quantitative finance. Right now, there is so much data and so much technology available to investors and hopefuls alike, that there is no longer a point to trying to ‘predict the stock market’ as a whole. Rather, all this data means there’s an abundance of small, specific opportunities which can be capitalized upon, and everyone is competing to see whose specific opportunity yields the most return, most efficiently.

Oversaturation of Trading Algorithms to Blame for Huge Market Rebound? Also, Could BERT Replace Word2Vec in 2019?

It is currently January 5th, 2019, which means winter break is coming to a close, the new year is starting to unfold, and we can finally reflect on the market madness that managed to surprise seemingly every investor alive last week. December was a notably bad one for the stock market, with sources reporting it to be the worst December since the Great Depression. This, coupled with anticipation of rising Fed Funds Rates and rising unemployment, led many to believe that the market would plunge through the final week of the year, setting a dim tone for the start of 2019.
Yet, to the dismay of put owners, the market rebounded right at the time when it was expected to crash, bringing prices back up to where they were a few weeks prior. Still, these last-minute antics did nothing to erase December’s collective losses, as this graph demonstrates:
Different explanations were offered for why the stock market rose so suddenly: some pointed to a pension fund re-balancing, in which pension fund managers supposedly had to move ~$60 billion of assets into stock holdings by the end of 2018, while others claimed the rebound was merely a large short squeeze.
Whatever the true cause may be, this rebound seems to have effectively calmed stocks down, as the market has behaved pretty normally through the first trading days of 2019. What's more, essentially the entire market was up yesterday, with the Dow Jones up 3.3%.
It’s still very early to jump to conclusions (or, if you held AAPL, drop 9% to conclusions), but it may feel as if we just dodged a huge bullet, as a decline in the final week of 2018 would certainly have negatively impacted projections for 2019. But, considering what this massive market rebound has taught us, my overarching hypothesis is that this stock decline was not avoided, nor was it postponed. It still exists, and should be arriving on time somewhere around April or May. Consider the relationship between raised Federal Funds Rates and stock market behavior, as I bring up in one of my previous posts:
Over the past 50+ years, there has been a clear correlation between raises in the Federal Funds Rate and ensuing declines in the stock market (it's the S&P in this graph, but the same holds for nearly all stocks/ETFs). The decline isn't immediate; it comes after a buffer of around six months. Take a look at what happened to S&P prices when the Fed gradually raised interest rates beginning in 2004, for reference.
Why is this relevant? Two weeks ago, the Federal Reserve raised rates for the ninth time since 2015, bringing the total interest rate to around 2.5%. This specific raise of 0.25% was nothing huge on its own, but considering the gradual raising of interest rates that began in 2015 and will continue in 2019, warning bells start to ring. I would again compare the current gradual raising of interest rates to the raises beginning in 2004, or in 1999 and 2001. In each of these periods, the magnitude of the raise in FFR corresponds to a subsequent market decline of proportional magnitude. On an unrelated yet interesting note, the S&P has become progressively more responsive to shifts in FFR over the past 25 years, as you can also see in the graph.
So, the December 19 raise in interest rates might not be the singular straw that breaks the camel's back, but the pile of straw is still there, and more is piling on in 2019, as the Fed has reported a goal of reaching 3% interest this year. When the economy does reach the point where people tighten their spending because of inflated prices and growing interest rates, that is when the market will slow to a halt and we will start moving into the recessive phase of the cycle.
It is not only Federal rates which point to impending decline: GDP forecasts from the St. Louis Fed estimate 2019 growth to be slower than 2018's, and tensions continue to grow between the U.S. and China (though i.m.h.o. these tensions have been blown out of proportion by the media, as both countries will eventually need to settle their disputes for trade).
I should also mention that I don't think this market decline will be catastrophic. GDP is still growing and unemployment is still low; that isn't enough to counter the negatives brought on by raised FFR and slowing GDP growth, but it does mean the economy is still producing.
It’s not all dark either, as there’s always room for turning profit. Stock market behavior is most volatile at times of peak growth (just preceding the major decline), as you can see through the erratic up/down spikes at both the peaks in the red graph. I don’t think we are quite at that point of the cycle yet, but it feels like we’re starting to get there, as the turbulent past two weeks of decline/rebound indicate. It is at these turbulent times when huge price swings occur intraday, allowing for big returns on options.
These have all been fascinating developments, and I plan on using the string of open/closing prices of stocks for the past two weeks as the first predictive test for my Word2Vec trading algorithm. I hope to capture the market rebound through a series of price-shift embeddings, which I will then compare to all of my historical price-shift embeddings to see which event in my data maps most closely to the market behavior we’re seeing right now. If the theory behind my code works, the aftermath of this past event should model what will follow the current market action.
While reading about the past two weeks of trading, I came across another very interesting question: the consequences of an oversaturation of trading algorithms. While some blamed the stock rebound on pension funds and shorting, Leon Cooperman, founder of the investment advisory firm Omega Advisors, called out trading algorithms for their trend-following tendencies. Cooperman insists that the SEC is doing an inadequate job of monitoring trading algorithms' effects on market behavior by refusing to reinstate its Uptick Rule. With so many algorithms now trading through world markets, there's a growing concern that these algorithms' pattern-seeking dispositions will cause increased volatility and unpredictable price shifts. According to skeptics like Mr. Cooperman (in addition to things being much better back in his day, when trading was done through shoving and yelling), this oversaturation of trading algorithms can lead to extreme instances of herding and exacerbated price shifts. I think of this as a snowball effect, where one algorithm investing in a stock will prompt more algorithms to do the same, which will cause the price of the stock to rise, which will in turn incite even more algorithms to start investing too (and so on and so forth).
This is actually a rather mind-boggling concept to ponder: if these algos use real-time data, their projections are inspired by current market shifts. But if around 60% of all trading in U.S. equity markets is done by algorithms, then the daily market shifts analyzed by trading algos are mostly just the output of other trading algos. Essentially, today's trading algorithms analyze moves made by other algorithms, which also make their moves based on what is going on in the market, creating a sort of paradoxical back-and-forth interplay. As quantitative trading takes over, algorithms that analyze real-time data are just learning from the decisions made by other algorithms, which Cooperman and company seem to believe will ruin traditional trading and erase the market efficiency kept in check by human investors.
I happen to strongly disagree with Cooperman’s claims for several reasons. First of all, trading algorithms use the same signals and statistical probabilities to make investments as humans do. I mean, humans are the ones writing these programs (for now), and humans tell the program which indicators to look out for when investing. So, there’s no difference if algorithms do the trading instead of humans, except maybe increased speed and consistency. Not only that, but the main goal of trading algorithms is to locate and profit from market inefficiencies. This is no different from the main goal of human traders. At the end of the day, human traders are just as prone to herding as their digital counterparts. Financial bubbles existed long before anyone knew what financial engineering meant, and the problem of crowd-think will probably never subside.
Finally, I can’t imagine a future where 100% of trading is done by algorithms with no human involvement. Even if all trading is eventually done through tools and software which use trading algorithms, there will still be humans programming these algorithms, changing their parameters, testing their performance, and inventing improved strategies. Quantitative trading is merely the next logical step in the evolution of finance, allowing for more efficiency and liquidity.
Right now, the focus should not be on questioning the integrity or consequences of our trading algorithms; it should be on finding new AI-based algorithms which can better synthesize financial data. For instance, the next giant leap for trading is algorithms which learn the behaviors of other trading algorithms, so that they can make investments based on what they think other computers will do (and I'm sure this is already being investigated by secretive hedge funds and HFT firms).
I am extremely curious to see how the dynamic between trading algorithms plays out, and I would love to hear what others think trading will look like in the near future.

The last thing I've been thinking about over winter break is BERT, a new technology fresh out of Google's AI lab. BERT, an acronym for Bidirectional Encoder Representations from Transformers, is an improved NLP model which learns bidirectionally and performs better on benchmark data sets than any other NLP system. Much like Word2Vec, BERT creates embeddings for words, but BERT's embeddings are much more diverse and powerful, as they are context-dependent and can therefore handle homographs.
I will continue reporting on BERT in the following weeks as I do more research and improve my understanding, as I think this is an exciting breakthrough with a lot of potential left to explore (it was only released publicly 2 months ago), especially in the context of my project. One amazing piece of info Google shares is that BERT "also learns to model relationships between sentences by pre-training on a very simple task that can be generated from any text corpus."

Apparently, BERT models can also be trained to recognize sentences which appear in similar contexts. Even more significant is the fact that the example sentences Google gives are related by causation, so BERT can recognize cause/effect relationships across sentences. What I mean is that 'The man went to the store', and therefore 'He bought a gallon of milk': one event which led to another. BERT recognized that "He bought a gallon of milk" is a likely follow-up to "The man went to the store", which is a remarkable intuition for a self-taught computer.
This has the potential to work incredibly well for the task of recognizing what might happen to the price of a stock following a certain sequence of up/down price shifts. Rather than learning which sentence will lead to which other sentence, we could learn which stock market shifts led to which other stock market shifts. This is already what I have been trying to do with Word2Vec, but BERT proves even better equipped for the job because it can embed and map entire sentences, not just one word at a time. For instance, if you create embeddings for entire sentences, this is practically just creating one big embedding for a sequence of words. Translating this over to my project where words are single up/down price shifts for a stock on a single day, BERT could potentially create embeddings for sequences of stock activity, like a month, a week, or a quarter of stock trading. Then, we create an embedding for a week of real-time stock data, and see which historical week_embeddings match closest to our query. This would solve the issue of sequencing and Shazam Fingerprints which I discuss in my previous post, while also allowing us to extrapolate many more embeddings from the same amount of daily stock open/close data.
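The matching step described above can be sketched in a few lines. This is a hedged illustration, not my working code: the week embeddings, dates, and dimensionality below are all made up, standing in for vectors a model like BERT would produce.

```python
# Sketch: represent each week of trading as one embedding vector, then
# find the historical week closest to the current one by cosine similarity.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical historical week_embeddings (dates and values are illustrative).
historical = {
    "2008-10-06": [0.9, -0.8, 0.3],
    "2015-08-24": [0.4, -0.2, 0.7],
    "2018-02-05": [0.6, -0.9, 0.1],
}

query = [0.85, -0.75, 0.35]  # embedding for the current week of trading

best_week = max(historical, key=lambda w: cosine_similarity(query, historical[w]))
print(best_week)  # the closest historical analogue to right now
```

Whatever followed `best_week` in the historical record is then the model's guess at what follows the current market action.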
It’s really great to see all of these technological advancements and how they relate to finance, and also to see the growing interest in researching new trading strategies. I even got the chance to speak with a few undergrads at Columbia and Princeton over break, who confirmed that quantitative finance is the fastest growing major at their schools.
With all that said, if anyone has some interesting new ideas, opportunities, or people to talk to, please let me know, and I look forward to another great year of research and learning!

Drawing Inspiration From Unlikely Sources: What I Learned from Shazam

The other day, I read an article about how Shazam actually works to recognize music so quickly and efficiently. One has to wonder how you pinpoint a single 5-second song snippet within a database of 8 million+ songs. As it turns out, Shazam uses a process called 'fingerprinting' to generate condensed 'fingerprints' of information about every song. These fingerprints contain numeric interpretations of information about the particular song, such as tempo, bandwidth, and amplitude of sound waves. Fingerprints are really similar in function to embeddings! If you think about it, a data embedding in my project would be a vector containing crucial information about a stock at some time period, including close price, open price, volatility, et cetera. And building off of this analogy, the query_embedding in Shazam would be the 5-second clip of music you play for it. Once Shazam receives this input, it parses the clip into several smaller clips (either 0.25, 0.5, or 1.0 seconds), and creates a fingerprint for each of the sub-clips. Once these fingerprints (like query_embeddings) are created, they can be mapped against the whole database of fingerprints, returning the most similar one.
Here is where the most important reason for me mentioning Shazam appears: Sequencing of fingerprints.
If Shazam matches two audio clips of 0.25 seconds, this is not enough to prove that the two clips represent the same song. Plenty of different songs have short moments which sound the same, especially nowadays, so we need more proof to match two songs. Shazam's clever approach is sequencing, wherein the app chooses the data fingerprint that most closely matches the query fingerprint, and then compares the following data fingerprints to the following query fingerprints. In essence, if one fingerprint match is found, you then compare the neighboring fingerprints, and if the neighbors all match as well, this proves that the two songs are the same. Translated to my project, this means I first have to find a single event match, then see if the neighbors of this event match as well; if they do, then I can start considering the expected outcomes.
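The sequencing idea can be sketched like this. The fingerprints here are simplified to plain tuples compared by exact equality; a real system (Shazam's or mine) would compare embeddings against a similarity threshold instead:

```python
# Sketch: after finding one fingerprint match at `start`, confirm it by
# checking that the neighboring fingerprints in the sequence match too.

def sequence_match(database, query, start, min_neighbors=2):
    """True if the anchor plus at least `min_neighbors` neighbors match."""
    if start + len(query) > len(database):
        return False
    matches = sum(1 for i, fp in enumerate(query) if database[start + i] == fp)
    return matches >= min(min_neighbors + 1, len(query))

database = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]  # stored fingerprints
query = [(3, 4), (5, 6), (7, 8)]  # anchor matched first at index 1

print(sequence_match(database, query, start=1))  # True
```

A lone anchor match would not be enough; it is the agreeing neighbors that justify acting on the match.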
The other beautiful thing about fingerprints or embeddings is that they are static, and therefore can be stored/accessed quickly and even offline. Consider it this way: A song, once released, does not change. This means the fingerprint for this song will always be the same and so it can be stored in some file of easily-accessible data, rather than having to update the fingerprint every time before searching. The same applies to stocks (or pretty much any historical data), as past open/close prices for stocks don’t change spontaneously. If you have wondered how Shazam’s music recognition works so quickly, this feature of fingerprints/embeddings is why.
If you want to learn more about the many intricacies of Shazam’s algorithms, here’s a more in-depth analysis of their music recognition.
This idea of sequencing embeddings feels like the final piece I needed for putting together my Word2Vec trading algorithm (though I'm sure there will be many more pieces to come), as I now understand how this code would work:

  1. Train Word2Vec on the historical data set.
  2. Create embeddings for all stock data based on findings from the Word2Vec training.
  3. Store all of these embeddings in a file.
  4. In the daily trading algorithm, every couple of minutes, create a query_embedding from real-time data. Then, attempt to match it against the stored data_embeddings.
  5. If a match of great accuracy is found, proceed to sequencing (how large of a sequence match is enough, though?).
  6. Once a sequence passes our criteria for a 'good enough match', calculate the expected outcome from this sequence (found by looking at what stock price changes followed this sequence of data in the past).
  7. Factor in both the % accuracy of the match between the query_embedding and the sequence of data_embeddings, and the % expected gain/loss.
  8. If a large loss is expected, exit your position. If a large gain is expected, invest (using a max_exposure function to ensure not too much capital is invested in a single place).
  9. Run code for hedging against large bets.
  10. Keep this process running throughout the trading day, investing in low-risk stocks between the Word2Vec-driven investments.
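The steps above can be sketched as a single control loop. Every function passed in below is a hypothetical stub standing in for a real component (trained model, matching code, broker API); only the control flow is the point, and the 0.95 threshold is an arbitrary placeholder:

```python
# Sketch of the trading loop from the numbered steps (all names assumed).
ACCURACY_THRESHOLD = 0.95  # placeholder cutoff for a "good enough" match

def run_trading_loop(stored_embeddings, get_realtime_embedding,
                     match, sequence_matches, expected_outcome,
                     invest, exit_position, market_open):
    while market_open():
        query = get_realtime_embedding()              # step 4: embed real-time data
        best, similarity = match(query, stored_embeddings)
        if similarity < ACCURACY_THRESHOLD:
            continue                                  # no convincing match yet
        if not sequence_matches(best, query):         # step 5: confirm via neighbors
            continue
        gain = expected_outcome(best)                 # step 6: what followed historically
        score = similarity * gain                     # step 7: one simple weighting
        if score > 0:
            invest(score)                             # step 8 (max_exposure/hedging inside)
        else:
            exit_position()
```

Steps 9 and 10 (hedging, parking capital in low-risk stocks between signals) would live inside `invest` and around the loop, respectively.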

*I feel like this process would be better conveyed through a diagram, so I’ll try to get one of those uploaded next time.
As I write this, I already see a potential problem: how should we treat new daily stock data? Once we train Word2Vec on our data set, we want to keep improving it by giving it new data, so should we just retrain the network every couple of weeks and update our embeddings based on the new findings? To me, it seems reckless to ignore all the new stock data being created every day, but how often should we update our set of data embeddings? This will be a source of thought for the next week, as I am beginning to actually write the algorithm I describe above (keep an eye out for findings and reports coming soon).


A Different Type of Chaos Theory

I ran into another interesting article over the weekend, drawn in by the title: "Why Stock Predicting AI Will Never Take Over the World" by Matt Wright. Right now, this is a highly polarized issue, with some claiming that the entire market will be automated in a decade or so, while others argue that "the market is an entirely human phenomenon" which cannot be recreated by self-learning bots. Mr. Wright is on the latter side of this debate, claiming that predicting the market is impossible due to Level 2 Chaos (a theory which is much less intense than the name suggests). Level 2 Chaos is the idea that if you do magically come up with a stock market prediction that is 100% accurate, so many people will rush to profit off of it that the prediction will no longer be valid.
I responded to Wright’s story with the following (will update with a response if one arrives):

Mr. Wright, I agree with your claim that many people investing based on the same prediction will cause that prediction to become invalid, but don't you think it is possible to counter L2CE in the stock market by keeping your predictions to yourself (and investing a controlled amount, rather than causing adverse effects by investing too much)? Or, alternatively, if you know many people are going to invest based on a prediction, couldn't you hedge against the influx of investments by betting against the prediction?

Also, more generally, you claim that this L2CE will occur if people try to predict exact stock prices. Does this mean you think it is senseless to predict broader patterns in the market, rather than predicting what the price of a stock will be tomorrow?

In my opinion, Mr. Wright is correct, but this is not sufficient reason to say that stock predicting AI will never take over the market (In fact, I would say stock predicting AI has already ‘taken over’ the markets, in the sense that the most successful hedge funds use AI to guide their predictions and to shape their portfolios). An algorithm like mine, which works to predict patterns (greater shifts) in the market, would not suffer from Level 2 Chaos because I am not singularly investing enough money to offset a larger market cycle, and I don’t intend to distribute my exact predictions every day. I wonder what others think about L2CE limiting the powers of trading algorithms? Are you on Mr. Wright’s side on this one?
In other financial news, stocks have not fared well so far this month, with this being (almost!) the worst start to a December for stocks since 1931. Some attribute this apprehensive market behavior to uncertainties in foreign policy between the U.S. and China, while others point to the limited number of trading sessions left in 2018 (investors wanting to save money for the holidays and not end the year on a bad note). In my opinion, though, this is all anticipation of the Federal Reserve's final policy meeting of the year this Wednesday, where the Fed will decide whether or not to raise the Federal Funds Rate. A couple of posts ago, I talked about how greater stock market behavior (like the graph of the S&P) is almost inversely proportional to the graph of historical FFR. As interest rates go up, stocks go down (with a buffer of about four months), and vice versa. You can check this for yourself with the help of this graph:
With interest rates having been so low for so long, I expect the Fed to raise them, with most estimates calling for a raise of 0.25%. A rate raise this Wednesday would probably mean an immediate negative reaction and a slow start to 2019, followed by some overall growth which is then accompanied by a gradual decline in the markets. At the same time, I highly doubt that this slow end to the year foreshadows a greater, 1929-esque crash coming mid-2019.



Loud Silence: How Doing Nothing can Return the Most

A study by Professor Hendrik Bessembinder from earlier this year highlights the extreme disparity in net gains in the stock market: "When stated in terms of lifetime dollar wealth creation, the best-performing four percent of listed companies explain the net gain for the entire U.S. stock market since 1926, as other stocks collectively matched Treasury bills." In other words, over the past ~92 years, the top-returning four percent of publicly traded companies made up ALL (literally 100%) of the net gains for the ENTIRE stock market. To add insult to injury, the other 96% of companies on the market performed akin to U.S. Treasury Bills. Treasury Bills, for those who aren't familiar, are widely heralded as the safest investments possible (least risk also means least reward), and only return about 4% on the year in the best of cases.
Talk about living in Extremistan…
What surprises me most about this study is that this is not simply a snippet of the past 10-15 years, the time period which I (and many others, I’m sure) consider to be the most volatile and conducive to income inequality, but this hyper-concentration of wealth has shaped the market for the past century. Not to jump to unfounded conclusions, but if this has been going on for the past 100 years, then it is reasonable to assume that it will continue happening for the next 5, 10, 20 years as well.
I mention this because nowadays, the investing world (and the world in general) is becoming more automated, more laden with competition, and more glutted with information. It is certainly easy to be intimidated by the surplus of data and potential opportunities, and the portfolio managers and investment bankers who look 'the busiest' give the impression that they are the only ones who can truly grasp all that is going on. But, as with the stock data I'm training my neural networks on, most of this is distracting, meaningless noise. Just because your portfolio is as diverse as possible, or holds hundreds of different stocks from different sectors, or you change it on a daily basis, doesn't mean you have a superior investing strategy. You could invest in a thousand stocks that are all part of the bottom 96% and break even, while your coworker holds onto one stock that happens to be part of the top 4% and makes a killing over the next years.
One quote that really sums this up comes from Warren Buffett, who said: "I could improve your ultimate financial welfare by giving you a ticket with only twenty slots in it so that you had twenty punches – representing all the investments that you got to make in a lifetime. And once you'd punched through the card, you couldn't make any more investments at all."
Sometimes, it is best to stay vigilant and uninvolved until you hone in on a single big investment opportunity. I visualize this as a day on a fishing boat: you can use all of your bait at once, throwing lines in just to throw lines in, or you can sit with your single line until a big fish bites.
Now that I think about the algorithm I am building, this is definitely an interesting idea to keep in mind. My Word2Vec pattern mapping can't possibly map similar stock events for all real-time data all the time; if my code could match similar stock events 24/7, it would mean my code is broken. In my case, it would be best to run the event-matching code all the time, but to only consider investing once two events match past a certain threshold of accuracy (for now, let's call this the Accuracy Threshold). If two events match with 100% similarity (extremely unlikely, I'm just using this as an example), then there is real potential that some recurring pattern will repeat. Once the code crosses the Accuracy Threshold, we then need to look at what happened after the data_embedding we matched with. In other words, if our real-time data maps closely to a data_embedding, and that data_embedding was followed by a price increase of 12%, then it is definitely worth investing. The tricky part is factoring in percent similarity (which should translate to the percent likelihood that our real-time data will follow the same outcome as the data_embedding it maps to) together with expected gain. A very high percent similarity with a high expected gain is the best-case scenario, while a low percent similarity with a high expected gain is more ambiguous.
So, my question for today is, how should I factor both these percentages into my algorithm to decide whether or not I should go for the investment? Because after all, I need to be able to identify the big fish when they are biting.
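One possible (entirely assumed) answer is to treat the similarity as a rough probability that the historical outcome repeats, and weight the expected move by it. The thresholds below are illustrative guesses, not tuned values:

```python
# Sketch: combine match similarity and expected gain into one decision.
SIMILARITY_THRESHOLD = 0.90   # the "Accuracy Threshold" from above
SCORE_THRESHOLD = 0.05        # minimum weighted expected gain to act on

def decision(similarity, expected_gain):
    """Return 'invest', 'exit', or 'wait' for a matched event."""
    if similarity < SIMILARITY_THRESHOLD:
        return "wait"                      # match not convincing enough
    score = similarity * expected_gain     # expected-value-style weighting
    if score >= SCORE_THRESHOLD:
        return "invest"
    if score <= -SCORE_THRESHOLD:
        return "exit"
    return "wait"

print(decision(0.97, 0.12))   # strong match, +12% follow-up: "invest"
print(decision(0.97, -0.12))  # strong match, -12% follow-up: "exit"
print(decision(0.60, 0.30))   # weak match, ignored despite big gain: "wait"
```

The appeal of a multiplicative score is that a weak match with a big historical gain scores no better than a strong match with a modest one, which is exactly the ambiguity described above.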

Was I Overthinking, or Was I Not? (Featuring Thoughts From a Data Scientist)

Over the break, I had the chance to speak (briefly) with Andrew Zatlin, founder and editor of www.moneyballeconomics.com and a former data-collection professional. Recently, he has transitioned away from simple data collection to implementing different metrics for insights into the markets. As you can infer from the name of his website, Zatlin's key focus is data, but rather than the fundamental data, tick data, and book pressure which high-frequency traders so highly praise, Zatlin looks at more macroeconomic indicators. For instance, he works as a forecaster for retail, jobless claims, and payrolls, to name a few. The idea here is interesting, but not unheard of: you use forecasts of the economy as a whole to guide your predictions of the market, upholding the belief that the stock market directly reflects economic growth. Some people use employment numbers, others use GDP outlooks (like I outlined in an earlier post). With GDP, for instance, the theory is that growth in GDP should correlate with general growth across the markets. I agree, as overall growth should imply growth in the markets, but this is nonetheless a vague projection, and as such there is little margin for profit.
Zatlin talked a lot about the idea of crawling the internet to acquire data easily, which is an interesting topic to consider as I transition into more discrete data sets (Fed Funds Rate, gold price data) which aren't as readily packaged for the public. Crawling is basically the idea of strategically reading through (crawling) online sites and acquiring desired content through an algorithm. This is what most modern search engines use to index web pages, as they read through content and pick out keywords (among many other things, I'm sure). I am not too familiar with web crawling, but I know my dad and many of his colleagues have worked with it, so that could be a good place to start consulting for information.
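To get a small taste of what a crawler does, here is a sketch using only Python's standard library: it parses a page's HTML and collects the links a crawler would follow next. The page content is a hardcoded stand-in; a real crawler would fetch each URL, respect robots.txt, and queue the discovered links.

```python
# Sketch: the link-extraction core of a web crawler (stdlib only).
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Every <a href="..."> is a candidate page to crawl next.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = '<html><body><a href="/rates">FFR data</a> <a href="/gold">Gold prices</a></body></html>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/rates', '/gold']
```

The acquiring-desired-content half of crawling is the same idea: instead of collecting `href` attributes, you would collect the tags and keywords you care about.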
Interestingly enough, Zatlin later brought up the concept of using embeddings while searching through new data to avoid repeated information, for clustering, and for efficiency. The idea is relatively simple: since Word2Vec maps words of similar meaning close together, you could use a model such as the Universal Sentence Encoder to create embeddings for entire data entries, and then use the same Word2Vec-style mapping to group similar data entries. This is incredibly powerful and efficient if done correctly, as you can reduce databases down to only the truly unique data which is relevant to you. In past computer science classes, I've worked with optimizing data by keeping unique entries, but my definition of what constitutes a unique data entry was limited to searching through a data set and deleting duplicates. Data embeddings would be much more powerful than this, as they would allow us to remove fundamentally similar data entries. Say we have an extremely large pool of data, with hundreds of articles detailing the evolution of saguaro cacti. In most cases it is great to have a lot of information, but too much input can be burdensome as well, as Nassim Taleb warned us. So, in reality, we don't need (nor want) one hundred articles all talking about saguaro cacti; we only want those which contain crucial information (and no repeated info). The embedding model would cluster articles of similar significance together (using the same context-based training I explained in earlier posts). That means we could look at a certain cluster of 10 articles and pick out one, with the idea being that this one article will teach us as much about cacti as the other nine. By the end, we've reduced the size of our data set by 90% while preserving ~90% of the content!
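A greedy version of this deduplication can be sketched in a few lines: keep an embedding only if it isn't a near-duplicate of one we've already kept. The article embeddings and the 0.95 threshold below are illustrative stand-ins for real model output:

```python
# Sketch: drop entries whose embedding is nearly identical to a kept one.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def deduplicate(embeddings, threshold=0.95):
    """Keep only embeddings that aren't near-duplicates of an earlier one."""
    kept = []
    for emb in embeddings:
        if all(cosine(emb, k) < threshold for k in kept):
            kept.append(emb)
    return kept

articles = [
    [0.9, 0.1, 0.0],    # saguaro cactus article #1
    [0.89, 0.11, 0.0],  # near-duplicate of #1: gets dropped
    [0.0, 0.2, 0.9],    # unrelated article: kept
]
print(len(deduplicate(articles)))  # 2
```

Each kept embedding acts as the representative of its cluster, which is exactly the pick-one-of-ten idea above.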
Data optimization may seem unrelated to predicting the stock market, but it is important to consider all of these factors as the datasets you use grow larger and larger. For instance, I recently downloaded a file containing daily open/close/volume data for all stocks and ETFs beginning with the year 1970. You can imagine how large a file this is, and training a neural network on such a large amount of data is costly in time. I could counteract this issue through cross-validation, as Mr. Coste talked about in my previous interview. But, with my limited knowledge of k-fold cross-validation, I think it would be wiser to devote my time to testing real predictive models than to have my efforts hampered by such tangential endeavors.
Finally, Zatlin and I talked briefly about the nature of predicting the stock market (which, as I speak with more and more people, I'm starting to learn is quite subjective). His view, as I mentioned earlier, is that for large-scale predictions, the only applicable data are growth metrics like GDP and unemployment forecasts. To him, these are "the driving forces" of the economy, and the stock market will reflect that. This is a refreshing take after speaking with mathematically-oriented high-frequency traders.
Switching gears away from this conversation, let's take a look at the coding end of my project. You might remember that a couple of posts ago, I talked about the difficulty of creating input vectors for stock data, because of all the possible parameters and definitions of what a 'stock event' is. This issue was quite problematic for a while, as I couldn't start training without settling it.
I decided to begin by looking at daily closing prices, creating a dictionary of daily close prices (keys) and respective dates (values), only to later realize a crucial flaw.
Can you guess what it is?
It is senseless to make a dictionary with close prices as your keys, since a stock might happen to have two identical close prices on different dates! No worries though, as no two dates are alike, so we can simply make the dates the keys.
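In Python, the fixed structure looks something like this (the dates and prices are made-up samples):

```python
# Dates (always unique) as keys, close prices as values.
# With prices as keys, the first two entries would have collided!
close_by_date = {
    "2018-11-01": 54.32,
    "2018-11-02": 54.32,   # an identical close price is fine as a *value*
    "2018-11-05": 55.10,
}
print(close_by_date["2018-11-02"])  # 54.32
```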
A dictionary might seem like the right approach for storing this type of data, but it turns out this is not an ideal data structure for my purposes. Consider the following: What happens if we do find a ‘stock event’ embedding (in this simple case, the embedding for the close price of a day) which matches our query embedding, and we want to use this insight to make a prediction for what will happen tomorrow? To do this, we would need to know what the neighbors of this ‘stock event’ are, or in layman’s terms, we need to be able to access the prices of the dates close to our target date. This can be done by creating a linked list, where each element in the list is linked to its previous day and next day.
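Here's a simplified sketch of that structure, with each element holding its own date and close price plus the dates of its neighbors (the sample dates and prices are made up, and the real class stores more than this):

```python
class fingerPrint:
    # One element of the linked list: a single trading day, holding its
    # own date/close plus the dates of its neighbors (simplified sketch).
    def __init__(self, date, close):
        self.date = date
        self.close = close
        self.prev_date = None   # date of the previous trading day
        self.next_date = None   # date of the next trading day

def build_list(daily):
    # `daily` is a chronological list of (date, close) pairs.
    # Returns a date -> node lookup with neighbor links filled in.
    nodes = {date: fingerPrint(date, close) for date, close in daily}
    dates = [d for d, _ in daily]
    for earlier, later in zip(dates, dates[1:]):
        nodes[earlier].next_date = later
        nodes[later].prev_date = earlier
    return nodes

nodes = build_list([("2018-11-01", 54.32),
                    ("2018-11-02", 54.90),
                    ("2018-11-05", 55.10)])
print(nodes["2018-11-02"].prev_date)  # 2018-11-01
```

So if a query embedding matches the day "2018-11-02", we can immediately look up what happened on the days around it.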
(“That’s weird that you call your class fingerPrint()…” Yes, it is, but this will make more sense after my next post).
So, now we have all daily data loaded into a linked list, with each element containing its own date, its closing price, its next-day neighbor's date, and its previous-day neighbor's date. This is great for organizational purposes, but is rather useless for training a Word2Vec network, as Word2Vec is only interested in the 'words' (which are the prices in our case). So, we can easily create a list of all prices, treat this list as a data corpus, and train our Word2Vec model on it. The catch is that the prices must be in chronological order (and this is crucial!), because Word2Vec is trying to learn historical patterns through context. If the prices aren't in chronological order, it undermines the whole purpose of our endeavor, as we would simply be learning the contexts of random numbers. With chronological order, we learn which price shifts occur in what historical contexts, and what events lead to gains/losses in closing prices.
So, now that we have our list of prices, this brings me to my next (potential) revelation, which is on the matter of input vectors. We can't simply feed a corpus of numbers into Word2Vec and expect it to work, because Word2Vec does not train on numbers, only strings. The past two weeks, I spent an unreasonable amount of time thinking about how we can create Word2Vec input vectors for stock prices, considering options such as z-scores or categorizing price shifts as follows: small price gain, medium price gain, large price gain, small price loss, medium price loss, and large price loss.
But, I might have been overthinking all along! Why overcomplicate this process, when we can simply convert the price (a float) to a string? Once we have these strings, we can treat them as 'words' and train Word2Vec on them, even though they are not actually words. This may seem overly simplistic, but it makes technical sense: Word2Vec learns based on context, and we can learn the context that the prices appear in by treating them as words (which is all we're really interested in). If this is not enough, I've found a Python library which converts numbers into actual written words (121 -> one hundred and twenty one).
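In code, the whole preparation boils down to a few lines. The prices below are made-up examples, and the commented-out lines show roughly what training would look like with gensim, one common Word2Vec implementation:

```python
# Convert a chronological list of close prices (floats) into 'words'.
# Word2Vec implementations like gensim expect a list of token lists,
# so the whole price history becomes one long 'sentence'.
prices = [54.32, 54.90, 55.10, 54.90]   # made-up sample data

corpus = [[str(p) for p in prices]]
print(corpus[0])  # ['54.32', '54.9', '55.1', '54.9']

# With gensim installed, training would look roughly like:
# from gensim.models import Word2Vec
# model = Word2Vec(corpus, sg=1, min_count=1, vector_size=100)
```

Note that identical prices collapse to identical 'words' (54.90 appears twice above), which is exactly what lets the model learn their shared contexts.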
With all of this in mind, I can already start training my Word2Vec neural net using the strategy I explained above. We will see how viable this strategy is once I start producing results (stay tuned!).
My next post will be about how we can use Shazam as a source of inspiration for predicting the stock market (stay tuned for this too!).
Also, on the interview end, now is probably the worst time of the year for reaching out to academia, with university finals happening these next two weeks. But, as with stocks, every drop is usually followed by a rebound, and I will have a month-long window to talk with college students during their winter breaks.

Interesting Read: Sentiment and Market Predictions vs. Reality

An article exploring why bull market predictions are so often wrong was recently posted to Medium. The article begins by showing how popular Reddit headlines which evoke optimistic sentiment in the Bitcoin market often coincide with the opposite: the drastic fall of Bitcoin's price and market cap.

Ironically, bullish headlines littered the front page of Reddit all through Bitcoin's collapse. The concept of bloggers being wrong about market outlooks is nothing new, but I would definitely recommend reading this article, as it explains some of the disconnect between news headlines and actual stock market behavior. It's also pretty useful, as the logic used by the author could just as easily explain why bear market predictions are wrong; it's the same concepts of extrapolation bias and probability neglect.
This is interesting to me, knowing that earlier in the year, I considered using online sentiment as a viable method for generating market projections. Seeing this also makes me think about all the noise in stock market data, and wonder how much of this noise comes from naive investors following bullish/bearish headlines like the ones above. With all of this in mind, I wonder how hedge funds like Periscope Capital Inc. can rely so heavily on sentiment analysis when creating their portfolios.
Another interesting note is on the Eureka Hedge Fund Index, which is starting to dip after rising steadily for the past seven years:

Financial Implications of “The Black Swan” by Nassim Nicholas Taleb

The Black Swan by Nassim Taleb is one of the best, and one of the most interesting, books I've read in a while. I've heard people reference Taleb's work a few times before, mainly in financial settings, but I never quite understood why, which is what inspired me to read it. In general, Taleb argues that throughout human history, nearly all majorly significant events were considered highly improbable before they occurred (think: the fall of the Roman Empire, the rise of Nazi Germany, the Bubonic Plague pandemic, the sinking of the Titanic). Essentially, the events which have the greatest impact on us lie totally outside of our field of prediction, no matter how much past data, observation, and intuition we use in making these predictions. These momentous, unpredictable events are called Black Swans, known professionally as fat tails, because they are perceived as rare (Taleb actually argues that these events are much more common than we'd like to think).
Taleb helps us visualize the phenomenon of Black Swans through two fictional worlds: Mediocristan and Extremistan. Mediocristan is a province where "particular events don't contribute much individually — only collectively" (Taleb 32), and things are distributed rather evenly. Extremistan, on the other hand, is a world of extremes, where "inequalities are such that one single observation can disproportionately impact the aggregate, or the total" (33). Taleb claims that we live in Extremistan, as much as we'd like to believe it to be Mediocristan, since single events can have disproportionately large impacts on our societies. Evidence that we live in Extremistan is seen through other metrics as well, most notably through wealth divides (the top 1% vs. the 99%).
This theory of Black Swans being more significant than regular accumulations of events holds true in all fields in Extremistan, especially (and this is most relevant for us) in finance. One fact Taleb mentions really dumbfounded me: "In the last fifty years, the ten most extreme days in the financial markets represent half the returns" (275). Ten days in fifty years. The inverse, which is not addressed directly by Taleb, is likely also true: in the last fifty years, the ten most extremely negative days in the financial markets represent half the losses. This is so incredible, and at first it made me wonder why people (like me) try so hard to predict what will happen to a stock's price day-to-day, when they should really be trying to predict the next Black Swan. I then realized that if a Black Swan can be predicted, it is no longer a Black Swan, and the opportunities for profit are no longer as large, since more people are expecting it to happen. Also, once a Black Swan is predicted, a new Black Swan emerges outside of this prediction, which becomes the next true Black Swan.
This is also a very sobering thought, considering the work I'm currently doing in predictive trading algorithms. I'm sure Taleb would laugh at the idea of me using historical financial data to predict what will happen to a stock tomorrow, next week, or next month. In his book, Taleb talks a lot about the misleading nature of data, using the example of "1001 days of history": "You observe a hypothetical variable for one thousand days. You subsequently derive solely from this past data a few conclusions concerning the properties of the pattern with projections for the thousand, even five thousand, days. On the thousand and first day — boom! A big change takes place that is completely unprepared for by the past." I like to picture the concept this way: I know I have been alive every day for the past 40 years, so, in conclusion, I can use this data to determine that I will be alive tomorrow, next month, and for the next 40 years after that! The irony here is that the one event which lies outside of my predictions would be the most significant event in my life. Yet, I am not deterred. I do think that patterns exist in market behavior, and because these patterns repeat cyclically, they can be used to make predictions. The mere existence of Black Swans doesn't negate the existence of patterns; it's just possible that one day these patterns will break. I think hedging using options is enough to counter potential negative effects; I just need to learn more about how to do this efficiently.
Another key point brought up is the effect of silent evidence, which relates to history being written by the winners and not the losers. "It is so easy to avoid looking at the cemetery while concocting historical theories" (101), says Taleb, which reminds me of the problem of survivorship bias in quantitative trading. Survivorship bias occurs when we train models on existing companies and their stocks, while forgetting the companies which no longer exist (bankruptcies, acquisitions). Oftentimes, this omitted data (the silent data) is most important, as we can learn what events lead up to a bankruptcy or sudden stock collapse. This relates to the idea of blind risk usually leading to better short-term rewards, whilst completely backfiring later on down the road: "The fools, the Casanovas, and the blind risk takers are often the ones who win in the short term" (117).
A third interesting point which really made me think was Taleb's take on information, and more importantly, misinformation. He writes: "The more information you give someone, the more hypotheses they will form along the way, and the worse off they will be. They see more random noise and mistake it for information" (144). This really hit close to home for someone who is naturally paranoid and always thinking that the solution to my problems lies in all the many books, studies, and reports I haven't read. It also makes me think about not mentally constructing narratives (which, more often than not, become red herrings) based off of the information I have.
Finally, despite being so apprehensive on the idea of predicting the future, Taleb also claims that “In the end we are being driven by history, all the while thinking that we are the ones doing the driving.” There are many ways to interpret this great quote, but I see it as history being contextual. In other words, we don’t control history, rather, events happen and build off of one another, whether this be in random fashion or through similarly repeating patterns (the latter being my theory, though this might just be me mentally constructing a narrative based on information I have).
If Mr. Taleb ran a hedge fund (which he previously did, under the name Empirica Capital), I'm guessing his strategy would be to profit from short-term, consistent gains while always hedging against potentially huge losses. This way, he would be able to see gradual returns safeguarded from catastrophic events. Likewise, he could make his bigger bets on positive Black Swans, while pursuing a small-scale trading strategy in the time between these rare occurrences. This is where the options trading, which my previous interviewee brought up, comes into play. A strategy founded on single, greatly dispersed events which are not even guaranteed to happen may sound like an overly passive, boring, and minimally profitable way to run a hedge fund. However, in recent years, many asset management companies have gone bust by neglecting Taleb's advice and discrediting the power of Black Swans. One instance that comes to mind is the number of hedge funds which invested in cryptocurrencies like Bitcoin, not expecting the price to drop from ~$20,000 all the way down to sub-$4,000.
Another more salient example is optionstraders.com, a Florida-based hedge fund which recently went bust after unexpected volatility caused its positions to collapse completely. The hedge fund's manager, who I don't need to name because he is receiving enough bad publicity already, founded this fund on the premise of 'naked options' (as opposed to covered options): selling puts or calls without hedging potential losses, as a method to save money. The problem with this strategy is that if you sell a call on a stock at a strike price of $50, you are hoping that the stock will remain at or below $50 up to a certain date. However, if the stock rises above $50, the call seller loses money, since he has to buy the stock at the market price in order to deliver it at $50. In a very unlikely, and very unfortunate, scenario, the price of the stock could grow an incredible amount (say it goes up to $500), and the resulting losses would be even more incredible. This is what happened to optionstraders.com, which lost all of its clients' investments (and more) because of an unpredictably volatile period for crude oil prices.
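To see just how asymmetric this bet is, here is a toy payoff calculation for selling a naked call (the strike, premium, and prices are made-up numbers):

```python
def naked_call_pnl(strike, premium, price_at_expiry):
    # The seller keeps the premium; if the stock finishes above the strike,
    # they must buy at the market price and deliver at the strike.
    loss = max(price_at_expiry - strike, 0)
    return premium - loss

print(naked_call_pnl(50, 2, 48))    # 2: stock stays below $50, keep the premium
print(naked_call_pnl(50, 2, 500))   # -448: the 'very unlikely' blow-up
```

The upside is capped at the premium, while the downside is theoretically unlimited, which is exactly the fragile payoff profile Taleb warns about.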

To Conclude…

I've always found that finishing a quality book is an overwhelming experience, both because of the surplus of information you've quickly gained (not to mention trying to remember all of it!) and because it leaves you feeling a bit empty, as if you've lost something. Right now, I'm thinking about what Taleb's work can teach me about my project. Some lessons I've learned from reading this book are:

  1. I need to think about what a Black Swan would look like in my context, and how I could protect against this.
  2. I need to familiarize myself with statistical terminology. This problem was apparent to me before reading the book, but it's now been solidified. I have never taken a formal statistics class (which is fine, considering Taleb's theory on the harmful over-saturation of information), but it's clear that I have to do some more focused learning on my own.
  3. “Note that a ‘history’ is just a series of numbers through time” (119). I was so happy when I read this, as it immediately made me think about how my project is essentially using ‘a series of numbers’ (right now, open/close prices + volatility) to map historical events.
  4. Don’t be quick to construct narratives, as this can greatly distort perception and information (you try to fit new information to the narrative you’ve established, without ever thinking if this makes sense).
  5. Don’t be too stressed about shortcomings, because failure is only failure if you are failing on your own established objectives, not some nebulous criteria set by outside sources.

Trying to cover every interesting topic raised in The Black Swan would be senseless, as you can simply read the book yourself, and Taleb does a much better job explaining these issues than I do. The topics I mentioned above are the ones I find most relevant to my independent project on using Word2Vec to map similar patterns in market behavior, and also the ones which intrigued me most. I'm interested in hearing what other people who have read (or know of) this book think about its implications, and what important lessons I might be missing.
I’m looking forward to my next read, which is a bit more focused and specific than the previous two: Advances in Financial Machine Learning by Marcos Lopez de Prado. From what I’ve heard (and read), this is more of a textbook-type piece which looks at actual solutions to common problems in ML and trading algorithms. Apparently, “Readers become active users who can test the proposed solutions in their particular setting”, which would be great in my case, as I’m moving into more actual programming and implementation.

“We no longer believe in papal infallibility; we seem to believe in the infallibility of the Nobel, though.” (Taleb, 291)


The Mechanics of Predicting the Stock Market, from a Financial Engineering Major

Earlier this week, I had the privilege to speak with Jeremy Coste, a current student at Columbia University who studies quantitative finance and financial engineering. This was a very valuable and insightful conversation for me, as a prospective student in these disciplines. It is always great to hear from a student, as they tend to give honest feedback on the work they’re doing (they don’t have as much to lose as someone who runs their own trading firm, and whose income relies on convincing others to invest in their outlooks). Mr. Coste is in a similar situation as my last interviewee, who also graduated from a competitive engineering school and later transitioned into a prominent trading institution. Some differences are that Mr. Coste works at Bank of America, while Mr. Conerly works at a more specialized high frequency trading firm. This is an interesting note, as contrasting these two interviews could give insights into general bank trading strategies vs. HFT trading strategies.
Once again, due to the secretive nature of quantitative trading, I was unable to get a full transcript of the interview, but the answers I provide are accurate renditions of Mr. Coste’s answers.
S.A: Could you give me a general idea of what you do for work? Did you get your knowledge through school, or does this field require more personal, independent research? 
J.C: I'm currently studying in Columbia's Financial Engineering department, but I also work as an intern with Bank of America as a quant. I have experience in deep learning, reinforcement learning, and unsupervised learning. Financial engineering as a study is challenging mathematically, and about 60% of the students I work with are from China, 20% from India, with the rest mostly from France and the U.S. There's a lot of applied math and stochastic calculus, and every course has a lot of fairly advanced math. It's sort of building up from multivariable calculus, and then moving into more programming (languages, datasets). But, like with most studies, there's definitely a lot of independent work that needs to be done as well.
S.A: In your experience of writing/learning about trading algorithms, is there a stronger emphasis on mathematical modeling, such as statistical probabilities, or on deep learning/big data strategies?
J.C: There are a few main branches to working on trading algorithms, and optimization is a big one (math heavy). The stochastic calculus aspect is for modeling how the market moves, while the machine learning side is very data heavy. The data courses are clumped into the data analyst section, while core courses focus more on the quantitative aspect.
Data is a much more prominent aspect, which is where I, as a quant, focus a lot of my work. From what I’ve seen, everyone is using deep data. Data is the future, so it’s a good deal, but with all the hype that’s starting to surround AI and finance, data scientists are coming up with automated methods that can compare/contrast all these different data analyzing strategies. This is why right now, the shift is more towards jobs in researching new machine learning strategies which aren’t already being used. Deep learning algos are something of a black box, which is why there’s so much room for opportunity and research, as there’s so many variables and parameters you can change. One technique I think you should look at when working with this many variables is cross validation, as this can be super useful. These deep learning tools can also be more specific, like researching new models that could perform better in certain specific circumstances.
S.A: What’s your opinion on using deep learning to map similar patterns and events in the markets? Do you think this could be a profitable strategy? (some say the behavior of the markets don’t follow easily-recognizable patterns, others say they work in identical cycles) If so, would this type of strategy be better suited for long-term or short-term projections?
J.C: First off, I think it’s important to define what a stock event is when you’re talking about mapping similar events.  
There is definitely some correlation between patterns and stocks, even just looking at charts of the S&P, booms follow recessions. There’s definitely patterns we can see aside from that, too. Tick data could be interesting to look at, which are small pieces of stock data for say every tenth of a second, and we could use this to predict whether the next tick will go up or down. Small scale can yield higher results, as we have a bigger abundance of data on a smaller scale rather than historical stock data going 20 years back. RNNs on these small scales should give a better percentage, like 60%.
One other interesting thing to look at is, like you mentioned, predicting whether or not a stock will go up or down in the next day. This is a much more difficult task, and therefore would have a much lower accuracy of around 50%, but it would also yield bigger rewards. One thing to remember though is that even if your accuracy is at 55% for predicting next day stock performance, this is great, as you’re getting it right more than half of the time and the rewards are so large that they outweigh the losses. Obviously, a better percentage is more profitable and desirable, but don’t be discouraged by a low percentage to begin with.
Also, what banks and hedge funds will do is hedge bankruptcies through puts and calls, which is the primary use of these options. These options can be incredibly useful, as they’re very cheap. It’s important to know that standard deviations are used to calculate prices of puts/calls, and that you can use volatility to decide whether or not to buy puts/calls in certain situations.
S.A: What’s one factor you would consider most heavily when gauging the value of an asset (say: stock, bond, future, currency)?  
J.C: I don't know if you can value a company on just one factor, and volume definitely doesn't work to estimate value. I would think market cap is good for this; another thing to consider is enterprise value, which is based on belief in the efficient market theory. Neural nets can handle a lot of factors, so it doesn't really make sense to limit yourself to just one, especially when looking at something as complex as stock price prediction. I would consider using variable reduction (principal component analysis). With this tool, you can include say 100 factors, and then variable-reduce to find which of these 100 factors are most significant.
So, to recap: use as many features as you can find in the data, and my belief is that the more information the better. Eliminate as many heuristics as you can, because these can be especially problematic and commonplace with stock data.
S.A: What are some good resources you recommend I look at, or people I talk to, as a beginner in quantitative trading?
J.C: If you’re interested in this field, the math used, the programming used, I would take a look at: A Practical Guide to Quantitative Finance Interviews, the Green Book. This one is referenced in a lot of quant finance classes, and can familiarize you with what employers look for in recruits for their trading algorithm sectors. Other than that, most books I’ve used are just text books.
To find people to interview, I wouldn’t start with students in quant finance, because most of them are really busy with work and internships. I would use LinkedIn as a tool, and message people through LinkedIn. LinkedIn is great because you can use keywords to find people working in similar fields, so you can for instance use quantitative finance keywords to find relevant people. Some schools I know have great quantitative finance programs are UC Berkeley, NYU, and obviously Carnegie Mellon and MIT.  


Mr. Coste's point about low-percentage accuracy is quite interesting, and super inspiring as well. If you think about it, consistently predicting next-day price changes with an accuracy of only 50% sounds really bad, but in the long run, you would be breaking even. Say you're flipping a coin, and heads means you're right about what a stock will do tomorrow (+$$) and tails means you're wrong (-$$). In a hundred days, you should have about 50 heads and 50 tails, meaning an equal amount of profits and losses (if the magnitude of the investment is the same every time). What this also means is that a consistent accuracy of 51% would ensure profit in the long run, albeit very small. This doesn't mean that I'm shooting for a low accuracy, because the better the accuracy, the more often you're profiting. I mention this because I think it's interesting how people consider stock predictions to be exercises in futility, while forgetting that you don't need to be right 100% of the time (this is impossible); you just need to be right enough of the time while hedging losses in the times when you're wrong.
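The arithmetic behind that coin-flip intuition, assuming every bet wins or loses the same fixed amount (the bet size and trade count are made-up numbers):

```python
def expected_profit(accuracy, bet_size, n_trades):
    # Each trade wins +bet_size with probability `accuracy`,
    # and loses -bet_size otherwise (equal-magnitude bets, as above).
    edge_per_trade = accuracy * bet_size - (1 - accuracy) * bet_size
    return edge_per_trade * n_trades

print(round(expected_profit(0.50, 100, 1000), 2))  # 0.0: break even, as with the coin
print(round(expected_profit(0.51, 100, 1000), 2))  # 2000.0: a small but real edge
```

So a 1% edge, compounded over a thousand equal-sized trades, is meaningful, which is exactly why a "mere" 51% accuracy is nothing to scoff at.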
But, Mr. Coste also emphasizes that predicting how much a stock will gain/dip tomorrow can be very profitable (especially if you predict the magnitude at which the price changes), and much more profitable than predicting tick data, which is what HFT traders do. Again, we see that the trade-off is high accuracy for smaller profits, or low accuracy for bigger profits. Considering that HFT is dominated by the "speed arms race" (cool blog about this) and the fact that tick data requires many more transactions and therefore many more fees, it makes sense for me to work on predicting day-to-day behavior of stocks. Also, with the datasets I have found and the revelations I made in my last post, it seems like this is the right direction to take my project.
A second compelling takeaway is Mr. Coste's point about using as much data as possible, rather than limiting myself to a single factor to predict stock price shifts. This seems like the obvious answer, but I don't know much about coding variable reduction and cross-validation systems, so this might be another area I need to study. I wonder if variable reduction is affected by the 'noise' of too much data, as I have talked about how neural networks can often find misleading patterns if you give them too much information.
A third interesting note is about using options, specifically puts and calls, to hedge potential losses. I am familiar with options, and have traded them in the past, but for some reason I have neglected using them in my trading algorithm (the benefits of talking to people!). I don't think deciding whether or not to purchase hedge options is a job for A.I; I think this is more a matter of intuition, based on volatility and how much you are investing in a certain position. With the stock data I have, I can definitely calculate volatility and factor it into some investments.
A final take away from this conversation is that Mr. Coste acknowledges patterns in market behavior, and that A.I could be used to find these patterns. This is very reassuring, as this is the premise of the project I am working on.

How Word2Vec Accounts for Fundamental Stock Data, and Other Cool Things

How can anyone possibly use Word2Vec on a single parameter of financial data, such as opening and close prices of stocks, to make any sort of meaningful prediction? Are we simply going to ignore the heap of fundamental data behind every stock, such as P/E ratios, return on assets, etc…?
I’ve pondered these questions for a while, and I’ve finally found an answer: Word2Vec doesn’t know anything about the fundamental data behind words (grammar, syntax, or even definitions of individual words), yet it still does remarkably well at matching similar words, as we have seen.
Going off of this observation, is it safe to say that Word2Vec can match stock price shifts of similar significance without knowing anything about the nature of these stocks? My intuition is yes, because of how well Word2Vec does for mapping similar words/texts, but also because of how Word2Vec models are designed on a technical level. There are two variations of Word2Vec, continuous bag of words (CBOW) and skip-gram models, with skip-gram models being better-suited for larger datasets (meaning I’ll probably have to use skip-grams). However, both models work in similar ways: You read through a corpus of text, generate vectors for each unique word within the text, and then create new vector embeddings (what we’re actually looking for) based on the context these words appear in.
So, how are these initial vectors created? It's quite simple, actually. For the first word in the corpus, you start by creating a one-element array, which looks like: [1]. If that word is 'the', then the word 'the' corresponds to the first element in the array. For the next unique word you encounter, you create a new array, [0, 1], and you must also add a zero to the end of the first array: [1, 0]. If the second word is 'mighty', then the second element of each array corresponds to 'mighty'. Whenever an array's second element == 1, you know the array corresponds to the word 'mighty'.
You go on like this until you’ve accounted for every unique word. You will end up with an array whose size == the number of unique words, and each word is represented by a huge array full of zeroes with a single 1 at the location which corresponds to that word. For instance, if our data set had 20 unique words, the array for the word ‘mighty’ would be [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0].
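The steps above can be sketched in a few lines of Python (the corpus and words here are toy examples, not from my actual dataset):

```python
# A minimal sketch of the one-hot encoding described above.
corpus = ["the", "mighty", "fox", "jumps", "over", "the", "lazy", "dog"]

# Assign each unique word an index in order of first appearance.
vocab = {}
for word in corpus:
    if word not in vocab:
        vocab[word] = len(vocab)

def one_hot(word, vocab):
    """Return a list of zeros with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

print(one_hot("mighty", vocab))  # second unique word -> [0, 1, 0, 0, 0, 0, 0]
```

Notice that every array grows with the vocabulary, which is exactly the cost problem discussed next.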
One observation you can make right off the bat is that the more unique words you have in your dataset, the longer these individual arrays will be, and more computational energy (i.e: time, money) will be required. So, how do we reduce the size of these vectors?
Recall that Word2Vec neural networks have three layers, with one hidden layer. The first layer, the input, are the arrays we generated above. The second layer, the hidden, is where array compression happens. This array compression works because the hidden layer is simply a matrix of weights (a matrix of dimensions: Number of unique words X desired vector size, between 50 and 300 dimensions). We multiply the original array for a word by the learned weight matrix for that word, and voila, we have a vector (still an array) of between 50 and 300 dimensions (elements).
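To see why this compression works, here is a tiny sketch (the sizes are made up; real models use vocabularies of thousands of words): multiplying a one-hot array by the weight matrix just selects that word's row, which is why implementations treat the hidden layer as a lookup table.

```python
import numpy as np

vocab_size, embed_dim = 20, 5  # toy sizes; real nets use 50 to 300 dimensions
rng = np.random.default_rng(0)
W = rng.standard_normal((vocab_size, embed_dim))  # hidden-layer weight matrix

one_hot = np.zeros(vocab_size)
one_hot[1] = 1.0  # the word 'mighty' from the example above

# Multiplying by a one-hot vector simply selects row 1 of W.
embedding = one_hot @ W
```

So the "compressed" vector for a word is nothing more than its row of the learned weight matrix.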
How these weights get adjusted is a story for another time, but in general, they work by calculating probabilities of target words appearing in a certain context:
You use nearby words to generate training samples, and you adjust weights for the target word (blue) by seeing how often it appears, throughout the rest of the dataset, in proximity to those samples (how likely ‘quick’ is to appear next to ‘fox’ would be used to adjust weights for ‘quick’).
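Here is a rough sketch of how those training samples can be generated (the window size and sentence are made-up examples, following the skip-gram scheme from McCormick's tutorial):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (target, context) training samples from a token list."""
    pairs = []
    for i, target in enumerate(tokens):
        # Look at up to `window` words on each side of the target.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

tokens = "the quick brown fox jumps".split()
pairs = skipgram_pairs(tokens)
# ('quick', 'fox') is one of the samples used to adjust weights for 'quick'
```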
By the end of training, we are not even interested in the output layer of the neural network, as all we need are the vectors created by multiplying the input layer by the weight matrix. These embedded* vectors can then be used to find words (or events in the stock market) of similar significance.
* 'Embedded' here is just another way of saying compressed: embeddings are dense, lower-dimensional versions of the original vectors.

What Does a Falling Tree Teach Us?

So why am I bringing technical definitions into this? To show that Word2Vec learning is entirely based on context. To me, the stock market is a contextual thing, where future activity is predicated on what has happened before (cause/effect relationships). Many would call me wrong to say this, but no one is ever completely right about the markets, so it is worth taking a look at (Bridgewater Associates should be on my side for this one). Therefore, Word2Vec-generated embeddings do not need to include any fundamental background data about stocks, because we don't care about those fundamentals when looking at context. If a tree falls over in a forest and hits a nearby house, we don't care about the exact sedimentary composition beneath the tree's roots n seconds before it fell; we care that we have learned a new thing: falling trees can destroy houses, and we can use this knowledge to make predictions about what will happen when more trees fall in the future. Sure, the soil composition might have caused the tree to fall, but we don't know that for certain, and even if it did, its effect is already reflected in the fact that the tree fell, making it irrelevant as a separate input.
This reminds me of a point brought up by HFT trader Tom Conerly when I spoke with him, which is that high frequency traders don’t look at fundamentals of stocks, because they believe all fundamentals are proportionally reflected through market price. This saves space, time, and energy, as they can avoid processing mountains of other data which they would otherwise have to look at.
So, now that I’ve decided to ignore fundamental data when creating my neural network, the next big question I need to answer before actually uploading data is which data I want to train my net on. I’m still thinking of starting with basic open/close prices, and making further judgments based on how well this performs.

Areas of Interest, More Potential Datasets

Another interesting financial correlation I've read about, and which could potentially be learned by a Word2Vec-type A.I, is the relationship between the Federal Funds Rate (the benchmark interest rate) and general stock market behavior. As interest rates go up, the stock market tightens and tends to go down.
This is super interesting and incredibly useful information. Looking at a graph of the two side by side, you notice there is a delay period between a shift in the FFR and subsequent changes in the market. You could use Word2Vec, or some other neural network, to approximate how long this delay period is, and then use that information to make predictions about shifts in the market caused by changes in the FFR.
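As a rough illustration of how one might estimate that delay, here is a sketch using plain correlation on synthetic data (the series and the 6-period lag are fabricated for the example; this is a simple baseline, not Word2Vec):

```python
import numpy as np

def best_lag(leader, follower, max_lag=24):
    """Find the lag (in periods) at which `follower` best
    correlates with `leader`, by absolute correlation."""
    best, best_corr = 0, -np.inf
    for lag in range(1, max_lag + 1):
        corr = abs(np.corrcoef(leader[:-lag], follower[lag:])[0, 1])
        if corr > best_corr:
            best, best_corr = lag, corr
    return best

# Toy example: `market` reacts to `rates` with a 6-period delay plus noise.
rng = np.random.default_rng(1)
rates = rng.standard_normal(200)
market = np.concatenate([np.zeros(6), -rates[:-6]]) + 0.1 * rng.standard_normal(200)
print(best_lag(rates, market))  # recovers the built-in lag of 6
```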
A second interesting correlation I've looked at is the relationship between U.S. GDP growth and stock market performance. In essence, the idea is that when GDP is projected to grow, there will be a bull market with inflated prices. The opposite is true as well, with poor GDP outlook reflected through bear markets. So, the goal is to learn which indicators anticipate a growing GDP. There's a really good post on Medium about this, which claims that the best indicator for GDP performance is ISM's monthly index score. Again, you could train a Word2Vec model to predict GDP growth, and use that information to make stock price projections.
In addition to doing research on advances in A.I for finance, I am going to spend a lot of time researching known correlation relationships in the market, and seeing if these can be applied as datasets for my neural network. Speaking of personal research, I just finished The Black Swan by Nassim Taleb, and will be writing a reflection post soon which should capture the main points I learned from Taleb.


Most of the visuals and definitions I used for explaining Word2Vec skip-grams were drawn from Chris McCormick's tutorial, which I highly suggest reading.

Preparing my first neural net, ft. considering viable data sets


It has been a pretty busy stretch of days for seniors all across America, with early applications being due four days ago for most colleges. Upon submitting my application, I took a second to step back from the madness and think about my research into quantitative trading, among other things. At a time when I am so invested in this project, it is difficult for me to transition away from my research and move into coursework, knowing that time spent doing readings for a class I'm not so interested in could be exchanged for building my neural network. This makes me wonder about being a well-rounded scholar, and whether it's really better to be generally educated on a variety of topics rather than being an expert on a single, specific one. Through my research, I am beginning to see how many research papers, data sets, and A.I strategies have been used for quant trading, and am feeling overwhelmed by all of this. Yet, one thing I like to keep in mind is a quote from Nassim Taleb's The Black Swan (which I am still reading, and would definitely recommend you read):

“The more information you give someone, the more hypotheses they will formulate along the way, and the worse off they will be. They see random noise and mistake it for information.” – Taleb, 144

So, according to Taleb, information should be assessed by quality rather than quantity — it is not about how much data you have, but what data you have.

How’s the coding?

Speaking of data, I am now at the point where I need to sit down and seriously consider which data sets I can use for my Word2Vec (or Market2Vec, or Stock2Vec, or Vec2Money) trading algorithm. Right now, on the programming end of things, my neural network is structured and ready to be customized for more mathematical data rather than the natural language I have been using up until now. Essentially, I have Python code (running from a Linux computer where my Tensorflow, Universal Sentence Encoders, and Word2Vec models are installed) which takes a data set as an input and creates embeddings of this data. In past cases, I have trained my neural net on Q&A data, and this has yielded very interesting results. I will go into more technical detail in my next blog post, but the general framework I have goes as follows:

  1. Upload data set.
  2. Read through the data in an effective manner (either organize the data well before uploading it, or tailor your reader function to the data you have).
  3. Create embeddings for the input data.
  4. Come up with some query, and make an embedding for this query.
  5. Compare query_embedding to mainData_embeddings, see which vectors map closest to each other.
  6. See causes/effects of the data_embedding you find, and use this information to make a prediction about your query.
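The framework above can be sketched end-to-end in miniature. Note that `embed` here is a placeholder (a mean-centered, normalized price window) standing in for the real Word2Vec-style embeddings I plan to build, and the hard-coded toy windows stand in for steps 1 and 2:

```python
import numpy as np

def embed(series):
    """Placeholder embedding: center a window of prices and scale to unit length."""
    v = np.asarray(series, dtype=float)
    v = v - v.mean()
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def closest(query_embedding, data_embeddings):
    """Step 5: index of the data embedding nearest the query (cosine similarity)."""
    sims = [float(np.dot(query_embedding, e)) for e in data_embeddings]
    return int(np.argmax(sims))

# Steps 1-3: toy price windows and their embeddings.
data = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 1, 2, 2]]
data_embeddings = [embed(d) for d in data]

# Step 4: a query window with the same rising shape as data[0].
query = embed([2, 4, 6, 8])
print(closest(query, data_embeddings))  # -> 0, the matching rising window
```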

Right now, steps 3 and 6 appear to be the most challenging and thought-provoking. For step 3, what is the best way to make an embedding for stock data? If, for instance, I train my net with a dataset of historical open/close prices for stocks, what parts of this data should I use for creating my embeddings? More importantly, HOW do I create an embedding for open/close prices? I will go over this second question in more detail in, you guessed it, my next blog post.
Here is the problem with step 6: once you find a data_embedding mapping to the query_embedding, how do you use this information to make a useful prediction? To do this, you would need to understand the significance of that data_embedding. This is a whole different problem on its own, and the answer will depend on which dataset I am using. Going back to the example of open/close prices, how would we find the causes/effects of a data_embedding in this case? My intuition is to simply look at the open/close prices of the next day, and if the close price is up, then there is a higher chance that the query_embedding will go up the next day as well. This would require a system which keeps track of 'neighbors', so that we know which data_embeddings precede/follow the data_embedding we are interested in.
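One way to sketch that 'neighbors' idea, assuming a simple ordered list of daily closes (the prices below are made up):

```python
def predict_next_direction(match_index, close_prices):
    """Step 6 sketch: after finding the historical day most similar to the
    query (match_index), use the day that followed it as a naive forecast.
    Assumes `close_prices` is an ordered list of daily closing prices."""
    if match_index + 1 >= len(close_prices):
        return None  # no 'neighbor' to look at
    if close_prices[match_index + 1] > close_prices[match_index]:
        return "up"
    return "down"

closes = [100.0, 101.5, 99.8, 102.3]
print(predict_next_direction(1, closes))  # the day after index 1 closed lower -> 'down'
```

Keeping the embeddings in the same order as the raw data is what makes this neighbor lookup possible.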
The good news is that steps 1, 2, 4, and 5 are relatively simple, and can be done pretty easily.

Who have I been talking to?

This past week, I have continued my outreach efforts, emailing financial engineering professors, capital fund managers, independent figures, and even some people involved in A.I startups. Though most of them left me on read, I am getting a first-class education in persistence and perseverance. I have been able to connect with one former financial engineering graduate student from Columbia University, and am planning on speaking with him soon (stay tuned!). I am going to ask similar questions to those from my last interview, but also want to add-in some of the more specific questions which arose in my last two posts. If anyone has suggestions for good questions to ask, share them in the comments (always appreciated).
I have been reading a lot on Quora to see what people think about deep learning and finance. There seems to be an active debate, between those who believe in the future of A.I for predicting market movements, and those who believe A.I is incompatible with the noisiness of stock data. There is a lot of research being done on this as well, on how to manage noisy and misleading data.
I brought this issue to my dad, who mentioned that precious metals markets might be better for modelling, as there are fewer external factors involved in changes in, say, gold prices. Fewer factors mean the data would be less noisy, which would mean more insightful neural network predictions.
I also have been reading about Marcos Lopez de Prado’s new book, Advances in Financial Machine Learning, which I will definitely want to read. De Prado is another firm believer in the powers of A.I, who thinks that the future of trading will be dominated by competing trading algorithms.

More Abstract

Another interesting note is how involved China seems to be in the revolution of financial engineering. There are, of course, many American, Indian, and European researchers publishing in this field, but the great majority of papers seem to be written by Chinese scholars. One clear example of this is Two Sigma's recent post about their 'favorite' papers on A.I from the International Conference on Machine Learning 2018. The article, written by three Asian researchers at Two Sigma, features a handful of cutting-edge pieces predominantly written by Asian researchers. This is very interesting to think about, especially considering how China has publicly stated that it wants to be the world leader in A.I by 2030. Perhaps this surge in quantitative trading research by Asian authors is indicative of China's plan, and says something about which side China takes in the debate over financial deep learning.