VCs, VPs, and Ph.D.s: Tweaking my Algorithm with Inputs from Various Sources

This past week, I have been trying to answer some of the most pressing technical questions about my trading algorithm. Ultimately, these questions can only be answered through research: changing parameters, adjusting weights, removing factors, and so on, then checking whether the results improve. Still, I think it would be foolish not to use the resources around me and learn from others' knowledge and experience; anyone can work alone and bang their head against a wall all day.
The conversations themselves didn’t have enough new, raw information to stand alone like some of the previous interviews I’ve done, so I’ve decided to compile all four together.
First, I met with Andrew Merrill, my current computer science teacher:
S.A: My code receives real-time market data and creates an embedding for it. Then, I compare this real-time embedding to a data set of historical embeddings. The comparison yields a topK list: the historical embeddings closest to the real-time embedding, ordered by closeness (embeddings are compared with cosine similarity, which I express as a percent similarity). Once we find a match (where the top result in a topK list is above 90% similarity), we need to calculate certain parameters to decide whether or not we should proceed.
A.M: Yes, that makes sense. So you’re actively cycling through real-time data and waiting to find a pair of embeddings which match. And you’re just ignoring the embeddings which don’t match?
S.A: Yes. 
A.M: Okay. Once you find two embeddings which match, you must examine the real-time embedding's entire topK list. For instance, what if our real-time embedding matches a historical embedding whose expected outcome (EO) is +1.2%? This is good, but what if the second item in the topK list has an expected outcome of -2.6%, and the third has an EO of -0.01%? Then the match is rather meaningless, because the topK list is too volatile and we don't have definitive evidence that our real-time stock prices will go up or down.
S.A: So we need to calculate the volatility of the topK list before deciding to act on a certain investment? I'd imagine we have to come up with some 'volatility score' for each topK list and incorporate this volatility score into the Confidence Factor.
A.M: Yes, you could definitely calculate volatility, or you could use weighted averages to gauge the significance of a topK list. You could use the percent match as the weight and multiply it by the expected outcome. This way, you'd combine both expected outcomes and match percentages into a single number that captures the significance of the topK list. So, my guess is that you would calculate this weighted average for each topK list, and if it is above a certain threshold (which you would have to set), you have verified that this topK list is valid.
(Figure: "TopK List", a visual representation of what each element in a topK list contains. A topK list is returned when querying an embedding and contains an ordered list of the closest-matching embeddings.)
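A minimal sketch of the query step described in the interview (embed, compare by cosine similarity, take the top k, check the 90% threshold) might look like the following. It assumes historical embeddings are stored as rows of a NumPy array and that each one has a known expected outcome (the % move that followed it); since the figure is not available, each entry is assumed to hold at least a similarity score and an expected outcome. The names `TopKEntry`, `top_k`, `is_match` and `MATCH_THRESHOLD` are hypothetical, and similarity is a fraction (0.90 corresponds to 90%).

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical threshold: the top result must be at least 90% similar.
MATCH_THRESHOLD = 0.90


@dataclass
class TopKEntry:
    index: int               # row of the historical embedding in the data set
    similarity: float        # cosine similarity to the query, in [-1, 1]
    expected_outcome: float  # % move that followed the historical embedding


def top_k(query, history, outcomes, k=5):
    """Return the k historical embeddings most similar to `query`, best first."""
    query = np.asarray(query, dtype=float)
    history = np.asarray(history, dtype=float)
    # Cosine similarity = dot product divided by the product of the vector norms.
    sims = history @ query / (np.linalg.norm(history, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)[:k]  # indices sorted by descending similarity
    return [TopKEntry(int(i), float(sims[i]), float(outcomes[i])) for i in order]


def is_match(results):
    """A query 'matches' when its best result clears the similarity threshold."""
    return bool(results) and results[0].similarity >= MATCH_THRESHOLD


# Toy data: three 2-D historical embeddings with the outcomes from Andrew's example.
history = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
outcomes = [1.2, -2.6, -0.01]
results = top_k([1.0, 0.05], history, outcomes, k=3)
print([r.index for r in results], is_match(results))  # prints [0, 1, 2] True
```

Real embeddings would have far more than two dimensions, and a large historical set would normally go into a vector index rather than a brute-force scan, but the ranking logic is the same.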

Andrew's comment on how we need to analyze the volatility of our topK lists was very insightful, as it had never occurred to me. His point is completely valid and logical: if the embedding that most closely matches our query has an Expected Outcome of +2.3%, but all of the other embeddings in our query's topK list have Expected Outcomes like -1.2% and -0.4%, then this investment is much more questionable. If a query's topK list consistently anticipates outcomes above +1.00%, then, according to our Word2Vec model, this is a much more credible investment opportunity.
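One simple way to put a number on this, assuming "volatility" means the spread of expected outcomes within a single topK list, is the population standard deviation. The sketch below compares the mixed list from the example above with a hypothetical list whose outcomes all sit above +1.00%; `volatility_score` is an illustrative name, not part of the original code.

```python
from statistics import mean, pstdev


def volatility_score(expected_outcomes):
    """Spread (population std. dev., in % points) of a topK list's expected outcomes."""
    return pstdev(expected_outcomes)


mixed = [2.3, -1.2, -0.4]     # top match up, the rest down (example from the text)
consistent = [1.4, 1.1, 1.3]  # hypothetical list where every outcome exceeds +1.00%

for name, outcomes in [("mixed", mixed), ("consistent", consistent)]:
    print(f"{name}: mean={mean(outcomes):+.2f}%  volatility={volatility_score(outcomes):.2f}")
```

For these numbers the mixed list has a spread of about 1.50 percentage points against about 0.12 for the consistent one, which matches the intuition that the first list gives no clear direction.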
Here is a simple weighted average function, which analyzes a single topK list and returns a single weightedAverage:
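A minimal sketch of such a function, following Andrew's description and assuming each topK entry is a (match percentage, expected outcome) pair: each expected outcome is weighted by its match percentage, and the sum is divided by the total weight. The function name and the similarity values in the example are hypothetical.

```python
def weighted_average(top_k_list):
    """Similarity-weighted mean of expected outcomes for one topK list.

    top_k_list: sequence of (match_percent, expected_outcome) pairs, e.g. (92.0, 1.2)
    for a 92% match whose historical outcome was +1.2%.
    """
    total_weight = sum(match for match, _ in top_k_list)
    if total_weight == 0:
        raise ValueError("topK list has no positive weights")
    return sum(match * outcome for match, outcome in top_k_list) / total_weight


# Andrew's example outcomes, with hypothetical match percentages.
example = [(93.0, 1.2), (91.0, -2.6), (90.0, -0.01)]
print(f"{weighted_average(example):+.3f}%")  # prints -0.459%
```

Here the weighted average comes out negative (about -0.46%) even though the closest match alone pointed to +1.2%, which is exactly the situation Andrew warned about.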
This code (or some variant of it) will be used whenever our query matches an embedding and we need to check whether the query's entire topK list is viable. By viable, I mean not too volatile, not too marginal, and low in deviation.
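Those criteria could be folded into a single yes/no gate, as in the sketch below. It assumes "not too marginal" means the similarity-weighted average must clear a minimum expected move, and "not too volatile / low deviation" means the standard deviation of expected outcomes must stay under a cap. Both thresholds (`min_edge`, `max_spread`) and the function name `is_viable` are hypothetical placeholders that would have to be tuned through testing.

```python
from statistics import pstdev


def is_viable(top_k_list, min_edge=1.0, max_spread=0.5):
    """Decide whether a topK list supports acting on a match.

    top_k_list: sequence of (match_percent, expected_outcome) pairs.
    min_edge:   hypothetical minimum weighted expected move, in % (not too marginal).
    max_spread: hypothetical maximum std. dev. of outcomes, in % points (not too volatile).
    """
    weights = [match for match, _ in top_k_list]
    outcomes = [outcome for _, outcome in top_k_list]
    weighted_avg = sum(w * o for w, o in zip(weights, outcomes)) / sum(weights)
    spread = pstdev(outcomes)
    return weighted_avg >= min_edge and spread <= max_spread


print(is_viable([(95.0, 1.4), (93.0, 1.1), (91.0, 1.3)]))   # consistent upside: True
print(is_viable([(95.0, 2.3), (93.0, -1.2), (91.0, -0.4)]))  # mixed signals: False
```

As written, the gate only looks for upward moves; a version for short positions would mirror the edge test.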
In addition to speaking with Andrew Merrill, I got a chance to speak (briefly) with Vladimir Prelovac, former Vice President of GoDaddy and founder of ManageWP. Mr. Prelovac has expressed interest in the applications of NLP embeddings, so I wanted to hear his thoughts on embeddings playing a role in quant finance. His answers weren't very long, but they were meaningful and to the point, and I got useful insight from them. Here is the transcript:
S.A: From what I've seen, A.I. in finance is mostly reserved for high-frequency trading, which analyzes patterns on the micro-scale, normally a couple of seconds or less. Why do you think A.I. is not being used for longer-term patterns (patterns spanning one day to a couple of weeks)? Is this strategy not profitable?
V.P: Think of the market as the weather. It is easier to predict what will happen within the next second than within the next year.
S.A: When training my NLP models, is it better to create specific embeddings for specific stocks (e.g., AAPL embeddings and FB embeddings), or to train my model on all available stock data (this would give us more data, but the embeddings would be more generalized)?
V.P: When doing word embeddings you want to use all available corpora, as the results will get better. I'd assume the same principle applies.
S.A: Once I find a set of price shifts in the market which are contextually similar (the real-time stock market data matches some past price shift in the market), what else do you think I should consider when deciding if it’s worth investing in the stock? In other words, what are the most important factors in deciding whether or not a stock will go up within the next few days?
V.P: That is called factor modelling, and there isn't a single good answer. Ideally the A.I. should discover this all by itself, i.e., through reinforcement learning.

One interesting takeaway here is that Mr. Prelovac and I agree on question two. The idea that unsupervised NLP machine learning models improve when given more data speaks to the potential of NLP in quant finance, since there is a massive corpus of stock market data which grows every day.
I really liked Mr. Prelovac's answer to the third question as well, since he brings up the idea of an 'overarching A.I.' that learns how to adjust the parameters of a trading algorithm on its own, rather than relying on an external source to change parameters and observe results. This shouldn't be too hard to code, as the algorithm already knows whether or not it is profiting, and this knowledge is enough to infer whether parameters need to be adjusted for better performance.
Think of such a system (an overarching A.I. that adjusts a trading algorithm to improve its performance) as working somewhat like Word2Vec, in which weights are adjusted in one of three ways: push, pull, or stay. If a word appears within another word's context window, their embedding vectors are pulled closer together; with negative sampling, the word's vector is also pushed slightly away from randomly sampled words that did not appear in its context, and every other vector stays where it is. Similarly, if the overarching A.I. notices the trading algorithm performing well, it will try to identify the changes that led to this good performance and amplify them. This amplification is done by changing certain parameters, like increasing the context window of the Word2Vec model or lowering the match threshold when querying data. The reverse happens when the overarching A.I. notices that the trading algorithm is performing poorly: it identifies the changes that hurt performance and undoes them.
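A much simpler stand-in for this idea is random-perturbation hill climbing: nudge one parameter, keep the change if performance improves, and discard it otherwise. This is not the reinforcement learning Mr. Prelovac suggested, just a minimal sketch of the keep-or-revert loop. `backtest_profit` is a hypothetical placeholder (here a toy function) for whatever scores a parameter set, and the parameter names `context_window` and `match_threshold` are illustrative.

```python
import random


def backtest_profit(params):
    """Hypothetical stand-in for a real backtest: a toy score peaking at window=8, threshold=0.9."""
    return -((params["context_window"] - 8) ** 2) - 100 * (params["match_threshold"] - 0.9) ** 2


def tune(params, steps, score=backtest_profit, seed=0):
    """Hill climbing: perturb one parameter at a time and keep only improvements."""
    rng = random.Random(seed)
    best = dict(params)
    best_score = score(best)
    step_sizes = {"context_window": 1, "match_threshold": 0.01}
    for _ in range(steps):
        name = rng.choice(sorted(step_sizes))
        candidate = dict(best)
        candidate[name] += rng.choice([-1, 1]) * step_sizes[name]
        candidate_score = score(candidate)
        if candidate_score > best_score:  # performance improved: keep ("amplify") the change
            best, best_score = candidate, candidate_score
        # otherwise the candidate is simply dropped, i.e. the change is reverted
    return best, best_score


start = {"context_window": 3, "match_threshold": 0.95}
tuned, tuned_score = tune(start, steps=200)
print(tuned, tuned_score)
```

With a real backtest the score is noisy, so a practical version would need to evaluate each change on enough data (or out-of-sample) before keeping it; otherwise the loop will happily keep changes that only looked good by chance.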
Finally, I spoke with Kenny Nguyen, my faithful math teacher and a Ph.D. in statistics. I asked him, too, how I might analyze topK lists to decide whether a pair of matching embeddings is significant enough to invest in. To clarify: when our real-time query data matches a historical embedding, we can use what happened after that historical embedding to predict what will happen after our real-time data. The question is how to decide whether we are confident enough in a pair of matching embeddings to do this.
According to Kenny, I need to use Bayesian Inference to weigh all of the different factors involved when we find two similar embeddings (match percentage, expected outcome, expected outcome volatility, potential for profit, etc.). Bayesian Inference, to be concise, is a branch of statistics that updates the probability we assign to a hypothesis (for example, "this stock will rise") as new evidence comes in. As you might imagine, it is widely used in quantitative finance, because it revolves around predicting the future from past data and revising those predictions as new data arrives. At its core is Bayes's Rule, which relates the probability of an event given some evidence to the probability of that evidence given the event:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Here, P(A|B) is the probability of A happening given that B has happened, P(B|A) is the probability of B happening given that A has happened, and P(A) and P(B) are simply the probabilities of A and B happening on their own.
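To make the rule concrete for this algorithm, suppose A is "the stock rises over the next few days" and B is "the real-time data produced a high-confidence match whose topK list points upward". The numbers below are purely hypothetical, chosen only to show the arithmetic; P(B) is expanded with the law of total probability, P(B) = P(B|A)P(A) + P(B|not A)P(not A). The function name `posterior` is illustrative.

```python
def posterior(prior, p_evidence_given_a, p_evidence_given_not_a):
    """Bayes' rule: P(A|B) = P(B|A) P(A) / P(B), with P(B) from the law of total probability."""
    p_evidence = p_evidence_given_a * prior + p_evidence_given_not_a * (1 - prior)
    return p_evidence_given_a * prior / p_evidence


# Hypothetical inputs: stocks rise half the time, and an upward-pointing match shows up
# before 60% of rises but also before 40% of non-rises.
p_up = posterior(prior=0.5, p_evidence_given_a=0.6, p_evidence_given_not_a=0.4)
print(f"P(rise | match) = {p_up:.2f}")  # prints "P(rise | match) = 0.60"
```

A match that is only modestly more common before rises than before non-rises moves the estimate from 50% to 60%. Each further piece of evidence (match percentage, topK volatility) could update this estimate again, which is the kind of combination Kenny suggested.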

All in all, I received quite a bit of helpful information from these conversations. Looking ahead, I’ll definitely want to write a full-length post about quantum computing and Bayesian Statistics, as I believe these are two hugely important topics in the current world of quantitative economics.
