Preliminary Update: My Code and Results

This year, I have attempted to delve into the world of quantitative finance through independent research and coding, as well as speaking with people actually working in this industry. If you have followed my blog, you would know that I’ve been posting a lot this year, but the great majority of these posts have been speculative or analytical pieces. There’s certainly nothing wrong with this, but I feel like I haven’t given enough attention to my actual code. If we were on Wall Street right now, the code would be all that really matters, so I feel like I should exercise some due diligence by showing you all what my trading algorithm has been up to.

Let’s begin with some background: The whole concept behind my project this year has been to use natural language processing (NLP) machine learning algorithms to analyze historical stock market prices, and to eventually find some new market patterns through this analysis. Specifically, I’ve been using Word2Vec, an ML model which works by reading through a corpus of text, creating vectors for each unique word, and adjusting these vectors by observing which words appear in which contexts. These vectors are referred to as embeddings, and by the end of Word2Vec training, words which appear in similar contexts will have embeddings which appear close to one another in a vector space.

On the left, you will see an example of a Word2Vec model which was trained on Estonian and English. Since the word “cat” will likely appear in the same textual contexts across different languages (a cat in Estonia is no different from a cat in California, it’s just a different word), the word embeddings for “cat” and “kass” appear very close to each other in the vector space. On the right, you will see a 2-D representation of word embeddings, where each dot represents a different word embedding. As you can see, words with similar definitions like “country” and “union” appear close to one another. What’s interesting is that Word2Vec was able to group words which are similar not by definition, but by assumption and inference. For instance, you can see that the word “island” is right next to the word “sea”, and “together” is next to “energy”. How can a computer know that island is related to sea? This is the type of inference you would expect from a small child, not a computer! That’s the truly amazing part about NLP machine learning, and this is a large part of the reason why I chose to focus on this project. If Word2Vec can infer which words are similar just off a simple skip-gram model, I imagine that it might be able to infer something from reading through stock prices.

Diagram showing word vector (embedding) adjustment.

But how could natural language processing work for stock prices? These are numbers, not words?

You are right to ask, and this was one of the biggest challenges I faced in my project. I tried to come up with clever ways to convert price shifts to words as input vectors, but in the end, I realized that I was overthinking.

I decided to treat a series of historical prices (e.g., daily AAPL closing prices) as my text corpus. So, rather than having Word2Vec read through a Wikipedia page, it would read through a CSV file of closing prices for a particular stock or ETF. Each closing price (like ‘$124.2’, ‘$0.25’, ‘$6.73’) would be a word in this corpus, and Word2Vec would create embeddings for each unique closing price much like how it would make embeddings for each unique word in a piece of text.

CSV file of daily AAPL stock fundamentals, including opening/close prices. My Word2Vec model reads through files like these and treats the numbers as words.

But to me, a closing price isn’t significant enough. A closing price is not an action, but a result of a day of trading. So, I decided to use daily price shifts as my metric (price shift = closing price – opening price).
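As a rough illustration of this step (a minimal sketch, not the project's actual code), here is how rows of open/close prices could be turned into price-shift "words". The column names `Open` and `Close`, the helper name `price_shift_tokens`, and the choice to express each shift as a rounded percentage of the opening price are all assumptions made for the example.

```python
import csv
import io


def price_shift_tokens(csv_text, decimals=1):
    """Turn daily open/close rows into percentage price-shift 'words'.

    Assumes hypothetical columns named 'Open' and 'Close'.
    """
    tokens = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        open_price = float(row["Open"])
        close_price = float(row["Close"])
        # price shift = closing price - opening price, scaled to a percentage
        shift_pct = (close_price - open_price) / open_price * 100.0
        # Rounding keeps the vocabulary small enough for shifts to repeat
        tokens.append(f"{shift_pct:+.{decimals}f}%")
    return tokens


sample = """Date,Open,Close
2017-01-03,100.00,101.00
2017-01-04,101.00,100.50
2017-01-05,100.50,100.50
"""

print(price_shift_tokens(sample))
```

Rounding matters here: without it, nearly every shift would be a unique "word" that appears only once, and there would be no repeated contexts to learn from.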

Dow Jones price chart.

Take a look at the chart above: these are historical prices for the Dow Jones dating back 100 years. If you look at the chart, nearly all of the sharpest price drops are followed by sudden growth. This is arguably psychological, as people rush to buy when prices reach record lows. Yet, if Word2Vec could detect this pattern, it would mean that the neural network has made an inference about stock behavior which integrates psychology, chart analysis, and cycle interpretation! And all of this is done indirectly, simply because we are using neural networks which learn from context. The whole point of Word2Vec is to understand words based on their contexts. If it is true that the stock market behaves in a contextual manner, meaning future prices are influenced by recent price changes, then Word2Vec is an ideal ML model for quant finance.
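To make "learning from context" a bit more concrete, here is a small sketch of how a skip-gram model sees a sequence of price shifts: every token is paired with its neighbors inside a sliding window, and those pairs are what the network trains on. The function name `skipgram_pairs`, the window size, and the toy sequence are assumptions for illustration only.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbors."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Toy sequence: a sharp drop followed by a rebound
shifts = ["-3.1%", "+2.4%", "+0.3%"]
for center, context in skipgram_pairs(shifts, window=1):
    print(center, "->", context)
```

If sharp drops really are followed by rebounds often enough, a token like "-3.1%" will keep showing up next to positive shifts, and that is the kind of regularity the embeddings could pick up.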

So, we have trained Word2Vec on stock prices to create embeddings for unique price shifts. There’s now a file full of embeddings for each price shift, and we can query through this file to find similar embeddings. This is no different from the example above where we compared the embeddings of “cat” and “kass”, only now we are trying to find contextually similar price shifts in the markets. This is an incredibly powerful capability, and this is the basis of the entire Word2Vec trading algorithm.

A traditional trading algorithm takes real-time market prices as inputs and runs these inputs through some algorithm. My Word2Vec trading algorithm is no different, as it follows a general schema to return an output of whether or not to invest. As with all algorithms, there are a lot of steps in between input and output. Here are the steps my Word2Vec trading algorithm takes:

The first 3.5 steps should be clear by now, or else I’ve done a poor job explaining (or you’ve done a poor job reading). The fourth and fifth steps are crucial as well, and these involve what we do when we find an embedding which is similar to our query input.

When you query an input, this input will likely be a sequence of stock price shifts over a certain period of time. Then, the algorithm will return the trading day(s) in history which map most closely to this input query. If this search yields a very closely matching embedding (similarity of >95%) with a promising expected outcome, then that might seem great at first. However, each search doesn’t only yield the single closest-matching embedding, but it also returns a list of the top 5 or 10 or 15 closest matching embeddings (we call this a topK list). This is pretty much the exact same system as Google uses when you search a query.
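Here is a minimal sketch of what such a topK search could look like, assuming cosine similarity is the closeness measure (a common choice for Word2Vec embeddings, but an assumption here). The two-dimensional vectors and the names `cosine_similarity` and `top_k` are made up for the example; real embeddings would have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, embeddings, k=3):
    """Return the k (token, similarity) pairs closest to query_vec."""
    scored = [
        (token, cosine_similarity(query_vec, vec))
        for token, vec in embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings for four price-shift tokens
toy_embeddings = {
    "+0.4%": [1.0, 0.0],
    "+0.5%": [0.9, 0.1],
    "-1.5%": [-1.0, 0.2],
    "+0.0%": [0.0, 1.0],
}

matches = top_k(toy_embeddings["+0.4%"], toy_embeddings, k=2)
for token, similarity in matches:
    print(f"{token}: {similarity:.3f}")
```

Note that the query token comes back as its own best match, which is the same behavior described for single-day queries.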

In honor of Damian Lillard’s shot and the Blazers’ historic playoff run.

A Google search for “Damian Lillard” yields a series of matches (links) ordered by relevancy. Note that Google’s algorithm determines which matches are most relevant, much like how our Word2Vec network determines which embeddings are closest.

In our algorithm, querying “+0.4%” yields the above list. Each item is a different historical embedding within our data set. The first match has an expected outcome of +0.8%, which means that based off this first result, the algorithm thinks our query will be followed by a gain of +0.8% (which would be great!). However, the second match has an expected outcome of only +0.2%, while the third has an expected outcome of -1.5%. After noticing this inconsistency in the topK list, you would be less confident about making an investment than you were after seeing the first match.

With this in mind, the algorithm needs to be able to read through a topK list and gauge its volatility and consistency. If a topK list shows volatile expected outcomes which vary greatly, then the confidence in the investment goes down. My approach to this problem was rather simple, as I run a series of calculations to find a confidence factor. So, as soon as our algorithm matches the query to a series of historical embeddings, it reads through the series of matches (the topK list) and calculates a confidence factor. If the confidence factor is high and the expected outcome is in our favor, then the algorithm proceeds to suggest an investment.
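The exact calculations are not spelled out here, so the following is only one plausible way to turn a topK list into a confidence factor, not the formula actually used. It assumes confidence shrinks as the spread (standard deviation) of the expected outcomes grows; the names `summarize_top_k` and `suggest_investment` and the 0.6 threshold are hypothetical.

```python
import statistics


def summarize_top_k(expected_outcomes):
    """Return (mean expected outcome, confidence in (0, 1]).

    Assumed formula: confidence = 1 / (1 + spread), so a perfectly
    consistent list scores 1.0 and a volatile list scores lower.
    """
    mean_outcome = statistics.mean(expected_outcomes)
    spread = statistics.pstdev(expected_outcomes)
    confidence = 1.0 / (1.0 + spread)
    return mean_outcome, confidence


def suggest_investment(expected_outcomes, min_confidence=0.6):
    """Suggest investing only if the outlook is positive and consistent."""
    mean_outcome, confidence = summarize_top_k(expected_outcomes)
    return mean_outcome > 0 and confidence >= min_confidence


# Expected outcomes (in percent) for the inconsistent topK list above
inconsistent = [0.8, 0.2, -1.5]
# A hypothetical list whose matches agree with each other
consistent = [0.8, 0.7, 0.9]

print(suggest_investment(inconsistent))
print(suggest_investment(consistent))
```

With the inconsistent list the mean outcome is negative and the spread is large, so no investment is suggested; the consistent list passes both checks.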

This algorithm currently works, and I’ve trained it on multiple stocks and ETFs. Here are some sample results from querying various price movements in AAPL stock:

This is simply showing how many lines are read through during training, but it still looks cool.

As you can see, my code returns an embedding which matches the query. In these cases, it appears that the matching embedding is always a price shift of the same magnitude (querying +0.3% will most likely return the embedding of a price shift of +0.3%). This is happening because I am only querying a single day rather than a sequence of days (I explain this better later on in the post). In addition to finding a match, my code uses topK list analysis + other predictive factors to return a prediction for what will happen to the stock’s price over the next 3 days.

It would be nice to see what happens when you train on stocks other than AAPL (or on ETFs or entire sectors or even the entire market), so that is precisely what I am working on right now as I’m making the final adjustments to my code.

The pieces are coming together, but there’s still work to be done.

These are preliminary results for several reasons.

Firstly, the queries are single-day price shifts instead of sequences of price shifts. Ideally, such a program could analyze week-long, month-long, even yearlong patterns in stock prices and return similar patterns in stock price behavior from the past. For some odd reason, my program cannot make a query_embedding for arrays with multiple elements (i.e., a sequence). Right now, I’m working to solve this problem with Universal Sentence Encoder so that I can match longer sequences of price changes. Once this is done, my algorithm can be used to figure out what stage of a growth cycle we are currently in, to deduce whether or not we are approaching a relative maximum/minimum for stock prices, and to interpret how a stock will react to a particularly good/bad week of trading. It’s nice to see that the program works with single-day price shifts right now, but this is ultimately meaningless. At the same time though, I know that I am just one step away from a working market analysis tool.

Second, I clearly need a nicer interface. I’m thinking of creating a user interface for this tool where users can select a stock/ETF, choose a time window as a query, and have my code return a visual representation of its prediction (through something like a mock price graph or a table of values). Also, once I have a nice way of packaging these predictions, I can implement some of the portfolio management functions that I learned through Quantopian. This would allow for automated investing and portfolio optimization, which is what a trading algorithm would do in an ideal setting.

Third, my preliminary results raise some interesting questions which should be addressed. For instance, why does my algorithm return embeddings so far in the past? In each of the four queries, the top result is an embedding from pre-2000, when the data I am using spans from 1980 to 2017. Is my algorithm ignoring more recent embeddings because it stops searching after finding a match early on in the data set, or do older price shifts truly match more closely to our queries?

Fourth, I need to backtest this algorithm to see how it stacks up compared to established trading algorithms. Once I have backtest results like loss % and Sharpe Ratios, I can market my product more easily and I can also fine-tune my product to achieve better and better results.
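For reference, here is a minimal sketch of two metrics a backtest would typically report, computed from a list of daily returns. The annualization factor of 252 trading days and the default risk-free rate of zero are assumptions; the function names are just for illustration, and "loss %" is interpreted here as maximum drawdown.

```python
import math
import statistics


def sharpe_ratio(daily_returns, daily_risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its volatility."""
    excess = [r - daily_risk_free for r in daily_returns]
    mean_excess = statistics.mean(excess)
    volatility = statistics.stdev(excess)  # sample standard deviation
    return mean_excess / volatility * math.sqrt(periods_per_year)


def max_drawdown(daily_returns):
    """Largest peak-to-trough loss, as a fraction of the peak."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


# Made-up daily returns, purely to exercise the functions
returns = [0.01, 0.02, 0.03]
print(f"Sharpe ratio: {sharpe_ratio(returns):.2f}")
print(f"Max drawdown: {max_drawdown([0.10, -0.50, 0.20]):.0%}")
```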

Before I can optimize accuracy and returns, I need to backtest my algorithm.

This is where my algorithm stands right now, and every day I am getting closer to a working product which can give users meaningful insights into market patterns. I will continue to post updates as I continue with my work, and I’m still open to advice, suggestions, and criticism as I do so (so feel free to reach out if you want to get involved!).

These slides were taken from my final presentation to my compsci research class. You can view the presentation in its entirety below:

Quantum Computing’s Role in Predicting the Stock Market

Trading algorithms don’t always have to be about speed, but in most cases, they are. As is widely documented, one surefire way to make profit in trading is by finding market inefficiencies before others do. This is what inspired Thomas Peterffy to write code which could quickly compute the true price of options before other mathematicians could do this by hand, and what inspired Spread Networks to spend $300 million constructing a dark-fibre cable which could send prices from Chicago to New York at near the speed of light. In an even more general sense, it’s how all trading works: You buy stock before others realize it’s valuable, and you sell before others realize it’s overpriced.

This race for speed has manifested itself in many fascinating ways, and is a big part of the reason why quantitative trading is so interesting to me. We now have high frequency traders who work to find micro-patterns in book pressure for stocks, hedge funds which use satellites to track the number of cars in retail store parking lots, shortwave traders who use cell towers in Chicago to send and receive prices from London and Frankfurt, and even traders who are making transactions through a blockchain-derived distributed ledger system. My research on quant finance this year has given me insight into the many strategies used for attaining max efficiency (lowest latency possible) in trading. For instance, the company which is most invested in shortwave trading is actually Jump Trading, and I interviewed an engineer at Jump Trading earlier this year. I think this research is well reasoned too, considering how crucial speed is in this field.

The race for speed, much like many other walks of life, tends to follow a recurring cyclical pattern: First there’s a new breakthrough, often technological, which opens a floodgate of new opportunities (there was the carrier pigeon, then there were cars, then planes, and now the internet). This new breakthrough becomes the nexus of profit until the market becomes too saturated with competition. The new technology starts to lose its edge as competitors catch up (speed is only relative, and if others are travelling the same speed as you, you’re no longer going fast). This can be called a ceiling, and this ceiling will continue to impede progress until a new breakthrough is made. The amazing part of this cyclical phenomenon is that it is entirely logical: More competition leads to more research in hopes of beating the competition, which leads to more breakthroughs.

Cyclicality appears everywhere in life, and is the reason why clocks go in circles.

People love to speculate about what the next breakthrough will be, and over the past couple of years, quantum physics (or quantum computing or quantum mechanics, just slightly different variations of the same concepts) has received a lot of this spotlight for good reason. Quantum physics has been studied and recognized for a while, with Max Planck, Erwin Schrödinger and Werner Heisenberg laying the groundwork in the early 20th century. Recently, though, people have speculated that the laws of quantum physics, which I have been trying to understand through Terry Rudolph’s Q is for Quantum, can be applied to produce much more efficient computing systems. The general idea behind all of this is that normal computers use binary digits (bits) which are always either 0 or 1, whereas quantum computers would use quantum digits (qubits) which can exist in a combination of both states at the same time. Two bits hold exactly one of the four values 00, 01, 10, or 11, while two qubits can be in a superposition of all four at once. With this larger space to work in, quantum computers would be able to do certain massive calculations which are out of reach for normal computers (for instance, breaking RSA encryption).

Are you starting to see where this is going? Of course, the people on Wall Street immediately associated these technological advancements with money. If quantum computers can really offer such a massive speed advantage over traditional computers, it is worth looking at their potential applications in finance. I did not plan to fully grasp the intricacies of quantum mechanics through Terry Rudolph’s book, but I did want to get informed. After all, if this really is the future of computing (and I’m sold), then it’s worth getting on board now.

The race for speed on Wall Street continues with quantum mechanics.

So, what did I learn from Q is for Quantum by Terry Rudolph?

The book is purposefully simplified, and the specific scientific terms are never explicitly mentioned. Instead, the book is written as a series of thought puzzles which are explained and solved through quantum physics. The first thought puzzle goes as follows:

Visualization of a Pete Box, taken from Terry Rudolph’s book.

A Pete Box is a box which can take either a black ball or a white ball as an input. The Pete Box then outputs a ball of random color, either white or black, but there is no discernible pattern to predict which color will come out. In other words, it’s truly random and unpredictable. So how would such a box work?

This problem sounds simple, but it is actually impossible to solve when thinking about it in normal computing terms. Terry Rudolph then goes on to explain that using quantum mechanics, we can actually create a machine which can do this (I won’t explain how, to avoid spoilers, and also because he explains it better than I can).

After reading and reflecting upon the whole book, I identified three main overarching lessons which I learned. Here’s a list of them, and I’ll go into each one in detail:

The misty state of a white ball.
  1. Superposition: I’ve already begun to explain this, but the idea is that particles can exist in multiple states at the same time. It is said that these states are superimposed on one another, and the particle lives in all these states simultaneously until it comes into contact with another entity (like a human trying to see the particle, at which point it collapses into a single state while ignoring all the other states it was in). Superposition is probably the most crucial concept to understand if you want to understand quantum computers. So a single electron can be both spin-up and spin-down at exactly the same time, until it is interfered with. Terry Rudolph refers to a superposition of states as a “misty state”, and this is visualized through the figure above.
  2. Entanglement: This one is actually super crazy and I still don’t fully grasp how it works. From how I see it, superposition and quantum states are more essential to the idea of building a useful quantum computer. Quantum entanglement is the idea that two entities can be linked together no matter the physical distance between them. If an electron is measured as spin-up at one end of the room, a different electron entangled with this electron is guaranteed to be spin-down when you measure its spin, even if it is at the opposite end of the room. This, more fundamentally, goes back to the idea of misty states: These two electrons are the product of the same misty state, and their misty state cannot be reduced or simplified to a more basic misty state. So, the real states of the two electrons are co-dependent on one another, going back to the single misty state from which they originate.
  3. Quantum Computers: The whole reason we’re learning about quantum mechanics! Using the concept of superposition and multiple coexisting states, we can theoretically build computers which use quantum digits as basic units of calculation. Two binary digits hold exactly one of four values (00, 01, 10, 11) at any time, whereas two quantum digits, or qubits, can be in a superposition of all four at once; in general, n qubits can represent a superposition of 2^n states. This superposition would allow a quantum computer to vastly outperform normal computers on certain tasks, like factoring massive integers.
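To get a feel for the 2^n claim in the list above, here is a small classical sketch (ordinary Python, not a quantum program) that writes out an equal superposition of n qubits as a table of amplitudes, one per basis state. The function names are made up for the example, and the equal weighting is just the simplest case to show.

```python
import itertools
import math


def basis_labels(n_qubits):
    """All bit strings of length n_qubits, e.g. '00', '01', '10', '11'."""
    return ["".join(bits) for bits in itertools.product("01", repeat=n_qubits)]


def uniform_superposition(n_qubits):
    """Equal-weight 'misty state': one amplitude per basis state."""
    dimension = 2 ** n_qubits
    amplitude = 1.0 / math.sqrt(dimension)
    return {label: amplitude for label in basis_labels(n_qubits)}


state = uniform_superposition(2)
# The chance of seeing each outcome on measurement is the amplitude squared
probabilities = {label: amp ** 2 for label, amp in state.items()}
print(probabilities)
print(len(uniform_superposition(10)), "amplitudes for 10 qubits")
```

The table doubles in size with every added qubit, which is why simulating even a few dozen qubits on a normal computer quickly becomes impractical.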

Finally, I think the concept behind the book is fascinating: Terry Rudolph believes that quantum mechanics can (and should) be taught to all middle/high school students. Traditionally, quantum physics is a subdivision of physics which is only taught to high-achieving college students, usually at a graduate level. But in reality, these concepts of entanglement and multiple states are nothing spectacularly complex. It’s easy to think about how an electron can either spin up or spin down. For me, the hard part is wrapping my head around the fact that on a tiny scale, single entities can exist in multiple states at the same time.

This is crazy to think about, because it challenges so many preconceived notions we have about the world and our reality. MIT Technology Review recently reported on an experiment which suggested that a photon with superimposed polarizations can be perceived two different ways, depending on who is looking and at what time. Essentially, the photon can either be polarized vertically or horizontally, emitting either a vertical or horizontal ray of light. The concept of quantum superposition claims that the photon exists in both these polarized states at the same time, and the experiment supported this when two observers measured the photon’s light and recorded different observations.

If you’re interested in my philosophical and speculative take on the implications of these ‘multiple realities’, you should check out my new website for abstract thought (it’s a work in progress at the moment, but it should be public soon, and it should also be a refreshing shift away from STEM and finance).

The whole field of quantum mechanics is vast and there are so many other interesting things to talk about, but for now I’ll leave that to the physicists. To avoid going off on an infinite tangent, we should return to our initial question, which is how quantum computing can play a role in improving trading strategies.

The main idea behind this is speed: Quantum computers, if feasible, would be able to do certain ‘brute-force’ problems much, much faster than normal computers because n qubits can represent a superposition of 2^n states, whereas n ordinary bits hold only one of those states at a time.

Chemists and drug manufacturers have reported that quantum computers could be used to calculate all the different possible chemical reactions set off by a new drug, a facet of quantum chemistry. This goes back to the idea of how a single qubit can represent two different states, and therefore n qubits can represent 2^n different states. With a decently large number of qubits working together, you could encapsulate all the different states that a molecule can be in before, during, and after a chemical interaction. This would allow chemists to observe which drugs yield the most desired results by comparing their potential outcomes (and the outcomes of those outcomes, and so on), as represented through the superimposed (misty) qubit states.

Using this same logic, quantum computers could be used to analyze and compare all the possible outcomes after an event in the stock market. In chess, AI has already been trained to compare a huge number of possible moves in a match and to choose the best one. This is impossible to do with stocks using normal computers because there are too many factors and outcomes to consider and cross-analyze. However, with the power of qubits and superposition, we could perhaps analyze all of these possible scenarios in a matter of minutes, even seconds. The misty states which Terry Rudolph talks about are really just possible states for a stock to exist in (+1.2%, -0.3%, -2.5%, 5.2%), and large misty states encompass many possible states.

Another more far-fetched application would be to use the power of entanglement for high frequency trading. Entanglement says that measuring the state of an entity is tied to the state of its entangled entity near instantaneously, no matter how far apart these entities are. This has allowed physicists to successfully experiment with teleportation (on a tiny scale though, don’t get too hyped!). The concept of entanglement could then, in theory, also be used to immediately transfer data like stock prices through entangled quantum computers at different locations, although as far as physicists currently understand, entanglement on its own cannot carry usable information faster than light.

Finally, there are quantum neural networks being designed, which could be used to enhance the self-learning mechanisms in trading algorithms. This is one of the newer developments, and I will write more about this once I have the time to do adequate research.

In the end, you must remember that quantum computers are not better for every computational task than traditional computers! Quantum computers are no faster than normal computers for simple operations, and even some complex ones. The true power of quantum computers lies in large calculations with many factors (brute force problems like factoring, cross-analysis, matrix multiplication, etc…). As I have learned through my research on quant finance, the stock market is a place full of ever-growing data and a plethora of signals/indicators. With that in mind, it is starting to make a lot more sense why Wall Street is so keen on getting their hands on a universal quantum computer.

What’s it Worth, Anyways? The Things I Learned from a Value Investor

Yesterday, I had the privilege to speak with Scott Conyers, a local Portland-based investment manager who runs the fittingly-named Scott Conyers Capital Management. Mr. Conyers is a self-proclaimed value investor who takes a conservative approach to investing, with a primary focus on analyzing the intrinsic values of companies and their respective stocks.

A week before this interview happened, I started reading One Up on Wall Street by Peter Lynch. I bring this up because this book has made me realize how I have been researching new groundbreaking technology in complex (and oftentimes abstract) mathematical modeling this entire year, but I have almost entirely neglected traditional approaches to investment. Sure, it’s nice to be on the cutting-edge, but you also need to understand what exactly you are cutting. I’ve interviewed high frequency traders, market makers, quants, VCs, and financial engineering students, but not once have I spoken to a good old value investor. After my conversation with Mr. Conyers, I’m starting to feel a bit stupid for not reaching out to one sooner (it’s funny how reading advanced papers on complicated subjects can make you feel comfortably intelligent, which is actually one of the hallmarks of dumb people [dumb people often think they’re the smartest etc…] but I can philosophize another time).

Anyways, back to Mr. Conyers and value investing. Before trading algorithms began to steadily take over Wall Street beginning in the 1980s, pretty much all risky investing was value investing. Back then, you could invest in bonds or treasuries, but these tend to be much safer investments than stocks. Nowadays, we’re so spoiled with complex positions such as puts and calls that we forget how risky stocks can be. The general idea behind value investing is to speculate on the future or current value of a company, and to invest in its stock accordingly. In principle, it sounds simple: If the current intrinsic value of a stock > current market price of the stock, then invest in the company. Likewise, if you speculate that the future value of a stock is greater than what it is currently valued at, then you invest again.

In practice, things can get muddier. For one, how do you calculate the intrinsic value of a company and its stock? For two, what if the speculated growth isn’t significant enough to out-pace fees and inflation?

Mr. Conyers was able to provide some invaluable insight into these questions, and more:

*Note: The conversation has been paraphrased.

S.A: Mr. Conyers I would like to thank you once again for offering to help with my project and my research. So, to start things off, I understand you work with finding inefficiencies by comparing the intrinsic value of a company to its market price. Could you explain how you go about finding the intrinsic value of a company?

S.C: I’ll give you an idea of how my business works, which isn’t too different from any other money management firm. You get an initial investment from a client, and you are given permission to invest money into different stocks and assets. Obviously, my goal is to make money for the client, and I take a set percentage of the profits. So there are a lot of ways to go about this; this is where you have hedge funds, venture capitalists, and all sorts of investment strategies.

I am a value investor, and you are correct that that means I would like to understand the intrinsic value of a company and its stock. Value investing is pretty specific, and it is definitely different from momentum investing or growth investing, because those investors anticipate growth and look for growth companies.

Most of my work involves comparing the intrinsic value of a stock to its market value, and comparing the hype in the news to how this hype is reflected in the markets. This last part is interesting, because a lot of the time, the markets are influenced by what others are thinking; this is where the hype and bubbles come into play. So, if I recognize that a company is greatly hyped up by the media or by influential investors, but this same company has a disproportionately low intrinsic value, this is a sign to stay away from that company’s stock.

A good example of this is how standard, trusted companies can change over time. For example, Google and Facebook and Tesla and Starbucks were all trusted by investors, and so their stock prices went up consistently thanks to this positive feedback and image. But now, as we’ve seen, no one is considering Tesla or Facebook to be safe bets any more.

I never bought Starbucks stock, maybe regrettably, because I thought it was always overpriced, but this stock has kept on growing. This is one problem of value investing, as a company can be severely overvalued, but can continue to grow at the same time.

S.A: What made you think Starbucks was overvalued?

S.C: I just didn’t see the growth potential others were seeing, I didn’t think it was possible or profitable for there to be multiple Starbucks on each block. In other words, I never envisioned that a city like Portland could have over a hundred Starbucks stores. But never underestimate a company which sells addictive substances, like coffee or cigarettes.

So to get back to your question, to calculate intrinsic value, you take all the future cash flows of a company, which is approximated through earnings, and discount them back to today’s value.

The discounting takes into account inevitable losses, such as what is lost through inflation, fees, or other risks. So, if you find that future earnings minus these discounts is still greater than what the current market price reflects, your intrinsic value is greater than the market value. Obviously, this is a simplification of all the math and calculations, but the general idea is the same. Some financial experts have tried methods to calculate how risky a stock is or will be in the future, and they weigh this into their calculations as well.

Take Lyft for example, which makes no money, so how do we figure out its worth? We look at what might happen in the future: Maybe Lyft will raise prices to counter losses, but this would also drive away customers. Maybe all the talk about driverless cars will come true, which it probably won’t, and Lyft will save huge amounts of money by not having to pay drivers.

So it all goes back to speculating on how a company will operate in the future: When I didn’t buy Starbucks, I thought there was no way they could open so many locations in the future, so that was my speculation. Turns out I was wrong, because they did, but that’s only one example. That’s the same view I have on Lyft and Uber today.
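(An aside from me, not from Mr. Conyers: to make the discounting idea from this answer concrete, here is a minimal sketch of a discounted cash flow comparison. The cash flow figures, the 10% discount rate, the share count, and the function names are all made up for illustration; a real valuation would involve many more judgment calls.)

```python
def present_value(cash_flows, discount_rate):
    """Discount a list of yearly cash flows (year 1, 2, ...) back to today."""
    return sum(
        cash_flow / (1.0 + discount_rate) ** year
        for year, cash_flow in enumerate(cash_flows, start=1)
    )


def looks_undervalued(cash_flows, discount_rate, shares_outstanding, market_price):
    """True if the estimated intrinsic value per share exceeds the market price."""
    intrinsic_per_share = present_value(cash_flows, discount_rate) / shares_outstanding
    return intrinsic_per_share > market_price


# Hypothetical company: two years of projected cash flows, 10 shares outstanding
projected = [110.0, 121.0]
print(present_value(projected, 0.10))
print(looks_undervalued(projected, 0.10, shares_outstanding=10, market_price=15.0))
```

Here the two projected cash flows are each worth about 100 in today's money, so the estimated intrinsic value per share (about 20) is above the assumed market price of 15.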

S.A: You mention on your website that you employ a conservative investment approach. How would you define the difference between a liberal investment approach and a conservative one?

S.C: Liberal would not be the opposite of conservative in this case, conservative just means I think the companies I invest in will exist ~40 years from now. Sustainable companies, to me, that’s what I mean by conservative investing. That’s why I work to analyze the intrinsic value of the companies.

S.A: Not trying to unveil any trading secrets here, but I am just curious if you guys employ any quantitative models / algorithms in shaping your clients’ portfolios?

S.C: I’m not here to bash on quants or people involved in that, but those guys who use those types of models don’t care about the companies themselves, they’re only working in arbitrage.

There’s a clear disconnect between the models they use and looking at the speculative value of companies, future cash flows, markets. I don’t consider them to be creating any value, they’re just feeding off of inefficiencies, and the markets would do fine without them. If one day, all quantitative trading stopped suddenly, the market would still be liquid. It’s clearly a zero sum game, so for them to win, someone has to have lost.

This is not to discourage you though! In the back of your head, you should always remember the true intrinsic value of a company, which doesn’t get reflected through these algorithms.

The movie The Hummingbird Project comes to mind, as it shows the lengths people will go to just to shave milliseconds off trades so someone else can be overcharged.

S.A: So I’m guessing you don’t use quant algorithms to influence your investment decisions?

S.C: Again, my main focus is the intrinsic value. Now, interest rates are super low, they’ve been super low for a while, and the president wants them even lower. They’ve been too artificially low for the past 15 years. I’m mentioning this because the interest rates are so low right now that some of the traditional mathematical equations value investors use aren’t valid. There’s some formulas which require you to divide by federal interest rates, but once you start dividing by smaller and smaller fractions, the numbers can blow up and this is very misleading. So things are definitely changing and we need to adapt to that, but there’s a lot of traditional trading formulas that I will use. For instance, there’s the required rate of return formula, which is referred to as k. That’s then used for other formulas, like return on investment and discounting techniques.

In terms of other math models I use, we also use tools which give predicted return on investments, which works by analyzing patterns of returns on investments in the past. This might be more similar to what you’re talking about in using AI or math to analyze patterns, but these won’t completely inform my decision to invest or not to invest.
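
To see why dividing by a very small interest rate can be misleading, here is a minimal sketch. Mr. Conyers did not name the specific formulas he had in mind, so the perpetuity formula below (a constant yearly cash flow divided by a rate) is an assumption chosen purely for illustration, and `perpetuity_value` is a hypothetical helper.

```python
def perpetuity_value(annual_cash_flow, rate):
    """Present value of a constant cash flow received every year forever."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return annual_cash_flow / rate


# The same $1 per year looks far more valuable as the rate shrinks.
for rate in (0.05, 0.02, 0.01, 0.0025):
    print(f"rate={rate:.2%}  value={perpetuity_value(1.0, rate):,.0f}")
```

The cash flow never changes in this loop, only the divisor does, which is the sense in which the numbers "blow up" at very low rates.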

S.A: How much do you analyze market cycles when looking at the potential value of an investment? Do these cycles influence your decision to invest in a position, or is it more about the fundamentals (intrinsic value)?  

S.C: In my day, there were people called chartists which looked at a chart of stock prices. There’s definitely logic to looking at patterns, as these types of patterns happen for a reason, and if that reason is rooted in human behavior, it is bound to happen again and again.

One great example of this is the pressure point that forms when a stock reaches a previous high, because people will anticipate a fall again. When a stock is rising and approaches its historical high, it becomes difficult for the stock to rise past it because most people are afraid the stock will start to fall at this point, like it did when it reached this high in the past. The same goes with the snowball effect of bubbles, which definitely follow predictable patterns. 2008 is a fantastic and recent example of this.

Interestingly enough, I’m not seeing any bubbles today. Again, I decide whether or not we’re in a bubble by looking at the intrinsic value. All the farmers by my house are planting hazelnut trees, and there’s no way there’s such a sudden surge in demand for hazelnuts, so I’d say there’s a hazelnut bubble going right now.

Cycles were talked about all the time at economics school, but since then, we’ve realized (we think we’ve realized) that these cycles are meaningless because we can counter them with artificial interest rates. You don’t have to balance your budget, because you can just print money. Economics is changing for sure. If we’re heading into a cycle, then this will definitely influence my decision making, as I consider it part of the future value, and therefore part of the intrinsic value. I need to look at which companies will do well in the cycle. Again, cycles really build off themselves, which is why they definitely need to be acknowledged.

S.A: Do you think this artificial counter-approach to cycles will come to an end, or is this the new future of economics?

S.C: It will work, right up until it doesn’t. This happens in all historical examples, so why would it not happen this time? There’s a possibility that this will actually be the new future of economics, at which point we would all need to adjust our approaches.

S.A: I think we’ve touched on all of my questions, and before we end, I just want to say that finance and investing, specifically quant finance, is definitely a huge interest of mine which I will keep working on in the future. I’d like to ask if there’s some final advice you would give me going forward as a future investor?

S.C: I got an undergraduate degree in engineering, but I got an MBA after. I used 5% of what I learned from my MBA, and the other 95% came from experience. Living life is better than reading about it.

When you get to be in your 20s, bright young people like you tend to get stuck in academia, and you can end up spending so much time doing work that is meaningless in terms of real experience and results. You should definitely pursue a higher education and develop your skills, but you really need to get out and work in the field.

There’s clearly a lot to unpack here, so I will try to make my summary concise. Here’s a list of key takeaways from my conversation with Scott Conyers:

  • Mr. Conyers talks about the concept of value investing, which is a form of money management much different from the arbitrage and trading algorithms I’ve been researching all year.
  • Value investing involves calculating the current and future intrinsic values of companies, which is a much more holistic way of analyzing a company’s value as opposed to analyzing and reproducing patterns in the markets.
  • Value investing does involve a lot of math, but this math is focused on discounting, return on investments, and speculating future earnings. Again, this comes back to calculating the intrinsic, or perceived, value of a company.
  • Mr. Conyers believes quant finance tends to ignore the intrinsic value of companies, though I surmise he was thinking of HFT rather than analytical hedge funds. According to him, quant investors simply use their technology and algorithms to find small inefficiencies which they can capitalize on: a clear zero-sum game.
  • Mr. Conyers’ final advice to me, or anyone who is a future investor, is to get out in the field of investing to gain actual experience, rather than confining my abilities to academia and theory.

Though all of these points are fascinating, I was especially interested in Mr. Conyers’ explanation of how artificially low interest rates have literally changed certain fundamental approaches to investing. By suppressing natural market cycles through unnaturally low interest rates, the Fed has essentially changed the economy as a whole.

Realistically, this knowledge doesn’t help with my project itself, with regard to coding, Word2Vec, or backtesting. But I believe there’s a priceless value to understanding how so-called normal investing works, from the perspective of someone who has been doing it for nearly 30 years. As I will also discuss in my report on One Up on Wall Street, value investing truly offers a lot of insights into how companies gain and lose value, which is what really drives the markets (without the companies, there would be no stock market). Algorithmic trading has proven its expansive capabilities to us, and I believe these capabilities will only keep growing over time. As of now, there’s certainly a human touch to investing which algorithms haven’t been able to replicate. I do think we are getting closer though, especially with NLP and automated sentiment analysis.

Maybe all of the traditional wisdom offered by value investors is now rendered useless by algorithmic trading, or the artificial suppression of market cycles by the Fed. Maybe the future of investing is already here, and there’s no point in looking back. But, as is the case with training neural networks, the more information the better.

Does A.I Have a Place in Venture Capital?

The week before Spring Break began, I had the privilege to speak with Angela Jackson of Portland Seed Fund, one of Oregon’s leading venture capital firms. Our conversation wasn’t initially supposed to be about data and the economy, but we ended up having a really interesting discussion about the current startup climate in Portland and the greater Pacific Northwest.

Portland Seed Fund is an early stage VC fund which most often invests in small, emerging companies. The way these venture capital firms thrive is by taking large stakes in companies before they start generating a lot of profit, so that they benefit if the company gets more funding, goes public, or gets acquired later on. As you might imagine, the most crucial part of being a venture capitalist is foresight, or the ability to predict what consumers will want in the near/distant future. Whereas hedge fund managers try to predict future market cycles to find price inefficiencies, venture capitalists try to predict the future market climate to find new business opportunities. For instance, if a venture capitalist thinks that in the next two years, there will be a surge in demand for grocery delivery, then they will likely invest in companies making cutting-edge advances in food delivery services. Or alternatively, if a VC thinks that a new, original idea will have a place in future markets (or even be able to take over established markets), then they will take an early position in the company.

So, there are a lot of similarities between venture capital and managing investment funds, but there are also a lot of crucial differences as well. A big one that comes to mind is the fact that hedge funds don’t really engage with the companies themselves, only the numbers these companies produce. Venture capitalists, in a traditional setting, will observe a pitch from the founder/CEO in person and inquire about their team, to get a sense of how the company functions as a whole (after all, lackluster teams produce lackluster results). After this basic screening, the VCs will delve into the numbers, though often at a more superficial level compared to the quantitative analysis done by financial engineering institutions like hedge funds (keep reading to find out why!).

At the end of the day, though, venture capital is about finding the big companies before they become big, much like how running a hedge fund is about finding the ‘big fish’ opportunities before other investors do. With that in mind, it’s worth thinking about which specific strategies venture capitalists employ to find successful companies early on, and contrasting these strategies to the modern quant finance approach we’re all so familiar with.

So which data points can provide insight into the future of a company?

There are a number of methods and the answers vary based on the organization, but the most common metrics which VCs will analyze are:

  • Scalability: This is a relatively simple concept, but it is much easier to grasp in theory than in practice. A company, algorithm, software, or any other product is scalable if it has the bandwidth and resources to meet increasingly large demands. As companies grow and become more popular, there’s going to be more demand for their products, which often translates to greater overhead and production costs. Tech companies are usually attractive to venture capitalists, especially in the last two decades, because they are incredibly scalable by nature. For instance, a car manufacturing company (I’m not saying Tesla, but we’re all thinking it) can be super popular and lucrative, but for each person who wants to buy one of their cars, a new car needs to be made (it sounds dumb, but bear with me). As more people want to buy your Model 3s, you need to build more cars to sell, and building more cars will cost you more money, time, and resources (Economics 101). On the other hand, every time a new customer wants to open an account on Facebook, it’s the same application process and platform which the other ~2 billion Facebook users used, so no new product needs to be manufactured. Sure, there’s the cost and burden of maintaining servers which can handle growing traffic, but that pales in comparison to the price of opening and operating a vehicle manufacturing and assembly plant.
  • Current Market: Who does this product appeal to? Who will be using this service? This is what people mean when they say “this company is disrupting a 27 billion dollar industry” (i.e., this company is entering a market that can generate $27 billion in revenues). This one can be tricky, because there’s no clear ‘right’ answer. A fourth grader can tell you that a company which is scalable is better than one which is not, provided they know what scalable means. However, the current market to which a product appeals is always subject to change. Think of a startup which wants to enter a saturated market with intense, established competition like e-commerce or search engines. You might think that this startup is doomed to fail because there’s too much competition and no one needs a new search engine because Google already exists. But what if this startup solves a problem which Google neglects, a problem so significant that users gradually start to move away from Google because it’s no longer the superior product? Don’t forget that MySpace practically owned all online social media before people even knew what a Facebook was! Facebook had no ads on their site while MySpace had a lot of ads which made it cumbersome to use, so people switched to Facebook and the rest is history. People often use market and revenue stream interchangeably, but these are definitely not the same thing!!! Users and customers don’t always translate to profit: Uber claims to have 75 million users worldwide, but they’re losing over a billion dollars per year (based on statistics from 2018).
  • Market Share: What percentage of your respective market do you control? This is a subtle way of comparing a company’s performance with its competitors, and companies with larger market shares tend to make more revenue. This is completely logical, as a company with a 40% share of a certain market will have twice the customers of a competitor with a 20% share, and should therefore have roughly twice the revenue. But it’s not only revenue, as Harvard Business Review claims that a company’s market share is almost directly proportional to the return on investment in this company. This is super interesting to me, because HBR claims that high market share is often indicative of quality management and better positioning on the experience curve.
  • Revenue streams: Where are you going to get money from? There’s a myriad of answers to this question, and startups employ all types of different revenue/business models. VCs tend to prefer companies with diverse streams of revenue and companies with scalable revenue streams that can grow at a consistent rate. That’s all there really is to it.
  • Team: This is where we move into the more social and ambiguous metrics for measuring the future success of a company. Countless studies have shown that companies with happier employees perform better, and there’s no question that teams with better chemistry are more likely to succeed (the only question is do you really need a study to believe this?). It is very difficult to quantify how well a team works together, and so VCs have to rely on intuition and social intelligence to make decisions about a company’s team. However, with recent advances in AI, we are now able to teach machines more complex undefined tasks such as sentiment analysis and speech recognition. I definitely believe the powers of NLP can be applied for better analyzing startup teams, but more on that later.
  • Enthusiasm: According to Steve Jurvetson, widely recognized as one of the best VCs in the world due to his early stakes in Tesla, Baidu, and SpaceX, a founder’s (or whoever is pitching) disposition during their pitch can make or break the deal. It’s one of those things that sounds too simple to be true, but if those in charge aren’t enthusiastic about their product, then why should customers be?

There are several different articles which explore the importance of these metrics in greater detail, such as this one, this one, and especially this one.

What I find most interesting in all of this is that hedge funds and trading firms have already graduated to quantitative/automated strategies like trading algorithms, but venture capitalists (the majority of them) have shied away from using these automated methods. The question which interests me is whether or not venture capitalists will start implementing automated algorithms to help guide their decisions, like hedge funds have been doing for the past decade plus. Is there a place for A.I in venture capital? If so, what data can be used to learn and gain insights?

My conversation with Angela Jackson spurred this interest, as we talked a lot about what data PSF uses in creating their portfolios, where they get this data, and what specifically they do with this data. AI didn’t come up explicitly, but nowadays that question should be implied everywhere there is data. According to Ms. Jackson, the biggest problem is that venture capitalists have minimal data compared to the expansive databases of bar data, fundamentals, and earnings reports that hedge funds can access.

According to Angela Jackson, there’s been a push in recent years to create aggregates of data between venture capital firms throughout the Pacific Northwest. The main incentive behind this is, among other things, to create a more comprehensive dataset for understanding which startups fail and which succeed. The hope is that as more venture capital firms start working together on such initiatives, there will be more data to analyze and potentially train neural networks on. However, this doesn’t solve the problem of new startups having little to no data about themselves by the time they pitch to early round investors. This is a challenging problem to address, and I haven’t been able to come up with a reasonable solution. If any of my readers do, I’d love to hear what you think!

So, maybe the answer (right now) isn’t in company data like projected/current market share or scalability. As I mentioned earlier in this article, one great entry point for AI in venture capital is sentiment analysis: specifically, using sentiment analysis to gauge the confidence of founders during a pitch, or to detect how well teams work together. Sentiment analysis is already used to scan earnings reports, Twitter feeds, news headlines, and all sorts of bodies of text (doesn’t have to be text, as sentiment analysis works for audio clips too) to figure out whether or not there is positive sentiment evoked. The primary technology behind sentiment analysis is natural language processing, through embeddings or other word processing algorithms. These NLP strategies have proven effective in various settings, so there’s no reason why sentiment analysis should be confined to simple stock analysis. In fact, I believe this is one place where the true potential of NLP will come to light.

To summarize, I find it fascinating that venture capitalists have refrained from using automated methods such as AI or NLP to guide their decision making, while hedge funds have been relying so heavily on trading algorithms to find investment opportunities for the past two decades. Despite the potentially limited data sets available for analyzing small-scale startups, I definitely think it is worth exploring the idea of aggregating VC data to eventually train neural networks on. In the even nearer future though, I think we should start exploring the utility of NLP and sentiment analysis to analyze how well startup teams work together, as a metric for anticipating the future success of an emerging company.

Whether or not AI becomes the future of venture capital, I do think it has a lot of potential, and it seems like we are off to a good start.

How are my Predictions for 2019 Playing Out?

2019 has been treating the markets well so far. The S&P500 has had one of its best starts in history (up 11.1% through January and February) according to SeekingAlpha, and the Dow Jones has performed similarly well with an 11.4% YTD gain. This growth hasn’t been confined to the U.S, as nearly all major indexes across the world have been positive for the year. Chinese ETFs have posted exceptionally large returns so far, with the Shanghai Composite and Dow Jones China both up more than 20% on the year. Alongside these numbers, we’ve seen GDP growth estimates reflect more positive sentiment than the pessimistic outlooks from those at the end of 2018, with the Atlanta Fed reporting that real GDP growth in the first quarter of 2019 is currently around +0.4% and with most experts projecting +2.1% to +2.4% for the year.

Performance of major international ETFs.
If you look in the YTD % Change column, you’ll only find two red ETFs, which captures market movement in 2019 pretty well.

Who could’ve predicted that all of this growth would happen just two months after volatile and unpredictable shifts in the markets which nearly caused a flash crash, and raising of Federal Funds Rates in December? Also, what’s next? Is this steady growth a sign that we have successfully avoided the recession which experts predicted would stain 2019?

To address the first question, I offered a calculated guess in my “End of 2018” post. Here, I predicted that the raising of Federal Funds Rates at the end of 2018 (along with the Fed’s report that they would continue raising rates through 2019) coupled with slowing GDP growth and instability in the markets (as evidenced by the market crash/rebound in late December) foreshadows greater market recession in April/May of 2019. In other words, I predicted that the market would see notably large returns for the first few months of 2019, then the train would hit a grinding halt somewhere around April or May (right now, it’s looking more like May). This will not be a full-blown crash, but rather a large dip in prices followed by a stagnant period of price readjustments and returns from inflated highs.

Here’s some of what I wrote specifically:

“Whatever the true root of the cause may be, this rebound also seems to have effectively calmed stocks down, as the market has behaved pretty normally for the first trading days of 2019. What’s more, essentially the entire market was up yesterday, with Dow Jones up 3.3%.
It’s still very early to jump to conclusions (or, if you held AAPL, drop 9% to conclusions), but it may feel as if we just dodged a huge bullet, as a decline in the final week of 2018 would certainly have negatively impacted projections for 2019. But, considering what this massive market rebound has taught us, my overarching hypothesis is that this stock decline was not avoided, nor was it postponed. It still exists, and should be arriving on time somewhere around April or May.”

Later on, I mention how the times directly preceding market crashes are always times of extraordinary sudden growth. This gets more and more true as you get closer to the event horizon (the crash), and the extreme volatility days/minutes/seconds (or microseconds for all you high frequency traders) right before the crash are the precise moments where most profit can be made.

As for the other two questions, no one knows for certain what will happen next (except, hopefully, the embedding-based trading algorithm I’m writing…). There’s no way to tell if the market is actually going to reach a plateau in April/May other than waiting until April/May, but we can look at some of the indicators manually:

  • GDP Growth: This one is essential for projecting macro-scale movements in the markets. It kind of makes sense too; GDP measures a country’s economic output, so if this output is growing, that should mean the stock market will grow as well. This is clearly not always true, as U.S GDP grew 2.9% last year while that same year was the worst year for stocks since 2008. This year, GDP growth estimates are smaller than the numbers we saw in 2018, with growth estimates hanging around +2.5% for 2019 according to Kiplinger (the Fed’s estimates got cut in late 2018 as well). This may seem like a small dip (and it is), so this slowing GDP growth is nothing catastrophic on its own. The trouble emerges if the market is very unstable and volatile, or in other words, more receptive to these small signs of stagnation. The period of instability at the end of 2018 might be an indicator for how erratic the market currently is, but this is countered by the fact that growth has been rather steady and consistent in the past two months.
As you can see, GDP growth has been steadily dropping over the three quarters leading into 2019. At the same time though, quarterly growth is consistently stronger than in past years like 2016/2017.
  • Market Performance: According to Seeking Alpha, whenever major indexes like the S&P 500 and Dow Jones are positive through the first two months of the year, the rest of the year is almost always positive as well. In fact, out of the 30 years when both January and February returned positive, only 1 year finished in the red (29 out of 30 times means this is true about 97% of the time, and there aren’t many indicators on Wall Street which are right 97% of the time). So, going off of this one data point, one could confidently claim that the market will continue growing in 2019.
  • Gold Prices: Traditionally, the price of gold is interpreted as a measure of confidence in modern financial systems. Gold has held its value throughout history, so people tend to think of gold as the safest form of currency which can always be exchanged for goods, unlike paper (fiat) currencies which can be devalued, replaced, or made irrelevant through inflation. Also, no one can track gold transactions like they can with credit cards or SWIFT, which adds to its allure. Whenever people start losing faith in modern financial systems like banks or Wall Street, or when people anticipate an economic crash, gold prices go up since more people want to be prepared. As you can see in the chart below, gold prices rose pretty starkly from September 2018 to early January 2019, and this was followed by a huge, sudden jump in February. However, gold prices have returned to their end-of-2018 levels, and it looks like they’re continuing to drop significantly.
See the full chart here.
  • Investor Sentiment: This one is a bit harder to measure because it’s much more qualitative than quantitative, but there are several indexes which use tools like NLP A.I to gauge if investors are optimistic or pessimistic. One such metric is CNN’s Fear-and-Greed index, which measures how greedy or fearful investors are on a given day. I don’t recommend taking out a second mortgage on your house to invest based off of this index because it never tells the full story, but it is a very interesting concept to consider, as more greediness should always correlate with increases in market prices. Over the past few months, things have been rather greedy, but we’re getting closer to fearful territory. The 0-50 range is fearful and the 50-100 range is greedy, and we’ve been hovering around the mid-60s for the past month. This is nice, but it’s too close to fearful to draw any confident or meaningful conclusions.

So, with all of this in mind, the outlook is looking much more positive overall than it was in December 2018. Don’t forget that this early growth in the year was heavily influenced by the Federal Reserve’s decision to be more lenient and incremental with increasing the Federal Funds Rate. The Fed’s announcement came early this year, clearly as an early attempt to abate qualms about the markets in response to what happened at the end of 2018. But the Fed can only pull this trick so many times, and if the recession bias hits again, there’s little room elsewhere to run (aside from maybe lowering rates). (This is a really cool article on recession bias, which explains how when the majority starts to believe a recession is coming, it will most likely happen because people and businesses start hedging more rather than spending freely.) With that said, it should be easier to paint a picture of what the markets might look like going into summer 2019.

Or, rather than writing this entire post and spending time trying to figure out what might happen based off economic indicators, we could use an embedding-based market analysis tool like the one I’m currently building. The usefulness of an embedding-based trading algorithm is that we can look up which periods in stock market history are contextually similar to the period we are observing right now. This is done through the power of dynamic NLP, which does not look at fundamental market data like cash flow or PE ratios, but rather looks at what contexts certain events appear in. So, if two historical series of price changes in the markets are similar, we can assume that their outcomes will be similar as well. With this knowledge, we can make predictions for what will likely happen in the market during a certain time window.

This may sound far-fetched, but think about how humans naturally go about making predictions: We observe what’s going on right now, we try to relate this to previous knowledge, and we use this previous knowledge to think of what will happen in the future. Think of a cup slowly sliding toward the edge of a table. You know this cup will fall, because you’ve seen what happens when an object above the ground loses support before through your personal experiences and the memories stemming from these experiences. These memories are really just embeddings themselves.
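
To make the lookup idea concrete, here is a minimal sketch of finding the historical periods whose embeddings are closest to the current one. It is not the actual algorithm: the three-dimensional vectors, the period labels, and the helper names `cosine_similarity` and `top_k_similar` are all hypothetical, and real embeddings would come from a trained model rather than being typed in by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, historical, k=3):
    """Return the k (label, similarity) pairs closest to query, best first."""
    scored = [
        (label, cosine_similarity(query, vector))
        for label, vector in historical.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings keyed by made-up period labels.
historical = {
    "period_a": [0.9, 0.1, 0.3],
    "period_b": [-0.5, 0.8, 0.1],
    "period_c": [0.8, 0.2, 0.4],
}
current = [0.85, 0.15, 0.35]

for label, score in top_k_similar(current, historical, k=2):
    print(f"{label}: {score:.3f}")
```

Whatever happened after the best-matching periods then becomes the candidate prediction for the current window.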

So think of this post as my independent projection for what will happen with the markets in the next few months which is not influenced by my algorithm, because if I were using my algorithm, then I wouldn’t have to write this. If my overarching projection continues to prove correct, then this will be a testament to my trading aptitude. But if I am completely wrong and the market continues to grow 10% every two months for the year, then at least I will have my algorithm to fall back on.

Finally, on an unrelated note, I’ve been reading Q is for Quantum by Terry Rudolph, which is supposed to be an accessible intro to the concept of quantum computing. I’m reading this because as I mentioned in my last post, there’s a lot of discussion around the topic of quantum computers being the logical next step in computing speed and power. Since most of trading nowadays focuses on beating competitors through efficiency, you could imagine that Wall Street is very interested in getting their hands on quantum computers. I’m reading this book to get on this wave before it’s too late, and I will share some findings if any of them are pertinent to my work on trading algorithms.

VCs, VPs, and Ph.D.s: Tweaking my Algorithm with Inputs from Various Sources

This past week, I have been trying to answer some of the most pressing technical questions about my trading algorithm. At the end of the day, all of these questions can only really be answered through research: Changing parameters, adjusting weights, removing factors, etc… and seeing whether or not this improves results. I still think it is foolish not to use the resources around me and to learn from others’ knowledge/experiences, as anyone can work on their own and bang their head against a wall all day.
The conversations themselves didn’t have enough new, raw information to stand alone like some of the previous interviews I’ve done, so I’ve decided to compile all four together.
First, I met with Andrew Merrill, my current computer science teacher:
S.A: My code works by receiving real-time market data and creating an embedding for this data. Then, I compare this real-time embedding to the data set of historical embeddings. This comparison yields a topK list, wherein my code returns a list of historical embeddings which are closest to the real-time embedding (embeddings are compared using cosine similarity, which I express as a percent match). Once we find a match (where the top result in a topK list is above 90% similarity), we need to calculate certain parameters to decide whether or not we should proceed.
A.M: Yes, that makes sense. So you’re actively cycling through real-time data and waiting to find a pair of embeddings which match. And you’re just ignoring the embeddings which don’t match?
S.A: Yes. 
A.M: Okay. Once you find two embeddings which match, you must examine the real-time embedding’s entire topK list. For instance, what if our real-time embedding matches a historical embedding whose expected outcome is +1.2%. This is good, but what if the second item in the topK list has an expected outcome of -2.6%? And the third item has an EO of -0.01%? This is rather meaningless information because the topK list is too volatile, and we don’t have definitive evidence that our real-time stock prices will go up or down.
S.A: So we need to calculate the volatility of the topK list before deciding to act on a certain investment? I’d imagine we have to come up with some ‘volatility score’ for each topK list, and factor this volatility score into the Confidence Factor.
A.M: Yes, you could definitely calculate volatility, or you could use weighted averages to gauge the significance of a topK list. You could use the percent match as the weight, and multiply that by the expected outcome. This way, you’d factor in both expected outcomes and match percentages into a single number which encompasses the significance of the topK list. So, my guess is that you would calculate this weighted average for each topK list, and if it is above a certain threshold (you would have to set this threshold), you have verified that this topK list is valid.
(Below is a visual representation of what each element in a topK list contains. A topK list is returned when querying an embedding, and contains an ordered list of closest matching embeddings.)

TopK List

Andrew’s comment on how we need to analyze the volatility of our topK lists was very insightful, as this never came to my mind. His point is completely valid and logical: If the embedding which matches closest to our query has an Expected Outcome of +2.3%, but all of the other embeddings in our query’s topK list have Expected Outcomes like -1.2% and -0.4%, then this investment is more questionable. If a query’s topK list consistently anticipates outcomes > +1.00%, then this is a much more valid investment opportunity, according to our Word2Vec model.
This is some simple code of a weighted average function which analyzes a single topK list, and returns a single weightedAverage:
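A minimal sketch of what such a function might look like, following Andrew's suggestion of using the percent match as the weight. It assumes each topK entry is a (match, expected outcome) pair with the match between 0 and 1; the names TopKEntry, weighted_average, and is_viable, as well as the default threshold, are illustrative assumptions.

```python
from typing import List, Tuple

# Assumed shape of one topK entry: (match, expected_outcome)
#   match            - similarity to the query, between 0 and 1
#   expected_outcome - % price change that followed the historical match
TopKEntry = Tuple[float, float]


def weighted_average(topk: List[TopKEntry]) -> float:
    """Return the match-weighted average expected outcome of a topK list."""
    total_weight = sum(match for match, _ in topk)
    if total_weight == 0:
        return 0.0  # empty list or all-zero matches: no signal
    return sum(match * outcome for match, outcome in topk) / total_weight


def is_viable(topk: List[TopKEntry], threshold: float = 1.0) -> bool:
    """True when the weighted average clears a threshold (in %).

    The threshold value is a placeholder; it has to be tuned by experiment.
    """
    return weighted_average(topk) >= threshold
```

With Andrew's example outcomes (+1.2%, -2.6%, -0.01%) and assumed match scores of 0.95, 0.93, and 0.91, the weighted average comes out negative, so that topK list would fail any positive threshold.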
This code (or, some variant of this code) will be used whenever our query matches some embedding, and we need to check if this query’s entire topK list is viable. By viable, I mean not too volatile, not too marginal, and low deviation.
In addition to speaking with Andrew Merrill, I got a chance to speak (briefly) with Vladimir Prelovac, former Vice President of GoDaddy and founder of ManageWP. Mr. Prelovac has expressed interest in the applications of NLP embeddings, and so I wanted to see his thoughts on embeddings playing a role in quant finance. The answers weren’t very long, but they were meaningful and to-the-point, and I managed to get useful insight from them. Here is the transcript:
S.A: From what I’ve seen, A.I in finance is reserved for high frequency trading which analyzes patterns on the micro-scale, normally a couple seconds or less. Why do you think A.I is not being used for more long-term patterns (patterns spanning across 1 day to a couple weeks)? Is this strategy not profitable?
V.P: Think of market as of the weather. It is easier to predict what will happen within next one second than one year.
S.A: When training my NLP models, is it better to create specific embeddings for specific stocks (i.e: AAPL embeddings and FB embeddings), or is it better to train my model on all stock data available (this would give us more data, but the embeddings would be more generalized)?
V.P: When doing word embeddings you want to use all available corpora as the results will get better. I’d assume the same principle will apply.
S.A: Once I find a set of price shifts in the market which are contextually similar (the real-time stock market data matches some past price shift in the market), what else do you think I should consider when deciding if it’s worth investing in the stock? In other words, what are the most important factors in deciding whether or not a stock will go up within the next few days?
V.P: That is called factor modelling and there isn’t a single good answer. Ideally the A.I should discover this all by itself, i.e through reinforcement learning.

One interesting takeaway here is that Mr. Prelovac and I agree on question two. The idea that unsupervised NLP machine learning models improve when given more data speaks to the potential of NLP in quant finance, since there is a massive corpus of stock market data which grows every day.
I really liked Mr. Prelovac’s answer to the third question as well, since he brings up the idea of ‘overarching A.I’ which learns how to adjust the parameters of a trading algorithm on its own, rather than relying on an external source to change parameters and observe results. This shouldn’t be too hard to code, as the algorithm already knows whether or not it is profiting, and this knowledge is enough to infer whether or not parameters need to be adjusted for better performance.
Think of such a system (an overarching A.I system which can adjust trading algorithms to improve performance) working in a way comparable to Word2Vec, in that weights are adjusted based on three inputs: push, pull, or stay. Loosely speaking, if two words appear close together in the same context, their embedding vectors are pulled closer to one another. If two words rarely or never appear together, their embeddings are pushed slightly away from one another (in Word2Vec’s negative-sampling variant, this is what the randomly sampled ‘negative’ words are for). Similarly, if the overarching A.I notices the trading algorithm performing well, it will try to identify the changes which led to this good performance and amplify them. This amplification is done by changing certain parameters, like increasing the training window on the Word2Vec model or lowering the match threshold when querying data. The opposite takes place when the overarching A.I notices that the trading algorithm is not performing well.
Finally, I spoke with Kenny Nguyen, my faithful math teacher and statistics Ph.D. I also asked him how I might analyze topK lists to decide whether or not a pair of matching embeddings is significant enough to invest in. Just for clarification, what I mean by this is that when our real-time query data matches a historical embedding, we can use what happened after this historical embedding to predict what will happen after our real-time data. The question is how to decide if we are confident enough in a pair of matching embeddings to do this.
According to Kenny, I need to use Bayesian Inference to consider all of the different factors involved when we find two similar embeddings (match percentage, expected outcome, expected outcome volatility, potential for profit, etc…). Bayesian Inference, to be concise, is a branch of statistics which focuses on using different input streams of data to calculate (and adjust) our expectations for what will happen following a certain event. As you might imagine, Bayesian Inference is hugely popular in quantitative finance, because it revolves around predicting the future using past data. In its simplest form, Bayesian Inference, or Bayes’s Rule, compares the probabilities of certain things occurring. In the end, we get a formula which looks something like this:

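P(A|B) = P(B|A) × P(A) / P(B)

Here, P(A|B) is the probability of A happening given that B has happened, P(B|A) is the probability of B happening given that A has happened, and P(A) and P(B) are simply the probabilities of A and B happening on their own.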
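To make the rule concrete, here is a small worked sketch with made-up numbers. Take A to be "the stock rises over the next few days" and B to be "the real-time query matched a historical embedding above the threshold". The function name posterior and all of the probabilities below are assumptions for illustration, not measured values.

```python
def posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """P(A|B) via Bayes's Rule.

    P(B) is expanded with the law of total probability:
        P(B) = P(B|A) * P(A) + P(B|not A) * (1 - P(A))
    """
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    if p_b == 0:
        raise ValueError("P(B) is zero; the evidence B can never occur")
    return p_b_given_a * prior_a / p_b


# Hypothetical inputs: a 50% prior that the stock rises, a match shows up
# before 30% of rises but only before 10% of non-rises.
print(posterior(prior_a=0.5, p_b_given_a=0.3, p_b_given_not_a=0.1))
```

Under these assumed inputs, seeing a match moves the probability of a rise from 50% to roughly 75%. Each additional factor (volatility of the topK list, size of the expected outcome) could in principle be folded in the same way, by treating the current posterior as the next prior.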

All in all, I received quite a bit of helpful information from these conversations. Looking ahead, I’ll definitely want to write a full-length post about quantum computing and Bayesian Statistics, as I believe these are two hugely important topics in the current world of quantitative economics.

Book Review: Christopher Steiner’s “Automate This”

As part of my independent research on using ML-generated embeddings to map patterns in market behavior, I have been reading various books which can provide insight into the field of quantitative finance. These books cover topics ranging from statistical modelling, mathematics, and programming to general macroeconomics. So far, I’ve read Boomerang by Michael Lewis and The Black Swan by Nassim Taleb.
Last week, I finished reading Christopher Steiner’s bestseller Automate This. I originally planned on reading Marcos Lopez de Prado’s Advances in Financial Machine Learning, but with the influx of work during finals week, I wasn’t in an ideal place to study a dense mathematics textbook. So, I’ve postponed this as my next endeavor, and I will write a corresponding report later on.
In his book, Steiner covers various modern implementations of machine learning algorithms in different fields, spanning from traditional quantitative finance to music recognition to matching personality types for dating. Steiner seems to firmly believe in a future dominated by algorithms and machine learning, as he concludes with this powerful outro: “There’s going to be a lot of work in the future for those who can write code. If you can also conceive and compose intricate algorithms, all the better — you may just be able to take over the world. That is, if a bot doesn’t do it first” (220).

Wall Street’s Obsession With Speed

The book begins with the story of Thomas Peterffy, the Hungarian-born billionaire who famously took over Wall Street in the 1980s. This excerpt in and of itself is a fascinating tale, as Steiner claims Peterffy was the first person to ‘hack’ financial markets. It was Peterffy’s work which started the race to automate trading, and it is a primary reason why nowadays, more than 60% of all trading is done through digital algorithms. Peterffy began by writing simple computer code which received real-time market data as an input, and through a series of calculations, decided whether or not an option was over- or undervalued. From this information, Peterffy could purchase options before other traders (who were all working manually at the time) discovered this inefficiency. Those working without computers were always too late, and could never beat Peterffy to profit. Peterffy continued refining his algorithm, to the point where he integrated the newly-discovered Black-Scholes model for pricing options into his algorithm.
This was all happening 30+ years ago, but the basic principles behind Peterffy’s strategy remain relevant. Traders realized that speed was the new gateway to profit, and no one can work faster than computers. This incited a ‘digital revolution’ on Wall Street, where all the best trading firms began hiring engineers, mathematicians, and computer scientists in the hopes that they could create new, faster algorithms which outpaced the competition. Steiner elaborated on this concept by introducing the story of Daniel Spivey, the man who decided to build a brand-new dark fiber cable line to connect the Chicago Mercantile Exchange with Wall Street. Spivey correctly believed that having faster exchange of information between these two exchanges would allow for more time to spot inefficiencies in prices, and more time to profit from arbitrage. This reminds me of Tom Connerly, who I interviewed earlier this year, and his spiel on how HFT companies are now investing in shortwave trading lines in hopes that these can transfer information faster than fiber cables. I’ve seen this in the works of Alexandre Laumonier as well, on his blog Sniperinmahwah.
Steiner then goes on to explain the origins of algorithms, which he traces back to Carl Friedrich Gauss and Gottfried Leibniz. I found it interesting that “The mathematician [Leibniz] stipulated that cognitive thought and logic could be reduced to a series of binary expressions. The more complicated the thought, the more so-called simple concepts are necessary to describe it” (58). This is nothing new, as binary code in computer hardware has existed for more than half a century, but I think the concept of taking complicated tasks and partitioning them into hundreds of simple tasks is very powerful. This goes far beyond programming and investing, as this rule can be applied for tackling issues in one’s life or visualizing daunting problems, one step at a time.
Steiner also interweaves a handful of additional narratives about algorithms and their applications, but it would be more interesting to experience these on your own rather than have me repackage them poorly. One which stood out to me was the story of David Cope, who wrote a series of algorithms which could compose original classical music indistinguishable from the works of Bach and Beethoven.

The real difference between East and West Coast

Toward the end of his book, Steiner details the shifting dynamic between the East and West Coasts of the United States. As is still the case, investors on Wall Street develop their quantitative strategies by recruiting the best engineering minds directly out of college, mainly students with mathematics, physics, and computer science degrees. What I didn’t know is that because of the vast surplus of wealth in financial industries, especially from 1995-2008, Wall Street firms would happily offer up to $200K starting salaries just to claim the most promising college students. This meant that from 2000 to 2008, firms in other industries, such as biotech, medical, and technological research institutions, had a shortage of new engineering employees. After the crash of 2008, when Wall Street lost the prestige and esteem it had held for the past decade and a half, these quants went directly to Silicon Valley to work at startups. The rise of tech startups like Facebook, Amazon, and LinkedIn is correlated to this influx of engineers moving from New York City and Connecticut to San Francisco and Palo Alto.
This was fascinating to me, as this is yet another pattern embedded in human behavior which can be used to make predictions. Right now, for instance, Wall Street is back on the rise, with record returns and a continued push for improved trading algorithms. At the same time, tech firms like Facebook, Google, and Tesla have all been receiving negative coverage in the media for a multitude of reasons. Specifically, I think of Facebook’s recent scandals in breaching user privacy, where some users have theorized that Facebook is secretly tapping into phone microphones to pick up on keywords which can be used to generate more specific ads. This is clearly a time when Silicon Valley’s most talented engineers might decide to make the move to the East coast.
With this knowledge, soon after the next financial crash, it would be advisable to start investing in tech and biomedical companies, as they pick up all the employees who will inevitably ditch Wall Street and Wacker Drive.

Anecdotal Evidence for NLP in Quant Finance

Of all the anecdotes and stories in Steiner’s book which stood out to me, there’s one that I will surely remember for many years to come.
On page 179, Steiner references the work of Robert Mercer and Peter Brown, two computer scientists who worked as researchers at IBM in the 1990s. In an effort to create software which could accurately translate text from English to French without being hampered by the many obscure rules and counter-intuitive idioms in each language, “The men created machine-learning algorithms to look for patterns in twin-texts. Where others had tried to solve the problem with elegant code that attempted to reproduce the grammatical structures of different languages, Brown and Mercer employed ‘dumb’ software and brute force.”

“Brown and Mercer then built a set of algorithms that tried to anticipate which words would come next based on what preceded it.”

Brown and Mercer’s breakthrough didn’t go unnoticed on Wall Street. They left IBM in 1993 for Renaissance Technologies, the hedge fund. Their work developing language algorithms could also be used to predict short-term trends in the financial markets, and versions of their algorithms became the core of Renaissance’s best funds. During a run powered by Brown and Mercer’s work, Renaissance went from $200 million in assets in 1993 to $4 billion in assets in 2001.
Renaissance Technologies is widely heralded as the best hedge fund in the world, in terms of performance, and I’ve been fascinated by this company for a while. Shrouded in secrecy (most hedge funds are, but this one is really secretive, as you can infer from their website), the only thing that’s well-known about their investing strategy is their adherence to mathematical and statistical models. This is the first time I’ve read that RenTec saw the potential of NLP in finance over 25 years ago.
Now, I’m not saying that just because I’m working on NLP embeddings in finance I will create the next Medallion Fund of the world. I’m referencing this because it’s nice to see affirmation for the work I am doing. It’s especially nice when this affirmation comes from the most successful hedge fund in the country (you could argue that colleges are more successful as hedge funds than anyone on Wall Street will ever be, but save this argument for another day). I truly believe the power of context-learning A.I, whether it be similar to NLP models or not, has untapped potential in finding patterns in all of the world’s many markets.

The True Future for Trading Algorithms?

This idea of defeating competition through pure speed remains true today. Yet, it appears as if we are reaching a ceiling. As Steiner himself affirms, Wall Street has been saturated with the best engineering minds on the planet who have cycled through thousands of mathematical and statistical strategies for the past 30 (going on 40) straight years. As a result, there is little room today for profiting off trading algorithms which rely solely on rigorous math-based models. This has two implications:

  1. We need new, creative approaches which find patterns within the markets that cannot be reached through traditional mathematical/statistical modeling. This is part of why I am so curious about using NLP embeddings for mapping patterns in the price movements of stocks/ETFs. I believe that, much like how Word2Vec has surprised us in unearthing new patterns within familiar text, these embeddings can discover new similarities between sequences of price-shifts in the markets.
  2. The next step in speed and efficiency needs to be a leap, not a step. This is why I, as well as many others, believe that quantum computing will be at the core of trading algorithms within the next 10-20 years. This is a relatively new revelation for me, and I definitely won’t be using quantum computers to run my embedding-based trading algorithm, but I will definitely continue researching this topic.


A.I and the Stock Market: The Training Dilemma

As I create the embeddings which I will use in my upcoming trading algorithm, I am faced with yet another question. When training a neural network on stock market data, is it best to train your network on specific stocks/ETFs (embeddings tailored to a specific security, such as AAPL_embeddings for example) or to train your network on all data available?
If I train my network on data from all stocks and ETFs, this would give us more information to learn from, and would produce a more experienced neural network and more meaningful embeddings. For me, this would allow for more breadth in identifying patterns in the markets, since there are more historical embeddings in my data set. However, this could confuse things, as maybe there are patterns which are specific to particular stocks or ETFs, and these patterns would be lost if we train on thousands of other data points. If I train my neural net solely on historical AAPL stock prices, my trading algorithm would only be able to analyze current AAPL stock prices, and try to match these current prices to patterns in AAPL stock prices. This should mean that when a match is found, this match is more likely to be correct (because our embeddings are so specific to AAPL prices).
My opinion is that I should train my network on all of my data, since the market patterns I’m looking for are universal (e.g: debt-driven market cycles and short-squeezes are not isolated to single stocks, so these patterns would appear through the behavior of all stocks and ETFs). Also, the fact I’m using % price shifts as data points should make training even more universally applicable.
Essentially, there’s a trade-off between specificity and quantity of data. This question isn’t limited to stock market AI, as it is applicable to all forms of artificial intelligence and neural networks. It’s actually quite an interesting discussion. Some, such as Google’s Peter Norvig, argue that the algorithms of today are no better than the algorithms we had in the past, it’s just that we have more data and better technology to train these algorithms. Gottfried Leibniz detailed binary systems and logical deductions some 300+ years ago, it’s just that he didn’t have the right technology to implement his systems on. Nowadays, all modern computers use hardware based off of his binary calculating system.
In cases of supervised machine learning, you definitely want more specific data over more data, as supervised learning algorithms train on labelled examples which have to be carefully (and accurately) sorted. Consider an algorithm which aims to recognize the letter E. You would need to train this algorithm on thousands of pictures of the letter E, and thousands of pictures of letters that aren’t E. In each training example, you need to tell the computer which letter is/isn’t an E, so it knows what it is learning. Simply feeding millions more letters into your training set without carefully labeling them will only damage the accuracy of your system. Here’s one example of this, where Stanford researchers found that more data isn’t better for automatically classifying chest X-rays.
Thankfully, NLP A.I is almost always done in forms of unsupervised learning. Think of Word2Vec, which learns word relationships by reading through a corpus of text. It doesn’t matter which corpus of text you use, and you don’t need to label each word since Word2Vec learns through context. This means that the more data (text) you train your Word2Vec model on, the more accurate your algorithm will be for predicting which words appear in which contexts. And that’s not just me speculating:

This is taken from Scaling to Very Very Large Corpora for Natural Language Disambiguation, a 2001 paper published by Microsoft NLP researchers. In their paper, they compare the performances of various natural language processing algorithms. In the graph above, we see how the accuracy of four (drastically) different algorithms increases (almost linearly!) as more words are added to the training set. Word2Vec shares this property, as is acknowledged in various papers, such as Improving the Accuracy of Pre-Trained Word Embeddings for Sentiment Analysis: “The accuracy of the Word2vec and Glove depends on text corpus size. Meaning, the accuracy increases with the growth of text corpus.”
Taking all of this into consideration, I still think it is best to train my Word2Vec embeddings on as much stock market data as possible, rather than creating specific embeddings which focus on one stock/ETF at a time. One thing I will mention is that I am shocked to see how little literature there is on the topic of training RNNs on stock market data, with specific reference to this question of ‘more data vs. better data’. Once again, I think this comes down to two possibilities: Either people aren’t training NLP machine learning models on long-term stock market data, or they just don’t want to share their findings. I still think it’s the latter.

New Applications for BERT, in Solving Sequencing & TopK List Problems

This year, I have been researching whether context-learning A.I (Word2Vec in my case) can be used to match patterns in the stock market. The premise is this: Since Word2Vec can understand which words are similar by learning in which contexts they appear, it makes sense that Word2Vec could understand which events in the stock market are similar through the same learning process. At this point, I’ve defined a ‘stock market event’ as a % shift in the closing price of a stock over a day (from close to close). This is subject to change as I experiment with results, but I chose this because % shifts capture both the magnitude and the direction of the price movement, which are both crucial in defining an event in the stock market. Another key advantage of this system is that % shifts will never have the problem of homonyms: Words, such as ‘store’, have multiple meanings (“I went to the store”, “I need to store this item”, “I have something in store for you”), and Word2Vec doesn’t know this. Instead, Word2Vec and similar embedding systems load all data into single embeddings and assume that the word ‘store’ possesses the same meaning in every context. With % price shifts, we do not have this problem, as there is no ambiguity as to what +2.4% means.
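As a rough illustration of this definition, the sketch below turns a list of daily closing prices into close-to-close % shifts, and then into 'words' that a Word2Vec-style model could read. The rounding of shifts into 0.5% buckets is my own assumption for the sake of the example (some form of discretization is needed to get a finite vocabulary), and the function names pct_shifts and to_tokens are hypothetical.

```python
from typing import List


def pct_shifts(closes: List[float]) -> List[float]:
    """Close-to-close % change for each day after the first."""
    return [
        (today - prev) / prev * 100.0
        for prev, today in zip(closes, closes[1:])
    ]


def to_tokens(shifts: List[float], step: float = 0.5) -> List[str]:
    """Round each shift to the nearest `step` % and format it as a 'word'.

    The bucket size is an assumed parameter. Tokens are printed with one
    decimal place, so `step` should be a multiple of 0.1.
    """
    tokens = []
    for shift in shifts:
        bucket = round(shift / step) * step
        tokens.append(f"{bucket:+.1f}%")
    return tokens


closes = [100.0, 102.0, 99.96]         # made-up closing prices
print(to_tokens(pct_shifts(closes)))   # one token per trading day after the first
```

Smaller buckets keep more detail but produce a larger vocabulary with fewer examples of each 'word', so the bucket size is one more parameter to experiment with.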
Interestingly enough, Google AI’s new BERT system, which I referenced a couple posts ago and which I am continuing to learn about, handles the problem of homonyms by creating context-dependent embeddings. Homonyms are dealt with because BERT is built on transformers, which look at a whole sentence at once (Word2Vec, by contrast, is a shallow neural network which learns one fixed embedding per word from small context windows). Transformers start by creating an embedding for each word in a sentence, like Word2Vec does. But the same word can appear in different sentences (with different contexts), so to address this problem, the transformer then scores every pair of words in the sentence through a mechanism called self-attention, which learns how strongly each word should influence every other word. For example, in the sentence “The mighty USS Nimitz blazed through the Pacific, leaving nothing but lead and destruction in its wake.”, one such word pair is (mighty, Pacific). These words appear in the same sentence, so they are related, but if the model assigns this pair a low attention weight, its influence is rather weak. These pairwise weights are then factored into the embedding for each word, so the embedding for (mighty) is adjusted based on its relationship to (Pacific) and to every other word in the sentence. This offers a more in-depth learning approach for each word, which could be more insightful than the Word2Vec model that only learns from the small windows surrounding words, rather than the entire sentences they appear in.
There’s still a lot to unpack within BERT and NLP Transformers, which is to be expected considering how new this model is, but there are already people claiming BERT is paving the way for the future of NLP because of how versatile it is, and how it can extract more information from smaller datasets thanks to masking and transformers. Here’s a really great article which covers some of BERT’s technical components in more detail, and if you have any similar articles which might be helpful, please send them my way.

The Sequencing Problem (and Solutions)

So, back to my main point, which is that once I train my Word2Vec model on the stock price data, I will then have a data set of embeddings. This data set is static for training purposes, as we don’t have to keep updating embeddings for the historical stock data we have since these numbers aren’t changing. Another plus is that with every passing day, we acquire more and more data which can be factored into our embeddings, meaning a more experienced neural network.
Once I have this data set at my disposal, I can begin using real-time data to search for matches within my dataset. First off, there’s the problem of sequencing, which is: Are we trying to find matches over 3 day periods? Over 1 day periods? Over 10 day periods? If we’re looking for similar % shifts in a stock’s price, it’s likely not significant enough if we find two single-day shifts which are contextually similar to one another, because this is too small of a time window to extrapolate a pattern from. At the same time, if we are only looking for strings of 15 consecutive embeddings which match one another (this would mean 15 days in this case, because each embedding represents a % shift over one day), a match would be much more significant, but we will also likely never find two fifteen-day periods which match one another. That time period is just too large.
Ideally, we would want to loop through all the plausible possibilities so that we don’t miss any significant patterns. I addressed this problem of sequencing in one of my earlier posts, and there are now three clear solutions:

  1. Follow the concept of Fingerprints used by Shazam: Once you find two embeddings that match (call them Q and D), check to see if their neighbors match as well. That is, compare Q+1 to D+1, Q+2 to D+2, Q+3 to D+3, …, Q+n to D+n, where ‘n’ is the max window size you’re looking at (Note: We would also want to compare Q-1 to D-1, Q-2 to D-2, …). This way, we can gauge how significant of a match there is between the real-time data we are observing and the historical stock price embeddings, as a match spanning over 4 days would be more significant and unique than a match spanning over 2 days. The magnitude of the match will then factor into our Confidence Score, which our algorithm uses to decide whether or not we have found an alpha (opportunity for profit). I am working to implement this system into my algorithm because it is the most flexible, as you can easily change your parameters (n) and you aren’t making any changes to the embeddings as you go along. Also, all of my stock price shifts are linked to one another by date, so it is easy to access previous_embedding and next_embedding.
  2. Create different classes of embeddings: Each class represents a different time period. So, you will have 1-day embeddings, 2-day embeddings, 3-day embeddings, 4-day embeddings, and so on. You can still compare these embeddings to look for matches, but you will have much more data to search through. With this approach, we also don’t have to worry about sequencing, since all potential sequences are covered. We can accomplish this using Universal Sentence Encoder to convert sequences of price shifts (‘sentences’) into a single input, or even BERT’s sentence-embedding capabilities. This is a really interesting potential approach, which I will consider implementing, mainly because you can generate 10+ times more embeddings from the same amount of data. One problem with this approach, which I can see right away, is what happens when a 2-day embedding matches with a 5-day embedding? What does this mean? This essentially means a certain sequence of stock price changes over 2 days is similar to a sequence of stock price changes over 5 days. This is certainly interesting, but I don’t know how to treat this information (are these sequences really similar if they occupy different time windows?). To avoid unnecessary complication, I will avoid this approach in my algorithm’s first trials, but it is still definitely worth looking into.
  3. BERT Next_Sentence predictor: I’ve discussed this superficially in previous blog posts, but the BERT model offers a function which takes two sentences as inputs, and then decides whether or not the second sentence could logically follow the first one. This approaches the sequencing problem in reverse, by taking the real-time data as an input and looping through possible outcomes to figure out which one is most likely to happen.

These are the three solutions to the sequencing problem I’ve identified, and I’m sure there are other solutions out there, so I will keep researching this in the upcoming weeks. In the meantime though, I think solution #1 is most viable.
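To make solution #1 more concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the code used in this project: the function names, the use of cosine similarity as the "match" test, and the 0.9 threshold are all hypothetical choices.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_span(query, history, q, d, n=3, threshold=0.9):
    """Count how many consecutive days match around the pair (Q, D).

    query, history: 2-D arrays with one embedding per day (rows ordered by date).
    q, d: row indices of the matching pair Q and D.
    n: maximum number of neighbors to check on each side.
    threshold: assumed similarity cutoff for calling two embeddings a match.
    """
    if cosine(query[q], history[d]) < threshold:
        return 0
    span = 1
    for step in (1, -1):  # walk forward (Q+k, D+k), then backward (Q-k, D-k)
        for k in range(1, n + 1):
            qi, di = q + step * k, d + step * k
            if not (0 <= qi < len(query) and 0 <= di < len(history)):
                break  # ran off the end of the available data
            if cosine(query[qi], history[di]) < threshold:
                break  # the run of matching days ends here
            span += 1
    return span

# Toy data: six mutually dissimilar "historical" embeddings,
# and a 3-day query copied from days 2..4 of that history.
history = np.eye(6)
query = history[2:5]
span = match_span(query, history, q=1, d=3)
```

A longer span would then count as a stronger match when it is fed into the Confidence Score.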

The TopK List Problem

Another problem to consider is how to interpret TopK lists.
When you input a query into my Word2Vec code, it returns a list of the top k (k is just a number which you can adjust, so it can be the top 10 list or just the top 1) embeddings which most closely match the query.
If you zoom into this image (sorry about the sizing — For some reason, Linux has terrible screenshot software. I’ll get better pictures up soon), you can see the top 3 passages which most closely match the word ‘debt’. My intuition was that when I input real-time stock data, it is only worth looking at the closest match (#1 in the topK list), since we’re trying to find similar patterns in the stock market’s behavior.
However, Andrew Merrill brought up a great point which I unfortunately neglected, which is to consider the whole topK list, rather than only the expected outcome of the closest match. See, if our program returns a topK list where the number 1 result suggests that the market will drop, the number 2 result suggests the market will go up immensely, and the number 3 result suggests the market will stay flat, then this is an overall uncertain topK list (even if the query embedding matches the data embedding with 95%+ similarity). We need to assign more value to a topK list with more consistent results, because this inspires more confidence in the overall projection (if all ten of the top ten closest matches suggest the market will go up, then this is a stronger signal that the market will actually go up).
One way to solve this issue is to calculate the volatility of the topK list, meaning how much the outcomes suggested by its matches disagree with one another, and factor this into our Confidence Score. The Confidence Score is the final value assigned to a potential investment, which currently takes into account the volatility of a topK list, the strength of the match (what % similarity), and the magnitude of the match (how many days in the sequence match one another).
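As a rough illustration of what the "volatility" of a topK list could mean, the sketch below takes the price change that followed each of the top k historical matches and measures how much those outcomes disagree. The function name, the choice of standard deviation as the measure, and the sample numbers are assumptions made for the example.

```python
import statistics

def topk_consistency(outcomes):
    """Summarize the outcomes that followed each top-k match.

    outcomes: list of returns (e.g. 0.02 means +2%), one per match.
    Returns (mean outcome, spread, agreement), where spread is the
    population standard deviation and agreement is the share of matches
    pointing in the most common direction (up or down).
    """
    mean = statistics.fmean(outcomes)
    spread = statistics.pstdev(outcomes)
    ups = sum(1 for r in outcomes if r > 0)
    downs = sum(1 for r in outcomes if r < 0)
    agreement = max(ups, downs) / len(outcomes)
    return mean, spread, agreement

# Hypothetical top-3 lists: one mixed (drop, big rise, flat), one consistent.
mixed = topk_consistency([-0.03, 0.08, 0.0])
consistent = topk_consistency([0.01, 0.02, 0.015])
```

Here the mixed list has a larger spread and lower agreement than the consistent one, so it would contribute less to the Confidence Score.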
Right now, my priority is figuring out what else should factor into the Confidence Score, and what’s a fair way to factor all of these parameters into a single score. I’m thinking of using Z-Scores, but I’m still learning.
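One possible way to use Z-Scores here is to standardize each factor across all candidate matches, so that similarity (a percentage), match length (days), and topK volatility end up on the same scale before being combined. The sketch below is only an assumption about how that could work: the equal weights and the sample values are placeholders, not tuned parameters.

```python
import numpy as np

def zscores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        return np.zeros_like(values)  # all candidates identical on this factor
    return (values - values.mean()) / std

def confidence_scores(similarity, span, spread, weights=(1.0, 1.0, 1.0)):
    """Combine the three factors into one score per candidate.

    Higher similarity and a longer matching span raise the score;
    a larger spread (a less consistent topK list) lowers it.
    """
    w_sim, w_span, w_spread = weights
    return (w_sim * zscores(similarity)
            + w_span * zscores(span)
            - w_spread * zscores(spread))

# Three hypothetical candidates.
scores = confidence_scores(
    similarity=[0.95, 0.90, 0.85],
    span=[4, 2, 1],
    spread=[0.01, 0.03, 0.05],
)
best = int(np.argmax(scores))
```

Because each factor is standardized first, no single factor dominates just because of its units; the weights are then the only place where one factor is deliberately favored over another.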

A.I + Finance Startups on the Rise

Finally, I came across a very interesting research startup which, from my understanding, uses AI to actively search for new investment opportunities and to optimize its portfolios. Here’s an excerpt from their website: “Solving Intelligence for Investment Management entails designing new Machine Learning methodologies to automate the alpha exploration process, so as to give an irrevocable and unfair advantage to (our) machines over the best human experts (Quants), in finding and exploiting new alphas, in making markets more efficient.” They don’t explicitly mention which AI models and strategies they use for their ‘alpha exploration process’, but it is interesting to see new startups focused on finding investment opportunities strictly through AI. I’m hoping to build my algorithm into a more general startup-esque venture similar to what they are doing, so I’m curious to see how things play out for them and which steps they take moving forward. If you’re curious to read more about some of their research, click here.

The Three Main Types of Trading: Thoughts from a Market Maker

Earlier today, I had the privilege to speak with Patrick Chi, a current student at Columbia University who has worked closely with a handful of cryptocurrency and trading firms, such as Optiver. According to Mr. Chi, his experience rarely intersected with A.I in trading algorithms, since his work was primarily in pricing options, cryptocurrency arbitrage, and market making. This was quite an interesting and useful perspective to learn from, as Mr. Chi is a statistics major who only recently began dabbling with financial instruments and trading (a 180 degree turn away from my path, which has been learning trading strategies first and then studying the necessary statistical and mathematical models as they appear). Also, this conversation familiarized me with the concept of market making, which is explained later on in Mr. Chi’s own words.
So, without further ado, here is the transcript of our conversation:
S.A: Hi, thanks a lot for offering to talk, I really appreciate it. The reason I reached out to you is because this year, I have been working to get familiarized with quantitative finance in general, as this is something I’ve always been interested in and will likely pursue at college and beyond. As part of this, I’ve been trying to write my own trading algorithm as well. So, first off, I’m wondering how you got familiarized with this work in quant finance? Was it more from personal interest, or was it from school?
P.C: I actually became interested in trading because of the interview process most trading companies had, which was very focused on problem solving like you would see in math competitions and AMCs. A lot of their problems are similar to trading and probability problems, and the interviews are very interesting, and I’ve always liked this type of thinking and solving challenging math questions. Some examples were finding optimal trades to make given a certain scenario, making decisions to maximize expected value given certain constraints. Like, no trades = 0 expected value, good trades and hedging = positive expected value, but at the same time, a lot of trades which aren’t well calculated = negative expected value. 
S.A: Working at a company focused on algorithmic trading strategies, is there a stronger emphasis on mathematical modeling, such as statistical probabilities, or on A.I strategies when creating your algorithms?
P.C: In terms of actual building algos with A.I, I’m not the most experienced, because my work is mostly in market making and options. Also, it’s important to clarify what you mean by A.I, because most companies nowadays claim to use A.I because it’s such a buzzword. In reality though, A.I is not used as much as people think it is in trading. 
S.A: Could you tell me a bit about what market-making and arbitrage are?
P.C: From how I see things, there are three main types of trading: Hedge funds, HFTs, and market makers. Hedge funds trade long term positions and try to find bigger trends in the market as a whole. Hedge funds like Two Sigma are known for using statistical relationships. Another thing about these funds is that they try to find correlations which have not already been exploited, and there’s a variety of methods for finding these correlations. For example, and this is not true at all by the way, a hedge fund might notice that whenever Apple’s stock goes up, Amazon’s goes down, and then they will make investments based off this correlation. 
High frequency trading companies look at a lot of data, and establish a bunch of positions which they take in seconds and then sell a couple seconds later. They also retrain their models every couple days, so there’s a lot of active management going on. The reason why they retrain is because there’s so much of this intraday data, and the patterns might change or new patterns might emerge. An example of how a HFT would work would be: Oh, we’ve seen this pattern of behavior in intraday data before, and 90% of the time this will return profit. Mind you, the profits are usually a couple cents, but they’re in such high volume that there’s a lot of money to be made. Using current market conditions, they make this judgement, and this is all happening in a matter of seconds. 
Market making is where you are the market itself, so we set buy and sell prices based on what people are willing to pay. Essentially, the spread between two assets is calculated by market makers, and we are the ones who provide and enable liquidity. Prices are pretty much determined based off of expected value, and I set a price where there will be an equivalent number of buys and sells. This is probably one of the safest forms of trading, because there’s less risk and dependence on volatility. 
S.A: So what’s one thing you guys consider most heavily when gauging the value of an option? My only experience with options is that they’re used to hedge other investments, so I’m wondering if you guys had other uses for them?
P.C: Options are much harder to price than stocks. We look at order books and other probability models. It kind of works by asking what do you think is the probability that an asset will be above/below a price at a certain time, and then we calculate a distribution of prices. It’s funny because the models we use are not very applicable to real life, as in reality, markets just keep going up long term. What’s harder to price is the probability that we finish above a threshold, which depends a lot on volatility. Nowadays, bigger companies mostly trade in options, because they’re way more profitable since everyone knows what the value of a stock is. 
As a novice investor, you would be taking the prices, rather than making them. There’s a lot less volume with a personal investor, so you can’t really have an influence on the market, which means you are kind of powerless in changing these prices like a bigger company might be. 
S.A: What do you think all of this says about the premise of trying to match larger patterns in the markets? 
P.C: From what you’ve described, which is using A.I to match patterns, it sounds like your work is more related to HFT. I’d recommend starting out trading with crypto, since you can start off with small amounts of money and get used to how the markets work. You can also learn a lot about expected value vs. current value doing this, which all boils back down to expected value and probability.
I don’t mean to discourage you, but macro scale is already priced-in, as long term trading is pretty much finding correlation between two instruments. This is primarily because long term doesn’t have enough data, and not nearly as much data as HFT does. There’s just too much noise with the data you’re working with. 
S.A: What are some good resources you recommend I look at, as a beginner in quantitative trading? Or some other people I could reach out to?
P.C: Well, I do have some friends working in HFT, who would probably be useful to you considering your focus on A.I and data analysis. But, it’s always hard with high frequency traders, as they’re the ones who need to hide trade secrets to preserve their advantages over competition. Also, you might not know this, but HFT is actually a much smaller, much more specialized field than most people think. There’s actually not that many companies out there doing HFT, just because it’s a pretty new thing and there’s so much competition over who receives data the fastest, among other things. I’ll definitely try to connect you with them, though.


There’s a lot of useful information to unpack in this conversation, so I’ll give a rundown of what I found most intriguing.
First off, Mr. Chi’s explanation of high frequency trading strategies was worth noting. According to him, HFT works by finding patterns in intraday market data, which makes sense. There must be some trends in this type of data, like the book pressure that Mr. Conerly mentioned in my interview with him. A simple example of such a pattern is that whenever there are more buy offers than sell offers, the price will go up. This may seem self-evident, but it is still a form of repeating pattern which a neural network would probably pick up on. This is important because it acknowledges the existence of patterns in market behavior, and acknowledges that A.I is being used to identify these patterns. This is more affirmation than I can ask for in reference to my project. Like I’ve mentioned, my work with using embeddings for mapping market behavior depends entirely on the existence of repeating patterns in the markets.
A second intriguing takeaway is Mr. Chi’s take on the usefulness and applicability of options. According to him, most big traders nowadays rely on options for several reasons. For instance, options are more versatile, in that you can purchase a put if you predict a downturn and a call if you predict the opposite. In addition, options allow you to choose the price (the strike) at which you set your buy/sell point. You don’t simply buy an option; you buy a call at $280 for March 22nd, for example. This gives you the right to buy the stock at $280 until March 22nd, and if the price rises far enough above $280 to cover what you paid for the call, you will make a profit. So, there are plenty of factors that come into play when purchasing an option which give you more control over your investments. Purchasing stock is rather one-dimensional, as you only profit if the price goes up (unless you’re shorting), and all the investors in the world know this, meaning that you have much less edge over your competition.
With all that in mind, I might consider purchasing options rather than stocks through my trading algorithm, but only after I see positive results. Options, as Mr. Chi mentioned, can be much more profitable and versatile than simple stock positions, but at the same time, they are much riskier and more expensive. They’re expensive because each contract covers 100 shares, meaning if an option is priced at $2.00, then the least you can expect to pay is $200.
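The arithmetic behind buying a call can be sketched in a few lines. This uses the $2.00 premium and $280 strike from the examples above; the prices at expiry are made-up values, and the sketch assumes the call is held to expiry and ignores fees and commissions.

```python
CONTRACT_SIZE = 100  # shares covered by one standard US equity option contract

def call_cost(premium, contracts=1):
    """Up-front cost of buying calls quoted at `premium` per share."""
    return premium * CONTRACT_SIZE * contracts

def call_profit_at_expiry(price, strike, premium, contracts=1):
    """Profit or loss on a long call if held until expiry.

    The call is worth max(price - strike, 0) per share at expiry;
    the premium paid up front is subtracted from that.
    """
    intrinsic = max(price - strike, 0.0)
    return (intrinsic - premium) * CONTRACT_SIZE * contracts

cost = call_cost(2.00)                                   # 200.0
profit_up = call_profit_at_expiry(290.0, 280.0, 2.00)    # 800.0
profit_down = call_profit_at_expiry(270.0, 280.0, 2.00)  # -200.0
breakeven = call_profit_at_expiry(282.0, 280.0, 2.00)    # 0.0
```

The worst case for the buyer is losing the full $200 paid, while the break-even point sits at the strike plus the premium ($282 here), not at the strike itself.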
Finally, the “no trades = 0 expected value, good trades and hedging = positive expected value, but at the same time, a lot of trades which aren’t well calculated = negative expected value” reference reminded me of my post on loud silence, and how most of the time in trading, it’s best to remain uninvolved until you find a great entry point.