Word2Vec is a publicly available model for training shallow neural networks on text. It gets its name (Words –> Vectors) from the fact that these networks, trained on large datasets, eventually produce a unique, weighted, multi-dimensional vector for each word in the dataset. The weights of each word's vector are established by analyzing the contexts in which that word appears (given a window size) and adjusting the vector based on the vectors of the words it appears alongside.
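To make the "context within a window" idea concrete, here is a minimal sketch of how Word2Vec builds its training signal: for each word, it pairs that word with every neighbour inside a fixed window. (The toy sentence and window size below are my own illustration, not data from my project.)

```python
# Toy illustration of Word2Vec-style training pairs: each word is paired
# with the neighbours that fall inside a fixed-size context window.

def context_pairs(tokens, window=2):
    """Yield (center, context) pairs for every word in the token list."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the ship sailed past the island into the sea".split()
print(context_pairs(sentence, window=2)[:4])
```

These (center, context) pairs are what the network actually trains on; words that share many contexts end up with similar vectors.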
In the image above, you can see a plot of the words found in a certain dataset. (I ran the program that created this plot, but the data is not mine, nor is it the data I want to use for my project; I used it solely as an initial example.) The neural network trained on this data ran for only 10,000 steps, which is not much at all. The idea behind neural networks is that the more you train them, on more and more data, the more accurate and meaningful the results become. Despite the small number of training steps, you can still see accurate word connections in the plot. For example, take a look at the light-blue dot on the far left, labelled ‘island’. Right next to it (the nearest dot should be the most closely associated word) is the purple dot labelled ‘sea’. ‘Island’ is not a synonym of ‘sea’, but the two words are related by associated meaning.
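"Nearest dot" in the plot corresponds to a distance that can be computed directly on the vectors, most commonly cosine similarity. Here is a small sketch of that comparison; the 4-dimensional vectors are made-up toy values, not the real trained weights behind the plot.

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings: 'island' and 'sea' point in similar directions,
# 'king' does not.
vectors = {
    "island": [0.9, 0.1, 0.8, 0.0],
    "sea":    [0.8, 0.2, 0.7, 0.1],
    "king":   [0.0, 0.9, 0.1, 0.8],
}

print(cosine(vectors["island"], vectors["sea"]))   # high similarity
print(cosine(vectors["island"], vectors["king"]))  # low similarity
```

A 2-D plot like the one above is just a projection of these high-dimensional distances down onto the page, so nearby dots are words with high cosine similarity.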
This brings me to my bigger point: Word2Vec can be used not only to find related words, but also to identify bigger connections. This leads me to believe that Word2Vec can be used in trading algorithms, by analyzing current events and trends and seeing whether they map to previously identified ones. Since we know the outcomes of previous events, we can use that knowledge to predict the outcome of a current event. Interestingly enough, this idea has been experimented with before: I was able to find two papers looking at Word2Vec for trading algorithms (only two, since Word2Vec is still relatively new technology). One was written by Tal Perry and the other by Alexandr Honchar. Reading their research was insightful, and I would definitely want to interview at least one of them to learn more in depth about their approaches and experiences.
This gives me a clear sense of the course my independent research will take this year. Below is my general back-end plan:
- Write Word2Vec search
- Improve the search by adding more focused datasets
- Test my Word2Vec search for its ability to notice trends
- Study trading algorithms
- Implement Word2Vec, Black Swan principle, hedging → algos
- See how Word2Vec-based trading algorithms compare to standard trading algorithms
So far, I am well into part 1 of my plan: I have successfully written a search algorithm that uses Word2Vec to find passages, words, and events relating to a given query. The user provides a query and the number of desired results, and my program returns the vectors closest to the query, ordered by relevancy. One interesting thing I have observed in my first round of testing is that results ranked lower in relevancy are often strikingly useful, sometimes more relevant than results ranked above them. This could mean that I need to rework some of my code, or simply that I need to train the neural net on a broader dataset. My intuition is that the latter is true, so I am currently working on finding useful training datasets. So far, I have been looking at Wikipedia articles, Quora posts, Investopedia pages, etc.
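The ranking step described above can be sketched as follows: score every stored vector against the query vector and return the top n, most relevant first. The vector values are toy placeholders and `search` is a hypothetical name for illustration, not the actual function in my code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def search(query_vec, vectors, n):
    """Return the n entries closest to query_vec, most relevant first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Toy index of embedded phrases (placeholder vectors).
vectors = {
    "oil prices": [0.9, 0.2, 0.1],
    "crude supply": [0.8, 0.3, 0.2],
    "election": [0.1, 0.9, 0.4],
}

print(search([1.0, 0.2, 0.1], vectors, 2))
```

This brute-force scan is fine for small indexes; for large datasets an approximate nearest-neighbour structure would replace the linear pass.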
I am happy with how things are going on the programming end, but I still haven't secured any interviews, and I don't want to delay that any further. My list of potential interviewees has grown to about 17, with my goal being 25+. If anyone knows people working on original trading algorithms or Word2Vec AI who would be willing to talk for a few minutes (or even just answer some questions over email), please let me know in the comments below.