The Data Daily

How To Use Machine Learning To Possibly Become A Millionaire: Predicting The Stock Market?

Our confidence interval is somewhere between 50 and 70%
Aug 30 · 8 min read
When you’re so bored with your stacks
Working on Wall Street is just as intense and rewarding as you would imagine. Lots of suits and lots of sullen faces and lots of cigarette smoke. Amidst all of the craziness you’d expect from the literal financial center of the world, the actual underlying goal of everyone there is pretty simple. At the risk of oversimplifying things, I’ll tell you right now that finance is simply using money (either your own or some you’ve borrowed) to get more money. The financial industry doesn’t actually create any value; rather, it uses other factors to get returns on investments.
The stock market is one of the most well-known infrastructures through which anyone can potentially make a fortune. Anyone who could crack the code for predicting future stock prices would practically rule the world.
There’s just one problem. It’s pretty much impossible to accurately predict the future of the stock market. So many analysts, so many researchers, so many super smart people have tried to figure it all out. No one has been able to garner consistent results. No one.
So what’s the point of this article? Why am I writing about using machine learning to possibly predict the stock market? Mostly just for fun, I guess. More importantly, however, it’s a great learning exercise for machine learning and finance.
Agenda
What You Should Do Instead
Areas of Improvement
Resources
If you want a more in-depth view of this project, or if you want to add to the code, check out the GitHub repository.
Using the Stocker Module
The Stocker module is a simple Python library that contains a bunch of useful stock market prediction functions. Out of the box, its predictions aren’t that accurate (you would do about as well flipping a coin). But with some tuning of parameters, the results can be a lot better.
First we need to clone the GitHub repository.
!git clone https://github.com/WillKoehrsen/Data-Analysis.git
We also need to import some libraries. Now that the repo is cloned, we can import the Stocker module as well.
!pip install quandl
import stocker
from stocker import Stocker
Let’s create a Stocker object. I chose Google as my company, but you’re not obligated to do the same. The Stocker module has a function called plot_stock() that does a lot by itself.
Google’s stock is very nice
If you pay attention, you’ll notice that the dates for the Stocker object are not up-to-date. The data stops at 2018–3–27. Taking a closer look at the actual module code, we’ll see that the data is taken from Quandl’s WIKI dataset, which no longer appears to be updated.
We can use Stocker to conduct technical stock analysis, but for now we will focus on being mediums. Stocker uses a package created by Facebook called prophet which is good for additive modeling.
Now let’s test the Stocker predictions. We need to create a training set and a test set. We’ll train on 2014–2016 and test on 2017. Let’s see how accurate this model is.
Look how terrible this prediction is!
The results are quite horrendous, with the predictions being almost as bad as a coin flip. Let’s adjust some hyperparameters.
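To make the coin-flip comparison concrete, here is a minimal sketch of a direction-accuracy check. It is not Stocker's own evaluation code; the function name and the sample prices are hypothetical. A score near 0.5 means the model calls the day-over-day direction about as well as a coin would.

```python
def direction_accuracy(actual, predicted):
    """Fraction of day-over-day moves where predicted and actual share a direction."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    hits = 0
    for i in range(1, len(actual)):
        actual_up = actual[i] > actual[i - 1]
        predicted_up = predicted[i] > predicted[i - 1]
        if actual_up == predicted_up:
            hits += 1
    return hits / (len(actual) - 1)


# Hypothetical prices, made up for illustration only.
actual_prices = [100.0, 101.0, 99.0, 102.0, 103.0]
predicted_prices = [100.0, 100.5, 100.8, 100.2, 101.0]
print(direction_accuracy(actual_prices, predicted_prices))  # 0.5, coin-flip level
```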
Here we can see the results of using different changepoints
Validating over a range of changepoint prior values is an effective way to tune this hyperparameter, which controls how flexibly the model follows changes in the trend.
Now we can evaluate the refined model to see if there are any improvements in the prediction estimates.
This is only SLIGHTLY better than the previous model
Now it’s time to do the ultimate test: try our luck in the stock market (simulated, of course).
Looks like it’s just better to buy and hold.
Even after all of that tweaking, it’s clear that simply buying and holding would produce better returns.
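The comparison can be sketched in a few lines. This is a simplified illustration, not necessarily how Stocker runs its simulation: the assumed strategy holds shares only on days the model predicts a rise, and it ignores fees and taxes. The function names and sample numbers are hypothetical.

```python
def buy_and_hold_profit(prices, shares):
    """Profit from buying on the first day and selling on the last."""
    return shares * (prices[-1] - prices[0])


def strategy_profit(prices, predicted_up, shares):
    """Profit from holding shares only over days the model expects a rise.

    predicted_up[i] is True when the model expects prices[i + 1] > prices[i].
    """
    if len(predicted_up) != len(prices) - 1:
        raise ValueError("need one prediction per day-over-day move")
    profit = 0.0
    for i, go_long in enumerate(predicted_up):
        if go_long:
            profit += shares * (prices[i + 1] - prices[i])
    return profit


# Hypothetical prices and model calls, made up for illustration only.
prices = [100.0, 104.0, 103.0, 108.0, 112.0]
model_calls = [True, True, False, True]  # one wrong call, one missed gain

print(buy_and_hold_profit(prices, shares=10))           # 120.0
print(strategy_profit(prices, model_calls, shares=10))  # 70.0
```

In a market that mostly rises, every day spent out of the market is a potential missed gain, which is part of why an imperfect model struggles to beat buy and hold.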
Preparing Data for Machine Learning
Now let’s move on to attempting to predict stock prices with machine learning instead of depending on a module. For this example, I’ll pull Google stock data with the make_df function Stocker provides.
Narrowing down the dataframe to get the stuff we care about
Moving Averages
In summary, a moving average is a commonly used indicator in technical analysis. It’s a lagging indicator, which means that it is computed entirely from past prices and therefore trails behind the actual price movement. It’s effective in smoothing out short-term fluctuations and finding the overall trend. We’ll use moving averages to see if we can do a better job of predicting stock prices.
Here are the closing prices for Google stock
Let’s measure the accuracy of our model with RMSE (root mean squared error).
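As a rough illustration, a moving-average forecast can be written as follows. This is a minimal sketch and not the article's original code; the function name, window size, and sample prices are assumptions.

```python
def moving_average_forecast(train, horizon, window):
    """Forecast `horizon` steps ahead, each as the mean of the last `window` values.

    Each forecast is appended to the history, so later forecasts are averages
    of earlier forecasts rather than of real prices.
    """
    if window <= 0 or window > len(train):
        raise ValueError("window must be between 1 and len(train)")
    history = list(train)
    forecasts = []
    for _ in range(horizon):
        next_value = sum(history[-window:]) / window
        forecasts.append(next_value)
        history.append(next_value)
    return forecasts


# Hypothetical closing prices that rise steadily.
closes = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0]
forecast = moving_average_forecast(closes, horizon=3, window=3)
print(forecast)
```

Even though the sample prices rise every day, every forecast stays below the last observed price of 110. That is the lag in action.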
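For reference, the metric is the square root of the average squared difference between actual and predicted prices. A minimal sketch, with hypothetical sample values:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values: the prediction stays flat while the price keeps rising.
actual = [110.0, 112.0, 114.0]
predicted = [108.0, 108.0, 108.0]
print(rmse(actual, predicted))
```

The result is in the same units as the prices, and because errors are squared before averaging, large misses are penalized more heavily than small ones.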
Now let’s see our prediction plotted next to the actual prices.
Yikes
In terms of figuring out the general trend of the stock data, the moving average method did okay, but it failed to see the full extent of the increase in the price, and that is not good. We definitely wouldn’t want to use this method for actual algorithmic trading.
Simple Linear Regression
Let’s try another method to predict future stock prices: linear regression.
First let’s create a new dataset based off of the original.
Now let’s add some more features to the dataset for the linear regression algorithm. We’ll be using some functions from the fastai module.
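To give a feel for what date-derived features look like, here is a small standard-library sketch. It does not call fastai; it only mimics the kind of columns such helpers produce, and the function and key names are hypothetical.

```python
from datetime import date


def date_features(day):
    """Turn a calendar date into simple numeric and boolean features."""
    return {
        "year": day.year,
        "month": day.month,
        "day": day.day,
        "day_of_week": day.weekday(),  # Monday is 0, Sunday is 6
        "day_of_year": day.timetuple().tm_yday,
        "is_month_start": day.day == 1,
    }


features = date_features(date(2018, 3, 27))
print(features)
```

Features like these let a regression pick up calendar patterns, but they carry no information about the company itself.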
Now let’s do a train-test split.
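With time series data the split should be chronological rather than shuffled, so the model never trains on prices from the future. A minimal sketch, assuming the rows are already sorted by date (the function name and the 80/20 ratio are assumptions):

```python
def chronological_split(rows, train_fraction=0.8):
    """Split date-ordered rows into an earlier training part and a later test part."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Ten hypothetical daily rows, oldest first.
rows = list(range(10))
train_rows, test_rows = chronological_split(rows)
print(train_rows, test_rows)
```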
Now we can implement the algorithm and get some results.
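As a simplified stand-in for the full model, the sketch below fits ordinary least squares with a single feature, the day index. The original analysis used more features; the function name and data here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least 2 points and equally long inputs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical prices that rise by exactly 2 per day.
days = [0, 1, 2, 3, 4]
closes = [100.0, 102.0, 104.0, 106.0, 108.0]
slope, intercept = fit_line(days, closes)
print(slope, intercept)       # 2.0 100.0
print(slope * 5 + intercept)  # forecast for day 5: 110.0
```

A straight line extrapolates the average past slope, so it cannot anticipate a sudden acceleration or reversal.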
Once again, the prediction algorithm somewhat figures out the general trend, yet it fails to capture what we need the most.
k-Nearest Neighbors
Let’s move on to yet another machine learning algorithm, KNN.
Let’s go through the same process with the same data we used for linear regression. The only difference is that we’ll be applying a different algorithm to the data. Let’s see which predictions are better.
What are our results?
What a horror story
Yikes! This is the worst prediction we’ve got so far! There’s a reason k-nearest neighbors is more useful for classification problems and small-scale regression. This may be partly a case of overfitting, but there is a more basic limitation: KNN predicts by averaging the target values of the closest training points, so it can never output a price outside the range it saw during training. When prices in the test period climb above anything in the training set, the model is unable to follow the trend. What’s next?
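A toy example shows why KNN struggles with a trending series. This is a minimal one-feature sketch, not the article's model; the function name and data are hypothetical.

```python
def knn_predict(train_x, train_y, query, k):
    """Predict by averaging the targets of the k training points closest to query."""
    if k <= 0 or k > len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    by_distance = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical prices rising by 2 per day over days 0..5.
train_days = [0, 1, 2, 3, 4, 5]
train_closes = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0]

# Day 10 is far beyond the training range; the trend suggests about 120.
prediction = knn_predict(train_days, train_closes, query=10, k=2)
print(prediction)  # 109.0
```

The prediction is an average of training targets, so it can never exceed the highest training price, no matter how far into the future we ask.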
Multilayer Perceptron
Let’s move into some deep learning, more specifically, neural networks. A multilayer perceptron is one of the simplest types of neural networks, at least simpler than convolutional neural networks and long short-term memory (LSTM) networks. We don’t need to get into the details of how the algorithm actually works. If you’re interested, check out the resources at the end of the article.
Let’s get our results.
This is even worse than KNN! There are several reasons why the neural network is so bad at predicting the stock prices, and one of them is definitely the lack of meaningful features and data. There are also many hyperparameters that could be tweaked.
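For readers curious about what sits under the hood, here is a deliberately tiny multilayer perceptron in plain Python: one input, one hidden tanh layer, one linear output, trained with full-batch gradient descent. It is a sketch for intuition only, not the network used above, and every name, size, and learning rate in it is an assumption.

```python
import math
import random


def train_mlp(xs, ys, hidden=4, epochs=500, lr=0.05, seed=0):
    """Train a one-hidden-layer network; return (predict function, loss per epoch)."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                                   # hidden biases
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                              # output bias

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    n = len(xs)
    losses = []
    for _ in range(epochs):
        gw1, gb1, gw2, gb2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            h, out = forward(x)
            err = out - y
            loss += err * err / n  # mean squared error
            gb2 += 2 * err / n
            for j in range(hidden):
                gw2[j] += 2 * err * h[j] / n
                # Backpropagate through tanh: d tanh(z) / dz = 1 - tanh(z)^2
                dz = 2 * err * w2[j] * (1 - h[j] ** 2) / n
                gw1[j] += dz * x
                gb1[j] += dz
        losses.append(loss)
        for j in range(hidden):
            w1[j] -= lr * gw1[j]
            b1[j] -= lr * gb1[j]
            w2[j] -= lr * gw2[j]
        b2 -= lr * gb2

    return (lambda x: forward(x)[1]), losses


# Hypothetical inputs and targets, both scaled to lie within [0, 1].
xs = [i / 9 for i in range(10)]
ys = [0.1 + 0.8 * x for x in xs]
predict, losses = train_mlp(xs, ys)
print(losses[0], losses[-1])
```

Note that both inputs and targets are scaled to a small range here. Feeding raw prices in the hundreds or thousands to a network like this tends to make training much harder, which is one of the many knobs worth checking.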
Conclusion
What did we learn today? What did all of this technical analysis show us? The answer is quite simple: if you’re not someone like Ray Dalio or Warren Buffett or any of the great investors, it’s very risky and ultimately not as profitable to try to beat the stock market. According to some sources, a majority of hedge funds can’t even do better than the S&P 500! Therefore, if you want to make the best returns on your investments, stick with a buy-and-hold strategy. For the most part, simply investing in an index fund that tracks the S&P 500 has yielded pretty good returns, even when there were several big drops in the economy. In the end, it’s up to you to decide.
Areas of Improvement
Thank you for taking the time to read through this article! Feel free to check out my portfolio site or my GitHub.
1. Use different stock data
I only used Google stock data and for a relatively small range of time. Feel free to use different data that can be pulled with Stocker or Yahoo Finance or Quandl.
2. Try out different machine learning algorithms
There are MANY machine learning algorithms out there that are very good. I only used a small subset of them and only one of them was even a deep learning algorithm.
3. Tweak more hyperparameters
This is pretty self-explanatory. More often than not, the default settings for any algorithm are not optimal, thus it’s useful for you to try out some validation to figure out which hyperparameters are most effective.
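As one concrete example of this kind of validation, the sketch below picks a moving-average window by comparing RMSE on a held-out validation period. It is a minimal illustration under assumed names and made-up data; the same loop works for any hyperparameter and any model.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rolling_mean_predictions(history, targets, window):
    """One-step-ahead predictions, each from the `window` real prices before it."""
    series = list(history)
    predictions = []
    for actual in targets:
        predictions.append(sum(series[-window:]) / window)
        series.append(actual)  # reveal the true price before predicting the next day
    return predictions


def best_window(train, validation, candidates):
    """Return the candidate window with the lowest validation RMSE, plus all scores."""
    scores = {
        window: rmse(validation, rolling_mean_predictions(train, validation, window))
        for window in candidates
    }
    return min(scores, key=scores.get), scores


# Hypothetical prices rising by 1 per day.
train = [100.0 + i for i in range(10)]
validation = [110.0, 111.0, 112.0]
window, scores = best_window(train, validation, candidates=[1, 3, 5])
print(window, scores)
```

The important detail is that the hyperparameter is chosen on data the model was not fitted to, and the final test set stays untouched until the very end.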