How to Use Data Science on the Stock Market
Explaining Data Science Concepts by Focusing on Financial Markets
Photo by M. B. M. on Unsplash
Data Science is a popular subject nowadays. Everyone is talking about data: what it can do and how it can help. Data is often represented as numbers, and these numbers can represent many different things, such as sales, inventory, consumers, and last but definitely not least, cash.
This brings us to financial data or more specifically the stock market. Stocks, commodities, securities, and such are all very similar when it comes to trading. We buy, we sell, we hold. All this in order to make a profit. The question is:
How can Data Science help us when it comes to making these trades on the stock market?
Data Science Concepts for the Stock Market
When it comes to Data Science, there is a lot of jargon that many people do not know. We are here to clear that up. Inherently, data science involves knowledge of statistics, math, and programming. If you are interested in knowing more about these concepts, I will be linking some sources throughout the article.
Now let’s jump right into what we all want to know: using data science to analyze the market. By analysis, we mean determining whether a stock is worth the investment. Let’s explain some data science concepts centered on finance and the stock market.
Algorithms
Photo by Franck V. on Unsplash
In data science and programming, algorithms are used extensively. An algorithm is a set of rules for performing a specific task. You may have heard of algorithmic trading, which is popular in the stock market. Algorithmic trading uses trading algorithms, and these algorithms involve rules such as buying a stock only after it has gone down exactly 5% that day, or selling if the stock has lost 10% of its value since it was first bought.
These algorithms are all capable of running without human intervention. They are often referred to as trading bots, since they are basically mechanical in their trading methods and trade without emotion.
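To make the idea of a rule-based trading algorithm concrete, here is a minimal sketch of the two example rules above. The function name `trading_signal`, the constants, and the prices are all hypothetical, and the sketch assumes "at least 5%" rather than "exactly 5%" so that the rule can actually trigger on real prices. It is an illustration, not a usable trading strategy.

```python
DIP_TO_BUY = 0.05   # assumption: buy after a drop of at least 5% from today's open
STOP_LOSS = 0.10    # assumption: sell after a loss of at least 10% from the purchase price


def trading_signal(current_price, open_price, purchase_price=None):
    """Return "buy", "sell", or "hold" using two fixed percentage rules."""
    if purchase_price is not None:
        # We already own the stock: check the stop-loss rule.
        loss = (purchase_price - current_price) / purchase_price
        return "sell" if loss >= STOP_LOSS else "hold"
    # We do not own the stock: check how far it has fallen today.
    drop_today = (open_price - current_price) / open_price
    return "buy" if drop_today >= DIP_TO_BUY else "hold"


print(trading_signal(94.0, open_price=100.0))                       # buy
print(trading_signal(97.0, open_price=100.0))                       # hold
print(trading_signal(89.0, open_price=95.0, purchase_price=100.0))  # sell
```

Because the rules are fixed numbers, the function gives the same answer every time for the same prices, which is what "trading without emotion" means in practice.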
If you want to see an example of creating a trading algorithm, then check out the article below:
Training
Photo by Meghan Holmes on Unsplash
This is not your typical training. In data science and machine learning, training involves using selected data, or a portion of the data, to “train” a machine learning model. The entire dataset is usually split into two portions, one for training and one for testing. This split is commonly 80/20, with 80% of the dataset held for training. This data is called the training data or training set. For a machine learning model to make accurate predictions, it needs to learn from past data (the training set).
If we were to use a machine learning model to predict the future prices of a selected stock, we would give the model the stock prices from the past year or so to predict the next month’s prices.
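As a rough sketch of the 80/20 split described above, the snippet below cuts a list of prices at the 80% mark. The helper `split_in_order` and the prices are made up for illustration. It assumes the prices are already in chronological order and keeps that order, so the model only ever trains on data that comes before the data it is tested on.

```python
def split_in_order(prices, train_fraction=0.8):
    """Split a chronologically ordered list into training and testing sets."""
    split_at = int(len(prices) * train_fraction)
    return prices[:split_at], prices[split_at:]


# Hypothetical daily closing prices, oldest first.
prices = [100.0, 101.5, 99.8, 102.3, 103.1, 104.0, 102.7, 105.2, 106.0, 107.4]

training_set, testing_set = split_in_order(prices)
print(len(training_set), len(testing_set))  # 8 2
```

For price data the split is usually not shuffled: mixing future prices into the training set would let the model peek at the answers.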
Testing
Photo by Ben Mullins on Unsplash
After training a model with the training set, we would want to know how well our model is performing. This is where the other 20% of the data comes in. This data is usually called the testing data or testing set. To validate our model’s performance, we would take our model’s predictions and compare them to our testing set.
For example, let’s say we train a model on one year’s worth of stock price data. We’ll use the prices from January to October as our training set, and November and December will be our testing set (this is an extremely simplistic example of splitting yearly data and should not normally be used because of seasonality and similar effects). After training our model on the January to October prices, we will have it predict the next two months. These predictions will then be compared to the actual prices from November and December. The error between the predictions and the real data is what we aim to reduce as we tune our model.
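The "amount of error" can be measured in several ways. The sketch below shows two common choices, mean absolute error (MAE) and root mean squared error (RMSE). The article does not say which metric it has in mind, so these are assumptions, and the prices are invented.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average size of the miss, in the same units as the prices."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Like MAE, but large misses are penalized more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical testing-set prices and the model's predictions for the same days.
actual = [110.0, 112.0, 111.0, 115.0]
predicted = [108.0, 113.0, 111.0, 112.0]

print(mean_absolute_error(actual, predicted))  # 1.5
print(root_mean_squared_error(actual, predicted))
```

A lower value on the testing set means the predictions were closer to what actually happened.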
Features & Target
Photo by NeONBRAND on Unsplash
In data science, data is commonly displayed in a tabular format like an Excel sheet or a DataFrame. These data points can represent anything. The columns play an important role. Let’s say we have stock prices in one column, and the P/B ratio, volume, and other financial data in the other columns.
In this case, the stock prices will be our Target. The rest of the columns will be the Features. In data science and statistics the target variable is called the dependent variable. The features are known as the independent variables. The target is what we want to predict future values for and the features are what the machine learning model uses to make those predictions.
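Here is a small sketch of separating a table into features and a target. The rows, column names, and values are hypothetical, and plain Python lists stand in for a DataFrame so the example has no dependencies.

```python
# Each dict is one row of the table; every key is a column.
rows = [
    {"price": 101.2, "pb_ratio": 3.1, "volume": 1_200_000},
    {"price": 102.8, "pb_ratio": 3.2, "volume": 1_450_000},
    {"price": 100.9, "pb_ratio": 3.0, "volume": 980_000},
]

TARGET = "price"  # the column we want to predict

# Every other column is a feature.
feature_names = [name for name in rows[0] if name != TARGET]

X = [[row[name] for name in feature_names] for row in rows]  # features
y = [row[TARGET] for row in rows]                            # target

print(feature_names)  # ['pb_ratio', 'volume']
print(X[0], y[0])     # [3.1, 1200000] 101.2
```

With pandas, the same split is typically written as `df.drop(columns="price")` for the features and `df["price"]` for the target.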
Modeling: Time Series
Photo by DAVIDCOHEN on Unsplash
One thing that data science uses heavily is a concept called “Modeling”. Modeling uses a mathematical approach that takes in past behavior to forecast future outcomes. When it comes to financial data in the stock market, that model is usually a time-series model. But what is a time series?
A time series is a series of data points (in our case, the price of a stock) indexed in time order at some interval, which could be monthly, daily, hourly, or even by the minute. Most stock charts and datasets are time series. So when it comes to modeling stock prices, a data scientist would usually implement a time-series model.
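The sketch below shows what "indexed in order by a period of time" looks like in code, and how the same series can be viewed at a coarser interval. The dates and prices are invented, and taking the last close of each month is just one assumed way to go from daily to monthly data.

```python
from datetime import date

# Hypothetical daily closing prices keyed by date (deliberately out of order).
daily_close = {
    date(2023, 2, 1): 102.5,
    date(2023, 1, 30): 100.0,
    date(2023, 2, 2): 101.8,
    date(2023, 1, 31): 101.0,
}

# A time series is ordered by time, so sort by date first.
series = sorted(daily_close.items())

# Convert daily data to monthly data by keeping the last close of each month.
monthly_close = {}
for day, price in series:
    monthly_close[(day.year, day.month)] = price  # later days overwrite earlier ones

print(monthly_close)  # {(2023, 1): 101.0, (2023, 2): 101.8}
```

The order matters: shuffling the rows of a time series destroys exactly the information a time-series model relies on.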
Creating a time-series model involves using a machine learning or deep learning model to take in the price data. This data is then analyzed and fitted to the model. The model will then enable us to forecast future stock prices over a selected period of time. If you want to see this in action, then check out the articles below detailing both a machine learning and deep learning approach to forecasting Bitcoin prices:
Modeling: Classification
Photo by Shane Aldendorff on Unsplash
Another type of model in machine learning and data science is called a Classification Model. Models that use classification are given certain data points and then predict, or classify, what those data points represent.
For the stock market, we can give a machine learning model different financial data, such as the P/E Ratio, Daily Volume, Total Debt, etc., to determine if a stock is fundamentally a good investment. The model may classify the stock as a Buy, Hold, or Sell depending on the financials we gave it.
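As a toy illustration of classification, the sketch below uses a 1-nearest-neighbor rule: a new stock gets the label of the most similar stock it has already seen. This technique is chosen here only because it fits in a few lines; it is not necessarily what the linked article uses. The features (P/E ratio and debt-to-equity), the labeled examples, and the function name `classify` are all hypothetical.

```python
import math

# Hypothetical labeled examples: (P/E ratio, debt-to-equity) -> label.
training_examples = [
    ((8.0, 0.3), "Buy"),
    ((10.0, 0.5), "Buy"),
    ((18.0, 1.0), "Hold"),
    ((20.0, 1.2), "Hold"),
    ((45.0, 2.5), "Sell"),
    ((50.0, 3.0), "Sell"),
]


def classify(features, examples=training_examples):
    """Return the label of the training example closest to `features`."""
    nearest = min(examples, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


print(classify((9.5, 0.4)))   # Buy
print(classify((19.0, 1.1)))  # Hold
print(classify((48.0, 2.8)))  # Sell
```

In practice, features on very different scales (for example, daily volume in the millions next to a single-digit ratio) would need to be rescaled first, or the largest numbers would dominate the distance.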
Check out the article below if you want to see an example of classification models on stocks:
Overfitting & Underfitting
Photo by 青 晨 on Unsplash
While evaluating the performance of a model, the errors sometimes reach a point of being “too hot” or “too cold” when we are searching for “just right”. Overfitting happens when the model fits the training data too closely, becoming so complex that it captures noise and misses the real relationship between the target variable and the features. Underfitting happens when the model does not fit the data closely enough and its predictions are too simple.
These are issues that data scientists need to be aware of when evaluating their models. In financial terms, overfitting is when the model has memorized the noise in past prices, so it cannot pick up on stock market trends and is incapable of adapting to the future. Underfitting is when the model basically starts predicting the simple average price for the entire stock history. In other words, underfitting and overfitting both lead to poor future price predictions and forecasts.
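One way to see both problems is to fit polynomials of different complexity to the same data. The sketch below uses synthetic "prices" (a straight-line trend plus random noise, entirely made up) and NumPy's `polyfit`. Degree 0 predicts a flat average (underfitting), degree 1 matches the true trend, and degree 8 is assumed here to be flexible enough to chase the noise (overfitting).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a linear trend plus noise, ordered in time.
x = np.linspace(0.0, 1.0, 30)
y = 100.0 + 20.0 * x + rng.normal(0.0, 2.0, size=x.size)

# Train on the first 24 points, test on the last 6.
x_train, y_train = x[:24], y[:24]
x_test, y_test = x[24:], y[24:]


def fit_and_score(degree):
    """Fit a polynomial on the training set; return (train MSE, test MSE)."""
    coefficients = np.polyfit(x_train, y_train, degree)
    train_mse = float(np.mean((np.polyval(coefficients, x_train) - y_train) ** 2))
    test_mse = float(np.mean((np.polyval(coefficients, x_test) - y_test) ** 2))
    return train_mse, test_mse


scores = {degree: fit_and_score(degree) for degree in (0, 1, 8)}
for degree, (train_mse, test_mse) in scores.items():
    print(f"degree {degree}: train MSE {train_mse:.2f}, test MSE {test_mse:.2f}")
```

The training error always shrinks as the model gets more complex, which is why it cannot be trusted on its own. The testing error is what reveals whether the extra complexity helped or hurt.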
The topics we covered are common, key data science and machine learning concepts, and they are important for learning data science. There are many more concepts out there to be covered. If you are familiar with the stock market and have an interest in data science, we hope that these descriptions and examples have been useful and understandable.