A Financial Database for Beginners

STEP 1: Project Set Up

First and foremost, let’s create a main directory to house our project. You can call it whatever you would like, but for the purposes of this article I named it ‘FinanceApp’. Within this directory create two sub-directories and call them ‘tickerlist’ and ‘dailydata’. Now you’re ready for the next step, choosing the data source.

STEP 2: Data Source (Yahoo Finance API)

The best free alternatives I have found for high quality financial data in 2021 are the Yahoo Finance API and stooq.com. Alpha Vantage is an honourable mention as their data is technically free however you are limited to a certain number of API calls per day and in the long run, when building a comprehensive financial database, this can become quite a nuisance. For learning purposes we will be using the Yahoo Finance API through the ‘yfinance’ Python library. The Yahoo Finance public API has been deprecated since 2012, however, their information has been made available through sources like rapidAPI. In turn, yfinance is a python library which enables you to connect to the Yahoo Finance API and receive pre-formatted data back (in Panda’s DataFrames). This is a huge quality of life improvement and a major contributing factor as to why I personally cannot recommend this library enough for a personal project of this scale. Beyond the scope of this project however, I strongly recommend checking out the other features of this library and what it can do. In particular, how you can use this library to extend the capability of the basic system we make today.

STEP 3: Ticker List (Nasdaq Trader)

The next step in any financial data collection journey is finding a list of all the tickers or symbols we need to keep track of. Luckily, there are a few options to choose from here. We will be using nasdaqtrader.com. The symbol lists on this site get updated frequently and keep track of all the companies listed on the Nasdaq, NYSE, and more. Subsequently, this script will be tracking those exchanges, however you are welcome to get ticker lists for any exchange you please, so long as Yahoo Finance tracks the companies.

STEP 4: Saving the Tickers as JSON Data

Let’s write a python script called ‘tickers.py’. Inside this file put the code you see below.

def merge_tickers():
nasdaqtxt = ‘tickerlist/nasdaq.txt’
nysetxt = ‘tickerlist/nyse.txt’
tickers = {}

with open(nasdaqtxt,’r’) as file:
companies = file.readlines()
for company in companies:
company_data = company.split(‘|’)
tickers[company_data[0]] = {
‘ticker’ : company_data[0],
‘company’ : company_data[1],
‘exchange’ : ‘Nasdaq’,
‘ETF’ : False }
with open(nysetxt,’r’) as file:
def exchange_type(type):
if type in (‘A’,’N’,’P’): return ‘NYSE’
else: return ‘OTHER’
def is_etf(flag):
if flag == ‘Y’: return True
else: return False

companies = file.readlines()

for company in companies:
company_data = company.split(‘|’)
if '.' not in company_data[0] and '$' not in company_data[0]:
tickers[company_data[0]] = {
‘ticker’ : company_data[0],
‘company’ : company_data[1],
‘exchange’ : exchange_type(company_data[2]),
‘ETF’ : is_etf(company_data[4]) }
return tickers
import jsondef write_to_json(data):
file_location = 'tickerlist/tickers.json'
json_string = json.dumps(data)
with open(file_location,'w+') as file:
file.write(json_string)
companies = merge_tickers()
write_to_json(companies)

STEP 5: Extracting the Data

Believe it or not, once you have an array of company tickers to call, collecting the information you need isn’t difficult. This is thanks to the yfinance python library. Let’s create a script called ‘extract.py’ in the project’s main directory and in that file insert the code below.

import yfinance as yf"""
for this demo we are going to focus on end of day (C) pricing for these companies
"""
def get_tickers():
file_location = 'tickerlist/tickers.json'
with open(file_location,'r') as file:
string = file.read()
data = json.loads(string)
tickers = [ticker for ticker in data]
return tickers
def initialize_db(tickers):
for ticker in tickers:
try:
data = yf.download(
tickers = ticker,
period='max',
interval='1d',
auto_adjust=True,
prepost=True,
proxy=None )

d = {}
for index, row in data.iterrows():
d[str(index).split(' ')[0]] = {
'O' : row['Open'],
'H' : row['High'],
'L' : row['Low'],
'C' : row['Close'],
'V' : row['Volume'] }

json_string = json.dumps(d)
file_location = 'dailydata/{}.json'.format(ticker)
with open(file_location,'w+') as file:
file.write(json_string)

except:
continue
tickers = get_tickers()
initialize_db(tickers)

STEP 6: Visualizing Your Data

This is the step in which you get to see the fruits of your labour. For this section we will be using the python libraries seaborn, pandas, and matplotlib.pyplot to make a basic visualization of your data. Let’s create a python file called ‘visualize.py’ and put the code below.

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import json
def plot(ticker):
# retrieve the data
file_location = ‘dailydata/{}.json’.format(ticker)
with open(file_location, ‘r’) as file:
json_string = file.read()
data = json.loads(json_string)

# format the data
try:
stock_prices = [ data[date][‘C’] for date in data ]
dates = [ date for date in data ]
assert len(stock_prices) == len(dates)
except AssertionError:
print(‘Error in retrieving company data’)
return

if len(stock_prices) > 2000:
stock_prices = stock_prices[-2000:]
dates = dates[-2000:]

data = { ‘Dates’ : dates,
‘Price’ : stock_price }

df = pd.DataFrame.from_dict(data)

# change the dates parameter so they are of type date
df[‘Dates’] = pd.to_datetime(df[‘Dates’], format = ‘%Y-%m-%d’)

# plot the data
sns.lineplot(data=df,x=’Dates’,y=’Price’)
plt.title(ticker)
plt.show()
plot(‘TSLA’)
plot('GME')

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
BOLT Canada

BOLT Canada

13 Followers

Business tech bootcamps encouraging students to pursue innovation in a whole new light. 💻💡