If you don't want the hassle of using a database, you can try writing it like this:
import os

import pandas as pd
import yfinance as yf

def read_time_series(ticker):
    # step 1: try to read the cached history if it exists
    try:
        # if a cached history already exists, read it into memory first
        df = pd.read_csv(f'cache/{ticker}.csv', index_col=0, parse_dates=True)
        # only download the newest few data points, e.g. the last 5 days
        df_new = yf.download(ticker, period='5d')
        # append the new data to the old data
        df = pd.concat([df, df_new])
        # remove rows with a duplicated date, keeping the freshly downloaded ones
        df = df[~df.index.duplicated(keep='last')]
    except FileNotFoundError:
        # if the cache file doesn't exist, we end up here
        # and download the whole history for the first time
        df = yf.download(ticker, period='max')
    # step 2: by this point you have a full dataframe either way;
    # always save it back to the cache for next time
    os.makedirs('cache', exist_ok=True)
    df.to_csv(f'cache/{ticker}.csv')
    # step 3: return the dataframe to the rest of your program
    return df
Written this way, most of the data is already cached, so you don't have to re-download everything from scratch each time; in theory it's a lot faster.
(The code only shows the rough logic; I haven't actually run it.)
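For reference, a minimal sketch of how you might call it from the rest of your script (the ticker symbols below are just examples, and it assumes pandas and yfinance are installed):

for ticker in ['AAPL', '0700.HK']:  # example tickers only, use whatever you actually track
    df = read_time_series(ticker)
    # first call downloads the full history; later calls only fetch the last few days
    print(ticker, len(df), 'rows,', df.index.min(), 'to', df.index.max())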