Sharing a Python Stock Market Screener

carlam

469 replies
288 Like 14 Dislike
尸廿山女田卜 2021-08-16 11:33:52
I'll try to filter out ETFs and other stuff next. Does anyone have a bit of background on RS, i.e. what to use for the ranking or comparison? I'll also add some remarks into the companylist so I can reuse the list later.
尸廿山女田卜 2021-08-16 18:12:28
Found some material on how to use threading & multiprocessing to speed things up.

Python uses a single CPU core by default, and because of the GIL all threads share that single core as well. multiprocessing can use multiple cores at the same time.

threading - for network-, IO-, or GUI-bound programs, e.g. web crawling. In our case: downloading 10 stocks at the same time.

multiprocessing - for CPU-intensive programs, at the cost of more memory. In our case: screening and generating 4 to 8 daily reports at the same time.

Ref:-
https://blog.floydhub.com/multiprocessing-vs-threading-in-python-what-every-data-scientist-needs-to-know/
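The split described above can be sketched with `concurrent.futures`, which gives both models the same interface. `fake_download` and `fake_screen` are hypothetical stand-ins (not from the thread) so the sketch runs offline, and the `fork` start method is assumed (Unix):

```python
import concurrent.futures
import multiprocessing

def fake_download(ticker):
    # hypothetical stand-in for an IO-bound job, e.g. downloading one stock
    return ticker, [1.0, 2.0, 3.0]

def fake_screen(ticker):
    # hypothetical stand-in for a CPU-bound job, e.g. screening one stock
    return ticker, sum(i * i for i in range(10_000)) >= 0

def run_io_bound(tickers):
    # threads are cheap and share memory: good for network/IO work,
    # but the GIL keeps them on one core
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as ex:
        return dict(ex.map(fake_download, tickers))

def run_cpu_bound(tickers):
    # processes use multiple cores at once, at the cost of extra memory;
    # the fork context (Unix) lets workers inherit the parent's state
    ctx = multiprocessing.get_context('fork')
    with concurrent.futures.ProcessPoolExecutor(max_workers=4,
                                                mp_context=ctx) as ex:
        return dict(ex.map(fake_screen, tickers))
```

Swapping the executor class is the only change needed to move a workload from threads to processes, which makes it easy to test which model actually helps.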
carlam 2021-08-16 18:18:11
Need to be careful to avoid screening the same day twice?
Or screen a whole week in one go at the same time?

Does threading require splitting the tickers first?
尸廿山女田卜 2021-08-16 18:38:53
One thread downloads, say, two years of data for one stock, and runs only one pass. (Run more passes if needed.)
Write a download fn(ticker, startdate, enddate) and run it inside a thread; it should return result(ticker, date, close price…). Run several threads at the same time to get the effect of downloading multiple stocks concurrently.

One process handles only one daily report, using date + ticker as the filename (png, jpg, pdf…) to avoid conflicts. Since one year + 52 weeks of data has already been downloaded, you can generate reports like crazy.

A separate program then builds total_info.csv and breath.pdf, combining each day's data into a consolidated report, e.g. running once every four days (matching the number of processes). Wait for a full round to finish first; add validation if needed…
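The download design above (one download function run in several threads at once) might be sketched as follows; `fetch_history` is a hypothetical stand-in for the real pdr.get_data_yahoo call so the sketch runs offline:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_history(ticker, startdate, enddate):
    # hypothetical stand-in: the real version would call something like
    # pdr.get_data_yahoo(ticker, start=startdate, end=enddate)
    return ticker, {'date': [startdate, enddate], 'close': [10.0, 11.0]}

def download_all(tickers, startdate, enddate, workers=10):
    # several threads run fetch_history at the same time, so the
    # network waits for different tickers overlap
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as ex:
        futures = [ex.submit(fetch_history, t, startdate, enddate)
                   for t in tickers]
        for fut in as_completed(futures):
            ticker, data = fut.result()
            results[ticker] = data
    return results
```

`as_completed` hands results back in finish order, so slow tickers don't hold up the rest.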
尸廿山女田卜 2021-08-16 18:44:04
Split the ten screening conditions into ten functions, and put the RS ranking in its own function too, so conditions and features are easy to add or remove later.
Write once, reuse forever…
打工對沖 2021-08-16 22:54:08
If you use multiprocessing, just map and then join when done.
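The map-then-join flow can be sketched like this; `screen_one` is a hypothetical CPU-bound stand-in, and the `fork` context is assumed (Unix) so workers inherit the parent's state:

```python
import multiprocessing

def screen_one(ticker):
    # hypothetical CPU-bound screen: pretend we crunch indicators here
    score = sum(ord(c) for c in ticker)
    return ticker, score % 2 == 0

def run_screens(tickers, workers=4):
    ctx = multiprocessing.get_context('fork')
    pool = ctx.Pool(workers)
    try:
        # map splits the ticker list across the worker processes
        results = pool.map(screen_one, tickers)
    finally:
        pool.close()
        pool.join()  # join waits for every worker to finish
    return dict(results)
```

`pool.map` already blocks until all results are in; close/join just makes the shutdown explicit before the next stage reads the results.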
打工對沖 2021-08-16 22:55:15
Looking into using numba to compile functions for speed,
but it doesn't seem compatible with pandas.
hhhfff 2021-08-17 22:31:23
About yf's data...
With threading, a loop plus try/if is enough to let each thread work independently, without two threads repeating the same task.
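One way to read the loop + try idea: put the tickers on a queue.Queue, and have each thread loop, take a task, and move on; the queue guarantees no two threads ever grab the same ticker. `fetch` is a hypothetical stand-in for the real yfinance download so the sketch runs offline:

```python
import queue
import threading

def fetch(ticker):
    # hypothetical stand-in for the real yfinance download
    if ticker == 'BAD':
        raise ValueError('download failed')
    return ticker.lower()

def worker(tasks, results, failed):
    while True:
        try:
            ticker = tasks.get_nowait()  # each ticker is handed out exactly once
        except queue.Empty:
            return  # no work left: this thread exits
        try:
            results[ticker] = fetch(ticker)
        except Exception:
            failed.append(ticker)  # remember failures for a retry pass

def run(tickers, n_threads=4):
    tasks = queue.Queue()
    for t in tickers:
        tasks.put(t)
    results, failed = {}, []
    threads = [threading.Thread(target=worker, args=(tasks, results, failed))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failed
```

The inner try/except keeps one bad ticker from killing its thread, so the remaining tasks still get processed.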
尸廿山女田卜 2021-08-18 00:36:12
Performed some tests, as shown below.
** Download 1 yr of data for 5500 stocks from yf & save in memory, using one vCore

1) Original code (pdr.get_data_yahoo())
cpu: 4 - 10%
duration: 65 mins
Established connections: 1
thread: 1

2a) Threading (pdr.get_data_yahoo())
cpu: 100%
duration: 6 mins
Established connections: 11 - 19
thread: 27 - 40

3) native threading from yf.download(threads=True)
cpu: 4 - 10%
duration: 66 mins
Established connections: 1
thread: 1
cannot enable the native threading :<

~~~~~~~~~~~~~~~~~~~~
2b) Threading with 2 yrs data (pdr.get_data_yahoo())
cpu: 100%
duration: 6 mins
Established connections: 10 - 23
thread: 27 - 40

Remark:
No blocking occurred in any of the tests.
尸廿山女田卜 2021-08-22 11:21:38
Turns out get_data_yahoo() has a bug. It claims to support multithreading and enables it by default, but no actual threading happens, and a large number of downloads fail with no error msg on the console at all.

On a single-core CPU under Linux, my own multithreading gives a failure rate of 10-20%.
On a multi-core machine under MacOS, with my own multithreading added on top the failure rate is over 90%, no better than not running it at all. It keeps spewing json errors…

After some googling, I disabled get_data_yahoo's threads and ran a few passes: the failure rate on both Linux and MacOS dropped below 5% (what's left is redis errors & silent misses). The error msgs show yfinance's redis server gets overloaded from time to time. Seems safer to add a loop and run a couple more passes.

ps:
Running 8 threads with concurrent.futures
Fetching data with pdr.get_data_yahoo(…, threads=False)
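The "run a couple more passes" idea can be sketched like this: after each threaded pass, collect the tickers that failed and re-run only those. `download_one` is a hypothetical stand-in for the pdr.get_data_yahoo(..., threads=False) call; here it fails once per 'F' ticker to simulate a flaky server:

```python
from concurrent.futures import ThreadPoolExecutor

_attempts = {}  # per-ticker attempt counter for the simulated failures

def download_one(ticker):
    # hypothetical stand-in for pdr.get_data_yahoo(ticker, threads=False):
    # it fails on the first attempt for tickers starting with 'F'
    n = _attempts.get(ticker, 0)
    _attempts[ticker] = n + 1
    if n == 0 and ticker.startswith('F'):
        raise IOError('server overloaded')
    return 252  # pretend result: one year of daily rows

def download_with_passes(tickers, max_passes=3, workers=8):
    results, pending = {}, list(tickers)
    for _ in range(max_passes):
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=workers) as ex:
            futures = {t: ex.submit(download_one, t) for t in pending}
        failed = []
        for t, fut in futures.items():
            try:
                results[t] = fut.result()
            except Exception:
                failed.append(t)  # keep for the next pass
        pending = failed
    return results, pending
```

Anything still in `pending` after the last pass is a genuine failure (or a silent miss) that deserves a log line.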
carlam 2021-08-22 12:16:23

So it runs better with threads = False added?
尸廿山女田卜 2021-08-22 12:28:16
yes
飯主任 2021-08-22 12:38:37
Stocks can split, which changes their historical price values.
Unless you know which stocks split or reverse-split on a given day,
using a stale cache of prices will give you wrong numbers.
有錢唔係罪 2021-08-22 13:10:56
Re-downloading only the few tickers that split or reverse-split is already enough.
As long as you detect that the freshly downloaded numbers
differ a lot from the numbers in the database from a few days ago,
you know which tickers they are.
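The detection idea can be sketched with pandas: compare the freshly downloaded closes against the cached closes on the overlapping dates; a large relative gap suggests a split or adjustment, so that ticker's whole history should be re-downloaded. The column name and tolerance here are assumptions:

```python
import pandas as pd

def needs_full_redownload(cached, fresh, tolerance=0.3):
    """Return True if overlapping closes diverge a lot (likely a split)."""
    overlap = cached.index.intersection(fresh.index)
    if overlap.empty:
        return False  # nothing to compare; assume the cache is still fine
    ratio = fresh.loc[overlap, 'Close'] / cached.loc[overlap, 'Close']
    # after a 2-for-1 split the adjusted history roughly halves, so the
    # ratio moves far from 1.0; 30% is an assumed tolerance
    return bool(((ratio - 1.0).abs() > tolerance).any())
```

Run this on the few overlapping days before merging new data into the cache; only tickers that trip it need the expensive full re-download.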
飯主任 2021-08-22 13:17:24
That makes some sense.
尸廿山女田卜 2021-08-22 16:34:07
Bro, may I ask how to do the caching? Any reference?
有錢唔係罪 2021-08-22 18:37:11
If you don't want the hassle of a database, you can try writing it like this:
import pandas as pd
import yfinance as yf

def read_time_series(ticker):
    # step 1: try to read the cached history if it exists
    try:
        # if a cached history already exists, read it into memory first
        df = pd.read_csv(f'cache/{ticker}.csv', index_col=0, parse_dates=True)
        # only download the newest few data points, e.g. 5 days
        df_new = yf.download(ticker, period='5d')
        # append the new data to the old data
        df = pd.concat([df, df_new])
        # remove any duplicated rows, keeping the freshly downloaded ones
        df = df[~df.index.duplicated(keep='last')]
    except FileNotFoundError:
        # if the cache file doesn't exist, we end up here,
        # so download the whole dataframe for the first time
        df = yf.download(ticker, period='max')

    # step 2: up to here, you have a full dataframe either way;
    # always save the dataframe into the cache for next time
    df.to_csv(f'cache/{ticker}.csv')

    # step 3: return the dataframe to the rest of your program
    return df

Written this way, most of the data is already cached, so you don't re-download from scratch every time; in theory it's much faster.
(The code only shows the rough logic; I haven't run it.)
尸廿山女田卜 2021-08-22 18:57:34
Very clear, thanks a lot.
Is csv or sqlite better to use? Last time I wrote 9000-odd stocks out to 9000-odd csv files, and even on an SSD it took over a minute; I guess opening/closing that many files is what made it slow. Would you recommend sqlite? Would it be faster? Any other issues?
尸廿山女田卜 2021-08-22 19:05:12
If multiple threads update sqlite, will the db file get corrupted?
尸廿山女田卜 2021-08-22 19:14:04
I'll concat all the dfs first and write them into sqlite in one go. I'll test the speed tonight…
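The concat-then-single-write approach can be sketched with the standard-library sqlite3 module plus DataFrame.to_sql; the table and column names are assumptions:

```python
import sqlite3
import pandas as pd

def save_all(frames, conn):
    # one big concat in memory, then a single write; having only one
    # writer at a time suits sqlite's single-writer locking model
    big = pd.concat(frames, ignore_index=True)
    big.to_sql('prices', conn, if_exists='replace', index=False)
    return len(big)

def load_ticker(conn, ticker):
    # readers can query the table freely once the write is done
    return pd.read_sql('SELECT * FROM prices WHERE ticker = ?',
                       conn, params=(ticker,))
```

One bulk insert replaces thousands of per-ticker file opens, which is usually where the csv approach loses its time.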
有錢唔係罪 2021-08-22 19:23:31
I've tried MySQL with 20 threads doing INSERT/REPLACE into the same table at the same time, and it basically never corrupts anything; all the thread-locking hassle is handled perfectly for you. You just need to execute through the same db connection and it works.

I haven't used sqlite, but I'd guess these basics are in every db. If you have more than 1000 tickers, a db is better in the long run.
VibingCat 2021-08-22 20:29:18
What if you pull all the stock history prices in one go and do vectorised calculations in pandas,
store the conditions in a new df,
and AND the whole columns together at the end, would that be faster?
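The idea above in pandas terms: compute each screening rule as one vectorised boolean column, then AND all the columns in one shot. The columns and thresholds here are made up for illustration:

```python
import pandas as pd

def screen(df):
    cond = pd.DataFrame(index=df.index)
    # each screening rule becomes one vectorised boolean column
    cond['above_ma'] = df['close'] > df['ma50']
    cond['volume_ok'] = df['volume'] > 1_000_000
    # AND every condition column at once instead of looping row by row
    passed = cond.all(axis=1)
    return df.index[passed].tolist()
```

Keeping the conditions in their own DataFrame also makes it easy to report, per ticker, which rule failed.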
尸廿山女田卜 2021-08-23 09:48:13
Actually I've already put all the stocks and their data into one big df for the screener and reporter to read. Just thinking about how to reuse the data for other things, like tg or email alerts…
尸廿山女田卜 2021-08-23 10:00:30
Googled it: sqlite would be faster, but it only allows a single writer with multiple readers, and a write may lock up the whole db file. Still, that's enough for now; my machine isn't powerful enough to run docker.

The plan: one db file per program that needs to write/download. Put it on the NAS and download the data (write) in the small hours every day, then use a cronjob to send the tg/email alerts.
尸廿山女田卜 2021-08-23 15:52:19
May I ask if there's a way to screen for which stocks show
- a golden cross or death cross on the MACD chart?
- overbought/oversold conditions on the stoch rsi chart?
- a head-and-shoulders top?
What modules can produce the related stock analysis data and charts?
A url reference would be ideal.

Hoping a master will step in.