Data Science: You Ask, We Answer

用戶1

680 replies
30 Like 4 Dislike
利維亞的傑佬 2020-05-16 22:24:34
Not quite. It has big applications in my online sales/marketing work too; customer analytics naturally has plenty of data, so it pays off nicely.

The finance field everyone imagines (security pricing) does exist, but I think using ML for that kind of work has little to do with traditional finance skills.

After all, traditional finance models are deductive: you reason that the stock price depends on dividends and on long-term growth, then construct a simple, elegant formula.
ML and statistics are inductive: they make very few assumptions about the true underlying formula.
That said, I don't really know how much ML fund houses in Hong Kong actually use these days. Most buy-side mutual funds do fundamental analysis and are barely even related to quant.
Hedge funds probably use it more, but honestly, anyone genuinely good at it would trade for themselves rather than trade for someone else.

If you join a bank, the finance knowledge you learned is of little use. What you use most is knowing what the different products do and how they work behind the scenes,
say, what a fund or a bond is.
After all, banks are mainly in the business of selling products and loans.
I passed the CFA myself, and I found the most useful part to be the general concepts in Level 1.

But to respond to the pinned post:
it won't help you much knowledge-wise, but it may get you past the HR screen.
That alone makes it worth the investment.
人在連登身不由己 2020-05-16 22:32:35
Learned something new, thanks man.
fullmetal 2020-05-16 23:04:39
So stats plus a bit of programming is enough?
1729 2020-05-16 23:28:21
The main thing about neural networks is that nobody really knows why their performance is so good.
Two justifications people consider for them:
1. They mimic the brain.
2. The universal approximation theorem (as long as the network is big enough, it can approximate any continuous function with compact support to arbitrary precision).

The whole field died off for a while because people discovered that if you only have 2 layers,
point 2 above still holds,
but you need an exponentially large network.
One reason for the recent revival is that computational resources have improved enormously and there is far more data,
so you can train bigger and deeper networks.
On top of that, people showed it outperforms every other method in areas like classification.

Generally speaking, when a model has too many parameters you worry about overfitting,
i.e. it will generalize (perform on test data) poorly.
One big mystery of deep learning is that more parameters actually works better, at least experimentally.
People have studied the theory to explain this, and found that over-parameterization even seems to help the optimization you do during training.

In any case, as far as current knowledge goes:
it works but we don't know why.
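The universal approximation point above is easy to see numerically. A minimal sketch, assuming a one-hidden-layer tanh network in which only the output weights are fitted (by least squares); the target function, widths, and random-weight scales are illustrative choices, not from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)

# A continuous target on a compact interval.
def target(x):
    return np.sin(3 * x) + 0.5 * np.cos(7 * x)

x = np.linspace(-1.0, 1.0, 200)

def sup_error(width):
    """One hidden tanh layer with random weights/biases; only the
    output layer is fitted, by least squares on the grid."""
    w = 5.0 * rng.normal(size=width)            # hidden weights
    b = rng.uniform(-5.0, 5.0, size=width)      # hidden biases
    h = np.tanh(np.outer(x, w) + b)             # activations: (200, width)
    coef, *_ = np.linalg.lstsq(h, target(x), rcond=None)
    return np.max(np.abs(h @ coef - target(x))) # sup-norm error on the grid

errs = {width: sup_error(width) for width in (5, 50, 500)}
print(errs)  # error shrinks as the hidden layer grows
```

The worst-case error on the grid shrinks as the hidden layer grows, which is the "network is big enough" clause of the theorem in action.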
用戶1 2020-05-17 00:25:51
I have reservations about "more parameters, better performance". I think that when there isn't enough data, deep learning overfits more easily and performance is actually worse, because the model complexity is too high. That's why for tabular problem sets on Kaggle most people use boosting/ensemble techniques rather than neural networks. For non-tabular problems, such as computer vision, "the deeper the network the better" does seem to be true.

BTW, people have also used adversarial methods to analyze how image CNNs work, i.e. what each layer is actually doing. Quite interesting. Perhaps because it's images, this kind of explanatory visualization is easier to do:
"How to visualize convolutional features in 40 lines of code"
https://towardsdatascience.com/how-to-visualize-convolutional-features-in-40-lines-of-code-70b7d87b0030
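On the tabular point, boosting itself is simple enough to sketch from scratch. A minimal numpy-only gradient-boosting sketch with decision stumps and squared loss; the toy dataset, learning rate, and number of rounds are illustrative assumptions, not anything from Kaggle:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy tabular regression: a step-like target plus noise.
x = rng.uniform(-2.0, 2.0, 300)
y = np.sign(x) + 0.1 * rng.normal(size=300)

def fit_stump(x, r):
    """Best single-split regression stump for residuals r under squared loss."""
    best = None
    for t in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = r[x <= t].mean(), r[x > t].mean()
        sse = np.sum((r - np.where(x <= t, left, right)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, left, right)
    return best[1:]

# Gradient boosting with squared loss: each stump fits the current residuals.
pred = np.zeros_like(y)
lr = 0.3
for _ in range(50):
    t, left, right = fit_stump(x, y - pred)
    pred += lr * np.where(x <= t, left, right)

mse = np.mean((y - pred) ** 2)
print(f"train MSE after 50 rounds: {mse:.4f}")
```

Each weak learner only corrects what the ensemble so far got wrong, which is why boosting is such a strong default on tabular data.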
DeepLearn 2020-05-17 00:32:15
High frequency trading is all about speed: in all martial arts, nothing beats sheer speed.
Nasdaq laid a cable several miles long precisely because the prop desks at JP/GS/MS had already moved their servers across the street from Nasdaq, chasing a microsecond-to-nanosecond edge. For fairness, the cable was made that long, using physics (the speed of light) as the equalizer.


AI in investment is a different matter entirely.
1729 2020-05-17 00:39:50
Yes, I think having enough data is a prerequisite.
The experiments I was referring to go like this:
you train two networks, both with plenty of parameters,
both with training error 0, perfectly fitting the training data,
yet performance on the test set is actually better for the larger one.
For theory, see e.g. this paper: https://arxiv.org/pdf/1811.04918.pdf
I haven't actually read it carefully, just heard some talks about it.
As I understand it, the rough idea is that when the network is big enough and has enough parameters,
the local minima have special properties such that when you optimize with SGD,
it converges more easily to a "good" local minimum (one that generalizes well).
用戶1 2020-05-17 00:52:11
I think high frequency is mostly used for arbitrage, where faster is definitely better. I hear the data centers in Hong Kong have all moved close to HKEX too. AI can be applied to arbitrage, but they should count as two separate domains.
DeepLearn 2020-05-17 00:52:38
The essence of deep learning is that you don't have to think. Just throw everything in and compute. How many layers? Well, that depends on how many GA100s you have and how much time. Plenty of papers have tried to understand it, but I don't think they work, or to be precise, they aren't precise. If you can understand it, that proves you don't need such a complex deep learning model.

CNNs can predict the three-body equilibrium state, but nobody knows why.


Sometimes you have to think outside the box. For example, a genetic algorithm plus NEAT can reach 99%+ on MNIST, which even deep learning can hardly match.
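A genetic algorithm evolving network weights is easy to sketch (NEAT itself, which also evolves the topology, is more involved). A minimal mutation-plus-elitism sketch in numpy, evolving a tiny fixed-topology tanh network on XOR rather than MNIST; the population size, mutation scale, and architecture are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: not linearly separable, so the hidden layer must do real work.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

def mse(params):
    """2-4-1 tanh network, parameters packed into a flat vector of 17."""
    W1 = params[:8].reshape(2, 4)
    b1 = params[8:12]
    W2 = params[12:16]
    b2 = params[16]
    hidden = np.tanh(X @ W1 + b1)
    out = hidden @ W2 + b2
    return np.mean((out - y) ** 2)

# Evolution loop: keep the best 20, refill with mutated copies of them.
pop = rng.normal(size=(100, 17))
for generation in range(200):
    fitness = np.array([mse(p) for p in pop])
    elite = pop[np.argsort(fitness)[:20]]
    children = elite[rng.integers(0, 20, size=80)] + 0.1 * rng.normal(size=(80, 17))
    pop = np.vstack([elite, children])

best = min(mse(p) for p in pop)
print(f"best MSE after 200 generations: {best:.4f}")
```

No gradients anywhere: selection plus Gaussian mutation alone drives the error down.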
fullmetal 2020-05-17 01:26:57
I thought only IEX (which has since shut down) did this; turns out Nasdaq does it too.
fullmetal 2020-05-17 01:28:25
So that's why the rent in Central is so expensive.
用戶1 2020-05-17 01:31:07
The HKEX data center is in Tseung Kwan O Industrial Estate, I believe~ there are lots of data centers out there these days.
1729 2020-05-17 02:16:06
I think it makes sense that it can do physics predictions,
purely because an NN can in principle approximate any continuous function with compact support.
So in theory, as long as your input-output relation is deterministic (which is the case for classical mechanics, even though it can be chaotic) and continuous,
there exists an NN that can predict it.
The main questions are how big the network has to be and how much data you need,
and also why training with their algorithm actually finds such an approximator.

BTW, out of curiosity: has anyone tried whether this works for quantum systems?
用戶1 2020-05-17 02:23:37
Actually some activation functions are discontinuous. In theory, shouldn't there also exist an NN that can approximate every discontinuous problem set?
1729 2020-05-17 02:24:26
BTW, do you have a link for that genetic algo + NEAT example you mentioned?
Just googling "genetic algo neat MNIST" I can't tell which result is the right one.
1729 2020-05-17 02:35:19
Generally activations are only nonlinear but still continuous? (Nonlinearity is necessary for a deep NN, since otherwise the layers collapse.)
Actually, even functions that aren't continuous can be approximated by continuous functions, as long as they aren't too pathological.
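That last remark is easy to see concretely: a step function (discontinuous at 0) is the pointwise limit of steeper and steeper sigmoids. A minimal sketch measuring the average gap on a grid; the grid and steepness values are arbitrary choices:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 2001)
step = (x > 0).astype(float)          # discontinuous target

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Steeper sigmoids match the step everywhere except a shrinking
# neighbourhood of the jump, so the average error goes to zero.
errs = [np.mean(np.abs(sigmoid(k * x) - step)) for k in (1, 10, 100)]
print(errs)
```

The approximation fails only in an ever-smaller interval around the jump, which is why "not too pathological" discontinuities are harmless for approximation in this averaged sense.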
神崎次郎 2020-05-17 02:45:20
I always feel that data science and being a code monkey are only a fine line apart.
DeepLearn 2020-05-17 02:45:57
But the universe itself is non-deterministic, because it obeys quantum theory rather than classical theory. Hence the three-body problem can only be approximated, never determined.

I don't know quantum physics well either, but if one day we can simulate it, that would mean solving, and being able to forever predict, these otherwise unsolvable physics simulations.


And then there's the self-reference problem (Inception, haha): under classical physics you could simulate the whole universe down to the atomic level, but a quantum simulator would have to simulate itself as well, including itself simulating the universe: infinite recursion.

We'll have to wait for the next Einstein.
1729 2020-05-17 03:01:16
The situation is that the theory of classical mechanics is itself deterministic.
Nonlinear dynamics can exhibit chaotic behavior, but it is still deterministic;
it's just that the precision of the variables has to be high enough for the prediction to be accurate.
(Of course you can relate this to quantum mechanics: by the uncertainty principle you can never be infinitely precise, hence unpredictability.)
And I believe the people doing these experiments just used simulation to obtain the results,
then compared them against the NN's predictions.
In that case, all you're really testing is whether the NN can mimic the simulation.

BTW, the reason I'm curious whether anyone has done quantum systems is that one motivation for quantum computing is precisely to simulate quantum systems.
You can simulate them classically too (by solving the Schrodinger equation numerically?),
so if an NN could do quantum, I'd find that very impressive.
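The "deterministic but needs precision" point above can be shown with the logistic map, a standard deterministic-but-chaotic system. A minimal sketch: perturb the initial condition by 1e-10 and watch the two trajectories separate (the map and step counts are illustrative choices):

```python
import numpy as np

def trajectory(x0, steps):
    """Logistic map x -> 4x(1-x): fully deterministic, yet chaotic."""
    xs = [x0]
    for _ in range(steps):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return np.array(xs)

a = trajectory(0.2, 60)
b = trajectory(0.2 + 1e-10, 60)   # same rule, initial condition off by 1e-10

gap = np.abs(a - b)
print(gap[10], gap[40:].max())    # tiny early on; order one later
```

Both runs follow the exact same rule, so the divergence comes purely from finite precision in the initial condition, which is why chaotic systems are predictable in principle but not in practice.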
神崎次郎 2020-05-17 03:07:02
利維亞的傑佬 2020-05-17 03:19:39
DeepLearn 2020-05-17 03:33:40
Haha, you know your stuff.

I suggest you look at OpenAI on Evolution Strategies and Uber AI Labs' paper on genetic algorithms (2017 I think; it's too late at night for me to dig it up... ask me if you can't find it or don't follow it).


I find GA/ES very elegant, and a possible alternative to SGD-based DL, because it steps outside the gradient descent framework and avoids local minima.

Then there's OpenAI's neuro-evolution. Since I studied neuroscience, I really loved it at the time. NEAT can evolve the entire topology as well as change the weights. Simply put, during evolution and mutation it can hide certain nodes and also change connections, much like the human brain.

A few years ago, at a Google Brain talk, I heard Francois Chollet (the creator of Keras) say that besides DL, he finds GA/ES very promising too.
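The OpenAI evolution-strategies idea mentioned above fits in a few lines: sample Gaussian perturbations of the parameters, score each one, and move along the noise weighted by the normalized scores, with no backprop anywhere. A minimal sketch on a toy quadratic objective; the population size, noise scale, and step size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta):
    """Toy black-box objective; only function values are used, no gradients."""
    return np.sum(theta ** 2)

theta = 3.0 * rng.normal(size=5)       # start far from the optimum
sigma, alpha, pop = 0.1, 0.02, 50      # noise scale, step size, population

for step in range(300):
    eps = rng.normal(size=(pop, theta.size))
    scores = np.array([loss(theta + sigma * e) for e in eps])
    # Normalize the scores, then estimate a descent direction from the noise.
    adv = (scores - scores.mean()) / (scores.std() + 1e-8)
    grad_est = (adv[:, None] * eps).mean(axis=0) / sigma
    theta -= alpha * grad_est          # descend the estimated gradient

print(f"final loss: {loss(theta):.4f}")
```

Because only loss values are needed, the population evaluations parallelize trivially across workers, which is a large part of the appeal of ES as an SGD alternative.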
1729 2020-05-17 03:37:59
DeepLearn 2020-05-17 03:43:20
The most impactful one is probably Tim Salimans's; it came out just a few months ago, I think.
DeepLearn 2020-05-17 03:45:45