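Let me try to understand your app: you're building a multi-agent RAG.
Put simply, you write a frontend hooked up to an aggregator AI that calls the different models on the market.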
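Technically, the basic requirements are:
1) iPhone built-in voice for I/O,
2) books are bring-your-own-books, from the user's local book stores,
3) support content-to-text for different kinds of books, e.g. PDF and EPUB; LlamaIndex can provide a PDF extractor, and you do page chunking,
4) a planning endpoint,
5) a chat endpoint for generation, which needs to generate summarisation and Q&A,
6) finally text to speech, for example:
- Speechify API: end users pay 11 USD a month; the API is $10/1M characters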
https://speechify.com/pricing-api/
- Google gTTS: Google Text-to-Speech: the API has several options, roughly $16/1M characters
https://cloud.google.com/text-to-speech/pricing
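To make the page chunking in requirement 3 concrete, here's a minimal sketch in plain Python. It assumes the extractor has already given you one string per page; `Chunk`, `chunk_pages`, and the `max_chars` default are hypothetical names and values for illustration, not part of LlamaIndex or any other library.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int  # 1-based page number the text came from
    text: str


def chunk_pages(pages: list[str], max_chars: int = 1000) -> list[Chunk]:
    """Split each page into pieces of at most max_chars, never crossing a page."""
    chunks: list[Chunk] = []
    for page_no, page_text in enumerate(pages, start=1):
        text = page_text.strip()
        # An empty page produces an empty range, so it is skipped.
        for start in range(0, len(text), max_chars):
            chunks.append(Chunk(page=page_no, text=text[start:start + max_chars]))
    return chunks
```

Keeping the page number on every chunk means an answer can point back to where in the book it came from.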
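I think it's too big an idea for a personal project. Building the product for fun isn't hard, but there are too many pieces where you need integration or at least minimal development. I feel the whole chain-of-thought token cost and development effort doesn't justify the business value.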
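A rough way to put the TTS prices listed above in perspective, as a sketch only. The per-million rates and the 11 USD subscription are the figures quoted above; the 500,000 characters per book is my own assumption for illustration, and the function names are hypothetical.

```python
SPEECHIFY_USD_PER_MILLION = 10.0   # API price quoted above
GOOGLE_USD_PER_MILLION = 16.0      # "roughly", as quoted above
SUBSCRIPTION_USD_PER_MONTH = 11.0  # Speechify end-user price quoted above

ASSUMED_CHARS_PER_BOOK = 500_000   # assumption, not a measured figure


def tts_cost_usd(characters: int, usd_per_million: float) -> float:
    """Cost of synthesising `characters` characters at a per-million rate."""
    return characters / 1_000_000 * usd_per_million


def books_per_subscription(chars_per_book: int, usd_per_million: float) -> float:
    """How many full books one month's subscription fee would pay for via the API."""
    return SUBSCRIPTION_USD_PER_MONTH / tts_cost_usd(chars_per_book, usd_per_million)


if __name__ == "__main__":
    for name, rate in [("Speechify", SPEECHIFY_USD_PER_MILLION),
                       ("Google", GOOGLE_USD_PER_MILLION)]:
        cost = tts_cost_usd(ASSUMED_CHARS_PER_BOOK, rate)
        books = books_per_subscription(ASSUMED_CHARS_PER_BOOK, rate)
        print(f"{name}: ${cost:.2f} per book, {books:.2f} books per subscription fee")
```

Under that assumption one book costs about $5 to $8 in TTS alone, before any LLM tokens, so a user who listens to more than one or two books a month already costs more than the 11 USD they'd pay elsewhere.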
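On the demand side:
i) Audiobooks split into Google voices and real human voices. Apple's built-in option already gives you a robotic voice, so I think you need a real human voice at minimum to attract customers.
ii) Comparable products: apart from Amazon's, the audiobook products that sound good use Speechify, at 11 USD a month for end users. You either do it better than them or cheaper than them.
iii) Industry-wise, making money from audiobooks/RAG depends on economies of scale: once I've transformed/indexed a book, I can store it in my own vector DB and serve all clients from it. If you do it in-flight, you'll have performance issues and it'll be expensive.
iv) On the "full voice control" feature: you'd rely on the chat endpoint to refine content, and the user experience won't necessarily be good. I'd rather tap a button for an order that executes 100% of the time than repeat myself by voice several times. I still don't see much value in it.
v) On the "summarise key points" and "generate questions" features: I don't think the use cases have much value, and few users would use them. But the summarization endpoint will burn a lot of token cost, and the Q&A that comes out won't necessarily be accurate.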
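The economies-of-scale point above (index a book once, serve every client from it) can be sketched like this. It's a toy in-memory version: `SharedIndexCache` and `build_index` are hypothetical names, and a real setup would put the result in a vector DB rather than a dict.

```python
import hashlib
from typing import Callable


class SharedIndexCache:
    """Build the index for a book once, keyed by a hash of its content."""

    def __init__(self, build_index: Callable[[bytes], object]):
        self._build_index = build_index   # the expensive transform/index step
        self._store: dict[str, object] = {}
        self.builds = 0                   # how many times we paid the full cost

    def get(self, book_bytes: bytes) -> object:
        key = hashlib.sha256(book_bytes).hexdigest()
        if key not in self._store:
            # First time this exact file is seen: pay the indexing cost once.
            self._store[key] = self._build_index(book_bytes)
            self.builds += 1
        return self._store[key]
```

Note that with bring-your-own-books this only helps when different users upload byte-identical files; otherwise every upload is a cache miss and you pay the in-flight cost each time.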
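My team does AI work ourselves, and we've sought advice from the Big 4. My conclusion is that getting AI generation up and running isn't hard, but making the product good is very hard.
Our architectural instruction really just treats AI generation use cases as productivity improvement. Simply put, it's a toy.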