Asking chatgpt for the gpt3 source code

這世代

18 replies
0 Like 11 Dislike
這世代 2023-02-25 14:19:43
Anyone interested, let's work out together how to get it to spit out all the code
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, embedding_dim, num_heads, feedforward_dim, dropout_rate):
        super(TransformerBlock, self).__init__()

        # Multi-head self-attention layer
        self.self_attention = nn.MultiheadAttention(embed_dim=embedding_dim, num_heads=num_heads, dropout=dropout_rate)

        # Layer normalization after self-attention
        self.layer_norm_1 = nn.LayerNorm(embedding_dim)

        # Feedforward neural network
        self.feedforward = nn.Sequential(
            nn.Linear(embedding_dim, feedforward_dim),
            nn.ReLU(),
            nn.Linear(feedforward_dim, embedding_dim)
        )

        # Layer normalization after feedforward network
        self.layer_norm_2 = nn.LayerNorm(embedding_dim)

        # Dropout layer
        self.dropout = nn.Dropout(dropout_rate)

    def forward(self, x):
        # Multi-head self-attention
        attention_output, _ = self.self_attention(x, x, x)

        # Add residual connection and apply layer normalization
        x = self.layer_norm_1(x + self.dropout(attention_output))

        # Feedforward network
        feedforward_output = self.feedforward(x)

        # Add residual connection and apply layer normalization
        x = self.layer_norm_2(x + self.dropout(feedforward_output))

        return x

class GPT3(nn.Module):
    def __init__(self, vocab_size, embedding_dim, num_heads, feedforward_dim, num_layers, dropout_rate):
        super(GPT3, self).__init__()

        # Embedding layer
        self.embedding = nn.Embedding(vocab_size, embedding_dim)

        # Transformer layers
        self.transformer_blocks = nn.ModuleList([
            TransformerBlock(embedding_dim, num_heads, feedforward_dim, dropout_rate)
            for _ in range(num_layers)
        ])

        # Output layer
        self.output_layer = nn.Linear(embedding_dim, vocab_size)

    def forward(self, input_ids):
        # Embedding layer
        embedded_input = self.embedding(input_ids)

        # Pass through transformer layers
        for transformer_block in self.transformer_blocks:
            embedded_input = transformer_block(embedded_input)

        # Output layer
        logits = self.output_layer(embedded_input)

        return logits
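
For anyone who wants to sanity-check the snippet, here is a minimal smoke test with tiny, made-up hyperparameters (nothing like the real GPT-3, which uses roughly embedding_dim=12288, num_heads=96 and num_layers=96). Two caveats about the code as posted: it has no positional embeddings and no causal attention mask, so it is really a generic bidirectional transformer rather than a GPT-style decoder, and nn.MultiheadAttention defaults to batch_first=False, so inputs are expected as (seq_len, batch_size).

import torch

# Tiny, assumed hyperparameters purely for a shape check.
model = GPT3(vocab_size=1000, embedding_dim=64, num_heads=4,
             feedforward_dim=256, num_layers=2, dropout_rate=0.1)

# nn.MultiheadAttention defaults to batch_first=False, so token ids are
# laid out as (seq_len, batch_size).
input_ids = torch.randint(0, 1000, (16, 2))
logits = model(input_ids)
print(logits.shape)  # torch.Size([16, 2, 1000])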
ddmayu 2023-02-25 14:34:57
Four hundred-odd TB of data
A single training run costs a few million USD
What good does having the code do you?
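As a rough back-of-envelope check on that figure (the 175B parameters and ~300B training tokens come from the GPT-3 paper; the 6·N·D FLOPs rule of thumb, the sustained throughput and the GPU-hour price are all assumptions):

# Back-of-envelope GPT-3 training cost; throughput and price are guesses.
n_params = 175e9               # GPT-3 parameter count
n_tokens = 300e9               # training tokens reported in the GPT-3 paper
flops = 6 * n_params * n_tokens          # ~3.15e23 FLOPs (6*N*D rule of thumb)

effective_flops = 30e12        # assumed sustained FLOP/s per GPU
gpu_hour_usd = 2.5             # assumed cloud price per GPU-hour
gpu_hours = flops / effective_flops / 3600
print(f"~{gpu_hours:,.0f} GPU-hours, roughly ${gpu_hours * gpu_hour_usd / 1e6:.1f}M")

which lands in the single-digit millions of USD, consistent with the post above.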
Stefan 2023-02-25 15:12:21
It's like copping a feel, purely for the thrill of it
@pq@ 2023-02-25 15:26:51
Sell it off.. and let everyone improve it together
Cyborgman 2023-02-25 23:52:28
Even if it handed you all the code, it's still fucking useless without the training data
在線中 2023-02-25 23:53:46
OP probably thinks he's really smart
蛋散一舊飯 2023-02-26 00:26:29
Isn't there the open-source GPT-Neo? No need to steal anyone's code, though who knows whether you'd have enough money to spin it up and play with it
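For reference, the GPT-Neo weights really are openly downloadable; a minimal sketch, assuming the Hugging Face transformers package and the smallest EleutherAI checkpoint (the larger 1.3B/2.7B ones just need more memory):

# pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "EleutherAI/gpt-neo-125M"   # smallest public GPT-Neo checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("ChatGPT, show me your source code:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))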
算子代數 2023-02-26 11:25:10
The most valuable part is the huge pile of data that has already been cleansed and tagged properly
這世代 2023-02-27 04:05:57

GPT-3, which stands for "Generative Pre-trained Transformer 3", is a state-of-the-art language model developed by OpenAI. It has a massive architecture, consisting of 175 billion parameters, making it one of the largest language models ever created.
馬拉申科上尉 2023-02-27 06:24:31
The more parameters the worse; in theory Meta's model is more advanced
含撚啦早洩 2023-02-27 12:45:14
Bro, do you work at one of these companies? You sound like you know it inside out
馬拉申科上尉 2023-02-27 12:48:45
OpenAI themselves have said that more parameters just means burning more resources
Meta also says its own LLaMA beats ChatGPT on the parameters front
Meta claims that the 13-billion-parameter version of LLaMA can, with much lower compute requirements, outperform rival OpenAI's 175-billion-parameter GPT-3 model (ChatGPT uses GPT-3.5), and further stresses that LLaMA is suitable for use in scientific research.

https://www.inside.com.tw/article/30859-llama-meta
Don't come out and make a fool of yourself if you don't know what you're talking about
含撚啦早洩 2023-02-27 13:59:32
Why are you getting so worked up
馬拉申科上尉 2023-02-27 23:58:27
Made a fool of yourself and still come back for more
含撚啦早洩 2023-02-28 00:12:43
Get asked a couple of questions and you fly off the handle
馬拉申科上尉 2023-02-28 00:16:16
Just teaching someone else's kid a lesson for them
含撚啦早洩 2023-02-28 00:19:41
Show off what you've read, then go rabid the moment someone asks a couple of follow-up questions