I'll walk you through building a next-word prediction model using a Transformer.
import tensorflow as tf
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np
Explanation:
- tensorflow → Deep learning library.
- Embedding → Converts words into numerical vectors.
- LSTM → Long Short-Term Memory, used to remember sequences (imported here, although the model below uses a Transformer block instead).
- Dense → Fully connected layer for output.
- Tokenizer → Converts text into tokens (numbers).
- pad_sequences → Ensures sequences have the same length.
text = """The quick brown fox jumps over the lazy dog The quick brown fox is very fast"""
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
total_words = len(tokenizer.word_index) + 1  # Adding 1 for padding
Explanation:
- We use a sample text.
- The Tokenizer assigns a unique number to each word.
- word_index stores the word-to-number mappings.
- total_words stores the vocabulary size.
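If you want to see what the tokenizer learned, you can print its vocabulary. The exact indices depend on word frequency in the sample text, so treat the values in the comments as illustrative:

print(tokenizer.word_index)  # e.g. {'the': 1, 'quick': 2, 'brown': 3, 'fox': 4, ...}
print(total_words)           # vocabulary size + 1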
input_sequences = []
for line in text.split("."):  # Splitting sentences (for real datasets, use the full text)
    token_list = tokenizer.texts_to_sequences([line])[0]  # Convert words to numbers
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]  # Creating n-gram sequences
        input_sequences.append(n_gram_sequence)

# Padding sequences to make them the same length
max_seq_length = max(len(seq) for seq in input_sequences)
input_sequences = pad_sequences(input_sequences, maxlen=max_seq_length, padding='pre')

X, y = input_sequences[:, :-1], input_sequences[:, -1]  # Splitting into inputs and labels
y = tf.keras.utils.to_categorical(y, num_classes=total_words)  # Convert labels to one-hot vectors
Explanation:
- We convert the text into n-gram sequences (progressively longer phrases).
- Example: "The quick brown" → [1, 2, 3] (the exact indices depend on the tokenizer's word_index).
- Sequences are padded to ensure equal lengths.
- X (features) contains the words before the last one.
- y (label) is the next word to predict.
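An optional peek at the arrays produced above; the exact numbers will vary with the tokenizer's indices, so the comments are only indicative:

print(input_sequences[:3])   # first few padded n-gram rows, e.g. [0, ..., 1, 2]
print(X.shape, y.shape)      # (num_sequences, max_seq_length - 1) and (num_sequences, total_words)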
from tensorflow.keras.layers import MultiHeadAttention, LayerNormalization, Dropout
class TransformerBlock(tf.keras.layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super(TransformerBlock, self).__init__()
        self.att = MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = tf.keras.Sequential([
            Dense(ff_dim, activation="relu"),
            Dense(embed_dim),
        ])
        self.layernorm1 = LayerNormalization(epsilon=1e-6)
        self.layernorm2 = LayerNormalization(epsilon=1e-6)
        self.dropout1 = Dropout(rate)
        self.dropout2 = Dropout(rate)

    def call(self, inputs, training):
        attn_output = self.att(inputs, inputs)                      # Self-attention over the sequence
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)                # Residual connection + normalization
        ffn_output = self.ffn(out1)                                 # Position-wise feed-forward network
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)                   # Second residual connection + normalization
Explanation:
- This is the Transformer encoder block.
- MultiHeadAttention allows the model to focus on different words.
- LayerNormalization keeps training stable.
- Dropout prevents overfitting.
- The model adds the attention output back to the input (a residual connection) to learn relationships.
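As a quick optional sanity check, you can run the block on a random tensor (with an arbitrary sequence length of 10) and confirm it preserves the (batch, sequence, embedding) shape:

demo_block = TransformerBlock(embed_dim=64, num_heads=2, ff_dim=128)
demo_input = tf.random.uniform((1, 10, 64))          # batch of 1, sequence of 10, embedding size 64
print(demo_block(demo_input, training=False).shape)  # (1, 10, 64)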
embed_dim = 64   # Word embedding size
num_heads = 2    # Number of attention heads
ff_dim = 128     # Hidden layer size
inputs = tf.keras.layers.Input(shape=(max_seq_length-1,))
embedding_layer = Embedding(total_words, embed_dim, input_length=max_seq_length-1)(inputs)
transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)(embedding_layer)
flatten = tf.keras.layers.Flatten()(transformer_block)
output = Dense(total_words, activation="softmax")(flatten)

model = tf.keras.Model(inputs=inputs, outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
Explanation:
- Embedding converts words into dense vectors.
- TransformerBlock processes the sequences.
- Flatten turns the multi-dimensional output into a single vector.
- The Dense output layer predicts the next word.
- softmax gives probability scores for all words.
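Before training, an optional check: feeding a dummy batch through the untrained model should give one probability per vocabulary word, and each softmax row should sum to roughly 1.

dummy_batch = np.zeros((1, max_seq_length - 1))  # a single all-padding input sequence
probs = model.predict(dummy_batch, verbose=0)
print(probs.shape)           # (1, total_words)
print(probs.sum(axis=-1))    # approximately [1.]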
model.fit(X, y, epochs=50, verbose=1)
Explanation:
- We train the model using the categorical_crossentropy loss.
- epochs=50 runs the training loop 50 times over the data.
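If you want a bit more control, one variation on the call above (optional, not required for this example) is to keep the History object and stop early once the training loss plateaus:

early_stop = tf.keras.callbacks.EarlyStopping(monitor="loss", patience=5, restore_best_weights=True)
history = model.fit(X, y, epochs=50, verbose=1, callbacks=[early_stop])
print(history.history["accuracy"][-1])   # final training accuracy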
def predict_next_word(seed_text, tokenizer, max_seq_length, model):
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    token_list = pad_sequences([token_list], maxlen=max_seq_length-1, padding='pre')
    predicted_probs = model.predict(token_list)
    predicted_word_index = np.argmax(predicted_probs)

    for word, index in tokenizer.word_index.items():
        if index == predicted_word_index:
            return word
    return ""
# Example usage:
seed_text = "The quick brown"
next_word = predict_next_word(seed_text, tokenizer, max_seq_length, model)
print(f"Predicted next word: {next_word}")
Explanation:
- Converts the input seed text into tokens.
- Pads it to match the training sequence length.
- Predicts the word with the highest probability.
- Converts the predicted index back into a word.
If the training was effective, running:
print(predict_next_word("The quick brown", tokenizer, max_seq_length, model))
might output:
Predicted next word: fox
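To generate more than one word, a simple extension (a sketch building on predict_next_word above) is to feed each prediction back into the seed text:

def generate_text(seed_text, n_words, tokenizer, max_seq_length, model):
    for _ in range(n_words):
        next_word = predict_next_word(seed_text, tokenizer, max_seq_length, model)
        if not next_word:        # stop if the index could not be mapped back to a word
            break
        seed_text += " " + next_word
    return seed_text

print(generate_text("The quick brown", 3, tokenizer, max_seq_length, model))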
- Preprocess the text → Tokenization, sequences, padding.
- Build the Transformer model → Embeddings, attention, dense layers.
- Train the model → Predicts the next words.
- Generate predictions → Uses the trained weights to generate text.