🔄 Recurrent Neural Networks (RNN)

What is a Recurrent Neural Network?

Unlike feedforward networks, Recurrent Neural Networks (RNNs) have memory! They process sequences by maintaining a hidden state that is updated at each time step. This allows them to remember previous inputs, making them well suited to tasks involving sequential data like text, speech, and time series.

📚 Key Concepts

Architecture

  • Hidden State: Memory that persists across time steps
  • Recurrent Connection: The hidden state feeds back as input at the next step
  • Temporal Dynamics: Process sequences step-by-step
  • Weight Sharing: Same weights used at each time step

How It Works

  • Process one element of sequence at a time
  • Update hidden state with current input
  • Hidden state carries information forward
  • Can handle variable-length sequences (see the sketch below)
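
Here's that loop as a minimal NumPy sketch. The tanh nonlinearity, weight shapes, and variable names are conventional illustrative choices, not the only option:

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence, returning every hidden state."""
    h = np.zeros(W_hh.shape[0])           # start with a zero hidden state
    states = []
    for x in inputs:                      # one element of the sequence at a time
        # New hidden state mixes the current input with the previous memory
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return states                         # works for any sequence length

# Toy dimensions: 4-dimensional inputs, 3-dimensional hidden state
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4))            # input-to-hidden weights (shared across steps)
W_hh = rng.normal(size=(3, 3))            # hidden-to-hidden weights (shared across steps)
sequence = [rng.normal(size=4) for _ in range(5)]
print(rnn_forward(sequence, W_xh, W_hh, np.zeros(3))[-1])
```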

Common Types

  • Vanilla RNN: Basic recurrent architecture
  • Bidirectional RNN: Processes the sequence forward and backward
  • Deep RNN: Multiple stacked recurrent layers
  • LSTM: Gated cells designed to hold long-range information
  • GRU: Simplified LSTM variant with fewer gates (all five appear in the sketch below)
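
In PyTorch (one popular framework; the choice here is ours, not prescribed), these variants are mostly a constructor flag away. A rough sketch:

```python
import torch.nn as nn

vanilla = nn.RNN(input_size=10, hidden_size=20)                  # basic recurrent layer
bidir   = nn.RNN(input_size=10, hidden_size=20,
                 bidirectional=True)                             # reads the sequence both ways
deep    = nn.RNN(input_size=10, hidden_size=20, num_layers=3)    # stacked recurrent layers
lstm    = nn.LSTM(input_size=10, hidden_size=20)                 # gated cells for long-range memory
gru     = nn.GRU(input_size=10, hidden_size=20)                  # simplified gating, fewer parameters
```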

Applications

  • Language modeling and text generation
  • Speech recognition
  • Machine translation
  • Time series prediction

🎨 Sequence Processing Visualization

Watch how an RNN processes a sequence word by word

[Animation: the network reads "The", "cat", "sat" one word at a time, then predicts the next word, shown as "?". The hidden state (blue) accumulates information as it processes each word.]

🔑 Key Insight

The power of RNNs comes from their ability to maintain a "memory" through the hidden state. At each time step, the network considers both the current input AND what it remembers from previous steps. This makes them fundamentally different from feedforward networks, which treat each input independently.
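
Written out, the standard vanilla-RNN update makes this explicit; here W_xh, W_hh, and W_hy name the input-to-hidden, hidden-to-hidden, and hidden-to-output weights:

```latex
% New hidden state: current input combined with the previous memory
h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)
% Optional output at each step
y_t = W_{hy} h_t + b_y
```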

🌟 Real-World Example: Text Prediction

When your phone predicts the next word you'll type:

Input: "I love eating"
Step 1: Process "I" → Hidden state remembers subject
Step 2: Process "love" → Remembers positive sentiment
Step 3: Process "eating" → Combines all context
Output: Predict likely next words: "pizza", "ice cream", "sushi"
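
Here's a toy version of that pipeline in NumPy. The vocabulary, embeddings, and weights are made up purely for illustration; a real phone keyboard uses a trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["I", "love", "eating", "pizza", "sushi", "broccoli"]
word_to_id = {w: i for i, w in enumerate(vocab)}

# Toy, randomly initialized parameters (a real model would learn these)
E    = rng.normal(size=(len(vocab), 8))   # word embeddings
W_xh = rng.normal(size=(16, 8))           # input-to-hidden weights
W_hh = rng.normal(size=(16, 16))          # hidden-to-hidden weights
W_hy = rng.normal(size=(len(vocab), 16))  # hidden-to-vocabulary scores

h = np.zeros(16)
for word in ["I", "love", "eating"]:      # fold each word into the hidden state
    h = np.tanh(W_xh @ E[word_to_id[word]] + W_hh @ h)

scores = W_hy @ h
probs = np.exp(scores) / np.exp(scores).sum()         # softmax over the vocabulary
top3 = sorted(zip(vocab, probs), key=lambda p: -p[1])[:3]
print(top3)                                           # three most likely next words
```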

⚡ How RNNs Process Sequences

1. Initialize: Start with a zero or random hidden state.
2. First Input: Combine input with hidden state to produce new hidden state.
3. Subsequent Inputs: Each new input updates the hidden state, carrying forward information.
4. Output: At each step (or just the final step), produce an output based on hidden state.
5. Training: Use Backpropagation Through Time (BPTT) to learn patterns (sketched in code below).
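
To make step 5 concrete, here's a minimal PyTorch training sketch; the sizes and random data are placeholders. Calling .backward() on the loss runs backpropagation through time over the whole unrolled sequence:

```python
import torch
import torch.nn as nn

rnn  = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)                    # map final hidden state to one prediction
opt  = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()), lr=1e-3)

x = torch.randn(32, 10, 4)                # 32 sequences, 10 time steps, 4 features
y = torch.randn(32, 1)                    # placeholder targets

for _ in range(100):
    opt.zero_grad()
    outputs, h_n = rnn(x)                 # h_n: final hidden state, shape (1, 32, 8)
    loss = nn.functional.mse_loss(head(h_n[-1]), y)
    loss.backward()                       # unrolls through all 10 steps: this is BPTT
    opt.step()
```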

✅ Advantages

  • Can process variable-length sequences
  • Maintains memory of previous inputs
  • Shares parameters across time steps
  • Natural fit for sequential data

⚠️ Limitations

  • Vanishing gradient problem (demonstrated numerically below)
  • Difficulty learning long-term dependencies
  • Sequential processing (slow training)
  • Hard to parallelize
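
The vanishing gradient is easy to see numerically. The gradient reaching early time steps is a product of per-step Jacobians, and with tanh and modest weights that product shrinks exponentially. A toy demonstration with made-up numbers:

```python
import numpy as np

rng  = np.random.default_rng(2)
W_hh = rng.normal(size=(16, 16)) * 0.3     # modest recurrent weights
grad = np.eye(16)                          # gradient arriving at the last time step

for t in range(50):
    h = rng.normal(size=16)                # stand-in for the hidden activations at step t
    # Per-step Jacobian of a tanh RNN: diag(1 - tanh(h)^2) @ W_hh
    jac  = np.diag(1.0 - np.tanh(h) ** 2) @ W_hh
    grad = grad @ jac                      # chain rule back through one more step
    if (t + 1) % 10 == 0:
        print(f"after {t + 1} steps back, gradient norm = {np.linalg.norm(grad):.2e}")
```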