Designing YouTube's Recommendation System: A System Design Perspective

Meetapro Platform
Dec 10, 2024
System Design
Technical Skills

Designing a system like YouTube’s video recommendation service is a challenging yet rewarding problem often posed in system design interviews. This blog offers a structured approach to tackle this question, highlighting the trade-offs and decisions involved at each stage.

1. Understanding the Problem

Before diving into the solution, it's crucial to clarify the problem scope:

  • Personalization: Recommendations tailored to users based on watch history, likes, and preferences.
  • Massive Scale: Must handle billions of users and videos efficiently.
  • Adaptability: The system should evolve with users’ changing preferences.

2. Functional Requirements

  1. Return Relevant Video Recommendations:
    • The system should analyze user behavior, preferences, and interactions to provide personalized, high-quality video recommendations.
  2. Data Management:
    • Store and manage user profiles, including watch history, subscriptions, likes, and demographics.
    • Maintain video metadata, including titles, descriptions, tags, categories, and upload dates.
  3. User Feedback Integration:
    • Collect user interactions (e.g., likes, dislikes, watch time, "not interested" feedback) to improve the recommendation engine.

3. Back-of-the-Envelope Calculations

  • Users: 1 billion. Daily Active Users (DAUs): 500 million
  • Videos: 10 billion. Average Videos Watched per User per Day: 10
  • Daily Recommendations: 5 billion (500M users × 10 recommendations each).
  • Requests per Second:
    • Average RPS: ~58,000 (5 billion recommendations ÷ 86,400 seconds/day).
    • Peak RPS: ~290,000 (average RPS × 5).
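The arithmetic above can be checked directly. The inputs below are the assumed figures from this section, not measured values:

```python
# Back-of-the-envelope traffic estimate using the assumptions above.
DAU = 500_000_000            # daily active users
RECS_PER_USER_PER_DAY = 10   # recommendation requests per user per day
SECONDS_PER_DAY = 86_400
PEAK_MULTIPLIER = 5          # assumed peak-to-average ratio

daily_requests = DAU * RECS_PER_USER_PER_DAY    # 5 billion
avg_rps = daily_requests / SECONDS_PER_DAY      # ~58,000
peak_rps = avg_rps * PEAK_MULTIPLIER            # ~290,000

print(f"avg ~ {avg_rps:,.0f} RPS, peak ~ {peak_rps:,.0f} RPS")
```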

This highlights the need for a scalable, high-performance system.

4. Non-Functional Requirements

  • Scalability: The system should handle increasing traffic and data volumes without performance degradation.
  • Low Latency: Recommendations should be generated quickly to provide a seamless user experience.
  • Availability: The system should be highly available to ensure users always receive recommendations.
  • Fault Tolerance: The system should be resilient to failures.

5. API Design

Example APIs:

  • getUserRecommendations(userId): Fetch recommendations.
  • recordUserEvent(userId, eventType, videoId): Log user interactions.
  • updateUserProfile(userId, profileData): Modify user data.
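A minimal sketch of what handlers for these APIs might look like. The in-memory dicts stand in for the real stores, and the stubbed recommendation lookup is an assumption for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class RecommendationService:
    """Toy facade over the three example APIs; real storage is elsewhere."""
    profiles: dict = field(default_factory=dict)   # userId -> profile data
    events: list = field(default_factory=list)     # logged interactions

    def get_user_recommendations(self, user_id: str, limit: int = 10) -> list:
        """Fetch recommendations (stubbed; a real system would call the engine)."""
        profile = self.profiles.get(user_id, {})
        return profile.get("cached_recs", [])[:limit]

    def record_user_event(self, user_id: str, event_type: str, video_id: str) -> None:
        """Log a user interaction for downstream feature pipelines."""
        self.events.append({"user": user_id, "event": event_type, "video": video_id})

    def update_user_profile(self, user_id: str, profile_data: dict) -> None:
        """Merge new profile data into the stored profile."""
        self.profiles.setdefault(user_id, {}).update(profile_data)
```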

6. Data Model

Entity      | Attributes
----------- | ------------------------------------------------------------
User        | UserID, Watch History, Subscriptions, Preferences
Video       | VideoID, Title, Description, Tags, Metadata
Interaction | UserID, VideoID, Event Type (view, like, comment), Timestamp
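The entities above can be sketched as Python dataclasses. Field names follow the table; the concrete types are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class User:
    user_id: str
    watch_history: list = field(default_factory=list)  # video IDs
    subscriptions: list = field(default_factory=list)  # channel IDs
    preferences: dict = field(default_factory=dict)

@dataclass
class Video:
    video_id: str
    title: str
    description: str = ""
    tags: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)       # category, upload date, ...

@dataclass
class Interaction:
    user_id: str
    video_id: str
    event_type: str    # "view", "like", "comment", ...
    timestamp: float
```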

7. High-Level Architecture

Key components:

  • Load Balancer: Distributes requests evenly across API servers to ensure high availability and scalability.
  • Recommendation API: Serves recommendations to clients.
  • Recommendation Engine:
    • Candidate Generation: Narrows down billions of videos to a smaller set of candidates.
    • Ranking: Ranks the candidates using richer features and predicts user engagement.
  • Online Feature Store:
    • Stores precomputed and real-time features (e.g., recent interactions, user preferences).
    • Provides low-latency feature retrieval for the recommendation engine.
  • Ingestion Service: Processes user events and video metadata, routing the data to storage systems and feature pipelines.
  • User Profile Store: Stores user data, including watch history, preferences, and subscriptions.
  • Video Metadata Store: Contains video attributes such as titles, tags, categories, and engagement metrics.
  • Interaction Data Store: Stores interaction data, such as views, likes, and comments, with a temporal focus for trend analysis.

[Figure: YouTube recommendation architecture]

8. Database Choices

  • User and Video Metadata: NoSQL (e.g., MongoDB, Cassandra) for scalability and schema flexibility.
  • Interaction Data: Time-series databases (e.g., InfluxDB) for temporal queries.
  • Feature Store: Redis for low-latency access to real-time features.

9. Recommendation Process

When a user opens YouTube or navigates to a page where recommendations are displayed, the system initiates a sequence of steps to generate and deliver personalized video suggestions. Here's a detailed breakdown of that process:

1. Request Received:

  • The user's request for recommendations is received by the Recommendation API. This request typically includes the user's ID and potentially some context information, such as the current page or video being watched.

2. User Data Retrieval:

  • The Recommendation API retrieves the user's data from the User Profile Store. This data includes the user's watch history, subscriptions, likes, demographics, and any other relevant information.  

3. Candidate Generation:

  • The candidate generation component kicks in, utilizing a deep neural network to narrow down the vast pool of videos to a smaller set of candidates (hundreds).
  • This network analyzes the user's activity history, video context, and demographics to identify potentially relevant videos.
  • Collaborative filtering techniques are employed to identify videos watched by similar users.
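A toy illustration of the candidate-generation idea: score videos by the similarity between a user embedding and video embeddings, and keep the top k. A real system learns these embeddings with a deep network and uses approximate nearest-neighbor search over billions of items; the 3-dimensional vectors here are invented:

```python
def dot(a, b):
    """Dot product as a stand-in for embedding similarity."""
    return sum(x * y for x, y in zip(a, b))

def generate_candidates(user_vec, video_vecs, k=2):
    """Return the k video IDs whose embeddings best match the user embedding."""
    scored = sorted(video_vecs.items(), key=lambda kv: dot(user_vec, kv[1]), reverse=True)
    return [vid for vid, _ in scored[:k]]

# Made-up embeddings: the user leans heavily toward dimension 0.
user = [0.9, 0.1, 0.0]
videos = {
    "v_gardening": [0.8, 0.1, 0.1],
    "v_gaming":    [0.1, 0.9, 0.0],
    "v_cooking":   [0.7, 0.2, 0.1],
}
print(generate_candidates(user, videos))  # gardening and cooking score highest
```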

4. Ranking:

  • The ranking component takes over, using a more complex neural network to assign a relevance score to each candidate video.
  • This network considers a richer set of features, including video metadata, user-video relationships, and video freshness.
  • The ranking algorithm predicts the likelihood of a user clicking and watching a video, prioritizing "expected watch time per impression" to capture user engagement.
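The "expected watch time per impression" objective can be sketched as P(click) × E[watch time | click]. The probabilities below are made up; a real ranker predicts them with a neural network over the richer feature set:

```python
def rank(candidates):
    """Order candidates by expected watch time per impression."""
    return sorted(
        candidates,
        key=lambda c: c["p_click"] * c["expected_watch_seconds"],
        reverse=True,
    )

candidates = [
    {"video": "v1", "p_click": 0.30, "expected_watch_seconds": 60},   # score 18.0
    {"video": "v2", "p_click": 0.10, "expected_watch_seconds": 300},  # score 30.0
    {"video": "v3", "p_click": 0.50, "expected_watch_seconds": 20},   # score 10.0
]
print([c["video"] for c in rank(candidates)])  # v2 wins despite the lowest CTR
```

Note how this objective favors v2 over the clickbait-like v3: optimizing raw click probability alone would invert the order.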

5. Filtering and Diversification:

  • Before presenting the final recommendations, the system may apply filtering rules to remove inappropriate or irrelevant content.
  • Diversification algorithms may also be applied to ensure a variety of recommendations and avoid creating filter bubbles.
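One simple diversification heuristic (a sketch, not YouTube's actual rule) caps how many videos from any one category survive into the final list while preserving rank order:

```python
def diversify(ranked_videos, max_per_category=2):
    """Keep ranked order, but allow at most max_per_category videos per category."""
    counts, result = {}, []
    for video_id, category in ranked_videos:
        if counts.get(category, 0) < max_per_category:
            result.append(video_id)
            counts[category] = counts.get(category, 0) + 1
    return result

ranked = [("v1", "gardening"), ("v2", "gardening"), ("v3", "gardening"),
          ("v4", "cooking"), ("v5", "gaming")]
print(diversify(ranked))  # the third gardening video is dropped
```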

6. Recommendation Delivery:

  • The final set of ranked and filtered recommendations is sent back to the Recommendation API.
  • The API then delivers these recommendations to the user interface, where they are displayed on the YouTube homepage, sidebar, or other relevant sections.

7. Feedback Collection:

  • The system continuously collects user feedback on the recommendations, such as clicks, watch time, likes, dislikes, and "not interested" feedback.
  • This feedback is used to refine the recommendation algorithms and improve their accuracy over time.  

This sequence ensures that YouTube users receive personalized video recommendations that are relevant, engaging, and diverse, contributing to a satisfying viewing experience.

10. Ethical Considerations

  • Filter Bubbles: Ensure diversity in recommendations.
  • Harmful Content: Implement content moderation and user feedback mechanisms.

Conclusion

Designing a YouTube video recommendation system is a complex task that requires careful consideration of various factors, from scalability and efficiency to user experience and data management. This article provides a structured approach to tackling this system design interview question from a general system design perspective. By understanding these concepts and practicing with similar problems, you can confidently approach system design interviews and demonstrate your ability to design scalable and efficient systems.

It's important to note that if this question were asked in a Machine Learning System Design interview, the focus would be different. The interviewer would likely delve deeper into the machine learning algorithms, model training, feature engineering, and evaluation metrics specific to recommendation systems. This blog post primarily focuses on the general system design aspects, providing a foundation for understanding the overall architecture and challenges involved in building such a system.

Potential Questions Asked by the Interviewers

How does the Ingestion Service work? What technology does it use?

Functions of the Ingestion Service:

  • Handling User Events: The Ingestion Service captures a wide range of user events, including video views, likes, dislikes, comments, shares, searches, subscriptions, and more. These events provide valuable insights into user behavior and preferences.
  • Processing Video Metadata: It processes new video uploads and updates to existing videos, extracting relevant metadata such as titles, descriptions, tags, categories, and thumbnails. This metadata is crucial for content-based filtering and ranking.
  • Data Validation and Transformation: The Ingestion Service validates the incoming data to ensure its accuracy and consistency. It may also transform the data into a suitable format for storage and processing by other components of the system.
  • Routing Data: It routes the processed data to the appropriate storage systems. User profiles are stored in the User Profile Store, video metadata in the Video Metadata Store, and interaction data in the time-series database.
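The validation and routing behavior above can be sketched as a dispatcher that checks each event and forwards it to the matching store. The dicts and list here are stand-ins for the real databases, and the event schema is an assumption:

```python
def ingest(event, profile_store, metadata_store, interaction_store):
    """Validate an incoming event and route it to the appropriate store."""
    kind = event.get("kind")
    if kind == "profile_update":
        if "user_id" not in event:
            raise ValueError("profile_update requires user_id")
        profile_store.setdefault(event["user_id"], {}).update(event["data"])
    elif kind == "video_upload":
        metadata_store[event["video_id"]] = event["metadata"]
    elif kind == "interaction":
        interaction_store.append(
            {"user": event["user_id"], "video": event["video_id"],
             "type": event["type"], "ts": event["ts"]})
    else:
        raise ValueError(f"unknown event kind: {kind}")
```

In production the dispatcher would sit behind a Kafka topic so bursts of events are buffered rather than dropped.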

Technology Used:

  • Message Queues: Message queues like Kafka are commonly used to handle the high volume of incoming data asynchronously. This allows the Ingestion Service to decouple from other components and maintain high throughput.
  • Data Processing Frameworks: Distributed data processing frameworks like Apache Spark or Apache Flink can be used to process and transform the data in parallel, ensuring scalability and efficiency.
  • APIs and SDKs: YouTube provides APIs and SDKs for uploading videos and associated metadata. The Ingestion Service utilizes these APIs to handle video uploads and extract relevant information.
  • Real-time Processing: For real-time personalization, the Ingestion Service may employ stream processing technologies like Apache Kafka Streams or Apache Flink to process user events in real-time and update recommendations accordingly.

Why did you choose Redis as the online feature store?

Here's why:

  • Low Latency: Redis excels at providing extremely fast read and write speeds due to its in-memory data storage. This is crucial for online serving, where recommendations need to be generated quickly to provide a seamless user experience.
  • Simplicity: Redis's key-value data structure is well-suited for storing and retrieving features efficiently. This simplicity makes it easier to manage and maintain the feature store.
  • Scalability: While Redis has limited storage capacity compared to persistent databases, it can still handle a significant amount of data. Moreover, it can be scaled horizontally by adding more Redis instances to the cluster, further increasing its capacity.

While persistent databases like Cassandra offer durability and higher storage capacity, they generally have higher latency compared to Redis. In a recommendation system where speed is paramount, Redis provides a better balance of performance and scalability for online feature serving.

However, it's important to acknowledge that a production-ready feature store would likely utilize a combination of Redis for online serving and a persistent database like Cassandra for offline storage and batch processing. This approach combines the low latency of Redis with the durability and scalability of Cassandra, providing a comprehensive solution for managing features in a YouTube recommendation system.

How does the Data Synchronization work from Cassandra to Redis feature store?

1. Change Data Capture (CDC):

  • Instead of relying on scheduled cron jobs, the system can leverage Change Data Capture (CDC) techniques to capture changes in Cassandra in real-time.
  • CDC tools, such as Debezium, can monitor Cassandra's commit log or other change logs to identify updates, insertions, and deletions of feature data.

2. Message Queue:

  • Once a change is captured, it's published to a message queue like Kafka. This decouples the data synchronization process from the feature store and allows for asynchronous processing.

3. Feature Transformation:

  • A dedicated component, often called a "Feature Transformer," consumes messages from the message queue.
  • This component retrieves the updated feature data from Cassandra, performs any necessary transformations or aggregations, and formats the data for Redis.

4. Updating Redis:

  • The Feature Transformer then updates the corresponding entries in the Redis feature store with the transformed feature data.
  • Redis's high write throughput allows for efficient updates without impacting performance.

5. Near Real-time Synchronization:

  • This approach enables near real-time synchronization of feature data from Cassandra to Redis.
  • The latency depends on factors like the CDC mechanism, message queue performance, and feature transformation complexity.

Step-by-Step Example:

  1. A user watches a new video on YouTube.
  2. The Ingestion Service captures this event and updates the user's watch history in Cassandra.
  3. The CDC tool detects this change in Cassandra and publishes a message to Kafka.
  4. The Feature Transformer consumes the message, retrieves the updated watch history from Cassandra, and generates new features (e.g., "videos_watched_last_week").
  5. The Feature Transformer updates the user's feature set in Redis with the new features.
  6. The next time the user requests recommendations, the system retrieves the updated features from Redis, including the recently generated "videos_watched_last_week" feature, to provide more relevant suggestions.
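The step-by-step flow above can be simulated end to end with in-memory stand-ins: a list for the Kafka topic, and plain dicts for Cassandra and Redis. The real pipeline would use Debezium-style CDC and a Kafka consumer instead of these toy objects:

```python
cassandra = {}   # persistent store (stand-in): user_id -> watch history
topic = []       # message queue (stand-in for a Kafka topic)
redis = {}       # online feature store (stand-in): user_id -> features

def record_watch(user_id, video_id):
    """Ingestion: update watch history in 'Cassandra' and emit a CDC message."""
    cassandra.setdefault(user_id, []).append(video_id)
    topic.append({"user_id": user_id, "change": "watch_history"})

def feature_transformer():
    """Consume CDC messages, recompute features, and write them to 'Redis'."""
    while topic:
        msg = topic.pop(0)
        history = cassandra[msg["user_id"]]
        redis[msg["user_id"]] = {"videos_watched_last_week": len(history)}

record_watch("u42", "v_gardening_tips")
feature_transformer()
print(redis["u42"])  # {'videos_watched_last_week': 1}
```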

Benefits of this Approach:

  • Near Real-time Updates: Ensures that the Redis feature store has the latest data for timely recommendations.
  • Efficiency: Avoids unnecessary polling or scheduled jobs, reducing resource consumption.
  • Scalability: Handles high volumes of changes in Cassandra without impacting the performance of the recommendation system.
  • Fault Tolerance: The message queue provides buffering and ensures data delivery even if Redis is temporarily unavailable.

Is the interaction data stored in the time-series database used directly as features? What would the corresponding features look like in the Redis online feature store? Can you explain step by step with an example?

While the raw interaction data itself isn't directly stored in Redis, it's transformed into meaningful features that are then stored in the online feature store.

Here's a step-by-step example of how this works, focusing on a user's "like" interaction:

1. User Interaction:

  • A user watches a video about "Gardening Tips for Beginners" and clicks the "like" button.

2. Capturing the Interaction:

  • The Ingestion Service captures this "like" event and stores it in the time-series database, along with the timestamp and relevant information (user ID, video ID).

3. Feature Generation:

  • Batch Processing: A Spark job runs periodically (e.g., every hour) to process interaction data from the time-series database. It aggregates the "like" events for each user and generates features like:
    • total_likes_gardening: Total number of videos liked by the user in the "Gardening" category.
    • recent_likes_gardening: Number of "Gardening" videos liked by the user in the last week.
  • Real-time Processing: A Kafka Streams application consumes the "like" event from the message queue and updates features in near real-time:
    • last_liked_category: Updates this feature to "Gardening" for the user.
    • like_frequency_gardening: Increments the count of "Gardening" likes for the user in the current hour.
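The batch side of this step reduces to a pure aggregation over interaction rows. The timestamps and category labels below are invented; a real job would be a Spark query over the time-series store:

```python
def build_like_features(interactions, user_id, category, now, week=7 * 86400):
    """Aggregate 'like' events into the two batch features described above."""
    likes = [i for i in interactions
             if i["user"] == user_id and i["type"] == "like"
             and i["category"] == category]
    return {
        f"total_likes_{category}": len(likes),
        f"recent_likes_{category}": sum(1 for i in likes if now - i["ts"] <= week),
    }

now = 1_700_000_000  # fixed "current" timestamp for the example
interactions = [
    {"user": "u1", "type": "like", "category": "gardening", "ts": now - 2 * 86400},
    {"user": "u1", "type": "like", "category": "gardening", "ts": now - 30 * 86400},
    {"user": "u1", "type": "view", "category": "gardening", "ts": now - 1000},
]
print(build_like_features(interactions, "u1", "gardening", now))
# {'total_likes_gardening': 2, 'recent_likes_gardening': 1}
```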

4. Feature Storage:

  • The generated features are stored in both the offline feature store (Cassandra) and the online feature store (Redis).

5. Redis Feature Representation:

  • In Redis, these features are stored as key-value pairs, where the key is the user ID and the value is a data structure (e.g., a hash) containing the features. Here's how it might look:

Key: user:1234

Value:

{
  "total_likes_gardening": "5",
  "recent_likes_gardening": "2",
  "last_liked_category": "Gardening",
  "like_frequency_gardening": "1" 
}

6. Recommendation Generation:

  • When the user requests recommendations, the system retrieves these features from Redis.
  • The candidate generation model uses features like total_likes_gardening and recent_likes_gardening to identify potentially relevant videos.
  • The ranking model uses features like last_liked_category and like_frequency_gardening to personalize the ranking of candidate videos, giving higher scores to videos related to "Gardening."

This example illustrates how interaction data is transformed into features and stored in Redis to provide personalized recommendations. The specific features generated and their representation in Redis can vary depending on the design and requirements of the recommendation system.