Designing a URL Shortening Service Like TinyURL
Let’s design a URL-shortening service like TinyURL. This service provides short aliases redirecting to long URLs.
Similar Services: bit.ly, ow.ly, short.io
Difficulty Level: Easy
Why Do We Need URL Shortening?
URL shortening creates shorter aliases for long URLs, known as “short links.” When users click these short links, they are redirected to the original URL. Short links save space when displayed, printed, messaged, or tweeted, and reduce the likelihood of mistyping URLs.
Example:
Original URL: https://www.educative.io/courses/grokking-the-system-design-interview/m2ygV4E81AR
Shortened URL: https://tinyurl.com/4nb2wpme
Short links are about one-third the size of the original URL, optimizing links across devices, tracking individual links, measuring ad campaign performance, or hiding affiliate URLs.
If you haven’t used tinyurl.com before, try creating a new shortened URL and explore their options to understand this chapter better.
Requirements and Goals of the System
Functional Requirements:
- Generate a short, unique alias (short link) for a given URL.
- Redirect users to the original URL when accessing a short link.
- Allow users to pick custom short links.
- Provide a default expiration time for links, with user-specified expiration options.
Non-Functional Requirements:
- Ensure high availability to avoid failing URL redirections.
- Achieve real-time redirection with minimal latency.
- Make short links non-guessable.
Extended Requirements:
- Track redirection analytics (e.g., number of redirects).
- Offer REST APIs for service access by other applications.
Capacity Estimation and Constraints
Read-heavy System: URL shortening services experience significantly more redirection requests compared to new URL shortening requests. Expect a 100:1 read-to-write ratio.
Traffic Estimation: Assuming 500 million new URL shortenings per month with a 100:1 read/write ratio, we can expect 50 billion redirections during the same period.
- 500 million new URL shortenings per month.
- 50 billion redirections per month (100 * 500 million).
Queries Per Second (QPS):
- New URLs: ~200 URLs/s
- URL redirections: ~20K/s
Storage Estimates:
- 500 million new URLs per month for 5 years = 30 billion objects.
- 500 bytes per object → 15TB total storage.
Bandwidth Estimates:
- Write requests: Assuming 200 new URLs every second, total incoming data for our service will be 100KB per second.
- 200 * 500 bytes = 100 KB/s
- Read requests: Since we expect ~20K URL redirections every second, total outgoing data for our service would be 10MB per second.
- 20K * 500 bytes = ~10 MB/s
Memory Estimates: Caching frequently accessed URLs can improve performance. Let’s assume the 80–20 rule applies, meaning 20% of URLs generate 80% of traffic. We want to cache these 20% hot URLs.
- Daily requests: 20K requests/second * 3600 seconds/hour * 24 hours/day = ~1.7 billion requests/day
- Memory for cache: Assuming 20% of requests are cached, we’ll need 170GB of memory.
- 0.2 * 1.7 billion * 500 bytes = ~170GB
System APIs
APIs (Application Programming Interfaces) define how external services can interact with our URL shortening service. We can use SOAP or REST APIs to expose functionalities such as creating and deleting URLs.
API Definitions:
createURL(api_dev_key, original_url, custom_alias=None, user_name=None, expire_date=None)
deleteURL(api_dev_key, url_key)
Parameters:
api_dev_key
: A unique identifier for a registered account used for throttling user requests based on allocated quotas.original_url
: The original URL to be shortened.custom_alias
: (Optional) Allows users to create a personalized short alias for the URL.user_name
: (Optional) Username associated with the shortened URL.expire_date
: (Optional) Sets an expiration date for the shortened URL.
Where:
url_key
: The shortened URL string to be deleted.- A successful deletion returns “URL Removed”.
Prevent abuse by limiting URL creation and redirection per api_dev_key.
Database Design
Data Observations:
- Billions of records need to be stored.
- Each record is relatively small (less than 1KB).
- No relationships exist between records (except for user-created URLs).
- Read-heavy system.
Database Choice:
Given the massive data volume and lack of relational needs, a NoSQL database like DynamoDB, Cassandra, or Riak is a better choice compared to a traditional SQL database. NoSQL databases also scale more efficiently.
Database Schema:
- Two tables are needed:
- URL Mapping Table: Stores information about URL mappings.
- User Data Table: Stores details about users who created short links (optional).
Basic System Design and Algorithm
Challenge: Generating unique and short keys for URLs.
Solution:
Short Key Generation:
- Encoding Actual URL: Hash the URL (MD5/SHA256), then encode (base36/base62/base64). Use first 6–8 characters to avoid collisions.
- Generating Keys Offline: Use a Key Generation Service (KGS) to pre-generate and store unique keys.
Concurrency Issues:
- Ensure unique key assignment by marking keys used in the database.
Key-DB Size: Base64 encoding allows for 68.7 billion unique six-letter keys (412GB storage).
Handling Failures: Use standby KGS replicas and cache keys on app servers for quick access.
Key Lookup: Perform DB lookup for full URL and issue HTTP 302 Redirect or 404 Not Found.
Custom Aliases: Impose a size limit (e.g., 16 characters) for consistency.
Data Partitioning and Replication
Partitioning Schemes:
- Range-Based Partitioning: Store URLs by the first letter of the hash key.
- Hash-Based Partitioning: Hash the key to determine the partition, using Consistent Hashing to avoid overload.
Cache
Caching Frequently Accessed URLs:
- Use Memcached to store URLs and hashes.
- Start with 20% of daily traffic (170GB memory).
- Use LRU eviction policy to manage cache.
Load Balancer (LB)
Load Balancing:
- Place LB between clients and application servers, application servers and DB servers, and application servers and cache servers.
- Use Round Robin or a more intelligent LB to adjust traffic based on server load.
DB Cleanup
Expired Links:
- Perform lazy cleanup to avoid DB pressure.
- Delete links on access or run periodic Cleanup service during low traffic.
Tracking and Monitoring:
Track Statistics:
- Monitor short URL usage, user locations, and access details.
- Store stats in a DB row to handle high traffic.
Security and Permissions
Private URLs:
- Store permission levels (public/private) with each URL.
- Create a table for UserIDs with access permissions.
- Return HTTP 401 for unauthorized access attempts.
By this article, you have an idea of a high-level design of URL shortening service.