This blog post is a coding and system design interview cookbook/cheatsheet compiled by me, with the code pieces written by me as well. There is no particular structure, nor do I pretend this is a comprehensive guide. There are also no explanations, as the purpose is just to list the important things. This blog post is a work in progress: I'm planning to add more things to it.
As always, opinions are mine.
Binary Search
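A minimal sketch of the iterative version over a sorted array (class and method names are my own):

```java
// Classic binary search over a sorted array; returns the index of target or -1.
public class BinarySearch {
    public static int search(int[] nums, int target) {
        int lo = 0, hi = nums.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // written this way to avoid int overflow
            if (nums[mid] == target) return mid;
            if (nums[mid] < target) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1;
    }
}
```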
Breadth First Search
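A sketch of BFS over an adjacency list (the graph representation is my own choice):

```java
import java.util.*;

// BFS over an adjacency list; returns nodes in visit order starting from `start`.
public class Bfs {
    public static List<Integer> bfs(Map<Integer, List<Integer>> graph, int start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(start);
        visited.add(start);
        while (!queue.isEmpty()) {
            int node = queue.poll();
            order.add(node);
            for (int next : graph.getOrDefault(node, List.of())) {
                if (visited.add(next)) queue.add(next); // add() returns false if seen
            }
        }
        return order;
    }
}
```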
Depth First Search
Just replace the queue in BFS with a stack and you are done:
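A sketch of that swap, using the same graph representation as the BFS above (a recursive version works just as well):

```java
import java.util.*;

// DFS: same shape as BFS, but a stack replaces the queue (iterative version).
public class Dfs {
    public static List<Integer> dfs(Map<Integer, List<Integer>> graph, int start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            int node = stack.pop();
            if (!visited.add(node)) continue; // already processed
            order.add(node);
            for (int next : graph.getOrDefault(node, List.of())) {
                if (!visited.contains(next)) stack.push(next);
            }
        }
        return order;
    }
}
```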
Backtracking
Generally a recursive algorithm attempting candidates and either accepting them or rejecting (“backtracking”) based on some condition. Below is an example of permutations generation using backtracking:
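A sketch of what such a permutations generator can look like (names are mine):

```java
import java.util.*;

// Generate all permutations of an array via backtracking:
// try each unused candidate, recurse, then undo the choice.
public class Permutations {
    public static List<List<Integer>> permute(int[] nums) {
        List<List<Integer>> result = new ArrayList<>();
        backtrack(nums, new ArrayList<>(), new boolean[nums.length], result);
        return result;
    }

    private static void backtrack(int[] nums, List<Integer> current,
                                  boolean[] used, List<List<Integer>> result) {
        if (current.size() == nums.length) {
            result.add(new ArrayList<>(current)); // accept a full permutation
            return;
        }
        for (int i = 0; i < nums.length; i++) {
            if (used[i]) continue;                // reject: candidate already taken
            used[i] = true;
            current.add(nums[i]);
            backtrack(nums, current, used, result);
            current.remove(current.size() - 1);   // undo ("backtrack")
            used[i] = false;
        }
    }
}
```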
Dynamic Programming
Technique to determine a result using previous results. You should be able to reason about your solution recursively and deduce dp[i] from dp[i-1]. Generally you would have something like this:
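As an illustration of the dp[i]-from-previous-results pattern, here is the canonical climbing-stairs example (the number of ways to reach step n taking 1 or 2 steps at a time); this is my choice of example, not necessarily the one originally shown:

```java
// dp[i] = dp[i-1] + dp[i-2]: the current result is derived from previous results.
public class ClimbStairs {
    public static int ways(int n) {
        if (n <= 2) return n;
        int[] dp = new int[n + 1];
        dp[1] = 1;
        dp[2] = 2;
        for (int i = 3; i <= n; i++) {
            dp[i] = dp[i - 1] + dp[i - 2]; // reach step i from step i-1 or i-2
        }
        return dp[n];
    }
}
```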
Dynamic Programming: Knapsack
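A sketch of the 0/1 knapsack with a 1-D dp array (one common formulation; dp[w] is the best value achievable at capacity w):

```java
// 0/1 knapsack: maximize total value subject to a weight capacity.
public class Knapsack {
    public static int knapsack(int[] weights, int[] values, int capacity) {
        int[] dp = new int[capacity + 1];
        for (int i = 0; i < weights.length; i++) {
            // Iterate capacity in reverse so each item is used at most once.
            for (int w = capacity; w >= weights[i]; w--) {
                dp[w] = Math.max(dp[w], dp[w - weights[i]] + values[i]);
            }
        }
        return dp[capacity];
    }
}
```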
Union Find
When you need to figure out whether two elements belong to the same set, you can use the following:
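A minimal sketch with path compression (union by rank is omitted for brevity):

```java
// Union-Find (disjoint set) with path compression.
public class UnionFind {
    private final int[] parent;

    public UnionFind(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i; // each element is its own set
    }

    public int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]); // path compression
        return parent[x];
    }

    public void union(int a, int b) {
        parent[find(a)] = find(b);
    }

    public boolean connected(int a, int b) {
        return find(a) == find(b);
    }
}
```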
Binary Tree Traversals
Inorder: left-root-right
Postorder: left-right-root
Preorder: root-left-right
Plus each one of them has a reverse variant where left and right are swapped
Below are recursive and iterative versions of inorder traversal
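A sketch of both (the TreeNode definition is my own minimal one):

```java
import java.util.*;

public class InorderTraversal {
    static class TreeNode {
        int val;
        TreeNode left, right;
        TreeNode(int val) { this.val = val; }
    }

    // Recursive: left - root - right.
    public static void inorderRecursive(TreeNode node, List<Integer> out) {
        if (node == null) return;
        inorderRecursive(node.left, out);
        out.add(node.val);
        inorderRecursive(node.right, out);
    }

    // Iterative: an explicit stack replaces the call stack.
    public static List<Integer> inorderIterative(TreeNode root) {
        List<Integer> out = new ArrayList<>();
        Deque<TreeNode> stack = new ArrayDeque<>();
        TreeNode node = root;
        while (node != null || !stack.isEmpty()) {
            while (node != null) { stack.push(node); node = node.left; } // go left
            node = stack.pop();
            out.add(node.val);      // visit root
            node = node.right;      // then go right
        }
        return out;
    }
}
```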
Segment Tree
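A compact iterative sum segment tree as a sketch (one common variant; for range-min you would swap the combine operation):

```java
// Segment tree over an int array: point updates and range-sum queries in O(log n).
public class SegmentTree {
    private final int n;
    private final long[] tree;

    public SegmentTree(int[] values) {
        n = values.length;
        tree = new long[2 * n];
        for (int i = 0; i < n; i++) tree[n + i] = values[i];   // leaves
        for (int i = n - 1; i > 0; i--) tree[i] = tree[2 * i] + tree[2 * i + 1];
    }

    // Set values[index] = value and update ancestors.
    public void update(int index, int value) {
        int i = index + n;
        tree[i] = value;
        for (i /= 2; i > 0; i /= 2) tree[i] = tree[2 * i] + tree[2 * i + 1];
    }

    // Sum over the half-open range [from, to).
    public long query(int from, int to) {
        long sum = 0;
        for (int l = from + n, r = to + n; l < r; l /= 2, r /= 2) {
            if ((l & 1) == 1) sum += tree[l++]; // l is a right child: take it, move right
            if ((r & 1) == 1) sum += tree[--r]; // r is a right child: take its left sibling
        }
        return sum;
    }
}
```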
Dijkstra
The simplest way to remember Dijkstra is to imagine it as BFS with a PriorityQueue instead of a normal Queue, where the values in the queue are the best known distances to nodes pending processing.
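A sketch of that idea (the adjacency representation, a map from node to {neighbor, weight} pairs, is my own choice; stale queue entries are skipped instead of using decrease-key, which Java's PriorityQueue does not support):

```java
import java.util.*;

// Dijkstra as "BFS over a PriorityQueue": nodes settle in order of best distance.
public class Dijkstra {
    // graph: node -> list of {neighbor, weight} edges. Returns node -> shortest distance.
    public static Map<Integer, Integer> shortestPaths(Map<Integer, List<int[]>> graph, int source) {
        Map<Integer, Integer> dist = new HashMap<>();
        dist.put(source, 0);
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt((int[] a) -> a[1]));
        pq.add(new int[]{source, 0});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int node = top[0], d = top[1];
            if (d > dist.getOrDefault(node, Integer.MAX_VALUE)) continue; // stale entry
            for (int[] edge : graph.getOrDefault(node, List.of())) {
                int next = edge[0], nd = d + edge[1];
                if (nd < dist.getOrDefault(next, Integer.MAX_VALUE)) {
                    dist.put(next, nd);
                    pq.add(new int[]{next, nd});
                }
            }
        }
        return dist;
    }
}
```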
Multi-threading: Semaphore
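A sketch of java.util.concurrent.Semaphore limiting concurrency (the helper that measures observed concurrency is mine, added so the behavior is checkable):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// A semaphore with N permits allows at most N threads into the guarded section at once.
public class SemaphoreDemo {
    // Runs `workers` threads through a semaphore and returns the max concurrency observed.
    public static int maxConcurrency(int permits, int workers) {
        Semaphore semaphore = new Semaphore(permits);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger maxActive = new AtomicInteger();
        Runnable task = () -> {
            try {
                semaphore.acquire();                       // blocks until a permit is free
                int now = active.incrementAndGet();
                maxActive.accumulateAndGet(now, Math::max); // record peak concurrency
                Thread.sleep(50);                          // simulate work
                active.decrementAndGet();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                semaphore.release();                       // always give the permit back
            }
        };
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) { threads[i] = new Thread(task); threads[i].start(); }
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return maxActive.get();
    }
}
```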
Multi-threading: Runnable & synchronized
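A minimal sketch of the classic shared-counter example: two threads run the same Runnable, and `synchronized` makes each increment atomic (without it, the final count would be unpredictable):

```java
// Two threads increment a shared counter; `synchronized` guarantees the total.
public class SyncCounter {
    private int count = 0;

    public synchronized void increment() { count++; } // read-modify-write made atomic
    public synchronized int get() { return count; }

    // Runs two threads doing `perThread` increments each and returns the total.
    public static int run(int perThread) {
        SyncCounter counter = new SyncCounter();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) counter.increment();
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get(); // always 2 * perThread thanks to synchronized
    }
}
```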
Strings: KMP
Hopefully no one asks you this one. It would be too much code to include here, but the idea is not that difficult: build the LPS table (example: "AAABAAA" -> [0,1,2,0,1,2,3]); then use the LPS to skip ahead when comparing, as if you were doing a brute-force comparison.
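The LPS-building step at least is short; here is a sketch (LPS = length of the longest proper prefix that is also a suffix, per position):

```java
// Build the LPS table used by KMP.
public class Lps {
    public static int[] buildLps(String pattern) {
        int[] lps = new int[pattern.length()];
        int len = 0; // length of the currently matched prefix
        for (int i = 1; i < pattern.length(); ) {
            if (pattern.charAt(i) == pattern.charAt(len)) {
                lps[i++] = ++len;       // extend the matched prefix
            } else if (len > 0) {
                len = lps[len - 1];     // fall back, exactly like KMP does while matching
            } else {
                lps[i++] = 0;           // no prefix matches here
            }
        }
        return lps;
    }
}
```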
Strings: Rolling Hash (Rabin-Karp)
Let’s imagine we have a window of length k and a string s; then the hash of the first window will be:
H(k) = s[0]*P^(k-1) + s[1]*P^(k-2) + … + s[k-1]*P^0
Moving the window by one is then an O(1) operation:
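A sketch of both steps, computing the first window hash and then sliding: drop the leftmost character's contribution, multiply by P, add the new character (the base P=31 and modulus are my own typical choices):

```java
// Rabin-Karp rolling hash: first window in O(k), each subsequent window in O(1).
public class RollingHash {
    private static final long P = 31;             // prime base (a common choice)
    private static final long MOD = 1_000_000_007L;

    // Hashes of every window of length k in s.
    public static long[] windowHashes(String s, int k) {
        long[] hashes = new long[s.length() - k + 1];
        long hash = 0, power = 1;                 // power ends up as P^(k-1) mod MOD
        for (int i = 0; i < k; i++) {
            hash = (hash * P + s.charAt(i)) % MOD;
            if (i < k - 1) power = power * P % MOD;
        }
        hashes[0] = hash;
        for (int i = 1; i + k <= s.length(); i++) {
            hash = (hash - s.charAt(i - 1) * power % MOD + MOD) % MOD; // drop left char
            hash = (hash * P + s.charAt(i + k - 1)) % MOD;            // add right char
            hashes[i] = hash;
        }
        return hashes;
    }
}
```

Equal windows are guaranteed to produce equal hashes; because of (rare) collisions, a real Rabin-Karp match is confirmed with a direct string comparison.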
These are notes I took while reading “System Design Interview – An insider’s guide”:
Chapter 1: Scale From Zero To Millions Of Users
Introduces all of the basic concepts of the software design of a typical web application:
Client/Server communication and DNS
Load balancing and introduction of multiple servers
Database replication with a single master and multiple slaves, and later on database scaling
Caching in front of a database
Content Delivery Network (CDN) use for static content
Stateless vs stateful server with clear preference for stateless
Datacenters for more scale and for better reliability
message queues for async operations and reducing spikes in load
all of the auxiliary considerations for a large system: logging, metrics, monitoring
Chapter 2: Back-of-the-envelope Estimation
This is not something I have at the top of my mind when thinking about system design, but I guess it is important in the context of an interview to understand the scale of the system you are building.
Some numbers:
Transatlantic network roundtrip: Approximately 150ms
Reading 1MB from memory: Around 250 microseconds
Reading 1MB from network: Roughly 10ms
Reading 1MB from disk: Typically 30ms
Processor thread synchronization (mutex): As little as 100 nanoseconds
Chapter 3: A Framework For System Design Interviews
understand the problem and establish design scope
propose high level design and get buy-in
Design deep dive: work with interviewer to understand what they would like you to dive into, also don’t be shy to propose your own area for deep dive
wrap up: you can reiterate over the design, mention other considerations you could have spoken about
Chapter 4: Design A Rate Limiter
Using the framework from chapter 3, goes over designing a rate limiter
Explains a few different rate limiting algorithms:
token bucket – simple, used by Amazon: imagine a bucket with a re-filler that periodically adds tokens; if the bucket is full, extra tokens fall out; every request takes one token, and if there are no tokens at a given time the request is rejected
leaking bucket – FIFO
fixed window counter
sliding window log (expensive on memory as we need timestamps)
sliding window counter
Important to send back HTTP 429 with X-Ratelimit-Retry-After
Talks about issues of using counters in a distributed environment and how a Redis distributed cache can be used as a solution
More advanced rate limiting techniques are only mentioned, like limiting at network layer 3 (iptables), not just at layer 7 (HTTP)
Chapter 5: Design Consistent Hashing
Explains hash space and the hash ring – we place all n servers on a ring, and once we have a hash key we find the server that owns it by going clockwise
Explains the issue with naive hashing when adding or removing servers (most keys have to be remapped and the data copied over), which consistent hashing avoids
Introduces the concept of virtual nodes, the same as used by Amazon Dynamo – one physical server owns multiple virtual nodes; this achieves a more balanced distribution of keys (we don’t end up in a situation where one server owns the majority of keys)
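The ring with virtual nodes can be sketched with a sorted map; this is my own illustrative sketch (the string hash below is a stand-in, not what real systems use):

```java
import java.util.*;

// Consistent hash ring: servers (via virtual nodes) are placed on a ring;
// a key is owned by the first server found going clockwise from its hash.
public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    // Simple deterministic string hash, purely for illustration.
    private long hash(String s) {
        long h = 1125899906842597L;
        for (char c : s.toCharArray()) h = 31 * h + c;
        return h;
    }

    public void addServer(String server) {
        for (int v = 0; v < virtualNodes; v++) ring.put(hash(server + "#" + v), server);
    }

    public void removeServer(String server) {
        for (int v = 0; v < virtualNodes; v++) ring.remove(hash(server + "#" + v));
    }

    public String serverFor(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key)); // go "clockwise"
        return e != null ? e.getValue() : ring.firstEntry().getValue(); // wrap around
    }
}
```

The point of the structure: removing one server only remaps the keys that lived on its virtual nodes, leaving everything else in place.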
Chapter 6: Design A Key-value Store
This is basically copy-paste-simplify from the Amazon Dynamo white paper, but explained in a more accessible way
Additionally starts with introducing the CAP Theorem
Additionally mentions:
gossip protocol – each server keeps heartbeat info about other servers and periodically “gossips” about it with other servers; once they see that one of them is not getting fresh heartbeats, it is declared “dead”
hinted handoff – writes intended for an unavailable server are sent to another server that is known to be healthy; once the server the data was intended for is back online, the data is transferred to where it belongs
A Merkle tree is used to handle more permanent failures, reducing the amount of data that has to be synchronized (we compare trees on two servers)
Chapter 7: Design A Unique Id Generator In Distributed Systems
Discusses how difficult it is to generate a unique id in a distributed environment if there are certain constraints, for instance that we want ids to increment with time.
Twitter Snowflake is a 64-bit id: 0 + 41-bit timestamp + 5-bit datacenter id + 5-bit machine id + 12-bit sequence number (reset every millisecond)
One core assumption of this is that machine clocks are synchronized, which was outside the scope of this chapter
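The bit layout above can be sketched with plain shifts and masks (my own illustration of the layout, not Twitter's code):

```java
// Snowflake-style 64-bit id: sign(1) | timestamp(41) | datacenter(5) | machine(5) | sequence(12).
public class Snowflake {
    public static long makeId(long timestampMs, long datacenterId, long machineId, long sequence) {
        return (timestampMs << 22)      // 41 bits of timestamp, above the lower 22 bits
             | (datacenterId << 17)     // 5 bits
             | (machineId << 12)        // 5 bits
             | sequence;                // 12 bits, reset every millisecond
    }

    public static long timestampOf(long id) {
        return id >>> 22; // ids sort by creation time because timestamp is in the high bits
    }
}
```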
Chapter 8: Design A Url Shortener
Well, we want a HashMap, but we want it distributed and persistent.
It is important to try to estimate. For instance, we can calculate that we need 36TB of space for 10 years with 100 million URLs written per day. We can also estimate that we need a hash value length of 7 characters [0-9, a-z, A-Z] to store those 365 billion records
Discusses different hashing algorithms (CRC32, MD5, SHA-1) and the issue that if we use them and truncate to length 7 we will run into collisions; thus we need a technique where we first check the cache/db whether the key is already present, and if so, re-hash
Base62 conversion is another potential method. Basically we take a sequential integer (base 10) and convert it to base 62, ending up with things like: 11157 => “2TX”
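A minimal sketch of the conversion (the alphabet ordering below, digits then lowercase then uppercase, is one choice that reproduces the 11157 => "2TX" example):

```java
// Encode a sequential base-10 id as a base-62 string.
public class Base62 {
    private static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    public static String encode(long id) {
        if (id == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % 62))); // least-significant digit first
            id /= 62;
        }
        return sb.reverse().toString();
    }
}
```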
URL redirect has to be 301 (permanent redirect) or 302 (temporary redirect). Temporary could be useful if we want analytics. Obviously for reducing load we might prefer to go with 301.
Chapter 9: Design A Web Crawler
Other than Google and other search engines crawling the internet there are numerous web archives that attempt to archive the internet. Web mining (news, knowledge) and web monitoring (copyrights, trademarks) are two other use-cases.
Interesting and important considerations:
“Content seen?” check – deduplicate pages with identical content
Caching of DNS (yes! Round-tripping to figure out an IP takes time when we need to visit the same site thousands of times; each DNS lookup could take 10-200ms, so we store our own site->IP map)
Blocklisting (sensitive content, etc)
DFS isn’t the best choice as we will get stuck on the same website for too long
BFS works for the most part but we still need to redistribute so that we don’t spend too much processing on a single site (like wikipedia)
“Politeness” – yeah, we don’t want to aggressively bombard the website. Funnily enough, I have a personal story about the website of a metal band: they said they hid some nugget presents on their website, so I wrote a web crawler for their site and left it running all night. The next day the site was down. I probably wasn’t the only “smart” guy listening to them. LOL.
Priority – we need to distinguish between important sites and not so important and also we need to think about frequency of updates of some sites (news)
Robots.txt – Robots Exclusion Protocol is the way of websites to tell robots how they should behave 🙂
Geography. Let crawlers located physically close to the website host crawl there.
Spider traps – yep. There are intentional and unintentional ones.
Server-side rendering. Since the modern web often uses client-side JS to produce HTML, we need to load and render the page before parsing.
Chapter 10: Design A Notification System
Notifications: SMS, email, push-notification. Is there more?
For iOS we need to use the Apple Push Notification service (APNs). For Android, Firebase Cloud Messaging (FCM) is often used. For SMS, Twilio and Nexmo are used. For email, Sendgrid and Mailchimp.
Normally a notification system is a component in a larger system, and services that perform some operations interact with it so that notifications are sent: [Services 1-N] -> [Notification System] -> [APNs, FCM, SMS, etc] -> [User]
We introduce queue per channel and workers consuming messages from the queue before they connect to 3rd party services.
Message queues help with spike in volume of messages and also isolate different 3rd party dependencies (queue for SMS could be full but other queues get emptied)
We need to add a notification log so we have record of what was sent to 3rd party.
A key metric for monitoring is the number of messages in the queue. If it grows, either capacity is not enough or the 3rd parties aren’t keeping up / are down.
Analytics Service is another component to consider. Data can be sent to it from different points in the lifecycle of the notification (say, user clicked on a link in notification).
Chapter 11: Design A News Feed System
Key to this design is to take posted item (say Facebook post) and then place it into different processing systems, specifically:
post service to properly store the post
notification service to notify “friends” of the poster
fanout service to propagate the post to subscribers (“friends”)
Fanout service that delivers the news can be implemented in two variations:
on write (push model)
precomputed, fast
expensive if many friends or infrequent use of the app
on read (pull model)
A hybrid approach is usually best (think how Twitter does this for celebrities):
push model for regular users so it’s fast
pull model for subscribers of celebrities
We use CDN for static content (images, videos)
We use caching at multiple levels (news feed cache, as people are usually only interested in the latest; user cache; post cache)
Chapter 12: Design A Chat System
One of the key concepts to know here is how clients connect:
Polling – client periodically asks for data
Long polling – client opens a connection and the server sends data back once it is available; the connection is kept open for a longer period of time (30 seconds or so)
WebSocket – bidirectional communication; the client establishes the connection but the server can then send data. Starts as an HTTP connection and then upgrades to WebSocket; works on the same ports (80 or 443) so it should work everywhere.
All of the authentication, metadata and other API work pretty much as in other systems.
1-1 chat works by two clients connecting to servers using WebSockets. Because one of the users might not be online, we use a message queue + key-value store to store messages, and a push notification is sent to the other user about the availability of new messages
Facebook processes 60 Billion messages a day.
Message synchronization is another important topic. Can be solved by client devices keeping max message id. Message ids should be sortable and represent a timeline.
For a small group chat we can maintain “inbox” for each user and then we add messages to that inbox from other users
Online presence is another interesting consideration. We shouldn’t be too eager to constantly know the status as network connection might not be stable.
Chapter 13: Design A Search Autocomplete System
This is basically a huge Trie, that is built on periodic basis (say weekly) which is loaded into memory but also persisted to storage.
Each node, other than a counter for occurrences, also has a cached list of the top X results; this way, to provide autocomplete we just find the node and immediately have the suggestions.
Caching is important in multiple places
client browser. A Cache-Control header can be set (“max-age=3600”). This is exactly what Google does.
Updating TrieNodes on-the-fly is very expensive.
A follow-up would be to support real-time search queries. An idea here could be to give more weight to more recent queries.
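The "Trie node with cached top suggestions" idea above can be sketched like this (my own simplified illustration; the book's design precomputes the top lists offline and ranks by frequency, which is omitted here):

```java
import java.util.*;

// Trie where each node caches its top suggestions, so autocomplete is one lookup.
public class AutocompleteTrie {
    private final Map<Character, AutocompleteTrie> children = new HashMap<>();
    private final List<String> topSuggestions = new ArrayList<>(); // cached top X

    public void insert(String query) {
        AutocompleteTrie node = this;
        for (char c : query.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new AutocompleteTrie());
            // Simplified caching: keep up to 5 suggestions, no frequency ranking.
            if (!node.topSuggestions.contains(query) && node.topSuggestions.size() < 5) {
                node.topSuggestions.add(query);
            }
        }
    }

    public List<String> suggest(String prefix) {
        AutocompleteTrie node = this;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return List.of(); // unknown prefix
        }
        return node.topSuggestions; // already cached, no subtree walk needed
    }
}
```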
Chapter 14: Design YouTube
We need transcoding servers, which break the video into smaller chunks and encode it into different sizes and formats
Storage for encoded videos
CDN to have videos available for users
completion handler to notify systems of completion of transcoding
Streaming flow
MPEG-DASH, Apple HLS, MS Smooth Streaming, Adobe HTTP Dynamic Streaming are different kinds of streaming protocols over HTTP
Facebook is using a directed acyclic graph (DAG) for processing videos. The root is the original video; its children are: video, audio, metadata. The children of “video” are: inspection, video transcoding, thumbnails, watermark, etc. A child of both video and audio is the re-assembled video.
Considerations:
Cost of CDN (don’t keep long-tail videos in CDN as no-one is watching them)
safety of videos (copyright, sensitive)
Chapter 15: Design Google Drive
Consistency is extremely important here. We cannot lose data either.
Two types of file upload: simple upload for small files and resumable upload for large files.
Files need to have revisions.
Files need to be split into smaller chunks (Dropbox does 4MB).
On upload we need to update only chunks that changed and can use checksums/hashes to verify what changed.
Sync conflicts have to be handled by the client that was late to the party. This works similarly to how Amazon Dynamo handles it: the client gets two versions of a file and has to arrive at a merged version.
The metadata database could be a proper relational SQL database. We will need to store users, files, file versions, chunks, and devices.
Chapter 16: The Learning Continues
Reference material.
Resources
Well, the internet is just full of all kinds of resources and blog posts on how to prepare for the technical interview, plus there are numerous books and websites available. LeetCode is probably the best known practice website. “Cracking The Coding Interview” is probably the best known book. I can also recommend “Guide to Competitive Programming”. As for system design, preparation resources are not so readily available. One good resource is “Grokking the System Design Interview”, but it would be more valuable to practice designing large distributed systems. I also found the book “Designing Data-Intensive Applications” to be extremely helpful.
Asking for help
What are other “must know” algorithms and important system design topics to be listed in this post?
I was looking for exactly this: short, defined snippets of code which explain the pattern.
Leetcode (industry interviewing resource standard), however, has much more patterns to study.
It would be great if all of them or as many as possible are included in this post.
This post does a great job of describing all the patterns: https://emre.me/categories/#coding-patterns