This blog post is a coding and system design interview cookbook/cheatsheet compiled by me with code pieces written by me as well. There is no particular structure nor do I pretend this to be a comprehensive guide. Also there are no explanations as the purpose is just to list important things. Moreover, this blog post is work-in-progress as I’m planning to add more things to it.

As always, opinions are mine.

Binary Search

Breadth First Search

Depth First Search

Just replace the queue in BFS with stack and you are done:

Backtracking

Generally a recursive algorithm attempting candidates and either accepting them or rejecting (“backtracking”) based on some condition. Below is an example of permutations generation using backtracking:

Dynamic Programming

Technique to determine result using previous results. You should be able to reason about your solution recursively and be able to deduce dp[i] using dp[i-1]. Generally you would have something like this:

Dynamic Programming: Knapsack

Union Find

When you need to figure out if something belongs to the same set you can use the following:

Binary Tree Traversals

  • Inorder: left-root-right
  • Postorder: left-rigth-root
  • Preorder: root-left-right
  • Plus each one of them could have reverse for right and left

Below are recursive and iterative versions of inorder traversal

Segment Tree

Djikstra

Simplest way to remember Djikstra is to imagine it as BFS over PriorityQueue instead of normal Queue, where values in queue are best distances to nodes pending processing.

Multi-threading: Semaphore

Multi-threading: Runnable & synchronized

Strings: KMP

Hopefully no one asks you this one. It would be too much code here, but the idea is not that difficult: build LPS (example “AAABAAA” -> [0,1,2,0,1,2,3]); use LPS to skip when comparing as if you were doing it in brute-force.

Strings: Rolling Hash (Rabin-Karp)

Let’s imagine we have window of length k and string s, then the hash for the first window will be:

H(k) = s[0]*P^(k-1) + s[1]*P^(k-2) + … + s[k]*P^0

Moving the window by one, would be O(1) operation:

H(k+1) = (H(k) – s[0]*P^(k-1)) * P + s[k+1]*P^0

Here is some sample code:

Unit Testing

JUnit User guide: https://junit.org/junit5/docs/current/user-guide/

Data Structures

Many interview problems can be solved by using right data structure.

System Design Topics

  • requirements gathering
  • capacity planning
  • high level design
  • components design
  • database design
  • API design
  • queue
  • map reduce
  • long poll and push models
  • load balancing
  • partitioning
  • replication
  • consistency
  • caching
  • distributed caching
  • networking

System Design Interview Considerations

These are notes I took while reading “System Design Interview – An insider’s guide”:

  • Chapter 1: Scale From Zero To Millions Of Users
    • All of the most basic concepts introduced of a software design of a typical web application:
    • Client/Server communication and DNS
    • Load balancing and introduction of multiple servers
    • Database replication with single master and multiple slaves and also later on database scaling
    • Caching at the front of a database
    • Content delivery Network (CDN) use for static content 
    • Stateless vs stateful server with clear preference for stateless
    • Datacenters for more scale and for better reliability
    • message queues for async operations and reducing spikes in load
    • all of the auxiliary considerations for large system: logging, metrics, monitoring, 
  • Chapter 2: Back-of-the-envelope Estimation
    • This is not something I have top of my mind when thinking about system design, but I guess it is important in the context of the interview to understand the scale of the system you are building.
    • Some number:
      • Transatlantic network roundtrip: Approximately 150ms
      • Reading 1MB from memory: Around 250 microseconds
      • Reading 1MB from network: Roughly 10ms
      • Reading 1MB from disk: Typically 30ms
      • Processor thread synchronization (mutex): As little as 100 nanoseconds
  • Chapter 3: A Framework For System Design Interviews
    • understand the problem and establish design scope
    • propose high level design and get buy-in
    • Design deep dive: work with interviewer to understand what they would like you to dive into, also don’t be shy to propose your own area for deep dive
    • wrap up: you can reiterate over the design, mention other considerations you could have spoken about
  • Chapter 4: Design A Rate Limiter
    • using the framework from chapter 3 goes over designing rate limiter
    • explains few different algorithms to rate limiter:
      • token bucket – simple, used by Amazon, imagine a bucket, there is re-filler that just adds tokens periodically, if bucket is full tokens fall-out, every request takes out one token, if there are no tokens at given time request is rejected
      • leaking bucket – FIFO
      • fixed window counter
      • sliding window log (expensive on memory as we need timestamps)
      • sliding window counter
    • Important to send back HTTP 429 with X-Ratelimit-Retry-After
    • Talks about issues of using counters in distributed environment and how Redis distributed cache can be used as a solution 
    • More advanced rate limiting techniques are only mentioned, like limiting in network Layer 3 IPTables, not just layer7 HTTP
  • Chapter 5: Design Consistent Hashing
    • Explains hash space and hash ring – we place all n servers into a ring and then once we have hash key we find the server that has it on the ring by going clockwise
    • Explains an issue with adding or removing servers (all data has to be copied over)
    • Introduces concept of virtual nodes, this is same as used by Amazon Dynamo – one physical server owns multiple virtual nodes, this archives more balanced distribution of keys (we don’t end in situation when one servers owns majority of keys)
  • Chapter 6: Design A Key-value Store
    • This is basically copy-paste-simplify from Amazon Dynamo WP but explained in more accessible way
    • Additionally starts with introducing the CAP Theorem
    • Additionally mentions:
      • gossip protocol – each server has heartbeat info about other servers and periodically “gossips” about it with other servers, once they see one of them is not really getting fresh heartbeats it is declared “dead”
      • hinted handoff – process of communicating to the server that is known to be healthy and then once the server the data was intended for is back online the data is transferred to where it belongs
      • Merkle tree is used to handle more permanent failures to reduce amount of data that has to be synchronized (we compare trees on two servers)
  • Chapter 7: Design A Unique Id Generator In Distributed Systems
    • Discusses how difficult it is to generate unique id in distributed environment if there are certain constrains, for instance we want ids to increment with time.
    • Twitter snowflake is 64 bit id: 0 + 41bit timestamp + 5bit datacenter id + 5 bit machine id + 12 bits sequence number (restarted every ms)
    • One core assumption of this is that we have distributed clock, which was outside of the scope of this chapter
  • Chapter 8: Design A Url Shortener
    • Well, we want a HashMap, but we want it distributed and persistent.
    • Important to try to estimate. For instance we can calculate we need 36TB of space for 10 years with 100million URLs written per day. We can also estimate that we need Hash value length of 7 [0-9, a-z, A-Z] for storing those 365 billion records
    • Discussed different hashing algorithms, CRC32, MD5, SHA-I and the issue that if we use them and then map into 7 value length we will run into collisions, thus we need to come up with technique where we first look into cache/db if the key is already present and if so, we need to re-hash
    • Base62 conversion is another potential method to use. Basically we take sequential integer number (base 10, obviously) and convert to base 62, we end up with things like: 11157 => “2TX”
    • URL redirect has to be 301 (permanent redirect) or 302 (temporary redirect). Temporary could be useful if we want analytics. Obviously for reducing load we might prefer to go with 301.
  • Chapter 9: Design A Web Crawler
    • Other than Google and other search engines crawling the internet there are numerous web archives that attempt to archive the internet. Web mining (news, knowledge) and web monitoring (copyrights, trademarks) are two other use-cases.
    • Interesting and important considerations:
      • content seen?
      • Caching of DNS (yes ! Roundtripping to figure out IP could take time when we need to visit the same site 1000s of times, each DNS could be (10-200ms), so we store a map site->IP on our own.
      • Blocklisting (sensitive content, etc)
      • DFS isn’t the best choice as we will get stuck on the same website for too long
      • BFS works for the most part but we still need to redistribute so that we don’t spend too much processing on a single site (like wikipedia)
      • “Politeness” – yeah, we don’t want to aggressively bombarde the website. Funnily enough I have a personal story of a website of a metal band when they said they hid some nugget presents on their website, so I wrote a web crawler for their site and left it running all night. Next day the site was down. I probably wasn’t the only “smart” guy listening to them. LOL.
      • Priority – we need to distinguish between important sites and not so important and also we need to think about frequency of updates of some sites (news)
      • Robots.txt – Robots Exclusion Protocol is the way of websites to tell robots how they should behave 🙂
      • Geography. Let crawlers located physically close to the website host crawl there.
      • Spider traps – yeap. There are intentional and unintentional ones.
      • Server side rendering. Since modern web often uses client side JS scripts to produce HTML we need to load the page and render before parsing.
  • Chapter 10: Design A Notification System
    • Notifications: SMS, email, push-notification. Is there more?
    • For iOS we need to use Apple Push Notification Service. For Android Firebase Cloud Messageing is often used. For SMS Twilio, Nexmo are used. For email Sendgrid and Mailchimp.
    • Normally a notification system is a component in a large system and services that perform some operations interact with it so that notifications are sent: [Services: 1-N]->[Notification System]->[ANPs, FMC, SMS, etc]->[User]
    • We introduce queue per channel and workers consuming messages from the queue before they connect to 3rd party services.
    • Message queues help with spike in volume of messages and also isolate different 3rd party dependencies (queue for SMS could be full but other queues get emptied)
    • We need to add a notification log so we have record of what was sent to 3rd party.
    • A key metric for monitoring is # of messages in the queue. Either capacity is not enough or 3rd parties aren’t keeping up/down.
    • Analytics Service is another component to consider. Data can be sent to it from different points in the lifecycle of the notification (say, user clicked on a link in notification).
  • Chapter 11: Design A News Feed System
    • Key to this design is to take posted item (say Facebook post) and then place it into different processing systems, specifically:
      • post service to properly store the post
      • notification service to notify “friends” of the poster
      • fanout service is to propagate the post to subscribers (“friends”)
    • Fanout service that delivers the news can be implemented in two variations:
      • on write (push model)
        • precomputed, fast
        • expensive if many friends or infrequent use of the app
      • on read (pull model)
      • A hybrid approach is usually best (think how Twitter does this for celebrities): 
        • push model for regular users so its fast
        • pull model for subscribers of celebrities
    • We use CDN for static content (images, videos)
    • We use caching at multiple levels (news feed cache as people usually only interested in latest, user cache, post cache)
  • Chapter 12: Design A Chat System
    • One of the key concepts to know here is how clients connect:
      • Polling – client periodically asks for data
      • Long polling – client connects to data and server sends data back once it is available, connection maintained for longer period of time (30sec or something)
      • WebSocket – bidirectional communication; client establishes the connection but server can then send data. Starts as a HTTP connection and then converts to WebSocket; works on same ports: 80 or 443 so should work everywhere.
    • All of the authentication, metadata and other API work pretty much as in other systems.
    • 1-1 chat works by two clients connecting to servers using WebSockets. Because one of the users might not be online we use message queue + keyvalue store to store messages and also a push notification is sent to another user about availability of new messages
    • Facebook processes 60 Billion messages a day.
    • Message synchronization is another important topic. Can be solved by client devices keeping max message id. Message ids should be sortable and represent a timeline.
    • For a small group chat we can maintain “inbox” for each user and then we add messages to that inbox from other users
    • Online presence is another interesting consideration. We shouldn’t be too eager to constantly know the status as network connection might not be stable.
  • Chapter 13: Design A Search Autocomplete System
    • This is basically a huge Trie, that is built on periodic basis (say weekly) which is loaded into memory but also persisted to storage.
    • Each node other than just having counter for occurrences also has cached list of top X results, in this way in order to provide autocomplete we just find the node and immediately have suggestions.
    • Caching is important in multiple places
      • clien browser. A header can be set “max-age=3600”. This is exactly what Google does.
    • Updating TrieNodes on-the-fly is very expensive.
    • A follow-up would be to support real-time search queries. An idea here could be to give more weight to more recent queries.
  • Chapter 14: Design Youtube
    • back of envelope for 5 million daily users
      • 5 M * 10% (uploaders) * 300 MB (avg video) = 150TB
      • CDN cost ~ $150K daily
    • For video uploading
      • we need transcoding servers, which could break video into smaller chunks and encode it to different sizes
      • Storage for encoded videos 
      • CDN to have videos available for users
      • completion handler to notify systems of completion of transcoding
    • Streaming flow
      • MPEG-DASH, Apple HLS, MS Smooth Streaming, Adobe HTTP Dynamic Streaming are different kinds of streaming protocols over HTTP
    • Facebook is using directed acyclic graph (DAG) for processing videos. So, the root is the original video, children are: video, audio, metadata, Then children of “video” are: inspection, video transcoding, thumbnails, watermark, ect; Then child of video and audio is the re-assembled video
    • Considerations:
      • Cost of CDN (don’t keep long-tail videos in CDN as no-one is watching them)
      • safety of videos (copyright, sensitive)
  • Chapter 15: Design Google Drive
    • Consistency is extremely important here. We cannot lose data either.
    • Two types of uploading files: resumable for small files and non-resumable for larger files.
    • Files need to have revisions.
    • Files need to be split into smaller chunks (Dropbox does 4MB).
    • On upload we need to update only chunks that changed and can use checksums/hashes to verify what changed.
    • Sync conflicts have to be handled by a client that was late to the party. This works similarly to have Amazon Dynamo handles this. Client gets two versions of a file and has to arrive at a merged version.
    • Message database could be a proper relational SQL database. We will need to store users, files, file versions, chunks, devices.
  • Chapter 16: The Learning Continues
    • Reference material.

Resources

Well, internet is just full of all kinds of resources and blog posts on how to prepare for the technical interview plus there are numerous books and websites available. LeetCode is probably the most known of practicing websites. “Cracking The Coding Interview” is probably the most known book. I can also recommend “Guide to Competitive Programming”. As of system design, preparation resources are not so readily available. One good resource is “Grokking System Design Interview”, but it would be more valuable to practice designing large distributed systems. I also found the book “Designing Data-Intensive Applications” to be extremely helpful.

Asking for help

What are other “must know” algorithms and important system design topics to be listed in this post?

If you haven't subsribed yet, you can subsribe below: