System Design Interview
  • Videos: 7
  • Views: 2,399,559
System Design Interview – Step By Step Guide
Please check out my other video courses here: www.systemdesignthinking.com
Topics mentioned in the video:
- Stages of a typical system design interview: functional requirements (API), non-functional requirements, high-level design, detailed design, bottlenecks and tradeoffs.
- Why requirements clarification is so important.
- What questions to ask the interviewer.
- How to design API.
- Non-functional requirements to consider: scalability, performance, availability, consistency, cost.
- How to define a data model.
- How to scale a SQL database.
- Apache Cassandra high-level architecture.
- Data processing concepts: checkpointing, partitioning, in-memory aggregation, deduplication cache, dead-lette...
Views: 782,268

Videos

System Design Interview - Top K Problem (Heavy Hitters)
Views: 354K · 5 years ago
Please check out my other video courses here: www.systemdesignthinking.com Topics mentioned in the video: - Stream and batch processing data pipelines. - Count-min sketch data structure. - MapReduce paradigm. - Various applications of the top k problem solution (Google/Twitter/RUclips trends, popular products, volatile stocks, DDoS attack prevention). Merge N sorted lists problem: leetcode.com/...
System Design Interview - Distributed Cache
Views: 349K · 5 years ago
Please check out my other video courses here: www.systemdesignthinking.com Topics mentioned in the video: - Functional (put, get) and non-functional (high scalability, high availability, high performance) requirements. - Least recently used (LRU) cache. - Dedicated cache cluster, co-located cache - MOD hashing, consistent hashing. - Cache client. - Static vs dynamic cache servers list configura...
System Design Interview - Rate Limiting (local and distributed)
Views: 287K · 5 years ago
Please check out my other video courses here: www.systemdesignthinking.com Topics mentioned in the video: - Token bucket algorithm. - Object-oriented design of the rate limiting solution. - Load balancer max connections, auto-scaling. - Message broadcasting: full mesh network topology, gossip communication, distributed cache, coordination service. - Communication protocols: TCP, UDP. - Embedded...
System Design Interview - Notification Service
Views: 248K · 5 years ago
Please check out my other video courses here: www.systemdesignthinking.com Topics mentioned in the video: - Functional (create topic, publish message, subscribe to a topic) and non-functional (high scalability, high availability, high performance, durability) requirements. - High-level architecture of a notification service. - FrontEnd service host components (reverse proxy, local cache, logs a...
System Design Interview - Distributed Message Queue
Views: 269K · 5 years ago
Please check out my other video courses here: www.systemdesignthinking.com Topics mentioned in the video: - Synchronous vs asynchronous communication. - Functional (send message and receive message) and non-functional (scalability, high availability, high performance, durability) requirements. - High-level architecture of a distributed message queue. - VIP, Load balancer - FrontEnd service and ...
System Design Interview Channel Introduction
Views: 110K · 5 years ago
Please check out my other video courses here: www.systemdesignthinking.com

Comments

  • @NishaUchil · 21 hours ago

    For system design, you are unbeatable in the market right now.

  • @R0hanThakur · 5 days ago

    Can someone explain how CMS helps here? Say I got counts for 100 videos in the first 10 seconds, then the next 10 seconds bring entirely new video count entries, and so on. When we then use the CMS to calculate the count heap for the minute, how do the hash keys and values in the CMS translate back to any of the video IDs?

  • @user-xv8ru8ny2v · 9 days ago

    @15:15 - when adding a new host "host 6" that takes cache items from host 4: is there any strategy to manage requests for the items that are moved to host 6 from host 4 while the data is being replicated?
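
    For context, the consistent-hashing behavior this question is about can be sketched in a few lines of Python; the host and item names are illustrative, not from the video:

    ```python
    import bisect
    import hashlib

    def stable_hash(key: str) -> int:
        # md5 keeps placement stable across runs (unlike Python's built-in hash).
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class ConsistentHashRing:
        def __init__(self, hosts):
            self.ring = sorted((stable_hash(h), h) for h in hosts)

        def add(self, host):
            bisect.insort(self.ring, (stable_hash(host), host))

        def lookup(self, key):
            # The owner is the first host clockwise from the key's position.
            i = bisect.bisect(self.ring, (stable_hash(key), ""))
            return self.ring[i % len(self.ring)][1]

    ring = ConsistentHashRing([f"host{i}" for i in range(1, 6)])
    keys = [f"item{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add("host6")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    # Every remapped key now belongs to the new host; all others stay put.
    assert all(ring.lookup(k) == "host6" for k in moved)
    ```

    During the replication window the question asks about, a common tactic is for the new host to serve misses from the backing store (or proxy to the old owner) until it is warm; the video does not pin down one strategy, so that part is an assumption.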

  • @sumedhabirajdar · 9 days ago

    Most system design interview answers contain high-level design decisions. These videos explain how the data flows, which is something I needed.

  • @borisv8766 · 10 days ago

    I honestly don't fully understand the single-host solution (6:16). It assumes that we already have a *fixed* list of event strings and can iterate through the list to form the heap. But events are happening *in real time*, even for a single host. If I already have a heap of heavy hitters formed and a new event arrives with a string identifier, I have to check the heap (consisting of "HeavyHitter" elements) for an existing HeavyHitter with the same identifier and update its frequency if needed, which is slow to do in real time. So to me it sounds less like a single-host solution and more like an algorithm exercise of picking heavy hitters from a pre-existing list. P.S. Silly me 🤦‍♂, this concern is addressed later in the video.
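
    For reference, the single-host flow being described (a hash map of counts first, the heap built from the map only when the top k is requested) can be sketched in Python; the event names are made up:

    ```python
    import heapq
    from collections import Counter

    def top_k(events, k):
        # One pass to count frequencies, then a size-k selection over the map;
        # no heap has to be updated on every incoming event.
        counts = Counter(events)
        return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])

    events = ["A", "B", "A", "C", "A", "B", "D"]
    assert top_k(events, 2) == [("A", 3), ("B", 2)]
    ```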

  • @sudhanshukumar-yu7fj · 13 days ago

    I liked the confidence when he justified the system's performance by saying it would use a TCP connection between the cache client and the cache servers. Network calls always add latency, making this less performant than in-process calls.

  • @drawingbook-mq6hx · 13 days ago

    How is this different from the Kappa architecture in the slow path?

  • @antonchov736 · 13 days ago

    Why does he call the BackEnd "FrontEnd"? I can't get it.

  • @ht1590 · 14 days ago

    I've been in interview loops several times in my 8-year software engineering career, and as part of interviewing I've followed multiple system design interview preparation sources. This video is the best source of information for preparing for a system design interview. The way the concepts are put in a clear, concise and defined manner is commendable. Really appreciate the hard work put into creating such an informative video.

  • @tdiggitydawg · 14 days ago

    Why didn't you consider a multi-leader approach? Your approach seems to be a single-leader approach. Did you assume that you're optimizing for high reads but not high writes?

  • @amirphl7834 · 15 days ago

    What is the role of the red box (data partitioner) (22:26)? Data is already batched and partitioned in the first messaging system. Can you elaborate more on this?

  • @shubhamshah-sf1po · 21 days ago

    Merging top-k lists from different partitions won't be entirely accurate, right? Let's say one video falls into position k+1 in two partitions; then we lose that video, right? Or have we made sure that partitioning is such that each partition gets the same video? But then we might have the problem of hot partitions.

  • @AbdulSamadh-lg7lt · 23 days ago

    I believe this can explain why we cannot use 60 1-min stores of top-k elements to create a 1-hour store of top-k elements. Imagine we only have 4 possible videos to watch on youtube (A, B, C, D) and we want the top 3 viewed videos for every 1-hour window, given the top-3 view counts for each of the 60 1-min windows.
    In each of the first 59 1-min windows: 10 views of A, 10 views of B, 10 views of C, 9 views of D. In the 60th window: 10 views of A, 10 views of B, 5 views of C, 70 views of D.
    Actual views over the hour: A = 10*60 = 600, B = 10*60 = 600, C = 10*59 + 5 = 595, D = 9*59 + 70 = 601. The top 3 should be D, A and B.
    But with only the 60 1-min top-3 lists, these are the views we can count: A = 10*60 = 600; B = 10*60 = 600; C = 10*59 = 590 (we don't count the 5 from the last window because its top 3 was D, A, B); D = 70 (we don't count 9*59 from the first 59 windows since their top 3 was A, B, C). So we would incorrectly get A, B, C as the top 3 instead of D, A, B. Hope this helped.

  • @aniketdali · 25 days ago

    Found it insightful. Thank you for creating this video.

  • @sunlee79 · 25 days ago

    To the point, while still covering all aspects of the problem. See 19:59 if you only want a summary.

  • @digiday505 · 27 days ago

    Love the video. I know it's been a while, but I have a question. In the fast path, when using the Count-Min Sketch, you mentioned that "memory isn't an issue, thus we don't need partitioning". It is true that the CMS answers the question "how many counts do we have for a key X" with fixed memory. But we still need to store all the keys we have seen in order to rank them with a heap, right? In that case, we still have a memory issue as the number of unique keys grows, just like with a hashmap.
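
    To make the trade-off concrete, here is a minimal count-min sketch in Python (md5-based row hashing is an illustrative choice): the table is fixed-size, but any heap of candidate keys kept beside it grows with the number of distinct keys, which is the commenter's point.

    ```python
    import hashlib

    class CountMinSketch:
        def __init__(self, width=1024, depth=4):
            self.width, self.depth = width, depth
            self.table = [[0] * width for _ in range(depth)]  # fixed memory

        def _columns(self, key):
            # One roughly independent hash per row, derived from md5.
            for row in range(self.depth):
                digest = hashlib.md5(f"{row}:{key}".encode()).hexdigest()
                yield int(digest, 16) % self.width

        def add(self, key):
            for row, col in enumerate(self._columns(key)):
                self.table[row][col] += 1

        def estimate(self, key):
            # Collisions only inflate cells, so the minimum never under-counts.
            return min(self.table[row][col]
                       for row, col in enumerate(self._columns(key)))

    cms = CountMinSketch()
    for _ in range(100):
        cms.add("videoA")
    cms.add("videoB")
    assert cms.estimate("videoA") >= 100
    assert cms.estimate("videoB") >= 1
    ```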

  • @tommyls4357 · 27 days ago

    I don't understand why we need to refill the bucket at a constant rate. Can't we instead add a token to it when a request completes?

  • @andersondantas2010 · A month ago

    I think you can do better on the slow path by pre-computing chunks of power-of-2 sizes, like in a segment tree. For example, imagine you have data for timestamps 1 to 10. You would have the following chunks:

                     [1-10]
                    /      \
               [1-8]        [9-10]
              /     \        /   \
          [1-4]    [5-8]   [9]   [10]
          /   \    /   \
      [1-2] [3-4] [5-6] [7-8]
       / \   / \   / \   / \
     [1][2][3][4][5][6][7][8]

    Calculating each chunk would take up to log(n) steps. This approach also eats more space, about 4 times more, but enables online data processing and accurate real-time responses. We can distribute this data structure across many machines using the approach in Shipeng Li's article "Distributed Segment Tree: A Unified Architecture to Support Range Query and Cover Query". As the timestamps might yield very sparse segments, we can "compress" 10s or 100s of contiguous timestamps into a single value for faster access, but that would require merging each chunk individually in case a query looks for data that is not fully covered by the range.
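
    The pre-computed power-of-2 chunks described above behave like a range-sum segment tree; a compact iterative version in Python (a sketch of the idea, not code from the video):

    ```python
    class SegmentTree:
        """Range-sum segment tree: build O(n), update and query O(log n)."""

        def __init__(self, values):
            self.n = len(values)
            self.tree = [0] * self.n + list(values)  # leaves at n..2n-1
            for i in range(self.n - 1, 0, -1):
                self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

        def update(self, i, value):
            i += self.n
            self.tree[i] = value
            while i > 1:
                i //= 2
                self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

        def query(self, lo, hi):
            """Sum over the half-open interval [lo, hi)."""
            total = 0
            lo += self.n
            hi += self.n
            while lo < hi:
                if lo % 2:       # lo is a right child: take it and move right
                    total += self.tree[lo]
                    lo += 1
                if hi % 2:       # hi is a right boundary: step left and take it
                    hi -= 1
                    total += self.tree[hi]
                lo //= 2
                hi //= 2
            return total

    views = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # counts at timestamps 1..10
    st = SegmentTree(views)
    assert st.query(0, 10) == sum(views)
    assert st.query(4, 8) == 50 + 60 + 70 + 80
    ```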

  • @jianlyu700 · A month ago

    Hmm, I walked the path at 11:36 in an earlier interview of mine and got lost in a flood of questions from the interviewer during the high-level design. The whole interview went like a disaster.

  • @dobrovidoff · A month ago

    Thank you, king.

  • @sowmyadhotre834 · A month ago

    Could you please also list a few distributed cache examples in the description? Are they Redis, Memcached?

  • @prashantkumardhotre5695 · A month ago

    What are some examples of distributed caches?

  • @rohanchhokra · A month ago

    One brilliant thing about this video is that the teacher is essentially designing another Kafka when talking about the partitions piece of the system (the one used by the partition service). Rather than abstracting that piece away by just calling it Kafka, the author talks through its barebones structure, thereby also teaching us how Kafka itself works (on some level).

  • @shreyamduttagupta7527 · A month ago

    I am getting started with system design; can someone explain what the frontend service after the load balancer is? In a usual web development context we use the term "frontend" for clients, but I am assuming it means something different here? I love how detailed these videos are, but I'm struggling to understand this one and Distributed Message Queue because of this frontend service. Any help is appreciated.

  • @jianlyu700 · A month ago

    OMG, this is still the best system design video I've ever seen. It's not only for interviews, but also for actual system solution design.

  • @nikhil199029 · A month ago

    25:20 some magic happening

  • @jivanmainali1742 · A month ago

    Small doubt: does LRU work on data within a single machine (@33:36)? If the oldest data is on machine 2 but its cache is not full, while machine 1's cache is full but holds more recent data than machine 2, then LRU will delete data from machine 1, whereas the effective deletion would have been from machine 2. Any idea on this?

  • @unfiltered_motivation · A month ago

    Please come back man, the community needs you.

  • @nishantagrawal6244 · A month ago

    I don't understand at 14:22 how only 2 more tokens were added in the refill method call at t2. As per the algorithm, 5 more tokens should be added. Refilling needs to be done at t2 since there are not enough tokens left, so the number of tokens to add = (t2 - t0)*refillRate = (t0+300+200-t0)*(1/100 tokens/ms) = (500 ms)*(1/100) = 5 tokens.

  • @gitarowydominik · 2 months ago

    We're designing a notification service, so a pub/sub service. And Kafka is mentioned as a potential choice for a message queue and stream processing platform. But Kafka itself can be used as a pub/sub service, so maybe just use that functionality of Kafka instead? :)

    • @gitarowydominik · 2 months ago

      Although Kafka only lets subscribers pull data (no push support), so that's something to talk about.

  • @barryliii3234 · 2 months ago

    Some confusion around merging top-k lists over a time interval. Are we losing a lot of data here? For example, k = 2: for the first minute the top 2 are A = 100, B = 99; for the second minute the top 2 are C = 100, D = 99. Merging these two lists gives us A and B as the top 2. But there might be an E which had 98 views in the first minute and 98 views in the second minute; E should be the most viewed video in this case. In general, how does merging top-k lists work?

  • @SG-rj8bc · 2 months ago

    Great video! Minor comment @ 14:14: when the second request came for 5 tokens, the time is t0 + 500ms, so the bucket should be filled with 5 tokens instead of 2.

  • @AlexXPandian · 2 months ago

    I’ll be back.

  • @akankshamahajan9709 · 2 months ago

    Wowww!!! These videos really helped me prepare for my SD interview. Is there anything similar for ML system design interviews?

  • @nitin_puranik · 2 months ago

    This video series is super amazing, no doubt. However, my opinion is that none of these system design solutions are intuitive. If you've watched these videos and remember some of the key points, you can perform well during interviews. If you haven't watched them, good luck. There is no way someone who hasn't watched these videos, or hasn't worked on these things, could randomly come up with suggestions like count-min sketch, fast and slow processors, etc. That's a little disappointing. I've watched all of Mikhail's videos, but if somebody asks me to design TikTok or Airbnb, I'm sad that I will again feel clueless. I can talk a bit about load balancing, replication, partitioning/sharding, security, etc., but the core logic seems to require very specific domain knowledge. You either know it or you don't; you can't make it up on the spot :(

  • @yuemengyin4952 · 2 months ago

    Thanks for the amazing video! It really helped me with preparing for the upcoming system design interview.

  • @kaushalagrawal6258 · 2 months ago

    Explanation for why we cannot accurately merge the top-k heavy hitters of 60 1-minute periods to get the top-k heavy hitters for a 1-hour period. Take this instance, where we want the top-1 heavy hitter, and the top 2 heavy hitters for the last 60 1-minute periods are as follows.
    Minute 1: video1 = 100 times, video2 = 0 times. Each of minutes 2 through 60: video2 = 2 times, video1 = 1 time.
    Merge the full lists and you get [video1] as the top-1 heavy hitter for the 60-minute period: video1 = 100 + 59*1 = 159 times, video2 = 0 + 59*2 = 118 times.
    But merge only the top-1 lists and you get [video2] as the top-1 heavy hitter: video1 = 100 times, video2 = 59*2 = 118 times.
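
    The arithmetic in this counterexample can be verified mechanically; a small Python check:

    ```python
    # Per-minute counts from the comment: minute 1, then 59 identical minutes.
    minutes = [{"video1": 100, "video2": 0}] + [{"video1": 1, "video2": 2}] * 59

    # True hourly totals, using the full per-minute counts.
    full = {v: sum(m[v] for m in minutes) for v in ("video1", "video2")}
    assert full == {"video1": 159, "video2": 118}
    assert max(full, key=full.get) == "video1"  # the true top-1

    # Merge only each minute's top-1 list instead.
    merged = {"video1": 0, "video2": 0}
    for m in minutes:
        top1 = max(m, key=m.get)
        merged[top1] += m[top1]
    assert merged == {"video1": 100, "video2": 118}
    assert max(merged, key=merged.get) == "video2"  # wrong answer
    ```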

  • @kaushalagrawal6258 · 2 months ago

    Simply the best.

  • @yaakovsnett1507 · 3 months ago

    ruclips.net/video/kx-XDoPjoHw/видео.html The reason is that because only the top k elements are saved for each minute, the other element counts are lost and won't be added to the calculation of the hourly top k.

  • @qhadj5387 · 3 months ago

    Solid design. Given that the title is system design, we can skip the implementation details and focus more on the trade-offs.

  • @HarpreetKaur-oj5eg · 3 months ago

    Brilliant content! Please start uploading again!

  • @AmolGautam · 3 months ago

    Thank you so much for this video.

  • @ch33zer · 3 months ago

    Around 13:15 there's a bug in the code. If tokensToAdd is 0, we still update the timestamp. This means that if we have lots of requests close to each other, all with tokensToAdd equal to 0, we may never refill the bucket. The solution is to only update the timestamp when tokensToAdd is nonzero.
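
    A sketch of the fix this comment proposes, in Python (the video's code is not reproduced here; names are illustrative): skip the timestamp update when no whole token has accrued, so repeated near-simultaneous requests can't starve the refill.

    ```python
    import threading
    import time

    class TokenBucket:
        def __init__(self, capacity, refill_rate):
            self.capacity = capacity        # maximum tokens the bucket holds
            self.refill_rate = refill_rate  # tokens added per second
            self.tokens = capacity
            self.last_refill = time.monotonic()
            self.lock = threading.Lock()

        def allow(self, cost=1):
            with self.lock:
                self._refill()
                if self.tokens >= cost:
                    self.tokens -= cost
                    return True
                return False

        def _refill(self):
            now = time.monotonic()
            tokens_to_add = int((now - self.last_refill) * self.refill_rate)
            if tokens_to_add > 0:
                # Only advance the timestamp when tokens were actually added;
                # otherwise fractional accrual is discarded on every call.
                self.tokens = min(self.capacity, self.tokens + tokens_to_add)
                self.last_refill = now

    bucket = TokenBucket(capacity=5, refill_rate=1)
    results = [bucket.allow() for _ in range(6)]
    assert results == [True] * 5 + [False]
    ```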

  • @charliebitme-zb3nv · 3 months ago

    Epitome of system design.

  • @runzhou3070 · 3 months ago

    I laughed so hard at 42:24, and then grabbed a cup of coffee.

  • @yuliiadzidzoieva2259 · 3 months ago

    God, how wrong I was when I dedicated only 2 hours to watching this. I spent a couple of weeks absorbing and digesting the information, and I just cannot express my gratitude enough. It is very useful and very well explained!

  • @shikharsharma6399 · 3 months ago

    Very good explanation!