[MUSIC PLAYING]
DAVID J. MALAN: This is CS50. And this is the start of week three. So we've got a lot of exciting things to cover today. A lot of opportunities for volunteers up on stage. And ultimately, today is not about code at all. But it's about ideas, and it's about algorithms, and actually bringing back some of the lessons learned from week zero, wherein recall, we introduced this monstrosity. And borrowing inspiration from that, to start to solve ever more sophisticated problems algorithmically. But first, a couple of announcements. So one, if you would like to join CS50's staff and classmates at lunch this Friday, both here and in Cambridge, and in New Haven, please visit the course's website, where a URL can be found. Lecture this Wednesday will not be here in Sanders. It will be online only, so tune in at CS50's website, whether here in Cambridge or New Haven as well.
And then problem set two is already in your hands. If you haven't dived in yet, allow me to offer the strongly worded suggestion that, especially now, as the problem sets advance, you really do want to start now, if not dabble a bit on the weekend or prior when they first go out on Fridays, because you'll find that they're not necessarily getting longer or more challenging per se. I think you'll find that, in general, they tend to take roughly around same amount of time. But it certainly depends on the student, and it depends on the mindset with which you approach it. But invariably, you're going to run up against some wall, and you're going to hit some bug, and you're just not going to be able to get over it at some point. And it's hugely valuable to be able to step away, come back the next day, go to office hours, post on CS50 Discuss or the like, to actually get unblocked. So keep that in mind. Starting earliest as possible is the best thing you can do. So here's where we started the class, over in week zero. And can we get a volunteer here to help me find mics? OK. Standing up already. Come on up. Guess that's how it's going to work. What's your name? ALAN ESTRADA: Alan Estrada. DAVID J. MALAN: Alan Estrada. Come on up. Nice to meet you. ALAN ESTRADA: Nice to meet you. DAVID J. MALAN: And you were here with us in week zero, of course. ALAN ESTRADA: I was. I was.
DAVID J. MALAN: So could you go ahead and find for us Mike Smith, as fast as you can? As fast as you can. Literally tearing the problem in half as you need to.
ALAN ESTRADA: Um. DAVID J. MALAN: Literally tearing the problem in half.
ALAN ESTRADA: Oh. Mm. Very good.
DAVID J. MALAN: OK. Good. Thank you. ALAN ESTRADA: Very good. OK.
DAVID J. MALAN: And so now, you've whittled it down to half the size of the problem. Now, we're down to a quarter. Are you paying attention to which side we're keeping?
[LAUGHING]
ALAN ESTRADA: Yes, I think--
DAVID J. MALAN: What section are we in? ALAN ESTRADA: Mufflers, so.
DAVID J. MALAN: OK. But Mike Smith is going to be after Mufflers. So--
[LAUGHING]
All right. ALAN ESTRADA: Where are we looking? DAVID J. MALAN: Mike Smith. ALAN ESTRADA: Mike Smith. DAVID J. MALAN: Now, we're in surgical. Now, physicians. Now--
ALAN ESTRADA: Let's- let's go with real. Real.
DAVID J. MALAN: Real. OK. If you need Real. Now, which way is Mike Smith?
ALAN ESTRADA: This way.
DAVID J. MALAN: Which way? ALAN ESTRADA: Wait. M is-- right? We started with--
DAVID J. MALAN: Yeah. They're left. Your right.
ALAN ESTRADA: Yeah.
DAVID J. MALAN: So Mike's in here. ALAN ESTRADA: What?
[LAUGHING]
Bad example, guys. Sorry. DAVID J. MALAN: This will teach you to leap out of your chair.
ALAN ESTRADA: Oh. Oh. I got you. I got you. Oh. Oh. This is-- OK, I got you. Smith right here?
DAVID J. MALAN: Smith, thank you. So I'll keep looking up Smith?
ALAN ESTRADA: Oh, yeah. No, no, no. Oh, no. This is mine.
DAVID J. MALAN: Oh, you got Smith. OK.
ALAN ESTRADA: Yeah, I got Smith right here. Sorry, guys. I thought Michael-- we were looking for Michael. Sorry.
DAVID J. MALAN: It's OK. All right, now we're into Paccini and Sons.
ALAN ESTRADA: Paccini and sons. DAVID J. MALAN: Only you and I are in on this. OK. Find us Mike Smith. Smith.
ALAN ESTRADA: Smith.
DAVID J. MALAN: Smith. We're in R for rubbish. ALAN ESTRADA: Rubbish. Oh. This is going to take a while.
[LAUGHING] DAVID J. MALAN: Shoes. We're in shoes.
ALAN ESTRADA: Now we're gonna--
DAVID J. MALAN: Nice. ALAN ESTRADA: Which-- [LAUGHING] Oh, this is great. [LAUGHING]
DAVID J. MALAN: It's OK.
ALAN ESTRADA: Oh, this is good. I don't think I'm going to have PSAT buddies after this. DAVID J. MALAN: Good. Sporting. ALAN ESTRADA: Sporting. Um, L, M, N, O, P. DAVID J. MALAN: OK. So let's tear this in half. It's OK. This ends poorly anyway, because Mike Smith won't be in the yellow pages.
ALAN ESTRADA: Aw.
DAVID J. MALAN: No, it's OK. But let's pretend like he's on this page. So now, you've whittled the problem down to one page, and we found Mike Smith.
[CHEERING] OK, thank you. OK. That was extraordinary. But it was still faster than linear search, wherein we start at the beginning of the book, and we move our way from left to right, eventually looking for Mike Smith. And so, if the phone book had maybe 1,000 pages, maybe it would have taken us 10 or so page tears.
But you may have leveraged touched an assumption during all of that, which is to say that the phone book in advance was what? AUDIENCE: Sorted. DAVID J. MALAN: It's sorted. Right? It's sorted alphabetically, so that all of those names and numbers are sorted from the A's to the Z's, and alphabetically in between. But today, we now ask the question, well, how did Verizon or the phone company get it into that state?
Because it's one thing to leverage that assumption, and therefore, solve a problem with an algorithm more efficiently. But we never really asked in week zero, well, how much did it cost Verizon or someone else to get that phone book in sorted order?
Right? It doesn't matter if looking up Mike Smith is super fast, if it takes you a year to sort the pages initially. Right? You might as well just sift through a randomized phone book, if it's going to be super expensive to sort it. So if we can have another volunteer. Let's take a look up here at how we might-- come on up-- how we might go about sorting these.
And if Jordan could actually join us up here on stage. Come on up for just a moment. What's your name? CAROLINE: Caroline. DAVID J. MALAN: Caroline, come on up. And you'll be joined by me and Jordan here. Caroline, thank you. All right. So what we have here for Caroline is 26 blue books that FAS uses to administer certain final exams. These are getting hard to find, but what we've done in advance is that we've put someone's name on the front of each of these, but we've kept it simple by then putting out full names. So we would put the person with the name L, D, J, B, all the way A through Z, but they're in random order. And so if you would, talking your way through the problem as you do it, can you go ahead and sort these for us, from A to Z.
AUDIENCE: OK, so L is like, the middle. C is beginning. B. J before L. B, Q.
DAVID J. MALAN: Hold that thought for one second. Because otherwise, this is only interesting to you, me, and Jordan. There we go. AUDIENCE: [INAUDIBLE]. R. DAVID J. MALAN: OK. What are you doing?
CAROLINE: M comes after O.
DAVID J. MALAN: OK.
CAROLINE: O. DAVID J. MALAN: O, Good.
CAROLINE: E.
DAVID J. MALAN: E, F. Yeah.
CAROLINE: T, U, V.
DAVID J. MALAN: V, T, U, V. So it looks like you're making-- keep going. It looks like you're making a big pile over here, and kind of a big pile over there. So the first half of the alphabet, second half of the alphabet. OK. Good. Kind of splitting the problem in two. M, N, X. Yeah. CAROLINE: K. DAVID J. MALAN: OK. K. So you're kind of selecting them one after another, putting it either left or right, or Z's going on the floor. OK.
CAROLINE: Z's going on the floor.
DAVID J. MALAN: OK. Y is going on the floor. Now we can put X.
CAROLINE: G. DAVID J. MALAN: G's going left. S is going right. All right, A is going all the way left.
CAROLINE: A, B, C, D.
DAVID J. MALAN: Now, good. We've got A, B, C. W's going down there. All right, T.
CAROLINE: H, I , J. DAVID J. MALAN: H, I, J. Good. CAROLINE: In the center, I'm gonna-- DAVID J. MALAN: OK. So now, we're going to kind of merge these various piles. So A through C, then I see D, and E, and F, and G, and H, and I. Nice. J, K. And then, this pile is upside down, but that's OK. Sure. We can cut some corners. OK. And then we need W, X, Y, Z.
CAROLINE: Yeah.
DAVID J. MALAN: Excellent. So a big thank you to Caroline for sorting these.
[CHEERING]
Thank you. Thank you very much. So now let's consider for a moment how Caroline went about doing that, and what exactly we were able to-- how we were able to solve that problem when we were just given a whole bunch of random inputs.
Well, it looks like there was a bit of a system there? Right. So the earlier letters in the alphabet, she was putting to the left, and the later letters in the alphabet, she was putting into the right. And as soon as she found some proximal letters, ones that go right next to each other, she would put those in order. And so we kind of had these small piles of sorted inputs occurring.
And so that's quite like what most of us humans would do. We would sort of sift through it, and we'd kind of have a mechanism. But it might be hard to write it down in a formula per se. It felt a little more organic than that. So let's see if we can now bound the problem with fewer inputs.
Instead of 26, let's do something far fewer with just say, seven, behind these doors, so to speak. Are there just seven numbers? And if the goal now at hand is to find a value, let's see how efficiently we can go about doing this. And let's see if we can now start to apply some numbers, or some formulas with which to describe the efficiency of our phone book algorithm, our exam book algorithm, and more generally, finding information.
So for this, let me go ahead, and on the touch screen over here, put up a web browser that has exactly these seven doors. And if we could get one other volunteer to come on over here, I've put these same doors over here. Quick volunteer.
This one-- demos are going to a quicker and quicker now. Come on down. What's your name? TREVOR: Trevor.
DAVID J. MALAN: Trevor? All right, Trevor, come on down. So Trevor has volunteered here to do a similar problem, but one that's narrower in scope, and that's going to allow us to try to formalize now the process for sorting these numbers.
So Trevor, nice to meet you. So here is an array, so to speak, a list of seven doors. Go ahead and find us the number 50. And then after the fact, tell us how you found it. Should be-- all right. Yeah, this is the one here? Uh-oh. OK. You clicked that one. Good.
And good. Now you clicked that one. And let me give you the microphone, so that you have it in just a moment. Go ahead and click the next door that you intend. Yes, good. TREVOR: Can I unclick a door? DAVID J. MALAN: No, you can't unclick. TREVOR: OK. This one. DAVID J. MALAN: Where do you want to go? Which one?
TREVOR: That one.
DAVID J. MALAN: No.
TREVOR: OK. This one.
DAVID J. MALAN: Yes. That was good. All right. So what was your algorithm or procedure for doing this, Trevor?
TREVOR: I just went through doors until I found a 50. DAVID J. MALAN: OK. Excellent algorithm. So that's fine. Because in fact, if I reveal what's behind these two other doors, what we'll find here is that we only have random input. So that was actually as good as you could get. And in fact, you got better than exhaustively searching the whole array, because it would have been really unlucky if you had hit the number 50 at the very last door. But what if we instead gave you an assumption. Suppose I sort all of these doors around, so that you have the numbers sorted this time, but this time it's actually a different-- this time, it's actually sorted for you. And now the goal at hand is to hit the number 50.
TREVOR: OK.
DAVID J. MALAN: What's your algorithm going to be?
TREVOR: Well, if it's sorted, it's either going to be-- if biggest to largest, descending, it'll be the first one, or if it's the opposite, it will be the last one. So I'll just tap this door, and then just tap the last door. DAVID J. MALAN: Excellent. All right. So we found the number 50. So as soon as you knew they were sorted, we were able to leverage this assumption. So they're too much like the phone book example. As soon as you have, even with a small problem like this, your inputs pre-sorted, we can actually find the value arguably more efficiently.
And I didn't tell you if it was sorted small to big, or big to small, and so it was very reasonable to start at one end or the other to actually find that target value. So thank to Trevor as well. And I'll propose-- nicely done. We have a little clip, actually, that is among our favorite moments in CS50, whereby sometimes these demos don't quite go according to plan. And indeed right now, I pulled up the wrong interface with which to use the touch screen. So that was my fault there.
So this will make for next year's clip as to why I was clicking on my own screen. But let's take a quick look at what happened last year with Jay, who came up, much like Trevor here, volunteered, and in this short clip, you'll see how this same demo didn't quite reveal the same lessons learned. [VIDEO PLAYBACK] -All I want you to do now is to find for me, and for us, really, the number 50 one step at a time.
-The number 50?
-The number 50. And you can reveal what's behind each of these doors simply by touching it with a finger. Damn it.
[LAUGHING]
[END PLAYBACK] DAVID J. MALAN: So that went very well. Those were the unsorted doors. And Jay, of course, found it all too quickly. Trevor did a much better job in terms of a teachable moment, so to speak, this year in taking longer to find it. Of course, then we gave Jay a second chance, whereby we sorted the doors, just as we did for Trevor, and Trevor did super well this time. But Jay did it half as quickly. [VIDEO PLAYBACK] -The goal now is to also find us the number 50, but do it algorithmically, and tell us how you're going about it.
-OK.
-And if you find it, you keep the movie. If you don't find it, you give it back.
-Man.
-Oh!
-[INAUDIBLE] OK. So I'm going to check the ends first to determine if there's-- Oh.
[APPLAUSE]
[END PLAYBACK] DAVID J. MALAN: OK. So sorting doors clearly leads to greater efficiency. And so, twice as fast is what I meant there. And so Jay got lucky both times. And he also got lucky in that last year, I ordered some Blu-ray discs to actually give out. I'm sorry this year, we didn't have the same, Trevor. But better still was a few years back. And some of you might know this fellow, Sean, who when he was in CS50, was challenged with the exact same problem, albeit in SD, as you'll soon see, back in the day. And you'll find that not only did he take a little longer than Jay, a little longer than Trevor, it was actually this wonderful opportunity to engage almost everyone in the crowd a la Price is Right, encouraging him to find the number we were seeking. Let's. take a quick look.
[VIDEO PLAYBACK]
-OK. So your task here, Sean, is the following. I have hidden behind these doors the number seven. But tucked away in some of these doors as well are other negative numbers. And your goal is to think of this top row of numbers as just an array, or just sequence of pieces of paper with numbers behind them. And your goal is, only using the top array here, find me the number seven. And we are then going to critique how you go about doing it. -All right. -Find us the number seven, please. No. Five, 19, 13.
[LAUGHING]
It's not a trick question. One.
[LAUGHING] At this point, your score is not very good, so you might as well keep going. Three.
[LAUGHING]
Go on. Frankly, I can't help but wonder what you're even thinking about, so--
[LAUGHING] Only the top row, so you've got three left. So find me seven.
[LAUGHING] 17. Seven.
[APPLAUSE]
All right.
[END PLAYBACK]
DAVID J. MALAN: So we could watch these all day long. And of course, some of this year's demos perhaps will now end up in next year's video as well. So now let's actually focus on the algorithms here, and see if we can't now start to formalize how we can go about getting our data into this state that it's sorted, so that ultimately, we can actually search it more efficiently. And even though we're going to use fairly small data sets, like the eight numbers we have here on the board, ultimately these same ideas could apply to 1,000 inputs, a million inputs, 4 billion inputs, because the algorithms are going to be fundamentally the same.
And so this is our last opportunity for volunteers today, but perhaps the most involved one, for which we need eight volunteers to come up and walk us through the process of sorting what will soon be on these music stands here. Let me start back here.
So one in the turquoise-- green is it? Are you committing? Two. Come on down. OK. Three. Four. Let me-- OK, five. You're being nominated by your friend. Six, seven, and eight. Come on up. All right. Thank you so much. Come on up. Come on up.
All right. So what we have here-- and this is among the more awkward ones, since this will require that you humor me for just a little bit of time. You shall be number one. What's your name?
ANNAN: Annan.
DAVID J. MALAN: Annan. David. What's your name?
JOSEPH: Joseph.
DAVID J. MALAN: Joseph, you are number two.
SERENA: Serena, number three. Stefan, number four. CYNTHIA: Cynthia. DAVID J. MALAN: Cynthia, number five. [INAUDIBLE] DAVID J. MALAN: [INAUDIBLE]. David, number six. MATT: Matt. DAVID J. MALAN: Matt's number seven. And?
WAVERLY: Waverly.
DAVID J. MALAN: Waverly, number eight. All right. If you could-- whoops. If you all, as your first challenge, there are eight music stands here facing the audience. If you could put your numbers on these music stands in such a way that they line up with the same numbers on the board. So make yourselves look like that by putting your numbers on these music stands here. Excellent so far.
Excellent. OK. So now, we're going to ask the question in a few different ways. How can we go about sorting these folks up here? Because we had a few approaches earlier, whereby we were kind of making two different buckets. And then we were generally piecing things together. As soon as we saw two numbers that belonged together, we put them together. Two letters that belong together.
But let's see if we can't formalize this, so that we ultimately have some pseudo-code you will, with which you can solve these problems. So now, I'm looking out at these numbers here. And I see a whole bunch of mistakes. Ultimately, I want one on the left and eight on the right.
And so I'm looking at these two, four and two. And what's the problem, obviously? Yeah. So. Two obviously comes before four, so you know what? Let me first take a greedy approach, if you will, much like problem set one-- if you recall from the Standard Edition of Problem Set One, where I just locally solve the problem that's right here in front of me and see where it leads me.
OK. So two and four, let me go ahead and just swap you two. If you can physically move yourselves and your paper, I seem to have gotten the list in a better state.
Now, they're good. I'm going to move on, four and six, looks good. Not a problem. Six and eight, OK. Eight and one, another problem. Because what's true about eight and one? One comes before eight, and so what should we do? Let's swap these two. One and eight. And now, I'm going to keep going. I'm going to keep looking ahead. And let's see what happens. Eight and three, of course, out of order. Let's swap. Eight and seven, of course. Out of order. Let's swap. Eight and five, of course, let's swap. All right. List is sorted. yes?
OK, obviously not. But it is a little better, right? Because notice what happened. Each time we performed a swap, a smaller number kind of percolated that way, and a bigger number percolated this way, or we'll start saying bubbled to the left or bubbled to the right.
Now, it's not enough, because at best a number might have moved one spot forward, or at worst, a number might have moved one spot further. So you know what, this kind of worked pretty well so far. Let me just try it again. Two and four, they're OK. Four and six, they're OK. Six and one, out of order. So let's swap you two. And now, notice the problem's starting to get a little better again. Six and three, out of order. Let's swap you two. Six and seven, you're good. Seven and five, of course, out of order. Seven and eight, in order. And now, I might need to do this a few more times. And in fact, think for yourselves perhaps how many times maximally might I have to walk back and forth?
We'll come back to that. So two and four are still OK. Four and one, nope. So, let's swap. And again, notice visually one is kind of bubbling to the left, where it should be. Four and three swap. Four and six. Six and five swap. Six and seven. Seven and eight are good.
Good. We're getting even better. So let's see. Now, we have two and one. Of course, swap. Two and three, three and four, four and five, six and seven, seven and eight. Good. And you know what? Because I made one change there, let me do one sanity check. Let me go all the way back to the beginning. OK. One, two-- yup, see? Something was wrong. Three, four, five, six, seven, eight. And in this last pass, are you comfortable with my now claiming it is sorted? OK. Visually, that's absolutely true. But functionally, what did also just happen in that last pass that allows you to confirm that this list is indeed sorted?
What did I do or not do this last pass? AUDIENCE: There were no changes. DAVID J. MALAN: Sorry? AUDIENCE: No changes. DAVID J. MALAN: There were no changes. So it would be stupid of me to do that same algorithm again if I didn't make any changes the first time. And the state has not changed. Surely, I'm not going to make any changes the second time. And so, it's safe now to say, list is sorted.
And indeed, this is now something that we'll generally call bubble sort, whereby pairwise, you correct mistakes again, and again, and again, and you keep going back and forth, and back and forth, until you make no such swaps, at which point you can be confident, yeah, I finished fixing all of the mistakes. Let's reset and try another approach. If you guys could go back into the order you were a moment ago, which looked like this. Now, let's take an approach a little more like the exam book, whereby we were constantly selecting the letter of the alphabet that we kind of wanted to deal with next. Maybe it was a high letter, like A, or a low letter Z.
So everyone's back in this order. And now let me do this. Let's see I know I have eight numbers here. I'm going to go ahead and just deliberately select the smallest elements. Right? This seems intuitive too. Why don't I find the smallest element, put it where it belongs, then get the next smallest element, put it where it belongs, and just repeat.
Because intuitively, that should work too. So four, that's a pretty small number. I'm going to remember where this is. Wait a minute. Two is smaller. Let me now remember where two is, and forget about four. We'll deal with that later. Six, I'm not interested. Eight, I'm not interested in. One is my new small number. So I'm going to remember where one is. Three, not interested. Seven, not interested. Five, not interested.
So without falling off the stage this year, I'm going to grab number one-- and what was your name again?
ANNAN: Annan.
DAVID J. MALAN: Annan. And if you could join me at the beginning of the list, let's put you where you belong. Unfortunately-- what's your name?
STEFAN: Stefan.
DAVID J. MALAN: Stefan is in the way. So before Stefan solves this problem, what should we do? What do we do with Stefan?
AUDIENCE: [INAUDIBLE]. DAVID J. MALAN: OK. So we could do that. We could sort of take Stefan and his four, and just put it in a variable and hold on to it for some amount of time, thereby making room for number one. And that's not bad. I could suggest, why don't we just put Stefan here? Why might this violate one of the ideas we started talking about last time, last week? Yeah?
AUDIENCE: [INAUDIBLE].
DAVID J. MALAN: There's no index for it. If you think of this, indeed, as an array, this is like negative one, so there's no memory actually here if this is indeed an array, like we declared last week in lecture. So we shouldn't do this. We might store it in a variable.
Or you know what? I heard someone else suggest it. What else could we do with Stefan? Why don't we just evict him and put him over where number one was. So if you want to go over there. And indeed, this is a pretty good solution. Now on the one hand, I've kind of made the problem worse. Four is now farther away from where it should be. It should be toward this half.
But you know what? That could have been bad luck. Maybe number eight was here. And so, maybe we would have gotten lucky, and pushed eight closer to the end. So in the end of the day, it kind of all averages out. We don't need to care about four. All I care about right now is selecting the smallest element.
And now, what I'm going to do is forget about number one permanently, because I know the list behind me is now sorted. So my list was previously size eight. Now, it's of size seven. So my problem is getting smaller, albeit linearly. So now, I'm going to select the current smallest element, two. Six, eight, four, three, seven, five. That was the smallest element. So what am I going to do with-- what was your name again?
JOSEPH: Joseph.
DAVID J. MALAN: Joseph? We're going to leave Joseph in place. Now, I'm going to pretend that these guys are-- well, I know that these two are already sorted. Let's now focus on the remainder of the list. Six is the current smallest. Eight is bigger. Four is now the current smallest. Three is now the current smallest. And so now, I'm going to select three, who is-- what's your name again? SERENA: Serena. DAVID J. MALAN: Serena, if you could grab your number and swap with-- KALSANG: Kalsang. DAVID J. MALAN: Kalsang. Come on back, and we're going to swap those two. And now, let's put this on autopilot. I'm going to go and leave it to you guys to select the next smallest elements. Dun, dun, dun, dun. Number four, what should you do? Excellent. Now, I'm going to make another pass. Dun, dun, dun, dun. I see five is the next smallest. Now, I'm going take another pass. Dun, dun, dun, dun. Six is the smallest. Good. Seven is the smallest. No change. Eight is the smallest. Done.
So what we've just done by iteratively selecting one element after the other is implement something that we're going to formalize as selection sort. And it's perhaps even simpler to explain, in that literally all you want to do is just keep going back and forth through the list selecting, the next smallest element, until you're done.
So it's even simpler, perhaps intuitively, than last. Let's try one last one. If you guys could reset yourselves into the following positions one final time, let's see if we can't now formalize one other approach. In fact, would someone out there like to propose how else we might go about doing this? Without tossing out buzzwords or sort of answers that are already known, just intuitively, what could we do?
AUDIENCE: [INAUDIBLE]. DAVID J. MALAN: Yeah. So there's some great intuition there. Good things seem to happen thus far in computer science when we divide and conquer the problem of dividing it in half and half and half. And so indeed, we could start to do that. And in fact, that's going to be, we'll see, one of our best solutions yet. But let's come back to that before long. In fact, we're going to do that a little later this week. What else might we do to solve this? So everyone here is in seemingly random order. You know what? Rather than go back and forth, back and forth, back and forth each time, this feels like I'm doing a lot of walking. Why don't I just start at the beginning of the list, and just put four where it belongs? So let me assume for the moment that my list is only this first element. Is four sorted at this moment in time, if all I care about is everything here? This is sort of trivially true, right? Like the list containing one number, and that number four is obviously sorted.
So let me just stipulate that this list is sorted. But now I have the rest of this list. So now, I encounter two. Where does two obviously belong with respect to four? Before four. So what can I do here? What's your name again?
JOSEPH: Joseph.
DAVID J. MALAN: Joseph, if you could step back for just a moment with your number. And now what should Stefan do here? Let's shift Stefan over here. And now, let Joseph come in here. And now, let me claim that everything here is sorted. So, similar result, but a fundamentally different approach. I haven't even looked what's down there. I just keep taking the elements as they're handed to me, and deal with them.
So now, I see number six. Where does number six belong? We have two, four, six. Exactly where she is right now. So let's leave that alone, and now claim that this part of the list is now sorted. And so, this feels fundamentally different in that I'm just moving through the list here linearly, and I'm never doubling back. Yes. All right. So eight, where do you belong? Right here. Perfect. So now, one. Uh-oh. This feels like it's going to be expensive. Now, in the previous algorithm, I just swapped people. So I might put him all the way at the beginning, but then moved Joseph. But if I move Joseph, now what's going to be wrong?
Now, I've sort of undone-- I've taken one step forward and then one step back, because now Joseph would be out of order. So let's do this. If you could take number one and step back for just a moment. How can we put-- what was your name again?
ANNAN: Annan.
DAVID J. MALAN: Annan in place? What needs to happen with respect to two, four, six, and eight? They all need to shift. So if eight would like to shift first, then six, then four, then two. And then Annan, if you'd like to come in here, good. But here, we've just kind of paid a price at a different point in the algorithm. Whereas last time with selection sort, and even bubble sort, I'm walking back and forth, back and forth, which is certainly adding up time-wise, and literally stepwise.
Insertion sort, at first glance, looks like it's super smarter, in that I'm just making slow, incremental progress, but I'm not going this back and forth. But if someone is indeed out of order, notice all of the work I just had to do. I had to move half of the list just to make room for number one. So it's the same amount of work thus far it feels, just a different type of work.
Let's continue. So now we know that everyone between one and eight are sorted. Here, I have number three. If you like to pick up number three, step back one. And what do you guys need to do? Yep. So that's another one, two, three steps. Three units of time that just cost me, so that three can now fit. Finally, seven.
Let's go ahead and have you take a step back. This is only going to cost us one unit of time, but that's OK. And now, five's going to be a little more expensive. If you'd like to step back. We need to move eight, and seven, and six. And then everyone is now sorted. So a big hand to our volunteers here. Thank you so much.
[APPLAUSE]
Thank you all. Thank you all. So let's see now just how costly all of that was. Let's consider perhaps the simplest of these, bubble sort. And I say simplest, only because you can solve it greedily by just fix the pairwise problem here. Fix the pairwise problem here, again and again and again, repeating as many times as you actually need to.
So it turns out that with a bubble sort, well, how many steps do I have to take on the first pass of that algorithm? I might take-- let's see-- one, two, three, four, five, six, seven. And there's eight elements here. So it's like n minus 1 steps to get from the beginning of the list to the end of the list.
But with selection sort, recall that I'm selecting the elements again and again and again that's smallest, I'm putting it in place, but then I'm not looking behind me again. So I think it's a little more clear then that the first time, I might have to take all n minus 1 steps to find the smallest element. Then I put them in place, and I evict whoever was here previously.
But then I don't have to keep looking at this element, because I know it's already the smallest. So now, I can look at just seven elements, then six elements, then five elements, then four elements. And so mathematically, if n is the number of elements or numbers that we started with, you can imagine that this is the same as n minus 1, plus n minus 2 steps, plus n minus 3 steps, plus n minus 4 steps, all the way down to just one step. And I'm on my last person.
And if you recall that a lot of stats books or math books have those formulas on the hardcover back or front of them, it turns out that this series can be expressed more simply as n times n minus 1 over 2. And it's fine if that's not at the forefront of your mind. But this is indeed true. That's just a simpler way of writing it. And then if you think back to grade school, when you just start multiplying things out, this of course, is just n squared minus n divided by 2. All I've done is expand the expressions there. And so let's rewrite this a little differently. That's n squared divided by 2 minus n/2.
So again, I'm just kind of applying some arithmetic rules there. But notice now that the biggest term in this expression, so to speak, is that n squared. So yes, it's n squared divided by 2, minus n/2.
But generally, if n is going to be a big value, I'm going to claim that n squared is going to be the dominant factor. It's just going to be a bigger contributor to the number of steps than n/2. So what do I mean by this? Let's try a simple example, even though the math gets a little big.
So suppose we had 1 million people on stage, or 1 million things that we want to sort. Let's plug a million into exactly that formula to see how many steps it takes total to sort a million elements using say, selection sort.
So we'd have the same formula as before. I'd plug a million, so that I get a million squared divided by 2, minus a million divided by 2. If I do that math in advance here, we have 500 billion minus 500,000, which gives us 499,999,500,000, which is pretty darn big.
In fact, if you compare now 499 billion, 999 million, 500,000 against our original value, 500 billion, it's so damn close. Right? n squared divided by 2 gives us-- or rather, n squared divided by 2 gave us 500 billion. That's pretty darn close to 499,999,500,000, which is to say subtracting off 500,000, or more generally, subtracting off n squared, not really a big deal. The n squared makes these numbers grow really fast.
Now, this is important only insofar as we, as computer scientists, are generally not going to care so much about the nuances of these formulas and exactly what the precise answers are. We care only that, you know what? At the end of the day, this formula is on the order of n squared.
Yes, we're dividing by 2 in there. Yes, we're subtracting off n minus 2. But at the end of the day, the term that really hurts us and costs us a lot of steps is that square term. And so what a computer scientist is going to generally do is ignore all of those smaller order terms, and just look at the one that contributes the most to the cost.
And this is nice, because we can now talk in much greater generality about algorithms, and can compare them. And the fact that I'm using this O is deliberate. When I say on the order of, I'm specifically referring to something called big O. And big O is a notation that a computer scientist uses to describe an upper bound on something.
So if you say that an algorithm is in big O of n squared, as I proposed just a moment ago, that means that in terms of its running time or its efficiency, it takes on the order of n squared steps. Maybe more, maybe less. But it's on the order of n squared. And that's the upper bound. It's not going to be more painful than that. It's not going to be n cubed, or 2 to the n, or something much bigger. This is an upper bound on whatever that cost is. So given that, let's consider just a few examples. And this is just a finite list of very common running times for algorithms that's meant to be illustrative of some things we've seen already.
So for instance, in the case of selection sort, what I'm claiming here is that selection sort's running time is on the order of n squared. In the worst case, I'm going to have a whole bunch of random numbers here. And as we saw mathematically, if I keep walking through the list, through the list, selecting the next smallest element again and again, if I actually write down all of the steps I'm taking as I proposed formulaically before, it's on the order of n squared steps that I'm taking.
And it turns out that bubble sort and insertion sort are just as slow in the worst case. Consider, for instance, insertion sort, the very last algorithm we dealt with, which had us look at the element, and then insert it where it belongs. And then we looked at the next element, and inserted it where it belongs.
So consider the best possible scenario. Suppose I had my volunteers line up literally like this, one through eight, already sorted. How many steps is insertion sort going to take to sort eight people, if they arrive on stage looking like this?
Eight people already sorted. And I use insertion sort. That last of the algorithms. Well, let's reenact real fast. So if I start here, I see one. Where does one belong? It belongs right here. I see two. Where does two belong? Right here. I see three. Where does three belong? Right here.
I see four. Right here. Five, six, seven, eight. There's no reason to repeat myself. And so, how many steps is that in terms of n? It's on the order of n steps, right? n minus 1. But I took a linear number of steps, and now I'm done. So that's the best case, though. What about the worst case? What eight were over there, and seven were down there, and one and two were over here, so that the list were truly reversed?
Well, what happens indeed if this is the number? And we'll do just a couple of examples. What if, indeed, the number eight is here, and the number-- whoops. So what if, indeed, the number eight is all the way over here, and I'm using insertion sort?
OK. I claim at the moment it's in place. But now, seven-- where does seven go? Of course, it goes over here. So I have to move eight over one place. Now six, where does it go? Well, all right. Now, I have to move eight over a place, and seven over a place, and then I plop down six.
So the first time, it cost me one step to fix things, then it cost me two steps to fix things. How many steps is it going to take to fix things to put five in the right place? Three. Because now I have to move one, two, three. How many steps is it going to take to put four in the right place? 4 plus 5, plus 6, plus 7.
And so it's mathematically identical to what we described for selection sort. We have this series that's just increasing. 1 plus 2 plus 3 plus 4, or conversely, 7 plus 6 plus 5 plus 4 adds up for today's purposes to on the order of n squared.
So let me stipulate too that bubble sort is also in n squared. Because with bubble sort, each time I go through the list, I'm taking roughly how many steps? Each time I literally walk from there to there? Roughly n steps. But how many times might I need to go through the list?
Well, roughly n time. Maybe n minus 1, but roughly n times. Well, why is that? Well, with bubble sort, if we start with bubble sort, with the list in the worst possible situation, which again is completely backwards, what's going to happen? I go through the list, and number one belongs all the way over there.
But with bubble sort, how far does one move on my first pass through the list? How many spots does he get closer to the correct place? Just one. So if you kind of reason through this, every time through this algorithm, David's taking roughly n steps. But how many passes through the list is it going to take for one to bubble to the left where it belongs?
He's got to move like, n spaces this way. So just to do the sorting of the list, I have to walk back and forth n times. And each time, I'm looking at n elements. So do n things n times on the order of n squared.
Now, we'll see in some of the shorts that are embedded in CS50's next problem set, another approach at these, but for now, let's just consider some other running times, especially if the sorting ones take a little bit of time to sink in. What's an algorithm we've seen already that takes on the order of n steps?
What should take a linear number of steps that we've seen thus far? What's that? The phone directory search. The first algorithm. Right? Where we're linearly searching for Mike Smith? Indeed. From week zero, when I started turning one page at a time, and I even said that it was kind of a linear feeling algorithm, and we had that picture on the board with the straight red line and the straight yellow line, those were indeed algorithms that are in big O of n.
Because to find Mike Smith in a phone book of n pages, in the worst case, might take me n steps. What about taking attendance? One, two, three, four, five, six. What's the running time of this algorithm for taking attendance? Big O of n, because in theory I have to point everyone in the room.
Now as an aside, what about the other optimization from week zero? Two, four, six, eight, 10, 12. A computer scientist would realize, wait a minute, that's on the order of n divided by two steps. Right? Because I'm doing two people at a time. But we're going to ignore those lower order terms, and we're just going to throw away the divide by 2, and just say, big O of n for that algorithm as well.
What about this one? We'll skip over some of these, but what was an algorithm that was log of n? That took roughly log n steps? The divide and conquer. Exactly. Like the phone book example in week zero and earlier today, where we divided the problem again and again and again. We drew it on the board in week zero as a curved green line, and we said that day it was a logarithmic algorithm.
And indeed, the number of steps it takes to perform divide and conquer, or binary search as we'll start calling it, as in the phone book, is on the order of log and steps. And this is a bit of a weird one.
What takes one step, or more specifically a constant number of steps? Maybe it's two, maybe it's three, but a computer scientist just simplifies it as big O of 1, some constant number of steps. What's something you could do that takes a constant number of steps?
What's the running time of clapping? Constant time. Right? Like, what's the running time of doing anything that takes just one operation, like print F Hello World. That might be said to be constant time, unless less corner case with print F, what might the running time of print F actually be? And why? What is n measuring in that case?
AUDIENCE: [INAUDIBLE]. DAVID J. MALAN: Exactly. The number of characters we want to print. So it's very context-sensitive. Today, we've been focusing a lot on letters and numbers here on the board. But it might also be characters in an actual string. So it turns out there's another measure that will start caring about, and that's the opposite of big O, so to speak.
That's omega notation. Whereas big O means what's, the upper bound on your running time? Maximally, how much time might something take? Omega-- sorry this keeps coming up-- is the opposite of that, whereby it's a lower bound on the amount of time something might take.
So. for instance, what's an algorithm that takes always n squared steps? Well, one of the algorithms we've seen today, in fact, might be that as well. Selection sort. Selection sort's pretty stupid. Even if the algorithm-- sorry, even if the array is already sorted, selection sort is going to keep walking through the list to make sure it has the smallest element again and again and again. And even though you humans in the audience know that, wait a minute, you already passed the smallest element, the computer doesn't know that until it looks all the way through the list. Similarly, a lower bound that might also be taken into account might be linear time.
How much time does it take to sort n elements in the best case using something like bubble sort? Suppose your list is already sorted. We said bubble sort takes on the order of n squared steps. But what if it's already sorted? What if you realize after one pass through the array that you've made no swaps? Do you need to keep making more passes?
No. So a lower bound on bubble sort might be said to be linear. Omega of n. And we can look at others of these as well. So let's take a quick look at just a visualization here to see how these distinguish themselves. I'm going to go down here into this page that's available on C50's website, but it will be a pain to get working, since it uses a technology called Java applets, which is a largely unsupported these days, at least by Chrome and certain others.
And let me go ahead and speed this up and explain what's going on. This is a demonstration of bubble sort, the first algorithm we looked at. And it's a visualization in that each of these bars represents a number. The bigger the bar, the bigger the number. The smaller the bar, the smaller the number. And what you can see visually, even though this is going super fast, is that the red bar is like me, walking back and forth fixing problems. You can see that the bigger elements are indeed bubbling up to the right, and the smaller elements are bubbling up to the left. And down here, if we actually look more closely, we can actually count the number of comparisons and swaps that were being made.
But instead, let's look at the second algorithm we looked at earlier with our volunteers, selection sort. Visually, it has a very different effect. But it's, again, very intuitive, in that we keep selecting the next smallest element, and we got a little lucky. That felt fundamentally faster. But if we ran this again and again and again with lots of inputs, we would see that it's indeed still in big O of n squared.
Let's do one last one here, insertion sort, which was the third algorithm we looked at, and recall that this one deals with the elements as it encounters them, but then it maybe shifts things over to make room, inserting elements where they belong.
And this too ends up giving the final result. Now all three of those felt pretty fast. And indeed, I ran them at a pretty good clip. But fundamentally, they're all pretty horrible, to be honest. All of these algorithms thus far that run in big O of n squared take quite a bit of time to run in the end.
And indeed, we can see and feel this lastly if I pull up this third and final demo. This is another visualization that's going to show bubble sort on the left, selection sort in the middle, and something, as one of our hand raises earlier suggested, merge sort on the right. A divide and conquer strategy on the right. And that's, in fact, what we're going to look at on Wednesday. But let's time these to run in parallel. It's roughly the same number of elements, all running at the same time. Bubble sort vs selection sort vs merge sort.
Now, they're all running in theory at the same time. The CPU is running at the same speed, but you can feel how boring this is very quickly going to become, and just how fast when we inject a bit of week zero's algorithms can we speed things up.
And now let's compare these in one last form. I'm going to go ahead to CS50's website, where we have this final link for today, where someone on the internet kindly put together a video that captures what different sorting algorithms sound like. This is insertion sort.
[BEEPING]
Whereby you're applying a frequency based on the height of the bar bar. This is bubble sort.
[WARPED BEEPING]
Coming up next is-- coming up next is-- selection sort, where again, we're selecting the next smallest element, and we can see it growing from left to right.
Merge sort, our winner thus far today. Notice how it's dividing things into [INAUDIBLE] half and quarters. Gnome sort, which we have not talked about, and creates visually and audally a bit of a different shape and sound. Going back and forth, cleaning things up. Also check out heapsort on this guy's website.
And that's it. We will see you next time.
[WHOOSHING AND MUSIC]