1
00:00:00,000 --> 00:00:03,346
>> [MUSIC PLAYING]
2
00:00:03,346 --> 00:00:05,258
3
00:00:05,258 --> 00:00:06,220
>> DOUG LLOYD: All right.
4
00:00:06,220 --> 00:00:08,140
So binary search is an
algorithm we can use
5
00:00:08,140 --> 00:00:10,530
to find an element inside of an array.
6
00:00:10,530 --> 00:00:14,710
Unlike linear search, it requires a
special condition be met beforehand,
7
00:00:14,710 --> 00:00:19,020
but it's so much more efficient if
that condition is, in fact, met.
8
00:00:19,020 --> 00:00:20,470
>> So what's the idea here?
9
00:00:20,470 --> 00:00:21,780
it's divide and conquer.
10
00:00:21,780 --> 00:00:25,100
We want to reduce the size of
the search area by half each time
11
00:00:25,100 --> 00:00:27,240
in order to find a target number.
12
00:00:27,240 --> 00:00:29,520
This is where that condition
comes into play, though.
13
00:00:29,520 --> 00:00:32,740
We can only leverage the power of
eliminating half of the elements
14
00:00:32,740 --> 00:00:36,070
without even looking at
them if the array is sorted.
15
00:00:36,070 --> 00:00:39,200
>> If it's a complete mix up,
we can't just out of hand
16
00:00:39,200 --> 00:00:42,870
discard half of the elements, because
we don't know what we're discarding.
17
00:00:42,870 --> 00:00:45,624
But if the array is sorted,
we can do that, because we
18
00:00:45,624 --> 00:00:48,040
know that everything to the
left of where we currently are
19
00:00:48,040 --> 00:00:50,500
must be lower than the
value we're currently at.
20
00:00:50,500 --> 00:00:52,300
And everything to the
right of where we are
21
00:00:52,300 --> 00:00:55,040
must be greater than the value
we're currently looking at.
22
00:00:55,040 --> 00:00:58,710
>> So what's the pseudocode
steps for binary search?
23
00:00:58,710 --> 00:01:02,310
We repeat this process until the
array or, as we proceed through,
24
00:01:02,310 --> 00:01:07,740
sub arrays, smaller pieces of
the original array, is of size 0.
25
00:01:07,740 --> 00:01:10,960
Calculate the midpoint
of the current sub array.
26
00:01:10,960 --> 00:01:14,460
>> If the value you're looking for is
in that element of the array, stop.
27
00:01:14,460 --> 00:01:15,030
You found it.
28
00:01:15,030 --> 00:01:16,550
That's great.
29
00:01:16,550 --> 00:01:19,610
Otherwise, if the target is
less than what's at the middle,
30
00:01:19,610 --> 00:01:23,430
so if the value we're looking
for is lower than what we see,
31
00:01:23,430 --> 00:01:26,780
repeat this process again, but
change the end point, instead
32
00:01:26,780 --> 00:01:29,300
of being the original
complete full array,
33
00:01:29,300 --> 00:01:34,110
to be just to the left
of where we just looked.
34
00:01:34,110 --> 00:01:38,940
>> We knew that the middle was too high,
or the target was less than the middle,
35
00:01:38,940 --> 00:01:42,210
and so it must exist, if it
exists at all in the array,
36
00:01:42,210 --> 00:01:44,660
somewhere to the left of the midpoint.
37
00:01:44,660 --> 00:01:48,120
And so we'll set the array
location just to the left
38
00:01:48,120 --> 00:01:51,145
of the midpoint as the new end point.
39
00:01:51,145 --> 00:01:53,770
Conversely, if the target is
greater than what's at the middle,
40
00:01:53,770 --> 00:01:55,750
we do the exact same
process, but instead we
41
00:01:55,750 --> 00:01:59,520
change the start point to be
just to the right of the midpoint
42
00:01:59,520 --> 00:02:00,680
we just calculated.
43
00:02:00,680 --> 00:02:03,220
And then, we begin the process again.
44
00:02:03,220 --> 00:02:05,220
>> Let's visualize this, OK?
45
00:02:05,220 --> 00:02:08,620
So there's a lot going and on here,
but here's an array of the 15 elements.
46
00:02:08,620 --> 00:02:11,400
And we're going to be keeping track
of a lot more stuff this time.
47
00:02:11,400 --> 00:02:13,870
So in linear search, we were
just caring about a target.
48
00:02:13,870 --> 00:02:15,869
But this time we want to
care about where are we
49
00:02:15,869 --> 00:02:18,480
starting to look, where
are we stopping looking,
50
00:02:18,480 --> 00:02:21,876
and what's the midpoint
of the current array.
51
00:02:21,876 --> 00:02:23,250
So here we go with binary search.
52
00:02:23,250 --> 00:02:25,290
We're pretty much good to go, right?
53
00:02:25,290 --> 00:02:28,650
I'm just going to put down
below here a set of indices.
54
00:02:28,650 --> 00:02:32,430
This is basically just what element
of the array we're talking about.
55
00:02:32,430 --> 00:02:34,500
With linear search, we
care, inasmuch as we
56
00:02:34,500 --> 00:02:36,800
need to know how many
elements we're iterating over,
57
00:02:36,800 --> 00:02:40,010
but we don't actually care what
element we're currently looking at.
58
00:02:40,010 --> 00:02:41,014
In binary search, we do.
59
00:02:41,014 --> 00:02:42,930
And so those are just
there as a little guide.
60
00:02:42,930 --> 00:02:44,910
>> So we can start, right?
61
00:02:44,910 --> 00:02:46,240
Well, not quite.
62
00:02:46,240 --> 00:02:48,160
Remember what I said
about binary search?
63
00:02:48,160 --> 00:02:50,955
We can't do it on an
unsorted array or else,
64
00:02:50,955 --> 00:02:55,820
we are not guaranteeing that the
certain elements or values aren't
65
00:02:55,820 --> 00:02:57,650
being accidentally
discarded when we just
66
00:02:57,650 --> 00:02:59,920
decide to ignore half of the array.
67
00:02:59,920 --> 00:03:02,574
>> So step one with binary search
is you must have a sorted array.
68
00:03:02,574 --> 00:03:05,240
And you can use any of the sorting
algorithms we've talked about
69
00:03:05,240 --> 00:03:06,700
to get you to that position.
70
00:03:06,700 --> 00:03:10,370
So now, we're in a position where
we can perform binary search.
71
00:03:10,370 --> 00:03:12,560
>> So let's repeat the process
step by step and keep
72
00:03:12,560 --> 00:03:14,830
track of what's happening as we go.
73
00:03:14,830 --> 00:03:17,980
So the first we need to do is calculate
the midpoint of the current array.
74
00:03:17,980 --> 00:03:20,620
Well, we'll say we're, first of
all, looking for the value 19.
75
00:03:20,620 --> 00:03:22,290
We're trying to find the number 19.
76
00:03:22,290 --> 00:03:25,380
The first element of this
array is located at index zero,
77
00:03:25,380 --> 00:03:28,880
and the last element of this
array is located at index 14.
78
00:03:28,880 --> 00:03:31,430
And so we'll call those start and end.
79
00:03:31,430 --> 00:03:35,387
>> So we calculate the midpoint by
adding 0 plus 14 divided by 2;
80
00:03:35,387 --> 00:03:36,720
pretty straightforward midpoint.
81
00:03:36,720 --> 00:03:40,190
And we can say that
the midpoint is now 7.
82
00:03:40,190 --> 00:03:43,370
So is 15 what we're looking for?
83
00:03:43,370 --> 00:03:43,940
No, it's not.
84
00:03:43,940 --> 00:03:45,270
We're looking for 19.
85
00:03:45,270 --> 00:03:49,400
But we know that 19 is greater
than what we found at the middle.
86
00:03:49,400 --> 00:03:52,470
>> So what we can do is
change the start point
87
00:03:52,470 --> 00:03:57,280
to be just to the right of the
midpoint, and repeat the process again.
88
00:03:57,280 --> 00:04:01,690
And when we do that, we now say the
new start point is array location 8.
89
00:04:01,690 --> 00:04:07,220
What we've effectively done is
ignored everything to the left of 15.
90
00:04:07,220 --> 00:04:09,570
We've eliminated half
of the problem, and now,
91
00:04:09,570 --> 00:04:13,510
instead of having to search
over 15 elements in our array,
92
00:04:13,510 --> 00:04:15,610
we only have to search over 7.
93
00:04:15,610 --> 00:04:17,706
So 8 is the new start point.
94
00:04:17,706 --> 00:04:19,600
14 is still the end point.
95
00:04:19,600 --> 00:04:21,430
>> And now, we go over this again.
96
00:04:21,430 --> 00:04:22,810
We calculate the new midpoint.
97
00:04:22,810 --> 00:04:27,130
8 plus 14 is 22, divided by 2 is 11.
98
00:04:27,130 --> 00:04:28,660
Is this what we're looking for?
99
00:04:28,660 --> 00:04:30,110
No, it's not.
100
00:04:30,110 --> 00:04:32,930
We're looking for a value that's
less than what we just found.
101
00:04:32,930 --> 00:04:34,721
So we're going to repeat
the process again.
102
00:04:34,721 --> 00:04:38,280
We're going to change the end point to
be just to the left of the midpoint.
103
00:04:38,280 --> 00:04:41,800
So the new end point becomes 10.
104
00:04:41,800 --> 00:04:44,780
And now, that's the only part of
the array we have to sort through.
105
00:04:44,780 --> 00:04:48,460
So we have now eliminated
12 of the 15 elements.
106
00:04:48,460 --> 00:04:51,550
We know that if 19
exists in the array, it
107
00:04:51,550 --> 00:04:57,210
must exist somewhere between element
number 8 and element number 10.
108
00:04:57,210 --> 00:04:59,400
>> So we calculate the new midpoint again.
109
00:04:59,400 --> 00:05:02,900
8 plus 10 is 18, divided by 2 is 9.
110
00:05:02,900 --> 00:05:05,074
And in this case, look, the
target is at the middle.
111
00:05:05,074 --> 00:05:06,740
We found exactly what we're looking for.
112
00:05:06,740 --> 00:05:07,780
We can stop.
113
00:05:07,780 --> 00:05:10,561
We successfully completed
a binary search.
114
00:05:10,561 --> 00:05:11,060
All right.
115
00:05:11,060 --> 00:05:13,930
So we know this algorithm
works if the target is
116
00:05:13,930 --> 00:05:16,070
somewhere inside of the array.
117
00:05:16,070 --> 00:05:19,060
Does this algorithm work if
the target is not in the array?
118
00:05:19,060 --> 00:05:20,810
Well, let's start it
again, and this time,
119
00:05:20,810 --> 00:05:23,380
let's look for the element
16, which visually we can see
120
00:05:23,380 --> 00:05:25,620
does not exist anywhere in the array.
121
00:05:25,620 --> 00:05:27,110
>> The start point is again 0.
122
00:05:27,110 --> 00:05:28,590
The end point is again 14.
123
00:05:28,590 --> 00:05:32,490
Those are the indices of the first and
last elements of the complete array.
124
00:05:32,490 --> 00:05:36,057
And we'll go through the process we just
went through again, trying to find 16,
125
00:05:36,057 --> 00:05:39,140
even though visually, we can already
tell that it's not going to be there.
126
00:05:39,140 --> 00:05:43,450
We just want to make sure this algorithm
will, in fact, still work in some way
127
00:05:43,450 --> 00:05:46,310
and not just leave us
stuck in an infinite loop.
128
00:05:46,310 --> 00:05:48,190
>> So what's the step first?
129
00:05:48,190 --> 00:05:50,230
Calculate the midpoint
of the current array.
130
00:05:50,230 --> 00:05:52,790
What's the midpoint
of the current array?
131
00:05:52,790 --> 00:05:54,410
Well, it's 7, right?
132
00:05:54,410 --> 00:05:57,560
14 plus 0 divided by 2 is 7.
133
00:05:57,560 --> 00:05:59,280
Is 15 what we're looking for?
134
00:05:59,280 --> 00:05:59,780
No.
135
00:05:59,780 --> 00:06:02,930
It's pretty close, but we're looking
for a value slightly bigger than that.
136
00:06:02,930 --> 00:06:06,310
>> So we know that it's going to
be nowhere to the left of 15.
137
00:06:06,310 --> 00:06:08,540
The target is greater than
what's in the midpoint.
138
00:06:08,540 --> 00:06:12,450
And so we set the new start point to
be just to the right of the middle.
139
00:06:12,450 --> 00:06:16,130
The midpoint is currently 7, so
let's say the new start point is 8.
140
00:06:16,130 --> 00:06:18,130
And what we've effectively
done again is ignored
141
00:06:18,130 --> 00:06:21,150
the entire left half of the array.
142
00:06:21,150 --> 00:06:23,750
>> Now, we repeat the
process one more time.
143
00:06:23,750 --> 00:06:24,910
Calculate the new midpoint.
144
00:06:24,910 --> 00:06:29,350
8 plus 14 is 22, divided by 2 is 11.
145
00:06:29,350 --> 00:06:31,060
Is 23 what we're looking for?
146
00:06:31,060 --> 00:06:31,870
Unfortunately, no.
147
00:06:31,870 --> 00:06:34,930
We're looking for a value
that is less than 23.
148
00:06:34,930 --> 00:06:37,850
And so in this case, we're going
to change the end point to be just
149
00:06:37,850 --> 00:06:40,035
to the left of the current midpoint.
150
00:06:40,035 --> 00:06:43,440
The current midpoint is 11, and
so we'll set the new end point
151
00:06:43,440 --> 00:06:46,980
for the next time we go
through this process to 10.
152
00:06:46,980 --> 00:06:48,660
>> Again, we go through the process again.
153
00:06:48,660 --> 00:06:49,640
Calculate the midpoint.
154
00:06:49,640 --> 00:06:53,100
8 plus 10 divided by 2 is 9.
155
00:06:53,100 --> 00:06:54,750
Is 19 what we're looking for?
156
00:06:54,750 --> 00:06:55,500
Unfortunately, no.
157
00:06:55,500 --> 00:06:58,050
We're still looking for
a number less than that.
158
00:06:58,050 --> 00:07:02,060
So we'll change the end point this time
to be just to the left of the midpoint.
159
00:07:02,060 --> 00:07:05,532
The midpoint is currently 9,
so the end point will be 8.
160
00:07:05,532 --> 00:07:07,920
And now, we're just looking
at a single element array.
161
00:07:07,920 --> 00:07:10,250
>> What's the midpoint of this array?
162
00:07:10,250 --> 00:07:13,459
Well, it starts at 8, it
ends at 8, the midpoint is 8.
163
00:07:13,459 --> 00:07:14,750
Is that what we're looking for?
164
00:07:14,750 --> 00:07:16,339
Are we looking for 17?
165
00:07:16,339 --> 00:07:17,380
No, we're looking for 16.
166
00:07:17,380 --> 00:07:20,160
So if it exists in the array,
it must exist somewhere
167
00:07:20,160 --> 00:07:21,935
to the left of where we currently are.
168
00:07:21,935 --> 00:07:23,060
So what are we going to do?
169
00:07:23,060 --> 00:07:26,610
Well, we'll set the end point to be just
to the left of the current midpoint.
170
00:07:26,610 --> 00:07:29,055
So we'll change the end point to 7.
171
00:07:29,055 --> 00:07:32,120
Do you see what just
happened here, though?
172
00:07:32,120 --> 00:07:33,370
Look up here now.
173
00:07:33,370 --> 00:07:35,970
>> Start is now greater than end.
174
00:07:35,970 --> 00:07:39,620
Effectively, the two ends
of our array have crossed,
175
00:07:39,620 --> 00:07:42,252
and the start point is
now after the end point.
176
00:07:42,252 --> 00:07:43,960
Well, that doesn't
make any sense, right?
177
00:07:43,960 --> 00:07:47,960
So now, what we can say is we
have a sub array of size 0.
178
00:07:47,960 --> 00:07:50,110
And once we're gotten to
this point, we can now
179
00:07:50,110 --> 00:07:53,940
guarantee that element 16
does not exist in the array,
180
00:07:53,940 --> 00:07:56,280
because the start point
and end point have crossed.
181
00:07:56,280 --> 00:07:58,340
And so it's not there.
182
00:07:58,340 --> 00:08:01,340
Now, notice that this is slightly
different than the start point and end
183
00:08:01,340 --> 00:08:02,900
point being the same.
184
00:08:02,900 --> 00:08:05,030
If we had been looking
for 17, it would have
185
00:08:05,030 --> 00:08:08,870
been in the array, and the start point
and end point of that last iteration
186
00:08:08,870 --> 00:08:11,820
before those points crossed,
we would have found 17 there.
187
00:08:11,820 --> 00:08:15,510
It's only when they cross that we can
guarantee that the element does not
188
00:08:15,510 --> 00:08:17,180
exist in the array.
189
00:08:17,180 --> 00:08:20,260
>> So let's take a lot fewer
steps than linear search.
190
00:08:20,260 --> 00:08:23,250
In the worst case scenario, we had
to split up a list of n elements
191
00:08:23,250 --> 00:08:27,770
repeatedly in half to find the target,
either because the target element
192
00:08:27,770 --> 00:08:33,110
will be somewhere in the last
division, or it doesn't exist at all.
193
00:08:33,110 --> 00:08:37,830
So in the worst case, we have to
split up the array-- do you know?
194
00:08:37,830 --> 00:08:40,510
Log of n times; we
have to cut the problem
195
00:08:40,510 --> 00:08:42,610
in half a certain number of times.
196
00:08:42,610 --> 00:08:45,160
That number of times is log n.
197
00:08:45,160 --> 00:08:46,510
>> What's the best case scenario?
198
00:08:46,510 --> 00:08:48,899
Well, the first time we
calculate the midpoint,
199
00:08:48,899 --> 00:08:50,190
we find what we're looking for.
200
00:08:50,190 --> 00:08:52,150
In all the previous
examples on binary search
201
00:08:52,150 --> 00:08:55,489
we've just gone over, if we had
been looking for the element 15,
202
00:08:55,489 --> 00:08:57,030
we would have found that immediately.
203
00:08:57,030 --> 00:08:58,321
That was at the very beginning.
204
00:08:58,321 --> 00:09:01,200
That was the midpoint of
the first attempt at a split
205
00:09:01,200 --> 00:09:03,950
of a division in binary search.
206
00:09:03,950 --> 00:09:06,350
>> And so in the worst
case, binary search runs
207
00:09:06,350 --> 00:09:11,580
in log n, which is substantially better
than linear search, in the worst case.
208
00:09:11,580 --> 00:09:16,210
In the best case, binary
search runs in omega of 1.
209
00:09:16,210 --> 00:09:18,990
So binary search is a lot
better than linear search,
210
00:09:18,990 --> 00:09:23,270
but you have to deal with the process of
sorting your array first before you can
211
00:09:23,270 --> 00:09:26,140
leverage the power of binary search.
212
00:09:26,140 --> 00:09:27,130
>> I'm Doug Lloyd.
213
00:09:27,130 --> 00:09:29,470
This is CS 50.
214
00:09:29,470 --> 00:09:31,070