1
00:00:00,000 --> 00:00:00,499
2
00:00:00,499 --> 00:00:04,290
ZAMYLA CHAN: Let's find
the needle in the haystack.
3
00:00:04,290 --> 00:00:07,920
Today in find.c we're going to
help implement a program that
4
00:00:07,920 --> 00:00:10,590
prompts the user for
numbers to fill the haystack
5
00:00:10,590 --> 00:00:13,270
and then searches the
haystack for a needle.
6
00:00:13,270 --> 00:00:17,300
Now, find.c calls sort and
search, which are both functions
7
00:00:17,300 --> 00:00:19,010
to find in helpers.c.
8
00:00:19,010 --> 00:00:20,880
It's actually helpers.c
where we're going
9
00:00:20,880 --> 00:00:23,980
to be doing our programming today.
10
00:00:23,980 --> 00:00:25,690
So what do we have to do?
11
00:00:25,690 --> 00:00:27,700
Well, first we need to
implement search, which
12
00:00:27,700 --> 00:00:31,670
is a function that returns true if
the value is found in the haystack
13
00:00:31,670 --> 00:00:33,530
and false otherwise.
14
00:00:33,530 --> 00:00:37,820
And then we also have to do sort,
which sorts the values in the array
15
00:00:37,820 --> 00:00:39,510
in incrementing order.
16
00:00:39,510 --> 00:00:41,500
So first let's talk about search.
17
00:00:41,500 --> 00:00:44,220
If you look at the
distribution code in helpers.c
18
00:00:44,220 --> 00:00:47,750
you'll find a preexisting
linear search implementation
19
00:00:47,750 --> 00:00:50,700
but we know that we can
do much better than this.
20
00:00:50,700 --> 00:00:55,500
Linear search is in O(n) time,
which means that it's quite slow.
21
00:00:55,500 --> 00:01:00,320
Instead of slowly iterating over
every element of the list up until n,
22
00:01:00,320 --> 00:01:02,630
let's talk about binary search.
23
00:01:02,630 --> 00:01:06,560
Now, linear search can search any
list, whereas binary search requires
24
00:01:06,560 --> 00:01:08,850
that the list is sorted already.
25
00:01:08,850 --> 00:01:13,600
But as long as you have a sorted list
then binary search operates in O(log n)
26
00:01:13,600 --> 00:01:15,900
time, which is quite efficient.
27
00:01:15,900 --> 00:01:17,850
Let's take a look at an example.
28
00:01:17,850 --> 00:01:22,270
Here I've populated an integer array
of size nine with nine integers
29
00:01:22,270 --> 00:01:25,000
and let's say we're
looking for the number 17.
30
00:01:25,000 --> 00:01:27,720
Now in this example, if we
were to do a linear search
31
00:01:27,720 --> 00:01:32,570
then it would take us 7 checks starting
from index 0 all the way till index 7
32
00:01:32,570 --> 00:01:34,120
to find that number.
33
00:01:34,120 --> 00:01:36,940
So let's see how we can implement
this using binary search
34
00:01:36,940 --> 00:01:39,380
and find that needle a lot faster.
35
00:01:39,380 --> 00:01:42,780
In binary search we start by
looking at the whole array.
36
00:01:42,780 --> 00:01:46,910
So from the left side of the array to
the right side, we look at the entirety
37
00:01:46,910 --> 00:01:48,390
and find the middle.
38
00:01:48,390 --> 00:01:50,560
In this case, the
middle is that index 4.
39
00:01:50,560 --> 00:01:52,310
The value is 10.
40
00:01:52,310 --> 00:01:55,470
So the assumption under binary
search of the list is sorted.
41
00:01:55,470 --> 00:01:59,000
So while 10 is not 17 and
because the list is sorted,
42
00:01:59,000 --> 00:02:03,310
we know that we don't need to look at
the left half of the array anymore.
43
00:02:03,310 --> 00:02:05,360
So similar to the way
that we just throw aside
44
00:02:05,360 --> 00:02:08,850
one half of the phone book in
the beginning of this course,
45
00:02:08,850 --> 00:02:13,500
we can just move our left
bound all the way to the middle
46
00:02:13,500 --> 00:02:17,470
but we've already checked the middle so
we can get a little bit more efficient
47
00:02:17,470 --> 00:02:19,660
and say, well since we've
already checked the middle
48
00:02:19,660 --> 00:02:22,300
we know that's not part of the
array that we're looking for
49
00:02:22,300 --> 00:02:26,380
so let's push they're bound to
just beyond where the middle was.
50
00:02:26,380 --> 00:02:31,420
So now we have our left bound at index
5 and our right bound stays the same.
51
00:02:31,420 --> 00:02:35,020
Let's calculate the middle and
we find, truncating the integer,
52
00:02:35,020 --> 00:02:37,460
that the middle is at index 6.
53
00:02:37,460 --> 00:02:39,850
So we check that value there, 16.
54
00:02:39,850 --> 00:02:43,640
16 is not 17, in fact
it's less than 17, which
55
00:02:43,640 --> 00:02:48,410
means that we can discard the left half
again and just look at the right side.
56
00:02:48,410 --> 00:02:53,970
So then we move our left bound all the
way to index 7 and the right bounds
57
00:02:53,970 --> 00:02:55,400
stays at 8.
58
00:02:55,400 --> 00:02:58,750
Calculating the middle between
7 and 8 brings us to 7.
59
00:02:58,750 --> 00:03:03,460
We check that value and
indeed we found our needle.
60
00:03:03,460 --> 00:03:06,990
So let's talk about some
pseudocode for this.
61
00:03:06,990 --> 00:03:09,060
Essentially, while
the length of the list
62
00:03:09,060 --> 00:03:12,240
is greater than zero, that is
we still have things to look at,
63
00:03:12,240 --> 00:03:14,550
we want to look at
the middle of the list
64
00:03:14,550 --> 00:03:18,550
and if that middle is our needle if
the number is found then that's great
65
00:03:18,550 --> 00:03:20,260
and we can return true.
66
00:03:20,260 --> 00:03:23,660
Otherwise we want to either
search left or search right
67
00:03:23,660 --> 00:03:27,230
depending on whether the number is
higher or lower than our needle.
68
00:03:27,230 --> 00:03:30,780
And then finally, if
we've executed all of that
69
00:03:30,780 --> 00:03:33,050
and we don't have any
more list remaining then
70
00:03:33,050 --> 00:03:35,360
that means that we
return false because that
71
00:03:35,360 --> 00:03:39,770
means that the needle is definitely
not in that particular haystack.
72
00:03:39,770 --> 00:03:40,780
Congratulations.
73
00:03:40,780 --> 00:03:45,780
You've implemented a fast search,
and now we can move on to sort.
74
00:03:45,780 --> 00:03:47,090