Algebra Nspirations

Data Analysis and Statistics

A recent study by the Environmental Protection

Agency found that Americans waste an estimated 27% of the

food available for consumption each year.

And Teen Health and the Media posted on their website that

the average American teen spends about 20 hours a week

watching television.

That's equivalent to 43 24-hour days, or 12% of an

entire year.
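
As a quick check of that conversion, here is a minimal Python sketch. It assumes 52 weeks and 365 days in a year; the variable names are only illustrative.

```python
HOURS_PER_WEEK = 20      # figure quoted in the narration
WEEKS_PER_YEAR = 52      # assumption: a 52-week year
DAYS_PER_YEAR = 365      # assumption: a non-leap year

hours_per_year = HOURS_PER_WEEK * WEEKS_PER_YEAR    # total viewing hours
days_equivalent = hours_per_year / 24               # expressed as 24-hour days
share_of_year = days_equivalent / DAYS_PER_YEAR     # fraction of the year

print(f"{days_equivalent:.1f} days, {share_of_year:.1%} of a year")
```

Rounded, this agrees with the 43 days and 12% quoted above.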

According to the National Lightning Safety Institute,

the odds of being struck by lightning are

about 1 in 300 million.

And according to the National Highway Safety Administration,

the probability of a teenage driver in the US having an

accident in any 6-month period is 30%.

The first two facts are statistics.

The latter two are probabilities.

We're confronted daily with quantitative information of

this kind, be it in the form of words, numbers,

percentages, graphs, charts, or tables.

And we make our decisions based on these data

about our health, our education, our vacation

activities, our political choices, our

environment, and much more.

In Introductory Algebra, data analysis is almost synonymous

with statistics.

In this lesson, we'll take a look at some of the basic

notions of probability and statistics, and investigate

some real-world situations.

Probability and statistics are related but different.

I'll begin with probability.

The field arose from the dice tables of France when the

Chevalier de Méré brought his question about a gambling game

to the attention of Blaise Pascal and Pierre de Fermat.

In their ensuing written correspondence, these famous

French mathematicians laid the foundation for the theory of

probability.

The word "probability" means likelihood, odds, or chance,

as in "the chance of a thunderstorm tonight is 70%."

As a field of study, however, probability theory is a branch

of mathematics concerned with analyzing random phenomena.

So uncertainty and randomness are central ideas of

probability.

The rise of statistics to the status of a science is less

precise and more recent.

Early applications revolved around the needs of state

governments to base decisions on demographic

and economic data.

This explains the common root "stat" found in state and

statistics.

Statistics in the plural form are numerical facts or data

such as averages.

In the singular form, statistics is the science that

uses data to find out things about our world.

So while both probability and statistics are concerned with

the frequency of events, they go about it in different ways.

In probability, we begin with a model based on

mathematical theory.

And from there, we make predictions about which events will

occur and assign them probabilities.

In statistics, we begin by representing, analyzing, and

interpreting data collected from the real world.

And from there, we draw conclusions about the rules of

the underlying model.

To illustrate this difference, a probabilist would theorize

that given the mathematical structure of a cube, the chance of rolling

any one of these numbers is the same.

She would therefore assign the probability 1/6 to

obtaining any one of the numbers 1 through

6 on any one roll.

A statistician, on the other hand,

takes nothing for granted.

He'd begin by setting up an experiment that consists of

rolling this number cube many, many times.

He'd then record those observations, analyze the

information.

And based on the data, conclude if this is

a fair die or not.

Finally, he'd use probability theory to make predictions.
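
The statistician's experiment can be sketched in Python. This is only an illustration: the function name `roll_frequencies`, the seed, and the 60,000 rolls are assumptions, and the simulated die is fair by construction, so the sketch shows the bookkeeping rather than a real test of a physical die.

```python
import random
from collections import Counter

def roll_frequencies(num_rolls, seed=0):
    """Roll a simulated six-sided die num_rolls times.

    Returns a dict mapping each face (1-6) to its relative frequency.
    """
    rng = random.Random(seed)                      # seeded for repeatability
    counts = Counter(rng.randint(1, 6) for _ in range(num_rolls))
    return {face: counts[face] / num_rolls for face in range(1, 7)}

freqs = roll_frequencies(60_000)
for face, freq in freqs.items():
    # The probabilist's model predicts 1/6, about 0.167, for every face.
    print(face, round(freq, 3))
```

If the observed frequencies stay close to 1/6, the data are consistent with a fair die.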

During World War II, while confined to a prison camp, the

English mathematician John Kerrich flipped a coin 10,000

times to prove the common sense notion that repeatedly

flipping a coin results in heads up about half

or 50% of the time.

This common sense notion has a mathematical name.

It's called the law of large numbers.

First, a word on randomness.

Gravity is an example of something that's

certainly not random.

If I drop this ball, it will fall at the same speed and in

the same direction every single time.

If I flip this coin, on the other hand, I can't predict

what I will get on each trial.

It's not like dropping a ball.

But while I can't predict the outcome of any individual

trial, I can predict a long-term pattern.

That's the signature of randomness.

Random doesn't mean completely haphazard or chaotic, it means

there exists a regularity that only appears over many, many

repetitions.

And it's this type of order or regularity that we describe

mathematically using probability.

Let's use the Nspire to simulate the theorem of

probability called the law of large numbers.

Turn on the TI-Nspire.

Press the Home key for a new document.

If a document is open, you'll be prompted to save it.

Decide, then select 1 to create a calculator page.

Press Menu.

And under Probability, select Random.

And then select Integer.

randInt, short for random integer, appears.

Enter 0 comma 1 and the right parenthesis.

This command will randomly display numbers

ranging from 0 to 1.

But since these are integers, there are only two

possibilities.

Press Enter.

And indeed, you obtain 0 or 1.

I got a 1.

Press Enter several more times to see what happens.

You get a list of 0's and 1's randomly

generated by the computer.

If we let 1 equal heads and 0 equal tails, this then can

also be viewed as the list of coin-tossing outcomes after

several random trials of your experiment.

Let's add another feature.

This time, we'll use another path to access randInt.

Press the Catalog key.

Press 1 for the first tab.

An alphabetical list of commands appears.

Press R to move to those beginning with the letter R.

Scroll down a bit with the Down Arrow and select randInt.

Press Enter.

Input 0 comma 1 comma 15.

Then press Enter as the right paren is

automatically inserted.

As you can see, inserting a third integer, n, tells the

handheld to execute n trials.

Enclosed in braces, you have the 15 outcomes of the

simulated coin toss.
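
For readers following along without the handheld, a rough Python analogue of the three-argument randInt call might look like this. The helper `rand_int` is a hypothetical stand-in written for this sketch, not a TI-Nspire function.

```python
import random

def rand_int(low, high, n=1, rng=random):
    """Return a list of n random integers between low and high, inclusive."""
    return [rng.randint(low, high) for _ in range(n)]

flips = rand_int(0, 1, 15)                        # 15 simulated coin tosses
labels = ["H" if f == 1 else "T" for f in flips]  # 1 = heads, 0 = tails

print(flips)
print("".join(labels))
```

Each run prints a different list, just as each press of Enter does on the handheld.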

The next step will be to simulate flipping a coin 500

times for each of 50 trials, for a total of 50 times 500,

or 25,000 flips.

And on each trial, we'll record the number of heads.

To generate the trial numbers 1 through 50 in column A,

we'll use the fact that each consecutive counting number

equals the previous one plus 1.

To generate the number of heads on each trial, we'll

flip the coin 500 times using randInt, and then add up the

500 numbers.

Since heads are 1 and tails 0, the sum will equal the number of

1's, which is the number of heads.
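
The same counting trick can be sketched in Python. The seed is an arbitrary choice made here so the run is repeatable.

```python
import random

rng = random.Random(2024)                        # arbitrary seed, for repeatability
flips = [rng.randint(0, 1) for _ in range(500)]  # one trial of 500 flips

# Heads are coded as 1 and tails as 0, so adding the list counts the heads.
number_of_heads = sum(flips)
print(number_of_heads)
```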

Here we go.

Press Home to insert a list and spreadsheet page.

Scroll to the top of column A and type t

for the trial number.

Press Enter.

In the formula line, the gray row,

we need a formula that will generate numbers 1 through 50

for the 50 simulated trials.

Here's one way of doing this.

Press Menu.

And under Data, select Generate Sequence.

You will then use the n-th term of the sequence.

In this case, the n-th trial.

Each trial number u of n is one more than

the previous one.

So type n for u of n.

As usual, we use Tab to scroll down.

Enter 1 for the starting value.

Enter 50 for the maximum number of trials.

Then, Tab Down and click OK.

You've generated the trial numbers.

Use the nav pad to move to the top of column B and type N-O-H

for number of heads.

In cell B1, not the formula cell,

you're going to type in sum of randInt of 0, 1, and 500,

which will give us the number of heads.

So here we go.

Type in the equal sign S-U-M for sum.

Then press Catalog and select randInt.

Enter 0 comma 1 comma 500 and the right paren twice.

Now we have the number of heads from the first trial.

To do the same for each of the 50 trials, press Menu.

And under Data, select Fill down.

Now press the Down Arrow until cell B50.

Then press Enter.

Notice the time clock, indicating that the Nspire is

executing all 50 trials of 500 coin tosses, and recording the

corresponding number of heads for each trial.

They now appear in column B.

So far in column B, we have the number of heads in 500

flips that correspond to each of the 50 trials.

Our last step is to show that as we move from 500 flips to

1,000 to 1,500 and so on, all the way to 25,000 flips, the

ratio of the cumulative heads to the total number of flips

gets closer and closer to 1/2, or 0.5, or 50%.

This chart will help you understand the

formula we will be using.

We'll create a third column C of the ratios of cumulative

heads over cumulative flips.

The first ratio is simply H1 over 500.

In the second, it's the heads H1 plus H2 over 500 times 2,

or 1,000 flips.

After three trials, the ratio of heads is the sum of H1 plus

H2 plus H3 over 500 times 3, or 1,500 flips, and so on.
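
Written out as a formula, the pattern just described can be summarized as follows. The symbol r_n for the ratio after n trials is notation introduced here for convenience; H_1, H_2, and so on are the head counts of the individual trials.

```latex
r_1 = \frac{H_1}{500}, \qquad
r_2 = \frac{H_1 + H_2}{500 \cdot 2}, \qquad
r_3 = \frac{H_1 + H_2 + H_3}{500 \cdot 3}, \qquad \ldots, \qquad
r_n = \frac{H_1 + H_2 + \cdots + H_n}{500\,n}
\quad (n = 1, 2, \ldots, 50)
```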

One more comment before we resume.

In the formula we'll use on the Nspire, the variable A

stands for the numbers in column A. And the variable B

stands for the numbers in column B. I think we're ready.

Use the nav pad to move to the top of column C and type r for

ratio of total heads to total flips.

Then move down to the formula row.

We'll enter the ratio you see here.

So first press Control and the division symbol for a fraction

placeholder.

Type C-U-M, short for cumulative, followed by S-U-M,

left paren B and right paren.

In the denominator, enter 500 times A,

the column A variable.

Finally, press the Right Arrow and press Enter.

You now have the ratio of the total number of heads to the

total number of flips.

So 500 flips, 1,000, 1,500, and so on, all the way to

25,000 on the 50th trial.
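
The whole three-column spreadsheet can be mirrored in a few lines of Python. This is a sketch under assumptions: the names are invented for this example, the seed is arbitrary, and `itertools.accumulate` stands in for the handheld's cumulative-sum command.

```python
import random
from itertools import accumulate

FLIPS_PER_TRIAL = 500
NUM_TRIALS = 50

rng = random.Random(1)  # arbitrary seed, for repeatability

# Column B: number of heads in each trial of 500 flips.
heads_per_trial = [
    sum(rng.randint(0, 1) for _ in range(FLIPS_PER_TRIAL))
    for _ in range(NUM_TRIALS)
]

# Running total of heads after trial 1, 2, ..., 50.
cumulative_heads = list(accumulate(heads_per_trial))

# Column C: cumulative heads divided by cumulative flips (500 * trial number).
ratios = [
    heads / (FLIPS_PER_TRIAL * trial)
    for trial, heads in enumerate(cumulative_heads, start=1)
]

print(f"after 500 flips:    {ratios[0]:.4f}")
print(f"after 25,000 flips: {ratios[-1]:.4f}")
```

The last ratio typically lands much closer to 0.5 than the first, which is the behavior the law of large numbers describes.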

Pretty powerful machine.

To complete this simulation, let's make a scatter

plot of this data.

Press the Home key and select Data & Statistics page.

Again, it will take a few seconds.

Now you have what appears to be a mess.

Press the Down Arrow and you'll see

Click to Add Variable.

Click to select t for the trial number.

Then, with the nav pad, navigate over to the y-axis

and click to Add r for ratio when you see the same box.

As expected, as the number of flips increases from 500 on

trial 1 to 25,000 on trial 50, the ratio of the total number

of heads to the total number of flips approaches the

horizontal line of 0.5 or 50%.

In his prison cell, John Kerrich carried out a similar

experiment, but he flipped his coin 10,000 times without the

help of a computer simulation.

He did it all by hand.

If you count about 6 seconds for flipping and recording

each outcome, without counting the rests in between, that

alone is over 16 hours.

After 10 flips, he got 4 heads.

After 30, his heads to flips ratio was 56.7%.

And after 10,000 flips, he got heads 50.67% of

the time, or 0.5067.

His ratio also approached 0.5.
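
The three checkpoints just quoted can be compared directly. In this small sketch the head counts for 30 and 10,000 flips (17 and 5,067) are inferred from the percentages in the narration rather than stated outright.

```python
# (flips, heads) pairs; 17 and 5,067 are inferred from 56.7% and 50.67%.
kerrich = [(10, 4), (30, 17), (10_000, 5_067)]

for flips, heads in kerrich:
    ratio = heads / flips
    print(f"{flips:>6} flips: ratio = {ratio:.4f}, "
          f"distance from 0.5 = {abs(ratio - 0.5):.4f}")

# Distance of each checkpoint's ratio from the theoretical value 0.5.
distances = [abs(heads / flips - 0.5) for flips, heads in kerrich]
```

Three checkpoints from a single run are only suggestive, but they show the distance from 0.5 shrinking as the flips accumulate.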

Let's finish with a little vocabulary.

In this experiment, we have two possible

outcomes, heads and tails.

The sample space S is the set of all possible outcomes.

Here, the sample space has two elements.

We use braces to denote a set.

Any combination of outcomes is an event.

For example, obtaining heads when tossing one coin or

obtaining two heads when tossing two coins.

To every event E, the probability function P of E

assigns a number between 0 and 1.

For example, the probability of obtaining heads when

tossing one coin is 0.5, or 50%.

But the probability of obtaining two heads when

tossing two coins is 0.25, or 25%.

See if you can figure that one out.

Finally, the probability of an impossible event, such as

obtaining heads and tails when tossing one coin, is 0.

And the probability of a certain event is 1, or 100%.
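
This vocabulary can be made concrete by listing the sample spaces and counting outcomes. The sketch below assumes fair coins, so every outcome is equally likely; the helper `probability` is written for this example only.

```python
from fractions import Fraction
from itertools import product

def probability(event, sample_space):
    """P(E) for equally likely outcomes: favorable outcomes / all outcomes."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

one_coin = list(product("HT", repeat=1))    # [('H',), ('T',)]
two_coins = list(product("HT", repeat=2))   # HH, HT, TH, TT

p_heads = probability(lambda o: o == ("H",), one_coin)
p_two_heads = probability(lambda o: o == ("H", "H"), two_coins)
# Heads AND tails on a single toss: no outcome qualifies.
p_impossible = probability(lambda o: "H" in o and "T" in o, one_coin)
# Heads OR tails on a single toss: every outcome qualifies.
p_certain = probability(lambda o: "H" in o or "T" in o, one_coin)

print(p_heads, p_two_heads, p_impossible, p_certain)  # 1/2 1/4 0 1
```

Only one of the four two-coin outcomes is two heads, which is where the 0.25 comes from.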

On to part two for a closer look at statistics.

In order to analyze, interpret, and draw inferences

from real-world observations, statisticians must first

collect, organize, and represent their data in

meaningful ways that make them easier

to analyze and interpret.

In the case of univariate data analysis, or data analysis of

a single variable, they can create dot charts, bar graphs,

box-and-whisker plots, pie charts, and other visually

effective representations.

In the case of bivariate data analysis, or data involving

two variables, statistical regression is a staple of

statistics.

I will show you three representations available to

you for univariate data sets.

Suppose you jotted down the breed of dog each student in

your class owns.

Let's use the Nspire to organize this data in ways you

may not have seen on other technology.

We'll just use the first letter of each breed.

And we'll use the letter n for none.

Press the Home key for a new document.

Save the previous one if you wish, then create a list and

spreadsheet page.

Scroll to the top of column A and type B-R-E-E-D for our

x-variable.

Press Enter, or the Down Arrow, twice.

Pause the video to enter the dog data in cells A1 through

A20, then resume viewing.

You don't need to select the column data, because no matter

what command you enter, there's no ambiguity about

which variables you want to use.

So press Menu and under Data, select Quick Graph.

Instantly, you have a bar graph where each bar is made

up of dots, sometimes called a dot chart.

Now for the fun part.

Press Menu.

And under Plot Type, select Pie Chart.

And there you have it.

To magnify the pie chart, under Page Layout, select

Custom Split.

Arrow left until you're about 3/4 of the way over.

Then, press Enter.

Use the nav pad to move the pointer to a

particular pie slice.

Click and hold the segment to see the

summary of that category.

The summary displays the number of cases in that

category as well as the percentage that the category

represents among all the cases.
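
The summary shown for each pie slice is just a count and a percentage, which can be sketched in Python. The twenty letters below are made-up placeholder data, not the class data from the video.

```python
from collections import Counter

# Hypothetical first letters of 20 students' dog breeds ("n" = none).
breeds = list("llllllpppbbbggnnnnnd")

counts = Counter(breeds)
total = len(breeds)

# For each category: (number of cases, percentage of all cases).
summary = {breed: (count, 100 * count / total) for breed, count in counts.items()}

for breed, (count, percent) in sorted(summary.items()):
    print(f"{breed}: {count} cases, {percent:.0f}%")
```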

Lastly, press Menu.

And under Plot Type, select Bar Chart.

This is the classical-looking bar graph.

The count is indicated on the vertical axis.

And if you click and hold any one bar

again, you get the same type of summary we just saw.

I'm sure you've done a fair amount of work with univariate

data, meaning data involving one variable.

So we'll move onto bivariate data.

Bivariate data means data pairs of two variables:

the independent variable, denoted by x, and the dependent

variable, denoted by y.

Regarding the relationship between the two variables, we

have three possible cases.

If y depends on x, we have a causal relationship.

If x and y are related but do not depend on each other,

there is no causal relationship.

Finally, there may be no relationship at all between

the two variables.

The example we're about to investigate

belongs to case two.

Gray wolves, also known as timber wolves, or simply

wolves, originated about 300,000 years ago.

They are known for their intelligence and adaptability.

In the past, wolves endured revenge killings for attacks

on livestock.

By the 1930s, they were completely eliminated from

the northern Rocky Mountain states.

Eventually, with the passage of the Endangered Species Act

in 1973, public attitudes changed and wolves received

legal protection.

We're going to use data from the 2007 Rocky Mountain Wolf

Recovery Report, plot them, analyze them, find the

mathematical model that best fits the growth pattern, and

discuss the prediction of future population numbers.

Turn on the TI-Nspire.

Press the Home key for a new document.

Save the previous document if you wish, then create a lists

and spreadsheet page.

Scroll to the top of column A and type Y-E-A-R for the

x-variable.

Press Enter or the Down Arrow.

We'll use the Fill down feature to insert consecutive

years through 2007.

Scroll down beyond the formula line.

And in cell A1, type 1979.

Move down and in cell A2, type 1980.

Press Enter.

Go back up to A2, press and hold the Shift key as you move

up to A1, and both cells will become darkened, which means

they've been selected.

Press Menu.

And under Data, select Fill down.

1979 to 2008 is a 30-year span, so our data through 2007

spans 29 years.

Press the Down Arrow until the dotted box

includes the 29th row.

Press Enter.

You can see the final years of the

year list you've generated.
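
The Fill down step amounts to generating consecutive years, which a short Python sketch can confirm. Note that the end value in `range` is exclusive, hence 2008.

```python
# Consecutive years from 1979 through 2007.
years = list(range(1979, 2008))

print(len(years), years[0], years[-1])  # 29 1979 2007
```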

Pressing the Right Arrow takes you back up to cell B1.

Scroll to the top of column B and type wolf pop for the

annual wolf population, our y-variable.

Arrow Down twice to cell B1.

Pause the DVD now to enter the data from the chart.

Stop at 1,513 in cell B29 before resuming.

To select both columns, press Menu and follow this path:

Actions to Select.

And then, to Select Column.

Now, press and hold the Shift key as you Arrow Left to

select both columns.

To create a scatter plot of these data, press Menu.

And under Data, select Quick Graph.

Scatter plot and data lists are now side by side.

Observe the shape of this graph.

It's ascending, which was expected.

But it also reveals a rapid growth rate.

So we suspect an exponential function.

To perform an exponential regression, press Control Tab

to switch back to the list still highlighted.

Next, press Menu.

Under Statistics, select Stat Calculations.

And under that, select Exponential Regression.

A dialog box appears.

Press the Down Arrow to select year for the X List.

Tab down to Y. Do the same, but select wolf pop this time.

Notice the regression equation, which the Nspire

will compute, will be saved to the f1 function.

Tab down to first result column and note that the

results will be stored starting in column C. Tab down

to OK and press Enter.

To see more of this spreadsheet, under Page Layout

select Custom Split.

Arrow Right to move the partition to the right, then

press Enter.

Column C contains the words and symbols.

Column D gives their values.

Notice the exponential function general

form in cell D2.

Scroll down to line 6.

r is called the correlation coefficient.

Its value tells us how good a fit we have.

An absolute value of 1 means a perfect fit.

This r-value of 0.988 implies an excellent fit.
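
One common way to fit an exponential model y = a * b^x is to run an ordinary least-squares line through the points (x, ln y). The sketch below takes that approach, and it assumes, without confirmation from the video, that the handheld does something similar. The data are synthetic points generated from a known curve, not the wolf counts, and r is the correlation between x and ln y.

```python
import math

def exponential_fit(xs, ys):
    """Fit y = a * b**x by least squares on (x, ln y). Returns (a, b, r)."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sll = sum((l - mean_log) ** 2 for l in logs)
    sxl = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    slope = sxl / sxx                       # equals ln(b)
    intercept = mean_log - slope * mean_x   # equals ln(a)
    r = sxl / math.sqrt(sxx * sll)          # correlation of x with ln(y)
    return math.exp(intercept), math.exp(slope), r

# Synthetic, made-up points from y = 10 * 1.2**x (NOT the wolf data).
xs = list(range(10))
ys = [10 * 1.2 ** x for x in xs]

a, b, r = exponential_fit(xs, ys)
print(round(a, 3), round(b, 3), round(r, 3))
```

Because these points lie exactly on an exponential curve, the fit recovers a and b and r comes out at essentially 1. Real data, like the wolf counts, scatter around the curve and give an r somewhat below 1.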

We're going to introduce a new graphs and geometry page where

we'll superimpose the continuous graph of the f1

regression equation over the scatter plot of data pairs.

Then, we'll see how well we can use this mathematical

model to predict future wolf population numbers.

Insert the Graphs and Geometry Page by pressing Control-I and

selecting 2.

Press Menu.

And under Graph Type, select Scatter Plot.

The x-variable is highlighted.

Press Enter and choose Year.

Press Enter again.

Tab over to y.

Press Enter and this time choose wolf pop.

To adjust the window, since our data begins

at 1979, press Menu.

And under Window, select Zoom Data.

There's our scatter plot in a full screen.

To graph the regression equation over this plot, press

Menu again.

And under Graph Type, choose Function.

f2 is displayed at the bottom.

To access f1, where our regression equation is stored,

press the Up Arrow once and f1 is displayed.

Press Enter to graph it.

As you can see, while f1 is a good fit, meaning the function

f1 is a good mathematical model for the wolf population

growth, it's not perfect.

For example, there's one period of time where the

model's growth is slower than the actual

wolf population growth.

And later, it's the opposite.

To use f1 to predict future wolf population

numbers, press Menu.

And under Trace, select Graph Trace.

A point's coordinates appear.

If the y-value is an integer, it's on the scatter plot.

Press the Down Arrow to access the f1 graph.

Type in 2012.

The screen pans over to show the point on f1's graph.

7,000 wolves seems rather excessive.

And that's only because the model is theoretical.

It modeled, with 98.9% accuracy, the population

growth over 29 years, the time span during which wolves

were protected under the Endangered

Species Act, or ESA.

As the NRM gray wolf population reaches the desired

levels, the ESA protection may be lifted and other variables

will then come into play, such as the numbers of wolves

killed by humans.

The growth rate will, therefore, slow down and a new

mathematical model will have to be developed.

This process of starting with data from the real world,

developing a theoretical model in mathematics, and then

returning to the real world better equipped to explain and

predict phenomena is called mathematical modeling.

But predicting the future behavior of variables in any

phenomenon is tricky business.

Scientists must combine information from theoretical

models with common sense knowledge to make the best

prediction.

Understanding probability is important because life is full

of uncertainty.

And we're better off making choices based on objective

probability than on our own subjective beliefs.

Understanding statistics is equally important because it

provides the tools for predicting and forecasting

future events based on observational data.

Now that you know how to carry out probability simulations

and the process of mathematical modeling with the

TI-Nspire and its built-in statistics, try your hand at

other interesting data sets.

Good luck.

[MUSIC PLAYING]