Unit 1: Probability and Statistics

Relative frequency • Theoretical probability and experimental probability • Probability calculations with Venn diagrams, tree diagrams and sample spaces • Mutually exclusive events, Combined events • Probability calculations for dependent and independent events using addition and multiplication rules • Conditional Probability • Discrete and Continuous data • Sampling techniques and Response rates • Data manipulation and misinterpretation • Limitations and context in statistical enquiry (reliability of data, bias in sampling and estimation in calculations) • Graphical representations (including: pie charts, bar charts, stem and leaf plots, pictograms, bivariate graphs, scatter graphs, box plots, cumulative frequency graphs) • Lines of best fit • Data processing: including mean, median (measure of central tendency) and mode, for continuous data, and quartiles and percentiles for discrete, grouped and continuous data • Measures of dispersion: Interquartile range (including application and relationship with the median), Standard deviation (including application and relationship with the mean) • Describing correlation, including positive, negative, none, strong and weak • Use of technology to find the numerical value for correlation and its meaning.

Probability

Probability is the likelihood of an event occurring. For any event A, the numerical value of P(A) is assigned, which is a measure of the likelihood of it occurring. Probability must satisfy a list of axioms, or rules:

Probability can be illustrated through a tree diagram:

Tree diagram showing outcomes of flipping a coin twice

This tree diagram shows the possible outcomes of flipping a coin twice. The first flip is the first “event”, which has two outcomes, getting heads or tails, of which there is a ½ probability of getting either. The second flip is the second “event”, which again has two outcomes, getting heads or tails, for each outcome of the first event.

If we want to find the probability of getting two heads in a row, we multiply the probabilities of the desired outcomes against each other.

P(heads, heads) = ½ (probability of getting heads the first flip) x ½ (probability of getting heads the second flip) = ¼

So, there’s a ¼ chance that we’ll get two heads in a row.

Theoretical probability

Theoretical probability is the probability of an event occurring based on mathematical predictions. The formula for theoretical probability of an event A is:

P(A) = No. of favourable outcomes / Total no. of possible outcomes

For example, if you’re flipping a fair coin, the theoretical probability for the coin to land on ‘heads’ is P(Heads) = ½ = 0.5. Similarly, the theoretical probability for the coin to land on ‘tails’ is P(Tails) = ½ = 0.5. This means that according to theoretical probability, the chances of landing a heads or tails per coin flip are equal.

Experimental probability

Experimental probability is the probability of an event occurring based on the number of times the event occurred during an experiment and how many times the experiment was repeated. The formula is:

P(A) = No. of times desired event occurred / Total no. of times experiment was conducted

For example, if you flip a fair coin ten times, you might get ‘heads’ 8 times and ‘tails’ 2 times. So the experimental probability for the coin to land on heads is P(Heads) = 8/10 = 0.8. Similarly, the experimental probability for the coin to land on tails is P(Tails) = 2/10 = 0.2.

Despite the theoretical probability predicting that the results would be a 50% chance of heads or tails, experimental probability showed us that for flipping the coin 10 times, the chances of getting heads were 0.8 and 0.2 for tails.

Hence, experimental probability ≈ theoretical probability.

However, repeating the experiment multiple times will bring the result closer to theoretical probability.

Dependent and independent events

Dependent and independent events refer to the influence one event occurring has on the probability of the other event occurring.

If a coin is flipped and it lands on heads, it is equally as likely to land on heads or tails if flipped again. Landing on heads does NOT affect the probability of landing on heads or tails later. This makes a coin landing on heads and a coin landing on tails independent events, because the probability of one event occurring does NOT affect the probability of the other event occurring.

To calculate the probability of a pair of independent events occurring, simply multiply the probabilities. For example, continuing with our coin toss example (Heads = H; Tails = T):

P(H and T) = P(H) * P(T)

If a bag of chocolate containing 5 pieces of dark chocolate and 5 pieces of white chocolate is opened, then the probability of selecting a piece of dark chocolate is 5/10, and that of selecting a piece of white chocolate is also 5/10.

However, if you open the bag, pick a piece of dark chocolate, and eat it, then the probability of selecting a piece of dark chocolate again is now only 4/9, because you ate 1 piece of dark chocolate (5 - 1 = 4) and one piece of chocolate from the bag overall (10 - 1 = 9). This also affects the probability of choosing a piece of white chocolate, which is now 5/9.

Taking a piece of chocolate from the bag and eating it is a dependent event, because the outcome of one event affects the probability of the possible outcomes of another event.

Continuing with our chocolate example (Dark chocolate = D; White chocolate = W):

P(D and W) = P(D) * P(W, given D)

“W, given D” can be rewritten as:

P(D and W) = P(D) * P(W|D)

Because the probabilities of the events depend on one-another, the probabilities intersect. So, if we presented this as a Venn diagram, P(D) and P(W) would be two intersecting circles, or P(DW)

P(DW) = P(D) * P(W|D)

Mutually exclusive and combined events

Mutually exclusive and combined events refer to whether or not the chances of one outcome occurring overlap with another outcome occurring.

If, for two events A and B, A cannot occur if B occurs, then the events are mutually exclusive. On a Venn diagram, this would show up as two circles that DO NOT intersect. Therefore BA = nothing.

This can be written in Venn diagram form as:

P(AB) = P(A) + P(B)

Card examples

Let’s move on to cards. The probability of picking a heart card from a complete deck of cards is 13/52. The probability of picking a king card from a deck is 4/52. But within the heart cards, there is a king of hearts! And within the king cards, there is the king of hearts again!

Because the outcome of picking the king of hearts card from the deck is common between both events, these events are combined events, or ones where for two events A and B, A and B can occur simultaneously, or at the same time.

Remember, the probability of the union of two or more events can never be more than 1.

So, the probability of choosing a heart card is 13/52, or P(H) = 13/52.

And the probability of choosing a king card is 4/52, or P(K) = 4/52.

If we wanted to find the probability of both events occurring at the same time, we would write “the probability of choosing a heart card or a king card”. In Venn diagram notation form, this would be:

P(HK) = P(H) + P(K)

= 13/52 + 4/52

= 17/52

!!!But this is wrong!

There are 13 desirable outcomes where a heart card is picked, and one of these outcomes is picking a king of hearts.

There are 4 desirable outcomes where a king card is picked, and one of these outcomes is picking a king of hearts.

So by adding 13 + 4 directly, we’re REPEATING the outcome of picking a king of hearts, which isn’t possible because we only have a single king of hearts card! And because if a king of hearts card is picked, then both outcomes are fulfilled at the same time! Which would make it P(H and K), or P(HK), which is not what we’re looking for!

Instead, we’re going to subtract the probability of picking the common outcome from P(HK):

P(HK) = P(H) + P(K) - P(H overlap K)
P(HK) = P(H) + P(K) - P(HK)

OR vs AND

If you wanted this in a visual format, this graphic from Math Is Fun should help:

Visual explanation of OR vs AND

To remember what “” and “” signify, remember that “OR holds more than AND”: “” is shaped like a cup, which can hold more water than “”.

Data Handling

Discrete and Continuous Data

Discrete data is data that can only take specific, finite values. It cannot take on any values between these specific values in an interval. It is discrete because there is a limited amount of “answers” or values that fall into the interval.

If you’re being surveyed on how many pet cats you have, you can only say 1, 2, 3, or any other whole number. You can’t say (and it would be horrifying if you did) that you have 0.5 of a cat.

For example, in the interval [1, 10], which includes numbers from 1 to 10, discrete data may only consider whole, natural numbers, so 1, 2, 3, 4… 10. These values are finite, meaning they can be counted.

Continuous data is data that can take infinite values in an interval or between two distinct, given points.

For example, in the interval [1, 10], continuous data may include whole numbers like 1, 2, and 7; decimals like 1.69, 4.20, 8.88903123; or fractions like 3/6. Meaning, it can take an infinite and endless number of values. It is continuous because there is an infinite range of “answers” or values that fall into the interval.

If you’re being surveyed on your height, there’s a broad range of answers you can give. You can be 182 centimeters, 183.5, 190.5839 cm (sorry Americans, no feet here).

Sampling techniques

Data can be collected through surveying populations, which is the total number of people regarding whom conclusions are to be drawn. However, in the case of extremely large or varied populations, data is collected from samples, which are a small group of people from the total population from whom data will be collected.

Samples can be chosen through probability and nonprobability sampling methods:

Response rates

A response rate is the percentage of people who completed the survey out of all the individuals who were sent the survey or were part of the sample, or:

Response Rate = (No. of people who completed the survey / No. of people who were sent the survey) * 100

High response rates are ideal for surveys because they reduce the chances of errors such as nonresponse bias. Furthermore, the greater your response rate is, the more applicable the results of the sample are to the greater population, because more people have participated in the survey, making the results more accurate to how they would be if the entire population participated in the survey.

Nonresponse bias is when a specific group or demographic fails to respond to the survey, resulting in a biased and inadequate response.

For example, an entire group of people who follow a certain religion may be unintentionally excluded from the survey if it is sent out on a religious holiday when they are unavailable.

Data Manipulation and Misinterpretation

Data manipulation refers to the way data is selected, presented, or processed in order to influence interpretation. Misinterpretation occurs when data is read incorrectly or conclusions are drawn without sufficient justification.

Common Forms of Data Manipulation

Examples of Misinterpretation


Limitations in Statistics

Statistics cannot provide absolute certainty. All statistical conclusions are subject to limitations related to data collection, analysis, and interpretation.

Key Statistical Limitations

Limits of Conclusions


Required Context in Statistics

Context is essential for interpreting statistical results correctly. Without context, numerical values can be misleading.

Important Contextual Factors

Why Context Matters

In IB MYP 5, students are expected to critically evaluate statistical information by identifying manipulation, acknowledging limitations, and interpreting results within an appropriate real-world context.

Graphs

Types of Graphs

Pie charts show what fraction of the total different categories of responses make up. They involve drawing a circle and dividing it into appropriate wedges depending on the percentage of responses that fall into each category.

Example pie chart

However, pie charts are hard to interpret properly, can’t be scaled, hard to draw, and make it difficult to compare responses side-by-side.

To find the angle of a wedge in a pie chart:

(Number of desirable responses / Total number of responses) * 360

For example, for 22 responses out of 50 that fall into one wedge:

(22/50) * 360 = 158.4 degrees

Bar Charts

Bar charts are used for data which is categorical, or has no order, like eye colour. They are usually used to show comparison between categorical data collected of two different sample groups, or to show the response breakdowns within a sample group.

Stem and Leaf Plots

Stem and leaf plots are used to organize data so it can be read quickly, or to find quartiles.

Stem and leaf plot example

Here, 2 is the stem, as it represents the range of all values with a 2 in the tens place. Meanwhile, the leaf includes actual values like 22, 26, and 29.

The stem, or the value on the left, has the range, while the leaf, or the value on the right, has the actual values. They only reorganize raw data that is already given and can’t be used for analysis, only for organization.

Pictogram Graphs

Pictogram graphs use symbols or pictures to represent data. Each symbol represents multiple pieces of data; for example, for a pictogram showing the number of Girl Scout cookie sales per year, a picture of a single cookie can represent 200 cookies sold.

Bivariate Graphs

Bivariate graphs are graphs that display the relationship between two variables. These variables can be continuous or discrete. Bivariate data is used to determine whether one variable influences another or to identify patterns, trends, or correlations.

For example, if you record the hours studied and the corresponding test scores of students, hours studied would be one variable and test scores the second variable. A bivariate graph allows you to visually inspect the relationship between these two variables.

Scatter Graphs

Scatter graphs, or scatter plots, are a type of bivariate graph that uses Cartesian coordinates to display values for two variables for a set of data. Each point on the graph represents one observation, with the x-coordinate representing the first variable and the y-coordinate representing the second variable.

Scatter graphs can reveal trends, correlations, or outliers in data. For example, plotting hours studied (x-axis) against test scores (y-axis) may show that as study hours increase, test scores also tend to increase.

Advantages: Easy to identify relationships and outliers. Limitations: Does not show the exact numerical value of the correlation.

Box Plots

Box plots, or box-and-whisker plots, are graphical representations that summarize data using five key values: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. They are useful for identifying the spread, central tendency, and outliers in a dataset.

Construction:

Uses: Compare distributions, detect skewness, visualize range and quartiles.

Cumulative Frequency Graphs

Cumulative frequency graphs (or ogives) are used to display the cumulative frequency of a dataset. They show the running total of frequencies up to each class interval.

Construction:

Applications: Determine median, quartiles, and percentiles; visualize distribution of data.

Lines of Best Fit

The line of best fit is a straight line drawn on a scatter graph that best represents the trend of the data. It minimizes the distance between the line and all points on the scatter plot, often using the method of least squares.

Uses:

Steps to construct:

  1. Plot all data points on a scatter graph.
  2. Estimate a line that has approximately equal number of points above and below it.
  3. Use the line equation (y = mx + c) to calculate predicted values.

Mean, Median, and Mode

Measures of central tendency describe the average or typical value of a dataset.

Quartiles and Percentiles (Discrete, Grouped, and Continuous Data)

Quartiles divide data into four equal parts. Percentiles divide data into 100 equal parts.

Quartiles:

Finding quartiles and percentiles:

Measures of Dispersion

Measures of dispersion describe how spread out the data is. Common measures include:

Correlation

Correlation describes the strength and direction of the relationship between two variables.

Correlation coefficient (r) quantifies the strength and direction of linear relationship between variables:

-1 ≤ r ≤ 1, where r > 0 positive, r < 0 negative, r ≈ 0 no correlation

Use technology (like calculators or software) to calculate r for precise numerical value and interpretation.

Frequency, Data, and Cumulative Frequency

Let’s say you’re doing a survey on the number of children people have in their basement. You ask 20 people and get the following responses:

Responses: 1, 4, 5, 15, 8, 2, 5, 9, 7, 6, 12, 11, 2, 3, 4, 6, 6, 8, 12, 10

Whew, that’s a lot of data! Kind of intimidating to look at. If we wanted to do anything with this data, like plot it on a graph, or draw any kind of meaningful connections from it, we need to organize it first.

Notice how some of the values repeat? You could say they appear frequently. Frequency means how often something happens or appears.

Frequency Table

We organize this data based on the type of response given (number of children) and the number of people who gave that response:

Number of Children in Basement □ Frequency
11
22
31
42
52
63
71
82
91
101
111
122
151

This table represents discrete data (data consisting of separate, distinct values). It is impossible to have 7.5 children or negative children.

Plotting the Data

If we were to plot this data on a graph:

Each bar is separate because the data is discrete. It’s either a specific number of children or not.

Mean, Median, Mode, and Range

We can calculate the mean, median, mode, and range from the frequency table.

Mean

The mean is the average value of a dataset:

Mean = (Sum of all values) / (Total number of values)

Or, using frequency:

Mean = (Σ f x) / (Σ f)

Where f is the frequency and x is the data value. Sigma (Σ) refers to summation of all entries.

Example for 12 children row: 12 * 2 = 24; divided by frequency 2 = 12. This matches manually adding each 12 individually (12 + 12 = 24, 24/2 = 12).

Median

The median is the middle value of the data set. For an ordered list:

Example: 5, 3, 6, 3, 9 → sorted: 3, 3, 5, 6, 9 → median = 5

Example with even numbers: 5, 3, 6, 3, 9, 2 → sorted: 2, 3, 3, 5, 6, 9 → middle numbers: 3 and 5 → median = (3+5)/2 = 4

For large datasets (n terms), use formula for median position:

Median position = (n + 1) / 2

This gives the position in the ordered dataset to find the median.

Mode

The mode is the value that occurs most frequently. In this frequency table, 6 occurs 3 times, making 6 the mode.

Range

Range is the difference between the maximum and minimum values:

Range = Maximum value - Minimum value

From the table: 15 - 1 = 14

Median, Mode, and Range (Continued)

So we have (5 + 1) = 6; 6 / 2 = 3.

BUT WAIT! THAT’S NOT OUR ANSWER! Remember that since we did all the shabang with the positions of the numbers, the answer we get is the position of the median. In this case, the median is the number that occupies the 3rd position in the data set. Counting, we find that 5 occupies the third position, making it the median!

What if we have an even number of numbers?

Example: 2, 3, 3, 5, 6, 9 → N = 6

(6 + 1) = 7 → 7 / 2 = 3.5

The median will be the number that occupies the 3.5th position. Since there is no 3.5th position, we take the mean of the numbers in positions 3 and 4: 3 and 5 → (3+5)/2 = 4. So the median is 4!

Important: The median does not necessarily have to be a value that exists in the data set; it must simply be the value in the middle of the dataset.

Mode

The mode is the most frequently occurring value. A dataset can be unimodal or bimodal. Any more than 2 modes is not counted.

Range

The range is the difference between the highest and lowest values. It indicates how much the data varies. Beware of outliers: if most data is small but one value is extremely high, the range can be misleading.

Cumulative Frequency

Cumulative frequency helps us find the total number of data entries up to a certain class limit. It is particularly useful for grouped or continuous data.

Grouped/Continuous Data Example

Not all data comes as distinct, separate values. Some fall within ranges. For example, if surveying people’s heights, responses are continuous (5.5, 5.4, 5.11 feet, etc.). We group data into class intervals:

Time taken to run 100 meters (seconds) Frequency
7 - <93
9 - <1112
11 - <1318
13 - <1520
15 - <177
Total60

Here:

Example: 12 times fall into 9 - <11 class. The exact values are unknown; they could be 9, 9.1, 10.5, etc.

Cumulative frequency: Total number of entries up to a certain class. It is calculated by adding the frequency of a class to the sum of all previous class frequencies.

Cumulative Frequency

Cumulative frequency measures the “running total” as you go along. Essentially, how much do you have in total up to a certain class limit that increases the further you go down, till you reach the maximum class limit that encompasses all the scores and you get all the people in it.

Example Table

Time taken to run 100m (s) Frequency Cumulative Frequency
7 - <933
9 - <111215
11 - <131833
13 - <152053
15 - <17760

Now, we can plot this on a graph with the leftmost column on the x-axis, and the rightmost column on the y-axis. But there’s a catch– since these are class intervals, we can’t plot both values. We plot the upper value, because the cumulative frequency is a measure of the values up to the upper limit of each progressive class interval.

Cumulative Frequency Graph

This produces a graph called a cumulative frequency graph, which can be divided into various intervals for easy interpretation. The graph ends at 60, which means 60 people in total took the survey. The point at 60 has an x-coordinate of 17, indicating that all 60 people have running times ≤17 seconds.

With this graph, if we take any point on the y-axis and draw a line to meet the graph, then draw a line down to the x-axis, we can estimate values. The y-axis point = number of people. The x-axis point = approximate value of the data. Approximate because we are using grouped data.

Quartiles

Quartiles are a neat way to sift through data in a cumulative frequency graph.

Lower Quartile (LQ)

Calculated as:

LQ = 25/100 × TOTAL CM

In this example: 25% of 60 → LQ = 15. Drawing a line to the x-axis, we get 11. Meaning: 15 people ran ≤11 seconds. 25% ran ≤11 seconds, 75% ran ≥11 seconds.

Upper Quartile (UQ)

Calculated as:

UQ = 75/100 × TOTAL CM

In this example: 75% of 60 → UQ = 45. Drawing a line to the x-axis, we get 14. Meaning: 45 people ran ≤14 seconds, 25% ran ≥14 seconds.

Median

Calculated as:

Median = 50/100 × TOTAL CM

Note: Quartiles and median values are approximations because we cannot know exact values in grouped data.

Interquartile Range (IQR)

The difference between the upper quartile (UQ) and lower quartile (LQ):

IQR = UQ - LQ

IQR is often better than the range as a measure of spread because it ignores outliers.

Outliers

LQ Outliers: LQ - (1.5 × IQR) → any number below this is an outlier.

UQ Outliers: UQ + (1.5 × IQR) → any number above this is an outlier.

Finding the Mean, Median, and Mode of Grouped Data

The mean of grouped data is similar to how you find it for discrete data using the formula f × x, except that we’re working with class intervals here, so we don’t know the exact “data value” of each row.

Instead, we assume that whatever data values fall in each class will average out to the midpoint between both class intervals. For example, for the class [10 - <15], the midpoint would be:

(10 + 15) / 2 = 25 / 2 = 12.5

Effectively, the midpoint becomes the “x” in our f × x calculation.

The median uses cumulative frequency. Take a running total of the frequency of each class interval, then find the median using the formula (n + 1)/2. (We don’t use n/2 here because we start with a normal frequency table, not a cumulative frequency graph.)

Median in Grouped Data

For a total frequency of 40, the median is the 20.5th value. This means averaging the 20th and 21st values. Both these values fall into the class with a value of 6 (21 numbers are 6 or less, only 7 are 5 or less), so:

(6 + 6) / 2 = 12 / 2 = 6

Though the example uses discrete data, it works similarly with grouped data.

The mode of grouped data is the class with the highest frequency, called the modal class.

Box-and-Whisker Plots

These visually represent the five-figure summary of a data set:

Box-and-Whisker Plot

The five-figure summary effectively briefs you about key parts of a data set. Example phrasing of questions:

  • “The highest value was…” → Maximum value
  • “The lowest value was…” → Minimum value
  • “Half the scores were equal to or greater than…” → Median
  • “The top 25% of values were at least…” → Upper quartile
  • “The middle half of values were between…” → Upper and lower quartile
  • “The values of the middle 50% of data were spread out over…” → IQR

Standard Deviation

Standard deviation measures the “spread” of all values in a data set around the mean. Unlike range or IQR, it considers every value.

Example: If surveying heights, a high standard deviation means a wide distribution of heights. A low standard deviation means most heights are similar.

σ = √( Σf(x - x̄)² / Σf )

Explanation:

Scatter Plots

These graphs are plotted to investigate the relationship between two variables, highlighting their correlation. In a data table, the column on the left is usually plotted on the x-axis, while the one on the right is on the y-axis.

For scatter plots, a line of best fit is drawn if there is correlation between the points to illustrate a potential linear relationship. The line of best fit must:

Note: When calculating the slope of the line of best fit, use the values of the mean point and the origin!

Finding or predicting a data value that falls within the set of data points is called interpolation, while predicting beyond the set is called extrapolation.

Types of Correlation

Coordinate Geometry

Formulas to Recall:

Collinear Points

If three or more points are collinear, their gradients are the same.

Parallel and Perpendicular Lines

Tangents

Tangents meet a circle at a single point and are perpendicular to the radius from that point.

  1. Find gradient of the radius.
  2. Take the negative reciprocal for tangent gradient.
  3. Use the point coordinates to find the tangent equation.

Transformations

Reflections

Translations

Translation vector written as (x
y) vertically. Rules:

Shape translation formulas:

Dilations

Rotations

Tessellations

Patterns of geometric shapes with no overlaps and no gaps. Shapes are called “tiles”.

Rules for Non-Right-Angled Triangles

Triangle rules image

Capital letters = angles, lowercase letters = sides.

Bearings

These are a mathematical measure of direction always using north as a reference, with north = 000 bearings. These are always measured clockwise from north, and are always given in three figures, like 000, 012, 210, 360, etc. Ensure to ALWAYS INDICATE NORTH with an “N” when drawing bearings, like so:

If something is due east/west, then it will be at an exact 90 degree angle from north or south.

Exponents and Radicals

Exponents and Radicals

Simple and Compound Interest

Simple Interest Formula:

I = P × R × T

Compound Interest Formula:

A = P(1 + r/n)^(nt)

Simple Interest Reminder

!!!Note! If a question ever asks for a “change at a constant rate relative to the year”, it means the amount of change must be the same every year, a.k.a. simple interest.

Sine and Cosine Functions

These are periodic functions because they continuously repeat themselves over a fixed interval. They are used to model waves, oscillations, and circular motion, among other applications.

General Forms

Parameters Explained

Key Properties

Important Points on the Graph

Applications

Transformations