Statistics is the practice or science of collecting and analysing numerical data, often in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.
Definitions
- Average – a generic term for a single representative value for a set of numbers, e.g. the mean, median or mode
- Mean – calculated by adding together the observed values and dividing by the number of observations; may relate to a sample or the population – see table below
- Median – the value mid-way along the ordered set of observations (if the number of observations is even, it is the mean of the two ‘middle’ values)
- Mode – the value in the distribution that has been observed with the greatest frequency; two or more values may share the greatest popularity
- Descriptive statistics – methods used to summarise or describe our observations; concerned with summarising or describing a sample
- Dispersion – a measure of variability; the spread of a data distribution – describable by inter-quartile range (IQR) or standard deviation (SD)
- Distribution – the distribution of a dataset shows all the possible values or intervals of the data and how often each occurs. More simply, a distribution is a collection of data or scores on a variable. See Normal distribution
- Inferential statistics – using observations as a basis for making estimates or predictions, i.e. inferences about a situation that has not yet been investigated; concerned with generalising from a sample, to make estimates and inferences about a wider population
- Inter-quartile range (IQR) – a measure of the spread of a sample or a population distribution, specifically the distance between the 25th and 75th percentiles. Equivalent to the difference between the 1st and 3rd quartiles
- Normal distribution – a normal (or Gaussian) distribution is a probability distribution in which the values of a random variable are distributed symmetrically about the mean, forming a bell-shaped curve. About two-thirds (roughly 68%) of observations lie within one standard deviation (SD) either side of the mean.
- Sample statistic versus population parameter – a parameter is a number describing a whole population (e.g. the population mean), while a statistic is a number describing a sample (e.g. the sample mean)
- Sampling – random sampling may be blind sampling, e.g. picking numbered markers from a bag, or mechanical sampling, e.g. using a random number generator
- Standard deviation – the square root of the variance of a sample or distribution
- Variables
- Category variable – any variable that involves putting individuals into categories
- Continuous variable – whatever two values one has, it is always possible to imagine more values in between them, e.g. 2.5 between 2 and 3
- Discrete variable – one in which possible values are clearly separated from one another, e.g. the number of children in a family has to be a whole number (1, 2, 3 etc.) – it can’t be 2.5
- Nominal variable – giving names to the different forms the variable may take, e.g. the brand name of a bicycle, such as Raleigh
- Ordinal variable – categories that can be put in order, e.g. less or more of a characteristic – better, bigger or faster
- Quantity variable – where one is looking for a numerical value – a quantity
- Variance – measures variability from the average or mean. It is calculated by taking the difference between each number in the data set and the mean, squaring the differences to make them positive, and dividing the sum of the squares by the number of values in the data set (when a sample is used to estimate the variance of a population, the divisor is usually one less than the number of values). It is the square of the standard deviation.
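As a rough illustration of the definitions above, the following sketch computes the mean, median, mode, variance, standard deviation and inter-quartile range using Python's standard `statistics` module. The data set is hypothetical (the number of children in eight families) and is chosen only to keep the arithmetic easy to follow; the variable names are likewise illustrative.

```python
import statistics
from statistics import NormalDist

# Hypothetical data: number of children in eight families (a discrete variable).
data = [1, 2, 2, 3, 3, 3, 4, 6]

# Averages
mean = statistics.mean(data)        # sum of the values divided by their count
median = statistics.median(data)    # even count, so the mean of the two middle values
modes = statistics.multimode(data)  # a list, because several values may tie

# Dispersion
# Population variance: sum of squared deviations divided by n.
pop_variance = statistics.pvariance(data)
# Sample variance: the same sum divided by n - 1.
sample_variance = statistics.variance(data)
# Standard deviation is the square root of the variance.
pop_sd = statistics.pstdev(data)

# Quartiles split the ordered data into four parts; IQR = Q3 - Q1.
# method="inclusive" is one of several quartile conventions (an assumption here).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Share of a normal distribution lying within one SD either side of the mean.
within_one_sd = NormalDist().cdf(1) - NormalDist().cdf(-1)

print("mean:", mean, "median:", median, "mode(s):", modes)
print("population variance:", pop_variance, "sample variance:", sample_variance)
print("population SD:", pop_sd, "IQR:", iqr)
print("share within one SD (normal):", round(within_one_sd, 4))
```

Quartiles can be computed under several conventions, so other tools may report a slightly different IQR for the same data.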
Further Reading
David Spiegelhalter, The Art of Statistics (Penguin Random House UK, 2019) ISBN:978-0-241-39863-0
Derek Rowntree, Statistics without Tears (Penguin Random House UK, updated edition 2018) ISBN:978-0-141-98749-1
Tom Chivers and David Chivers, How to Read Numbers (Weidenfeld & Nicolson 2021) ISBN:978-1-4746-1996-7