Spread - Streuung
When comparing different sets of data, one useful measure is how widely the values in each set are spread.
We use the mean $\overline{x}$
as a reference point to investigate the spread.
One possibility is the sum of the differences from the mean.
$\sum (x - \overline{x})$
However, the positive and negative differences always cancel, so this sum is always zero.
One could consider using the absolute values of the differences $\sum \lvert x - \overline{x} \rvert$
or the sum of squares of the differences $\sum (x - \overline{x})^2$
To make this independent of the number of values $n$, we divide the sum of squares by $n$, i.e. we take the mean of the squared differences. This is called the variance, and it is often used in statistics.
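The three candidate measures can be compared on a small data set. The following is a minimal sketch; the values in `data` are made up for illustration and are not taken from these notes.

```python
# Hypothetical data set, chosen so that the mean is a whole number.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n  # 5.0

# Differences of each value from the mean.
deviations = [x - mean for x in data]

sum_dev = sum(deviations)                   # 0.0: positives and negatives cancel
sum_abs = sum(abs(d) for d in deviations)   # 12.0
sum_sq = sum(d ** 2 for d in deviations)    # 32.0

# Mean of the squared differences, i.e. the variance.
variance = sum_sq / n                       # 4.0

print(sum_dev, sum_abs, sum_sq, variance)
```

The plain sum of differences is zero regardless of how spread out the data is, which is why it cannot serve as a measure of spread.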
Variance - Varianz
The variance and its square root, the standard deviation (s.d.), are used to measure the spread of data. In German, the standard deviation is called Standardabweichung.
$s = \sqrt{V(x)}$
in a sample
or
$\sigma = \sqrt{V(x)}$
in a population
German definition:
exercise
formulas
$${V(x) = \frac{\sum (x - \overline{x})^2}{n}}$$
$${\text{standard deviation} = \sqrt{\frac{\sum (x - \overline{x})^2}{n}}}$$
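As a sketch of how these two formulas translate into code, the functions below divide by $n$ exactly as written above. The function names and the two example data sets are illustrative assumptions. Python's `statistics.pvariance` and `statistics.pstdev` use the same division by $n$ and serve as a cross-check.

```python
import math
import statistics


def variance(values):
    """Mean of the squared differences from the mean (division by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


# Two hypothetical data sets with the same mean (5) but different spread.
narrow = [4, 5, 5, 5, 6]
wide = [1, 3, 5, 7, 9]

for name, values in (("narrow", narrow), ("wide", wide)):
    print(name, variance(values), standard_deviation(values))
    # Cross-check against the standard library.
    print(name, statistics.pvariance(values), statistics.pstdev(values))
```

Both sets have the same mean, so only the variance and standard deviation reveal that the second set is spread much more widely.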
alternative formula for variance and s.d.
$${\begin{align} V(x) & = \frac{\sum (x - \overline{x})^2}{n} \newline & = \frac{1}{n} \sum (x - \overline{x})(x - \overline{x}) \newline & = \frac{1}{n} \sum (x^2 - 2x\overline{x} + (\overline{x})^2) \newline & = \frac{\sum x^2}{n} - \frac{\sum 2x\overline{x}}{n} + \frac{\sum (\overline{x})^2}{n} \end{align}}$$
As $\overline{x}$
is a constant, it can be taken outside the sum.
$${\frac{\sum 2x\overline{x}}{n} = 2\overline{x}\frac{\sum x}{n} = 2\overline{x}\overline{x} = 2(\overline{x})^2}$$
Also, in $\frac{\sum (\overline{x})^2}{n}$
the term $(\overline{x})^2$
is added to itself $n$ times and the result is divided by $n$. So,
$${\frac{\sum (\overline{x})^2}{n}= \frac{n(\overline{x})^2}{n}= (\overline{x})^2}$$
Now the equation for the variance becomes
$${\begin{align} V(x) & = \frac{\sum x^2}{n} - 2(\overline{x})^2 + (\overline{x})^2 \newline & = \frac{\sum x^2}{n} - (\overline{x})^2 \end{align}}$$
This form of the equation saves a lot of work when calculating the variance by hand, because the individual differences $x - \overline{x}$ never have to be computed.
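The equivalence of the two forms can be checked numerically. This is a minimal sketch with made-up data and assumed function names, not a proof.

```python
import math


def variance_definition(values):
    """V(x) = sum((x - mean)^2) / n"""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_shortcut(values):
    """V(x) = sum(x^2) / n - mean^2"""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2


# Hypothetical data: sum of x is 40 (mean 5), sum of x^2 is 232.
data = [2, 4, 4, 4, 5, 5, 7, 9]

v_def = variance_definition(data)    # 32 / 8 = 4.0
v_short = variance_shortcut(data)    # 232 / 8 - 25 = 4.0

print(v_def, v_short, math.isclose(v_def, v_short))
```

On a computer the shortcut form subtracts two numbers that can be large and nearly equal, so with floating-point values and a mean far from zero it may lose precision. For hand calculation with small numbers this is not a concern.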
variance and standard deviation in a binomial distribution
$${\begin{align} V(x) = n \cdot p \cdot (1-p) \end{align}}$$
$${\begin{align} V(x) = n \cdot p \cdot q \end{align}}$$ for $q = 1 - p$
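$E(x) = n \cdot p$ is the expected value (mean). Here $n$ is the number of trials and $p$ is the probability of success in each trial.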
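These closed forms can be compared with the general definitions of mean and variance applied to the binomial probabilities. In the sketch below, $n$ is the number of trials and $p$ the success probability. The values `n = 10` and `p = 0.3` and the helper `binomial_pmf` are assumptions chosen for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.3   # hypothetical parameters
q = 1 - p

# Probabilities for k = 0, 1, ..., n successes.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance from their definitions, weighted by probability.
expected = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - expected) ** 2 * pk for k, pk in enumerate(pmf))

print(expected, n * p)           # both close to 3.0
print(variance, n * p * q)       # both close to 2.1
print(math.sqrt(variance))       # standard deviation
```

The values agree up to floating-point rounding, which is why the comparison uses approximate rather than exact equality.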