Statistics and probability

Basic terms

Average (Arithmetic mean)

     1  ⎛ N-1   ⎞
x̅ = ─── ⎜  ∑  xᵢ⎟
     N  ⎝ i=0   ⎠

There are many kinds of mean. 'Average' usually refers to the arithmetic mean.
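
A minimal sketch of the formula above, in Python. The function name arithmetic_mean and the sample values are illustrative assumptions, not from any particular library.

```python
def arithmetic_mean(xs):
    """Sum of the values divided by the number of values (N)."""
    if not xs:
        raise ValueError("mean of an empty list is undefined")
    return sum(xs) / len(xs)


# Hypothetical sample data.
sample = [2, 4, 6, 8]
sample_mean = arithmetic_mean(sample)
print(sample_mean)
```

Python's standard library offers statistics.mean for the same purpose.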

https://www.cuemath.com/data/difference-between-average-and-mean/

Standard deviation

A measure of the 'spread' of data around the mean.

aka σ, s, SD

         ⎡ 1  ⎛ N-1          ⎞⎤
σ² =     ⎢─── ⎜  ∑  (xᵢ - μ)²⎟⎥
         ⎣ N  ⎝ i=0          ⎠⎦



       --------------------------                    
      /  ⎡ 1  ⎛ N-1          ⎞⎤
σ =  /   ⎢─── ⎜  ∑  (xᵢ - μ)²⎟⎥
    √    ⎣ N  ⎝ i=0          ⎠⎦

σ² is the variance, where μ is the mean of the xᵢ values (written x̅ above).

To get a value in the same unit as the xᵢ values, we take the square root of σ², which gives the standard deviation σ.
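
The two formulas can be sketched in Python as follows. This follows the divide-by-N (population) form written above; the function names and the sample data are illustrative assumptions.

```python
import math


def population_variance(xs):
    """Mean of the squared deviations from the mean (divides by N)."""
    n = len(xs)
    if n == 0:
        raise ValueError("variance of an empty list is undefined")
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


def population_std(xs):
    """Square root of the variance, in the same unit as the data."""
    return math.sqrt(population_variance(xs))


# Hypothetical sample data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))
print(population_std(data))
```

The standard library's statistics.pvariance and statistics.pstdev use the same divide-by-N convention; statistics.variance and statistics.stdev divide by N-1 instead, which is the usual choice when the data is a sample of a larger population.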

Mode

Most frequently occurring value.

E.g., in

10, 23, 42, 23, 20, 24, 19, 39, 24, 28, 24

24 is the mode (it occurs three times).
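
A small Python sketch that counts occurrences and picks the most frequent value. Returning every value tied for the highest count is one possible convention, assumed here for illustration; the function name modes is also an assumption.

```python
from collections import Counter


def modes(xs):
    """Return all values that share the highest occurrence count."""
    if not xs:
        raise ValueError("mode of an empty list is undefined")
    counts = Counter(xs)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


values = [10, 23, 42, 23, 20, 24, 19, 39, 24, 28, 24]
print(modes(values))
```

The standard library has statistics.mode and, since Python 3.8, statistics.multimode.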

DOUBT: What if there are multiple values which occur most frequently?

https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch11/mode/5214873-eng.htm

Median

The middle value when the values are arranged from smallest to largest.
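
A minimal Python sketch. For an even number of values there is no single middle value; the sketch assumes the common convention of averaging the two middle values. The function name is illustrative.

```python
def median(xs):
    """Middle value of the sorted data (mean of the two middle values if N is even)."""
    if not xs:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


values = [10, 23, 42, 23, 20, 24, 19, 39, 24, 28, 24]
print(median(values))
```

The standard library's statistics.median follows the same convention.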

From Britannica:

(mean, mode and median are) the three principal ways of designating the average value of a list of numbers.

DOUBT: How can we get average value from median or mode?

Regression

From Spiegelhalter's popsci book:

any process of fitting lines or curves to data

Difference (or error) of a point from the line: residual

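Response variable: the variable being predicted or explained (aka dependent variable).

Explanatory variable: a variable used to predict or explain the response (aka independent variable).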

The gradient/slope of the regression curve/line: regression coefficient
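
To make these terms concrete, here is a sketch that fits a straight line by ordinary least squares, which is one common way (not the only one) of fitting a line to data. The function name fit_line and the data points are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                     # the regression coefficient
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: xs is the explanatory variable, ys the response variable.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)

# Residual: observed response minus the value predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(slope, intercept)
print(residuals)
```

With a least-squares fit that includes an intercept, the residuals sum to zero (up to floating-point rounding).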

Statistical model

Errors

Type I error: False positive
Type II error: False negative

Algorithm performance

In a classification problem, performance is summarized by an error matrix (aka confusion matrix): a table counting predicted classes against actual classes.
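
A sketch of a confusion matrix for a two-class problem, where 1 means positive and 0 means negative. The function name, the dictionary layout and the labels are illustrative assumptions. The false positive count corresponds to Type I errors and the false negative count to Type II errors.

```python
def confusion_matrix(actual, predicted):
    """Count the four outcomes of a binary classifier."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p and a:
            counts["TP"] += 1   # true positive
        elif p and not a:
            counts["FP"] += 1   # false positive (Type I error)
        elif not p and a:
            counts["FN"] += 1   # false negative (Type II error)
        else:
            counts["TN"] += 1   # true negative
    return counts


# Hypothetical labels.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 1, 1]

matrix = confusion_matrix(actual, predicted)
print(matrix)
```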

Markov process

A process where the next state depends only on the current state.

'The future is independent of the past' in some sense: given the present state, the past adds no further information.
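
A sketch of a simple Markov process with two states. The states, the transition probabilities and the function names are made up for illustration. Note that next_state looks only at the current state, never at the earlier history.

```python
import random

# Hypothetical transition probabilities: TRANSITIONS[current][next].
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def next_state(current, rng):
    """Pick the next state using only the current state."""
    options = TRANSITIONS[current]
    return rng.choices(list(options), weights=list(options.values()))[0]


def simulate(start, steps, seed=0):
    """Generate a path of states of length steps + 1."""
    rng = random.Random(seed)   # seeded so that runs are repeatable
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path


print(simulate("sunny", 10))
```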

More

Geometric mean

Useful for data where growth/decline is exponential.
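
The geometric mean of N positive values is the N-th root of their product. A minimal sketch, computed through logarithms to avoid overflow on long lists; the function name and the growth factors are illustrative assumptions.

```python
import math


def geometric_mean(xs):
    """N-th root of the product of N positive values."""
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


# Hypothetical yearly growth factors: +10%, +20%, -10%.
growth_factors = [1.10, 1.20, 0.90]
average_factor = geometric_mean(growth_factors)
print(average_factor)
```

Applying average_factor three times in a row gives the same overall growth as applying the three factors one after another, which is why this mean suits exponential growth. Since Python 3.8 the standard library provides statistics.geometric_mean.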