Hackerrank 1: 10 days of stats
Posted on 03/04/2019, in Python, Mathematics.This note is only used for learning on Hackerrank - 10 days of Statistics.
tocIn this post
keyboard_arrow_right
Goto this Chalenge.
Day 0 : Mean, Median, Mod, Weighted mean
- mean = mean value $\frac{1}{n}\sum_i x_i$
- median = the number at the center, if the number of elements are odd, it’s the center number, if even, it’s the mean of two center elements.
- mod = number(s) with the most number of appearances.
- Given a set of numbers, X, and corresponding set of weights, W, the weighted mean is calculated as follows
-
Find
median
withoutnumpy
def find_median(lst): len_lst = len(lst) if len_lst % 2 == 1: return lst[len_lst//2] else: return (lst[len_lst//2-1] + lst[len_lst//2])/2
- With numpy
import numpy as np from scipy import stats print(np.mean(<list>)) print(np.median(<list>)) print(int(stats.mode(<list>)[0]))
Day 1 : Quartiles, Interquartile Range, Standard Deviation
- Quartile of an ordered data set are the 3 points that split the data set into 4 groups.
- $Q_1$: the middle number between the smallest number in a data set and its median
- $Q_2$: the median ($50^{th}$ percentile) of the data set
- $Q_3$: the middle number between a data set’s median and its largest number
- Algorithm:
- If the number of elements is odd, don’t include the median for each half when seeking $Q_1, Q_2$
- If the number of elements is even, just devide into 2 halves.
- $Q_1$ is the median of first half, $Q_2$ is the median of second half.
- Get the input:
size = int(input()) numbers = list(map(int, input().split()))
- The interquartile range of an array is the difference between its first (Q1) and third (Q3) quartiles
- Output (0.9):
print("{:.1f}".format(Q3-Q1))
- Expected value $\mu$ = mean of discrete random variable X.
-
Variance $\sigma^2$: $\sigma^2 = \dfrac{\Sigma (x_i-\mu)}{n}$
-
Standard deviation $\sigma$: $\sigma^2 = \sqrt{\dfrac{\Sigma (x_i-\mu)}{n}}$
- Use
sqrt
:import math as m
Day 2 : Basic probability
- $P(A) = \dfrac{\text{favorable outcomes}}{\text{total outcomes}}$
- $0\le P(A)\le 1$
- $P(A^C) = 1-P(A)$
- mutually exclusive or disjoint: $A\cap B=\emptyset, P(A\cap B)=0$
- A,B are disjoint (mutually exclusive): $P(A\cup B) = P(A) + P(B)$
- A,B are independent: $P(A \cap B) = P(A)\times P(B)$
- Genreal: $\vert A\cup B\vert = \vert A\vert + \vert B\vert -\vert A\cap B\vert$
Day 3 : Conditional probability
- A, B are considered to be independent if event: $P(B\vert A) = P(B)$
- $P(A\cap B) = P(B\vert A)\times P(A)$
- $P(B\vert A) = \dfrac{P(A\cap B)}{P(A)}$
- Bayes’ theorem:
- Permutations (hoán vị): take r-element permutation from n elements (don’t care about order): $nPr = \dfrac{n!}{(n-r)!}$
- Combinations (chỉnh hợp): take r-element from n elements (care about order): $nCr = \dfrac{nPr}{r!} = \dfrac{n!}{r!(n-r)!}$
Day 4 : Binomial distribution
- Tutorial link.
- Random variable X:is the real value function $X: S\to R$ in which there is an event for each interval $I\subseteq R$
- A binomial experiment (or Bernoulli trial) is a statistical experiment that has the following properties:
- The experiment consists of repeated trials.
- The trials are independent.
- The outcome of each trial is either success ($s$) or failure ($f$).
- Bernoulli Random Variable and Distribution: The sample space of a binomial experiment only contains two points, s and f. We define a Bernoulli random variable to be the random variable defined by $X(s)=1$ and $X(f)=0$. If we consider the probability of success to be p and the probability of failure to be q (where $q=1-p$), then the probability mass function (PMF) of is:
or
- Binomial distribution: is the binomial probability, meaning the probability of having exactly x successes out of n trials.
- The number of successes is x.
- The total number of trials is n.
- The probability of success of 1 trial is p.
- The probability of failure of 1 trial q, where q=1-p.
- Cumulative Probability (CDF): $F_X(x) = P(X\le x)$ and $P(a<X\le b) = F_X(b)-F_X(a)$.
Day 4 : Geometric Distribution
- Negative Binomial Experiment: A negative binomial experiment is a statistical experiment that has the following properties:
- The experiment consists of n repeated trials.
- The trials are independent.
- The outcome of each trial is either success (s) or failure (f).
- P(s) is the same for every trial.
- The experiment continues until x successes are observed.
- If x is the number of experiments until the xth success occurs, then X is a discrete random variable called a negative binomial.
- The number of successes to be observed is x.
- The total number of trials is n.
- The probability of success of 1 trial is p.
- The probability of failure of 1 trial q, where q=1-p.
-
$b^{\ast}(x,n,p)$ is the negative binomial probability, meaning the probability of having x-1 successes after n-1 trials and having x successes after n trials.
- Geometric Distribution: The geometric distribution is a special case of the negative binomial distribution that deals with the number of Bernoulli trials required to get a success (i.e., counting the number of failures before the first success). Recall that X is the number of successes in n independent Bernoulli trials, so for each i (where $1\le i \le n$):
- The geometric distribution is a negative binomial distribution where the number of successes is 1. We express this with the following formula:
Day 5 : Poisson Distribution
- A Poisson experiment is a statistical experiment that has the following properties:
- The outcome of each trial is either success or failure.
- The average number of successes ($\lambda$) that occurs in a specified region is known.
- The probability that a success will occur is proportional to the size of the region.
- The probability that a success will occur in an extremely small region is virtually zero.
- A Poisson random variable is the number of successes that result from a Poisson experiment. The probability distribution of a Poisson random variable is called a Poisson distribution:
- $e=2.71828$
- $\lambda$ is the average number of successes that occur in a specified region.
- k is the actual number of successes that occur in a specified region.
- $P(k,\lambda)$ is the Poisson probability, which is the probability of getting exactly k successes when the average number of successes is $\lambda$.
- Check examples & special case here.
- Special case: X has Poisson Distribution, $E[X] = \lambda = Var(X), Var(X) = E[X^2] - (E[X])^2$
Day 5 : Normal Distribution
- The probability density of normal distribution is:
- $\mu$ is the mean (or expectation) of the distribution. It is also equal to median and mode of the distribution.
- $\sigma^2$ is the variance.
- $\sigma$ is the standard deviation.
- In python, we can use
numpy.random.normal
(ref) - If $\mu=0, \sigma=1$ then the normal distribution is known as standard normal distribution
- Every normal distribution can be represented as standard normal distribution:
- Consider a real-valued random variable, X. The cumulative distribution function of X (or just the distribution function of X) evaluated at x is the probability that X will take a value less than or equal to x:
- The cumulative distribution function for a function with normal distribution is (erf = error function):
- In python, we can use
math.erf
function.