100 Statistics Concepts to Remember

Posted on Sep 13, 2024 @ 08:14 AM under Statistics, Machine Learning

1. Mean (Average)

  • Definition: The sum of all values in a dataset divided by the number of values.
  • Example: For the dataset [4, 8, 6], the mean is (4 + 8 + 6) / 3 = 6.

2. Median

  • Definition: The middle value in a dataset when it is ordered from least to greatest. If there is an even number of values, it is the average of the two middle values.
  • Example: For the dataset [3, 5, 7], the median is 5. For [3, 5, 7, 9], the median is (5 + 7) / 2 = 6.

3. Mode

  • Definition: The value that occurs most frequently in a dataset.
  • Example: In the dataset [1, 2, 2, 3, 4], the mode is 2.

4. Range

  • Definition: The difference between the maximum and minimum values in a dataset.
  • Example: For the dataset [5, 8, 12], the range is 12 - 5 = 7.

5. Variance

  • Definition: A measure of how much values in a dataset vary from the mean, calculated as the average of the squared differences from the mean.
  • Example: For the dataset [2, 4, 4, 4, 5, 5, 7, 9], the mean is 5, the squared deviations sum to 32, and the population variance is 32 / 8 = 4.
  • If you are using a sample to estimate the population variance, divide by n − 1 instead of n (Bessel's correction) to avoid systematically underestimating it.
  • The formula for sample variance $s^2$ is:

$ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $

6. Standard Deviation

  • Definition: The square root of the variance, representing the average distance of each data point from the mean.
  • Example: For the dataset [2, 4, 4, 4, 5, 5, 7, 9], the standard deviation is 2.
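
As a quick check of items 1–6, here is a minimal sketch using Python's built-in `statistics` module on the dataset from the variance example:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(data))       # 5
print(statistics.median(data))     # 4.5 (average of the two middle values)
print(statistics.mode(data))       # 4
print(max(data) - min(data))       # range: 7
print(statistics.pvariance(data))  # population variance: 4
print(statistics.pstdev(data))     # population standard deviation: 2.0
print(statistics.variance(data))   # sample variance (divides by n - 1): ~4.57
```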

7. Percentile

  • Definition: A percentile is a way of understanding where a particular value stands in relation to a whole set of data, based on the percentage of data that falls below it. In other words, the Nth percentile is the value below which N% of the data falls.
  • Example: Imagine you have a list of test scores for 100 students. If your score is in the 90th percentile, it means you scored better than 90% of the students.

8. Quartiles

  • Definition: Values that divide a dataset into four equal parts. The first quartile (Q1) is the 25th percentile, the second quartile (Q2) is the median, and the third quartile (Q3) is the 75th percentile.
  • Example: For the dataset [1, 2, 3, 4, 5, 6, 7, 8, 9], Q1 is 3, Q2 is 5, and Q3 is 7.
  • Usefulness: Quartiles describe how the data is distributed, show where its middle lies, and help identify outliers.

9. Interquartile Range (IQR)

  • Definition: The range between the first quartile (Q1) and the third quartile (Q3), measuring the spread of the middle 50% of the data.
  • Example: For the dataset [1, 2, 3, 4, 5, 6, 7, 8, 9], IQR is 7 - 3 = 4.
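
NumPy's default percentile interpolation happens to reproduce these quartiles exactly; note that other tools use slightly different quartile conventions, so small discrepancies are normal:

```python
import numpy as np

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

q1, q2, q3 = np.percentile(data, [25, 50, 75])
print(q1, q2, q3)   # 3.0 5.0 7.0
print(q3 - q1)      # IQR: 4.0
```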

10. Skewness

  • Definition: A measure of the asymmetry of the distribution of values in a dataset.
  • Example: A right-skewed dataset might have a long tail on the right side, such as income data where most people earn lower wages but a few earn very high wages.

11. Kurtosis

  • Definition: A measure of the "tailedness" of the distribution, indicating whether the data has heavy or light tails compared to a normal distribution.
  • Example: A dataset with high kurtosis has more outliers (extreme values) compared to a normal distribution.
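
A small sketch with SciPy on a made-up, right-skewed sample; note that `kurtosis` returns excess kurtosis by default, so 0 corresponds to a normal distribution:

```python
from scipy.stats import skew, kurtosis

data = [1, 2, 2, 3, 3, 3, 4, 10]   # invented sample with a long right tail

print(skew(data))       # > 0: right-skewed
print(kurtosis(data))   # > 0: heavier tails than a normal distribution
```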

12. Correlation

  • Definition: A statistical measure that describes the strength and direction of a relationship between two variables, ranging from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation).
  • Example: Height and weight are typically positively correlated; as height increases, weight tends to increase.

13. Regression

  • Definition: A statistical technique for modeling and analyzing the relationship between a dependent variable and one or more independent variables.
  • Example: Predicting a person's weight based on their height using a linear regression model.

14. Linear Regression

  • Definition: A type of regression analysis where the relationship between the dependent variable and one or more independent variables is modeled as a linear function.
  • Example: Predicting house prices based on the size of the house using a line equation.

15. Multiple Regression

  • Definition: A type of regression that models the relationship between a dependent variable and two or more independent variables.
  • Example: Predicting a student’s final grade based on hours studied, attendance, and participation.
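
A sketch covering items 12–15, using invented study data and scikit-learn's `LinearRegression` (which handles both one and several predictors):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: [hours studied, attendance %] -> final grade.
X = np.array([[2, 60], [4, 70], [6, 80], [8, 90], [10, 95]])
y = np.array([55, 65, 74, 85, 92])

# Correlation (item 12) between hours studied and grade.
print(np.corrcoef(X[:, 0], y)[0, 1])   # close to +1

# Multiple regression (items 13-15): grade ~ hours + attendance.
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)
print(model.predict([[5, 75]]))        # predicted grade for a new student
```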

16. Chi-Square Test

  • Definition: A statistical test used to determine if there is a significant association between categorical variables.
  • Example: Testing if there is an association between gender and preference for a type of product.

17. T-Test

  • Definition: A statistical test used to compare the means of two groups and determine if they are significantly different from each other.
  • Example: Comparing the average test scores of two different teaching methods to see if one is more effective.

18. ANOVA (Analysis of Variance)

  • Definition: A statistical method used to compare the means of three or more groups to determine if at least one group mean is significantly different from the others.
  • Example: Comparing the average scores of students from three different teaching methods to see if there is a significant difference in their performance.
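
All three tests from items 16–18 are available in `scipy.stats`; a sketch with invented data:

```python
import numpy as np
from scipy import stats

# Chi-square test of independence (item 16) on a made-up 2x2 table.
table = np.array([[30, 10], [20, 20]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(p)

# T-test comparing two groups (item 17).
a = [78, 82, 88, 91, 75, 84]
b = [71, 69, 85, 74, 78, 72]
print(stats.ttest_ind(a, b))

# One-way ANOVA across three groups (item 18).
c = [90, 93, 88, 95, 91, 89]
print(stats.f_oneway(a, b, c))
```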

19. P-Value

  • Definition: A measure that helps determine the significance of results in hypothesis testing. It indicates the probability of obtaining the observed results, or more extreme results, assuming the null hypothesis is true.
  • Example: A p-value less than 0.05 typically suggests that the observed effect is statistically significant.

20. Confidence Interval

  • Definition: A range of values that is likely to contain the true value of an unknown parameter with a certain level of confidence.
  • Example: A 95% confidence interval for a mean might be [45, 55]. Strictly speaking, this means that if the sampling procedure were repeated many times, about 95% of the intervals constructed this way would contain the true mean.
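
A sketch of a t-based 95% confidence interval with SciPy, on an invented sample:

```python
import numpy as np
from scipy import stats

data = [48, 52, 49, 55, 51, 47, 53, 50]   # made-up sample

mean = np.mean(data)
sem = stats.sem(data)   # standard error of the mean
# Arguments: confidence level, degrees of freedom, center, scale.
low, high = stats.t.interval(0.95, len(data) - 1, loc=mean, scale=sem)
print(low, high)
```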

21. Null Hypothesis

  • Definition: The hypothesis that there is no effect or no difference, and it serves as the default or starting assumption in statistical testing.
  • Example: In a drug efficacy study, the null hypothesis might state that the new drug has no effect compared to a placebo.

22. Alternative Hypothesis

  • Definition: The hypothesis that there is an effect or a difference, opposing the null hypothesis.
  • Example: In the same drug study, the alternative hypothesis might state that the new drug has a significant effect compared to the placebo.

23. Type I Error

  • Definition: The error made when rejecting a true null hypothesis (false positive).
  • Example: Concluding that a drug is effective when it actually is not.

24. Type II Error

  • Definition: The error made when failing to reject a false null hypothesis (false negative).
  • Example: Concluding that a drug is not effective when it actually is.

25. Bayesian Statistics

  • Definition: A statistical approach that uses Bayes’ Theorem to update the probability of a hypothesis as more evidence or information becomes available.
  • Example: Updating the probability of a patient having a disease based on new test results and prior knowledge of disease prevalence.

26. Probability Distribution

  • Definition: A function that describes the likelihood of different outcomes in an experiment.
  • Example: The normal distribution, which is bell-shaped and describes many natural phenomena, such as heights of people.

27. Normal Distribution

  • Definition: A continuous probability distribution that is symmetric about the mean, with the majority of data clustering around the mean.
  • Example: Heights of adults typically follow a normal distribution, with most people being of average height and fewer people being extremely tall or short.

28. Binomial Distribution

  • Definition: A discrete probability distribution representing the number of successes in a fixed number of independent Bernoulli trials.
  • Example: Flipping a coin 10 times and counting the number of heads, where each flip is a Bernoulli trial.

29. Poisson Distribution

  • Definition: A discrete probability distribution used to model the number of events occurring within a fixed interval of time or space.
  • Example: The number of customer arrivals at a store in an hour.

30. Exponential Distribution

  • Definition: A continuous probability distribution used to model the time between events in a Poisson process.
  • Example: The time between phone calls at a call center.
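
A few probability calculations for items 28–30 using `scipy.stats` (the parameter values are invented):

```python
from scipy import stats

# Binomial (item 28): P(exactly 6 heads in 10 fair coin flips).
print(stats.binom.pmf(6, n=10, p=0.5))

# Poisson (item 29): P(exactly 3 arrivals in an hour, given a mean of 5).
print(stats.poisson.pmf(3, mu=5))

# Exponential (item 30): P(waiting more than 2 minutes between calls,
# given a mean wait of 1 minute), via the survival function 1 - CDF.
print(stats.expon.sf(2, scale=1))
```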

31. Sampling

  • Definition: The process of selecting a subset of individuals from a population to estimate characteristics of the whole population.
  • Example: Surveying 100 people from a city of 10,000 to understand the overall opinion on a new policy.

32. Random Sampling

  • Definition: A sampling method where each individual in the population has an equal chance of being selected.
  • Example: Drawing names from a hat to select participants for a study.

33. Stratified Sampling

  • Definition: A sampling method where the population is divided into distinct subgroups, and random samples are taken from each subgroup.
  • Example: Sampling students from different grade levels in a school to ensure representation from each grade.

34. Systematic Sampling

  • Definition: A sampling method where every nth individual is selected from a list or sequence.
  • Example: Selecting every 10th person on a list of 1,000 employees for a survey.

35. Cluster Sampling

  • Definition: A sampling method where the population is divided into clusters, and entire clusters are randomly selected.
  • Example: Selecting entire schools from a district to survey all students in those schools.
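
Minimal sketches of simple random, systematic, and stratified sampling (items 32–34) on a toy population of IDs:

```python
import numpy as np

rng = np.random.default_rng(42)
population = np.arange(1000)   # made-up population of IDs

# Simple random sampling (item 32): every individual equally likely.
simple = rng.choice(population, size=50, replace=False)

# Systematic sampling (item 34): every 20th individual from a random start.
start = rng.integers(20)
systematic = population[start::20]

# Stratified sampling (item 33): sample separately within each stratum.
strata = [population[:500], population[500:]]
stratified = np.concatenate(
    [rng.choice(s, size=25, replace=False) for s in strata])
```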

36. Bias

  • Definition: A systematic error that leads to incorrect or unfair estimates or conclusions in statistical analysis.
  • Example: Surveying only online users to understand the preferences of the general population, which might not be representative.

37. Outliers

  • Definition: Data points that are significantly different from the majority of data in a dataset.
  • Example: A single extremely high income in a dataset of middle-income earners.

38. Hypothesis Testing

  • Definition: A statistical method used to determine if there is enough evidence in a sample to infer that a certain condition is true for the population.
  • Example: Testing if a new teaching method improves test scores compared to a traditional method.

39. Effect Size

  • Definition: A quantitative measure of the strength or magnitude of a phenomenon, indicating the practical significance of the results.
  • Example: Measuring the difference in test scores between two teaching methods and quantifying how substantial the difference is.

40. Power of a Test

  • Definition: The probability that a statistical test will correctly reject a false null hypothesis (i.e., detect an effect if there is one).
  • Example: A test with 80% power has an 80% chance of detecting a true effect if it exists.
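
Power can also be estimated by simulation; a Monte Carlo sketch with invented settings (two groups of 30, a true difference of 0.5 standard deviations):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, effect, trials = 0.05, 30, 0.5, 2000

rejections = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(effect, 1.0, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1

# Estimated power: roughly 0.5 here, well below the usual 0.8 target,
# which is why larger samples are needed for small effects.
print(rejections / trials)
```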

41. Point Estimate

  • Definition: A single value that estimates a population parameter.
  • Example: Using the sample mean to estimate the population mean.

42. Interval Estimate

  • Definition: A range of values used to estimate a population parameter, giving a range within which the parameter is expected to lie.
  • Example: A 95% confidence interval for a mean might be [45, 55]; as noted under Confidence Interval, the 95% refers to how often the procedure captures the true mean, not to a probability statement about this particular interval.

43. Central Limit Theorem

  • Definition: A statistical theorem that states that the distribution of the sample mean approaches a normal distribution as the sample size becomes large, regardless of the population’s distribution.
  • Example: The average height of a sample of 30 people will approximately follow a normal distribution, even if the height distribution of the entire population is not normal.
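
A simulation sketch: even though exponential data are strongly skewed, the means of repeated samples of size 30 pile up in a roughly normal, bell-shaped way:

```python
import numpy as np

rng = np.random.default_rng(1)

# 10,000 samples of size 30 from a skewed (exponential) population.
sample_means = rng.exponential(scale=1.0, size=(10_000, 30)).mean(axis=1)

print(sample_means.mean())   # close to the population mean, 1.0
print(sample_means.std())    # close to 1 / sqrt(30), about 0.18
# A histogram of sample_means would look approximately normal.
```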

44. Bayes’ Theorem

  • Definition: A formula used to update the probability of a hypothesis based on new evidence.
  • Example: Updating the probability of having a disease based on a positive test result and prior knowledge of disease prevalence.
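
A worked version of the disease example with invented numbers (1% prevalence, 95% sensitivity, 5% false positive rate):

```python
prior = 0.01                  # P(disease): prevalence
p_pos_given_disease = 0.95    # sensitivity
p_pos_given_healthy = 0.05    # false positive rate

# Total probability of a positive test, then Bayes' theorem.
p_positive = (prior * p_pos_given_disease
              + (1 - prior) * p_pos_given_healthy)
posterior = prior * p_pos_given_disease / p_positive

# ~0.16: even after a positive test, the probability of disease is only ~16%,
# because the disease is rare.
print(posterior)
```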

45. Marginal Probability

  • Definition: The probability of an event occurring irrespective of the outcome of another variable.
  • Example: The probability of a student passing an exam regardless of their major.

46. Joint Probability

  • Definition: The probability of two events occurring simultaneously.
  • Example: The probability of a student passing an exam and also being a senior.

47. Conditional Probability

  • Definition: The probability of an event occurring given that another event has already occurred.
  • Example: The probability of passing an exam given that a student has attended all classes.

48. Likelihood

  • Definition: A measure of how well a statistical model explains the observed data, often used in parameter estimation.
  • Example: Estimating the parameters of a normal distribution by maximizing the likelihood function.

49. Probability Mass Function (PMF)

  • Definition: A function that gives the probability of each possible value in a discrete probability distribution.
  • Example: For a fair six-sided die, the PMF gives a probability of 1/6 for each face of the die.

50. Probability Density Function (PDF)

  • Definition: A function that describes the likelihood of a continuous random variable taking on a particular value.
  • Example: The PDF of a normal distribution describes the probability density of different values along the distribution's curve.

51. Cumulative Distribution Function (CDF)

  • Definition: A function that gives the probability that a random variable takes on a value less than or equal to a certain value.
  • Example: The CDF of a normal distribution shows the probability that a value is below a certain threshold.
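
A sketch of all three functions (items 49–51) using `scipy.stats`:

```python
from scipy import stats

# PMF (item 49): a fair six-sided die as a discrete uniform distribution.
die = stats.randint(1, 7)    # integers 1..6
print(die.pmf(3))            # 1/6, about 0.167

# PDF and CDF (items 50-51) for a standard normal distribution.
print(stats.norm.pdf(0))     # density at the mean: ~0.399
print(stats.norm.cdf(1.96))  # P(X <= 1.96): ~0.975
```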

52. Moment Generating Function (MGF)

  • Definition: A function that summarizes all the moments of a probability distribution, useful for finding the expected value and variance.
  • Example: Using the MGF of a normal distribution to derive its mean and variance.

53. Law of Large Numbers

  • Definition: A principle stating that as the sample size increases, the sample mean gets closer and closer to the population mean.
  • Example: If you roll a die many times, the average result will approach the expected value of 3.5.

54. Empirical Rule

  • Definition: A rule stating that for a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, 95% within two, and 99.7% within three.
  • Example: In a dataset of test scores with a normal distribution, about 68% of scores will fall within one standard deviation of the mean.

55. Z-Score

  • Definition: A measure of how many standard deviations an element is from the mean of the distribution.
  • Example: A Z-score of 2 indicates that a data point is two standard deviations above the mean.
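
A sketch computing z-scores for made-up test scores, plus a check of the empirical rule (item 54) straight from the normal CDF:

```python
import numpy as np
from scipy import stats

scores = np.array([55, 60, 65, 70, 75, 80, 85, 90, 95])  # invented scores
z = (scores - scores.mean()) / scores.std()
print(z)   # each score's distance from the mean, in standard deviations

# Empirical rule: fraction within 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(stats.norm.cdf(k) - stats.norm.cdf(-k))   # ~0.683, ~0.954, ~0.997
```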

56. T-Distribution

  • Definition: A probability distribution used in hypothesis testing when the sample size is small and the population standard deviation is unknown.
  • Example: Using the T-distribution to determine confidence intervals for small sample sizes in a study.

57. Chi-Square Distribution

  • Definition: A probability distribution used in hypothesis testing for categorical data, particularly in tests of independence and goodness of fit.
  • Example: Using the chi-square distribution to test if there is a significant association between two categorical variables.

58. F-Distribution

  • Definition: A probability distribution used in analysis of variance (ANOVA) and regression analysis, comparing variances between groups.
  • Example: Using the F-distribution to test if there are significant differences in mean test scores between multiple teaching methods.

59. Non-Parametric Tests

  • Definition: Statistical tests that do not assume a specific distribution for the data, often used for ordinal or nominal data.
  • Example: The Mann-Whitney U test for comparing differences between two independent groups when the data is not normally distributed.

60. Parametric Tests

  • Definition: Statistical tests that assume a specific distribution for the data, often used for interval or ratio data.
  • Example: The t-test assumes normal distribution and is used for comparing the means of two groups.

61. Bootstrap Method

  • Definition: A resampling technique used to estimate the distribution of a statistic by repeatedly sampling with replacement from the data.
  • Example: Estimating the confidence interval for the mean by creating many bootstrap samples from the original dataset.
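
A minimal percentile-bootstrap sketch with NumPy, reusing the dataset from the variance example:

```python
import numpy as np

rng = np.random.default_rng(7)
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# 10,000 bootstrap samples, each drawn with replacement at the original size.
boot_means = [rng.choice(data, size=len(data), replace=True).mean()
              for _ in range(10_000)]

# Percentile bootstrap 95% confidence interval for the mean.
print(np.percentile(boot_means, [2.5, 97.5]))
```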

62. Jackknife Resampling

  • Definition: A resampling technique that systematically leaves out one observation at a time from the dataset to estimate the distribution of a statistic.
  • Example: Estimating the standard error of a mean by leaving out one data point at a time and recalculating the mean.

63. Survival Analysis

  • Definition: A statistical method used to analyze the time until an event occurs, such as death, failure, or other life events.
  • Example: Analyzing the time until patients experience a relapse after treatment.

64. Cox Proportional-Hazards Model

  • Definition: A statistical model used in survival analysis to explore the relationship between the survival time and one or more predictor variables.
  • Example: Examining how factors like age and treatment type affect the survival time of cancer patients.

65. Logistic Regression

  • Definition: A regression model used for binary classification tasks, predicting the probability of a categorical outcome.
  • Example: Predicting whether a student will pass or fail an exam based on study hours and attendance.
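
A sketch with scikit-learn on invented pass/fail data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: [hours studied, attendance %] -> pass (1) / fail (0).
X = np.array([[1, 50], [2, 60], [3, 55], [5, 80], [7, 90], [9, 95]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[4, 70]]))         # predicted class for a new student
print(clf.predict_proba([[4, 70]]))   # probabilities of fail vs. pass
```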

66. Discriminant Analysis

  • Definition: A statistical technique used to classify observations into predefined classes based on predictor variables.
  • Example: Classifying customers as "high risk" or "low risk" based on their credit scores and income.

67. Factor Analysis

  • Definition: A technique used to identify underlying relationships between variables and group them into factors that explain the patterns in the data.
  • Example: Identifying latent variables like "socioeconomic status" that explain patterns in survey responses about education, income, and occupation.

68. Principal Component Analysis (PCA)

  • Definition: A dimensionality reduction technique that transforms data into a set of orthogonal (uncorrelated) components to capture the maximum variance.
  • Example: Reducing the number of features in a dataset while retaining most of the variance to simplify analysis.
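
A minimal scikit-learn sketch on random made-up data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))   # invented data: 100 samples, 10 features

pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                 # (100, 3)
print(pca.explained_variance_ratio_)   # variance captured per component
```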

69. Canonical Correlation Analysis

  • Definition: A statistical method used to examine the relationships between two sets of multivariate variables.
  • Example: Analyzing the relationship between physical and cognitive test scores in a study.

70. Multivariate Analysis of Variance (MANOVA)

  • Definition: An extension of ANOVA that examines the differences in multiple dependent variables simultaneously.
  • Example: Testing if different teaching methods affect students' scores in math and science.

71. Hierarchical Clustering

  • Definition: A clustering method that builds a hierarchy of clusters, either agglomeratively (bottom-up) or divisively (top-down).
  • Example: Grouping customers into hierarchical segments based on purchasing behavior.

72. K-Means Clustering

  • Definition: A clustering algorithm that partitions data into K distinct clusters based on feature similarity.
  • Example: Segmenting customers into K groups based on their purchasing habits.

73. Silhouette Score

  • Definition: A metric used to evaluate the quality of clustering, measuring how similar an object is to its own cluster compared to other clusters.
  • Example: Calculating the silhouette score to determine how well-separated the clusters are in a customer segmentation task.
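
A sketch covering items 72–73 with scikit-learn, using two invented, well-separated blobs of points:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (50, 2)),    # cluster around (0, 0)
               rng.normal(8, 1, (50, 2))])   # cluster around (8, 8)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(silhouette_score(X, km.labels_))   # close to 1 for well-separated clusters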

74. Eigenvalues and Eigenvectors

  • Definition: Mathematical concepts describing the directions a matrix leaves unchanged (eigenvectors) and the factors by which it stretches them (eigenvalues); for a covariance matrix, each eigenvalue measures the variance along its eigenvector's direction.
  • Example: In PCA, eigenvectors represent principal components, and eigenvalues indicate the amount of variance captured by each component.

75. Confusion Matrix

  • Definition: A table used to evaluate the performance of a classification algorithm, showing true positives, false positives, true negatives, and false negatives.
  • Example: Evaluating a model that classifies emails as "spam" or "not spam" by showing how many emails were correctly or incorrectly classified.

76. Precision

  • Definition: The proportion of true positive predictions among all positive predictions made by a classification model.
  • Example: In a medical test, precision is the fraction of true positives (correctly identified cases of a disease) out of all positive test results.

77. Recall (Sensitivity)

  • Definition: The proportion of true positive predictions among all actual positive instances.
  • Example: In a medical test, recall is the fraction of actual positive cases (people with the disease) that the test correctly identifies.

78. F1 Score

  • Definition: The harmonic mean of precision and recall, providing a single metric to evaluate a classification model's performance.
  • Example: If a model has a precision of 0.8 and a recall of 0.6, the F1 score is 2 × (0.8 × 0.6) / (0.8 + 0.6) ≈ 0.69.

79. ROC Curve (Receiver Operating Characteristic Curve)

  • Definition: A graphical plot that illustrates the diagnostic ability of a binary classification model across different threshold values.
  • Example: Plotting the true positive rate (sensitivity) versus the false positive rate (1 - specificity) to assess a model's performance.

80. AUC (Area Under the ROC Curve)

  • Definition: The area under the ROC curve, representing the model's ability to discriminate between positive and negative classes.
  • Example: An AUC of 0.8 indicates that the model has good performance in distinguishing between positive and negative instances.
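
A sketch covering items 75–80 with scikit-learn's metrics, on invented labels and scores:

```python
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, f1_score, roc_auc_score)

# Made-up ground truth and hard predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(confusion_matrix(y_true, y_pred))
print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(recall_score(y_true, y_pred))      # TP / (TP + FN)
print(f1_score(y_true, y_pred))          # harmonic mean of the two

# AUC (item 80) needs predicted scores rather than hard labels.
y_scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3, 0.95, 0.05]
print(roc_auc_score(y_true, y_scores))
```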

81. Gini Coefficient

  • Definition: A measure of statistical dispersion used to quantify inequality, often used in credit scoring to assess the discriminatory power of a model.
  • Example: A Gini coefficient of 0.6 indicates a model's strong ability to discriminate between good and bad credit risks.

82. Cross-Validation

  • Definition: A technique for assessing how the results of a statistical analysis generalize to an independent dataset, often used to validate the performance of a model.
  • Example: Using k-fold cross-validation to train and test a model on different subsets of data to ensure its robustness.

83. Leave-One-Out Cross-Validation

  • Definition: A type of cross-validation where one observation is used as the validation set while the remaining observations are used as the training set.
  • Example: In a dataset with 100 samples, training a model 100 times, each time leaving out one sample for validation.
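
A sketch of both k-fold and leave-one-out cross-validation with scikit-learn on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, LeaveOneOut

X, y = make_classification(n_samples=100, random_state=0)  # synthetic data
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation (item 82).
print(cross_val_score(model, X, y, cv=5).mean())

# Leave-one-out cross-validation (item 83): 100 fits, one sample held out each time.
print(cross_val_score(model, X, y, cv=LeaveOneOut()).mean())
```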

84. Bootstrap Aggregating (Bagging)

  • Definition: An ensemble method that improves model accuracy by training multiple models on different random subsets of the data and averaging their predictions.
  • Example: Using bagging to train several decision trees on different subsets of data and combining their predictions to make a final decision.

85. Boosting

  • Definition: An ensemble method that builds multiple models sequentially, each one correcting errors made by the previous models.
  • Example: Using AdaBoost to sequentially train models, where each new model focuses on the mistakes of the previous ones to improve overall accuracy.

86. Gradient Boosting

  • Definition: A boosting technique that builds models in a stage-wise fashion, optimizing the loss function using gradient descent.
  • Example: Using Gradient Boosting Machines (GBM) to improve predictive performance by iteratively adding models that correct errors from previous iterations.
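
Bagging (item 84), AdaBoost-style boosting (item 85), and gradient boosting (item 86) all have off-the-shelf scikit-learn implementations; a sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=0)  # synthetic data

for model in (BaggingClassifier(),            # item 84
              AdaBoostClassifier(),           # item 85
              GradientBoostingClassifier()):  # item 86
    print(type(model).__name__, cross_val_score(model, X, y, cv=5).mean())
```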

87. Support Vector Machine (SVM)

  • Definition: A classification algorithm that finds the hyperplane that best separates different classes in the feature space.
  • Example: Using SVM to classify emails as spam or not spam by finding the optimal boundary that maximizes the margin between classes.

88. Naive Bayes

  • Definition: A classification algorithm based on Bayes’ theorem with the assumption of independence between features.
  • Example: Using Naive Bayes to classify news articles into topics based on the frequency of words, assuming that the presence of one word is independent of others.

89. Principal Component Analysis (PCA)

  • Definition: A technique used for dimensionality reduction that transforms data into a new coordinate system where the greatest variance lies on the first coordinate (principal component).
  • Example: Reducing the number of features in a dataset by projecting it onto the first few principal components while retaining most of the variance.

90. Hierarchical Clustering

  • Definition: A clustering method that builds a hierarchy of clusters, either agglomeratively (bottom-up) or divisively (top-down).
  • Example: Grouping animals based on their characteristics into a hierarchy, starting from individual species and combining them into broader categories.

91. K-Means Clustering

  • Definition: A clustering algorithm that partitions data into K distinct clusters based on feature similarity, where each cluster is represented by its centroid.
  • Example: Segmenting customers into K groups based on their purchasing behavior to tailor marketing strategies.

92. Silhouette Score

  • Definition: A metric used to evaluate the quality of clustering by measuring how similar an object is to its own cluster compared to other clusters.
  • Example: Calculating the silhouette score to assess the coherence of clusters formed in customer segmentation.

93. Bayesian Network

  • Definition: A probabilistic graphical model that represents variables and their conditional dependencies using a directed acyclic graph.
  • Example: Modeling the relationship between symptoms and diseases to infer the likelihood of a disease given observed symptoms.

94. Markov Chain

  • Definition: A statistical model that represents systems which transition from one state to another based on certain probabilities, with the future state dependent only on the current state.
  • Example: Modeling weather patterns where tomorrow's weather depends only on today's weather and not on previous days.
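
A sketch simulating the two-state weather example with an invented transition matrix:

```python
import numpy as np

# Made-up transition probabilities: rows = today, columns = tomorrow.
#              sunny  rainy
P = np.array([[0.8, 0.2],    # sunny today
              [0.4, 0.6]])   # rainy today

rng = np.random.default_rng(11)
state, history = 0, []                   # start sunny
for _ in range(10):
    state = rng.choice(2, p=P[state])    # next state depends only on the current one
    history.append(["sunny", "rainy"][state])
print(history)
```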

95. Meta-Models

  • Definition: Models designed to enhance the performance or selection of other models by operating at a higher level, often combining predictions or optimizing model parameters.
  • Example: Using stacking to combine predictions from multiple base models and improve overall performance with a meta-model that aggregates their outputs.

96. Maximum Likelihood Estimation (MLE)

  • Definition: A method of estimating parameters by maximizing the likelihood function, which measures how likely it is to observe the given data under different parameter values.
  • Example: Estimating the mean and variance of a normal distribution by finding the parameters that make the observed data most probable.
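
For the normal distribution the MLE has a closed form, so SciPy's numerical fit can be checked directly; a sketch on synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(loc=10, scale=3, size=500)   # synthetic data

# Closed-form MLE for a normal: the sample mean and the standard deviation
# computed with n (not n - 1) in the denominator.
print(data.mean(), data.std())

# scipy's fit() maximizes the likelihood numerically and should agree closely.
print(stats.norm.fit(data))
```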

97. Expectation-Maximization (EM)

  • Definition: An iterative algorithm used to find maximum likelihood estimates of parameters in models with latent variables by alternating between estimating missing data (expectation) and optimizing parameters (maximization).
  • Example: Estimating the parameters of a Gaussian Mixture Model where the model contains latent variables representing the component distributions.

98. Resampling Methods

  • Definition: Techniques used to assess the stability and performance of statistical estimates by repeatedly drawing samples from the original data.
  • Example: Using the bootstrap method to estimate confidence intervals for a statistic by resampling the original dataset with replacement.

99. Time Series Analysis

  • Definition: A statistical technique used to analyze data points collected or recorded at specific time intervals to identify trends, seasonal patterns, and other temporal behaviors, and to forecast future values.
  • Example: Analyzing monthly sales data to identify seasonal trends and predict future sales.

100. Autocorrelation

  • Definition: A measure of how a time series is correlated with a lagged version of itself. It assesses the relationship between a variable's current value and its past values.
  • Example: In a monthly temperature series, autocorrelation helps determine how current temperatures relate to those of previous months; high autocorrelation at lag 1 suggests that if it's warm this month, it's likely to be warm next month as well.
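
A closing sketch: computing autocorrelation at a few lags for an invented monthly series with a 12-month seasonal pattern:

```python
import numpy as np

rng = np.random.default_rng(8)
t = np.arange(120)   # 10 years of made-up monthly data
series = 20 + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

def autocorr(x, lag):
    """Correlation of a series with itself shifted by `lag` steps."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

print(autocorr(series, 1))    # high: adjacent months are similar
print(autocorr(series, 6))    # strongly negative: opposite season
print(autocorr(series, 12))   # high again: same season one year later
```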