class: center, middle, inverse, title-slide # Descriptive statistics ## ⚔
with xaringan ### Goran Kardum ### Department of Psychology ### 2021-10-25 --- ``` ## Loading required namespace: bibtex ``` # Descriptive statistics The aim of calculating descriptive statistics is to summarize the sample you have collected (Cox, 2017) -- - The main characteristics that are of interest when dealing with continuous data are the shape (distribution), the location, statistics (central tendency and variability) and the spread (dispersion) of the data. -- - Appropriate presenting the results of descriptive statistics (table, figures) -- - Descriptive statistics aim to summarize a sample, rather than use the data to learn about the population -- - Descriptive statistics are not developed on the basis of probability theory --- # Descriptive statistics - Choosing which summary statistics are appropriate depend on the type of variable being examined. -- - Different statistics should be used for interval/ratio, ordinal, and nominal data. --- # Mean - In statistics, the term average refers to any of the measures of central tendency! -- - Psychologists are often interested in looking at how data points tend to group around a central value. -- - The arithmetic mean of a set of observed data is defined as being equal to the sum of the numerical values of each and every observation, divided by the total number of observations. -- - It is greatly influenced by **outliers** (values that are very much larger or smaller than most of the values) --- # Mean (Navarro, 2015) - If your data are nominal scale, you probably shouldn’t be using either the mean or the median. -- - If your data are ordinal scale, you’re more likely to want to use the median than the mean. -- - For interval and ratio scale data, either one is generally acceptable --- ## Mean - It is not appropriate for skewed distributions. In probability theory and statistics, **skewness** is a measure of the asymmetry of the distribution. -- - It is appropriate for larger number of observations (N>20 or N>30) -- - Arithmetic mean for ungrouped data and arithmetic mean for grouped data --- # Formula for mean - Mean x̄ = Sum of all observations / Number of observations -- $$ \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i $$ --- # How mean is the mean? - The most common form of the mean that is used in psychology is the arithmetic mean. -- - In a linear world, averages make perfect sense. -- - The uncritical or unreflective use of the mean in much psychological research makes us problematically blind to variation and distribution amongst the data we collect (Speelman, McGann, 2013). -- - Alternative model: Psychology are used to explore methods of research less mean-dependent and suggest that a critical assessment of the assumptions underlying its use in research play a more explicit role in the process of study design and review. --- ## Assumptions underlying the use of the mean in psychology research (Speelman, McGann, 2013) - Assumption 1: There is a **True Value** that we are Trying to Approximate When we Measure Humans on Some Dimension -- - Assumption 2: Averaging Helps us to Eliminate the Noise in Our Measures to See the True Value -- - Assumption 3: Any Inability to Use the Mean as a Reliable Measure of a Stable Characteristic is a Product of Weaknesses in Methodology or Calculation (i.e., it does not Represent a Failure in the Initial Assumption that a True Value Exists) -- - Assumption 4: The Noise in Our Measurements Represents the Effects of Variables Unrelated to the One Being Measured --- # Median - the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution -- - The Median is the **middle** of a sorted list of numbers. -- - it is not dependent/affected of extreme values -- - appropriate for lower number of observation -- - appropriate when we could not use arithmetic mean --- # Median - example ```r sample_x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9) median(sample_x) ``` ``` ## [1] 5 ``` -- ```r sample_xy <- c(sample_x, 10) sample_xy ``` ``` ## [1] 1 2 3 4 5 6 7 8 9 10 ``` ```r median(sample_xy) ``` ``` ## [1] 5.5 ``` --- # Mode The mode of a sample is very simple: it is the value that occurs most frequently. -- - appropriate for variable on nominal scale -- - in some cases, examples and real life situation we would like to calculate on ordinal, interval and ratio scale -- - usage on frequency - in tables --- # Comparison of means  --- ## Practical usage in table ```r library(psych) library(kableExtra) good_tbl <- describe(sat.act) kable(good_tbl) ``` <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> vars </th> <th style="text-align:right;"> n </th> <th style="text-align:right;"> mean </th> <th style="text-align:right;"> sd </th> <th style="text-align:right;"> median </th> <th style="text-align:right;"> trimmed </th> <th style="text-align:right;"> mad </th> <th style="text-align:right;"> min </th> <th style="text-align:right;"> max </th> <th style="text-align:right;"> range </th> <th style="text-align:right;"> skew </th> <th style="text-align:right;"> kurtosis </th> <th style="text-align:right;"> se </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> gender </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 700 </td> <td style="text-align:right;"> 1.647143 </td> <td style="text-align:right;"> 0.4782004 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 1.683929 </td> <td style="text-align:right;"> 0.0000 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> -0.6145233 </td> <td style="text-align:right;"> -1.6246760 </td> <td style="text-align:right;"> 0.0180743 </td> </tr> <tr> <td style="text-align:left;"> education </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 700 </td> <td style="text-align:right;"> 3.164286 </td> <td style="text-align:right;"> 1.4253515 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 3.307143 </td> <td style="text-align:right;"> 1.4826 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> -0.6807999 </td> <td style="text-align:right;"> -0.0748912 </td> <td style="text-align:right;"> 0.0538732 </td> </tr> <tr> <td style="text-align:left;"> age </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 700 </td> <td style="text-align:right;"> 25.594286 </td> <td style="text-align:right;"> 9.4986466 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 23.862500 </td> <td style="text-align:right;"> 5.9304 </td> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:right;"> 52 </td> <td style="text-align:right;"> 1.6430573 </td> <td style="text-align:right;"> 2.4243053 </td> <td style="text-align:right;"> 0.3590151 </td> </tr> <tr> <td style="text-align:left;"> ACT </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 700 </td> <td style="text-align:right;"> 28.547143 </td> <td style="text-align:right;"> 4.8235599 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 28.842857 </td> <td style="text-align:right;"> 4.4478 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 36 </td> <td style="text-align:right;"> 33 </td> <td style="text-align:right;"> -0.6564026 </td> <td style="text-align:right;"> 0.5349691 </td> <td style="text-align:right;"> 0.1823134 </td> </tr> <tr> <td style="text-align:left;"> SATV </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 700 </td> <td style="text-align:right;"> 612.234286 </td> <td style="text-align:right;"> 112.9025659 </td> <td style="text-align:right;"> 620 </td> <td style="text-align:right;"> 619.453571 </td> <td style="text-align:right;"> 118.6080 </td> <td style="text-align:right;"> 200 </td> <td style="text-align:right;"> 800 </td> <td style="text-align:right;"> 600 </td> <td style="text-align:right;"> -0.6438111 </td> <td style="text-align:right;"> 0.3251946 </td> <td style="text-align:right;"> 4.2673159 </td> </tr> <tr> <td style="text-align:left;"> SATQ </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 687 </td> <td style="text-align:right;"> 610.216885 </td> <td style="text-align:right;"> 115.6392972 </td> <td style="text-align:right;"> 620 </td> <td style="text-align:right;"> 617.254083 </td> <td style="text-align:right;"> 118.6080 </td> <td style="text-align:right;"> 200 </td> <td style="text-align:right;"> 800 </td> <td style="text-align:right;"> 600 </td> <td style="text-align:right;"> -0.5929212 </td> <td style="text-align:right;"> -0.0177603 </td> <td style="text-align:right;"> 4.4119144 </td> </tr> </tbody> </table> --- - draw example of hist function --- # Standard deviation - The standard deviation is a measure of variation which is commonly used with interval/ratio data -- It’s a measurement of how close the observations in the data set are to the mean. -- - If the deviations from the mean are generally large, the standard deviation will be large. -- - Population mean is often represented with the letter mu $$ {\mu} $$, and the population standard deviation is represented with the letter sigma $$ {\sigma} $$ -- - The value of SD = 7.5. Is it small or large? Depends of mean value. -- - Standard deviation may not be appropriate for skewed data (This is the reason among the others that we must first draw the figure - distribution of data. -- ## Formula for SD `$$\sigma = \sqrt{\frac{\sum\limits_{i=1}^{n} \left(x_{i} - \bar{x}\right)^{2}} {n-1}}$$` --- # Example Create variable with some data -- ```r data <- c(5.56, 3.45, 4.55, 6.2, 4.8, 6.8, 7.2, 8.3, 3.2, 7.7, 6.9, 4.1, 8.3, 7.5, 7.7, 4.3, 3.8, 2.8, 3.2, 8.9) mean(data) ``` ``` ## [1] 5.763 ``` ```r median(data) ``` ``` ## [1] 5.88 ``` ```r library(psych) datas_1<- rnorm(20, mean = 5.7, sd = 2) SD(data) ``` ``` ## [1] 2.004275 ``` ```r median(datas_1) ``` ``` ## [1] 5.744962 ``` --- # References ``` ## You haven't cited any references in this bibliography yet. ``` NULL --- class: center, middle # Thanks! Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan). The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](https://yihui.org/knitr/), and [R Markdown](https://rmarkdown.rstudio.com).