class: center, middle, inverse, title-slide

# Data collection, data types
## ⚔ with xaringan
### Goran Kardum
### Department of Psychology
### 2021-10-18

---

# Important terms and definitions from the last lecture

- Data collection

--

- Quantitative variable

--

- Qualitative variable

--

- Discrete vs. continuous

--

- Scales of measurement (a concept for distinguishing between different types of variables)

---

## Scales of measurement

- Nominal scale

--

- Ordinal scale

--

- Interval scale

--

- Ratio scale

---

# Types of variables according to models in research

- Do not confuse these with dependent and independent measures research designs

--

- Independent variable

--

- Dependent variable

---

# R structure

---

# R data types

R atomic data types:

--

character (characters and strings; "a", "name"...)

--

numeric (real or decimal; 2, 3, 7, 8.15)

--

integer (explicitly integer; 8L, 148L)

--

logical (Boolean values; TRUE/FALSE)

--

complex (real + imaginary part; 5+7i)

--

raw (data stored as raw bytes)

---

# R data structures

R objects:

--

atomic vector

--

list

--

matrix

--

array

--

data frame

--

factor

---

# R functions

R has several important functions for inspecting objects or vectors:

--

class() - what kind of object is it (high-level)?

--

typeof() - what is the object's data type (low-level)?

--

length() - how long is it? What about two-dimensional objects?

--

attributes() - does it have any metadata?

---

# Examples

```r
a <- "abcdefgh"
typeof(a)
```

```
## [1] "character"
```

```r
i <- 1:20
typeof(i)
```

```
## [1] "integer"
```

```r
i
```

```
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
```

```r
x <- c("a", "b", "c")
typeof(x)
```

```
## [1] "character"
```

---

# Vectors

The most important family of data types in base R.

--

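The inspection functions listed earlier (class(), typeof(), length(), attributes()) can be tried on a small vector and a matrix. This is only a minimal sketch; the object names `v` and `m` are made up for illustration and are not part of the lecture data.

```r
# Hypothetical example objects (names chosen only for this illustration)
v <- c(a = 1.5, b = 2.5, c = 3.5)     # named double vector
m <- matrix(1:8, nrow = 2, ncol = 4)  # integer matrix

class(v)       # high-level kind of object
typeof(v)      # low-level storage type
length(v)      # number of elements
attributes(v)  # metadata: here only the element names

# For a two-dimensional object, length() counts all elements,
# while dim() reports the number of rows and columns
length(m)
dim(m)
typeof(m)
```

This also answers the question about two-dimensional objects: length() of a matrix is rows times columns, so dim() is usually the more informative call.
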
---

# Vectors

- Atomic vectors: all elements must have the same type

--

- Lists: elements can be of different types

---

## Atomic vectors

There are four common types of atomic vectors:

--

- logical

--

- integer

--

- double

--

- character (strings)

--

- Integer and double vectors together are called numeric

---

## Character, string

```r
months <- c("January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December")
months
```

```
##  [1] "January"   "February"  "March"     "April"     "May"       "June"     
##  [7] "July"      "August"    "September" "October"   "November"  "December" 
```

--

- Elements can be accessed by their position

--

```r
months[7]
```

```
## [1] "July"
```

--

- nchar() returns the number of characters in each string

--

```r
nchar(x = months)
```

```
##  [1] 7 8 5 5 3 4 4 6 9 7 8 8
```

---

## Names in vectors

Three ways to name a vector

--

```r
# 1. when we create the vector
i <- c(a = 1, b = 2, c = 3, d = 4)

# 2. by assigning a character vector to names()
i <- 1:4
names(i) <- c("a", "b", "c", "d")

# 3. with the setNames() function
i <- setNames(1:4, c("a", "b", "c", "d"))
```

---

## Matrix and array

Adding a dim attribute to an atomic vector turns it into a 2-dimensional **matrix** or a multi-dimensional **array**.

--

```r
ex_matrix <- matrix(1:8, nrow = 2, ncol = 4)
ex_matrix
```

```
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8
```

---

## Array

```r
ex_array <- array(1:16, c(2, 4, 2))
ex_array
```

```
## , , 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]    9   11   13   15
## [2,]   10   12   14   16
```

---

## Factors

A factor is a vector that can contain only predefined values (Wickham, 2019).

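To see what "only predefined values" means in practice, here is a minimal sketch. The vector `answers` and its levels are invented for illustration.

```r
# Hypothetical survey answers; only "yes" and "no" are predefined levels
answers <- factor(c("yes", "no", "maybe", "yes"), levels = c("yes", "no"))

answers          # "maybe" is not a predefined level, so it becomes NA
levels(answers)  # the predefined values
table(answers)   # counts per level; NA is left out by default
typeof(answers)  # factors are built on top of integer vectors
```
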
--

- it is used to store categorical data

--

```r
s <- factor(c("a", "d", "f", "g"))
s
```

```
## [1] a d f g
## Levels: a d f g
```

--

- ordered factors are a minor variation of factors

```r
# ordered factors
sch_grade <- ordered(c("d", "d", "b", "c"), levels = c("d", "c", "b"))
sch_grade
```

```
## [1] d d b c
## Levels: d < c < b
```

---

## Lists

Lists are more complex than atomic vectors because each element can be of any type. They can store characters, strings, numbers...

--

```r
list1 <- list(1:3, "x", c(TRUE, TRUE, FALSE), c(7.8, 8.9))
list1
```

```
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] "x"
## 
## [[3]]
## [1]  TRUE  TRUE FALSE
## 
## [[4]]
## [1] 7.8 8.9
```

--

Data frames are a specific type of list

---

## Data frames

A data frame is a collection of variables, stored as a special type of list.

--

Until now, we had only separate variables in the workspace (the Environment pane in RStudio)

--

For example...

```r
gender <- factor(c(1, 2, 2, 2, 1))
levels(gender) <- c("male", "female")
# group must be a factor too: levels<- on a plain numeric vector
# only attaches an attribute and does not relabel the values
group <- factor(c(1, 2, 1, 2, 1))
levels(group) <- c("control", "experimental")
age <- c(21, 24, 23, 27, 31)
```

--

These variables exist only as separate objects in the R workspace... until...

---

## Data frames

Now we combine the variables into a **data.frame**

--

```r
df_example <- data.frame(gender, group, age)
df_example
```

```
##   gender        group age
## 1   male      control  21
## 2 female experimental  24
## 3 female      control  23
## 4 female experimental  27
## 5   male      control  31
```

---

## Tibbles

Tibbles come from the **tibble** package, which is part of the **tidyverse**; like data frames, they are built on lists.

--

a modern reimagining of the data frame (Wickham, 2021)

--

There are two main differences in the usage of a **tibble** vs. a classic **data.frame**: printing and subsetting (Wickham & Grolemund, 2017).

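The two differences can be seen side by side in a small sketch. It assumes the tibble package is installed; the objects `demo_df` and `demo_tbl` are hypothetical and exist only for this illustration.

```r
library(tibble)  # assumption: the tibble package is installed

# Hypothetical example data (names chosen only for this illustration)
demo_df  <- data.frame(id = 1:3, label = c("x", "y", "z"))
demo_tbl <- as_tibble(demo_df)

# Printing: a tibble also reports its dimensions and column types
demo_df
demo_tbl

# Subsetting one column with [ , ]:
# a data frame drops to a plain vector, a tibble stays a tibble
class(demo_df[, "id"])
class(demo_tbl[, "id"])
```
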
--

```r
library(tidyverse)
```

```
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
```

```
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.3     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.0     ✓ forcats 0.5.1
```

```
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
```

```r
# create an example data frame
data_df <- data.frame(a = 1:3, b = letters[1:3], c = Sys.Date() - 1:3)
data_df
```

```
##   a b          c
## 1 1 a 2021-10-17
## 2 2 b 2021-10-16
## 3 3 c 2021-10-15
```

--

```r
# convert the data frame to a tibble
as_tibble(data_df)
```

```
## # A tibble: 3 × 3
##       a b     c         
##   <int> <chr> <date>    
## 1     1 a     2021-10-17
## 2     2 b     2021-10-16
## 3     3 c     2021-10-15
```

---

## More objects in R

Formulas

--

Functions

- generic functions (e.g. summary(), plot())

--

- functions from a package (library)

---

## Introduction to descriptive statistics

- distribution of the data (vectors, factors)

--

- normal distribution (Gaussian distribution)

--

- sampling (sample size - N)

---

class: center, middle

# Thanks!

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).

The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](https://yihui.org/knitr/), and [R Markdown](https://rmarkdown.rstudio.com).