class: center, middle, inverse, title-slide

# Data collection and data types
## ⚔ with xaringan
### Goran Kardum
### Department of Psychology
### 2021-10-11

---

# Data collection - definition

Data collection is the process of gathering data, which may consist of facts, figures, objects, symbols, and events drawn from different sources.

---

# Process of collecting data

Draw...

---

# Variables could be...

Quantitative

--

Qualitative

--

- Quantitative variables can be discrete or continuous

---

# Data in psychology

The outcome of a psychological measurement is called a **variable**.

--

But that is not as simple as it looks...

--

For example...

--

Age... How old are you?

It could be expressed as a number of years or months. In developmental psychology ....

--

It could be expressed as an exact number or as a categorical variable

---

**Scales of measurement** is a concept for distinguishing between different types of variables:

--

- Nominal scale

--

- Ordinal scale

--

- Interval scale

--

- Ratio scale

---

## Nominal scale

A nominal variable is also called a categorical variable

--

- There is no particular relationship between or among the different possibilities.

--

- "Bigger" or "better" does not apply

--

- We can only say that one category is different from another

--

- gender, transport type, eye colour, work status...

--

- it is possible but pointless to summarise it with a statistic such as the mean

--

- use frequencies or counts of events

---

## Ordinal scale

These variables are a bit more structured than nominal ones

--

- the different possibilities can be ordered

--

- 1st, 2nd...

--

- mathematical operations: rank order

--

- descriptive statistics: mode, median, range statistics

---

## Interval scale

An interval scale is a quantitative scale where the numbers carry more meaning than on an ordinal scale

--

- **Differences** between the values are interpretable and 'make sense'

--

- But... ratios are harder to interpret, for example with temperature. Is it possible to say that 20 °C is twice as hot as 10 °C?

--

- the problem with absolute zero

--

- mathematical operations: addition, subtraction

--

- statistical operations: mode, median, mean, range statistics, variance, standard deviation, parametric tests

---

## Ratio scale

Ratio scale variable:

--

- zero really means zero

--

- reaction time (RT), meters...

--

- But with some examples, like RT: is it possible that an absolute 'no time' exists?

--

- all mathematical and statistical procedures

---

# Types of variables according to models in research

- Independent variable

--

- Dependent variable

--

Draw...

---

# R structure

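As a rough sketch, the four scales of measurement described earlier might be represented in R as follows. All variable names and data values here are invented for illustration and do not come from a real study.

```r
# Hypothetical example data, invented for illustration only.

# Nominal: categories without order -> unordered factor
eye_colour <- factor(c("brown", "blue", "brown", "green"))
table(eye_colour)              # frequencies are the sensible summary

# Ordinal: categories with order -> ordered factor
place <- factor(c("1st", "3rd", "2nd", "1st"),
                levels = c("1st", "2nd", "3rd"), ordered = TRUE)
place[1] < place[2]            # comparisons are defined for ordered factors

# Interval: differences make sense, ratios do not
temp_c <- c(10, 20)
diff(temp_c)                   # a 10-degree difference is interpretable
(temp_c[2] + 273.15) / (temp_c[1] + 273.15)  # ratio in kelvin: about 1.035, not 2

# Ratio: a true zero, so ratios are meaningful
rt_ms <- c(350, 700)
rt_ms[2] / rt_ms[1]            # the second reaction time is twice the first
```

The kelvin line illustrates the absolute-zero problem: 20 °C looks like double 10 °C only because the Celsius zero point is arbitrary.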
---

# R data types

R atomic data types:

--

character (characters and strings; "a", "name"...)

--

numeric (real or decimal; 2, 3, 7, 8.15)

--

integer (explicitly integer; 8L, 148L)

--

logical (boolean values; TRUE/FALSE)

--

complex (real + imaginary part; 5+7i)

--

raw (any data stored as raw bytes)

---

# R data structures

R objects:

--

atomic vector

--

list

--

matrix

--

array

--

data frame

--

factors

---

# R functions

The R language has several important functions for inspecting objects or vectors:

--

class() - what kind of object is it (high-level)?

--

typeof() - what is the object’s data type (low-level)?

--

length() - how long is it? What about two-dimensional objects?

--

attributes() - does it have any metadata?

---

# Examples

```r
a <- "abcdefgh"
typeof(a)
```

```
## [1] "character"
```

```r
i <- 1:20
typeof(i)
```

```
## [1] "integer"
```

```r
i
```

```
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
```

```r
x <- c("a", "b", "c")
typeof(x)
```

```
## [1] "character"
```

---

# Vectors

The most important family of data types in base R.

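The inspection functions listed earlier can be tried on a small vector and a matrix. This is a minimal sketch with made-up objects `v` and `m`; `dim()` is not on the list above, but it is the base R function that answers the question about two-dimensional objects.

```r
# Made-up objects for illustration
v <- c(1.5, 2.5, 3.5)          # a double vector
m <- matrix(1:6, nrow = 2)     # a 2 x 3 integer matrix

class(v)        # "numeric"
typeof(v)       # "double"
length(v)       # 3
attributes(v)   # NULL: a plain vector carries no metadata

typeof(m)       # "integer"
length(m)       # 6: length() counts all elements, not rows or columns
dim(m)          # 2 3: rows and columns
attributes(m)   # a list with one element, dim
class(m)        # includes "matrix" (R >= 4.0 also reports "array")
```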
---

# Vectors

- Atomic vectors: all elements must have the same type

--

- Lists: elements can have different types

---

## Atomic vectors

There are four primary types of atomic vectors:

--

- logical

--

- integer

--

- double

--

- character (strings)

--

- Integer and double vectors are together known as numeric

---

## Names in vectors

Three ways to name a vector

--

```r
# when we create a vector
i <- c(a = 1, b = 2, c = 3, d = 4)

# assigning a character vector to names
i <- 1:4
names(i) <- c("a", "b", "c", "d")

# with setNames function
i <- setNames(1:4, c("a", "b", "c", "d"))
```

---

## Matrix and array

Adding a `dim` attribute to a vector lets it behave as a 2-dimensional **matrix** or a multi-dimensional **array**.

--

```r
ex_matrix <- matrix(1:8, nrow = 2, ncol = 4)
ex_matrix
```

```
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8
```

---

## Array

```r
ex_array <- array(1:16, c(2, 4, 2))
ex_array
```

```
## , , 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]    9   11   13   15
## [2,]   10   12   14   16
```

---

## Factors

A factor is a vector that can contain only predefined values (Wickham, 2019).

--

- it is used to store categorical data

--

```r
s <- factor(c("a","d","f","g"))
s
```

```
## [1] a d f g
## Levels: a d f g
```

--

```r
# ordered factors
sch_grade <- ordered(c("d", "d", "b", "c"), levels = c("d", "c", "b"))
sch_grade
```

```
## [1] d d b c
## Levels: d < c < b
```

---

## Lists

Lists are more complex than atomic vectors because each element can be of any type. They can store characters, strings, numbers....

--

```r
list1 <- list(1:3, "x", c(TRUE, TRUE, FALSE), c(7.8, 8.9))
list1
```

```
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] "x"
## 
## [[3]]
## [1]  TRUE  TRUE FALSE
## 
## [[4]]
## [1] 7.8 8.9
```

---

## Data frames

next time...

---

class: center, middle

# Thanks!

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).

The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](https://yihui.org/knitr/), and [R Markdown](https://rmarkdown.rstudio.com).