Types of Data and Summarising Data

Types of Data

  • Types of data include:
    • Non-numerical/qualitative/categorical
      • When a variable/observation can only belong to a distinct category
      • Includes nominal & ordinal
        • Nominal – unordered & mutually exclusive categories. Not possible to rank. Examples – alive/dead, blood group
        • Ordinal – ordered & mutually exclusive categories. Can be ranked. Difference or ‘gap’ between values can be ill-defined. Examples – mild/moderate/severe
    • Numerical/quantitative
      • When a variable/observation takes a numerical value
      • Can be continuous (infinitely divisible) or discrete (whole numbers)
      • Continuous data includes interval & ratio
        • Interval – continuous, with no true zero (no ‘none’ value). Difference or ‘gap’ between observations is equal at all points on the scale. Examples – temperature in °C
        • Ratio – continuous, with a true zero as reference. Difference or ‘gap’ between observations is equal at all points on the scale. Examples – height, weight (pain scores are usually treated as ordinal, since the gaps between scores are not clearly equal)
  • Data can be converted to different types for analysis BUT this results in information loss and statistical power loss
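
As a rough illustration of this information loss, the sketch below collapses a numerical score into an ordinal category. The cut-offs and the function name `to_category` are made up for illustration and are not clinical thresholds.

```python
# Hypothetical cut-offs, chosen only to illustrate the idea.
def to_category(score: float) -> str:
    """Collapse a numerical score (0-10) into an ordinal category."""
    if score < 4:
        return "mild"
    if score < 7:
        return "moderate"
    return "severe"


scores = [1.5, 3.9, 4.0, 6.8, 9.2]
categories = [to_category(s) for s in scores]

# 1.5 and 3.9 end up in the same category, so the difference
# between them can no longer be recovered from the categories.
print(categories)
```

Once the data are stored as categories, the original values cannot be reconstructed, which is why the conversion only works in one direction.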

Summarising Data

  • Statistical methods used for summarising data depend on the type of data you're dealing with

Measure of typical value
Mean (x̄ is the sample mean, μ is the population mean) – also known as the arithmetic mean – add up all numbers & divide by sample size
Geometric mean – used for right/positively skewed data – to do this, you need to take the log of each value, then calculate the mean, then back transform
Weighted mean – if certain values are of particular interest, a greater weighting can be given to those values
Median – middle value (or average of 2 middle values)
Mode – most common value
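
The measures above can be sketched in Python using only the standard library. The data are the six values used in the stem and leaf example in these notes; the weights and the small list used for the mode are hypothetical, added only so that each measure has something to work on.

```python
import math
import statistics

data = [13, 17, 21, 26, 27, 32]

# Arithmetic mean: add up all numbers and divide by the sample size.
mean = statistics.mean(data)

# Median: with an even number of values, the average of the 2 middle values.
median = statistics.median(data)

# Geometric mean: log each value, take the mean, then back-transform.
log_mean = sum(math.log(x) for x in data) / len(data)
geometric_mean = math.exp(log_mean)

# Weighted mean: hypothetical weights giving the last two values double weight.
weights = [1, 1, 1, 1, 2, 2]
weighted_mean = sum(w * x for w, x in zip(weights, data)) / sum(weights)

# Mode: needs repeated values, so a separate hypothetical list is used.
mode = statistics.mode([1, 2, 2, 3, 3, 3])

print(mean, median, geometric_mean, weighted_mean, mode)
```

The standard library also provides `statistics.geometric_mean`, which should agree with the log, mean, back-transform calculation written out here.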

Measures of variance/spread
Range – difference between minimum and maximum value
Interquartile range – 25th to 75th centile
Standard deviation (σ) – square root of variance
Variance – the average of the squared differences from the mean. (First calculate the mean of the dataset. Then find the difference between each data point and the mean. Square each difference, i.e. the “squared differences”. The average of these squared differences is the variance.)
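
A minimal sketch of these measures of spread, again using the six example values from these notes. The variance follows the steps described above and divides by n, so it is the population form. Note that different software uses slightly different rules for the quartiles, so the interquartile range from `statistics.quantiles` (default "exclusive" method) may not match a hand calculation exactly.

```python
import math
import statistics

data = [13, 17, 21, 26, 27, 32]

# Range: difference between the minimum and maximum value.
data_range = max(data) - min(data)  # 32 - 13 = 19

# Interquartile range: 25th to 75th centile.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Variance, step by step: mean, differences, squares, average.
mean = sum(data) / len(data)
squared_differences = [(x - mean) ** 2 for x in data]
variance = sum(squared_differences) / len(data)

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print(data_range, q1, q3, iqr, variance, sd)
```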

Frequency tables
Need statistical tests to interpret
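
As a small illustration, a frequency table with relative frequencies can be built as follows. The blood group observations are invented for the example.

```python
from collections import Counter

# Hypothetical observations of a nominal variable.
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "O"]

counts = Counter(blood_groups)
total = sum(counts.values())

# Relative frequency (%) for each category.
relative = {group: 100 * count / total for group, count in counts.items()}

print("Group  Freq  Rel. freq")
for group, count in counts.most_common():
    print(f"{group:<6} {count:>4}  {relative[group]:>8.1f}%")
```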

Graphs
Visually illustrate frequency or relative frequency (proportion, i.e. %)
Categorical or discrete – tend to use bar, pie chart
Continuous – tend to use histogram, dot plot, stem & leaf plot, box plot
If 2 continuous variables – can use scatter diagram

Example:

Data: 13, 17, 21, 26, 27, 32

Bar chart – gaps between bars
Histogram – no gaps between columns, which typically all represent equal intervals

Stem and Leaf Plot:

13, 17, 21, 26, 27, 32

Stem | Leaf
1    | 3 7
2    | 1 6 7
3    | 2
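
A stem and leaf plot for the data 13, 17, 21, 26, 27, 32 can be generated with a short sketch like the one below. The function name `stem_and_leaf` is illustrative, and the code assumes non-negative whole numbers where the tens form the stem and the units form the leaf.

```python
from collections import defaultdict

data = [13, 17, 21, 26, 27, 32]


def stem_and_leaf(values):
    """Group values by tens (stem) and units digit (leaf)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Unlike a histogram, the plot keeps every original value readable, since each row can be turned back into the numbers it came from.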
  • Numerical/quantitative data
    • The summary statistics chosen depend on whether the data has a normal or non-normal distribution
    • Numerical/quantitative data can be described by using measures of the typical value and also measures of spread…

Mean equation:

$$\bar{x} = \frac{\sum x}{n}$$

Standard deviation equation (for a sample):

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

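Where x is a data value; x̄ is the sample mean; n is the sample size; Σ (sigma) means the “sum of”. Note that the denominator for the standard deviation of a population is “n”, not “n − 1” (and the population mean μ replaces x̄). The n − 1 denominator corrects for the fact that differences are measured from the sample mean rather than the true population mean, which would otherwise make the sample SD an underestimate.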
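
The effect of the two denominators can be checked with a short sketch. It uses the six example values from these notes and treats them first as a sample (n − 1) and then as a whole population (n); the variable names are illustrative.

```python
import math
import statistics

data = [13, 17, 21, 26, 27, 32]
n = len(data)

# Sample mean: sum of the values divided by the sample size.
x_bar = sum(data) / n

# Sum of squared differences from the mean.
sum_sq = sum((x - x_bar) ** 2 for x in data)

# Sample SD divides by n - 1; population SD divides by n.
sample_sd = math.sqrt(sum_sq / (n - 1))
population_sd = math.sqrt(sum_sq / n)

print(x_bar, sample_sd, population_sd)
```

The sample SD always comes out slightly larger than the population SD for the same numbers, and the gap shrinks as n grows. The results should agree with `statistics.stdev` and `statistics.pstdev` respectively.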
