Understanding your data – part 1

To understand the structure and type of your data, summary statistics and a visualization of the data's distribution are very important.

At the most basic level, data can be of two types: “Categorical/Qualitative” or “Numerical/Quantitative”.

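Categorical data can be of two types: a. Ordinal, where the categories have a natural order (for example low, medium, high), or b. Non-ordinal (nominal), where they do not (for example flower colour).

Numerical data can be of two types: a. Discrete or b. Continuous.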
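In R, these data types are usually stored in different ways. Here is a minimal sketch with made-up example vectors (the names soil_quality, flower_colour, student_count and plant_height and their values are hypothetical, only for illustration): categorical variables as factors, which can be ordered or unordered, discrete counts as integers and continuous measurements as doubles.

```r
# Ordinal categorical data: the levels have a natural order (hypothetical values)
soil_quality <- factor(c("low", "high", "medium", "low"),
                       levels = c("low", "medium", "high"),
                       ordered = TRUE)

# Non-ordinal (nominal) categorical data: no order among the levels
flower_colour <- factor(c("red", "white", "red", "yellow"))

# Discrete numerical data: whole-number counts, stored as integers
student_count <- c(6L, 7L, 9L)

# Continuous numerical data: measurements, stored as doubles
plant_height <- c(10.2, 10.5, 11.5)

is.ordered(soil_quality)   # TRUE
is.ordered(flower_colour)  # FALSE
soil_quality < "high"      # TRUE FALSE TRUE TRUE (order comparisons need an ordered factor)
class(student_count)       # "integer"
class(plant_height)        # "numeric"
```

Comparing an unordered factor with < gives NA and a warning in R, which is one practical reason to declare ordinal data as an ordered factor.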

For example, let's say we measure the heights of plants in an experimental field and record the following values:

P <- c(10.2, 10.5, 11.5, 12.6, 14.3, 12.8, 9.4, 13.2, 15.6, 13.9, 14.8, 16.2, 12.5)
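It can help to ask R how it stores P. A quick sketch with base R's class(), length(), is.numeric() and str():

```r
# Plant heights from the experimental field (same values as above)
P <- c(10.2, 10.5, 11.5, 12.6, 14.3, 12.8, 9.4, 13.2, 15.6, 13.9, 14.8, 16.2, 12.5)

class(P)       # "numeric": P is a numeric vector (not a data frame)
length(P)      # 13 measurements
is.numeric(P)  # TRUE
str(P)         # compact one-line overview: type, length and the first few values
```

R only reports the storage type; whether the values are discrete or continuous depends on how they were measured.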

So, if I ask: what is the data type?

It's obviously numerical, but is it discrete or continuous?

Discrete data can only take certain values, like the number of students in a class: it may be 6, 7 or 9, but never 6.5.

Continuous data, on the other hand, can take any value within a range. For example, the heights of the students in the class may be 6.6, 6.7, 6.8, 6.9 and so on, or any value in between.

Thus, the plant heights in the vector P above are an example of continuous data.
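A rough way to check this in R is to test whether every value is a whole number. This is only a heuristic sketch: is_whole() is a hypothetical helper written for this example, and student_count holds made-up class sizes. Continuous measurements rounded to whole units would also pass, so the final decision still depends on how the data were collected.

```r
P <- c(10.2, 10.5, 11.5, 12.6, 14.3, 12.8, 9.4, 13.2, 15.6, 13.9, 14.8, 16.2, 12.5)
student_count <- c(6, 7, 9)  # hypothetical class sizes

# Hypothetical helper: TRUE if every value equals its rounded value
is_whole <- function(x) all(x == round(x))

is_whole(P)              # FALSE: the heights have decimal parts
is_whole(student_count)  # TRUE: counts are whole numbers
```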

If you want to summarise the data with just two values, you can use the mean and the standard deviation of P.

To calculate the mean:

mean(P)


12.88462

For the standard deviation:

sd(P)


2.088798

So you can now report that the average height of the 13 plant samples (n = 13) was 12.88, with a standard deviation of 2.09.
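To see where these two numbers come from, here is a small sketch that computes them by hand. The mean is sum(x) / n, and sd() returns the sample standard deviation, sqrt(sum((x - mean)^2) / (n - 1)), which divides by n - 1 rather than n. The variable names n, p_mean and p_sd are just illustrative.

```r
P <- c(10.2, 10.5, 11.5, 12.6, 14.3, 12.8, 9.4, 13.2, 15.6, 13.9, 14.8, 16.2, 12.5)

n <- length(P)                               # 13
p_mean <- sum(P) / n                         # 167.5 / 13
p_sd <- sqrt(sum((P - p_mean)^2) / (n - 1))  # sample SD: divide by n - 1

all.equal(p_mean, mean(P))             # TRUE
all.equal(p_sd, sd(P))                 # TRUE
round(c(mean = p_mean, sd = p_sd), 2)  # mean 12.88, sd 2.09
```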

You can also use the summary() function to see the range and quartiles of the data:

summary(P)

which gives the following result:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   9.40   11.50   12.80   12.88   14.30   16.20

So the data range from a minimum of 9.4 to a maximum of 16.2, with a median of 12.80 and a mean of 12.88; the middle half of the values lies between the first and third quartiles, 11.50 and 14.30.
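The five non-mean numbers in the summary() output come from quantile(), which by default uses R's type 7 definition. A short sketch that reproduces them, and the median, directly from the sorted data:

```r
P <- c(10.2, 10.5, 11.5, 12.6, 14.3, 12.8, 9.4, 13.2, 15.6, 13.9, 14.8, 16.2, 12.5)

sorted_P <- sort(P)
sorted_P[7]      # 12.8: with n = 13 the median is the 7th sorted value
quantile(P)      # 0%, 25%, 50%, 75%, 100%: 9.4, 11.5, 12.8, 14.3, 16.2
range(P)         # 9.4 16.2
diff(range(P))   # 6.8, the spread between minimum and maximum
IQR(P)           # 2.8, i.e. 14.3 - 11.5 (third minus first quartile)
```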

You can also visualize the distribution of the data with a histogram.

“A histogram is a representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable (quantitative variable). It differs from a bar graph, in the sense that a bar graph relates two variables, but a histogram relates only one.” (Source: Wikipedia)

To draw a histogram, we can use:

hist(P, col = "grey")

The hist() function draws the histogram, and the col argument fills the bars with grey.
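By default, hist() chooses the bins with Sturges' rule. The sketch below instead sets the bin edges explicitly (bins of width 1 from 9 to 17, an illustrative choice, not part of the original analysis), uses plot = FALSE to inspect the bin counts without drawing, and then adds a title and axis labels (the label text is also only an example).

```r
P <- c(10.2, 10.5, 11.5, 12.6, 14.3, 12.8, 9.4, 13.2, 15.6, 13.9, 14.8, 16.2, 12.5)

# Bin edges of width 1 (illustrative choice)
height_breaks <- seq(9, 17, by = 1)

# plot = FALSE returns the binning without drawing anything
h <- hist(P, breaks = height_breaks, plot = FALSE)
h$counts  # 1 2 1 3 2 2 1 1; bins are right-closed, e.g. (10, 11]

# Draw the histogram with grey bars, a title and axis labels
hist(P, breaks = height_breaks, col = "grey",
     main = "Plant heights", xlab = "Height", ylab = "Frequency")
```

Trying a few different break settings is worthwhile with a small sample like this one (n = 13), because the shape of a histogram can change noticeably with the bin width.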
