Combining Data Frames with the join functions from "dplyr"

In real life, the data you want to work with often comes in separate files, and you need to combine them before you can do your analysis. Merging different data sets can cause a lot of trouble, so let's see how the "dplyr" package can be used to merge data sets the way you want.

Let's create some data sets to try this:

# Data frame A
A <- data.frame(ID = c("A", "B", "C", "D"), Y = 1:4)
View(A)
# Data frame B
B <- data.frame(ID = c("A", "C", "D", "E"), Y = 5:8)
View(B)

Now let's join these two data frames with the full_join function from the "dplyr" package:

# Load the library

library(dplyr)

# Call the new data set "C" and do a full join by "ID"

C <- full_join(A, B, by = "ID")

# The two data sets are now joined by their ID. If the ID columns are factors (the default for data.frame() before R 4.0.0), older versions of dplyr give a warning message like:

Warning message:
Column `ID` joining factors with different levels, coercing to character vector

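# The warning only says that the ID column was converted from factor to character. The NA values have a different cause: a full join keeps every ID from both data sets, and wherever an ID is missing from one of them (B is not in data frame B, E is not in data frame A), the corresponding value is filled with NA. The new data set will look like: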

  ID Y.x Y.y
1  A   1   5
2  B   2  NA
3  C   3   6
4  D   4   7
5  E  NA   8
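
As a minimal sketch, assuming you would rather keep ID as a character column from the start (so there are no factors to coerce) and give the two Y columns more descriptive names, the same join can be written with stringsAsFactors = FALSE and the suffix argument of full_join(). The name C_named and the suffixes ".A" and ".B" are illustrative choices, not part of the original example:

```r
library(dplyr)

# Same example data, with ID kept as character rather than factor
A <- data.frame(ID = c("A", "B", "C", "D"), Y = 1:4, stringsAsFactors = FALSE)
B <- data.frame(ID = c("A", "C", "D", "E"), Y = 5:8, stringsAsFactors = FALSE)

# suffix controls the names given to the clashing Y columns
# (".A" and ".B" are illustrative; the default is ".x" and ".y")
C_named <- full_join(A, B, by = "ID", suffix = c(".A", ".B"))
print(C_named)
```

The result has the same five rows as before, but the value columns are called Y.A and Y.B, which makes it easier to see which data frame each value came from.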

# Now let's say you want to replace all the NAs in the new data set "C" with zero.

For this:

C[is.na(C)] <- 0
View(C)

This assignment will replace all the NAs with zero.
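
Note that the assignment above touches every column of the data frame. If you only want to fill one column, one possible approach is coalesce() from "dplyr", which takes the first non-missing value for each element. The following is only a sketch under that assumption; the name C_filled is hypothetical and the data are rebuilt so the snippet runs on its own:

```r
library(dplyr)

# Rebuild the example data and the joined data frame
A <- data.frame(ID = c("A", "B", "C", "D"), Y = 1:4, stringsAsFactors = FALSE)
B <- data.frame(ID = c("A", "C", "D", "E"), Y = 5:8, stringsAsFactors = FALSE)
C <- full_join(A, B, by = "ID")

# Replace NA with zero in Y.x only; Y.y keeps its NA
# (0L is an integer zero, matching the integer type of Y.x)
C_filled <- mutate(C, Y.x = coalesce(Y.x, 0L))
print(C_filled)
```

This keeps the NA in Y.y visible, which can be useful when a missing value and a true zero mean different things in your analysis.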
