Data files are usually structured to minimize storage rather than to be
convenient for exploration, so we often need to restructure them before
exploratory analysis.
Data is stored in one of two layouts:
Wide spread (wide format): data is arranged horizontally, with variables in columns.
Long spread (long format): data is arranged vertically, with variables in rows.
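To make the two layouts concrete, here is a minimal sketch in base R. The cities "A" and "B" and their numbers are made-up placeholders, not real data; the point is only the shape of each table.

```r
# Hypothetical values, for illustration only.

# Wide layout: one row per city, one column per variable.
wide <- data.frame(
  City = c("A", "B"),
  Population = c(10, 20),
  Area = c(1.5, 2.5)
)

# Long layout: one row per (city, variable) pair.
long <- data.frame(
  City = c("A", "B", "A", "B"),
  variable = c("Population", "Population", "Area", "Area"),
  value = c(10, 20, 1.5, 2.5)
)

dim(wide)  # 2 rows, 3 columns
dim(long)  # 4 rows, 3 columns
```

Both tables hold the same four measurements; only the arrangement differs.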
For exploratory analysis, we often need to transpose (flip) data from one
layout to the other. The reshape2 package provides melt and dcast to do this
effectively (dcast is the data-frame version of cast from the older reshape package).
1) melt: wide spread to long spread
Let's say the population and area of each city are stored in
columns, and the requirement is to convert them to rows.
Data creation:
    library(reshape2)
    city = data.frame("City" = c("Delhi", "London", "New York"),
                      "Population.Mn" = c(44.30, 31.36, 40.42),
                      "Area.SqKm" = c(77.13, 90.05, 80.00))
Using melt to move Population and Area into rows:
    # City is not numeric, so melt treats it as the id variable by default
    m = melt(city)
    m
Or
melt provides further options to specify the id columns and to explicitly
name the variable column.
    # reshape2 spells this argument variable.name (variable_name belongs to the older reshape package)
    mt = melt(city, id.vars = c("City"), variable.name = "Attribute")
    mt
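melt also accepts measure.vars and value.name, which are handy when only some columns should be melted. The sketch below is an illustration, not part of the original walkthrough; the names pop and "Value" are arbitrary choices.

```r
library(reshape2)

city <- data.frame(
  City = c("Delhi", "London", "New York"),
  Population.Mn = c(44.30, 31.36, 40.42),
  Area.SqKm = c(77.13, 90.05, 80.00)
)

# Melt only the population column. Columns listed in neither id.vars
# nor measure.vars (here Area.SqKm) are left out of the result.
pop <- melt(city,
            id.vars = "City",
            measure.vars = "Population.Mn",
            variable.name = "Attribute",
            value.name = "Value")

names(pop)
pop
```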
2) dcast: long spread to wide spread
    # dcast needs a formula: id column ~ column holding the variable names
    w = dcast(m, City ~ variable)
    w
Or
we can explicitly define the id and value variables:
    # After melting, all measurements sit in the single column named "value"
    ct = dcast(mt, City ~ Attribute, value.var = "value")
    ct
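As a quick sanity check, melting and then casting back with reshape2's dcast should reproduce the original table. This is a minimal sketch using the same city data; the names long and wide are just illustrative, and the comparison sorts by City because dcast may reorder rows by the id column.

```r
library(reshape2)

city <- data.frame(
  City = c("Delhi", "London", "New York"),
  Population.Mn = c(44.30, 31.36, 40.42),
  Area.SqKm = c(77.13, 90.05, 80.00),
  stringsAsFactors = FALSE
)

# Wide -> long -> wide again.
long <- melt(city, id.vars = "City", variable.name = "Attribute")
wide <- dcast(long, City ~ Attribute, value.var = "value")

# Compare values only, ignoring row names and other attributes.
same <- all.equal(wide, city[order(city$City), ], check.attributes = FALSE)
same
```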
Cheers!!