Friday, September 07, 2018

Google Dataset Search


Google's vision statement is “to provide access to the world's information in one click.”

Google’s mission statement is “to organize the world’s information and make it universally accessible and useful.”

In keeping with that mission and vision, Google has released a search engine just for us :), the data analytics and data science community. It crawls the various dataset hosting sites and provides a single search across them.






Result : 




Very helpful indeed; it makes searching for datasets seamless. Thank you, Google!!

Happy Searching..


Thursday, September 06, 2018

melt : Wide to Long spread || cast : Long to Wide spread (R)


Data files are usually structured to minimize storage rather than to be exploration-friendly, so we often need to restructure them for exploratory analysis.
Data is stored in one of two layouts: Wide Spread (variables in columns) or Long Spread (variables in rows).

Wide Spread: data is arranged horizontally (variables column-wise)

Long Spread: data is arranged vertically (variables row-wise)
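To make the two layouts concrete, here is a small base R sketch that builds the same data both ways by hand. The values are the illustrative ones used later in this post; the object names wide and long are only for this example.

```r
# Wide layout: one row per city, one column per variable.
wide <- data.frame(
  City          = c("Delhi", "London"),
  Population.Mn = c(44.30, 31.36),
  Area.SqKm     = c(77.13, 90.05)
)

# Long layout: one row per (city, variable) pair, with the
# measurement itself held in a single "value" column.
long <- data.frame(
  City     = rep(wide$City, times = 2),
  variable = rep(c("Population.Mn", "Area.SqKm"), each = nrow(wide)),
  value    = c(wide$Population.Mn, wide$Area.SqKm)
)

print(wide)  # 2 rows, 3 columns
print(long)  # 4 rows, 3 columns
```

Both objects hold exactly the same numbers; only the arrangement differs.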





For exploratory analysis, we often need to reshape (or flip) the data from one layout to the other. The reshape package provides “melt” and “cast” to do this effectively (its successor, reshape2, keeps melt but names the cast functions dcast and acast).

1) melt : Wide Spread to Long Spread

Let's say Population and Area for each City are stored in columns, and the requirement is to convert them to rows.

Data Creation:

library(reshape)   # melt and cast as used in this post come from the reshape package

city = data.frame( "City"          = c("Delhi", "London", "New York"),
                   "Population.Mn" = c(44.30, 31.36, 40.42),
                   "Area.SqKm"     = c(77.13, 90.05, 80.00)
                 )



Using melt to move Population and Area from columns into rows:


m = melt(city)   # City is picked up as the id column automatically
m
                


Or, melt provides further options to specify the id columns and to name the variable column explicitly:

mt = melt(city, id.vars = c("City"), variable_name = "Attribute")
mt




2) cast : Long Spread to Wide Spread


cw = cast(m)   # named cw rather than c, so the base function c() is not masked
cw



Or, we can explicitly define the formula (id column ~ variable column) and the value column:

ct = cast(mt, City ~ Attribute, value = "value")   # "value" is the column melt created to hold the measurements
ct
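The calls above use cast and the variable_name argument, which belong to the older reshape package. For readers who load reshape2 instead, the same round trip can be sketched roughly as follows, assuming reshape2 is installed. The names mt2 and ct2 are only for this example.

```r
library(reshape2)  # assumption: reshape2 is installed

city <- data.frame(
  City          = c("Delhi", "London", "New York"),
  Population.Mn = c(44.30, 31.36, 40.42),
  Area.SqKm     = c(77.13, 90.05, 80.00)
)

# Wide to long: reshape2 spells the argument variable.name (with a dot).
mt2 <- melt(city, id.vars = "City", variable.name = "Attribute")

# Long to wide: dcast returns a data frame; value.var names the column
# holding the measurements, which melt calls "value" by default.
ct2 <- dcast(mt2, City ~ Attribute, value.var = "value")

print(mt2)
print(ct2)
```

Note that value.var takes the name of the melted value column, not the names of the original wide columns.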






Cheers!!
