Thursday, April 19, 2012

Getting Historical Weather Data in R and SAP HANA

For many of my latest data blogs, I needed historical weather data to perform data mash-ups to pin-point the cause.  For example, for my continued exploration into the airlines/airports historical data using SAP HANA and R, I wanted to find out whether the weather was behind the extreme delay experienced out of a particular airport for a particular day/hour.  So I needed to mash-up the weather data with the airlines data for this analysis.

I looked around but could not find a better way to get the weather data.  So I turned to R.  Now, to get historical weather data, I am using Weather Underground's REST APIs and I put together a simple program in R to get the weather data in a data.frame.  This R module gets called from SAP HANA and it inserts a new table into HANA with the right weather information.  Once, I have the data in HANA, I performed mash-ups in HANA and off I go on my intellectual pursuit.

Weather Underground returns the data in both XML and JSON file formats.  The program logic is very simple, [once you have spent hours cracking it, the end product looks simple anyways :-)] and there are appropriate comments in the code below for self-learning.

I want to mention that you are not limited to just getting the historical view on weather data.  You can get the weather forecast for next 10 days, perform your analysis and predict future!

Make sure to register with Weather Underground (API documentation link), comply with their rules and get your own key to access their APIs.
############################################################################
getHistoricalWeather <- function(airport.code="SFO", date="Sys.Date()")
{
  base.url <- 'http://api.wunderground.com/api/{your key here}/'
  # compose final url
  final.url <- paste(base.url, 'history_', date, '/q/', airport.code, '.json', sep='')


  # reading in as raw lines from the web service
  conn <- url(final.url)
  raw.data <- readLines(conn, n=-1L, ok=TRUE)
 # Convert to a JSON
  weather.data <- fromJSON(paste(raw.data, collapse=""))
  close(conn)
  return(weather.data)
}



# get data for 10 days - restriction by Weather Underground for free usage
date.range <- seq.Date(from=as.Date('2006-1-01'), to=as.Date('2006-1-10'), by='1 day')


# Initialize a data frame
hdwd <- data.frame()



# loop over dates, and fetch weather data
for(i in seq_along(date.range)) {
    weather.data <- getHistoricalWeather('SFO', format(date.range[i], "%Y%m%d"))                 
      hdwd <- rbind(hdwd, ldply(weather.data$history$dailysummary, 
          function(x) c('SJC', date.range[i], x$fog, x$rain, x$snow,  x$meantempi, x$meanvism, x$maxtempi, x$mintempi)))
}
colnames(hdwd) <- c("Airport", "Date", 'Fog', 'Rain', 'Snow','AvgTemp', 'AvgVisibility','MaxTemp','MinTemp')


# save to CSV
write.csv(hdwd, file=gzfile('SFC-Jan2006.csv.gz'), row.names=FALSE)

############################################################################
Results - 

Airport Date Fog Rain Snow AvgTemp AvgVisibility MaxTemp MinTemp
SFO 13149 0 1 0 55 14 62 47
SFO 13150 0 1 0 53 11 55 50
SFO 13151 0 1 0 51 14 56 46
SFO 13152 0 0 0 56 16 62 50
SFO 13153 0 0 0 54 14 60 48
SFO 13154 0 1 0 52 14 59 45
SFO 13155 0 1 0 56 14 61 50
SFO 13156 0 0 0 51 16 57 45
SFO 13157 0 0 0 49 16 56 41
SFO 13158 0 0 0 54 10 61 46


Happy Analyzing!

19 comments:

  1. Is this a world wide data base?
    How far back in time can you get data?

    ReplyDelete
    Replies
    1. I don't recall Gustaf, please check it out here: http://www.wunderground.com/weather/api/d/documentation.html
      Thanks!

      Delete
  2. Cool thanks, now I just need to dig in to the API and find out what I can get.


    You should add that you need:

    library(RJSONIO)
    library(plyr)

    for the script to work.

    ReplyDelete
  3. R works well for REST calls! For production work, I would add 2 things to your code:
    options(timeout=600) so your code doesn't crash with slow network response times;
    wrap your call to readLines() in try() so you can trap issues such as missing date/airport code combinations (e.g., errors in their backend data service).

    The WeatherUnderground data is simplest for your particular use, as you can query it via airport codes. For other uses and larger hunks and further back in time, the NCDC historic weather products are mostly free for everyone now, and will become free by the end of Sept 2012 (see http://www.ncdc.noaa.gov/oa/mpp/digitalfiles.html). These include global data in ghcn, and hourly data as well as daily data with many more attributes. To grab these data, you need to look up the station index for your site, then build the simple urls, then grab the data with read.fwf() (in some cases looping over year-specific urls).

    A harder case is grabbing the NOAA 6-minute historic tidal height data, as it comes back as a web page embedded in (varying) html, so it requires try(Text<-readlines(Q)) and grep so distinguish data lines. Still, its wonderful that nearly all historic NOAA data is becoming freely available on the web, and R has the tools for easy assimilation.

    ReplyDelete
  4. Solid points tomp. I did look at NCDC's website but since there were no API's offered and I am not a big fan of screen scrapping, I decided to stick with W-Underground. I am pretty sure that I will run into issues with free APIs.

    I put a sleep for 60 seconds before making another batch request and I can only make 500 calls per days. What I don't like about W-Underground's API is that I can only download data for a single day in one API call.

    Thanks for your insights!

    ReplyDelete
  5. This post has a lot of importance to the people…I hope you can continue to inspire and post more of this…Thanks

    ReplyDelete
  6. Data Visualization Software
    SQIAR (www.sqiar.com/solutions/technology/tableau) is a leading Business Intelligence company and provides Tableau Software consultancy across United Kingdom and USA

    ReplyDelete
  7. Hi, sir I have seen your info on sap.. Especially on sap hana. The info you provided is very useful for any interested on sap. I am very much satisfied with your info. I wish I can get furthered info from you.I am very glad to say thanks for this info.
    thank you very much sir.
    sap hana online training

    ReplyDelete
  8. Well appreciated keep updates nice post see sap global consultants.

    ReplyDelete
  9. Excellent piece of information on sap hana admin, I feel it is right place to get info on SAP HANA. This was really something, I had been looking for days. Too excited to view your other blogs on HANA. we provide SAP HANA ONLINE TRAINING.

    ReplyDelete
  10. If you are using SAP HANA in conjunction with other SAP business applications, it is possible to integrate with SAP Solution Manager.

    The SAP HANA ADMIN Configuration Check Tool allows you to check the interoperability of SAP HANA with your existing enterprise storage in production environments.

    SAP HANA ADMINISTRATION ONLINE TRAINING

    ReplyDelete
  11. Your blog was excellent. Posted best topics for the readers help them to get to know things. Also helps the readers to choose the better career. SAP HANA ADMINISTRATION ONLINE TRAINING

    ReplyDelete
  12. This SAP HANA ADMINISTRATION ONLINE TRAINING course not only gives an overview but provides you with a thorough in-depth knowledge about how the SAP HANA works. we also provide online training on SAP HANA ADMINISTRATION ONLINE TRAINING.

    SAP HANA ADMINISTRATION ONLINE TRAINING

    ReplyDelete
  13. Best Post about SAP Courses and the Training. Helps the readers to get better knowledge on Current SAP Technologies.
    SAP HANA ADMINISTRATION ONLINE TRAINING

    ReplyDelete
  14. Thanks so very much for taking your time to create this very useful and informative site. I have learned a lot from your site. Thanks!!

    Hadoop Course in Chennai

    ReplyDelete
  15. there is another way of doing this here

    http://casoilresource.lawr.ucdavis.edu/drupal/node/991

    ReplyDelete
  16. . Your blog was excellent. Posted best topics for the readers help them to get to know things. Also helps the readers to choose the better career reading this. sap bods online training

    ReplyDelete