Thursday, April 19, 2012

Getting Historical Weather Data in R and SAP HANA

For many of my latest data blogs, I needed historical weather data to perform data mash-ups to pin-point the cause.  For example, for my continued exploration into the airlines/airports historical data using SAP HANA and R, I wanted to find out whether the weather was behind the extreme delay experienced out of a particular airport for a particular day/hour.  So I needed to mash-up the weather data with the airlines data for this analysis.

I looked around but could not find a better way to get the weather data.  So I turned to R.  Now, to get historical weather data, I am using Weather Underground's REST APIs and I put together a simple program in R to get the weather data in a data.frame.  This R module gets called from SAP HANA and it inserts a new table into HANA with the right weather information.  Once, I have the data in HANA, I performed mash-ups in HANA and off I go on my intellectual pursuit.

Weather Underground returns the data in both XML and JSON file formats.  The program logic is very simple, [once you have spent hours cracking it, the end product looks simple anyways :-)] and there are appropriate comments in the code below for self-learning.

I want to mention that you are not limited to just getting the historical view on weather data.  You can get the weather forecast for next 10 days, perform your analysis and predict future!

Make sure to register with Weather Underground (API documentation link), comply with their rules and get your own key to access their APIs.
############################################################################
getHistoricalWeather <- function(airport.code="SFO", date="Sys.Date()")
{
  base.url <- 'http://api.wunderground.com/api/{your key here}/'
  # compose final url
  final.url <- paste(base.url, 'history_', date, '/q/', airport.code, '.json', sep='')


  # reading in as raw lines from the web service
  conn <- url(final.url)
  raw.data <- readLines(conn, n=-1L, ok=TRUE)
 # Convert to a JSON
  weather.data <- fromJSON(paste(raw.data, collapse=""))
  close(conn)
  return(weather.data)
}



# get data for 10 days - restriction by Weather Underground for free usage
date.range <- seq.Date(from=as.Date('2006-1-01'), to=as.Date('2006-1-10'), by='1 day')


# Initialize a data frame
hdwd <- data.frame()



# loop over dates, and fetch weather data
for(i in seq_along(date.range)) {
    weather.data <- getHistoricalWeather('SFO', format(date.range[i], "%Y%m%d"))                 
      hdwd <- rbind(hdwd, ldply(weather.data$history$dailysummary, 
          function(x) c('SJC', date.range[i], x$fog, x$rain, x$snow,  x$meantempi, x$meanvism, x$maxtempi, x$mintempi)))
}
colnames(hdwd) <- c("Airport", "Date", 'Fog', 'Rain', 'Snow','AvgTemp', 'AvgVisibility','MaxTemp','MinTemp')


# save to CSV
write.csv(hdwd, file=gzfile('SFC-Jan2006.csv.gz'), row.names=FALSE)

############################################################################
Results - 

Airport Date Fog Rain Snow AvgTemp AvgVisibility MaxTemp MinTemp
SFO 13149 0 1 0 55 14 62 47
SFO 13150 0 1 0 53 11 55 50
SFO 13151 0 1 0 51 14 56 46
SFO 13152 0 0 0 56 16 62 50
SFO 13153 0 0 0 54 14 60 48
SFO 13154 0 1 0 52 14 59 45
SFO 13155 0 1 0 56 14 61 50
SFO 13156 0 0 0 51 16 57 45
SFO 13157 0 0 0 49 16 56 41
SFO 13158 0 0 0 54 10 61 46


Happy Analyzing!

10 comments:

  1. Is this a world wide data base?
    How far back in time can you get data?

    ReplyDelete
    Replies
    1. I don't recall Gustaf, please check it out here: http://www.wunderground.com/weather/api/d/documentation.html
      Thanks!

      Delete
  2. Cool thanks, now I just need to dig in to the API and find out what I can get.


    You should add that you need:

    library(RJSONIO)
    library(plyr)

    for the script to work.

    ReplyDelete
  3. R works well for REST calls! For production work, I would add 2 things to your code:
    options(timeout=600) so your code doesn't crash with slow network response times;
    wrap your call to readLines() in try() so you can trap issues such as missing date/airport code combinations (e.g., errors in their backend data service).

    The WeatherUnderground data is simplest for your particular use, as you can query it via airport codes. For other uses and larger hunks and further back in time, the NCDC historic weather products are mostly free for everyone now, and will become free by the end of Sept 2012 (see http://www.ncdc.noaa.gov/oa/mpp/digitalfiles.html). These include global data in ghcn, and hourly data as well as daily data with many more attributes. To grab these data, you need to look up the station index for your site, then build the simple urls, then grab the data with read.fwf() (in some cases looping over year-specific urls).

    A harder case is grabbing the NOAA 6-minute historic tidal height data, as it comes back as a web page embedded in (varying) html, so it requires try(Text<-readlines(Q)) and grep so distinguish data lines. Still, its wonderful that nearly all historic NOAA data is becoming freely available on the web, and R has the tools for easy assimilation.

    ReplyDelete
  4. Solid points tomp. I did look at NCDC's website but since there were no API's offered and I am not a big fan of screen scrapping, I decided to stick with W-Underground. I am pretty sure that I will run into issues with free APIs.

    I put a sleep for 60 seconds before making another batch request and I can only make 500 calls per days. What I don't like about W-Underground's API is that I can only download data for a single day in one API call.

    Thanks for your insights!

    ReplyDelete
  5. This post has a lot of importance to the people…I hope you can continue to inspire and post more of this…Thanks

    ReplyDelete
  6. Data Visualization Software
    SQIAR (www.sqiar.com/solutions/technology/tableau) is a leading Business Intelligence company and provides Tableau Software consultancy across United Kingdom and USA

    ReplyDelete
  7. Hi, sir I have seen your info on sap.. Especially on sap hana. The info you provided is very useful for any interested on sap. I am very much satisfied with your info. I wish I can get furthered info from you.I am very glad to say thanks for this info.
    thank you very much sir.
    sap hana online training

    ReplyDelete
  8. Well appreciated keep updates nice post see sap global consultants.

    ReplyDelete