Tuesday, November 19, 2013

R and Solr Integration Using Solr's REST APIs


Solr is one of the most popular, fast and reliable open source enterprise search platforms, and it comes from the Apache Lucene project. Among many other features, we love its powerful full-text search, hit highlighting, faceted search, and near real-time indexing. Solr powers the search and navigation features of many of the world's largest internet sites. Solr is written in Java and uses the Lucene Java search library for full-text indexing and search. It also has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language, including R.
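As a quick illustration of that HTTP API, a query is just a GET request against a core's /select handler, with q holding the query and wt=json asking for a JSON response. The sketch below only builds such a URL in base R; the server address and the core name collection1 are assumptions, so adjust them to your installation.

```r
# Assumed address of a local Solr core called "collection1" (adjust as needed).
solr_base <- "http://localhost:8983/solr/collection1"

# Build a /select URL: q is the (URL-encoded) query, wt=json requests a JSON
# response, and rows caps the number of documents returned.
buildSelectURL <- function(base, q, rows = 10) {
  paste0(base, "/select?q=", URLencode(q, reserved = TRUE),
         "&wt=json&rows=", rows)
}

url <- buildSelectURL(solr_base, "type:sports AND category:basketball")
print(url)
```

Fetching a URL like this (for example with RCurl's getURL(), as the script does) returns JSON whose response element holds a docs array with the matching documents.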

We invested a significant amount of time in integrating our R-based data-management platform with Solr through its HTTP/JSON-based REST interface. This integration allowed us to index millions of data sets in Solr in real time as they were processed by R. It took us a few days to stabilize and optimize this approach, and we are very proud to share it, along with the source code, with you. The full source code can be downloaded from datadolph.in's git repository.

The script has R functions for:
  • querying Solr and returning matching docs
  • posting a document to Solr (taking a list and converting it to JSON before posting it)
  • deleting all indexes, the indexes for a certain document type, or the indexes for a certain category within a document type
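The functions in the script build their requests from three URL helpers, getQueryURL(), getUpdateURL() and getCommitURL(), which are not shown in this excerpt. A minimal sketch of what they might look like, assuming a local server and a core named collection1 (both hypothetical), is:

```r
# Hypothetical Solr core address; point this at your own server and core.
solr_core_url <- "http://localhost:8983/solr/collection1"

# querySolr() appends "<field>:<text>" to this string, so it must end in "q=".
getQueryURL <- function() paste0(solr_core_url, "/select?wt=json&q=")

# Update handler used by postDoc() and the delete functions; wt=json makes
# Solr answer in JSON so the response can be parsed with fromJSON().
getUpdateURL <- function() paste0(solr_core_url, "/update?wt=json")

# Plain GET request that asks Solr to commit pending changes.
getCommitURL <- function() paste0(solr_core_url, "/update?commit=true&wt=json")
```

With these definitions, querySolr("basketball", "category") would request .../select?wt=json&q=category:basketball.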
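      # Packages: RCurl provides getURL(), postForm() and curlEscape(); fromJSON()
      # and toJSON() come from a JSON package (rjson is assumed here).
      library(RCurl)
      library(rjson)

      # query a field for the text and return the matching docs
      querySolr <- function(queryText, queryfield="all") {
        # escape the query text so spaces and UTF-8 characters are valid in the URL
        response <- fromJSON(getURL(paste(getQueryURL(), queryfield, ":", curlEscape(queryText), sep="")))
        if(!response$responseHeader$status) # a status of 0 means success
          return(response$response$docs)
      }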

      # delete all indexes from the Solr server
      deleteAllIndexes <-function() {
        response <- postForm(getUpdateURL(),
                             .opts = list(postfields = '{"delete": {"query":"*:*"}}',
                                          httpheader = c('Content-Type' = 'application/json', 
                                                         Accept = 'application/json'),
                                          ssl.verifypeer=FALSE
                             )
        ) #end of PostForm
        return(fromJSON(response)$responseHeader[1])
      }

      # delete all indexes for a document type from the Solr server
      # in this example: type = sports
      deleteSportsIndexes <-function() {
        response <- postForm(getUpdateURL(),
                             .opts = list(postfields = '{"delete": {"query":"type:sports"}}',
                                          httpheader = c('Content-Type' = 'application/json', 
                                                         Accept = 'application/json'),
                                          ssl.verifypeer=FALSE
                             )
        ) #end of PostForm
        return(fromJSON(response)$responseHeader[1])
      }

      # delete the indexes for one category within the sports type from the Solr server
      # in this example: type = sports and category = basketball
      deleteSportsIndexesForCat <-function(category) {
        response <- postForm(getUpdateURL(),
                             .opts = list(postfields = 
                               paste('{"delete": {"query":"type:sports AND category:', category, '"}}', sep=""),
                                          httpheader = c('Content-Type' = 'application/json', 
                                                         Accept = 'application/json'),
                                          ssl.verifypeer=FALSE
                             )
        ) #end of PostForm
        return(fromJSON(response)$responseHeader[1])
      }
      # example: deleteSportsIndexesForCat("basketball")

      #Post a new document to Solr
      postDoc <- function(doc) { 
        solr_update_url <- getUpdateURL()
        jsonst <- toJSON(list(doc))
        response <- postForm(solr_update_url,
                             .opts = list(postfields = jsonst,
                                          httpheader = c('Content-Type' = 'application/json', 
                                                         Accept = 'application/json'),
                                          ssl.verifypeer=FALSE
                             )) #end of PostForm
        return(fromJSON(response)$responseHeader[1])
        ########## Explicit commit: use in place of the return above only if posted docs don't become searchable without it ##########
        #return(fromJSON(getURL(getCommitURL())))
      }
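The three delete functions differ only in the query they send. One way to fold them together is a single delete-by-query helper. In the sketch below, buildDeleteBody() and deleteByQuery() are hypothetical names, the JSON body is built with base R (escaping quotes and backslashes in the query), and the rjson package is an assumption about which fromJSON() the script uses.

```r
# Build the JSON body for a Solr delete-by-query request. Double quotes and
# backslashes in the query are escaped so the body stays valid JSON.
buildDeleteBody <- function(query) {
  escaped <- gsub('(["\\\\])', "\\\\\\1", query, perl = TRUE)
  sprintf('{"delete": {"query":"%s"}}', escaped)
}

# Hypothetical generic replacement for the three delete functions above.
# Calling it needs the RCurl and rjson packages, a reachable Solr server,
# and a getUpdateURL() helper like the one the original script relies on.
deleteByQuery <- function(query) {
  response <- RCurl::postForm(getUpdateURL(),
                              .opts = list(postfields = buildDeleteBody(query),
                                           httpheader = c('Content-Type' = 'application/json',
                                                          Accept = 'application/json'),
                                           ssl.verifypeer = FALSE))
  return(rjson::fromJSON(response)$responseHeader[1])
}

buildDeleteBody("type:sports AND category:basketball")
```

For example, deleteByQuery("*:*") sends the same body as deleteAllIndexes(), and deleteByQuery("type:sports AND category:basketball") sends the same body as deleteSportsIndexesForCat("basketball").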

Happy Coding!

16 comments:

  1. Excellent post, this has been extremely useful to me. I work with a lot of Russian-language texts, and to make this work with UTF-8 characters you will want this as the first line in querySolr():
    response <- fromJSON(getURL(paste(getQueryURL(), queryfield, ":", curlEscape(queryText), sep="")))

    Just thought it might save you or someone else a headache!
    R

  2. Thanks, this is a good insight, very useful! We have faced this issue too in other places.
