Groovy's super-convenient HTTPBuilder makes web scraping a breeze. Surprisingly, it doesn't seem to have a built-in DSL for adding pre-configured cookies to your client's session. That's okay, because they're pretty easy to add once you plow through the documentation: HTTPBuilder wraps an Apache HttpClient, and you can put cookies straight into that client's cookie store. Here's a quick example:
#!/usr/bin/groovy
import groovyx.net.http.HTTPBuilder
import org.apache.http.impl.cookie.BasicClientCookie

@Grapes([
    @Grab(group='org.codehaus.groovy.modules.http-builder',
        module='http-builder', version='0.5.1')
])
class MyScraper {
    static http = new HTTPBuilder('http://www.example.com/')

    static void main(String[] args) {
        try {
            run args
        } catch (e) {
            e.printStackTrace()
        }
    }

    static run(args) {
        // add some cookies
        addCookie domain: 'www.example.com', path: '/',
            name: 'mycookie', value: 'myvalue'
        addCookie domain: 'www.example.com', path: '/',
            name: 'anothercookie', value: 'anothervalue'

        // make some requests using the cookies
        def html = http.get(path: '/search', query: [q:'groovy'])
        // ...
    }

    // adds a cookie to the http client
    // with the specified name and value, and optional domain and path
    static addCookie(m) {
        // create the basic cookie object
        def cookie = new BasicClientCookie(m.name, m.value)
        // add optional cookie properties
        m.findAll { k,v -> !(k in ['name', 'value']) }.
            each { k,v -> cookie[k] = v }
        // add the new cookie to the client's cookie-store
        http.client.cookieStore.addCookie cookie
    }
}
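
If you want to check that the cookies really landed in the client's cookie store before firing off requests, you can list the store's contents. Here's a minimal sketch of that, written as a plain script rather than a class and assuming the same http-builder 0.5.1 dependency as above. The `session` cookie and its `secure` flag are made up for illustration; `secure` is passed through the same way as `domain` and `path`, so it ends up calling `setSecure` on the cookie.

```groovy
@Grab(group='org.codehaus.groovy.modules.http-builder',
    module='http-builder', version='0.5.1')
import groovyx.net.http.HTTPBuilder
import org.apache.http.impl.cookie.BasicClientCookie

def http = new HTTPBuilder('http://www.example.com/')

// same idea as addCookie above: name and value go to the constructor,
// every other map entry is set as a property on the cookie
def addCookie = { Map m ->
    def cookie = new BasicClientCookie(m.name, m.value)
    m.findAll { k, v -> !(k in ['name', 'value']) }.each { k, v -> cookie[k] = v }
    http.client.cookieStore.addCookie cookie
}

addCookie domain: 'www.example.com', path: '/',
    name: 'mycookie', value: 'myvalue'
// hypothetical second cookie, only here to show an extra property
addCookie domain: 'www.example.com', path: '/',
    name: 'session', value: 'abc', secure: true

// print whatever the cookie store currently holds (no request is made)
http.client.cookieStore.cookies.each { c ->
    println "${c.name}=${c.value} (domain=${c.domain}, path=${c.path}, secure=${c.secure})"
}
```

Nothing here touches the network apart from Grape fetching the dependency, so it's a cheap way to confirm that the domain and path are what you expect. A cookie whose domain or path doesn't match the request URL won't be sent.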