You work at a company that does data integration for (yeah, you guessed it) one of those cryptic-JSON-producing info farms. And you want to pull a human-readable JSON document off the web in an automated fashion... just to scan through the data, you know, to feel things out; predict how much Xenadrine you're gonna have to purchase this week... etc.
Anyways, being the diligent engineer that you are, you decide to do it in Clojure, you know, as an excuse to learn Clojure.
So... here's a little walkthrough of how your clojure-compare tool would work.
First, you need to set up a project:
> lein new clojure-compare
Now, you need to be able to make HTTP requests. Edit your project.clj file to look like this:
(defproject clojure-compare "1.0.0-SNAPSHOT"
  :description "FIXME: write description"
  :dependencies [[org.clojure/clojure "1.2.1"]
                 [commons-lang "2.3"]
                 [ring/ring-jetty-adapter "0.3.9"]
                 [org.clojure/clojure-contrib "1.2.0"]
                 [clj-http "0.1.3"]])
Note the line at the bottom (clj-http is a nice little Clojure library for getting data via HTTP).
Of course, we have to run > lein deps from the top level to grab these jars; then we're home free on the libraries and can start building the app.
So now you go into the core.clj file created by lein (under src/ in the clojure-compare folder) and start editing. Here goes:
A Sort-of Beginner's Tutorial on Making a Simple Leiningen Project That Makes REST Requests for JSON Using the clj-http & clojure.contrib.json Libraries
1) Open a terminal with vi. Call this terminal T1.
2) Open a second terminal in the clojure-compare top-level directory, and run "lein repl". Call this one T2.
3) Type (use 'clojure-compare.core :reload) into the REPL terminal. Now your source is loaded.
4) Change something in the source from T1, and save (without exiting).
5) Rerun the command from (3). Welcome to the wonderful world of the REPL.
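For instance (a throwaway example; the hello function here is purely for illustration), add something like this to core.clj in T1 and save:
;; any trivial change will do -- e.g. a toy function
(defn hello [someone]
  (str "Hello, " someone "!"))
Then, back in T2:
user=> (use 'clojure-compare.core :reload)
user=> (hello "clojure")
"Hello, clojure!"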
6) Add the line (require '[clj-http.client :as client]) below the (ns ...) declaration.
7) Save.
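At this point (ignoring whatever throwaway edits you made in steps 4 and 5), core.clj should look something like this; a sketch, since the exact ns form lein generated for you may differ slightly:
(ns clojure-compare.core)

(require '[clj-http.client :as client])
(You could also put the :require inside the ns form itself, but a bare require works fine for this walkthrough.)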
8) Run the command from (3) in T2 again. Welcome back to the wonderful world of the REPL. It should FAIL!
9) Open a third terminal, go to clojure-compare's top-level directory, and type "lein deps". This will go get clj-http.
10) Go back to T2, and redo step 8.
11) Now let's get some data...
;; Add this to the file in T1, and save.
;; Simple wrapper function: goes to a website and gets the contents.
(defn getUrl [url]
  (client/get url {:accept :json}))
12) Now, in T2, you guessed it... run (8), and then type >(getUrl "http://www.yahoo.com")
13) Wait a while. Yahoo is big. Watch the terminal dump a bunch of garbage out...
14) Yay, you did it. Now let's do something with JSON... replace the Yahoo URL with the Twitter JSON sample, "http://www.twitter.com/help/test.json", as the arg.
>(getUrl "http://www.twitter.com/help/test.json")
{:status 200, :headers {"last-modified" "Sat, 27 Aug 2011 01:48:28 GMT", "x-xss-protection" "1; mode=block", "server" "hi", "x-runtime" "0.00393", "x-transaction" "1314409708-72602-22772", "x-frame-options" "SAMEORIGIN", "content-type" "application/json; charset=utf-8", "date" "Sat, 27 Aug 2011 01:48:28 GMT", "set-cookie" "_twitter_sess=BAh7CDoPY3JlYXRlZF9hdGwrCMN77AgyAToHaWQiJWU2YTI5YzNlNGQ2YTU5%250AOTMwNmNhNGE0ZjE5NmMxYjE4IgpmbGFzaElDOidBY3Rpb25Db250cm9sbGVy%250AOjpGbGFzaDo6Rmxhc2hIYXNoewAGOgpAdXNlZHsA--f45753008a9ef9565bcea3f434e28462f5451467; domain=.twitter.com; path=/; HttpOnly", "x-transaction-mask" "a6183ffa5f8ca943ff1b53b5644ef11469a9536c", "x-revision" "DEV", "cache-control" "no-cache, no-store, must-revalidate, pre-check=0, post-check=0", "vary" "Accept-Encoding", "status" "200 OK", "x-mid" "e66279436ca213b477524b1cc23618979ada0bd1", "x-content-type-options" "nosniff", "expires" "Tue, 31 Mar 1981 05:00:00 GMT", "etag" "\"72054d9a6fbdcc7df012e19f32345b65\"", "content-length" "4", "pragma" "no-cache", "connection" "close"}, :body "\"ok\""}
15) Now, http://clojuredocs.org/clojure_contrib/clojure.contrib.json/read-json has some instructions on how to use the Clojure JSON APIs. Try running a simple (read-json ...) before proceeding... it's easy enough.
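If you want a quick sanity check at the REPL anyway, something like this should do it (a minimal sketch, assuming clojure.contrib.json is on the classpath via the clojure-contrib dependency above):
user=> (use 'clojure.contrib.json)
user=> (read-json "{\"answer\": 42}")
{:answer 42}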
16) So... now that we know we can, let's get a non-trivial JSON document, like the PeerIndex Facebook profile JSON:
"https://graph.facebook.com/peerindex"
17) You'll notice that it's pretty big. Now, let's bring it in using our get method, and look a little closer at what happens.
(require '[clj-http.client :as client])
(use 'clojure.contrib.json)
(client/get "https://graph.facebook.com/peerindex")
Now, if you look closely at the results, you'll notice something --> (...it's a map...)
18) So what? Well, remember that the WHOLE HTTP response isn't JSON --- just the body is. The other stuff (i.e., the headers, etc.) isn't really that important; we just want the data. So how do we "get" the JSON out of our Facebook request?
----- ONE OF THE POWER-FEATURES OF CLOJURE: keywords (your map keys) ARE functions -------------
19) Take a look at the map returned by our get method. It has several keys: :status, :headers, and of course :body (the one we want)...
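Each of those keys is a keyword, and a keyword can be called as a function on a map to look itself up. A toy example:
user=> (:body {:status 200, :headers {}, :body "\"ok\""})
"\"ok\""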
Now, run the following ---> (read-json (:body (client/get "https://graph.facebook.com/peerindex")))
And you will see something magical happen --- your JSON is now human readable... How did this happen?
Well, the get returns a map, the :body portion is extracted using the "keyword as function" Clojure idiom, and finally read-json converts the JSON text into a Clojure map, which you can see in the final output.
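If the nesting reads inside-out to you, the same pipeline can also be written step by step with let, or left to right with the -> threading macro; a quick sketch:
;; step by step
(let [response (client/get "https://graph.facebook.com/peerindex")
      body     (:body response)]  ; pull the JSON text out of the response map
  (read-json body))               ; parse it into a Clojure map

;; or threaded, reading left to right
(-> (client/get "https://graph.facebook.com/peerindex")
    :body
    read-json)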
Conclusions
So: you can get a URL, extract its body, and parse that body into a map, with just three intuitive nested function calls in Clojure.
(BEFORE)
user=> (client/get "https://graph.facebook.com/peerindex")
{:status 200, :headers {"x-fb-rev" "430801", "p3p" "CP=\"Facebook does not have a P3P policy. Learn why here: http://fb.me/p3p\"", "content-type" "text/javascript; charset=UTF-8", "date" "Sat, 27 Aug 2011 02:44:25 GMT", "set-cookie" "datr=CVpYTmpWyfKwq0WDBvwFQLd5; expires=Mon, 26-Aug-2013 02:44:25 GMT; path=/; domain=.facebook.com; httponly", "cache-control" "private, no-cache, no-store, must-revalidate", "expires" "Sat, 01 Jan 2000 00:00:00 GMT", "etag" "\"3e0f944feedbd6ab6996568501d26565229e8ad8\"", "content-length" "1570", "pragma" "no-cache", "connection" "close", "x-fb-server" "10.62.5.40"}, :body "{\"id\":\"113752328670957\",\"name\":\"PeerIndex\",\"picture\":\"http:\\/\\/profile.ak.fbcdn.net\\/hprofile-ak-snc4\\/41591_113752328670957_6056_s.jpg\",\"link\":\"http:\\/\\/www.facebook.com:443\\/PeerIndex\",\"likes\":1128,\"category\":\"Product\\/service\",\"website\":\"http:\\/\\/www.peerindex.net\\/\\nhttp:\\/\\/blog.peerindex.net\\/\\nhttp:\\/\\/twitter.com\\/peerindex\\nhttp:\\/\\/www.facebook.com\\/PeerIndex\",\"username\":\"PeerIndex\",\"founded\":\"2009\",\"company_overview\":\"PeerIndex is a web technology company that is algorithmically mapping out the social web. PeerIndex wants to become the standard that identifies, ranks, and scores authorities on the web, and help them benefit from the social capital people have built up.\",\"mission\":\"We help you understand your social capital on the web. \",\"products\":\"PeerIndex\\nViewsflow\",\"parking\":{\"street\":0,\"lot\":0,\"valet\":0},\"hours\":{\"mon_1_open\":0,\"mon_1_close\":0,\"tue_1_open\":0,\"tue_1_close\":0,\"wed_1_open\":0,\"wed_1_close\":0,\"thu_1_open\":0,\"thu_1_close\":0,\"fri_1_open\":0,\"fri_1_close\":0,\"sat_1_open\":0,\"sat_1_close\":0,\"sun_1_open\":0,\"sun_1_close\":0,\"mon_2_open\":0,\"mon_2_close\":0,\"tue_2_open\":0,\"tue_2_close\":0,\"wed_2_open\":0,\"wed_2_close\":0,\"thu_2_open\":0,\"thu_2_close\":0,\"fri_2_open\":0,\"fri_2_close\":0,\"sat_2_open\":0,\"sat_2_close\":0,\"sun_2_open\":0,\"sun_2_close\":0},\"payment_options\":{\"cash_only\":0,\"visa\":0,\"amex\":0,\"mastercard\":0,\"discover\":0},\"restaurant_services\":{\"reserve\":0,\"walkins\":0,\"groups\":0,\"kids\":0,\"takeout\":0,\"delivery\":0,\"catering\":0,\"waiter\":0,\"outdoor\":0},\"restaurant_specialties\":{\"breakfast\":0,\"lunch\":0,\"dinner\":0,\"coffee\":0,\"drinks\":0}}"}
(AFTER)
user=> (read-json (:body (client/get "https://graph.facebook.com/peerindex")))
{:company_overview "PeerIndex is a web technology company that is algorithmically mapping out the social web. PeerIndex wants to become the standard that identifies, ranks, and scores authorities on the web, and help them benefit from the social capital people have built up.", :link "http://www.facebook.com:443/PeerIndex", :mission "We help you understand your social capital on the web. ", :name "PeerIndex", :hours {:thu_1_close 0, :mon_1_open 0, :thu_1_open 0, :wed_2_open 0, :wed_2_close 0, :tue_1_close 0, :sat_1_open 0, :sat_1_close 0, :fri_1_open 0, :mon_2_close 0, :sun_2_open 0, :mon_1_close 0, :fri_2_close 0, :thu_2_open 0, :sat_2_open 0, :tue_1_open 0, :wed_1_close 0, :sun_1_open 0, :thu_2_close 0, :sun_2_close 0, :tue_2_open 0, :sat_2_close 0, :wed_1_open 0, :sun_1_close 0, :mon_2_open 0, :fri_1_close 0, :tue_2_close 0, :fri_2_open 0}, :likes 1128, :products "PeerIndex\nViewsflow", :payment_options {:cash_only 0, :visa 0, :amex 0, :mastercard 0, :discover 0}, :username "PeerIndex", :parking {:street 0, :lot 0, :valet 0}, :founded "2009", :restaurant_services {:catering 0, :waiter 0, :reserve 0, :takeout 0, :kids 0, :groups 0, :walkins 0, :delivery 0, :outdoor 0}, :id "113752328670957", :website "http://www.peerindex.net/\nhttp://blog.peerindex.net/\nhttp://twitter.com/peerindex\nhttp://www.facebook.com/PeerIndex", :restaurant_specialties {:breakfast 0, :lunch 0, :dinner 0, :coffee 0, :drinks 0}, :picture "http://profile.ak.fbcdn.net/hprofile-ak-snc4/41591_113752328670957_6056_s.jpg", :category "Product/service"}
--------- Now what -------------
1) To start, you can get the keys:
The results:
user=>
(keys (read-json (:body (client/get "https://graph.facebook.com/peerindex"))) )
(:company_overview :link :mission :name :hours :likes :products :payment_options :username :parking :founded :restaurant_services :id :website :restaurant_specialties :picture :category)
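And since each key is a keyword, you can grab any single field directly, e.g.:
user=> (:mission (read-json (:body (client/get "https://graph.facebook.com/peerindex"))))
"We help you understand your social capital on the web. "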
2) Okay, but that's not enough to get a feel for what's hidden in the mystery JSONs.
Once you see the keys, you might assume that several are just fluff, and that the "main" content is hidden in one or two of the entries of the map; for example, the "company overview" or the "restaurant services"... To check this, you can use the fmap function (it lives in clojure.contrib.generic.functor, so run (use 'clojure.contrib.generic.functor) first), which applies a function to each value in a map ---
The results:
user=>
(fmap #(count(str %)) ( read-json (:body (client/get "https://graph.facebook.com/peerindex")) ) )
{:company_overview 255, :link 37, :mission 55, :name 9, :hours 434, :likes 4, :products 19, :payment_options 60, :username 9, :parking 29, :founded 4, :restaurant_services 105, :id 15, :website 115, :restaurant_specialties 57, :picture 77, :category 15}
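By the way, if you'd rather not reach into contrib for fmap, a plain-Clojure sketch of the same idea just rebuilds the map by hand:
(into {}
      (for [[k v] (read-json (:body (client/get "https://graph.facebook.com/peerindex")))]
        [k (count (str v))]))  ; length of each value's printed form, as above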