This is cobbled together from a variety of easily googled posts on how to install IPython, how to install Spark, and how to run IPython on Spark. In particular, the main parts of this post come from http://nbviewer.ipython.org/gist/JoshRosen/6856670 , and I've just added a few screenshots to make it easier for people who aren't yet familiar with IPython and who want to decouple the directions from EC2.
First, update your IPython startup file.
Then create the 00-pyspark.py file. This sets up the base Spark variables and imports the libraries. I think the imports didn't work for me, so I went ahead and redid them in my notebook (below).
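For anyone without the screenshot handy, here is a minimal sketch of what a 00-pyspark.py startup file might look like. It is not the exact file from the linked post. It assumes a SPARK_HOME environment variable pointing at your Spark install, and it assumes py4j ships as a versioned zip under python/lib (the version differs between Spark releases, so it is matched with a glob). The helper names spark_python_paths and add_spark_to_path are made up for illustration.

```python
# 00-pyspark.py: a sketch of an IPython startup file for PySpark.
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the paths that need to be on sys.path to import pyspark."""
    paths = [os.path.join(spark_home, "python")]
    # Assumption: py4j lives under python/lib as py4j-<version>-src.zip.
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    paths.extend(sorted(glob.glob(pattern)))
    return paths


def add_spark_to_path(spark_home):
    """Prepend the PySpark paths to sys.path, skipping any already present."""
    added = []
    for path in spark_python_paths(spark_home):
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added


# Only act when SPARK_HOME is set, so the file is harmless otherwise.
spark_home = os.environ.get("SPARK_HOME")
if spark_home:
    add_spark_to_path(spark_home)
```

IPython runs every .py file in the profile's startup directory in filename order, which is why the file name starts with 00.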
Now create a Spark context and you're on your way.
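As a rough sketch of that step, something like the following could go in a notebook cell. The master URL "local[2]", the app name "notebook-demo", and the function names are placeholders of mine, not values from the original screenshots. The pyspark import is repeated here in case the startup-file imports did not take effect.

```python
def sum_of_squares(sc, n):
    """Distribute 0..n-1 as an RDD and sum the squares."""
    squares = sc.parallelize(range(n)).map(lambda x: x * x)
    return squares.reduce(lambda a, b: a + b)


def main():
    # Requires pyspark to be importable (see the startup file).
    from pyspark import SparkContext

    # Placeholder master URL and app name; point the master at your cluster.
    sc = SparkContext("local[2]", "notebook-demo")
    try:
        print(sum_of_squares(sc, 100))
    finally:
        sc.stop()  # release the context so a new one can be created later
```

Calling main() in a cell runs the job. In practice you would probably keep sc around in the notebook rather than stopping it right away.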
And you're off! Note that you can do long-running or simple calculations. IPython will stream long-running output to stdout the same way you would expect if you were running it from the Spark terminal.
And don't forget to enable matplotlib.
FYI, the matplotlib support lets you embed Spark-generated graphs in your notebook.
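Here is a small sketch of how results collected from Spark could end up as an inline graph. It assumes matplotlib is installed and that inline plotting is turned on in the notebook (for example with the %matplotlib inline magic). The function names split_pairs and plot_counts are hypothetical helpers, not part of Spark or IPython.

```python
def split_pairs(pairs):
    """Split [(label, count), ...] into parallel lists of labels and counts."""
    pairs = list(pairs)
    labels = [label for label, _ in pairs]
    counts = [count for _, count in pairs]
    return labels, counts


def plot_counts(pairs):
    """Draw a bar chart of (label, count) pairs and return the figure."""
    # Imported lazily so split_pairs still works without matplotlib.
    import matplotlib.pyplot as plt

    labels, counts = split_pairs(pairs)
    positions = list(range(len(labels)))
    fig, ax = plt.subplots()
    ax.bar(positions, counts)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.set_ylabel("count")
    return fig
```

In a notebook, plot_counts(sorted(rdd.countByValue().items())) would chart how often each value occurs in an RDD named rdd, since countByValue brings the counts back to the driver as a dictionary.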