
Getting started with spark tk

Daniel Smith edited this page Jan 20, 2017 · 1 revision

##Getting Started with Spark-tk

Spark-tk is supported in TAP versions 0.7.3 and 0.7.4.

Spark-tk is an analytics toolkit library that is compatible with Apache Spark. It provides APIs for Python and Scala. This page explains how to use Spark-tk in TAP.

Visit https://github.com/trustedanalytics/spark-tk for additional information.

###Accessing Spark-tk with Jupyter

The easiest way to get started with Spark-tk on TAP is within a Jupyter notebook, as follows:

  1. Create a Jupyter notebook.

  2. Open your Jupyter instance and navigate to /examples/tklibs/sparktk/README.ipynb


  3. The README notebook demonstrates how to create a TkContext for Spark-tk and contains some simple Spark-tk code.

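As a rough illustration of the kind of code the README notebook contains, the sketch below creates a TkContext and builds a small frame from in-memory rows. It assumes `sparktk` is installed (as it is in a TAP Jupyter instance) and that `tc.frame.create` accepts a list of rows plus a schema of `(name, type)` pairs; the sample data, column names, and the `make_sample_frame` helper are hypothetical and not taken from the notebook itself.

```python
# Sketch only: intended for a TAP Jupyter notebook where sparktk is installed.
try:
    import sparktk
except ImportError:  # e.g. when run outside TAP
    sparktk = None

# Hypothetical sample data and column names, for illustration only.
ROWS = [[1, 3.14, "blue"], [7, 1.61, "red"], [4, 2.72, "yellow"]]
SCHEMA = [("id", int), ("value", float), ("color", str)]


def make_sample_frame(tc, rows=ROWS, schema=SCHEMA):
    """Create a small Spark-tk frame from in-memory rows using the given TkContext."""
    return tc.frame.create(rows, schema)


if sparktk is not None:
    tc = sparktk.TkContext()        # entry point to the Spark-tk API
    frame = make_sample_frame(tc)
    print(frame.inspect())          # tabular preview of the first rows
```

Keeping the frame creation in a small function makes it easy to reuse the same TkContext across several cells of a notebook.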

The other example notebooks show how to use Datacatalog, Frame, Latent Dirichlet Allocation, and Logistic Regression with Spark-tk.


More information about Spark is available on the Apache Spark website (https://spark.apache.org).

###Accessing a terminal from Jupyter

  1. From the Jupyter dashboard, select the New button located in the upper right.


  2. Select Terminal from the submenu to open a new terminal within Jupyter.


You can enter CLI commands in the terminal window.

##Troubleshooting Tips

Q: I am using Spark-tk and want to save files/export models to my local file system instead of HDFS. How do I do that?

The SparkContext created by TkContext follows the system's current Spark configuration. If your system defaults to HDFS, but you want to use a local file system instead, include use_local_fs=True when creating your TkContext, as follows:

  import sparktk
  tc = sparktk.TkContext(use_local_fs=True)
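If the same notebook has to run both on a cluster that defaults to HDFS and on a machine where the local file system is preferred, the choice can be made configurable. The following is a minimal sketch, not part of Spark-tk: the environment variable name `SPARKTK_USE_LOCAL_FS` and both helper functions are hypothetical. Only `sparktk.TkContext` and its `use_local_fs` argument come from the text above.

```python
import os


def tk_context_kwargs(environ=os.environ):
    """Return keyword arguments for TkContext based on a (hypothetical) env flag."""
    flag = environ.get("SPARKTK_USE_LOCAL_FS", "").strip().lower()
    # Pass use_local_fs=True only when the flag is explicitly switched on;
    # otherwise pass nothing and keep the system's Spark configuration.
    return {"use_local_fs": True} if flag in ("1", "true", "yes") else {}


def make_context(environ=os.environ):
    """Create a TkContext, opting into the local file system when requested."""
    import sparktk  # imported lazily so this module loads without sparktk
    return sparktk.TkContext(**tk_context_kwargs(environ))
```

In a notebook, calling `tc = make_context()` would then behave like `sparktk.TkContext()` unless the flag is set.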

If you are switching from ATK (Analytics Toolkit) to Spark-tk, additional information is available at: https://github.com/trustedanalytics/platform-wiki-0.7/wiki/Switching-from-Analytics-Toolkit-to-spark-tk-Library.
