
0.7.3 Release Notes

Daniel Smith edited this page Oct 24, 2016 · 4 revisions

#General Remarks

TAP 0.7.3 is a minor release that follows versions 0.7.0, 0.7.1, and 0.7.2.

TAP 0.7.3 adds the spark-tk library to TAP. Previously, TAP analytics were performed using the Analytics Toolkit program, accessed as a REST service. spark-tk adds a new analytics toolkit to the TAP family in the form of libraries.

It is not mandatory to upgrade to 0.7.3 from 0.7.0/0.7.1/0.7.2.

All GitHub repositories included in this TAP release were tagged with a 0.7.3 tag.

#New Features

##spark-tk

spark-tk is an analytics library that makes the capabilities of the Analytics Toolkit available to Spark users. It also helps you ensure that your data is clean and ready for machine learning algorithms. In addition to providing enhanced Spark integration and performance over the Analytics Toolkit, this upgrade includes a Box-Cox transformation and additional time series features, and introduces beta support for DICOM* images. See the list of supported algorithms and operations by group here.
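For readers unfamiliar with the Box-Cox transformation mentioned above, here is a minimal sketch of the standard formula in plain Python. It does not use or reproduce the spark-tk API; the function name `box_cox` and its parameters are assumptions made for illustration only.

```python
import math


def box_cox(values, lam):
    """Apply the standard Box-Cox power transform to positive numbers.

    Illustrative helper only (not the spark-tk API):
    y = (x**lam - 1) / lam   when lam != 0
    y = ln(x)                when lam == 0
    """
    transformed = []
    for x in values:
        if x <= 0:
            # The transform is defined only for strictly positive inputs.
            raise ValueError("Box-Cox requires strictly positive values")
        if lam == 0:
            transformed.append(math.log(x))
        else:
            transformed.append((x ** lam - 1.0) / lam)
    return transformed
```

A value of 1 always maps to 0 regardless of `lam`, which is a quick sanity check when applying the transform to a column of data.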

###Switching from Analytics Toolkit to spark-tk

The Analytics Toolkit is being deprecated in this release; it will not be available in later releases of TAP. You should plan your switch from the Analytics Toolkit to spark-tk now. (The Analytics Toolkit will still be supported by TAP 0.7.x releases.) For information on switching from the Analytics Toolkit to spark-tk, go here.

##spark-tk submit

The Jupyter notebook on TAP provides REST endpoints for uploading spark-tk scripts or applications and submitting them with “spark-submit”. Submitted scripts and applications are scheduled to run in “yarn-client” mode (on YARN) or “local” mode (locally). You can find some examples here. The REST API calls are documented here.

##spark-tk scoring engine

A new scoring engine is included for use with applications using the spark-tk library. This scoring engine provides the same capabilities as the scoring engine used with the Analytics Toolkit. The scoring engine runs on models stored in the MAR (Model ARchive) format, a new format for exchanging model information. The MAR format allows data scientists to export model information, including dependencies, from modeling tools to the Scoring Engine.

MAR format information is located here.

##Including models in TAP

You can include models generated by Spark in the TAP Data Catalog via a sequence of calls in the Jupyter notebook. These calls first export a created model to HDFS in MAR format, then add an entry to the TAP Data Catalog that points to the model on HDFS. You can see these calls in this notebook. In addition, an application called hdfsclient supports basic file operations (ls, rm, mkdir, and mv). The hdfsclient is located here.

#Upgrade Information

You can use this release to install a fresh instance of TAP 0.7.3 or to upgrade an existing instance of TAP 0.7.0/0.7.1/0.7.2.

Note: This release cannot be used with TAP versions earlier than 0.7.0.

##Fresh Installation Information

To install a new TAP 0.7.3 instance, follow these instructions:

##Upgrade Instructions

To apply the TAP 0.7.3 upgrade, follow the instructions here.

#Applications Changed

| Application name | TAP 0.7.2 | TAP 0.7.3 |
|---|---|---|
| atk | 20160810 | 0.7.3 |
| scoring-engine | 20160810 | 0.7.3 |

#Fixed Problems and Issues

| Issue Number | Description |
|---|---|
| TRACS-160 | Nginx server is misconfigured during TAP instance deployment. |

#Known Defects

None.
