__NOTE: This repository is experimental and undergoing frequent changes!__

The Yarn Kernel Provider package provides support necessary for launching Jupyter kernels within YARN clusters. It adheres to the requirements set forth in the [Jupyter Kernel Management](https://github.com/takluyver/jupyter_kernel_mgmt) refactoring for kernel management and discovery. This is accomplished via two classes:

1. [`YarnKernelProvider`](https://github.com/gateway-experiments/yarn_kernel_provider/blob/master/yarn_kernel_provider/provider.py) is invoked by the application to locate and identify specific kernel specifications (kernelspecs) that manage kernel lifecycles within a YARN cluster.
2. [`YarnKernelLifecycleManager`](https://github.com/gateway-experiments/yarn_kernel_provider/blob/master/yarn_kernel_provider/yarn.py) is instantiated by the [`RemoteKernelManager`](https://github.com/gateway-experiments/remote_kernel_provider/blob/master/remote_kernel_provider/manager.py) to perform the kernel lifecycle management. It performs post-launch discovery of the application and handles its termination via the [YARN REST API](https://github.com/toidi/hadoop-yarn-api-python-client).

Installation of yarn_kernel_provider also includes a Jupyter application that can be used to create kernel specifications appropriate for Spark and Dask on YARN.

## Installation
Yarn Kernel Provider is a pip-installable package:
```bash
pip install yarn_kernel_provider
```

## Usage
Because this version of Jupyter kernel management is still in its experimental stages, a [special branch of Notebook](https://github.com/takluyver/notebook/tree/jupyter-kernel-mgmt) is required, which includes the machinery to leverage the new framework. An installable build of this branch is available as an asset on the [interim-dev release](https://github.com/gateway-experiments/remote_kernel_provider/releases/tag/v0.1-interim-dev) of the Remote Kernel Provider, on which Yarn Kernel Provider depends.
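
That build can be installed directly with pip. The exact asset file name varies, so the URL below is a placeholder; substitute the actual wheel listed on the release page:

```bash
# Hypothetical: install the special Notebook build attached to the interim-dev release.
# <notebook-asset> is a placeholder for the actual file name on the release page.
pip install https://github.com/gateway-experiments/remote_kernel_provider/releases/download/v0.1-interim-dev/<notebook-asset>.whl
```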

### YARN Kernel Specifications
The criterion for discovery of a kernel specification by `YarnKernelProvider` is that a `yarnkp_kernel.json` file exists in a sub-directory named `kernels` in the Jupyter path hierarchy.
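
For instance, a per-user installation on Linux might look like the following. The first directory name is the documented default kernel name; the second is illustrative, as produced by the installer described below:

```
~/.local/share/jupyter/kernels/
    yarnkp_spark_python/
        yarnkp_kernel.json
    yarnkp_dask_python/
        yarnkp_kernel.json
```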

Such kernel specifications should be initially created using the included Jupyter application `jupyter-yarn-kernelspec` to ensure the minimally viable requirements exist. This application can be used to create specifications for YARN Spark and Dask. Spark support is available for three languages: Python, Scala, and R, while Dask support is available for Python.

To create kernel specifications for use by `YarnKernelProvider`, use `jupyter yarn-kernelspec install`. Its parameter options, produced using `jupyter yarn-kernelspec install --help`, are shown below. All parameters are optional, with no parameters yielding a Python-based kernelspec for Spark on the local YARN cluster. However, the locations of SPARK_HOME and the Python runtime will likely require changes if the defaults do not match your environment.

```
A Jupyter kernel for talking to Spark/Dask within a YARN cluster

Options
-------

Arguments that take values are actually convenience aliases to full
Configurables, whose aliases are listed on the help line. For more information
on full configurables, see '--help-all'.

--user
    Install to the per-user kernel registry
--sys-prefix
    Install to Python's sys.prefix. Useful in conda/virtual environments.
--dask
    Install kernelspec for Dask YARN.
--debug
    set log level to logging.DEBUG (maximize logging output)
--prefix=<Unicode> (YKP_SpecInstaller.prefix)
    Default: ''
    Specify a prefix to install to, e.g. an env. The kernelspec will be
    installed in PREFIX/share/jupyter/kernels/
--kernel_name=<Unicode> (YKP_SpecInstaller.kernel_name)
    Default: 'yarnkp_spark_python'
    Install the kernel spec into a directory with this name.
--display_name=<Unicode> (YKP_SpecInstaller.display_name)
    Default: 'Spark Python (YARN Cluster)'
    The display name of the kernel - used by user-facing applications.
--yarn_endpoint=<Unicode> (YKP_SpecInstaller.yarn_endpoint)
    Default: None
    The http url specifying the YARN Resource Manager. Note: If this value is
    NOT set, the YARN library will use the files within the local
    HADOOP_CONFIG_DIR to determine the active resource manager.
    (YKP_YARN_ENDPOINT env var)
--alt_yarn_endpoint=<Unicode> (YKP_SpecInstaller.alt_yarn_endpoint)
    Default: None
    The http url specifying the alternate YARN Resource Manager. This value
    should be set when YARN Resource Managers are configured for high
    availability. Note: If both YARN endpoints are NOT set, the YARN library
    will use the files within the local HADOOP_CONFIG_DIR to determine the
    active resource manager. (YKP_ALT_YARN_ENDPOINT env var)
--yarn_endpoint_security_enabled=<Bool> (YKP_SpecInstaller.yarn_endpoint_security_enabled)
    Default: False
    Is YARN Kerberos/SPNEGO Security enabled (True/False).
    (YKP_YARN_ENDPOINT_SECURITY_ENABLED env var)
--language=<Unicode> (YKP_SpecInstaller.language)
    Default: 'Python'
    The language of the underlying kernel. Must be one of 'Python', 'R', or
    'Scala'. Default = 'Python'.
--python_root=<Unicode> (YKP_SpecInstaller.python_root)
    Default: '/opt/conda'
    Specify where the root of the python installation resides (parent dir of
    bin/python).
--spark_home=<Unicode> (YKP_SpecInstaller.spark_home)
    Default: '/usr/hdp/current/spark2-client'
    Specify where the spark files can be found.
--spark_init_mode=<Unicode> (YKP_SpecInstaller.spark_init_mode)
    Default: 'lazy'
    Spark context initialization mode. Must be one of 'lazy', 'eager', or
    'none'. Default = 'lazy'.
--extra_spark_opts=<Unicode> (YKP_SpecInstaller.extra_spark_opts)
    Default: ''
    Specify additional Spark options.
--extra_dask_opts=<Unicode> (YKP_SpecInstaller.extra_dask_opts)
    Default: ''
    Specify additional Dask options.
--log-level=<Enum> (Application.log_level)
    Default: 30
    Choices: (0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL')
    Set the log level by value or name.
--config=<Unicode> (JupyterApp.config_file)
    Default: ''
    Full path of a config file.

To see all available configurables, use `--help-all`

Examples
--------

    jupyter-yarn-kernelspec install --language=R --spark_home=/usr/local/spark
    jupyter-yarn-kernelspec install --kernel_name=dask_python --dask --yarn_endpoint=http://foo.bar:8088/ws/v1/cluster
    jupyter-yarn-kernelspec install --language=Scala --spark_init_mode='eager'
```
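
As a further illustration, the following invocation combines several of the options documented above to create an R-language Spark kernelspec in the current Python environment. All flag names come from the help output; the values (kernel name, display name, SPARK_HOME path) are assumptions that should be adjusted for your cluster:

```bash
# Illustrative only: flag values below are assumptions, not defaults.
jupyter yarn-kernelspec install --sys-prefix \
    --language=R \
    --kernel_name=yarnkp_spark_r \
    --display_name='Spark R (YARN Cluster)' \
    --spark_home=/usr/local/spark \
    --spark_init_mode=eager
```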