Commit e1a5783

Torch install back to build (#365)
Move torch install back to smart build, revert CI build to use `[ml]`, update docs with proper build steps [ committed by @MattToast ] [ reviewed by @al-rigazzi @ashao @mellis13 ]
1 parent fa59b18 commit e1a5783

9 files changed: +189 -202 lines changed

.github/workflows/run_tests.yml

Lines changed: 1 addition & 1 deletion
@@ -108,7 +108,7 @@ jobs:
       - name: Install SmartSim (with ML backends)
         run: |
           python -m pip install git+https://github.com/CrayLabs/SmartRedis.git@develop#egg=smartredis
-          python -m pip install .[dev,ml-cpu]
+          python -m pip install .[dev,ml]

       - name: Install ML Runtimes with Smart (with pt, tf, and onnx support)
         if: (matrix.py_v != '3.10')

doc/changelog.rst

Lines changed: 1 addition & 2 deletions
@@ -15,7 +15,7 @@ SmartSim
 0.5.1
 -----

-Released on 13 September, 2023
+Released on 14 September, 2023

 Description

@@ -45,7 +45,6 @@ Detailed Notes
 - Create public properties where appropriate to mitigate `protected-access` errors. (PR341_)
 - Fix a failure to execute `_prep_colocated_db` due to incorrect named attr check. (PR339_)
 - Enabled and mitigated mypy `disallow_any_generics` and `warn_return_any`. (PR338_)
-- Move installation of all optional SmartSim Python ML dependencies to `pip install` time. (PR336_)
 - Add a `smart validate` target to provide a simple smoke test to assess a SmartSim build. (PR336_, PR351_)
 - Add typehints to `smartsim._core.launcher.step.*`. (PR334_)
 - Log errors reported from slurm WLM when attempts to retrieve status fail. (PR331_, PR332_)

doc/installation_instructions/basic.rst

Lines changed: 7 additions & 8 deletions
@@ -160,15 +160,15 @@ and install SmartSim from PyPI with the following command:

 If you would like SmartSim to also install python machine learning libraries
 that can be used outside SmartSim to build SmartSim-compatible models, you
-can request their installation through the ``ml-*`` optional dependencies,
+can request their installation through the ``[ml]`` optional dependencies,
 as follows:

 .. code-block:: bash

-    # For CPU based models
-    pip install smartsim[ml-cpu]
-    # For CPU and CUDA based models
-    pip install smartsim[ml-cuda]
+    # For bash
+    pip install smartsim[ml]
+    # For zsh
+    pip install smartsim\[ml\]

 At this point, SmartSim is installed and can be used for more basic features.
 If you want to use the machine learning features of SmartSim, you will need
@@ -287,9 +287,8 @@ source remains at the site of the clone instead of in site-packages.
 .. code-block:: bash

     cd smartsim
-    pip install -e .[dev,ml-cpu] # for CPU only
-    # OR
-    pip install -e .[dev,ml-cuda] # for CUDA support
+    pip install -e .[dev,ml] # for bash users
+    pip install -e .\[dev,ml\] # for zsh users

 Use the now installed ``smart`` cli to install the machine learning runtimes.

doc/installation_instructions/site-install.rst

Lines changed: 2 additions & 1 deletion
@@ -11,4 +11,5 @@ from source with the following steps replacing ``COMPILER_VERSION`` and

     module use -a /lus/scratch/smartsim/local/modulefiles
     module load cudatoolkit/11.8 cudnn smartsim-deps/COMPILER_VERSION/SMARTSIM_VERSION
-    pip install smartsim[ml-cuda]
+    pip install smartsim[ml]
+    smart build --only_python_packages --device gpu [--onnx]

smartsim/_core/_cli/build.py

Lines changed: 70 additions & 38 deletions
@@ -32,15 +32,15 @@

 from tabulate import tabulate

-from smartsim._core._cli.utils import color_bool, SMART_LOGGER_FORMAT
+from smartsim._core._cli.utils import SMART_LOGGER_FORMAT, color_bool, pip
 from smartsim._core._install import builder
 from smartsim._core._install.buildenv import (
     BuildEnv,
+    DbEngine,
     SetupError,
     Version_,
-    Versioner,
-    DbEngine,
     VersionConflictError,
+    Versioner,
 )
 from smartsim._core._install.builder import BuildError
 from smartsim._core.config import CONFIG
@@ -244,16 +244,34 @@ def check_py_torch_version(versions: Versioner, device: _TDeviceStr = "cpu") ->
     else:
         raise BuildError("Unrecognized device requested")

-    _check_packages_in_python_env(
-        {
-            "torch": Version_(f"{versions.TORCH}{device_suffix}"),
-            "torchvision": Version_(f"{versions.TORCHVISION}{device_suffix}"),
-        },
+    torch_deps = {
+        "torch": Version_(f"{versions.TORCH}{device_suffix}"),
+        "torchvision": Version_(f"{versions.TORCHVISION}{device_suffix}"),
+    }
+    missing, conflicts = _assess_python_env(
+        torch_deps,
+        package_pinning="==",
         validate_installed_version=_create_torch_version_validator(
             with_suffix=device_suffix
         ),
     )

+    if len(missing) == len(torch_deps) and not conflicts:
+        # All PyTorch deps are not installed and there are no conflicting
+        # python packages. We can try to install torch deps into the current env.
+        logger.info(
+            "Torch version not found in python environment. "
+            "Attempting to install via `pip`"
+        )
+        pip(
+            "install",
+            "-f",
+            "https://download.pytorch.org/whl/torch_stable.html",
+            *(f"{package}=={version}" for package, version in torch_deps.items()),
+        )
+    elif missing or conflicts:
+        logger.warning(_format_incompatible_python_env_message(missing, conflicts))
+


 def _create_torch_version_validator(
     with_suffix: str,
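
The new branch in `check_py_torch_version` installs only when every requested torch package is absent and nothing conflicts; a partial or conflicting environment falls through to a warning instead. A minimal sketch of that decision, isolated from SmartSim, might look like the following. The function name `plan_torch_action`, the constant name `FIND_LINKS`, and the version strings are hypothetical and chosen only for illustration; the find-links URL is the one that appears in the diff.

```python
from typing import Dict, List, Tuple

# URL copied from the diff; the constant name is an assumption.
FIND_LINKS = "https://download.pytorch.org/whl/torch_stable.html"


def plan_torch_action(
    requested: Dict[str, str], missing: List[str], conflicts: List[str]
) -> Tuple[str, List[str]]:
    """Return ("install", pip_args), ("warn", []) or ("ok", [])."""
    if len(missing) == len(requested) and not conflicts:
        # Nothing from the requested set is installed and nothing clashes,
        # so installing pinned versions cannot disturb an existing setup.
        pins = [f"{name}=={version}" for name, version in requested.items()]
        return "install", ["install", "-f", FIND_LINKS, *pins]
    if missing or conflicts:
        # Partially installed or mismatched versions: leave the environment
        # alone and let the caller warn the user.
        return "warn", []
    return "ok", []


# Illustrative version strings only; not taken from SmartSim's defaults.
deps = {"torch": "2.0.1+cpu", "torchvision": "0.15.2+cpu"}
action, pip_args = plan_torch_action(deps, ["torch", "torchvision"], [])
```

With both packages missing and no conflicts, `action` is `"install"` and `pip_args` carries the pinned requirements; with only one package missing, the sketch returns `"warn"` and performs no install.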
@@ -297,20 +315,7 @@ def _check_packages_in_python_env(
     )

     if missing or conflicts:
-        indent = "\n\t"
-        fmt_list: t.Callable[[str, t.List[str]], str] = (
-            lambda n, l: f"{n}:{indent}{indent.join(l)}" if l else ""
-        )
-        missing_str = fmt_list("Missing", missing)
-        conflict_str = fmt_list("Conflicting", conflicts)
-        sep = "\n" if missing_str and conflict_str else ""
-        logger.warning(
-            "Python Env Status Warning!\n"
-            "Requested Packages are Missing or Conflicting:\n\n"
-            f"{missing_str}{sep}{conflict_str}"
-            "\n\nConsider installing packages at the requested versions via "
-            "`pip` or installing SmartSim with optional ML dependencies"
-        )
+        logger.warning(_format_incompatible_python_env_message(missing, conflicts))


 def _assess_python_env(
@@ -334,6 +339,26 @@ def _assess_python_env(
     return missing, conflicts


+def _format_incompatible_python_env_message(
+    missing: t.Iterable[str], conflicting: t.Iterable[str]
+) -> str:
+    indent = "\n\t"
+    fmt_list: t.Callable[[str, t.Iterable[str]], str] = (
+        lambda n, l: f"{n}:{indent}{indent.join(l)}" if l else ""
+    )
+    missing_str = fmt_list("Missing", missing)
+    conflict_str = fmt_list("Conflicting", conflicting)
+    sep = "\n" if missing_str and conflict_str else ""
+    return (
+        "Python Env Status Warning!\n"
+        "Requested Packages are Missing or Conflicting:\n\n"
+        f"{missing_str}{sep}{conflict_str}\n\n"
+        "Consider installing packages at the requested versions via `pip` or "
+        "uninstalling them, installing SmartSim with optional ML dependencies "
+        "(`pip install smartsim[ml]`), and running `smart clean && smart build ...`"
+    )
+
+
 def execute(args: argparse.Namespace) -> int:
     verbose = args.v
     keydb = args.keydb
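
One detail that may be worth a second look: the extracted formatter widens its parameters from `t.List[str]` to `t.Iterable[str]`, but the `if l` emptiness check only behaves as intended for sized containers. The callers in this diff appear to pass lists (the torch check calls `len(missing)`), so this would only matter for other callers. The sketch below reproduces the lambda as a plain function and contrasts it with a hypothetical variant, `fmt_list_safe`, that materializes its input first; neither name exists in SmartSim.

```python
from typing import Iterable

INDENT = "\n\t"


def fmt_list(name: str, items: Iterable[str]) -> str:
    # Same expression as the lambda in the diff.
    return f"{name}:{INDENT}{INDENT.join(items)}" if items else ""


def fmt_list_safe(name: str, items: Iterable[str]) -> str:
    # Hypothetical variant: convert to a list so the emptiness check
    # is also correct for generators and other one-shot iterators.
    entries = list(items)
    return f"{name}:{INDENT}{INDENT.join(entries)}" if entries else ""


with_list = fmt_list("Missing", ["torch", "torchvision"])
empty_list = fmt_list("Missing", [])
empty_iter = fmt_list("Missing", iter([]))       # iterator objects are always truthy
empty_iter_safe = fmt_list_safe("Missing", iter([]))
```

Here `empty_list` is the empty string, while `empty_iter` still produces a dangling `Missing:` header because an iterator object is truthy even when it yields nothing.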
@@ -376,21 +401,22 @@ def execute(args: argparse.Namespace) -> int:
     print(tabulate(vers, headers=version_names, tablefmt="github"), "\n")

     try:
-        # REDIS/KeyDB
-        build_database(build_env, versions, keydb, verbose)
-
-        # REDISAI
-        build_redis_ai(
-            build_env,
-            versions,
-            device,
-            pt,
-            tf,
-            onnx,
-            args.torch_dir,
-            args.libtensorflow_dir,
-            verbose=verbose,
-        )
+        if not args.only_python_packages:
+            # REDIS/KeyDB
+            build_database(build_env, versions, keydb, verbose)
+
+            # REDISAI
+            build_redis_ai(
+                build_env,
+                versions,
+                device,
+                pt,
+                tf,
+                onnx,
+                args.torch_dir,
+                args.libtensorflow_dir,
+                verbose=verbose,
+            )
     except (SetupError, BuildError) as e:
         logger.error(str(e))
         return 1
@@ -406,7 +432,7 @@ def execute(args: argparse.Namespace) -> int:
             check_py_tf_version(versions)
         if "onnxruntime" in backends:
             check_py_onnx_version(versions)
-    except SetupError as e:
+    except (SetupError, BuildError) as e:
         logger.error(str(e))
         return 1

@@ -430,6 +456,12 @@ def configure_parser(parser: argparse.ArgumentParser) -> None:
         choices=["cpu", "gpu"],
         help="Device to build ML runtimes for",
     )
+    parser.add_argument(
+        "--only_python_packages",
+        action="store_true",
+        default=False,
+        help="Only evaluate the python packages (i.e. skip building backends)",
+    )
     parser.add_argument(
         "--no_pt",
         action="store_true",
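
To see how the new flag gates the build, the following stand-alone sketch wires `--only_python_packages` into a small parser and uses it to skip a placeholder build step. The `--device` default of `"cpu"`, the `make_parser` and `run` helper names, and the returned step labels are assumptions for illustration; only the flag name, its `store_true` action, and the `cpu`/`gpu` choices come from the diff.

```python
import argparse
from typing import List, Optional


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="smart build")
    parser.add_argument(
        "--device",
        default="cpu",  # assumed default, not shown in the diff
        choices=["cpu", "gpu"],
        help="Device to build ML runtimes for",
    )
    parser.add_argument(
        "--only_python_packages",
        action="store_true",
        default=False,
        help="Only evaluate the python packages (i.e. skip building backends)",
    )
    return parser


def run(argv: Optional[List[str]] = None) -> List[str]:
    """Return the labels of the steps that would execute (placeholder logic)."""
    args = make_parser().parse_args(argv)
    steps = []
    if not args.only_python_packages:
        steps.append("build database and RedisAI")
    steps.append(f"check python packages for {args.device}")
    return steps


full = run([])
python_only = run(["--only_python_packages", "--device", "gpu"])
```

In this sketch `python_only` contains only the package check, mirroring how `execute` in the diff skips `build_database` and `build_redis_ai` when the flag is set.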

smartsim/_core/_cli/utils.py

Lines changed: 13 additions & 0 deletions
@@ -26,11 +26,14 @@

 import importlib
 import shutil
+import subprocess as sp
+import sys
 import typing as t
 from argparse import ArgumentParser, Namespace
 from pathlib import Path

 from smartsim._core._install.buildenv import SetupError
+from smartsim._core._install.builder import BuildError
 from smartsim._core.utils import colorize
 from smartsim.log import get_logger

@@ -60,6 +63,16 @@ def color_bool(trigger: bool = True) -> str:
     return colorize(str(trigger), color=_color)


+def pip(*args: str) -> None:
+    cmd = (sys.executable, "-m", "pip") + args
+    with sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.PIPE) as proc:
+        _, err = proc.communicate()
+        if int(proc.returncode) != 0:
+            raise BuildError(
+                f"`pip` returned with a non-zero exit code:\n{err.decode('utf-8')}"
+            )
+
+
 def clean(core_path: Path, _all: bool = False) -> int:
     """Remove pre existing installations of ML runtimes
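
The `pip` helper invokes pip through `sys.executable -m pip`, which targets the interpreter running `smart` rather than whichever `pip` happens to be first on `PATH`. An equivalent formulation using `subprocess.run` is sketched below as one possible alternative, not as the project's implementation. The `run_checked` helper and the local `BuildError` stand-in class are hypothetical; the real `BuildError` lives in `smartsim._core._install.builder`.

```python
import subprocess
import sys
from typing import Sequence


class BuildError(RuntimeError):
    """Local stand-in for smartsim's BuildError (assumption for this sketch)."""


def run_checked(cmd: Sequence[str]) -> None:
    # Capture both streams; stderr is surfaced in the error message on failure.
    proc = subprocess.run(
        list(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=False
    )
    if proc.returncode != 0:
        stderr = proc.stderr.decode("utf-8", errors="replace")
        raise BuildError(
            f"command returned non-zero exit code {proc.returncode}:\n{stderr}"
        )


def pip(*args: str) -> None:
    # Use the current interpreter's pip so packages land in the active env.
    run_checked((sys.executable, "-m", "pip", *args))


# Demonstrate the failure path without touching pip or the network.
try:
    run_checked([sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"])
    failure_message = ""
except BuildError as exc:
    failure_message = str(exc)
```

Decoding with `errors="replace"` is a defensive choice in this sketch so that undecodable bytes in pip's output cannot mask the original failure.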

smartsim/_core/_cli/validate.py

Lines changed: 1 addition & 1 deletion
@@ -194,7 +194,7 @@ def _test_tf_install(client: Client, tmp_dir: str, device: _TCapitalDeviceStr) -
         model_path, inputs, outputs = recv_conn.recv()
     except EOFError as e:
         raise Exception(
-            "Failed to recieve serialized model from subprocess. "
+            "Failed to receive serialized model from subprocess. "
             "Is the `tensorflow` python package installed?"
         ) from e

smartsim/_core/_install/buildenv.py

Lines changed: 2 additions & 62 deletions
@@ -26,7 +26,6 @@

 # pylint: disable=invalid-name

-import itertools
 import importlib.metadata
 import os
 import platform
@@ -262,30 +261,6 @@ def get_defaults(self) -> t.Dict[str, str]:
         return self.defaults[self.version].copy()


-def _format_linux_torch_py_package_req(
-    arch: str, python_version: str, torch_version: str
-) -> str:
-    pyv_no_dot = python_version.replace(".", "")
-    return (
-        "torch"
-        # pylint: disable-next=line-too-long
-        f" @ https://download.pytorch.org/whl/{arch}/torch-{torch_version}%2B{arch}-cp{pyv_no_dot}-cp{pyv_no_dot}-linux_x86_64.whl"
-        f' ; python_version == "{python_version}" and sys_platform != "darwin"'
-    )
-
-
-def _format_linux_torchvision_py_package_req(
-    arch: str, python_version: str, torchvision_version: str
-) -> str:
-    pyv_no_dot = python_version.replace(".", "")
-    return (
-        "torchvision"
-        # pylint: disable-next=line-too-long
-        f" @ https://download.pytorch.org/whl/{arch}/torchvision-{torchvision_version}%2B{arch}-cp{pyv_no_dot}-cp{pyv_no_dot}-linux_x86_64.whl"
-        f' ; python_version == "{python_version}" and sys_platform != "darwin"'
-    )
-
-
 class Versioner:
     """Versioner is responsible for managing all the versions
     within SmartSim including SmartSim itself.
@@ -376,26 +351,8 @@ def ml_extras_required(self) -> t.Dict[str, t.List[str]]:
         """
         ml_defaults = self.REDISAI.get_defaults()

-        def _format_custom_linux_torch_deps(
-            torchv: str, torchvisionv: str, arch: str
-        ) -> t.Tuple[str, ...]:
-            # The correct versions and suffixes were scraped from
-            # https://pytorch.org/get-started/previous-versions/
-            supported_py_versions = ("3.8", "3.9", "3.10")
-            return tuple(
-                itertools.chain.from_iterable(
-                    (
-                        _format_linux_torch_py_package_req(arch, pyv, torchv),
-                        _format_linux_torchvision_py_package_req(
-                            arch, pyv, torchvisionv
-                        ),
-                    )
-                    for pyv in supported_py_versions
-                )
-            )
-
         # remove torch-related fields as they are subject to change
-        # by having the user set env vars
+        # by having the user change hardware (cpu/gpu)
         _torch_fields = [
             "torch",
             "torchvision",
@@ -405,25 +362,8 @@ def _format_custom_linux_torch_deps(
         for field in _torch_fields:
             ml_defaults.pop(field)

-        common = tuple(f"{lib}=={vers}" for lib, vers in ml_defaults.items())
         return {
-            "ml-cpu": [
-                *common,
-                # osx
-                f'torch=={self.TORCH} ; sys_platform == "darwin"',
-                f'torchvision=={self.TORCHVISION} ; sys_platform == "darwin"',
-                # linux
-                *_format_custom_linux_torch_deps(
-                    self.TORCH, self.TORCHVISION, self.TORCH_CPU_SUFFIX.lstrip("+")
-                ),
-            ],
-            "ml-cuda": [
-                *common,
-                # linux
-                *_format_custom_linux_torch_deps(
-                    self.TORCH, self.TORCHVISION, self.TORCH_CUDA_SUFFIX.lstrip("+")
-                ),
-            ],
+            "ml": [f"{lib}=={vers}" for lib, vers in ml_defaults.items()]
         }

     @staticmethod
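
After this change the single `ml` extra pins every default ML package except the torch-related ones, which `smart build` now installs for the selected device. A rough stand-alone sketch of that construction follows. The helper name `build_ml_extra`, the package names, and the version numbers are placeholders, and the list of torch fields is limited to the two entries visible in the diff; the full list in `buildenv.py` is not shown here.

```python
from typing import Dict, Iterable, List


def build_ml_extra(
    defaults: Dict[str, str], torch_fields: Iterable[str]
) -> Dict[str, List[str]]:
    pinned = dict(defaults)  # copy so the caller's mapping is left untouched
    for field in torch_fields:
        # Torch packages depend on the target hardware (cpu/gpu), so they
        # are left out of the pip extra.
        pinned.pop(field)
    return {"ml": [f"{lib}=={vers}" for lib, vers in pinned.items()]}


# Placeholder versions for illustration only.
example_defaults = {
    "tensorflow": "2.8.0",
    "onnx": "1.11.0",
    "torch": "1.11.0",
    "torchvision": "0.12.0",
}
extras = build_ml_extra(example_defaults, ["torch", "torchvision"])
```

A mapping of this shape is what setuptools expects for `extras_require`, so `pip install smartsim[ml]` would resolve to the pinned, hardware-independent packages only.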
