
alexnikulkov

Summary: Attempt to fix the test timing out on CircleCI

Differential Revision: D33900623

Jason Gauci and others added 30 commits April 7, 2021 16:18
Summary: Pull Request resolved: facebookresearch#445

Reviewed By: kaiwenw

Differential Revision: D27303639

fbshipit-source-id: 1c8f105a90aa929c8fecae12aa3191a0a8ed0008
Differential Revision: D27643630

fbshipit-source-id: 38246baa4212271a68c3ae3044e4c87e37de5b4d
Summary:
Pull Request resolved: facebookresearch#446

Switch eval_td_loss to TensorBoard

Reviewed By: bankawas

Differential Revision: D27643487

fbshipit-source-id: 25c0af8f0d943abaa68b024fd2f61caf65445cd9
Summary: Pull Request resolved: facebookresearch#444

Reviewed By: kaiwenw

Differential Revision: D27614614

fbshipit-source-id: ce5de96de5714eab80c1e3c6c78100663426ff66
Summary: Add a binary-cross-entropy-with-logits loss for myopic values between 0 and 1.

Reviewed By: czxttkl

Differential Revision: D27712539

fbshipit-source-id: f9e5fa67cee9955d191712a4c472968086e94c91
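Because the targets here are values in [0, 1] rather than hard 0/1 labels, the loss is computed directly from the logit. The sketch below is a stdlib-only illustration of the standard numerically stable BCE-with-logits formula, not ReAgent's implementation; in PyTorch the same loss is available as `torch.nn.functional.binary_cross_entropy_with_logits`. The function names are hypothetical.

```python
import math
from typing import Sequence


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy computed from a logit and a soft target in [0, 1].

    Uses the stable form max(x, 0) - x * y + log(1 + exp(-|x|)),
    which never calls exp() on a large positive number.
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must lie in [0, 1]")
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mean_bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Average the loss over a batch of (logit, target) pairs."""
    pairs = list(zip(logits, targets))
    return sum(bce_with_logits(x, y) for x, y in pairs) / len(pairs)
```

With a logit of 0 (probability 0.5) and a target of 0.5, the loss is log 2, the minimum achievable for that target.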
Differential Revision: D27759437

fbshipit-source-id: 7a886f01fe28589242b6b666dcc4b5e09f571cf4
…ates in stderr log

Summary: Add time_line_operator to notifications, enable printing of IPS and Direct scores in stderr log

Reviewed By: czxttkl

Differential Revision: D27730248

fbshipit-source-id: 87f0929b3fc83e081451f8d83d4edb0ac275d0bd
Summary: Rewrite the logic to filter the features before filling in NaN values. This significantly reduces latency when the model uses only a fraction of the input features: about 70x, from 1.4 s to 0.02 s, when the fraction is ~5%.

Reviewed By: czxttkl

Differential Revision: D27740568

fbshipit-source-id: 4850864cd75ef39ce03790d10153b075f94be9c9
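The speedup comes from ordering: dropping unused features first means NaN filling only touches the features the model actually reads, rather than every input feature. A minimal sketch, assuming each row is a dict from feature id to value and that missing or NaN values are filled with a constant (the `preprocess` name and the default fill of 0.0 are hypothetical):

```python
import math
from typing import Dict, Iterable, List


def preprocess(
    rows: List[Dict[int, float]], used_ids: Iterable[int], fill: float = 0.0
) -> List[Dict[int, float]]:
    """Filter features down to those the model uses, then fill NaNs."""
    used = set(used_ids)
    out = []
    for row in rows:
        # Step 1: filter. Iterate over the (small) used set, so features the
        # model never reads get no per-feature work at all.
        kept = {fid: row[fid] for fid in used if fid in row}
        # Step 2: fill. A used feature that is absent or NaN gets `fill`.
        for fid in used:
            if math.isnan(kept.get(fid, math.nan)):
                kept[fid] = fill
        out.append(kept)
    return out
```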
Summary:
Pull Request resolved: facebookresearch#449

- add option to train as residual boost (on top of prod vm score)
- net builder for MLP for better configuration of MLPScorer
- filter out slates with 0 scores (ostensibly from precision problems); these caused NaN problems in training
- add option for orthogonal weight initialization

Reviewed By: czxttkl

Differential Revision: D27264221

fbshipit-source-id: 0c53893a155c29229efafed9f459f6e950dbcf12
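The slate filter can be as simple as the sketch below. It assumes each slate is a sequence of per-item scores and that a slate is dropped if any score is exactly 0.0; this representation and the `drop_zero_score_slates` name are hypothetical, and the actual diff may filter on a slate-level score instead.

```python
from typing import List, Sequence


def drop_zero_score_slates(
    slates: Sequence[Sequence[float]],
) -> List[Sequence[float]]:
    """Keep only slates in which every item score is non-zero."""
    # Scores that underflowed to exactly 0.0 may later produce NaNs in the
    # loss, so such slates are removed before training.
    return [slate for slate in slates if all(score != 0.0 for score in slate)]
```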
Summary:
Pull Request resolved: facebookresearch#437

- make slate_rewards into separate columns, and enable indexing reward in python
- try recurring training / test warmstart
- add trainer_conf so that gradients can be accumulated over several batches via accumulate_grad_batches
- remove some unneeded files

Reviewed By: czxttkl

Differential Revision: D27495823

fbshipit-source-id: 01199bc3228d53e2869b6246a2fb2ed704eea62e
…acebookresearch#452)

Summary:
Pull Request resolved: facebookresearch#452

Move the `batch_size` for Seq2Reward training from `trainer_param` to `reader_options`.

Reviewed By: czxttkl

Differential Revision: D27720626

fbshipit-source-id: dcefcfbda56298a1ab67e3031813bccd3d67ae2f
…rch#448)

Summary:
Pull Request resolved: facebookresearch#448

Now Seq2Reward and its compress model share the same `mini_batchsize` specified in `reader_option`.

Also fix the bug in https://fburl.com/diffusion/igzadset by replacing `seq2reward_network` with `compress_model_network` in validation_step.

Reviewed By: czxttkl

Differential Revision: D27663810

fbshipit-source-id: 6d77a6e48cdd7dec165d48327a1057dd2b60a2ed
Summary: Pull Request resolved: facebookresearch#453

Reviewed By: czxttkl

Differential Revision: D27753538

fbshipit-source-id: 6e02e8d0d1a037b6cc349179fc2d68b5fa892b51
Differential Revision: D27782677

fbshipit-source-id: d6a80c8b1ae1a943fddc351a0bc647367495abc1
Summary: Fast RL model manager names need to be updated after our refactor

Reviewed By: alexnikulkov

Differential Revision: D27780305

fbshipit-source-id: 14e4d45d1fd47eabf2916fd634e650dcf51ebd39
Summary:
Pull Request resolved: facebookresearch#451

This diff uses the logger in PyTorch Lightning to recreate the graphs that were traditionally reported through the dqn_reporter. These graphs are then fed back into fblearner, eliminating the need to report them manually.

Reviewed By: czxttkl

Differential Revision: D27694627

fbshipit-source-id: 9f5437ff38d61f316c09b03d6088ce36f4d6199c
Summary:
Pull Request resolved: facebookresearch#454

title

Reviewed By: alexnikulkov

Differential Revision: D27800185

fbshipit-source-id: 406001b48f55d7304d18e06237e7bf82ed07c11b
Reviewed By: divchenko

Differential Revision: D27835360

fbshipit-source-id: cbb23793ee57382e43bd65bd40cfeb2820c6eec2
Summary: Pull Request resolved: facebookresearch#384

Test Plan:
CI Tests
...but without running open source tests.

Reviewed By: gji1

Differential Revision: D27842452

Pulled By: MisterTea

fbshipit-source-id: 6fb192d30217d358e86a04e6bcc5a69911276e71
…ainers (facebookresearch#457)

Summary:
Pull Request resolved: facebookresearch#457

`trainer.train(batch)` was the old, pre-Lightning ReAgent trainer API.
With this diff, we make sure that nobody calls `trainer.train(batch)`.
`trainer.train()` and `trainer.train(True/False)` are still allowed; they put the network into training/eval mode.

Reviewed By: MisterTea

Differential Revision: D27862583

fbshipit-source-id: b0875e11cd4ef214c75fd1bef5b696f1cdf2b8d6
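One way to reject the old call pattern while keeping the mode switch is to accept only a bool in `train()`. The sketch below uses a hypothetical `TrainerBase` class that mimics the `train(mode=True)` / `eval()` contract of `torch.nn.Module`; it is not ReAgent's actual code.

```python
from typing import Any


class TrainerBase:
    """Hypothetical stand-in for a Lightning-style trainer module."""

    def __init__(self) -> None:
        self.training = True

    def train(self, mode: Any = True) -> "TrainerBase":
        # Only a bool is accepted: True -> training mode, False -> eval mode.
        # Anything else (e.g. a batch) means the removed pre-Lightning API
        # is being used, so fail loudly instead of silently switching modes.
        if not isinstance(mode, bool):
            raise TypeError(
                "trainer.train(batch) is no longer supported; "
                "use trainer.train() or trainer.train(False) to switch modes"
            )
        self.training = mode
        return self

    def eval(self) -> "TrainerBase":
        return self.train(False)
```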
Summary:
Fix bugs: GreedyActionSampler returned 1 as the log prob, and EpsilonGreedyActionSampler didn't work.

Pull Request resolved: facebookresearch#393

Test Plan:
Imported from GitHub, without a `Test Plan:` line.
...but without running open source tests.

Reviewed By: kaiwenw

Differential Revision: D27842450

Pulled By: MisterTea

fbshipit-source-id: 9b4aa85f352f2d7565473127b280d61bcc6d3b71
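For reference, these are the probabilities such samplers should report: a greedy choice is deterministic, so its probability is 1 and its log probability is 0; under epsilon-greedy over n actions, the greedy action has probability 1 - epsilon + epsilon/n and every other action has epsilon/n. A stdlib sketch with hypothetical function names (ReAgent's samplers operate on tensors and have different signatures):

```python
import math
from typing import Sequence, Tuple


def _argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=lambda i: scores[i])


def greedy_sample(scores: Sequence[float]) -> Tuple[int, float]:
    """Return (action, log_prob); the greedy choice has probability 1."""
    return _argmax(scores), 0.0  # log(1) == 0, not 1


def epsilon_greedy_log_prob(
    scores: Sequence[float], action: int, epsilon: float
) -> float:
    """Log probability of `action` under epsilon-greedy exploration."""
    n = len(scores)
    prob = epsilon / n  # uniform exploration share for every action
    if action == _argmax(scores):
        prob += 1.0 - epsilon  # exploitation share goes to the greedy action
    return math.log(prob) if prob > 0 else -math.inf
```

With epsilon = 0.2 and four actions, the greedy action gets probability 0.85 and each other action 0.05.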
Summary: Pull Request resolved: facebookresearch#455

Test Plan: CI Tests

Reviewed By: czxttkl

Differential Revision: D27842449

Pulled By: MisterTea

fbshipit-source-id: bee6d009236e87eaddae7ea7d083c7500dc1220b
Summary:
Pull Request resolved: facebookresearch#458

When trying to follow the [tutorial](https://reagent.ai/rasp_tutorial.html) there are a few things that need fixing:

1. When running the script serving/scripts/rasp_to_model.py, I came across this error:

```
python serving/scripts/rasp_to_model.py /tmp/rasp_logging/log.txt /tmp/input_df.pkl

Traceback (most recent call last):
  File "serving/scripts/rasp_to_model.py", line 13, in <module>
    logger.setLevel(logging.info)
  File "/usr/local/anaconda3/envs/reagent/lib/python3.7/logging/__init__.py", line 1353, in setLevel
    self.level = _checkLevel(level)
  File "/usr/local/anaconda3/envs/reagent/lib/python3.7/logging/__init__.py", line 195, in _checkLevel
    raise TypeError("Level not an integer or a valid string: %r" % level)
TypeError: Level not an integer or a valid string: <function info at 0x7fb8000d73b0>
```

Luckily, the fix is easy: pass an actual log level (e.g. `logging.INFO`) rather than the function `logging.info`.

2. This config file is probably outdated: serving/examples/ecommerce/training/contextual_bandit.yaml
- changed indentation level
- changed key name

3. An `__init__.py` file was missing in the gym tests, which led to an error.

4. The `SPARK_JAR` path was not resolving correctly.

Pull Request resolved: facebookresearch#391

Test Plan:
Imported from GitHub, without a `Test Plan:` line.
...but without running open source tests.

Reviewed By: czxttkl

Differential Revision: D27842451

Pulled By: MisterTea

fbshipit-source-id: 2175296c6b60db4dc4b22804a74c2259b14fee7e
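The traceback in item 1 comes from passing the function `logging.info` where `Logger.setLevel` expects an integer level or a level name. A minimal reproduction of the fix, using `logging.INFO` (the exact level the tutorial script intends is an assumption here):

```python
import logging

logger = logging.getLogger("rasp_to_model")

# Wrong: logging.info is the module-level logging *function*, so
# setLevel raises "TypeError: Level not an integer or a valid string".
# logger.setLevel(logging.info)

# Right: pass a level constant (an int) or its name as a string.
logger.setLevel(logging.INFO)
logger.setLevel("INFO")  # equivalent string form
```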
…Set test model type appropriately.

Reviewed By: bankawas

Differential Revision: D27863892

fbshipit-source-id: 0084920bd82d54f5aece46f36c32fbbec5ba3380
Summary:
Pull Request resolved: facebookresearch#459

As titled. Also some small polish on the codebase.

Reviewed By: kaiwenw

Differential Revision: D27899809

fbshipit-source-id: 882471f1a9376d0d50bd935e02328667f1867450
…rch#460)

Summary:
Pull Request resolved: facebookresearch#460

OOM issues can occur in CFEval of DQN and CRR workflows when the validation set is too large, as in https://fb.workplace.com/groups/horizon.users/permalink/836921400197015/. This diff solves this issue by computing the numbers needed for CFEval in `validation_step`, instead of just stacking the raw batches, which include all the state features that can take a lot of memory.

Note that if `use_gpu=True`, the CFEval-required numbers are computed on the GPUs for speed, since both the validation batch and the trainer are stored there. The returned `EvaluationDataPage` is then moved to the CPU, because everything later in `validation_epoch_end` is done on the CPU, which has more memory. To enable this transfer between devices, this diff changes `EvaluationDataPage` from a `NamedTuple` to a subclass of `TensorDataClass`.

Reviewed By: kaiwenw

Differential Revision: D27929283

fbshipit-source-id: f57948232f395b297d957cdc2afbc38a874a1810
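The memory saving comes from reducing each batch inside `validation_step`, so that only a few numbers per sample survive until `validation_epoch_end`. The stdlib sketch below illustrates the pattern; `EvalSummary`, `score`, and the batch keys are hypothetical stand-ins, not ReAgent's `EvaluationDataPage` fields, and device placement is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class EvalSummary:
    """Per-sample numbers kept for evaluation (hypothetical fields)."""
    rewards: List[float] = field(default_factory=list)
    model_scores: List[float] = field(default_factory=list)


def score(state: Sequence[float]) -> float:
    # Hypothetical stand-in for running the trained model on one state.
    return sum(state) / len(state)


def validation_step(batch: Dict[str, list]) -> EvalSummary:
    # Reduce the batch right away: the large state features are consumed
    # here and not returned, so they are not held for the whole epoch.
    return EvalSummary(
        rewards=list(batch["reward"]),
        model_scores=[score(s) for s in batch["state_features"]],
    )


def validation_epoch_end(outputs: List[EvalSummary]) -> EvalSummary:
    # Only the small per-batch summaries are concatenated.
    merged = EvalSummary()
    for out in outputs:
        merged.rewards.extend(out.rewards)
        merged.model_scores.extend(out.model_scores)
    return merged
```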
Differential Revision: D27949485

fbshipit-source-id: 7f0fde8111150922bd0c62cb473f71a3a2bc7367
…ookresearch#450)

Summary: Pull Request resolved: facebookresearch#450

Reviewed By: kaiwenw

Differential Revision: D27692807

fbshipit-source-id: 2b880d2a5543db0fa244b818747328d6bce7ed20
Summary:
- Add more elements to the output
- Fix dependency in TARGETS
- Fix some typos in comments
- Wrap paths in `os.path.expanduser()`

Reviewed By: bankawas

Differential Revision: D27946814

fbshipit-source-id: b9cd0bedfecc1e63007e7d15f40a5431ed85e3ae
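Wrapping paths in `os.path.expanduser()` matters because Python's `open()` does not expand `~` itself; a path like `~/input_df.pkl` would otherwise be resolved relative to the current directory, inside a folder literally named `~`. A minimal sketch (the `resolve_path` helper is hypothetical):

```python
import os


def resolve_path(path: str) -> str:
    """Expand a leading "~" to the user's home directory.

    Paths without a leading "~" are returned unchanged.
    """
    return os.path.expanduser(path)
```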
Summary: Pull Request resolved: facebookresearch#447

Reviewed By: czxttkl

Differential Revision: D26627900

fbshipit-source-id: 7be325fada7819f011092726d1cd29fb5483d599
czxttkl and others added 20 commits December 4, 2021 21:58
Summary: We have updated fbcode/reagent/oss/docs/index.rst in D32583915. Now, we need to also update fbcode/reagent/oss/README.md

Reviewed By: gji1

Differential Revision: D32860084

fbshipit-source-id: 7add052a1c39051cd786aa3df8ba413e1e477fc8
Summary:
Add a yaml config-based RL orchestrator, which can start and monitor all necessary workflows in one place.

Currently, this is just a prototype for the domain analysis tool. I expect users to run the cookbook like this:
```
path = "PATH_TO_YAML_CONFIG"
cookbook = parse(path)
cookbook.domain_analysis()
```

Reviewed By: gji1

Differential Revision: D31334181

fbshipit-source-id: 21b94dcd3db04bdc8234c5f8098284dd9ca41612
Summary: from https://www.internalfb.com/intern/wiki/Pytorch_Ecosystem_Foundation_(EcoF)/PyTorch_Lightning/Operations/Sync_OSS_FBCode/

Reviewed By: ananthsub

Differential Revision: D32933988

fbshipit-source-id: 60d9054d7c1f6951910a0892e3001f26930a16f5
Summary:
Pull Request resolved: facebookresearch#591

Print adjusted Direct method score

Reviewed By: gji1

Differential Revision: D32926551

fbshipit-source-id: 87a316d4b140f324c79ab1863db87baa0b0bba6a
Summary: Added an important comment

Reviewed By: czxttkl

Differential Revision: D32954765

fbshipit-source-id: f74a973c091484139a83b558ad9935ab8a3d07ef
Summary:
1. disable the sparse DQN test because it touches many other systems, which makes it hard to maintain.
2. move get_oncall_str_if_none to reagent/core/fb/flow_utils.py
3. add schema argument in construct_data_loader
4. fix world model test by explicitly providing state features

Reviewed By: gji1

Differential Revision: D33018554

fbshipit-source-id: 1b570fc71e604d50e1c6f89c3d3758bd09f3d4da
Differential Revision: D33201912

fbshipit-source-id: 5077a653203355467747f676f9f95e0e6d06fc9a
…#11061)

Summary:
### New commit log messages
  c335a7891 Remove redundant special case for disabling the progress bar on TPU (#11061)

Reviewed By: daniellepintz

Differential Revision: D33164799

fbshipit-source-id: 698753d06d9797a7b0cf5e444dee08d3f3d88088
Differential Revision: D33215229

fbshipit-source-id: a4e62bca19e657c04f6fb0ff65ef3b1e4436fc7e
Summary:
### New commit log messages
  860959fb3 Enable logging hparams only if there are any (#11105)

Reviewed By: tangbinh

Differential Revision: D33193527

fbshipit-source-id: e48abbfb703a1cc01ee4cf86ae20dbe91656c8df
…#592)

Summary:
Pull Request resolved: facebookresearch#592

The current Xcode version, 11.3.0, is going to be deprecated soon. Switch our jobs to the highest available Xcode version, as recommended.

The full list of Xcode versions that are available: https://urldefense.com/v3/__https://go.circleci.com/NDg1LVpNSC02MjYAAAGBeOzdGHbPV3GF6dow_mnielqRftmf3jkavLpdgGfA0_Jp1uRUkK1aSMY7wVpolG11FH_ZSgg=__;!!Bt8RZUm9aw!r6QpSB3ZU-ROLgB75dnnvWIsbokMsxcPiE36ptsjuRFvogdwt9NvRHcx$

Reviewed By: czxttkl

Differential Revision: D33282621

fbshipit-source-id: 43b1398635b1e35899ea1aad30660fad905b4588
Summary:
Pull Request resolved: facebookresearch#594

as titled

Reviewed By: gji1

Differential Revision: D33314988

fbshipit-source-id: 9d5e96db1a9bba043c1b1af884c66e2145105f28
…es (facebookresearch#593)

Summary:
Pull Request resolved: facebookresearch#593

After recent changes, PyTorch Lightning no longer sets `batch_size` to 1 for custom batch types. Therefore, we need to explicitly pass the correct batch size when calling `self.log`. Otherwise, the following errors occur in the OSS tests:

https://app.circleci.com/pipelines/github/facebookresearch/ReAgent/2211/workflows/c4eb86dc-cbb9-46d4-849a-aeb966be50e2/jobs/19599

https://app.circleci.com/pipelines/github/facebookresearch/ReAgent/2211/workflows/c4eb86dc-cbb9-46d4-849a-aeb966be50e2/jobs/19591

Reviewed By: czxttkl

Differential Revision: D33311293

fbshipit-source-id: 47321abb85c769402a30e46409d6d36a3b4dd82d
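Why the batch size matters: for values logged with `on_epoch=True`, Lightning's default epoch-level reduction in recent versions is a mean weighted by each step's batch size, so it needs to know that size. When it cannot infer it from a custom batch type, it has to be supplied explicitly, e.g. `self.log("td_loss", loss, batch_size=batch_size)` (the metric name here is illustrative). The stdlib sketch below shows only the weighting itself; the `epoch_mean` helper is hypothetical.

```python
from typing import Iterable, Tuple


def epoch_mean(step_logs: Iterable[Tuple[float, int]]) -> float:
    """Batch-size-weighted mean of per-step values.

    Each entry is (step_value, batch_size), where step_value is already
    the mean over that step's batch.
    """
    total = 0.0
    count = 0
    for value, batch_size in step_logs:
        total += value * batch_size
        count += batch_size
    return total / count
```

For steps `(1.0, 100)` and `(3.0, 20)` this gives 160 / 120 ≈ 1.33, whereas treating every batch as size 1 would give a plain average of 2.0.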
Differential Revision: D33337676

fbshipit-source-id: 34ddb3312749e8c1ae80e5c688d4c3d7f2da40af
Summary:
Pull Request resolved: facebookresearch#595

The test was flaky because:
1. The seed wasn't fixed
2. Both UCB1 and MetricUCB were estimating variance, so UCB1 wasn't always at a disadvantage

Reviewed By: czxttkl

Differential Revision: D33340651

fbshipit-source-id: 2e94997eb2a7c0c209ed1ecd62412900ed701152
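Both fixes aim to make the comparison deterministic and well-posed. The sketch below is a stdlib-only UCB1 (empirical mean plus `sqrt(2 ln t / n_i)`, with no variance estimate) driven by a seeded `random.Random`, so repeated runs pull the arms identically. The Bernoulli arms, the `run_ucb1` name, and the step counts are assumptions; ReAgent's MetricUCB is not reproduced here.

```python
import math
import random
from typing import List, Sequence


def run_ucb1(arm_probs: Sequence[float], steps: int, seed: int) -> List[int]:
    """Play UCB1 on Bernoulli arms; return how often each arm was pulled."""
    rng = random.Random(seed)  # fixed seed: the test is reproducible
    n_arms = len(arm_probs)
    pulls = [0] * n_arms
    reward_sums = [0.0] * n_arms
    for t in range(steps):
        if t < n_arms:
            arm = t  # initialise by pulling every arm once
        else:
            # UCB1 index: empirical mean + exploration bonus (no variance term).
            ucb = [
                reward_sums[i] / pulls[i] + math.sqrt(2.0 * math.log(t) / pulls[i])
                for i in range(n_arms)
            ]
            arm = max(range(n_arms), key=lambda i: ucb[i])
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward
    return pulls
```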
Summary:
Pull Request resolved: facebookresearch#598

Implemented:
- synthetic data
   - To map state features to labels (actions), the patterns [++++++++, ++++----, ----++++, --------] correspond to 4 different actions, respectively.
   - support state features with random noise to emulate stochasticity
   - support labels in both one-hot and integer form, e.g., action=[1,0,0,0] or action=[0].
- trainer
   - CrossEntropyLoss is applied on top of the model from dqn.py
- unittest
   - training and validation losses both approach zero, indicating that training is reasonable
   - predicted probabilities match the labels

Reviewed By: gji1

Differential Revision: D33409534

fbshipit-source-id: 3d9bfac68f0ef405e379ad88add7b533f72f1e2a
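The synthetic data described above can be sketched as follows. Assumptions not stated in the summary: `+`/`-` are encoded as +1.0/-1.0, the noise is Gaussian with a configurable standard deviation, and actions are drawn uniformly; `PATTERNS` and `make_dataset` are hypothetical names.

```python
import random
from typing import List, Tuple

# One 8-dim sign pattern per action; the list index is the action id.
PATTERNS: List[List[float]] = [
    [1.0] * 8,               # ++++++++ -> action 0
    [1.0] * 4 + [-1.0] * 4,  # ++++---- -> action 1
    [-1.0] * 4 + [1.0] * 4,  # ----++++ -> action 2
    [-1.0] * 8,              # -------- -> action 3
]


def make_dataset(
    n: int, noise_std: float = 0.1, one_hot: bool = False, seed: int = 0
) -> List[Tuple[List[float], List[int]]]:
    """Generate (state_features, label) pairs for n random actions."""
    rng = random.Random(seed)
    num_actions = len(PATTERNS)
    data = []
    for _ in range(n):
        action = rng.randrange(num_actions)
        # Add Gaussian noise to emulate a stochastic environment.
        state = [x + rng.gauss(0.0, noise_std) for x in PATTERNS[action]]
        if one_hot:
            label = [1 if i == action else 0 for i in range(num_actions)]  # e.g. [1, 0, 0, 0]
        else:
            label = [action]  # e.g. [0]
        data.append((state, label))
    return data
```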
Summary:
Pull Request resolved: facebookresearch#600

Add missing init file in reagent/prediction/cfeval/

Reviewed By: czxttkl

Differential Revision: D33795738

fbshipit-source-id: bee4f88bfce9aa21af81db1eb96843706c07afeb
Summary: as titled

Reviewed By: wenwei202

Differential Revision: D33796163

fbshipit-source-id: 8b9480c71f6f174b05bcf8d95b9313760a86d1aa
Summary:
Pull Request resolved: facebookresearch#601

as titled

Reviewed By: PavlosApo

Differential Revision: D33802718

fbshipit-source-id: 2c2668a1bcddfe706c6303c80544f997356af417
Reviewed By: daniellepintz

Differential Revision: D33848208

fbshipit-source-id: ccd590d0286cb2bd2f381e5003bba230c9406b58
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D33900623

alexnikulkov added a commit to alexnikulkov/ReAgent that referenced this pull request Jan 31, 2022
Summary:
Pull Request resolved: facebookresearch#602

Attempt to fix the test timing out on CircleCI

Differential Revision: D33900623

fbshipit-source-id: ef9bb3a44ea726df9ba3e82a94fa8467704abe0a

Summary:
Pull Request resolved: facebookresearch#602

Attempt to fix the test timing out on CircleCI

Differential Revision: D33900623

fbshipit-source-id: a0634539412d14021145c454582814669f5308ef