[ENH] EXPERIMENTAL: Example notebook based on the new data pipeline #1813
Hi @fkiraly, I am getting this error: I just downloaded the notebook from Colab and pasted it into the repo; is there anything else I should do to avoid this? Really have no idea 😅
Codecov Report

❌ Patch coverage is

@@            Coverage Diff            @@
##             main    #1813   +/-   ##
=======================================
  Coverage        ?   85.59%
=======================================
  Files           ?       68
  Lines           ?     6597
  Branches        ?        0
=======================================
  Hits            ?     5647
  Misses          ?      950
  Partials        ?        0
Thank you for the review @xandie985!
Right now we assume that the data fits in memory, but yes, in the future we plan to add features like chunking, on-demand loading, etc.
These are some open questions we still need to work on. We will tackle them in future iterations, once an end-to-end prototype is ready and we have feedback from users of the package on that prototype.
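To make the chunking/on-demand loading idea above concrete, here is a minimal sketch. This is not implemented in the package; the function name and shape are assumptions, and it simply reuses pandas' built-in chunked CSV reader so the full dataset never has to sit in memory at once.

```python
import io

import numpy as np
import pandas as pd


def iter_chunks(source, chunk_size=1000):
    """Yield DataFrame chunks of at most `chunk_size` rows from a CSV source.

    Sketch only: a future on-demand loader in the package may look nothing
    like this; the point is reading data piecewise instead of all at once.
    """
    for chunk in pd.read_csv(source, chunksize=chunk_size):
        yield chunk


# Toy usage: a 100-row frame read back in chunks of 30 rows.
csv_buffer = io.StringIO(pd.DataFrame({"y": np.arange(100)}).to_csv(index=False))
chunk_sizes = [len(chunk) for chunk in iter_chunks(csv_buffer, chunk_size=30)]
print(chunk_sizes)  # [30, 30, 30, 10]
```

A real implementation would likely wrap this in a `torch.utils.data` dataset so batches are materialized lazily, but that design is exactly one of the open questions mentioned above.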
More detailed review.

- Please remove the install from the start of the notebook.
- We should test that this is running while we are working on v2. One way is to move the content to `docs/examples/tutorials`, the contents of which are automatically run and tested.
- The data generation cell is useful, but not too illustrative. Can you move the code to a function `load_toydata` or similar, in `pytorch_forecasting.data`, in a new module, e.g., `toydata`? Then we can also use this in testing later!
- Can you add basic markdown cells that explain what the notebook is showing, and what each step is? E.g., a summary of the multiple steps at the top, and then again small headers for the steps with minimal explanations.
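A hedged sketch of what the requested `load_toydata` helper could look like. The module location, function name, signature, and column names are all assumptions from this review thread, not the actual implementation:

```python
import numpy as np
import pandas as pd


def load_toydata(n_series=3, n_timesteps=50, seed=0):
    """Generate a small panel dataset: one noisy sine wave per series.

    Hypothetical sketch of the `load_toydata` helper suggested in review;
    the real function in `pytorch_forecasting.data.toydata` may differ.
    Returns a long-format frame with columns series_id, time_idx, y.
    """
    rng = np.random.default_rng(seed)
    frames = []
    for sid in range(n_series):
        t = np.arange(n_timesteps)
        y = np.sin(0.2 * t + sid) + 0.1 * rng.standard_normal(n_timesteps)
        frames.append(pd.DataFrame({"series_id": sid, "time_idx": t, "y": y}))
    return pd.concat(frames, ignore_index=True)


df = load_toydata()
print(df.shape)  # (150, 3): n_series * n_timesteps rows, 3 columns
```

Keeping the generator deterministic via a seed is what makes it reusable in tests later, as the review suggests.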
Thanks! I will make the changes accordingly. Just one doubt:
I think we can add it to
Makes sense to add it to the established location with the data loaders. Would it make sense to split the file up and have one loader per file? Need not be done in this PR.
Then I think we should create a new folder called
Maybe in a separate PR, though.
Great! Some minor change requests only.

- In the header: should "data pipeline" not be training and inference?
- Please remove unused imports.
- I would also suggest moving imports to the cells where they are used.
- At the start, can you summarize what is shown over the entire notebook in a few bullet points? Mostly just a list of the headers (table of contents and signposting).
- Can you explain usage of the important objects? Use markdown or in-line comments:
  - the most important arguments of objects, such as those of `TimeSeries`
  - explain and show the types and structure of important returns, such as `y_pred`
- I would split the last cell into multiple parts; too much is happening there.
- Rule of thumb: cells should be max 15 lines, and printouts max 10 lines. There should be descriptive content, even if very minimal, at the start of the cell or before it in a markdown cell.
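As an illustration of the kind of explanation the `y_pred` point above asks for, a hedged sketch; the array here is a NumPy stand-in, not the real v2 return type:

```python
import numpy as np

# Illustration only: `y_pred` is a stand-in array, not the actual v2 API
# return. The notebook would state the type and the meaning of each
# dimension next to the prediction call, e.g.:
y_pred = np.random.randn(4, 6, 1)  # shape: (batch_size, prediction_horizon, n_targets)

print(type(y_pred).__name__)  # ndarray
print(y_pred.shape)           # (4, 6, 1)
```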
I mean, it is a basic vignette of the "data pipeline", i.e., how data flow might look in v2. Should I add the words "training" and "inference" there as well? Model training is just there to "complete" the process.
"Data pipeline" is not accurate imo: people expect pre-processing or ETL when they hear that. But in fact this is the full basic workflow for actually using the neural networks for forecasting.
Hi @fkiraly, will this work?
yes, great!
Made some minor changes to the header to make this clearer.
Description
This PR adds an example notebook for the new v2 data pipeline vignette, containing a basic implementation of the TFT model using this version. For more info, see #1812, #1811.
Colab link: https://colab.research.google.com/drive/148MyhcNfYEh4CZ6vBXLqQNsUBF0n6_0v?usp=sharing