I have trouble understanding what the right and most elegant way is to write to an input directory in Google Cloud.
Example of a process that computes the reference sizes:
process compute_sizes {
    container 'biopython/biopython'

    input:
    path reference_dir
    val reference_fname

    script:
    """
    #!/usr/bin/env python3
    from Bio import SeqIO
    # Write one "<record id>:<sequence length>" line per FASTA record,
    # next to the reference file inside the staged input directory.
    with open("${reference_dir}/${reference_fname}.sizes", "w") as out_handle:
        with open("${reference_dir}/${reference_fname}", "r") as handle:
            for record in SeqIO.parse(handle, "fasta"):
                out_handle.write(f"{record.id}:{len(record.seq)}\\n")
    """
}
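To check the size computation on its own, outside Nextflow and the container, here is a minimal sketch. It assumes a hand-rolled FASTA reader in place of Bio.SeqIO so that it runs without Biopython. The function names fasta_sizes and write_sizes are hypothetical, and the output path is passed in explicitly rather than derived from the input directory.

```python
from pathlib import Path
import tempfile


def fasta_sizes(fasta_path):
    """Yield (record_id, sequence_length) for each record in a FASTA file."""
    record_id, length = None, 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if record_id is not None:
                    yield record_id, length
                # Use the first whitespace-delimited token after '>' as the id.
                parts = line[1:].split()
                record_id, length = (parts[0] if parts else ""), 0
            elif record_id is not None:
                length += len(line)
    # Emit the last record in the file.
    if record_id is not None:
        yield record_id, length


def write_sizes(fasta_path, out_path):
    """Write one '<id>:<length>' line per record to out_path."""
    with open(out_path, "w") as out_handle:
        for record_id, length in fasta_sizes(fasta_path):
            out_handle.write(f"{record_id}:{length}\n")


if __name__ == "__main__":
    # Small self-contained demo on a throwaway file.
    with tempfile.TemporaryDirectory() as tmp:
        fasta = Path(tmp) / "ref.fa"
        fasta.write_text(">chr1 test\nACGT\nAC\n>chr2\nGGG\n")
        sizes = Path(tmp) / "ref.fa.sizes"
        write_sizes(fasta, sizes)
        print(sizes.read_text(), end="")
```

Keeping the output path as a separate argument makes it easy to point the script at the task's working directory instead of the staged input directory when experimenting.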
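The output working directory contents can be listed with gsutil ls -r gs://BUCKET/workdir. The original reference directory does not get updated, and nothing I tried adding to the process definition changes that.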
The only way I have managed to do this is by publishing to the input directory. However, I cannot pass the input directory to the publishDir directive as is: when it is declared as a path input, the variable refers to the mounted softlink and not to the actual GS location. So the only way is to pass the directory to the process a second time, as a val instead of a path, like so:
process compute_sizes {
    container 'biopython/biopython'
    // Publish back to the real GS location, which is passed in as a plain value
    publishDir "${REFERENCE_DIRECTORY_AGAIN}", mode: "copy", saveAs: { file(it).getName() }

    input:
    path reference_dir
    val reference_fname
    val REFERENCE_DIRECTORY_AGAIN

    output:
    path "${reference_dir}/*.sizes", includeInputs: false

    script:
    """
    #!/usr/bin/env python3
    from Bio import SeqIO
    # Write one "<record id>:<sequence length>" line per FASTA record
    with open("${reference_dir}/${reference_fname}.sizes", "w") as out_handle:
        with open("${reference_dir}/${reference_fname}", "r") as handle:
            for record in SeqIO.parse(handle, "fasta"):
                out_handle.write(f"{record.id}:{len(record.seq)}\\n")
    """
}
Maybe it is my OCD, but this approach is ugly, hard to maintain, and requires experience to get right. There is no documentation about this bug or the workaround. So I have to ask fellow developers: what am I missing here? Thank you all in advance for your answers!