This repository may contain references to JUMP profile data that need to be updated to reflect the new directory structure.
Context
The JUMP Cell Painting profiles have been reorganized to a new, cleaner structure. See jump-cellpainting/datasets#155 for details.
Required Changes
Your repository may contain references to the old profile paths that need to be updated:
Old → New Path Mappings
- /workspace/profiles/jump-profiling-recipe_2024_a917fa7/ORF/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet → /workspace/profiles_assembled/ORF/v1.0a/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet
- /workspace/profiles/jump-profiling-recipe_2024_a917fa7/ORF/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/profiles_wellpos_cc_var_mad_outlier.parquet → /workspace/profiles_assembled/ORF/v1.0a/profiles_wellpos_cc_var_mad_outlier.parquet
- /workspace/profiles/jump-profiling-recipe_2024_a917fa7/CRISPR/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony_PCA_corrected/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony_PCA_corrected.parquet → /workspace/profiles_assembled/CRISPR/v1.0a/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony_PCA_corrected.parquet
- /workspace/profiles/jump-profiling-recipe_2024_a917fa7/CRISPR/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony_PCA_corrected/profiles_wellpos_cc_var_mad_outlier.parquet → /workspace/profiles_assembled/CRISPR/v1.0a/profiles_wellpos_cc_var_mad_outlier.parquet
- /workspace/profiles/jump-profiling-recipe_2024_a917fa7/COMPOUND/profiles_var_mad_int_featselect_harmony/profiles_var_mad_int_featselect_harmony.parquet → /workspace/profiles_assembled/COMPOUND/v1.0/profiles_var_mad_int_featselect_harmony.parquet
- /workspace/profiles/jump-profiling-recipe_2024_a917fa7/COMPOUND/profiles_var_mad_int_featselect_harmony/profiles_var_mad_int.parquet → /workspace/profiles_assembled/COMPOUND/v1.0/profiles_var_mad_int.parquet
- /workspace/profiles/jump-profiling-recipe_2024_0224e0f/ALL/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet → /workspace/profiles_assembled/ALL/v1.0b/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet
- /workspace/profiles/jump-profiling-recipe_2024_0224e0f/ALL/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/profiles_wellpos_cc_var_mad_outlier_featselect.parquet → /workspace/profiles_assembled/ALL/v1.0b/profiles_wellpos_cc_var_mad_outlier_featselect.parquet
Update Script
The following AWK script by @afermg provides a more comprehensive solution that handles all profile paths generically:
Create a file named update_cpg_location.awk:
# Update the paths of cpg files
# /workspace/profiles/jump-profiling-recipe_2024_a917fa7/ORF/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet
# Is converted to
# /workspace/profiles_assembled/ORF/v1.0a/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet
BEGIN {
    pattern = "/workspace/profiles/jump-profiling-recipe_2024_[a-z0-9]{7}/([A-Z]+)/.+/(.+[.]parquet)";
}
{
    # Three-argument match() is a gawk extension.
    # captures[1] is the subset (ORF, CRISPR, ...); captures[2] is the file name.
    if (match($0, pattern, captures)) {
        # ORF and CRISPR map to v1.0a, ALL to v1.0b, anything else to v1.0.
        version_name = "v1.0";
        if (captures[1] == "ORF" || captures[1] == "CRISPR") {
            version_name = version_name "a";
        }
        if (captures[1] == "ALL") {
            version_name = version_name "b";
        }
        replacement = "/workspace/profiles_assembled/" captures[1] "/" version_name "/" captures[2];
        gsub(pattern, replacement);
    }
    print $0
}
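Where gawk is not available, the same rewrite can be approximated with sed. The following is a minimal sketch, not part of @afermg's script: the function name rewrite_profile_paths is hypothetical, and it assumes exactly one directory level between the subset name and the .parquet file, as in the mappings listed above. It reads text on standard input and prints the rewritten text.

```shell
# Sketch: rewrite old JUMP profile paths read from stdin and print the result.
# rewrite_profile_paths is a hypothetical helper name (an assumption, not from
# the original script). Assumes a single directory between subset and file.
rewrite_profile_paths() {
    old='/workspace/profiles/jump-profiling-recipe_2024_[a-z0-9]{7}'
    new='/workspace/profiles_assembled'
    # Order matters: the suffixed versions are handled first, then every
    # remaining subset (e.g. COMPOUND) falls through to plain v1.0.
    sed -E \
        -e "s#${old}/(ORF|CRISPR)/[^/]+/([^/]+[.]parquet)#${new}/\1/v1.0a/\2#g" \
        -e "s#${old}/(ALL)/[^/]+/([^/]+[.]parquet)#${new}/\1/v1.0b/\2#g" \
        -e "s#${old}/([A-Z]+)/[^/]+/([^/]+[.]parquet)#${new}/\1/v1.0/\2#g"
}
```

For example, `rewrite_profile_paths < analysis.py > analysis.py.new` (file names are illustrative) writes a rewritten copy and leaves the original untouched, so the two can be compared before replacing anything.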
To update all relevant files in your codebase:
# Find and update all files containing old profile paths
rg "workspace/profiles/jump-profiling-recipe_2024" -t py -t json -t md -t sh -t org -t csv -t nix -l | xargs awk -i inplace -f update_cpg_location.awk
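Note: This script requires GNU awk (gawk), because the three-argument match() and -i inplace are both gawk extensions. On macOS, install it with brew install gawk and use gawk instead of awk in the command above. The same applies on any other system whose default awk is not gawk.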
This command:
- Uses ripgrep (rg) to find files containing the old paths
- -t selects specific file formats
- -l provides a list of files only
- awk -i inplace modifies files in place
Important: After running the AWK script, always review the changes with git diff to ensure the transformations were applied correctly. The script handles most cases, but edge cases or typos in the original paths may require manual adjustment.
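As a complement to reviewing git diff, a quick scan can confirm whether any old-style paths remain after the update. This is a minimal sketch using plain grep; the function name find_stale_profile_paths is hypothetical, and the search string is the same one passed to rg in the command above.

```shell
# Sketch: list files under a directory (default: current directory) that still
# mention the old profile layout. Returns 1 if any are found, 0 otherwise.
# find_stale_profile_paths is a hypothetical helper name.
find_stale_profile_paths() {
    dir="${1:-.}"
    # -r recurse, -l print file names only, -F treat the pattern as a fixed string
    if grep -rlF 'workspace/profiles/jump-profiling-recipe_2024' "$dir"; then
        return 1
    fi
    return 0
}
```

Note that update_cpg_location.awk itself contains the old path prefix, so it will appear in the output if it is stored inside the directory being scanned.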
Additional Note
If your repository also references manifests/profile_index.csv, note that the format has changed from CSV to JSON. See jump-cellpainting/datasets#152 and jump-cellpainting/datasets#155 for details.
Action Required
Please update your code to use the new profile paths. The old paths will be deprecated.
Feel free to reach out if you have any questions or need assistance with the migration.