<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Sam to Metacat Conversion guide — DataCatalogDocs 0.2 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/nature.css" />
<link rel="stylesheet" type="text/css" href="_static/graphviz.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<link rel="canonical" href="https://dune.github.io/DataCatalogDocs/sam2metacat.html" />
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Metadata categories and samweb->metacat conversion" href="metadatameaning.html" />
<link rel="prev" title="Introduction" href="Intro.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="metadatameaning.html" title="Metadata categories and samweb->metacat conversion"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="Intro.html" title="Introduction"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">DataCatalogDocs 0.2 documentation</a> »</li>
<li class="nav-item nav-item-this"><a href="">Sam to Metacat Conversion guide</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="sam-to-metacat-conversion-guide">
<h1>Sam to Metacat Conversion guide<a class="headerlink" href="#sam-to-metacat-conversion-guide" title="Permalink to this heading">¶</a></h1>
<blockquote>
<div><p>This document collects examples of <cite>sam</cite> queries gathered from DUNE dataset definitions, together with their <cite>metacat</cite> translations.</p>
</div></blockquote>
<div class="section" id="get-metacat-started">
<h2>Get metacat started<a class="headerlink" href="#get-metacat-started" title="Permalink to this heading">¶</a></h2>
<p>First find the documentation:</p>
<p><a class="reference external" href="https://metacat.readthedocs.io/en/latest/index.html">https://metacat.readthedocs.io/en/latest/index.html</a></p>
<p>metacat is a <cite>ups</cite> product, so you can set it up with:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">source</span><span class="w"> </span>/cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh
setup<span class="w"> </span>python<span class="w"> </span>v3_9_2<span class="w"> </span><span class="c1"># this avoids system python which may be very old</span>
setup<span class="w"> </span>metacat
</pre></div>
</div>
<p>but you can also do a local install using:</p>
<p><a class="reference external" href="https://metacat.readthedocs.io/en/latest/ui.html#installation">https://metacat.readthedocs.io/en/latest/ui.html#installation</a></p>
<p>Make sure your environment points to the metacat servers:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">METACAT_AUTH_SERVER_URL</span><span class="o">=</span>https://metacat.fnal.gov:8143/auth/dune
<span class="nb">export</span><span class="w"> </span><span class="nv">METACAT_SERVER_URL</span><span class="o">=</span>https://metacat.fnal.gov:9443/dune_meta_prod/app
</pre></div>
</div>
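<p>Before authenticating, it can help to confirm that both variables are actually set in the current shell. The following is a minimal sketch; the function name <cite>check_metacat_env</cite> is hypothetical and is not part of metacat.</p>

```shell
# Hypothetical helper (not part of metacat): report whether the two
# server variables exported above are set in the current shell.
check_metacat_env() {
    missing=0
    for var in METACAT_AUTH_SERVER_URL METACAT_SERVER_URL; do
        # Indirect lookup of the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set"
            missing=1
        else
            echo "$var=$value"
        fi
    done
    # Non-zero status if at least one variable is missing.
    return $missing
}

check_metacat_env || echo "export the missing variables before running metacat"
```

<p>The function only inspects the environment; it does not contact either server.</p>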
<p>Then authenticate to metacat:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>metacat<span class="w"> </span>auth<span class="w"> </span>login<span class="w"> </span>-m<span class="w"> </span>password<span class="w"> </span><span class="nv">$USER</span>
Password:
User:<span class="w"> </span>schellma
Expires:<span class="w"> </span>Thu<span class="w"> </span>Oct<span class="w"> </span><span class="m">13</span><span class="w"> </span><span class="m">16</span>:27:29<span class="w"> </span><span class="m">2022</span>
</pre></div>
</div>
<p><em>Note: you can also authenticate by other methods, for example with an X.509 proxy:</em></p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>kx509
<span class="nb">export</span><span class="w"> </span><span class="nv">X509_USER_PROXY</span><span class="o">=</span>/tmp/x509up_u<span class="k">$(</span>id<span class="w"> </span>-u<span class="k">)</span>
<span class="nb">export</span><span class="w"> </span><span class="nv">X509_USER_KEY</span><span class="o">=</span><span class="nv">$X509_USER_PROXY</span>
metacat<span class="w"> </span>auth<span class="w"> </span>login<span class="w"> </span>-m<span class="w"> </span>x509<span class="w"> </span><span class="nv">$USER</span>
</pre></div>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>If you are not on a Fermilab machine, you may need to add your local credentials to the list of DNs and explicitly tell metacat your FNAL user id.</p>
<p>To do this, first print your DN:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>metacat<span class="w"> </span>auth<span class="w"> </span>mydn
</pre></div>
</div>
<ol class="arabic simple">
<li><p>Log in to the MetaCat GUI using your services password</p></li>
<li><p>Go to your user profile <a class="reference external" href="https://metacat.fnal.gov:9443/dune_prod/app/gui/user">https://metacat.fnal.gov:9443/dune_prod/app/gui/user</a>?username=&lt;yourFNALusername&gt;</p></li>
<li><p>Copy and paste the output of “metacat auth mydn” into the blank text box in front of the Add button</p></li>
<li><p>Click Add</p></li>
</ol>
<p>Then log in with your FNAL username:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>metacat<span class="w"> </span>auth<span class="w"> </span>login<span class="w"> </span>-m<span class="w"> </span>x509<span class="w"> </span>&lt;yourFNALusername&gt;
</pre></div>
</div>
</div>
</div>
<div class="section" id="example-get-the-raw-data-from-given-protodune-sp-detector-runs">
<h2>Example: Get the raw data from given protodune-sp detector runs<a class="headerlink" href="#example-get-the-raw-data-from-given-protodune-sp-detector-runs" title="Permalink to this heading">¶</a></h2>
<ul>
<li><p>samweb</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>samweb<span class="w"> </span>list-files<span class="w"> </span><span class="s2">"file_type detector and run_type 'protodune-sp'\</span>
<span class="s2"> and data_tier raw and data_stream physics and run_number 5141,5143"</span>
</pre></div>
</div>
<p>Add <cite>--summary</cite> if you want to know how many files there are.</p>
</li>
<li><p>metacat</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>metacat<span class="w"> </span>query<span class="w"> </span><span class="s2">"files from dune:all where core.file_type=detector \</span>
<span class="s2"> and core.run_type='protodune-sp' and core.data_tier=raw \</span>
<span class="s2"> and core.data_stream=physics and core.runs[any] in (5141,5143)"</span>
</pre></div>
</div>
</div></blockquote>
<p>Add <cite>--summary</cite> after <cite>query</cite> if you only want the number of files.</p>
<p><em>Notes:</em></p>
<ul class="simple">
<li><p><em>many of the metadata fields are now grouped into categories such as <cite>core</cite></em></p></li>
<li><p><em>queries run faster if you ask for files from a known dataset such as <cite>dune:all</cite></em></p></li>
<li><p><em><cite>core.runs[any]</cite> tests each of the runs associated with a file and matches if any one of them satisfies the condition</em></p></li>
<li><p><em><cite>core.runs[any] in (5141, 5142, 5147)</cite> matches files containing any of these 3 runs; use the <cite>in (X,Y)</cite> syntax whenever you want multiple runs</em></p></li>
<li><p><em><cite>core.runs[any] = 5141</cite> selects a single run and is equivalent to <cite>5141 in core.runs</cite></em></p></li>
</ul>
</li>
</ul>
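<p>When the run list changes often, the query string can be assembled from a shell variable. This is a minimal sketch that only builds and prints the same query as above; the variable names are illustrative, and the final <cite>metacat query</cite> call is left as a comment because it needs a working, authenticated metacat setup.</p>

```shell
# Run numbers taken from the example above; edit the list as needed.
runs="5141 5143"

# Join the space-separated run numbers with commas for the MQL in (...) clause.
run_list=$(echo $runs | tr ' ' ',')

# Assemble the same query as above, with the run list substituted in.
query="files from dune:all where core.file_type=detector \
and core.run_type='protodune-sp' and core.data_tier=raw \
and core.data_stream=physics and core.runs[any] in ($run_list)"

echo "$query"

# With metacat set up and authenticated, the query could then be run with:
#   metacat query "$query"
```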
</div>
<div class="section" id="example-save-a-dataset-or-definition-query">
<h2>Example: Save a dataset or definition query<a class="headerlink" href="#example-save-a-dataset-or-definition-query" title="Permalink to this heading">¶</a></h2>
<p>If you are interested in all physics data from <cite>protodune-sp</cite>, you may want to save a generic dataset or query that you can reuse in further filtered queries. As you narrow things down, you can then build additional datasets.</p>
<ul>
<li><p><em>samweb</em></p>
<p>In sam you save a definition, which is a stored query:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>samweb<span class="w"> </span>create-definition<span class="w"> </span>schellma-protodune-sp-physics-generic<span class="w"> </span><span class="se">\</span>
<span class="s2">"file_type detector and run_type 'protodune-sp' and data_stream physics"</span>
</pre></div>
</div>
<p>You can then ask for:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>samweb<span class="w"> </span>list-files<span class="w"> </span><span class="s2">"defname:schellma-protodune-sp-physics-generic \</span>
<span class="s2"> and data_tier raw and run_number 5141"</span><span class="w"> </span>--summary
</pre></div>
</div>
</div></blockquote>
<p><em>Note: a sam definition is a query, not a list of files, so its contents can change, for example when more data are added. To get a list that does not change you need to make a <cite>snapshot</cite>.</em></p>
<p><em>Another note: sam also prepends the user name to the definition so that you can’t interfere with official queries. In metacat this is handled by namespaces.</em></p>
</li>
<li><p>metacat</p>
<p>To run an MQL query and create a new dataset from the query results:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>metacat<span class="w"> </span>dataset<span class="w"> </span>create<span class="w"> </span>-f<span class="w"> </span><span class="s2">"files from dune:all where \</span>
<span class="s2">..."</span><span class="w"> </span>&lt;dataset_namespace&gt;:&lt;dataset_name&gt;
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>metacat<span class="w"> </span>dataset<span class="w"> </span>create<span class="w"> </span>-f<span class="w"> </span>@file_with_mql_query.txt<span class="w"> </span><span class="se">\</span>
&lt;dataset_namespace&gt;:&lt;dataset_name&gt;<span class="w"> </span>&lt;dataset<span class="w"> </span>description&gt;
</pre></div>
</div>
</div></blockquote>
<p>To run a query and add matching files to an existing dataset:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>metacat<span class="w"> </span>dataset<span class="w"> </span>add-files<span class="w"> </span>-q<span class="w"> </span><span class="s2">"files from dune:all where ..."</span><span class="w"> </span>&lt;dataset_namespace&gt;:&lt;dataset_name&gt;
metacat<span class="w"> </span>dataset<span class="w"> </span>add-files<span class="w"> </span>-q<span class="w"> </span>@file_with_mql_query.txt<span class="w"> </span>&lt;dataset_namespace&gt;:&lt;dataset_name&gt;
</pre></div>
</div>
<p>Check the result by counting the files in the dataset and showing its details:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>metacat<span class="w"> </span>query<span class="w"> </span>-s<span class="w"> </span><span class="s2">"files from schellma:protodune-sp-physics-generic"</span>
metacat<span class="w"> </span>dataset<span class="w"> </span>show<span class="w"> </span>schellma:protodune-sp-physics-generic
children<span class="w"> </span>:
created_timestamp<span class="w"> </span>:<span class="w"> </span><span class="m">2022</span>-10-08<span class="w"> </span><span class="m">11</span>:41:54
creator<span class="w"> </span>:<span class="w"> </span>schellma
description<span class="w"> </span>:<span class="w"> </span>files<span class="w"> </span>from<span class="w"> </span>dune:all<span class="w"> </span>where<span class="w"> </span>core.file_type<span class="o">=</span>detector<span class="w"> </span>and<span class="w"> </span>core.run_type<span class="o">=</span><span class="s1">'protodune-sp'</span><span class="w"> </span>and<span class="w"> </span>core.data_stream<span class="o">=</span>physics
file_count<span class="w"> </span>:<span class="w"> </span><span class="m">772631</span>
file_meta_requirements<span class="w"> </span>:<span class="w"> </span><span class="o">{}</span>
frozen<span class="w"> </span>:<span class="w"> </span>False
metadata<span class="w"> </span>:<span class="w"> </span><span class="o">{}</span>
monotonic<span class="w"> </span>:<span class="w"> </span>False
name<span class="w"> </span>:<span class="w"> </span>protodune-sp-physics-generic
namespace<span class="w"> </span>:<span class="w"> </span>schellma
parents<span class="w"> </span>:
</pre></div>
</div>
<p>You can then ask for the subset from a particular data tier and run number.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>metacat<span class="w"> </span>query<span class="w"> </span><span class="s2">"files from schellma:protodune-sp-physics-generic \</span>
<span class="s2">where core.runs[any]=5141 and core.data_tier=raw"</span>
</pre></div>
</div>
</li>
</ul>
</div>
<div class="section" id="find-only-the-files-not-processed-with-a-version-of-code">
<h2>Find only the files not processed with a version of code<a class="headerlink" href="#find-only-the-files-not-processed-with-a-version-of-code" title="Permalink to this heading">¶</a></h2>
<ul>
<li><p>samweb</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>samweb<span class="w"> </span>list-files<span class="w"> </span><span class="s2">"defname:schellma-protodune-sp-physics-generic \</span>
<span class="s2"> and data_tier raw and run_number 5141 minus \</span>
<span class="s2"> isparentof:(defname:schellma-protodune-sp-physics-generic\</span>
<span class="s2"> and data_tier 'full-reconstructed' and run_number 5141 and version v08_27_% )"</span><span class="w"> </span>--summary
File<span class="w"> </span>count:<span class="w"> </span><span class="m">12</span>
Total<span class="w"> </span>size:<span class="w"> </span><span class="m">95354212618</span>
Event<span class="w"> </span>count:<span class="w"> </span><span class="m">1241</span>
</pre></div>
</div>
</li>
<li><p>metacat</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>metacat<span class="w"> </span>query<span class="w"> </span>-s<span class="w"> </span><span class="s2">"files from schellma:protodune-sp-physics-generic \</span>
<span class="s2">where core.data_tier=raw and 5141 in core.runs - parents(files \</span>
<span class="s2">from schellma:protodune-sp-physics-generic where 5141 in core.runs \</span>
<span class="s2">and core.data_tier='full-reconstructed' and core.application.version~'v08_27_.*')"</span>
<span class="m">12</span><span class="w"> </span>files
</pre></div>
</div>
</li>
</ul>
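<p>The metacat query above is built from two file selections joined by a set difference. As a sketch, it can be assembled from shell variables so that the dataset, run, and version pattern are easy to change. The variable names are illustrative and the values are the ones from the example; the snippet only builds and prints the query string.</p>

```shell
# Values taken from the example above; the variable names are illustrative.
dataset="schellma:protodune-sp-physics-generic"
run=5141
version_pattern="v08_27_.*"

# Raw files for the run in the dataset.
raw_files="files from $dataset where core.data_tier=raw and $run in core.runs"

# Reconstructed files for the same run whose application version matches the pattern.
reco_files="files from $dataset where $run in core.runs \
and core.data_tier='full-reconstructed' and core.application.version~'$version_pattern'"

# Raw files minus the parents of the matching reconstructed files.
query="$raw_files - parents($reco_files)"

echo "$query"

# With metacat set up and authenticated, the count could then be requested with:
#   metacat query -s "$query"
```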
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Sam to Metacat Conversion guide</a><ul>
<li><a class="reference internal" href="#get-metacat-started">Get metacat started</a></li>
<li><a class="reference internal" href="#example-get-the-raw-data-from-given-protodune-sp-detector-runs">Example: Get the raw data from given protodune-sp detector runs</a></li>
<li><a class="reference internal" href="#example-save-a-dataset-or-definition-query">Example: Save a dataset or definition query</a></li>
<li><a class="reference internal" href="#find-only-the-files-not-processed-with-a-version-of-code">Find only the files not processed with a version of code</a></li>
</ul>
</li>
</ul>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Intro.html"
title="previous chapter">Introduction</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="metadatameaning.html"
title="next chapter">Metadata categories and samweb->metacat conversion</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/sam2metacat.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer" role="contentinfo">
© Copyright 2023, Fermi National Accelerator Laboratory.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>