|
31 | 31 | "Making a model involves providing a model specification. A model specification describes:\n", |
32 | 32 | "- model meta-data\n", |
33 | 33 | "- privacy protection and other parameter values\n", |
34 | | - "- data sources for reference data\n", |
| 34 | + "- datasources for reference data\n", |
35 | 35 | "- what are the random variables and their states\n", |
36 | 36 | "- what are the cross-tables\n", |
37 | 37 | "- what are the entities, their fields, and relationships\n", |
38 | 38 | "- simulation parameters and values.\n", |
39 | 39 | "\n", |
40 | 40 | "The steps to make a model are as follows.\n", |
41 | | - "1. Datasets specified in the model specification will be loaded based on the specified data sources.\n", |
| 41 | + "1. Datasets specified in the model specification will be loaded based on the specified datasources.\n", |
42 | 42 | "2. Clean cross-tables are computed.\n", |
43 | 43 | "3. Privacy protection is applied to create noisy cross-tables.\n", |
44 | 44 | "4. The noisy cross-tables are used to create a probabilistic graphical model (PGM) for each entity.\n", |
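
Steps 2 and 3 above can be sketched in a few lines. This is only an illustration under stated assumptions: the notebook does not say which privacy mechanism is used, so the Laplace noise below, the clamping of negative counts, and the helper names `clean_cross_table` and `noisy_cross_table` are all hypothetical and not part of the library.

```python
import random
from collections import Counter


def clean_cross_table(rows, rvs):
    """Count how often each combination of states of `rvs` occurs in `rows`."""
    return Counter(tuple(row[rv] for rv in rvs) for row in rows)


def noisy_cross_table(clean, epsilon, rng):
    """Add Laplace(0, 1/epsilon) noise to every count.

    The Laplace mechanism is an assumption made for this sketch; the
    notebook only states that privacy protection is applied.
    """
    scale = 1.0 / epsilon
    noisy = {}
    for cell, count in clean.items():
        # The difference of two exponential draws is Laplace-distributed.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        # Clamping negative counts to zero is also an assumption.
        noisy[cell] = max(0.0, count + noise)
    return noisy


# Toy reference data with the three random variables used in the demo spec.
rows = [
    {"X": 0, "Y": 1, "Z": 0},
    {"X": 0, "Y": 1, "Z": 1},
    {"X": 1, "Y": 0, "Z": 1},
]

clean = clean_cross_table(rows, ["X", "Y"])
noisy = noisy_cross_table(clean, epsilon=0.1, rng=random.Random(0))
```

A smaller `epsilon` gives a larger noise scale, so the noisy counts drift further from the clean ones.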
|
47 | 47 | "The result of making a model is a collection of files which are placed into a model definition folder. The files are:\n", |
48 | 48 | "1. a JSON definition of the model ('model_spec.json')\n", |
49 | 49 | "2. a JSON definition of an index of model components ('model_index.json')\n", |
50 | | - "2. a JSON definition of the synthetic data simulator ('simulator_spec.json')\n", |
| 50 | + "3. a JSON definition of the synthetic data simulator ('simulator_spec.json')\n", |
51 | 51 | "4. Compiled Knowledge PGMs for each entity ('pgms/{_entity_}.py')\n", |
52 | 52 | "5. noisy cross-tables, if requested for saving ('noisy_cross_tables/{_cross_table_}.pk')\n", |
53 | 53 | "6. clean cross-tables, if requested for saving ('clean_cross_tables/{_cross_table_}.pk')\n", |
|
121 | 121 | "start_time": "2025-11-11T03:49:36.477856Z" |
122 | 122 | }, |
123 | 123 | "execution": { |
124 | | - "iopub.execute_input": "2025-11-19T09:29:28.778980Z", |
125 | | - "iopub.status.busy": "2025-11-19T09:29:28.778980Z", |
126 | | - "iopub.status.idle": "2025-11-19T09:29:28.788590Z", |
127 | | - "shell.execute_reply": "2025-11-19T09:29:28.788590Z" |
| 124 | + "iopub.execute_input": "2025-11-19T21:35:50.091245Z", |
| 125 | + "iopub.status.busy": "2025-11-19T21:35:50.091245Z", |
| 126 | + "iopub.status.idle": "2025-11-19T21:35:50.100528Z", |
| 127 | + "shell.execute_reply": "2025-11-19T21:35:50.100528Z" |
128 | 128 | } |
129 | 129 | }, |
130 | 130 | "outputs": [ |
|
190 | 190 | "start_time": "2025-11-11T03:49:36.499060Z" |
191 | 191 | }, |
192 | 192 | "execution": { |
193 | | - "iopub.execute_input": "2025-11-19T09:29:28.790595Z", |
194 | | - "iopub.status.busy": "2025-11-19T09:29:28.790595Z", |
195 | | - "iopub.status.idle": "2025-11-19T09:29:29.013056Z", |
196 | | - "shell.execute_reply": "2025-11-19T09:29:29.013056Z" |
| 193 | + "iopub.execute_input": "2025-11-19T21:35:50.102532Z", |
| 194 | + "iopub.status.busy": "2025-11-19T21:35:50.102532Z", |
| 195 | + "iopub.status.idle": "2025-11-19T21:35:50.320792Z", |
| 196 | + "shell.execute_reply": "2025-11-19T21:35:50.320792Z" |
197 | 197 | } |
198 | 198 | }, |
199 | 199 | "outputs": [ |
|
233 | 233 | " }\n", |
234 | 234 | " },\n", |
235 | 235 | " \"rvs\": {\n", |
236 | | - " \"X\": {\n", |
| 236 | + " \"Z\": {\n", |
237 | 237 | " \"states\": \"infer_distinct\",\n", |
238 | 238 | " \"ensure_none\": false\n", |
239 | 239 | " },\n", |
240 | 240 | " \"Y\": {\n", |
241 | 241 | " \"states\": \"infer_distinct\",\n", |
242 | 242 | " \"ensure_none\": false\n", |
243 | 243 | " },\n", |
244 | | - " \"Z\": {\n", |
| 244 | + " \"X\": {\n", |
245 | 245 | " \"states\": \"infer_distinct\",\n", |
246 | 246 | " \"ensure_none\": false\n", |
247 | 247 | " }\n", |
248 | 248 | " },\n", |
249 | 249 | " \"crosstabs\": {\n", |
250 | | - " \"_X\": {\n", |
| 250 | + " \"_Z\": {\n", |
251 | 251 | " \"rvs\": [\n", |
252 | | - " \"X\"\n", |
| 252 | + " \"Z\"\n", |
253 | 253 | " ],\n", |
254 | 254 | " \"datasource\": \"xyz\",\n", |
255 | 255 | " \"epsilon\": 0.1,\n", |
|
265 | 265 | " \"min_cell_size\": 0.0,\n", |
266 | 266 | " \"max_add_rows\": 1000000\n", |
267 | 267 | " },\n", |
268 | | - " \"_Z\": {\n", |
| 268 | + " \"_X\": {\n", |
269 | 269 | " \"rvs\": [\n", |
270 | | - " \"Z\"\n", |
| 270 | + " \"X\"\n", |
271 | 271 | " ],\n", |
272 | 272 | " \"datasource\": \"xyz\",\n", |
273 | 273 | " \"epsilon\": 0.1,\n", |
|
281 | 281 | " \"count_field_name\": \"_count_\",\n", |
282 | 282 | " \"foreign_field_name\": null,\n", |
283 | 283 | " \"fields\": {\n", |
284 | | - " \"X\": {\n", |
| 284 | + " \"Z\": {\n", |
285 | 285 | " \"type\": \"sample\",\n", |
286 | | - " \"rv_name\": \"X\"\n", |
| 286 | + " \"rv_name\": \"Z\"\n", |
287 | 287 | " },\n", |
288 | 288 | " \"Y\": {\n", |
289 | 289 | " \"type\": \"sample\",\n", |
290 | 290 | " \"rv_name\": \"Y\"\n", |
291 | 291 | " },\n", |
292 | | - " \"Z\": {\n", |
| 292 | + " \"X\": {\n", |
293 | 293 | " \"type\": \"sample\",\n", |
294 | | - " \"rv_name\": \"Z\"\n", |
| 294 | + " \"rv_name\": \"X\"\n", |
295 | 295 | " }\n", |
296 | 296 | " },\n", |
297 | 297 | " \"cardinality\": [],\n", |
|
320 | 320 | "source": [ |
321 | 321 | "This spec file defines one datasource, \"xyz\" that includes three random variables, \"X\", \"Y\" and \"Z\".\n", |
322 | 322 | "\n", |
323 | | - "System random variable can be explicitly defined in a spec file using an \"rvs\" section. The demo `spec_tiny.py` does not include this section so one is internally created with a random variable defined for all random variables seen in all data sources. Each random variable definition needs to define the possible states of the random variable. Including `states: infer_distinct` at the top level of spec file dictionary means that `states: infer_distinct` will be inherited for every random variable definition. The value `infer_distinct` means that the possible values of a random variable will be defined as \"all distinct values seen for that random variable in the datasources.\"\n", |
| 323 | + "System random variables can be explicitly defined in a spec file using section \"rvs\". The demo `spec_tiny.py` does not include this section so one is internally created with a random variable defined for all random variables seen in all datasources. Each random variable definition needs to define the possible states of the random variable. Including `states: infer_distinct` at the top level of spec file dictionary means that `states: infer_distinct` will be inherited for every random variable definition. The value `infer_distinct` means that the possible values of a random variable will be defined as \"all distinct values seen for that random variable in the datasources.\"\n", |
324 | 324 | "\n" |
325 | 325 | ] |
326 | 326 | }, |
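
The meaning of `infer_distinct` described above can be illustrated with a small sketch. It assumes each datasource is a list of row dictionaries and uses a hypothetical helper, `infer_distinct_states`, that is not part of the library; sorting the states is a choice made here for readability only.

```python
def infer_distinct_states(datasources):
    """Map each random variable to the distinct values seen for it.

    `datasources` maps a datasource name to a list of row dictionaries
    (an assumed layout for this sketch).
    """
    seen = {}
    for rows in datasources.values():
        for row in rows:
            for rv, value in row.items():
                seen.setdefault(rv, set()).add(value)
    # Sort so that the resulting state lists are deterministic.
    return {rv: sorted(values) for rv, values in seen.items()}


# One datasource, "xyz", holding three random variables.
datasources = {
    "xyz": [
        {"X": 0, "Y": "a", "Z": False},
        {"X": 1, "Y": "a", "Z": True},
        {"X": 0, "Y": "b", "Z": True},
    ]
}

states = infer_distinct_states(datasources)
```

Here every random variable seen in any datasource gets an entry, which mirrors how an "rvs" section is created internally when the spec file omits it.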
|
354 | 354 | "start_time": "2025-11-11T03:49:37.115213Z" |
355 | 355 | }, |
356 | 356 | "execution": { |
357 | | - "iopub.execute_input": "2025-11-19T09:29:29.015061Z", |
358 | | - "iopub.status.busy": "2025-11-19T09:29:29.015061Z", |
359 | | - "iopub.status.idle": "2025-11-19T09:29:29.019192Z", |
360 | | - "shell.execute_reply": "2025-11-19T09:29:29.019192Z" |
| 357 | + "iopub.execute_input": "2025-11-19T21:35:50.322911Z", |
| 358 | + "iopub.status.busy": "2025-11-19T21:35:50.322911Z", |
| 359 | + "iopub.status.idle": "2025-11-19T21:35:50.327511Z", |
| 360 | + "shell.execute_reply": "2025-11-19T21:35:50.327511Z" |
361 | 361 | } |
362 | 362 | }, |
363 | 363 | "outputs": [ |
|
394 | 394 | "start_time": "2025-11-11T03:49:37.127892Z" |
395 | 395 | }, |
396 | 396 | "execution": { |
397 | | - "iopub.execute_input": "2025-11-19T09:29:29.021198Z", |
398 | | - "iopub.status.busy": "2025-11-19T09:29:29.021198Z", |
399 | | - "iopub.status.idle": "2025-11-19T09:29:29.686709Z", |
400 | | - "shell.execute_reply": "2025-11-19T09:29:29.686709Z" |
| 397 | + "iopub.execute_input": "2025-11-19T21:35:50.329516Z", |
| 398 | + "iopub.status.busy": "2025-11-19T21:35:50.329516Z", |
| 399 | + "iopub.status.idle": "2025-11-19T21:35:50.992145Z", |
| 400 | + "shell.execute_reply": "2025-11-19T21:35:50.992145Z" |
401 | 401 | } |
402 | 402 | }, |
403 | 403 | "outputs": [ |
|