|
| 1 | +# ADLS Gen2 Parity Implementation Plan |
| 2 | + |
| 3 | +## Context |
| 4 | + |
| 5 | +Azurite currently has a **thin DFS proxy layer** (port 10004) that translates a small subset of ADLS Gen2 DFS REST API calls to Blob REST API calls via HTTP proxying (axios). This covers only filesystem (container) create/delete/HEAD and account listing. Full ADLS Gen2 parity requires native support for path (file/directory) operations, the append-then-flush write pattern, rename/move, ACLs, and list paths — none of which can be achieved by simple query-parameter rewriting. |
| 6 | + |
| 7 | +## Architectural Decision: Hybrid (Native DFS Handlers + Shared Stores) |
| 8 | + |
| 9 | +Replace the HTTP proxy with a **native Express pipeline** in the DFS server that directly accesses `IBlobMetadataStore` and `IExtentStore` — the same store instances used by the blob server. |
| 10 | + |
| 11 | +``` |
| 12 | +Port 10000 (Blob API) → Blob Handlers → IBlobMetadataStore + IExtentStore |
| 13 | +Port 10004 (DFS API) → DFS Handlers → same IBlobMetadataStore + IExtentStore |
| 14 | +``` |
| 15 | + |
| 16 | +**Why not keep proxying?** DFS operations like List Paths, Create Directory, Rename, ACLs, and append-then-flush have no single blob API equivalent. Proxying would require multi-call orchestration, lose atomicity, and add latency. |
| 17 | + |
| 18 | +### Directory Model |
| 19 | + |
| 20 | +Directories are stored as **zero-length BlockBlobs with `hdi_isfolder=true` metadata**, the same marker Azure exposes for directories through the Blob API. No separate table is needed. |
| 21 | + |
| 22 | +### ACL Storage |
| 23 | + |
| 24 | +New fields on `BlobModel`: `dfsAclOwner`, `dfsAclGroup`, `dfsAclPermissions`, `dfsAcl`. LokiJS is schemaless (just add fields); SQL needs ALTER TABLE. |
| 25 | + |
| 26 | +--- |
| 27 | + |
| 28 | +## Phase 0: Foundation — Shared Store Access & HNS Flag |
| 29 | + |
| 30 | +**Goal:** Wire DFS server to share stores with blob server; enable HNS mode. |
| 31 | + |
| 32 | +| File | Change | |
| 33 | +|------|--------| |
| 34 | +| `src/blob/utils/constants.ts` | Set `EMULATOR_ACCOUNT_ISHIERARCHICALNAMESPACEENABLED = true` (or make configurable) | |
| 35 | +| `src/blob/DfsProxyServer.ts` → rename to `DfsServer.ts` | Accept `IBlobMetadataStore` + `IExtentStore` in constructor | |
| 36 | +| `src/blob/DfsProxyConfiguration.ts` → rename to `DfsConfiguration.ts` | Remove upstream host/port fields (no longer proxying) | |
| 37 | +| `src/blob/BlobServer.ts` | Expose `metadataStore` and `extentStore` via public getters | |
| 38 | +| `src/azurite.ts` | Pass shared stores to both BlobServer and DfsServer | |
| 39 | +| `src/blob/main.ts` | Same wiring for standalone blob+dfs mode | |
| 40 | +| `src/blob/DfsRequestListenerFactory.ts` | Rewrite: replace axios proxy with native Express pipeline + DFS routing | |
| 41 | +| `src/blob/IBlobEnvironment.ts`, `BlobEnvironment.ts`, `src/common/Environment.ts`, `VSCEnvironment.ts` | Add `--enableHierarchicalNamespace` option | |
| 42 | + |
| 43 | +**Deliverable:** DFS server starts, shares data with blob, existing filesystem tests pass via direct store access. |
| 44 | + |
| 45 | +--- |
| 46 | + |
| 47 | +## Phase 1: Path CRUD + List Paths |
| 48 | + |
| 49 | +**Goal:** Create/delete/read files and directories, list paths — the core operations most ADLS Gen2 SDKs depend on. |
| 50 | + |
| 51 | +### New files to create |
| 52 | + |
| 53 | +| File | Purpose | |
| 54 | +|------|---------| |
| 55 | +| `src/blob/dfs/DfsContext.ts` | DFS request context (account, filesystem, path) — analogous to `BlobStorageContext` | |
| 56 | +| `src/blob/dfs/DfsOperation.ts` | Enum of DFS operations for dispatch | |
| 57 | +| `src/blob/dfs/DfsDispatchMiddleware.ts` | Routes requests by `resource` param, `action` param, method, and headers | |
| 58 | +| `src/blob/dfs/DfsErrorFactory.ts` | JSON error responses (`PathNotFound`, `DirectoryNotEmpty`, etc.) | |
| 59 | +| `src/blob/dfs/DfsSerializer.ts` | JSON response serialization (DFS uses JSON, not XML) | |
| 60 | +| `src/blob/dfs/handlers/FilesystemHandler.ts` | Filesystem ops → container store operations | |
| 61 | +| `src/blob/dfs/handlers/PathHandler.ts` | Path create/delete/read/getProperties + listPaths | |
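
The dispatch step described for `DfsDispatchMiddleware.ts` could look roughly like the sketch below. It is a minimal illustration, not the planned implementation: the operation names, the `DfsRequestShape` type, and `resolveOperation` are hypothetical, and the real `DfsOperation` enum may be organized differently.

```typescript
// Hypothetical operation names; the plan's DfsOperation enum may differ.
type DfsOperation =
  | "CreateFilesystem" | "DeleteFilesystem" | "GetFilesystemProperties"
  | "SetFilesystemProperties" | "ListPaths"
  | "CreatePath" | "RenamePath" | "AppendData" | "FlushData"
  | "SetAccessControl" | "GetAccessControl" | "GetPathProperties"
  | "ReadPath" | "DeletePath";

// Assumed, simplified view of an incoming request (not an Express type).
interface DfsRequestShape {
  method: string;
  hasPath: boolean; // true when the URL names a path below the filesystem
  query: Record<string, string | undefined>;
  headers: Record<string, string | undefined>; // header names lower-cased
}

function resolveOperation(req: DfsRequestShape): DfsOperation | undefined {
  const method = req.method.toUpperCase();
  const resource = req.query["resource"];
  const action = req.query["action"];

  if (!req.hasPath) {
    // Filesystem-level requests carry resource=filesystem.
    if (resource !== "filesystem") return undefined;
    switch (method) {
      case "PUT": return "CreateFilesystem";
      case "DELETE": return "DeleteFilesystem";
      case "HEAD": return "GetFilesystemProperties";
      case "PATCH": return "SetFilesystemProperties";
      case "GET": return "ListPaths";
      default: return undefined;
    }
  }

  switch (method) {
    case "PUT":
      // A rename is a PUT on the destination that names its source in a header.
      if (req.headers["x-ms-rename-source"] !== undefined) return "RenamePath";
      return resource === "file" || resource === "directory" ? "CreatePath" : undefined;
    case "PATCH":
      if (action === "append") return "AppendData";
      if (action === "flush") return "FlushData";
      if (action === "setAccessControl") return "SetAccessControl";
      return undefined;
    case "HEAD":
      return action === "getAccessControl" ? "GetAccessControl" : "GetPathProperties";
    case "GET": return "ReadPath";
    case "DELETE": return "DeletePath";
    default: return undefined;
  }
}
```

Keeping the decision in one pure function makes the routing table easy to unit-test without starting a server; the middleware would then only map the result to a handler method.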
| 62 | + |
| 63 | +### Operations implemented |
| 64 | + |
| 65 | +- **Create Path** (`PUT ?resource=file|directory`): Creates zero-length BlockBlob; directories get `hdi_isfolder=true` metadata; auto-creates intermediate directories |
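| 66 | +- **Delete Path** (`DELETE`): Files → `deleteBlob()`; directories with `recursive=true` → delete the directory marker and every blob under the prefix `<dir>/` (the trailing slash keeps siblings such as `<dir>2` untouched); `recursive=false` → 409 if non-empty |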
| 67 | +- **Get Path Properties** (`HEAD`): Returns `x-ms-resource-type: file|directory` header |
| 68 | +- **Read Path** (`GET`): Streams file content via `downloadBlob()` (follows `BlobHandler.download()` pattern) |
| 69 | +- **List Paths** (`GET ?resource=filesystem&directory=...&recursive=true|false`): JSON response with `paths` array; uses `listBlobs()` with prefix/delimiter; supports continuation via `x-ms-continuation` |
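
The create, delete, and list semantics above can be sketched against an in-memory map standing in for the metadata store. Everything here (`PathMap`, `createPath`, `deletePath`, `listPaths`, the numeric status codes) is a hypothetical illustration under that assumption; the real handlers would call `IBlobMetadataStore` instead.

```typescript
// In-memory stand-in for the blob metadata store; keys are blob names.
interface PathEntry { isDirectory: boolean; contentLength: number; }
type PathMap = Map<string, PathEntry>;

/** Every ancestor directory of a path, shallowest first ("a/b/c" -> ["a", "a/b"]). */
function ancestorsOf(path: string): string[] {
  const parts = path.split("/").filter((p) => p.length > 0);
  const out: string[] = [];
  for (let i = 1; i < parts.length; i++) out.push(parts.slice(0, i).join("/"));
  return out;
}

/** All entries strictly below a directory; "" means the filesystem root. */
function descendantsOf(store: PathMap, directory: string): string[] {
  const prefix = directory === "" ? "" : directory + "/";
  return Array.from(store.keys()).filter((name) => name.startsWith(prefix));
}

/** Create Path: zero-length entry, auto-creating missing intermediate directories. */
function createPath(store: PathMap, path: string, resource: "file" | "directory"): void {
  for (const dir of ancestorsOf(path)) {
    if (!store.has(dir)) store.set(dir, { isDirectory: true, contentLength: 0 });
  }
  store.set(path, { isDirectory: resource === "directory", contentLength: 0 });
}

/** Delete Path: returns 200 when deleted, 404 when missing, 409 for a non-empty directory. */
function deletePath(store: PathMap, path: string, recursive: boolean): number {
  const entry = store.get(path);
  if (entry === undefined) return 404;
  if (entry.isDirectory) {
    const children = descendantsOf(store, path);
    if (children.length > 0 && !recursive) return 409;
    children.forEach((child) => store.delete(child));
  }
  store.delete(path);
  return 200;
}

/** List Paths: direct children only unless `recursive` is set. */
function listPaths(store: PathMap, directory: string, recursive: boolean): string[] {
  const prefix = directory === "" ? "" : directory + "/";
  return descendantsOf(store, directory)
    .filter((name) => recursive || name.slice(prefix.length).indexOf("/") === -1)
    .sort();
}
```

Matching on `directory + "/"` rather than the bare name is what stops a delete or listing of `a` from touching a sibling such as `a2`.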
| 70 | + |
| 71 | +### Existing files modified |
| 72 | + |
| 73 | +| File | Change | |
| 74 | +|------|--------| |
| 75 | +| `src/blob/persistence/IBlobMetadataStore.ts` | Add `dfsResourceType`, ACL fields to `BlobModel` / `IBlobAdditionalProperties` | |
| 76 | +| `src/blob/persistence/LokiBlobMetadataStore.ts` | No schema changes needed (schemaless) | |
| 77 | +| `src/blob/persistence/SqlBlobMetadataStore.ts` | Add columns: `dfsResourceType`, `dfsAclOwner`, `dfsAclGroup`, `dfsAclPermissions`, `dfsAcl` | |
| 78 | + |
| 79 | +### Tests |
| 80 | + |
| 81 | +Extend `tests/blob/dfsProxy.test.ts`: |
| 82 | +- Create file / directory, verify as blob |
| 83 | +- Delete file / empty dir / non-empty dir with recursive |
| 84 | +- Get properties with `x-ms-resource-type` |
| 85 | +- Read file content |
| 86 | +- List paths recursive and non-recursive |
| 87 | +- Cross-API: create via DFS → read via Blob API and vice versa |
| 88 | + |
| 89 | +--- |
| 90 | + |
| 91 | +## Phase 2: Append-Flush Write Pattern |
| 92 | + |
| 93 | +**Goal:** Implement the DFS file write model (create empty → append chunks → flush to commit). |
| 94 | + |
| 95 | +### Key insight |
| 96 | + |
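| 97 | +DFS append-then-flush maps directly to the existing **BlockBlob uncommitted blocks** infrastructure: each `action=append` becomes a `stageBlock()`, and `action=flush` becomes `commitBlockList()`. No new persistence methods are needed. Because DFS appends carry no block ID, the handler would need to synthesize one per chunk (for example, derived from `position`) and pass the ordered list to the commit. |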
| 98 | + |
| 99 | +### Changes to `src/blob/dfs/handlers/PathHandler.ts` |
| 100 | + |
| 101 | +- **`updatePath_Append(position, body)`**: Write body to `IExtentStore` as extent chunk; record as uncommitted block via `metadataStore.stageBlock()`; validate `position` matches current append offset; return 202 |
| 102 | +- **`updatePath_Flush(position, close)`**: Commit all staged blocks via `metadataStore.commitBlockList()`; update content length to `position`; return 200 with updated ETag |
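
A minimal in-memory sketch of the append-then-flush contract described above follows. The `AppendFlushFile` class is hypothetical and stands in for the staged-block bookkeeping that `stageBlock()` and `commitBlockList()` would provide; the strict "position must equal the current append offset" rule mirrors this plan rather than documenting the real service in full.

```typescript
interface StagedBlock { offset: number; data: Uint8Array; }

// Hypothetical stand-in for one DFS file backed by uncommitted blocks.
class AppendFlushFile {
  private committed: Uint8Array = new Uint8Array(0);
  private staged: StagedBlock[] = [];

  /** Offset at which the next append must start. */
  private get appendOffset(): number {
    return this.staged.reduce((end, block) => end + block.data.length, this.committed.length);
  }

  /** action=append: stage a chunk; 202 on success, 400 on a position mismatch. */
  append(position: number, data: Uint8Array): number {
    if (position !== this.appendOffset) return 400;
    this.staged.push({ offset: position, data });
    return 202;
  }

  /** action=flush: commit all staged chunks; `position` must equal the final length. */
  flush(position: number): number {
    if (position !== this.appendOffset) return 400;
    const merged = new Uint8Array(position);
    merged.set(this.committed, 0);
    for (const block of this.staged) merged.set(block.data, block.offset);
    this.committed = merged;
    this.staged = [];
    return 200;
  }

  /** Readers only ever see committed (flushed) content. */
  read(): Uint8Array {
    return this.committed;
  }
}
```

The point of the sketch is the visibility rule: staged data is invisible to readers until a flush commits it, which is the same guarantee uncommitted blocks give on the Blob side.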
| 103 | + |
| 104 | +### Tests |
| 105 | + |
| 106 | +- Create → append 3 chunks → flush → read back, verify content |
| 107 | +- Append with wrong position → 400 |
| 108 | +- Large file (multi-MB) append |
| 109 | + |
| 110 | +--- |
| 111 | + |
| 112 | +## Phase 3: Rename/Move Path |
| 113 | + |
| 114 | +**Goal:** Atomic rename for files and directories. |
| 115 | + |
| 116 | +### New persistence methods |
| 117 | + |
| 118 | +| Method | Description | |
| 119 | +|--------|-------------| |
| 120 | +| `IBlobMetadataStore.renameBlob(src, dest)` | Atomic rename of single blob (metadata-only, no extent copy) | |
| 121 | +| `IBlobMetadataStore.renameBlobsByPrefix(srcPrefix, destPrefix)` | Atomic rename of all blobs matching prefix (for directory rename) | |
| 122 | + |
| 123 | +### PathHandler addition |
| 124 | + |
| 125 | +- **`renamePath(x-ms-rename-source)`**: Parse source header → for files: `renameBlob()`; for directories: `renameBlobsByPrefix()`. Supports cross-filesystem rename and conditional headers. |
| 126 | + |
| 127 | +### Persistence implementations |
| 128 | + |
| 129 | +- **LokiJS**: Update document `containerName` and `name` properties |
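| 130 | +- **SQL**: In a transaction, `UPDATE ... SET name = CONCAT(newPrefix, SUBSTRING(name, LENGTH(oldPrefix) + 1)) WHERE name LIKE 'oldPrefix%'`, where both prefixes end with `/`, plus a single-row update for the directory marker blob itself. Rewriting only the leading prefix avoids `REPLACE` altering later occurrences of the same string; `%` and `_` in the prefix must be escaped for `LIKE`, and the exact string functions depend on the SQL dialect |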
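
The name rewrite behind a directory rename can be sketched as a pure function. `renameByPrefix` is a hypothetical helper, not part of `IBlobMetadataStore`; it only illustrates why the match must be anchored at the start of the name and end with `/`, since a global substring replace would also rewrite later occurrences and a bare prefix match would catch siblings.

```typescript
/**
 * Rewrites blob names for a directory rename.
 * Only the directory marker itself and names under `srcDir + "/"` are changed.
 */
function renameByPrefix(names: string[], srcDir: string, destDir: string): string[] {
  const srcPrefix = srcDir + "/";
  return names.map((name) => {
    if (name === srcDir) return destDir; // the directory marker blob
    if (name.startsWith(srcPrefix)) {
      // Replace the leading prefix only; the rest of the name is kept verbatim.
      return destDir + "/" + name.slice(srcPrefix.length);
    }
    return name; // unrelated blob, including siblings like "database/..."
  });
}
```

For example, renaming `data` to `archive` should turn `data/data/b.txt` into `archive/data/b.txt` and leave `database/c.txt` alone.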
| 131 | + |
| 132 | +### Tests |
| 133 | + |
| 134 | +- Rename file within filesystem / across filesystems |
| 135 | +- Rename directory (verify children moved) |
| 136 | +- Rename non-existent → 404 |
| 137 | +- Rename with conditional headers |
| 138 | + |
| 139 | +--- |
| 140 | + |
| 141 | +## Phase 4: ACL Operations |
| 142 | + |
| 143 | +**Goal:** POSIX ACL get/set for emulator parity. |
| 144 | + |
| 145 | +### PathHandler additions |
| 146 | + |
| 147 | +- **`getAccessControl()`**: Read ACL fields from blob record → return as `x-ms-owner`, `x-ms-group`, `x-ms-permissions`, `x-ms-acl` headers. Defaults: `$superuser`/`$superuser`/`rwxr-x---` |
| 148 | +- **`setAccessControl(owner, group, permissions, acl)`**: Validate ACL format → update blob record |
| 149 | +- **`setAccessControlRecursive(mode, acl)`**: `mode` = set|modify|remove; iterate blobs under prefix; support continuation; return JSON with `directoriesSuccessful`, `filesSuccessful`, `failureCount` |
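
The "validate ACL format" step could be approached as in the sketch below, assuming the comma-separated `[default:]type:id:permissions` entry form used by POSIX-style ACLs. The `AclEntry` type and `parseAcl` function are hypothetical names, and the sketch deliberately ignores extras such as the sticky bit.

```typescript
interface AclEntry {
  isDefault: boolean;
  type: "user" | "group" | "mask" | "other";
  id: string;          // empty for the owning user, owning group, mask, and other
  permissions: string; // three characters, e.g. "r-x"
}

// One entry: optional "default:" scope, a type, an optional id, then rwx flags.
const ACL_ENTRY = /^(default:)?(user|group|mask|other):([^:,]*):([r-][w-][x-])$/;

/** Parses an x-ms-acl style string; throws on the first malformed entry. */
function parseAcl(acl: string): AclEntry[] {
  return acl.split(",").map((raw) => {
    const match = ACL_ENTRY.exec(raw.trim());
    if (match === null) {
      throw new Error(`Invalid ACL entry: "${raw}"`);
    }
    return {
      isDefault: match[1] !== undefined,
      type: match[2] as AclEntry["type"],
      id: match[3] ?? "",
      permissions: match[4] ?? "---",
    };
  });
}
```

A handler could run this before touching the blob record and map the thrown error to a 400 response; storing the original string alongside the parsed form is a design choice left open here.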
| 150 | + |
| 151 | +### Tests |
| 152 | + |
| 153 | +- Set/get ACL on file and directory |
| 154 | +- Recursive ACL set on directory tree |
| 155 | +- Default ACL values on new paths |
| 156 | + |
| 157 | +--- |
| 158 | + |
| 159 | +## Phase 5: Polish & Remaining Operations |
| 160 | + |
| 161 | +- **Set Filesystem Properties** (`PATCH ?resource=filesystem`) → `setContainerMetadata()` |
| 162 | +- **`x-ms-properties` encoding/decoding**: new `src/blob/dfs/DfsPropertyEncoding.ts` utility (comma-separated `key=value` pairs in which each value is base64-encoded) |
| 163 | +- **DFS JSON error format**: `{"error":{"code":"...","message":"..."}}` |
| 164 | +- **Lease support** on DFS paths (reuse blob lease infrastructure) |
| 165 | +- **SAS validation** on DFS endpoints (reuse existing authenticators) |
| 166 | +- **Content-MD5/CRC64 validation** on append |
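
For the `x-ms-properties` item above, a minimal encode/decode sketch is shown below. The function names are hypothetical, and the sketch assumes values that fit in Latin-1 because it relies on the global `btoa`/`atob` functions (available in current Node.js and browsers); a real utility would likely handle UTF-8 explicitly.

```typescript
/** Encodes metadata as "k1=base64(v1),k2=base64(v2)". Assumes Latin-1 values. */
function encodeProperties(props: Record<string, string>): string {
  return Object.keys(props)
    .map((key) => `${key}=${btoa(props[key])}`)
    .join(",");
}

/** Decodes an x-ms-properties style header back into a key/value record. */
function decodeProperties(header: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const pair of header.split(",")) {
    const trimmed = pair.trim();
    if (trimmed === "") continue;
    // Base64 padding also uses "=", so split on the first "=" only.
    const eq = trimmed.indexOf("=");
    if (eq <= 0) throw new Error(`Invalid property pair: "${trimmed}"`);
    out[trimmed.slice(0, eq)] = atob(trimmed.slice(eq + 1));
  }
  return out;
}
```

Splitting on the first `=` matters because padded base64 values such as `YQ==` contain the same character as the key/value separator.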
| 167 | + |
| 168 | +--- |
| 169 | + |
| 170 | +## Verification Plan |
| 171 | + |
| 172 | +1. **Unit tests**: Extend `tests/blob/dfsProxy.test.ts` per phase |
| 173 | +2. **Cross-API tests**: Verify DFS-created data is visible via Blob API and vice versa |
| 174 | +3. **SDK integration**: Test with `@azure/storage-file-datalake` Node.js SDK against the emulator |
| 175 | +4. **Manual smoke test**: Run Azurite, use Azure Storage Explorer with DFS endpoint |
| 176 | +5. **Existing blob tests**: Ensure `npm test` still passes (no regression) |
| 177 | + |
| 178 | +--- |
| 179 | + |
| 180 | +## Critical Reference Files |
| 181 | + |
| 182 | +- `src/blob/handlers/ContainerHandler.ts` — pattern for handler ↔ store interaction |
| 183 | +- `src/blob/handlers/BlockBlobHandler.ts` — `stageBlock`/`commitBlockList` for append-flush reuse |
| 184 | +- `src/blob/handlers/BlobHandler.ts` — `download()` pattern for Read Path |
| 185 | +- `src/blob/persistence/IBlobMetadataStore.ts` — store interface to extend |
| 186 | +- `src/blob/generated/handlers/` — handler interface patterns |
| 187 | +- `src/blob/middlewares/blobStorageContext.middleware.ts` — context extraction pattern for DfsContext |