
Commit 7dfb9e8 (parent 0a6b4d3)

ADLS Gen 2

Adds ADLS Gen 2 feature parity to Azurite. See docs/designs/ADLS-gen2-parity.md for details.

25 files changed, +2515 −23 lines

README.md

Lines changed: 10 additions & 3 deletions
````diff
@@ -186,6 +186,8 @@ Following extension configurations are supported:
 
 - `azurite.blobHost` Blob service listening endpoint, by default 127.0.0.1
 - `azurite.blobPort` Blob service listening port, by default 10000
+- `azurite.dfsHost` DFS service listening endpoint, by default 127.0.0.1
+- `azurite.dfsPort` DFS service listening port, by default 10004
 - `azurite.blobKeepAliveTimeout` Blob service keep alive timeout in seconds, by default 5
 - `azurite.queueHost` Queue service listening endpoint, by default 127.0.0.1
 - `azurite.queuePort` Queue service listening port, by default 10001
@@ -214,17 +216,18 @@ Following extension configurations are supported:
 > Note. Find more docker images tags in <https://mcr.microsoft.com/v2/azure-storage/azurite/tags/list>
 
 ```bash
-docker run -p 10000:10000 -p 10001:10001 -p 10002:10002 mcr.microsoft.com/azure-storage/azurite
+docker run -p 10000:10000 -p 10004:10004 -p 10001:10001 -p 10002:10002 mcr.microsoft.com/azure-storage/azurite
 ```
 
 `-p 10000:10000` will expose blob service's default listening port.
+`-p 10004:10004` will expose dfs service's default listening port.
 `-p 10001:10001` will expose queue service's default listening port.
 `-p 10002:10002` will expose table service's default listening port.
 
 Or just run blob service:
 
 ```bash
-docker run -p 10000:10000 mcr.microsoft.com/azure-storage/azurite azurite-blob --blobHost 0.0.0.0
+docker run -p 10000:10000 -p 10004:10004 mcr.microsoft.com/azure-storage/azurite azurite-blob --blobHost 0.0.0.0 --dfsHost 0.0.0.0
 ```
 
 #### Run Azurite V3 docker image with customized persisted data location
@@ -317,6 +320,7 @@ You can customize the listening address per your requirements.
 
 ```cmd
 --blobHost 127.0.0.1
+--dfsHost 127.0.0.1
 --queueHost 127.0.0.1
 --tableHost 127.0.0.1
 ```
@@ -325,13 +329,14 @@ You can customize the listening address per your requirements.
 
 ```cmd
 --blobHost 0.0.0.0
+--dfsHost 0.0.0.0
 --queueHost 0.0.0.0
 --tableHost 0.0.0.0
 ```
 
 ### Listening Port Configuration
 
-Optional. By default, Azurite V3 will listen to 10000 as blob service port, and 10001 as queue service port, and 10002 as the table service port.
+Optional. By default, Azurite V3 will listen to 10000 as blob service port, 10004 as dfs service port, 10001 as queue service port, and 10002 as the table service port.
 You can customize the listening port per your requirements.
 
 > Warning: After using a customized port, you need to update connection string or configurations correspondingly in your Storage Tools or SDKs.
@@ -341,6 +346,7 @@ You can customize the listening port per your requirements.
 
 ```cmd
 --blobPort 8888
+--dfsPort 8889
 --queuePort 9999
 --tablePort 11111
 ```
@@ -349,6 +355,7 @@ You can customize the listening port per your requirements.
 
 ```cmd
 --blobPort 0
+--dfsPort 0
 --queuePort 0
 --tablePort 0
 ```
````

docs/designs/ADLS-gen2-parity.md

Lines changed: 187 additions & 0 deletions
# ADLS Gen2 Parity Implementation Plan

## Context

Azurite currently has a **thin DFS proxy layer** (port 10004) that translates a small subset of ADLS Gen2 DFS REST API calls to Blob REST API calls via HTTP proxying (axios). This covers only filesystem (container) create/delete/HEAD and account listing. Full ADLS Gen2 parity requires native support for path (file/directory) operations, the append-then-flush write pattern, rename/move, ACLs, and list paths — none of which can be achieved by simple query-parameter rewriting.

## Architectural Decision: Hybrid (Native DFS Handlers + Shared Stores)

Replace the HTTP proxy with a **native Express pipeline** in the DFS server that directly accesses `IBlobMetadataStore` and `IExtentStore` — the same store instances used by the blob server.

```
Port 10000 (Blob API) → Blob Handlers → IBlobMetadataStore + IExtentStore
Port 10004 (DFS API)  → DFS Handlers  → same IBlobMetadataStore + IExtentStore
```

**Why not keep proxying?** DFS operations like List Paths, Create Directory, Rename, ACLs, and append-then-flush have no single blob API equivalent. Proxying would require multi-call orchestration, lose atomicity, and add latency.

### Directory Model

Directories are stored as **zero-length BlockBlobs with `hdi_isfolder=true` metadata**, matching Azure's real internal behavior. No separate directory table is needed.
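To make the model concrete, here is a minimal sketch of a directory marker as it would land in the shared store. The `BlobModelSketch` type and the `makeDirectoryMarker`/`isDirectory` helpers are illustrative stand-ins, not Azurite's real `BlobModel`:

```typescript
// Illustrative stand-in for Azurite's BlobModel; only the fields
// relevant to the directory model are shown.
interface BlobModelSketch {
  accountName: string;
  containerName: string; // the DFS "filesystem"
  name: string;          // the DFS "path"
  contentLength: number;
  metadata: Record<string, string>;
}

// A directory is simply a zero-length block blob whose metadata
// carries hdi_isfolder=true.
function makeDirectoryMarker(
  account: string,
  filesystem: string,
  path: string
): BlobModelSketch {
  return {
    accountName: account,
    containerName: filesystem,
    name: path,
    contentLength: 0,
    metadata: { hdi_isfolder: "true" },
  };
}

function isDirectory(blob: BlobModelSketch): boolean {
  return blob.metadata["hdi_isfolder"] === "true";
}
```

Because the marker is ordinary blob metadata, the Blob API sees the very same blob, which is what keeps the two endpoints consistent.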
### ACL Storage

New fields on `BlobModel`: `dfsAclOwner`, `dfsAclGroup`, `dfsAclPermissions`, `dfsAcl`. LokiJS is schemaless (just add fields); SQL needs ALTER TABLE.
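As a sketch, the new fields and the defaults mentioned later in Phase 4 could be modeled like this. Only the four field names come from this plan; the interface name, the `defaultAcl` helper, and the expansion of `rwxr-x---` into the `dfsAcl` string are assumptions:

```typescript
// The four new BlobModel fields proposed above.
interface DfsAclProperties {
  dfsAclOwner?: string;       // POSIX owner
  dfsAclGroup?: string;       // POSIX group
  dfsAclPermissions?: string; // e.g. "rwxr-x---"
  dfsAcl?: string;            // full ACL string
}

// Defaults for new paths per the Phase 4 section
// ($superuser/$superuser/rwxr-x---); the dfsAcl expansion is an assumption.
function defaultAcl(): Required<DfsAclProperties> {
  return {
    dfsAclOwner: "$superuser",
    dfsAclGroup: "$superuser",
    dfsAclPermissions: "rwxr-x---",
    dfsAcl: "user::rwx,group::r-x,other::---",
  };
}
```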
---

## Phase 0: Foundation — Shared Store Access & HNS Flag

**Goal:** Wire DFS server to share stores with blob server; enable HNS mode.

| File | Change |
|------|--------|
| `src/blob/utils/constants.ts` | Set `EMULATOR_ACCOUNT_ISHIERARCHICALNAMESPACEENABLED = true` (or make configurable) |
| `src/blob/DfsProxyServer.ts` → rename to `DfsServer.ts` | Accept `IBlobMetadataStore` + `IExtentStore` in constructor |
| `src/blob/DfsProxyConfiguration.ts` → rename to `DfsConfiguration.ts` | Remove upstream host/port fields (no longer proxying) |
| `src/blob/BlobServer.ts` | Expose `metadataStore` and `extentStore` via public getters |
| `src/azurite.ts` | Pass shared stores to both BlobServer and DfsServer |
| `src/blob/main.ts` | Same wiring for standalone blob+dfs mode |
| `src/blob/DfsRequestListenerFactory.ts` | Rewrite: replace axios proxy with native Express pipeline + DFS routing |
| `src/blob/IBlobEnvironment.ts`, `BlobEnvironment.ts`, `src/common/Environment.ts`, `VSCEnvironment.ts` | Add `--enableHierarchicalNamespace` option |

**Deliverable:** DFS server starts, shares data with blob, existing filesystem tests pass via direct store access.
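The shared-store wiring can be sketched as follows; the point is that both servers hold references to the same store instances. The class names here are simplified stand-ins for `BlobServer` and `DfsServer`:

```typescript
// Minimal stand-ins for the real store interfaces.
interface IBlobMetadataStore { kind: "metadata" }
interface IExtentStore { kind: "extent" }

class BlobServerSketch {
  constructor(
    public readonly metadataStore: IBlobMetadataStore,
    public readonly extentStore: IExtentStore
  ) {}
}

class DfsServerSketch {
  constructor(
    public readonly metadataStore: IBlobMetadataStore,
    public readonly extentStore: IExtentStore
  ) {}
}

// One store pair, two servers: data written through either API is
// immediately visible through the other.
const metadata: IBlobMetadataStore = { kind: "metadata" };
const extents: IExtentStore = { kind: "extent" };
const blobServer = new BlobServerSketch(metadata, extents);
const dfsServer = new DfsServerSketch(metadata, extents);
```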
---

## Phase 1: Path CRUD + List Paths

**Goal:** Create/delete/read files and directories, list paths — the core operations most ADLS Gen2 SDKs depend on.

### New files to create

| File | Purpose |
|------|---------|
| `src/blob/dfs/DfsContext.ts` | DFS request context (account, filesystem, path) — analogous to `BlobStorageContext` |
| `src/blob/dfs/DfsOperation.ts` | Enum of DFS operations for dispatch |
| `src/blob/dfs/DfsDispatchMiddleware.ts` | Routes requests by `resource` param, `action` param, method, and headers |
| `src/blob/dfs/DfsErrorFactory.ts` | JSON error responses (`PathNotFound`, `DirectoryNotEmpty`, etc.) |
| `src/blob/dfs/DfsSerializer.ts` | JSON response serialization (DFS uses JSON, not XML) |
| `src/blob/dfs/handlers/FilesystemHandler.ts` | Filesystem ops → container store operations |
| `src/blob/dfs/handlers/PathHandler.ts` | Path create/delete/read/getProperties + listPaths |

### Operations implemented

- **Create Path** (`PUT ?resource=file|directory`): Creates zero-length BlockBlob; directories get `hdi_isfolder=true` metadata; auto-creates intermediate directories
- **Delete Path** (`DELETE`): Files → `deleteBlob()`; directories with `recursive=true` → delete all blobs with prefix; `recursive=false` → 409 if non-empty
- **Get Path Properties** (`HEAD`): Returns `x-ms-resource-type: file|directory` header
- **Read Path** (`GET`): Streams file content via `downloadBlob()` (follows `BlobHandler.download()` pattern)
- **List Paths** (`GET ?resource=filesystem&directory=...&recursive=true|false`): JSON response with `paths` array; uses `listBlobs()` with prefix/delimiter; supports continuation via `x-ms-continuation`
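A sketch of how the dispatch middleware might route these five operations from the method and the `resource` query parameter. The enum values and function are illustrative; real dispatch would also consult the `action` parameter and headers:

```typescript
enum DfsOperation {
  CreatePath,
  DeletePath,
  GetPathProperties,
  ReadPath,
  ListPaths,
  Unknown,
}

// Route a path-level request by HTTP method and the `resource` query param.
function dispatchPathRequest(
  method: string,
  query: Record<string, string | undefined>
): DfsOperation {
  switch (method) {
    case "PUT":
      // ?resource=file|directory creates a path.
      if (query.resource === "file" || query.resource === "directory") {
        return DfsOperation.CreatePath;
      }
      return DfsOperation.Unknown;
    case "DELETE":
      return DfsOperation.DeletePath;
    case "HEAD":
      return DfsOperation.GetPathProperties;
    case "GET":
      // ?resource=filesystem lists paths; a bare GET reads file content.
      return query.resource === "filesystem"
        ? DfsOperation.ListPaths
        : DfsOperation.ReadPath;
    default:
      return DfsOperation.Unknown;
  }
}
```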
### Existing files modified

| File | Change |
|------|--------|
| `src/blob/persistence/IBlobMetadataStore.ts` | Add `dfsResourceType`, ACL fields to `BlobModel` / `IBlobAdditionalProperties` |
| `src/blob/persistence/LokiBlobMetadataStore.ts` | No schema changes needed (schemaless) |
| `src/blob/persistence/SqlBlobMetadataStore.ts` | Add columns: `dfsResourceType`, `dfsAclOwner`, `dfsAclGroup`, `dfsAclPermissions`, `dfsAcl` |

### Tests

Extend `tests/blob/dfsProxy.test.ts`:

- Create file / directory, verify as blob
- Delete file / empty dir / non-empty dir with recursive
- Get properties with `x-ms-resource-type`
- Read file content
- List paths recursive and non-recursive
- Cross-API: create via DFS → read via Blob API and vice versa
---

## Phase 2: Append-Flush Write Pattern

**Goal:** Implement the DFS file write model (create empty → append chunks → flush to commit).

### Key insight

DFS append-then-flush maps directly to existing **BlockBlob uncommitted blocks** infrastructure: each `action=append` becomes a `stageBlock()`, and `action=flush` becomes `commitBlockList()`. No new persistence methods needed.

### Changes to `src/blob/dfs/handlers/PathHandler.ts`

- **`updatePath_Append(position, body)`**: Write body to `IExtentStore` as extent chunk; record as uncommitted block via `metadataStore.stageBlock()`; validate `position` matches current append offset; return 202
- **`updatePath_Flush(position, close)`**: Commit all staged blocks via `metadataStore.commitBlockList()`; update content length to `position`; return 200 with updated ETag
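The position bookkeeping can be sketched independently of the stores. This is a simplified in-memory model; the real handlers would stage extents and commit a block list, and `Buffer` here stands in for the request body stream:

```typescript
// Simplified in-memory model of the append-then-flush contract.
class FileWriteSketch {
  private staged: Buffer[] = [];
  private committed: Buffer = Buffer.alloc(0);

  // action=append: position must equal the current uncommitted end
  // offset, otherwise the request is rejected with 400.
  append(position: number, body: Buffer): number {
    const stagedEnd = this.staged.reduce((n, b) => n + b.length, 0);
    if (position !== stagedEnd) {
      return 400;
    }
    this.staged.push(body); // maps to metadataStore.stageBlock()
    return 202;
  }

  // action=flush: commit everything staged; the file's content length
  // becomes `position`. Maps to metadataStore.commitBlockList().
  flush(position: number): number {
    const data = Buffer.concat(this.staged);
    if (data.length !== position) {
      return 400;
    }
    this.committed = data;
    this.staged = [];
    return 200;
  }

  read(): string {
    return this.committed.toString("utf8");
  }
}
```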
### Tests

- Create → append 3 chunks → flush → read back, verify content
- Append with wrong position → 400
- Large file (multi-MB) append

---

## Phase 3: Rename/Move Path

**Goal:** Atomic rename for files and directories.

### New persistence methods

| Method | Description |
|--------|-------------|
| `IBlobMetadataStore.renameBlob(src, dest)` | Atomic rename of single blob (metadata-only, no extent copy) |
| `IBlobMetadataStore.renameBlobsByPrefix(srcPrefix, destPrefix)` | Atomic rename of all blobs matching prefix (for directory rename) |

### PathHandler addition

- **`renamePath(x-ms-rename-source)`**: Parse source header → for files: `renameBlob()`; for directories: `renameBlobsByPrefix()`. Supports cross-filesystem rename and conditional headers.

### Persistence implementations

- **LokiJS**: Update document `containerName` and `name` properties
- **SQL**: `UPDATE ... SET name = REPLACE(name, oldPrefix, newPrefix) WHERE name LIKE 'prefix%'` in transaction
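The LokiJS variant amounts to a metadata-only prefix swap over the matching documents, which could look like this sketch (plain objects stand in for Loki documents, and the function name mirrors the proposed store method):

```typescript
// Plain objects stand in for LokiJS documents in the blobs collection.
interface BlobRecord {
  containerName: string;
  name: string;
}

// Metadata-only directory rename: swap the name prefix on every blob
// under the source directory. No extent (data) copy is involved.
function renameBlobsByPrefix(
  blobs: BlobRecord[],
  srcContainer: string,
  srcPrefix: string,
  destContainer: string,
  destPrefix: string
): number {
  let renamed = 0;
  for (const blob of blobs) {
    if (blob.containerName === srcContainer && blob.name.startsWith(srcPrefix)) {
      blob.containerName = destContainer; // supports cross-filesystem rename
      blob.name = destPrefix + blob.name.slice(srcPrefix.length);
      renamed++;
    }
  }
  return renamed;
}
```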
### Tests

- Rename file within filesystem / across filesystems
- Rename directory (verify children moved)
- Rename non-existent → 404
- Rename with conditional headers

---

## Phase 4: ACL Operations

**Goal:** POSIX ACL get/set for emulator parity.

### PathHandler additions

- **`getAccessControl()`**: Read ACL fields from blob record → return as `x-ms-owner`, `x-ms-group`, `x-ms-permissions`, `x-ms-acl` headers. Defaults: `$superuser`/`$superuser`/`rwxr-x---`
- **`setAccessControl(owner, group, permissions, acl)`**: Validate ACL format → update blob record
- **`setAccessControlRecursive(mode, acl)`**: `mode` = set|modify|remove; iterate blobs under prefix; support continuation; return JSON with `directoriesSuccessful`, `filesSuccessful`, `failureCount`
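Validation of the `x-ms-permissions` value and its expansion into the short ACL form might be sketched like this. It is a simplification: real ADLS permission strings can also carry a sticky bit, which is ignored here, and the helper names are illustrative:

```typescript
// A 9-character POSIX rwx string: three triplets of [r-][w-][x-].
function isValidPermissions(p: string): boolean {
  return /^([r-][w-][x-]){3}$/.test(p);
}

// Expand "rwxr-x---" into the short ACL form used by x-ms-acl.
function permissionsToAcl(p: string): string {
  if (!isValidPermissions(p)) {
    throw new Error("InvalidPermission");
  }
  return `user::${p.slice(0, 3)},group::${p.slice(3, 6)},other::${p.slice(6, 9)}`;
}
```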
### Tests

- Set/get ACL on file and directory
- Recursive ACL set on directory tree
- Default ACL values on new paths

---

## Phase 5: Polish & Remaining Operations

- **Set Filesystem Properties** (`PATCH ?resource=filesystem`) → `setContainerMetadata()`
- **`x-ms-properties` encoding/decoding** — new `src/blob/dfs/DfsPropertyEncoding.ts` utility (base64 key=value pairs)
- **DFS JSON error format**: `{"error":{"code":"...","message":"..."}}`
- **Lease support** on DFS paths (reuse blob lease infrastructure)
- **SAS validation** on DFS endpoints (reuse existing authenticators)
- **Content-MD5/CRC64 validation** on append
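For the `x-ms-properties` utility, a round-trip sketch of the assumed wire format: comma-separated `key=base64(value)` pairs. The exact format `DfsPropertyEncoding.ts` must match is the service's, so treat this as illustrative:

```typescript
// Encode user properties as "key=base64(value)" pairs joined by commas.
function encodeProperties(props: Record<string, string>): string {
  return Object.entries(props)
    .map(([k, v]) => `${k}=${Buffer.from(v, "utf8").toString("base64")}`)
    .join(",");
}

// Inverse of encodeProperties; an empty header yields an empty map.
function decodeProperties(header: string): Record<string, string> {
  const out: Record<string, string> = {};
  if (header === "") {
    return out;
  }
  for (const pair of header.split(",")) {
    const i = pair.indexOf("=");
    out[pair.slice(0, i)] = Buffer.from(pair.slice(i + 1), "base64").toString("utf8");
  }
  return out;
}
```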
---

## Verification Plan

1. **Unit tests**: Extend `tests/blob/dfsProxy.test.ts` per phase
2. **Cross-API tests**: Verify DFS-created data is visible via Blob API and vice versa
3. **SDK integration**: Test with `@azure/storage-file-datalake` Node.js SDK against the emulator
4. **Manual smoke test**: Run Azurite, use Azure Storage Explorer with DFS endpoint
5. **Existing blob tests**: Ensure `npm test` still passes (no regression)

---

## Critical Reference Files

- `src/blob/handlers/ContainerHandler.ts` — pattern for handler ↔ store interaction
- `src/blob/handlers/BlockBlobHandler.ts` — `stageBlock`/`commitBlockList` for append-flush reuse
- `src/blob/handlers/BlobHandler.ts` — `download()` pattern for Read Path
- `src/blob/persistence/IBlobMetadataStore.ts` — store interface to extend
- `src/blob/generated/handlers/` — handler interface patterns
- `src/blob/middlewares/blobStorageContext.middleware.ts` — context extraction pattern for DfsContext

package.json

Lines changed: 10 additions & 0 deletions
```diff
@@ -208,6 +208,16 @@
         "default": 10000,
         "description": "Blob service listening port, by default 10000"
       },
+      "azurite.dfsHost": {
+        "type": "string",
+        "default": "127.0.0.1",
+        "description": "DFS service listening endpoint, by default 127.0.0.1"
+      },
+      "azurite.dfsPort": {
+        "type": "number",
+        "default": 10004,
+        "description": "DFS service listening port, by default 10004"
+      },
       "azurite.blobKeepAliveTimeout": {
         "type": "number",
         "default": 5,
```

src/azurite.ts

Lines changed: 36 additions & 3 deletions
```diff
@@ -18,6 +18,8 @@ import {
 } from "./queue/utils/constants";
 import SqlBlobServer from "./blob/SqlBlobServer";
 import BlobServer from "./blob/BlobServer";
+import DfsServer from "./blob/DfsServer";
+import DfsConfiguration from "./blob/DfsConfiguration";
 
 import TableConfiguration from "./table/TableConfiguration";
 import TableServer from "./table/TableServer";
@@ -30,11 +32,14 @@ import { AzuriteTelemetryClient } from "./common/Telemetry";
 
 function shutdown(
   blobServer: BlobServer | SqlBlobServer,
+  dfsServer: DfsServer,
   queueServer: QueueServer,
   tableServer: TableServer
 ) {
   const blobBeforeCloseMessage = `Azurite Blob service is closing...`;
   const blobAfterCloseMessage = `Azurite Blob service successfully closed`;
+  const dfsBeforeCloseMessage = `Azurite DFS service is closing...`;
+  const dfsAfterCloseMessage = `Azurite DFS service successfully closed`;
   const queueBeforeCloseMessage = `Azurite Queue service is closing...`;
   const queueAfterCloseMessage = `Azurite Queue service successfully closed`;
   const tableBeforeCloseMessage = `Azurite Table service is closing...`;
@@ -47,6 +52,11 @@ function shutdown(
     console.log(blobAfterCloseMessage);
   });
 
+  console.log(dfsBeforeCloseMessage);
+  dfsServer.close().then(() => {
+    console.log(dfsAfterCloseMessage);
+  });
+
   console.log(queueBeforeCloseMessage);
   queueServer.close().then(() => {
     console.log(queueAfterCloseMessage);
@@ -79,6 +89,21 @@ async function main() {
   const blobServerFactory = new BlobServerFactory();
   const blobServer = await blobServerFactory.createServer(env);
   const blobConfig = blobServer.config;
+  const dfsConfig = new DfsConfiguration(
+    env.dfsHost(),
+    env.dfsPort(),
+    env.blobKeepAliveTimeout(),
+    env.cert(),
+    env.key(),
+    env.pwd()
+  );
+  const blobServerAny = blobServer as any;
+  const dfsServer = new DfsServer(
+    dfsConfig,
+    blobServerAny.metadataStore,
+    blobServerAny.extentStore,
+    blobServerAny.accountDataStore
+  );
 
   // TODO: Align with blob DEFAULT_BLOB_PERSISTENCE_ARRAY
   // TODO: Join for all paths in the array
@@ -150,6 +175,14 @@ async function main() {
     `Azurite Blob service is successfully listening at ${blobServer.getHttpServerAddress()}`
   );
 
+  console.log(
+    `Azurite DFS service is starting at ${dfsConfig.getHttpServerAddress()}`
+  );
+  await dfsServer.start();
+  console.log(
+    `Azurite DFS service is successfully listening at ${dfsServer.getHttpServerAddress()}`
+  );
+
   // Start server
   console.log(
     `Azurite Queue service is starting at ${queueConfig.getHttpServerAddress()}`
@@ -175,11 +208,11 @@ async function main() {
   process
     .once("message", (msg) => {
       if (msg === "shutdown") {
-        shutdown(blobServer, queueServer, tableServer);
+        shutdown(blobServer, dfsServer, queueServer, tableServer);
       }
     })
-    .once("SIGINT", () => shutdown(blobServer, queueServer, tableServer))
-    .once("SIGTERM", () => shutdown(blobServer, queueServer, tableServer));
+    .once("SIGINT", () => shutdown(blobServer, dfsServer, queueServer, tableServer))
+    .once("SIGTERM", () => shutdown(blobServer, dfsServer, queueServer, tableServer));
 }
 
 main().catch((err) => {
```
