Commit 6189123

Merge branch 'bc/sha1-256-interop-02' into seen
The code to maintain mapping between object names in multiple hash
functions is being added, written in Rust.

* bc/sha1-256-interop-02:
  SQUASH??? downgrade build.rs syntax
  object-file-convert: always make sure object ID algo is valid
  rust: add a small wrapper around the hashfile code
  rust: add a new binary loose object map format
  rust: add functionality to hash an object
  rust: add a build.rs script for tests
  hash: expose hash context functions to Rust
  write-or-die: add an fsync component for the loose object map
  csum-file: define hashwrite's count as a uint32_t
  hash: add a function to look up hash algo structs
  rust: add a hash algorithm abstraction
  rust: add a ObjectID struct
  hash: use uint32_t for object_id algorithm
  conversion: don't crash when no destination algo
  repository: require Rust support for interoperability
2 parents 0bda323 + d614cbc commit 6189123

24 files changed, 1619 insertions(+), 74 deletions(-)

Documentation/gitformat-loose.adoc

Lines changed: 104 additions & 0 deletions
@@ -10,6 +10,8 @@ SYNOPSIS
 --------
 [verse]
 $GIT_DIR/objects/[0-9a-f][0-9a-f]/*
+$GIT_DIR/objects/loose-object-idx
+$GIT_DIR/objects/loose-map/map-*.map
 
 DESCRIPTION
 -----------
@@ -48,6 +50,108 @@ stored under
 Similarly, a blob containing the contents `abc` would have the uncompressed
 data of `blob 3\0abc`.
 
+== Loose object mapping
+
+When the `compatObjectFormat` option is used, Git needs to store a mapping
+between the repository's main algorithm and the compatibility algorithm. There
+are two formats for this: the legacy mapping and the modern mapping.
+
+=== Legacy mapping
+
+The compatibility mapping is stored in a file called
+`$GIT_DIR/objects/loose-object-idx`. The format of this file looks like this:
+
+# loose-object-idx
+(main-name SP compat-name LF)*
+
+`main-name` refers to hexadecimal object ID of the object in the main
+repository format and `compat-name` refers to the same thing, but for the
+compatibility format.
+
+This format is read if it exists but is not written.
+
+Note that carriage returns are not permitted in this file, regardless of the
+host system or configuration.
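As an illustration of the legacy format described above, a reader could be sketched in Rust roughly as follows. This is not the code from the commit: `LegacyEntry`, `is_hex`, and `parse_legacy_map` are hypothetical names. The sketch also assumes that the `# loose-object-idx` line is a mandatory first line, and it tolerates a missing final LF, which the grammar as written does not allow.

```rust
/// One entry of the legacy mapping; object names stay as hex strings.
#[derive(Debug, PartialEq)]
pub struct LegacyEntry {
    pub main_name: String,
    pub compat_name: String,
}

/// True if `s` is a non-empty string of hexadecimal digits (either case).
fn is_hex(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Parse the text of a legacy `loose-object-idx` file.
pub fn parse_legacy_map(data: &str) -> Result<Vec<LegacyEntry>, String> {
    // Carriage returns are never permitted, so reject them up front.
    if data.contains('\r') {
        return Err("carriage return found in loose-object-idx".to_string());
    }
    let mut lines = data.lines();
    // Assumption: the first line must be the literal header shown in the format.
    if lines.next() != Some("# loose-object-idx") {
        return Err("missing '# loose-object-idx' header".to_string());
    }
    let mut entries = Vec::new();
    for (idx, line) in lines.enumerate() {
        let lineno = idx + 2; // 1-based, counting the header line
        let (main, compat) = line
            .split_once(' ')
            .ok_or_else(|| format!("line {}: expected 'main-name SP compat-name'", lineno))?;
        if !is_hex(main) || !is_hex(compat) {
            return Err(format!("line {}: object names must be hexadecimal", lineno));
        }
        entries.push(LegacyEntry {
            main_name: main.to_string(),
            compat_name: compat.to_string(),
        });
    }
    Ok(entries)
}
```

Calling `parse_legacy_map` on the file contents yields one `LegacyEntry` per mapping line, or an error string naming the offending line.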
+
+=== Modern mapping
+
+The modern mapping consists of a set of files under `$GIT_DIR/objects/loose`
+ending in `.map`. The portion of the filename before the extension is that of
+the hash checksum in hex format.
+
+`git pack-objects` will repack existing entries into one file, removing any
+unnecessary objects, such as obsolete shallow entries or loose objects that
+have been packed.
+
+==== Mapping file format
+
+- A header appears at the beginning and consists of the following:
+* A 4-byte mapping signature: `LMAP`
+* 4-byte version number: 1
+* 4-byte length of the header section.
+* 4-byte number of objects declared in this map file.
+* 4-byte number of object formats declared in this map file.
+* For each object format:
+** 4-byte format identifier (e.g., `sha1` for SHA-1)
+** 4-byte length in bytes of shortened object names. This is the
+shortest possible length needed to make names in the shortened
+object name table unambiguous.
+** 8-byte integer, recording where tables relating to this format
+are stored in this index file, as an offset from the beginning.
+* 8-byte offset to the trailer from the beginning of this file.
+* Zero or more additional key/value pairs (4-byte key, 4-byte value), which
+may optionally declare one or more chunks. No chunks are currently
+defined. Readers must ignore unrecognized keys.
+- Zero or more NUL bytes. These are used to improve the alignment of the
+4-byte quantities below.
+- Tables for the first object format:
+* A sorted table of shortened object names. These are prefixes of the names
+of all objects in this file, packed together without offset values to
+reduce the cache footprint of the binary search for a specific object name.
+* A sorted table of full object names.
+* A table of 4-byte metadata values.
+* Zero or more chunks. A chunk starts with a four-byte chunk identifier and
+a four-byte parameter (which, if unneeded, is all zeros) and an eight-byte
+size (not including the identifier, parameter, or size), plus the chunk
+data.
+- Zero or more NUL bytes.
+- Tables for subsequent object formats:
+* A sorted table of shortened object names. These are prefixes of the names
+of all objects in this file, packed together without offset values to
+reduce the cache footprint of the binary search for a specific object name.
+* A table of full object names in the order specified by the first object format.
+* A table of 4-byte values mapping object name order to the order of the
+first object format. For an object in the table of sorted shortened object
+names, the value at the corresponding index in this table is the index in
+the previous table for that same object.
+* Zero or more NUL bytes.
+- The trailer consists of the following:
+* Hash checksum of all of the above.
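The fixed part of the header listed above can be read with a small bounds-checked cursor. The following is a sketch only, not the `src/loose.rs` code from this commit: `FormatEntry`, `MapHeader`, `Reader`, and `parse_header` are hypothetical names. It assumes the header length is counted from the start of the file and covers the optional key/value pairs, which it skips rather than interprets.

```rust
use std::convert::TryInto;

/// Per-format entry from the header.
#[derive(Debug, PartialEq)]
pub struct FormatEntry {
    pub format_id: u32,
    pub short_len: u32,
    pub tables_offset: u64,
}

/// The fixed header fields, in the order they appear in the file.
#[derive(Debug, PartialEq)]
pub struct MapHeader {
    pub version: u32,
    pub header_len: u32,
    pub num_objects: u32,
    pub formats: Vec<FormatEntry>,
    pub trailer_offset: u64,
}

/// Minimal bounds-checked reader for network-order (big-endian) integers.
struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn take(&mut self, n: usize) -> Result<&'a [u8], String> {
        let end = self.pos.checked_add(n).ok_or_else(|| "offset overflow".to_string())?;
        let bytes = self.buf.get(self.pos..end).ok_or_else(|| "truncated header".to_string())?;
        self.pos = end;
        Ok(bytes)
    }
    fn u32(&mut self) -> Result<u32, String> {
        Ok(u32::from_be_bytes(self.take(4)?.try_into().unwrap()))
    }
    fn u64(&mut self) -> Result<u64, String> {
        Ok(u64::from_be_bytes(self.take(8)?.try_into().unwrap()))
    }
}

/// Parse the signature, version, counts, per-format entries, and trailer offset.
pub fn parse_header(buf: &[u8]) -> Result<MapHeader, String> {
    let mut r = Reader { buf, pos: 0 };
    if r.take(4)? != &b"LMAP"[..] {
        return Err("bad signature".to_string());
    }
    let version = r.u32()?;
    if version != 1 {
        return Err(format!("unsupported version {}", version));
    }
    let header_len = r.u32()?;
    let num_objects = r.u32()?;
    let num_formats = r.u32()?;
    let mut formats = Vec::new();
    for _ in 0..num_formats {
        formats.push(FormatEntry {
            format_id: r.u32()?,
            short_len: r.u32()?,
            tables_offset: r.u64()?,
        });
    }
    let trailer_offset = r.u64()?;
    // Assumption: header_len is measured from the start of the file, so it
    // can never be smaller than the fields already consumed. Any bytes
    // between r.pos and header_len would be key/value pairs, ignored here.
    if (header_len as usize) < r.pos {
        return Err("header length smaller than fixed fields".to_string());
    }
    Ok(MapHeader { version, header_len, num_objects, formats, trailer_offset })
}
```

A caller would pass the first bytes of a `.map` file to `parse_header` and then use each `tables_offset` to locate the per-format tables.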
+
+The lower six bits of each metadata table contain a type field indicating the
+reason that this object is stored:
+
+0::
+Reserved.
+1::
+This object is stored as a loose object in the repository.
+2::
+This object is a shallow entry. The mapping refers to a shallow value
+returned by a remote server.
+3::
+This object is a submodule entry. The mapping refers to the commit stored
+representing a submodule.
+
+Other data may be stored in this field in the future. Bits that are not used
+must be zero.
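The type field above can be decoded with a mask and a match. This sketch reads "each metadata table" as "each 4-byte metadata value", which is an interpretation rather than something the text states. `EntryType`, `TYPE_MASK`, and `decode_metadata` are hypothetical names, not identifiers from the commit.

```rust
/// Reason an object is recorded in the map (values 1 to 3 of the type field).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum EntryType {
    Loose,
    Shallow,
    Submodule,
}

/// The type field occupies the lower six bits of the metadata value.
const TYPE_MASK: u32 = 0x3f;

/// Decode one metadata value that has already been converted to host order.
pub fn decode_metadata(value: u32) -> Result<EntryType, String> {
    // No bits outside the type field are defined yet, so they must be zero.
    if value & !TYPE_MASK != 0 {
        return Err(format!("unused bits set in metadata value {:#x}", value));
    }
    match value & TYPE_MASK {
        0 => Err("type 0 is reserved".to_string()),
        1 => Ok(EntryType::Loose),
        2 => Ok(EntryType::Shallow),
        3 => Ok(EntryType::Submodule),
        other => Err(format!("unknown type {}", other)),
    }
}
```

Rejecting unknown types is a choice made for this sketch; a reader meant to tolerate future types might prefer to skip such entries instead.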
+
+All 4-byte numbers are in network order and must be 4-byte aligned in the file,
+so the NUL padding may be required in some cases.
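The alignment rule above amounts to a small piece of arithmetic, sketched here together with an aligned network-order read. `nul_padding` and `read_u32_aligned` are hypothetical helper names introduced for illustration only.

```rust
/// Number of NUL bytes needed after `offset` to reach the next 4-byte boundary.
pub fn nul_padding(offset: u64) -> u64 {
    (4 - offset % 4) % 4
}

/// Read a 4-byte network-order value, refusing unaligned or out-of-range offsets.
pub fn read_u32_aligned(buf: &[u8], offset: usize) -> Option<u32> {
    if offset % 4 != 0 {
        return None;
    }
    let bytes = buf.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}
```

A writer would emit `nul_padding(pos)` NUL bytes before each table of 4-byte values, so that a reader can rely on aligned offsets.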
+
+Note that the hash at the end of the file is in whatever the repository's main
+algorithm is. In the usual case when there are multiple algorithms, the main
+algorithm will be SHA-256 and the compatibility algorithm will be SHA-1.
+
 GIT
 ---
 Part of the linkgit:git[1] suite

Makefile

Lines changed: 4 additions & 1 deletion
@@ -1534,7 +1534,10 @@ CLAR_TEST_OBJS += $(UNIT_TEST_DIR)/unit-test.o
 
 UNIT_TEST_OBJS += $(UNIT_TEST_DIR)/test-lib.o
 
+RUST_SOURCES += src/csum_file.rs
+RUST_SOURCES += src/hash.rs
 RUST_SOURCES += src/lib.rs
+RUST_SOURCES += src/loose.rs
 RUST_SOURCES += src/varint.rs
 
 GIT-VERSION-FILE: FORCE
@@ -2978,7 +2981,7 @@ scalar$X: scalar.o GIT-LDFLAGS $(GITLIBS)
 $(LIB_FILE): $(LIB_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
-$(RUST_LIB): Cargo.toml $(RUST_SOURCES)
+$(RUST_LIB): Cargo.toml $(RUST_SOURCES) $(XDIFF_LIB) $(LIB_FILE) $(REFTABLE_LIB)
 	$(QUIET_CARGO)cargo build $(CARGO_ARGS)
 
 .PHONY: rust

build.rs

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation: version 2 of the License, dated June 1991.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this program; if not, see <https://www.gnu.org/licenses/>.
+
+fn main() {
+    println!("cargo:rustc-link-search=.");
+    println!("cargo:rustc-link-search=reftable");
+    println!("cargo:rustc-link-search=xdiff");
+    println!("cargo:rustc-link-lib=git");
+    println!("cargo:rustc-link-lib=reftable");
+    println!("cargo:rustc-link-lib=z");
+    println!("cargo:rustc-link-lib=xdiff");
+}

csum-file.c

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ void discard_hashfile(struct hashfile *f)
 	free_hashfile(f);
 }
 
-void hashwrite(struct hashfile *f, const void *buf, unsigned int count)
+void hashwrite(struct hashfile *f, const void *buf, uint32_t count)
 {
 	while (count) {
 		unsigned left = f->buffer_len - f->offset;

csum-file.h

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ void free_hashfile(struct hashfile *f);
  */
 int finalize_hashfile(struct hashfile *, unsigned char *, enum fsync_component, unsigned int);
 void discard_hashfile(struct hashfile *);
-void hashwrite(struct hashfile *, const void *, unsigned int);
+void hashwrite(struct hashfile *, const void *, uint32_t);
 void hashflush(struct hashfile *f);
 void crc32_begin(struct hashfile *);
 uint32_t crc32_end(struct hashfile *);

hash.c

Lines changed: 43 additions & 3 deletions
@@ -241,7 +241,47 @@ const char *empty_tree_oid_hex(const struct git_hash_algo *algop)
 	return oid_to_hex_r(buf, algop->empty_tree);
 }
 
-int hash_algo_by_name(const char *name)
+const struct git_hash_algo *hash_algo_ptr_by_offset(uint32_t algo)
+{
+	return &hash_algos[algo];
+}
+
+struct git_hash_ctx *git_hash_alloc(void)
+{
+	return malloc(sizeof(struct git_hash_ctx));
+}
+
+void git_hash_free(struct git_hash_ctx *ctx)
+{
+	free(ctx);
+}
+
+void git_hash_init(struct git_hash_ctx *ctx, const struct git_hash_algo *algop)
+{
+	algop->init_fn(ctx);
+}
+
+void git_hash_clone(struct git_hash_ctx *dst, const struct git_hash_ctx *src)
+{
+	src->algop->clone_fn(dst, src);
+}
+
+void git_hash_update(struct git_hash_ctx *ctx, const void *in, size_t len)
+{
+	ctx->algop->update_fn(ctx, in, len);
+}
+
+void git_hash_final(unsigned char *hash, struct git_hash_ctx *ctx)
+{
+	ctx->algop->final_fn(hash, ctx);
+}
+
+void git_hash_final_oid(struct object_id *oid, struct git_hash_ctx *ctx)
+{
+	ctx->algop->final_oid_fn(oid, ctx);
+}
+
+uint32_t hash_algo_by_name(const char *name)
 {
 	if (!name)
 		return GIT_HASH_UNKNOWN;
@@ -251,15 +291,15 @@ int hash_algo_by_name(const char *name)
 	return GIT_HASH_UNKNOWN;
 }
 
-int hash_algo_by_id(uint32_t format_id)
+uint32_t hash_algo_by_id(uint32_t format_id)
 {
 	for (size_t i = 1; i < GIT_HASH_NALGOS; i++)
 		if (format_id == hash_algos[i].format_id)
 			return i;
 	return GIT_HASH_UNKNOWN;
 }
 
-int hash_algo_by_length(size_t len)
+uint32_t hash_algo_by_length(size_t len)
 {
 	for (size_t i = 1; i < GIT_HASH_NALGOS; i++)
 		if (len == hash_algos[i].rawsz)
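The lookup functions in this hunk return an index into `hash_algos` as a `uint32_t`, with `GIT_HASH_UNKNOWN` as the fallback. A Rust-side mirror of that convention might look like the sketch below. It is not the abstraction added in `src/hash.rs`; the `HashAlgo` struct and the table are hypothetical. It does assume Git's usual conventions: index 0 is the unknown algorithm, and the format IDs are the ASCII tags `sha1` and `s256` with raw sizes of 20 and 32 bytes.

```rust
/// Index 0 is reserved for "unknown", matching GIT_HASH_UNKNOWN in the C code.
pub const GIT_HASH_UNKNOWN: u32 = 0;

/// Hypothetical, trimmed-down counterpart of `struct git_hash_algo`.
struct HashAlgo {
    format_id: u32,
    rawsz: usize,
}

/// Assumed table: unknown, SHA-1, SHA-256, in that order.
const HASH_ALGOS: [HashAlgo; 3] = [
    HashAlgo { format_id: 0, rawsz: 0 },
    HashAlgo { format_id: u32::from_be_bytes(*b"sha1"), rawsz: 20 },
    HashAlgo { format_id: u32::from_be_bytes(*b"s256"), rawsz: 32 },
];

/// Look up an algorithm index by format ID, skipping the unknown slot.
pub fn hash_algo_by_id(format_id: u32) -> u32 {
    HASH_ALGOS
        .iter()
        .enumerate()
        .skip(1)
        .find(|(_, a)| a.format_id == format_id)
        .map_or(GIT_HASH_UNKNOWN, |(i, _)| i as u32)
}

/// Look up an algorithm index by raw hash length in bytes.
pub fn hash_algo_by_length(len: usize) -> u32 {
    HASH_ALGOS
        .iter()
        .enumerate()
        .skip(1)
        .find(|(_, a)| a.rawsz == len)
        .map_or(GIT_HASH_UNKNOWN, |(i, _)| i as u32)
}
```

Starting the search at index 1, as the C loops do, keeps a zero format ID or zero length from matching the placeholder entry.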

hash.h

Lines changed: 13 additions & 25 deletions
@@ -211,7 +211,7 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
-	int algo; /* XXX requires 4-byte alignment */
+	uint32_t algo; /* XXX requires 4-byte alignment */
 };
 
 #define GET_OID_QUIETLY 01
@@ -320,37 +320,25 @@ struct git_hash_algo {
 };
 extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
 
-static inline void git_hash_clone(struct git_hash_ctx *dst, const struct git_hash_ctx *src)
-{
-	src->algop->clone_fn(dst, src);
-}
-
-static inline void git_hash_update(struct git_hash_ctx *ctx, const void *in, size_t len)
-{
-	ctx->algop->update_fn(ctx, in, len);
-}
-
-static inline void git_hash_final(unsigned char *hash, struct git_hash_ctx *ctx)
-{
-	ctx->algop->final_fn(hash, ctx);
-}
-
-static inline void git_hash_final_oid(struct object_id *oid, struct git_hash_ctx *ctx)
-{
-	ctx->algop->final_oid_fn(oid, ctx);
-}
-
+void git_hash_init(struct git_hash_ctx *ctx, const struct git_hash_algo *algop);
+void git_hash_clone(struct git_hash_ctx *dst, const struct git_hash_ctx *src);
+void git_hash_update(struct git_hash_ctx *ctx, const void *in, size_t len);
+void git_hash_final(unsigned char *hash, struct git_hash_ctx *ctx);
+void git_hash_final_oid(struct object_id *oid, struct git_hash_ctx *ctx);
+const struct git_hash_algo *hash_algo_ptr_by_offset(uint32_t algo);
+struct git_hash_ctx *git_hash_alloc(void);
+void git_hash_free(struct git_hash_ctx *ctx);
 /*
  * Return a GIT_HASH_* constant based on the name. Returns GIT_HASH_UNKNOWN if
  * the name doesn't match a known algorithm.
  */
-int hash_algo_by_name(const char *name);
+uint32_t hash_algo_by_name(const char *name);
 /* Identical, except based on the format ID. */
-int hash_algo_by_id(uint32_t format_id);
+uint32_t hash_algo_by_id(uint32_t format_id);
 /* Identical, except based on the length. */
-int hash_algo_by_length(size_t len);
+uint32_t hash_algo_by_length(size_t len);
 /* Identical, except for a pointer to struct git_hash_algo. */
-static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
+static inline uint32_t hash_algo_by_ptr(const struct git_hash_algo *p)
 {
 	size_t i;
 	for (i = 0; i < GIT_HASH_NALGOS; i++) {

object-file-convert.c

Lines changed: 11 additions & 3 deletions
@@ -13,17 +13,25 @@
 #include "gpg-interface.h"
 #include "object-file-convert.h"
 
-int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
+int repo_oid_to_algop(struct repository *repo, const struct object_id *srcoid,
 		      const struct git_hash_algo *to, struct object_id *dest)
 {
 	/*
 	 * If the source algorithm is not set, then we're using the
 	 * default hash algorithm for that object.
 	 */
 	const struct git_hash_algo *from =
-		src->algo ? &hash_algos[src->algo] : repo->hash_algo;
+		srcoid->algo ? &hash_algos[srcoid->algo] : repo->hash_algo;
+	struct object_id temp;
+	const struct object_id *src = srcoid;
 
-	if (from == to) {
+	if (!srcoid->algo) {
+		oidcpy(&temp, srcoid);
+		temp.algo = hash_algo_by_ptr(repo->hash_algo);
+		src = &temp;
+	}
+
+	if (from == to || !to) {
 		if (src != dest)
 			oidcpy(dest, src);
 		return 0;
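The control flow this hunk introduces has two parts. It first normalizes an unset source algorithm to the repository default. It then copies without converting when the algorithms match or when no destination algorithm is given. That flow can be modelled in isolation as below. This is an illustrative sketch, not Git code: `ObjectId`, `Plan`, and `plan_conversion` are hypothetical, algorithms are plain `u32` indexes with 0 meaning "not set", and `Option<u32>` stands in for the nullable `to` pointer.

```rust
/// Hypothetical stand-in for `struct object_id`; algo 0 means "not set".
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ObjectId {
    pub hash: [u8; 32],
    pub algo: u32,
}

/// What the caller should do next with the (normalized) source ID.
#[derive(Debug, PartialEq)]
pub enum Plan {
    /// Source and destination agree, or there is no destination algorithm.
    Copy(ObjectId),
    /// A real conversion through the object map is required.
    Convert { src: ObjectId, from: u32, to: u32 },
}

pub fn plan_conversion(srcoid: &ObjectId, repo_algo: u32, to: Option<u32>) -> Plan {
    // An unset source algorithm means the repository's default algorithm.
    let from = if srcoid.algo != 0 { srcoid.algo } else { repo_algo };
    // Work on a copy so the result always carries a valid algorithm.
    let mut src = *srcoid;
    if src.algo == 0 {
        src.algo = repo_algo;
    }
    match to {
        Some(to) if to != from => Plan::Convert { src, from, to },
        // Same algorithm, or no destination algorithm: plain copy.
        _ => Plan::Copy(src),
    }
}
```

With `repo_algo` set to 2, an ID whose `algo` is 0 comes back as `Plan::Copy` carrying `algo == 2`, which corresponds to the `temp` copy in the C code.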

oidtree.c

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ struct oidtree_iter_data {
 	oidtree_iter fn;
 	void *arg;
 	size_t *last_nibble_at;
-	int algo;
+	uint32_t algo;
 	uint8_t last_byte;
 };
 

repository.c

Lines changed: 10 additions & 3 deletions
@@ -3,6 +3,7 @@
 #include "repository.h"
 #include "odb.h"
 #include "config.h"
+#include "gettext.h"
 #include "object.h"
 #include "lockfile.h"
 #include "path.h"
@@ -38,7 +39,7 @@ struct repository *the_repository = &the_repo;
 static void set_default_hash_algo(struct repository *repo)
 {
 	const char *hash_name;
-	int algo;
+	uint32_t algo;
 
 	hash_name = getenv("GIT_TEST_DEFAULT_HASH_ALGO");
 	if (!hash_name)
@@ -185,18 +186,24 @@ void repo_set_gitdir(struct repository *repo,
 			repo->gitdir, "index");
 }
 
-void repo_set_hash_algo(struct repository *repo, int hash_algo)
+void repo_set_hash_algo(struct repository *repo, uint32_t hash_algo)
 {
 	repo->hash_algo = &hash_algos[hash_algo];
 }
 
-void repo_set_compat_hash_algo(struct repository *repo, int algo)
+void repo_set_compat_hash_algo(struct repository *repo, uint32_t algo)
 {
+#ifdef WITH_RUST
 	if (hash_algo_by_ptr(repo->hash_algo) == algo)
 		BUG("hash_algo and compat_hash_algo match");
 	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
 	if (repo->compat_hash_algo)
 		repo_read_loose_object_map(repo);
+#else
+	(void)repo;
+	if (algo)
+		die(_("compatibility hash algorithm support requires Rust"));
+#endif
 }
 
 void repo_set_ref_storage_format(struct repository *repo,
