GGUF: ggml backend support for writing tensor data#1033

Open
JohannesGaessler wants to merge 2 commits into ggml-org:master from JohannesGaessler:gguf-backend-write

Conversation

@JohannesGaessler
Contributor

This PR adds ggml backend support for writing tensor data to a GGUF file. Currently a workaround is needed: the data is first copied to new tensors with data in RAM, which the GGUF code can then access via memcpy. With this PR, a fake tensor is instead reconstructed from gguf_tensor_info and passed to ggml_backend_tensor_get. I'm not sure whether this is the best solution; a lot of the fields in gguf_tensor_info are the same as in ggml_tensor. Is there a reason why you couldn't just directly store a ggml_tensor as one of the fields in gguf_tensor_info?

@slaren
Member

slaren commented Dec 1, 2024

It should be ok to store the tensor in gguf_tensor_info, but I think it would require a refactor to avoid duplicating the data since the gguf loader also uses this struct to load the tensor info.

@JohannesGaessler
Contributor Author

I did a refactor to store a ggml_tensor instead of effectively mirrored fields. It seems to work correctly for MNIST, but I'll open a PR in the llama.cpp repository to ensure that it works there as well (there are also some slight API changes that I would suggest). While I'm at it I'll also tackle #1038.

Comment on lines +6397 to +6433
/* if (info->n_dims > GGML_MAX_DIMS) { */
/* fprintf(stderr, "%s: invalid number of dimensions (%" PRIu32 ")\n", __func__, info->n_dims); */
/* return false; */
/* } */

/* if (info->type < 0 || info->type >= GGML_TYPE_COUNT) { */
/* fprintf(stderr, "%s: invalid type (%d)\n", __func__, info->type); */
/* return false; */
/* } */

/* if (strlen(info->name.data) >= GGML_MAX_NAME) { */
/* fprintf(stderr, "%s: tensor '%s' name is too long\n", __func__, info->name.data); */
/* return false; */
/* } */

/* for (uint32_t i = 0; i < info->n_dims; ++i) { */
/* if (info->ne[i] <= 0) { */
/* fprintf(stderr, "%s: invalid number of elements (%" PRIu64 ")\n", __func__, info->ne[i]); */
/* return false; */
/* } */
/* } */

/* // prevent overflow for total number of elements */
/* if (INT64_MAX/info->ne[1] <= info->ne[0]) { */
/* fprintf(stderr, "%s: invalid number of elements (%" PRIu64 ")\n", __func__, info->ne[1]); */
/* return false; */
/* } */

/* if (INT64_MAX/info->ne[2] <= info->ne[0]*info->ne[1]) { */
/* fprintf(stderr, "%s: invalid number of elements (%" PRIu64 ")\n", __func__, info->ne[2]); */
/* return false; */
/* } */

/* if (INT64_MAX/info->ne[3] <= info->ne[0]*info->ne[1]*info->ne[2]) { */
/* fprintf(stderr, "%s: invalid number of elements (%" PRIu64 ")\n", __func__, info->ne[3]); */
/* return false; */
/* } */
Member


Why are these checks commented?

Contributor Author


This was just something I did for a WIP version. I have a version with more changes and the checks re-enabled on my local machine. I'll make a PR to llama.cpp either today or tomorrow.
