@Aiee Aiee commented Nov 4, 2025

Rationale for this change

It would be convenient to show the default null spellings explicitly in the documentation.
Source:

ConvertOptions ConvertOptions::Defaults() {
  auto options = ConvertOptions();
  // Same default null / true / false spellings as in Pandas.
  options.null_values = {"", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN",
                         "-NaN", "-nan", "1.#IND", "1.#QNAN", "N/A", "NA",
                         "NULL", "NaN", "n/a", "nan", "null"};
  options.true_values = {"1", "True", "TRUE", "true"};
  options.false_values = {"0", "False", "FALSE", "false"};
  return options;
}
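
To make the list above easier to reason about, here is a minimal standalone sketch that checks a raw field against the same spellings. `IsDefaultNullSpelling` is a hypothetical helper written for illustration, not an Arrow function, and it assumes exact, case-sensitive comparison of the whole field.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper, not part of Arrow: reports whether a raw CSV field
// exactly matches one of the default null spellings quoted above.
inline bool IsDefaultNullSpelling(const std::string& field) {
  // Same 17 spellings as in ConvertOptions::Defaults(), copied by hand.
  static const std::unordered_set<std::string> kDefaultNulls = {
      "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN",
      "-NaN", "-nan", "1.#IND", "1.#QNAN", "N/A", "NA",
      "NULL", "NaN", "n/a", "nan", "null"};
  // Exact match only: no trimming and no case folding (an assumption here).
  return kDefaultNulls.count(field) > 0;
}
```

Under that assumption, `IsDefaultNullSpelling("NA")` and `IsDefaultNullSpelling("")` hold, while `"Null"` and `" NA"` are rejected because neither spelling appears in the list.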

What changes are included in this PR?

Only the documentation.

Are these changes tested?

No need to test.

Are there any user-facing changes?

No.

Expanded the default list of recognized null spellings in the documentation.

github-actions bot commented Nov 4, 2025

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@Aiee Aiee changed the title from "[Doc] Update null values documentation in csv.rst" to "MINOR: Update null values documentation in csv.rst" Nov 4, 2025
Member

raulcd commented Nov 5, 2025

@github-actions crossbow submit preview-docs

github-actions bot commented Nov 5, 2025

Revision: 7945ea3

Submitted crossbow builds: ursacomputing/crossbow @ actions-d7b30c40c1

Task Status
preview-docs GitHub Actions

Member

@raulcd raulcd left a comment

Thanks for the PR. It feels too specific to me and prone to becoming outdated if we ever add or change a spelling. I am not sure this is something we want to add to the documentation.
@AlenkaF @thisisnic thoughts?
In any case, the linter is failing because those require double backticks ``""`` instead of single `""`.
See the rendered docs not displaying correctly here:
http://crossbow.voltrondata.com/pr_docs/48048/cpp/csv.html#nulls

@github-actions github-actions bot added the "awaiting changes" label and removed the "awaiting review" label Nov 5, 2025
Author

Aiee commented Nov 5, 2025

Hi, thanks for the reply. My initial motivation was that I found it hard to find the specific null spellings (and likewise for `true_values` and `false_values`) without diving into the source code. I thought it might be helpful to state them in the documentation, since these values haven't changed for a long time. But sure, it adds a cost for future maintenance.

Member

AlenkaF commented Nov 5, 2025

It is a bit of a pain to find relevant information for this specific case.

What if we update the header file here:

/// Create conversion options with default values, including conventional
/// values for `null_values`, `true_values` and `false_values`
static ConvertOptions Defaults();

with additional information about what the default null values (and the others) are, and then link the API docs section here:
https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N5arrow3csv14ConvertOptions8DefaultsEv

to the Nulls section in the User Guide?
http://crossbow.voltrondata.com/pr_docs/48048/cpp/csv.html#nulls

Adding such a specific list directly to the docs does look like a possible future issue with outdated information.

@github-actions github-actions bot added the "Component: C++" and "awaiting change review" labels and removed the "awaiting changes" label Nov 6, 2025
Author

Aiee commented Nov 6, 2025

Hi, I've reverted the doc change and updated the header file as suggested. Please take a look.

/// Create conversion options with default values, including conventional
/// values for `null_values`, `true_values` and `false_values`
///
/// Default null values: see http://crossbow.voltrondata.com/pr_docs/48048/cpp/csv.html#nulls
Member

We cannot link to the preview docs. Here, in the header file, we can actually add the list of the conventional null values. Then csv.html can link to this section.
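
One possible shape for such a comment is sketched below, assuming the spellings are copied verbatim from the `Defaults()` source quoted in the PR description. The `sketch` namespace and the reduced three-member struct are stand-ins for illustration only; the real `arrow::csv::ConvertOptions` has more members, and the final wording is up to the maintainers.

```cpp
#include <string>
#include <vector>

namespace sketch {  // stand-in namespace, not arrow::csv

// Reduced stand-in for ConvertOptions with only the three spelling lists.
struct ConvertOptions {
  std::vector<std::string> null_values;
  std::vector<std::string> true_values;
  std::vector<std::string> false_values;

  /// Create conversion options with default values, including conventional
  /// values for `null_values`, `true_values` and `false_values`
  ///
  /// Default null values: "", "#N/A", "#N/A N/A", "#NA", "-1.#IND",
  /// "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "N/A", "NA", "NULL",
  /// "NaN", "n/a", "nan", "null"
  ///
  /// Default true values: "1", "True", "TRUE", "true"
  ///
  /// Default false values: "0", "False", "FALSE", "false"
  static ConvertOptions Defaults();
};

// Mirrors the implementation quoted in the PR description.
inline ConvertOptions ConvertOptions::Defaults() {
  ConvertOptions options;
  options.null_values = {"", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN",
                         "-NaN", "-nan", "1.#IND", "1.#QNAN", "N/A", "NA",
                         "NULL", "NaN", "n/a", "nan", "null"};
  options.true_values = {"1", "True", "TRUE", "true"};
  options.false_values = {"0", "False", "FALSE", "false"};
  return options;
}

}  // namespace sketch
```

Keeping the list next to the declaration means the comment and the implementation can be updated in the same change, which addresses the concern about the user guide drifting out of date.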
