add --count and --batch args for data_export.py
#522
Conversation
Thanks @jiangyinzuo. What is the actual use case for adding these parameters? The idea of …
```diff
         tuple: A tuple containing properties as a dictionary and an h5py file object.
     """
-    for root, _, files in os.walk(build_result_filepath(dataset, count)):
+    for root, _, files in os.walk(build_result_filepath(dataset, count, batch_mode=batch_mode)):
```
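For context, a rough sketch of how such flags could be exposed in data_export.py's argument parser. The flag names come from the PR title; the defaults and help strings below are assumptions, not copied from the diff:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--count", type=int, default=10,
                    help="number of nearest neighbours the results were computed for (assumed default)")
parser.add_argument("--batch", action="store_true",
                    help="export results produced in batch mode")
args = parser.parse_args()
# args.count and args.batch would then be threaded down to the result-loading
# code, e.g. as the count and batch_mode arguments seen in the diff above.
```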
If data_export.py is expected to export all the results that are found, then build_result_filepath(...) here should return a path that covers all the relevant sub-directories.
However, in def build_result_filepath(...):
ann-benchmarks/ann_benchmarks/results.py
Lines 11 to 38 in 3982de0
```python
def build_result_filepath(dataset_name: Optional[str] = None,
                          count: Optional[int] = None,
                          definition: Optional[Definition] = None,
                          query_arguments: Optional[Any] = None,
                          batch_mode: bool = False) -> str:
    """
    Constructs the filepath for storing the results.

    Args:
        dataset_name (str, optional): The name of the dataset.
        count (int, optional): The count of records.
        definition (Definition, optional): The definition of the algorithm.
        query_arguments (Any, optional): Additional arguments for the query.
        batch_mode (bool, optional): If True, the batch mode is activated.

    Returns:
        str: The constructed filepath.
    """
    d = ["results"]
    if dataset_name:
        d.append(dataset_name)
    if count:
        d.append(str(count))
    if definition:
        d.append(definition.algorithm + ("-batch" if batch_mode else ""))
        data = definition.arguments + query_arguments
        d.append(re.sub(r"\W+", "_", json.dumps(data, sort_keys=True)).strip("_") + ".hdf5")
    return os.path.join(*d)
```
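For illustration, the kind of paths this builds (the dataset name and count below are made-up examples):

```python
from ann_benchmarks.results import build_result_filepath

build_result_filepath("glove-100-angular")      # -> "results/glove-100-angular"
build_result_filepath("glove-100-angular", 10)  # -> "results/glove-100-angular/10"
# Only when a definition is passed does batch_mode matter: the algorithm
# directory then gets a "-batch" suffix, e.g.
# "results/glove-100-angular/10/<algorithm>-batch/<arguments>.hdf5"
```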
If count and batch_mode are None, the directories for specific count values and for batch mode are effectively ignored.
So if we expect data_export.py to export all the results, including every count value as well as both batch and non-batch runs, maybe we should implement a new function for load_all_results(...)?
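As a rough sketch of that idea (the function name and exact filtering here are assumptions; only the os.walk / h5py pattern mirrors the existing loader):

```python
import os
import h5py


def load_all_results_everywhere(dataset_name):
    """Hypothetical loader: walk the whole results tree for a dataset and
    yield every stored run, regardless of count or batch mode."""
    base = os.path.join("results", dataset_name)  # parent of all count/batch sub-directories
    for root, _, files in os.walk(base):
        for filename in files:
            if os.path.splitext(filename)[-1] != ".hdf5":
                continue
            try:
                with h5py.File(os.path.join(root, filename), "r+") as f:
                    yield dict(f.attrs), f
            except Exception:
                continue
```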
Good point. I think the problem is that
ann-benchmarks/ann_benchmarks/results.py
Lines 90 to 97 in 3982de0
```python
            continue
        try:
            with h5py.File(os.path.join(root, filename), "r+") as f:
                properties = dict(f.attrs)
                if batch_mode != properties["batch_mode"]:
                    continue
                yield properties, f
        except Exception:
```
doesn't differentiate between batch_mode being explicitly True or False and batch_mode being None.
Since definition is None in the part above, it should try to load both batch and non-batch results, as well as results for every count value.
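One way to make that distinction explicit (a sketch only; treating batch_mode=None as "no filter" is the idea under discussion, not an agreed patch):

```python
def matches_batch_mode(requested, properties):
    """Hypothetical filter helper for the loop above.

    requested is None    -> no filter: accept batch and non-batch results alike
    requested is a bool  -> accept only results whose stored batch_mode matches
    """
    if requested is None:
        return True
    return bool(properties.get("batch_mode", False)) == bool(requested)


# Usage inside the loop shown above:
#     if not matches_batch_mode(batch_mode, properties):
#         continue
```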
I decided to mark this PR as a draft until it is ready.
Also convert X to numpy.array in the glove dataset function.
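For reference, a minimal sketch of the kind of change meant here (the helper and variable names are illustrative; the actual glove() function lives in ann_benchmarks/datasets.py):

```python
import numpy as np


def vectors_to_matrix(vectors):
    """Hypothetical helper: convert the list of GloVe embedding vectors
    into a single numpy array, as the note above describes for X."""
    return np.array(vectors)


X = vectors_to_matrix([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])  # toy data, not real GloVe vectors
```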