-
Notifications
You must be signed in to change notification settings - Fork 34
Description
I'm running some code which used to work. Now I get an error. I think the URL for the participant registration list is wrong.
Steps to reproduce
- create a new Python virtual environment (3.12.3)
pip install nemosis- create a new empty folder for the data cache
- Run this script:
from nemosis import static_table
nemosis_data_cache = "/home/matthew/Data/nemosis/"
static_pd: "pd.DataFrame" = static_table(
"Generators and Scheduled Loads", nemosis_data_cache
)
print(static_pd)
Expected behavior: Script runs without error, and prints the static data.
Actual behavior:
INFO: Retrieving static table Generators and Scheduled Loads
Traceback (most recent call last):
File "/home/matthew/Downloads/nemosis-dep/main.py", line 5, in <module>
static_pd: "pd.DataFrame" = static_table(
^^^^^^^^^^^^^
File "/home/matthew/Downloads/nemosis-dep/venv/lib/python3.12/site-packages/nemosis/data_fetch_methods.py", line 297, in static_table
table = static_file_reader_map[table_name](path_and_name, table_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/matthew/Downloads/nemosis-dep/venv/lib/python3.12/site-packages/nemosis/data_fetch_methods.py", line 404, in _read_excel
xls = _pd.ExcelFile(path_and_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/matthew/Downloads/nemosis-dep/venv/lib/python3.12/site-packages/pandas/io/excel/_base.py", line 1554, in __init__
raise ValueError(
ValueError: Excel file format cannot be determined, you must specify an engine manually.
Debugging Info
The installed libraries are:
- pandas: 2.3.3
- nemosis: 3.8.1
- openpyxl: 3.1.5
- pyxlsb: 1.0.10
The docs for pd.ExcelFile say that something changed in version 1.2.0, but that was superseded years ago.
The Cause
The AEMO participant registration file is a .xlsx. It has been that way for over a year
However the code here has a url ending in .xls.
I do not know where this came from. When I open that URL with firefox, I see the file prompt suggests a .xlsx ending.
So I think we were passing a filename to Pandas which ended with .xls, but the content was actually .xlsx. Maybe Pandas was previously smart enough to figure it out, but now expects us to be more explicit. Although when I try to use older versions of Pandas and xlrd, I get the same error.
Proposed Solution
I do not think we need to actually specify an engine. Rather we should just fix the URL.
Either:
- append an
xto the URL; or - replace the whole URL with
https://www.aemo.com.au/-/media/files/electricity/nem/participant_information/nem-registration-and-exemption-list.xlsx
And change the filenames we save this to (e.g. here)