Releases: aws/aws-sdk-pandas
AWS Data Wrangler 2.16.0
Noteworthy
⚠️ For platforms without PyArrow 7 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
New Functionalities
Enhancements
- add test infrastructure for oracle database #1274
- revisiting S3 Select performance #1287
- migrate test infra from cdk v1 to cdk v2 #1288
- throw NoFilesFound exception on 404 #1290
- fast executemany #1299
- add precombine key to upsert method for Redshift #1304
- pass precombine to redshift.copy() #1319
- use DataFrame column names in INSERT statement for UPSERT operation #1317
- add data_source param to athena.repair_table #1324
- modify athena2quicksight datatypes to allow startswith for varchar #1332
- add TagColumnOperation to quicksight.create_athena_dataset #1342
- enable list timestream databases and tables #1345
- enable s3.to_parquet to receive "zstd" compression type #1369
- create a way to perform PartiQL queries to a Dynamo DB table #1390
- s3 proxy support with data wrangler #1361
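The new PartiQL support (#1390) can be sketched as below. `wr.dynamodb.read_partiql_query` is the entry point introduced by that PR; the table and column names are hypothetical, and the `parameters` keyword is assumed from the module's style.

```python
import pandas as pd


def movies_from_year(year: int) -> pd.DataFrame:
    """Sketch: run a PartiQL statement against a hypothetical DynamoDB
    "movies" table. Requires AWS credentials when actually invoked."""
    import awswrangler as wr  # deferred so the sketch imports without AWS access

    return wr.dynamodb.read_partiql_query(
        query="SELECT title, year FROM movies WHERE year = ?",
        parameters=[year],
    )
```

PartiQL keeps the SQL-like syntax while still issuing native DynamoDB reads, so no scan-and-filter client side.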
Documentation
- be more explicit about awswrangler.s3.to_parquet overwrite behavior #1300
- fix Python Version in Readme #1302
Bug Fix
- set encoding to utf-8 when no encoding is specified when reading/writing to s3 #1257
- fix Redshift Locking Behavior #1305
- specify cfn deletion policy for sqlserver and oracle instances #1378
- to_sql() make column names quoted identifiers to allow sql keywords #1392
- fix extension dtype index handling #1333
- fix issue with redshift.to_sql() method when mode set to "upsert" and schema contains a hyphen #1360
- timestream - array cols to str #1368
- read_parquet Does Not Throw Error for Missing Column #1370
Thanks
We thank the following contributors/users for their work on this release:
@bnimam, @IldarAlmakaev, @syokoysn, @thomasniebler, @maxdavidson91, @takeknock, @Sleekbobby1011, @snikolakis, @willsmith28, @malachi-constant, @cnfait, @jaidisido, @kukushking
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our S3 public bucket!
AWS Data Wrangler 2.15.1
Noteworthy
⚠️ Dropped Python 3.6 support
⚠️ For platforms without PyArrow 7 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Patch
- Add `sparql` extra & make `SPARQLWrapper` dependency optional #1252
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our S3 public bucket!
AWS Data Wrangler 2.15.0
Noteworthy
⚠️ Dropped Python 3.6 support
⚠️ For platforms without PyArrow 7 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
New Functionalities
- Amazon Neptune module 🚀 #1084 Check out the tutorial. Thanks to @bechbd & @sakti-mishra !
- ARM64 Support for Python 3.8 and 3.9 layers 🔥 #1129 Many thanks @cnfait !
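The new Neptune module can be exercised roughly as follows; `wr.neptune.connect` and `wr.neptune.execute_gremlin` are the entry points shipped in #1084, while the endpoint values are placeholders.

```python
def sample_vertices(host: str, port: int = 8182):
    """Sketch: connect to a Neptune cluster endpoint and run a small
    Gremlin traversal. Requires network access to the cluster."""
    import awswrangler as wr  # deferred: no AWS connectivity needed at import time

    client = wr.neptune.connect(host, port, iam_enabled=False)
    # Results come back as a pandas DataFrame, like the other modules
    return wr.neptune.execute_gremlin(client, "g.V().limit(5)")
```

See the linked tutorial for openCypher and SPARQL variants of the same pattern.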
Enhancements
- Timestream module - support multi-measure records #1214
- Warnings for implicit float conversion of nulls in to_parquet #1221
- Support additional sql params in Redshift COPY operation #1210
- Add create_ctas_table to Athena module #1207
- S3 Proxy support #1206
- Add Athena get_named_query_statement #1183
- Add manifest parameter to 'redshift.copy_from_files' method #1164
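The multi-measure Timestream support (#1214) might look like the sketch below; the database/table names are placeholders, and passing a list to `measure_col` is an assumption based on that PR's description.

```python
from datetime import datetime

import pandas as pd

# Two measure columns per row: the multi-measure record shape #1214 targets
df = pd.DataFrame(
    {
        "time": [datetime.utcnow()],
        "region": ["eu-west-1"],
        "cpu": [0.42],
        "memory": [512.0],
    }
)


def write_metrics(frame: pd.DataFrame):
    """Sketch: write multi-measure records to a hypothetical Timestream table."""
    import awswrangler as wr  # deferred: requires AWS credentials to actually run

    return wr.timestream.write(
        df=frame,
        database="metrics_db",
        table="host_metrics",
        time_col="time",
        measure_col=["cpu", "memory"],  # list form assumed per #1214
        dimensions_cols=["region"],
    )
```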
Documentation
Bug Fix
- Give precedence to user path for Athena UNLOAD S3 Output Location #1216
- Honor User specified workgroup in athena.read_sql_query with unload_approach=True #1178
- Support map type in Redshift copy #1185
- data_api.rds.read_sql_query() does not preserve data type when column is all NULLS - switches to Boolean #1158
- Allow decimal values within struct when writing to parquet #1179
Thanks
We thank the following contributors/users for their work on this release:
@bechbd, @sakti-mishra, @mateogianolio, @jasadams, @malachi-constant, @cnfait, @jaidisido, @kukushking
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our S3 public bucket!
AWS Data Wrangler 2.14.0
Caveats
⚠️ For platforms without PyArrow 6 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
New Functionalities
- Support Athena Unload 🚀 #1038
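Athena UNLOAD (#1038) writes query results as Parquet instead of going through CTAS, which can be noticeably faster for large result sets. A minimal sketch, assuming the `unload_approach` and `s3_output` parameters from that PR; the staging path is a placeholder:

```python
def query_with_unload(sql: str, database: str, staging_path: str):
    """Sketch: read an Athena query via UNLOAD rather than CTAS.

    `staging_path` must be an S3 prefix the caller owns and Athena can write to.
    """
    import awswrangler as wr  # deferred: needs AWS credentials at call time

    return wr.athena.read_sql_query(
        sql=sql,
        database=database,
        ctas_approach=False,
        unload_approach=True,  # new in this release (#1038)
        s3_output=staging_path,
    )
```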
Enhancements
- Add the `ExcludeColumnSchema=True` argument to the `glue.get_partitions` call to reduce response size #1094
- Add PyArrow flavor argument to `write_parquet` via `pyarrow_additional_kwargs` #1057
- Add `rename_duplicate_columns` and `handle_duplicate_columns` flags to `sanitize_dataframe_columns_names` method #1124
- Add `timestamp_as_object` argument to all database `read_sql_table` methods #1130
- Add `ignore_null` to `read_parquet_metadata` method #1125
Documentation
- Improve documentation on installing SAR Lambda layers with the CDK #1097
- Fix broken link to tutorial in `to_parquet` method #1058
Bug Fix
- Ensure that partition locations retrieved from AWS Glue always end in a "/" #1094
- Fix bucketing overflow issue in Athena #1086
Thanks
We thank the following contributors/users for their work on this release:
@dennyau, @kailukowiak, @lucasmo, @moykeen, @RigoIce, @vlieven, @kepler, @mdavis-xyz, @ConstantinoSchillebeeckx, @kukushking, @jaidisido
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our S3 public bucket!
AWS Data Wrangler 2.13.0
Caveats
⚠️ For platforms without PyArrow 6 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Breaking changes
- Fix sanitize methods to align with Glue/Hive naming conventions #579
New Functionalities
- AWS Lake Formation Governed Tables 🚀 #570
- Support for Python 3.10 🔥 #973
- Add partitioning to JSON datasets #962
- Add ability to use unbuffered cursor for large MySQL datasets #928
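Governed Table reads from the new Lake Formation module (#570) can be sketched as below; `wr.lakeformation.read_sql_query` is the entry point that module exposes, and the database name is a placeholder.

```python
def read_governed(sql: str, database: str):
    """Sketch: query a Lake Formation Governed Table.

    Runs inside a Lake Formation transaction managed by the library;
    requires Lake Formation permissions on the database.
    """
    import awswrangler as wr  # deferred: AWS is only touched when invoked

    return wr.lakeformation.read_sql_query(sql=sql, database=database)
```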
Enhancements
- Add awswrangler.s3.list_buckets #997
- Add partitions_parameters to catalog partitions methods #1035
- Refactor pagination config in list objects #955
- Add error message to EmptyDataframe exception #991
Documentation
- Clarify docs & add tutorial on schema evolution for CSV datasets #964
Bug Fix
- catalog.add_column() without column_comment triggers exception #1017
- catalog.create_parquet_table Key in dictionary does not always exist #998
- Fix Catalog StorageDescriptor get #969
Thanks
We thank the following contributors/users for their work on this release:
@csabz09, @Falydoor, @moritzkoerber, @maxispeicher, @kukushking, @jaidisido
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our S3 public bucket!
AWS Data Wrangler 2.12.1
Caveats
⚠️ For platforms without PyArrow 5 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Patch
- Removing unnecessary dev dependencies from main #961
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our S3 public bucket!
AWS Data Wrangler 2.12.0
Caveats
⚠️ For platforms without PyArrow 5 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
New Functionalities
- Add Support for Opensearch #891 🔥 Check out the tutorial. Many thanks to @AssafMentzer and @mureddy19 for this contribution
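The OpenSearch module (#891) follows the same connect-then-operate pattern as the other modules. A minimal sketch, assuming the `wr.opensearch.connect` and `wr.opensearch.index_df` entry points from that PR; the host and index name are placeholders:

```python
import pandas as pd

docs = pd.DataFrame({"title": ["a", "b"], "views": [10, 20]})


def index_documents(frame: pd.DataFrame, host: str):
    """Sketch: index a DataFrame into an OpenSearch domain."""
    import awswrangler as wr  # deferred: needs a reachable OpenSearch endpoint

    client = wr.opensearch.connect(host=host)
    # Each row becomes one document in the target index
    return wr.opensearch.index_df(client, df=frame, index="my-index")
```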
Enhancements
- redshift.read_sql_query - handle empty table corner case #874
- Refactor read parquet table to reduce file list scan based on available partitions #878
- Shrink lambda layer with strip command #884
- Enabling DynamoDB endpoint URL #887
- EMR jobs concurrency #889
- Add feature to allow custom AMI for EMR #907
- wr.redshift.unload_to_files empty the S3 folder instead of overwriting existing files #914
- Add catalog_id arg to wr.catalog.does_table_exist #920
- Add endpoint_url for AWS Secrets Manager #929
Documentation
- Update docs for awswrangler.s3.to_csv #868
Bug Fix
- wr.mysql.to_sql with use_column_names=True when column names are reserved words #918
Thanks
We thank the following contributors/users for their work on this release:
@AssafMentzer, @mureddy19, @isichei, @DonnaArt, @kukushking, @jaidisido
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our S3 public bucket!
AWS Data Wrangler 2.11.0
Caveats
⚠️ For platforms without PyArrow 5 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
New Functionalities
- Redshift and RDS Data Api Support #828 🚀 Check out the tutorial. Many thanks to @pwithams for this contribution
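The Data API support (#828) lets you query Redshift over HTTPS without VPC connectivity or a persistent connection. A sketch using the `wr.data_api.redshift` entry points that PR introduced; all identifiers are placeholders:

```python
def query_via_data_api(cluster_id: str, database: str, db_user: str):
    """Sketch: run a query through the Redshift Data API.

    Requires redshift-data IAM permissions rather than network access
    to the cluster.
    """
    import awswrangler as wr  # deferred: calls AWS only when invoked

    con = wr.data_api.redshift.connect(
        cluster_id=cluster_id,
        database=database,
        db_user=db_user,
    )
    return wr.data_api.redshift.read_sql_query("SELECT 1", con=con)
```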
Enhancements
Documentation
- Clarifying structure of SSM secrets in `connect` methods #871
Bug Fix
- Use botocores' Loader and ServiceModel to extract accepted kwargs #832
Thanks
We thank the following contributors/users for their work on this release:
@pwithams, @maxispeicher, @kukushking, @jaidisido
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our S3 public bucket!
AWS Data Wrangler 2.10.0
Caveats
⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Enhancements
- Add upsert support for Postgresql #807
- Add schema evolution parameter to `wr.s3.to_csv` #787
- Enable order by in CTAS Athena queries #785
- Add header to `wr.s3.to_csv` when `dataset=True` #765
- Add `CSV` as unload format to `wr.redshift.unload_to_files` #761
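The PostgreSQL upsert support (#807) can be sketched as below; the Glue connection name, schema, and table are placeholders, and `upsert_conflict_columns` is assumed from that PR.

```python
import pandas as pd

rows = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})


def upsert_rows(frame: pd.DataFrame):
    """Sketch: upsert a DataFrame into a hypothetical PostgreSQL table,
    using the conflict columns to decide insert vs. update."""
    import awswrangler as wr  # deferred: needs a Glue connection / credentials

    con = wr.postgresql.connect("my-glue-connection")
    try:
        wr.postgresql.to_sql(
            df=frame,
            con=con,
            schema="public",
            table="items",
            mode="upsert",
            upsert_conflict_columns=["id"],  # assumed per #807
        )
    finally:
        con.close()
```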
Bug Fix
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @kukushking, @jaidisido, @mohdaliiqbal
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our S3 public bucket!
AWS Data Wrangler 2.9.0
Caveats
⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Documentation
- Added S3 Select tutorial #748
- Clarified wr.s3.to_csv docs #730
Enhancements
- Enable server-side predicate filtering using S3 Select 🚀 #678
- Support `VersionId` parameter for S3 read operations #721
- Enable prefix in output S3 files for `wr.redshift.unload_to_files` #729
- Add option to skip commit on `wr.redshift.to_sql` #705
- Move integration test infrastructure to CDK 🎉 #706
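S3 Select pushes the predicate down to S3 so only matching bytes cross the wire. A sketch assuming the `wr.s3.select_query` entry point from #678; the object path is a placeholder:

```python
def head_parquet(path: str):
    """Sketch: server-side filter of a single Parquet object via S3 Select."""
    import awswrangler as wr  # deferred: issues the S3 call only when invoked

    return wr.s3.select_query(
        sql="SELECT * FROM s3object s LIMIT 5",
        path=path,  # e.g. "s3://my-bucket/my-file.parquet" (placeholder)
        input_serialization="Parquet",
        input_serialization_params={},
    )
```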
Bug Fix
- Wait until athena query results bucket is created #735
- Remove explicit Excel engine configuration #742
- Fix bucketing types #719
- Change end_time to UTC #720
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @kukushking, @jaidisido
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our S3 public bucket!