
Releases: aws/aws-sdk-pandas

AWS Data Wrangler 2.8.0

19 May 13:40
b13fcd8

Caveats

⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Documentation

  • Install Lambda Layers and Python wheels from public S3 bucket 🎉 #666
  • Clarified docs around potential in-place mutation of dataframe when using to_parquet #669
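The in-place mutation caveat from #669 comes down to the usual defensive-copy pattern: pass `df.copy()` when the original frame must survive the call. A minimal pandas sketch of the pattern (the writer function below is a stand-in for `wr.s3.to_parquet`, not the real call):

```python
import pandas as pd

def write_parquet_stub(df: pd.DataFrame) -> None:
    # Stand-in for wr.s3.to_parquet: it may cast columns in place
    # before serializing (the behavior #669 documents).
    df["col"] = df["col"].astype("category")

df = pd.DataFrame({"col": ["a", "b", "a"]})
write_parquet_stub(df.copy())  # pass a copy; the caller's dtypes survive
assert df["col"].dtype == object
```

The copy costs memory proportional to the frame, so skip it when you no longer need the original dtypes.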

Enhancements

  • Enable parallel s3 downloads (~20% speedup) 🚀 #644
  • Apache Arrow 4.0.0 support (enables ARM instances support as well) #557
  • Enable LOCK before concurrent COPY calls in Redshift #665
  • Make use of Pyarrow iter_batches (>= 3.0.0 only) #660
  • Enable additional options when overwriting Redshift table (drop, truncate, cascade) #671
  • Reuse s3 client across threads for s3 range requests #684
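The Redshift enhancements in #665 and #671 can be sketched together; the `overwrite_method` and `lock` parameter names are assumptions based on the 2.8.0 release, and the connection name and S3 path are hypothetical:

```python
import pandas as pd

def overwrite_table(df: pd.DataFrame, table: str, schema: str) -> None:
    """Sketch of overwriting a Redshift table with the 2.8.0 options;
    verify parameter names against the wr.redshift.copy docs."""
    import awswrangler as wr  # deferred: actually running this needs AWS credentials

    con = wr.redshift.connect("my-glue-connection")  # hypothetical connection name
    try:
        wr.redshift.copy(
            df=df,
            path="s3://my-bucket/stage/",  # hypothetical staging path
            con=con,
            table=table,
            schema=schema,
            mode="overwrite",
            overwrite_method="truncate",  # or "drop" / "cascade" (#671)
            lock=True,                    # LOCK before concurrent COPY calls (#665)
        )
    finally:
        con.close()
```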

Bug Fix

  • Add dtypes for empty CTAS Athena queries #659
  • Add Serde properties when creating CSV table #672
  • Pass SSL properties from Glue Connection to MySQL #554

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @kukushking, @igorborgest, @gballardin, @eferm, @jaklan, @Falydoor, @chariottrider, @chriscugliotta, @konradsemsch, @gvermillion, @russellbrooks, @mshober.


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them directly from our public S3 bucket!

AWS Data Wrangler 2.7.0

15 Apr 17:17
fd1b62f

Caveats

⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Documentation

  • Updated documentation to clarify wr.athena.read_sql_query params argument use #609

New Functionalities

  • Supporting MySQL upserts #608
  • Enable prepending S3 parquet files with a prefix in wr.s3.write.to_parquet #617
  • Add exist_ok flag to safely create a Glue database #642
  • Add "Unsupported Pyarrow type" exception #639
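The MySQL upsert (#608) and `exist_ok` (#642) additions can be sketched as follows; the upsert mode string, connection name, and database/table names are assumptions to check against the API reference:

```python
import pandas as pd

def upsert_rows(df: pd.DataFrame) -> None:
    """Sketch of the 2.7.0 MySQL upsert and safe database creation."""
    import awswrangler as wr  # deferred: actually running this needs AWS credentials

    # #642: safe to call repeatedly once the database exists.
    wr.catalog.create_database(name="my_db", exist_ok=True)  # hypothetical name

    con = wr.mysql.connect("my-glue-connection")  # hypothetical connection name
    try:
        wr.mysql.to_sql(
            df=df,
            con=con,
            schema="my_db",
            table="my_table",
            mode="upsert_duplicate_key",  # assumed upsert mode name (#608)
        )
    finally:
        con.close()
```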

Bug Fix

  • Fix chunked mode in wr.s3.read_parquet_table #627
  • Fix missing \ character from wr.s3.read_parquet_table method #638
  • Support postgres as an engine value #630
  • Add default workgroup result configuration #633
  • Raise exception when merge_upsert_table fails or data_quality is insufficient #601
  • Fix nested structure bug in athena2pyarrow method #612

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @igorborgest, @mattboyd-aws, @vlieven, @bentkibler, @adarsh-chauhan, @impredicative, @nmduarteus, @JoshCrosby, @TakumiHaruta, @zdk123, @tuannguyen0901, @jiteshsoni, @luminita.


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run!

AWS Data Wrangler 2.6.0

16 Mar 18:50

Caveats

⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Enhancements

  • Added a chunksize parameter (default: 200) to the to_sql function, cutting insertion time for a large test insert from 120 seconds to 1 second #599
  • path argument is now optional in s3.to_parquet and s3.to_csv functions #586
  • Added a map_types boolean (set to True by default) to convert pyarrow DataTypes to pandas ExtensionDtypes #580
  • Added optional ctas_database_name argument to store ctas_temporary_table in an alternative database #576
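Two of the enhancements above can be sketched together; the database/table names and bucket path are hypothetical, and the catalog-resolved-location behavior for #586 is an assumption to verify against the docs:

```python
import pandas as pd

def write_and_read(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the 2.6.0 optional-path and map_types enhancements."""
    import awswrangler as wr  # deferred: actually running this needs AWS credentials

    # #586: path may now be omitted for catalog-governed datasets; the S3
    # location is resolved from the Glue table (assumed behavior).
    wr.s3.to_parquet(df=df, dataset=True, database="my_db", table="my_table")

    # #580: map_types=False keeps plain pyarrow-to-pandas dtypes instead of
    # converting to pandas ExtensionDtypes.
    return wr.s3.read_parquet(path="s3://my-bucket/data/", map_types=False)
```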

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @igorborgest, @ilyanoskov, @VashMKS, @jmahlik, @dimapod, @Reeska


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run!

AWS Data Wrangler 2.5.0

03 Mar 16:59

Caveats

⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Enhancements

  • Support for ExpectedBucketOwner #562
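One way to use #562, assuming ExpectedBucketOwner is forwarded through `s3_additional_kwargs` (verify against the docs); S3 then rejects the request if the bucket is not owned by the given account:

```python
def read_with_owner_check(path: str, account_id: str):
    """Sketch of reading Parquet with an S3 bucket-owner guard (#562)."""
    import awswrangler as wr  # deferred: actually running this needs AWS credentials

    return wr.s3.read_parquet(
        path=path,
        s3_additional_kwargs={"ExpectedBucketOwner": account_id},
    )
```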

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @impredicative, @adarsh-chauhan, @Malkard.


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run!

AWS Data Wrangler 2.4.0 (Docs updated)

04 Feb 13:24

Caveats

⚠️ For platforms without PyArrow 3 support (e.g. EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Documentation

  • Update to include PyArrow 3 caveats for EMR and Glue PySpark Job. #546 #547

New Functionalities

  • Redshift COPY now supports the new SUPER type (i.e. SERIALIZETOJSON) #514
  • S3 Upload/download files #506
  • Include dataset BUCKETING for s3 datasets writing #443
  • Enable Merge Upsert for existing Glue Tables on Primary Keys #503
  • Support Requester Pays S3 Buckets #430
  • Add botocore Config to wr.config #535
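The file-transfer helpers (#506) and SUPER support (#514) above can be sketched together; bucket, table, and connection names are hypothetical, and `serialize_to_json` is an assumed parameter name for the SERIALIZETOJSON option:

```python
import pandas as pd

def transfer_and_copy(df: pd.DataFrame) -> None:
    """Sketch of two 2.4.0 additions; verify names against the API reference."""
    import awswrangler as wr  # deferred: actually running this needs AWS credentials

    # #506: plain S3 file transfer helpers.
    wr.s3.upload(local_file="report.pdf", path="s3://my-bucket/report.pdf")
    wr.s3.download(path="s3://my-bucket/report.pdf", local_file="copy.pdf")

    # #514: load nested columns into a Redshift SUPER column via SERIALIZETOJSON.
    con = wr.redshift.connect("my-glue-connection")  # hypothetical connection name
    try:
        wr.redshift.copy(
            df=df,
            path="s3://my-bucket/stage/",
            con=con,
            table="my_table",
            schema="public",
            serialize_to_json=True,  # assumed parameter name for SERIALIZETOJSON
        )
    finally:
        con.close()
```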

Enhancements

  • Pandas 1.2.1 support #525
  • Numpy 1.20.0 support
  • Apache Arrow 3.0.0 support #531
  • Python 3.9 support #454

Bug Fix

  • Return DataFrame with unique index for Athena CTAS queries #527
  • Remove unnecessary schema inference. #524

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @danielwo, @jiteshsoni, @igorborgest, @njdanielsen, @eric-valente, @gvermillion, @zseder, @gdbassett, @orenmazor, @senorkrabs, @Natalie-Caruana, @dragonH, @nikwerhypoport, @hwangji.


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run!

AWS Data Wrangler 2.4.0

03 Feb 23:26

New Functionalities

  • Redshift COPY now supports the new SUPER type (i.e. SERIALIZETOJSON) #514
  • S3 Upload/download files #506
  • Include dataset BUCKETING for s3 datasets writing #443
  • Enable Merge Upsert for existing Glue Tables on Primary Keys #503
  • Support Requester Pays S3 Buckets #430
  • Add botocore Config to wr.config #535

Enhancements

  • Pandas 1.2.1 support #525
  • Numpy 1.20.0 support
  • Apache Arrow 3.0.0 support #531
  • Python 3.9 support #454

Bug Fix

  • Return DataFrame with unique index for Athena CTAS queries #527
  • Remove unnecessary schema inference. #524

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @danielwo, @jiteshsoni, @igorborgest, @njdanielsen, @eric-valente, @gvermillion, @zseder, @gdbassett, @orenmazor, @senorkrabs, @Natalie-Caruana.


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run!

AWS Data Wrangler 2.3.0

10 Jan 14:36

New Functionalities

  • DynamoDB support #448
  • SQLServer support (Driver must be installed separately) #356
  • Excel files support #419 #509
  • Amazon S3 Access Point support #393
  • Amazon Chime initial support #494
  • Write compressed CSV and JSON files on S3 #308 #359 #412

Enhancements

  • Add query parameters for Athena #432
  • Add metadata caching for Athena #461
  • Add suffix filters for s3.read_parquet_table() #495
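Two of the 2.3.0 additions above can be sketched together; table and database names are hypothetical, and the client-side `:name;` token substitution for Athena params is an assumption to verify against the docs:

```python
import pandas as pd

def dynamodb_and_athena() -> pd.DataFrame:
    """Sketch of DynamoDB writes (#448) and Athena query parameters (#432)."""
    import awswrangler as wr  # deferred: actually running this needs AWS credentials

    # #448: write a DataFrame straight into a DynamoDB table.
    df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
    wr.dynamodb.put_df(df=df, table_name="my-table")  # hypothetical table name

    # #432: ':id;' tokens in the SQL are replaced by the values in params
    # before the query is submitted (assumed substitution behavior).
    return wr.athena.read_sql_query(
        sql="SELECT * FROM my_table WHERE id = :id;",
        params={"id": "1"},
        database="my_db",
    )
```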

Bug Fix

  • Fix keep_files behavior for failed Redshift COPY executions #505

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @danielwo, @jiteshsoni, @gvermillion, @rodalarcon, @imanebosch, @dwbelliston, @tochandrashekhar, @kylepierce, @njdanielsen, @jasadams, @gtossou, @JasonSanchez, @kokes, @hanan-vian, @igorborgest.


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run!

AWS Data Wrangler 2.2.0

23 Dec 00:05

New Functionalities

  • Add aws_access_key_id, aws_secret_access_key, aws_session_token and boto3_session for Redshift copy/unload #484
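The explicit-credentials support from #484 can be sketched as follows; the connection name, staging path, and credential values are placeholders:

```python
import pandas as pd

def copy_with_keys(df: pd.DataFrame) -> None:
    """Sketch of #484: pass explicit credentials for the COPY stage instead
    of relying on the default boto3 credential chain."""
    import awswrangler as wr  # deferred: actually running this needs AWS credentials

    con = wr.redshift.connect("my-glue-connection")  # hypothetical connection name
    try:
        wr.redshift.copy(
            df=df,
            path="s3://my-bucket/stage/",
            con=con,
            table="my_table",
            schema="public",
            aws_access_key_id="AKIA...",   # placeholder
            aws_secret_access_key="...",   # placeholder
            aws_session_token="...",       # placeholder
        )
    finally:
        con.close()
```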

Bug Fix

  • Remove dtype print statement #487

Thanks

We thank the following contributors/users for their work on this release:

@danielwo, @thetimbecker, @njdanielsen, @igorborgest.


P.S. The Lambda Layer zip file and the Glue wheel/egg files are available below. Just upload them and run!

AWS Data Wrangler 2.1.0

21 Dec 11:11

New Functionalities

  • Add secretmanager module and support for databases connections #402
    con = wr.redshift.connect(secret_id="my-secret", dbname="my-db")
    df = wr.redshift.read_sql_query("SELECT ...", con=con)
    con.close()

Bug Fix

  • Fix connection attributes quoting for wr.*.connect() #481
  • Fix parquet table append for nested struct columns #480

Thanks

We thank the following contributors/users for their work on this release:

@danielwo, @nmduarteus, @nivf33, @kinghuang, @igorborgest.


P.S. The Lambda Layer zip file and the Glue wheel/egg files are available below. Just upload them and run!

AWS Data Wrangler 2.0.1

11 Dec 10:58

Thanks

We thank the following contributors/users for their work on this release:

@danielwo, @igorborgest.


P.S. The Lambda Layer zip file and the Glue wheel/egg files are available below. Just upload them and run!