
Commit ee1809a: "Bumping version to 0.3.0"
1 parent: f045512

37 files changed: +8620 -112 lines

README.md (48 additions, 47 deletions)

@@ -2,73 +2,74 @@
 
 > DataFrames on AWS
 
-[![Release](https://img.shields.io/badge/release-0.2.6-brightgreen.svg)](https://pypi.org/project/awswrangler/)
+[![Release](https://img.shields.io/badge/release-0.3.0-brightgreen.svg)](https://pypi.org/project/awswrangler/)
 [![Downloads](https://img.shields.io/pypi/dm/awswrangler.svg)](https://pypi.org/project/awswrangler/)
 [![Python Version](https://img.shields.io/badge/python-3.6%20%7C%203.7-brightgreen.svg)](https://pypi.org/project/awswrangler/)
 [![Documentation Status](https://readthedocs.org/projects/aws-data-wrangler/badge/?version=latest)](https://aws-data-wrangler.readthedocs.io/en/latest/?badge=latest)
 [![Coverage](https://img.shields.io/badge/coverage-89%25-brightgreen.svg)](https://pypi.org/project/awswrangler/)
 [![Average time to resolve an issue](http://isitmaintained.com/badge/resolution/awslabs/aws-data-wrangler.svg)](http://isitmaintained.com/project/awslabs/aws-data-wrangler "Average time to resolve an issue")
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 
-**[Read the Docs!](https://aws-data-wrangler.readthedocs.io)**
+## [Read the Docs](https://aws-data-wrangler.readthedocs.io)
 
-**[Read the Tutorials](https://github.com/awslabs/aws-data-wrangler/tree/master/tutorials): [Catalog & Metadata](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/catalog_and_metadata.ipynb) | [Athena Nested](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/athena_nested.ipynb) | [S3 Write Modes](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/s3_write_modes.ipynb)**
+## [Read the Tutorials](https://github.com/awslabs/aws-data-wrangler/tree/master/tutorials)
+- [Catalog & Metadata](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/catalog_and_metadata.ipynb)
+- [Athena Nested](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/athena_nested.ipynb)
+- [S3 Write Modes](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/s3_write_modes.ipynb)
 
----
-
-*Contents:* **[Use Cases](#Use-Cases)** | **[Installation](#Installation)** | **[Examples](#Examples)** | **[Diving Deep](#Diving-Deep)** | **[Step By Step](#Step-By-Step)** | **[Contributing](#Contributing)**
-
----
+## Contents
+- [Use Cases](#Use-Cases)
+- [Installation](#Installation)
+- [Examples](#Examples)
+- [Diving Deep](#Diving-Deep)
+- [Step By Step](#Step-By-Step)
+- [Contributing](#Contributing)
 
 ## Use Cases
 
 ### Pandas
 
-* Pandas -> Parquet (S3) (Parallel)
-* Pandas -> CSV (S3) (Parallel)
-* Pandas -> Glue Catalog Table
-* Pandas -> Athena (Parallel)
-* Pandas -> Redshift (Append/Overwrite/Upsert) (Parallel)
-* Pandas -> Aurora (MySQL/PostgreSQL) (Append/Overwrite) (Via S3) (NEW :star:)
-* Parquet (S3) -> Pandas (Parallel)
-* CSV (S3) -> Pandas (One shot or Batching)
-* Glue Catalog Table -> Pandas (Parallel)
-* Athena -> Pandas (One shot, Batching or Parallel)
-* Redshift -> Pandas (Parallel)
-* CloudWatch Logs Insights -> Pandas
-* Aurora -> Pandas (MySQL) (Via S3) (NEW :star:)
-* Encrypt Pandas Dataframes on S3 with KMS keys
-* Glue Databases Metadata -> Pandas (Jupyter output compatible)
-* Glue Table Metadata -> Pandas (Jupyter output compatible)
+| FROM | TO | Features |
+|------|----|----------|
+| Pandas DataFrame | Amazon S3 | Parquet, CSV, Partitions, Parallelism, Overwrite/Append/Partitions-Upsert modes,<br>KMS Encryption, Glue Metadata (Athena, Spectrum, Spark, Hive, Presto) |
+| Amazon S3 | Pandas DataFrame | Parquet (Pushdown filters), CSV, Partitions, Parallelism,<br>KMS Encryption, Multiple files |
+| Amazon Athena | Pandas DataFrame | Workgroups, S3 output path, Encryption, and two different engines:<br><br>- ctas_approach=False **->** Batching and restricted-memory environments<br>- ctas_approach=True **->** Blazing fast, parallelism and enhanced data types |
+| Pandas DataFrame | Amazon Redshift | Blazing fast using parallel parquet on S3 behind the scenes<br>Append/Overwrite/Upsert modes |
+| Amazon Redshift | Pandas DataFrame | Blazing fast using parallel parquet on S3 behind the scenes |
+| Pandas DataFrame | Amazon Aurora | Supported engines: MySQL, PostgreSQL<br>Blazing fast using parallel CSV on S3 behind the scenes<br>Append/Overwrite modes |
+| Amazon Aurora | Pandas DataFrame | Supported engines: MySQL<br>Blazing fast using parallel CSV on S3 behind the scenes |
+| CloudWatch Logs Insights | Pandas DataFrame | Query results |
+| Glue Catalog | Pandas DataFrame | List and get Tables details. Good fit with Jupyter Notebooks. |
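The table above replaces the old bullet list. As a usage illustration, a minimal Pandas round trip assuming the module-level `wr` entry point this README adopts (the bucket, database, and table names are placeholders, not from the commit):

```python
import awswrangler as wr
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3], "year": [2019, 2019, 2020]})

# Write partitioned Parquet to S3 and register it in the Glue Catalog
# ("my_db" and "my_bucket" are hypothetical names)
wr.pandas.to_parquet(
    dataframe=df,
    database="my_db",
    path="s3://my_bucket/my_table/",
    partition_cols=["year"],
)

# Read it back through Athena; ctas_approach=True is the fast engine
# described in the table above
df2 = wr.pandas.read_sql_athena(
    sql="SELECT * FROM my_table",
    database="my_db",
    ctas_approach=True,
)
```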
 
 ### PySpark
 
-* PySpark -> Redshift (Parallel)
-* Register Glue table from Dataframe stored on S3
-* Flatten nested DataFrames
+| FROM | TO | Features |
+|------|----|----------|
+| PySpark DataFrame | Amazon Redshift | Blazing fast using parallel parquet on S3 behind the scenes<br>Append/Overwrite/Upsert modes |
+| PySpark DataFrame | Glue Catalog | Register Parquet or CSV DataFrame on Glue Catalog |
+| Nested PySpark<br>DataFrame | Flat PySpark<br>DataFrames | Flatten structs and break up arrays in child tables |
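For the flattening row, a sketch assuming the 0.x `Session(spark_session=...)` wiring and the `spark.flatten` helper covered in the Athena Nested tutorial; its return shape (a dict of table name to flat DataFrame) is an assumption from those docs, and the input path is hypothetical:

```python
import awswrangler
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
session = awswrangler.Session(spark_session=spark)  # 0.x session wiring (assumed)

# An existing PySpark DataFrame with structs/arrays (hypothetical path)
nested_df = spark.read.parquet("s3://my_bucket/nested/")

# Flatten structs and break up arrays into child tables, per the row above
flat_dfs = session.spark.flatten(dataframe=nested_df)
for name, flat_df in flat_dfs.items():
    print(name, flat_df.columns)
```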
 
 ### General
 
-* List S3 objects (Parallel)
-* Delete S3 objects (Parallel)
-* Delete listed S3 objects (Parallel)
-* Delete NOT listed S3 objects (Parallel)
-* Copy listed S3 objects (Parallel)
-* Get the size of S3 objects (Parallel)
-* Get CloudWatch Logs Insights query results
-* Load partitions on Athena/Glue table (repair table)
-* Create EMR cluster (For humans)
-* Terminate EMR cluster
-* Get EMR cluster state
-* Submit EMR step(s) (For humans)
-* Get EMR step state
-* Athena query to receive the result as python primitives (*Iterable[Dict[str, Any]]*)
-* Load and Unzip SageMaker jobs outputs
-* Load and Unzip SageMaker models
-* Redshift -> Parquet (S3)
-* Aurora -> CSV (S3) (MySQL) (NEW :star:)
-* Get Glue Metadata
+| Feature | Details |
+|---------|---------|
+| List S3 objects | e.g. wr.s3.list_objects("s3://...") |
+| Delete S3 objects | Parallel |
+| Delete listed S3 objects | Parallel |
+| Delete NOT listed S3 objects | Parallel |
+| Copy listed S3 objects | Parallel |
+| Get the size of S3 objects | Parallel |
+| Get CloudWatch Logs Insights query results | |
+| Load partitions on Athena/Glue table | Through "MSCK REPAIR TABLE" |
+| Create EMR cluster | "For humans" |
+| Terminate EMR cluster | "For humans" |
+| Get EMR cluster state | "For humans" |
+| Submit EMR step(s) | "For humans" |
+| Get EMR step state | "For humans" |
+| Query Athena to receive python primitives | Returns *Iterable[Dict[str, Any]]* |
+| Load and Unzip SageMaker jobs outputs | |
+| Dump Amazon Redshift as Parquet files on S3 | |
+| Dump Amazon Aurora as CSV files on S3 | Only for MySQL engine |
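A sketch of the S3 rows, using the `wr.s3.list_objects` call the table itself cites; `delete_listed_objects` and its `objects_paths` argument are assumptions based on the 0.x docs, and the paths are placeholders:

```python
import awswrangler as wr

# List objects under a prefix (parallel, per the table above)
keys = wr.s3.list_objects("s3://my_bucket/my_prefix/")

# Delete exactly those listed objects
wr.s3.delete_listed_objects(objects_paths=keys)
```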
 
 ## Installation
 

awswrangler/__version__.py (1 addition, 1 deletion)

@@ -1,4 +1,4 @@
 __title__ = "awswrangler"
 __description__ = "DataFrames on AWS."
-__version__ = "0.2.6"
+__version__ = "0.3.0"
 __license__ = "Apache License 2.0"

awswrangler/pandas.py (1 addition, 1 deletion)

@@ -831,7 +831,7 @@ def _cast_pandas(dataframe: pd.DataFrame, cast_columns: Dict[str, str]) -> pd.Da
             elif pandas_type == "date":
                 dataframe[col] = pd.to_datetime(dataframe[col]).dt.date.replace(to_replace={pd.NaT: None})
             else:
-                dataframe[col] = dataframe[col].astype(pandas_type, skipna=True)
+                dataframe[col] = dataframe[col].astype(pandas_type)
         return dataframe

     @staticmethod
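The dropped `skipna=True` keyword is not part of pandas' `Series.astype` signature (`dtype, copy, errors`), so newer pandas rejects it with a `TypeError`; removing it is the whole fix. A small illustration with made-up values, including the nullable-dtype route for targets that cannot hold `NaN`:

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])

# s.astype("int64", skipna=True) would fail twice over: astype() has no
# skipna keyword, and plain int64 cannot represent the missing value.
s_float = s.astype("float64")  # NaN passes through unchanged

# pandas' nullable integer dtype keeps the missing value as <NA>
s_int = s.astype("Int64")
print(s_int.tolist())  # [1, <NA>, 3]
```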
docs/source/api/awswrangler.dynamodb.rst (7 additions, 0 deletions)

@@ -0,0 +1,7 @@
+awswrangler.dynamodb module
+===========================
+
+.. automodule:: awswrangler.dynamodb
+   :members:
+   :undoc-members:
+   :show-inheritance:

docs/source/api/awswrangler.rst (1 addition, 0 deletions)

@@ -10,6 +10,7 @@ Submodules
    awswrangler.aurora
    awswrangler.cloudwatchlogs
    awswrangler.data_types
+   awswrangler.dynamodb
    awswrangler.emr
    awswrangler.exceptions
    awswrangler.glue
