Commit e953e84

committed
Initial commit
0 parents  commit e953e84

27 files changed

+1956
-0
lines changed

Documentation/AWSAthenaApiDoc.md

Lines changed: 456 additions & 0 deletions
Large diffs are not rendered by default.

Documentation/Authentication.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
1+
# Authentication
2+
3+
To access the AWS™ service it is necessary to authenticate with AWS. This can be accomplished in two ways:
4+
1. Using the default AWS Credential Provider Chain, to iterate through the default AWS authentication methods. This is the default authentication mechanism.
5+
2. By explicitly instantiating an AWS `CredentialsProvider`.
6+
7+
Particularly when using other AWS tools or services, the first method can be more convenient because it allows a common authentication process.
8+
9+
## Credential Provider Chain
10+
When a client is initialized, by default, it attempts to find AWS credentials by using the default credential provider chain as implemented by the AWS SDK. This looks for credentials in this order:
11+
12+
1. Environment variables: *AWS_ACCESS_KEY_ID*, *AWS_REGION* and *AWS_SECRET_ACCESS_KEY*.
13+
2. Java system properties: *aws.accessKeyId* and *aws.secretKey*.
14+
3. The default credential profiles file, typically stored in *~/.aws/credentials* (Linux) or *c:\\Users\\username\\.aws\\* (Windows) and shared by many of the AWS SDKs and by the AWS CLI. A credentials file can be created by using the `aws configure` command provided by the AWS CLI, or by editing the file with a text editor. For information about the credentials file format, see AWS Credentials File Format: <https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html#credentials-file-format>.
15+
4. Amazon™ ECS™ container credentials, loaded from Amazon ECS if the environment variable *AWS_CONTAINER_CREDENTIALS_RELATIVE_URI* is set.
16+
5. Instance profile credentials as used on EC2™ instances, and delivered through the Amazon EC2 metadata service.
17+
18+
For more information on the credential provider chain see: <https://docs.aws.amazon.com/sdk-for-java/v2/developer-guide/credentials.html>
19+
20+
21+
### Using Environment variables
22+
The environment variables *AWS_ACCESS_KEY_ID*, *AWS_REGION* and *AWS_SECRET_ACCESS_KEY* must be set in the process context used to start MATLAB®. That is, they must be set before MATLAB is started; they cannot be set using the MATLAB *setenv* command because they must be visible to the MATLAB JVM. To verify that they have been set correctly, use the following command rather than the MATLAB *getenv* command:
23+
```
java.lang.System.getenv('AWS_REGION')

ans =

us-west-1
```
30+
If they have not been set, a Java exception is raised from the provider chain.
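To check all three variables at once, the same `java.lang.System.getenv` call can be placed in a short loop. This is a minimal sketch rather than part of the package; the variable `missing` is introduced here only for illustration, and the values themselves are deliberately not printed.

```matlab
% Names of the variables that must be visible in the JVM environment.
names = {'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_REGION'};
missing = {};  % illustrative: collects the names that are not set

for k = 1:numel(names)
    % A Java null is returned to MATLAB as an empty array.
    val = java.lang.System.getenv(names{k});
    if isempty(val)
        missing{end+1} = names{k}; %#ok<SAGROW>
    end
end

if isempty(missing)
    disp('All AWS environment variables are visible to the JVM.');
else
    fprintf('Not set in the JVM environment: %s\n', strjoin(missing, ', '));
end
```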
31+
32+
### Using IAM Role based access
33+
When running on EC2, the instance may *not* have an IAM Role associated with it that allows access to a given resource. If the instance's IAM Role is missing or improperly configured, an error will occur.
34+
35+
To attach an IAM Role to an existing EC2 instance, see: <https://aws.amazon.com/blogs/security/easily-replace-or-attach-an-iam-role-to-an-existing-ec2-instance-by-using-the-ec2-console/>
36+
37+
38+
## Instantiating a `CredentialProvider`
39+
The AWS SDK provides a series of different `CredentialProvider`s, and this API implements a few of them.
40+
41+
Use the utility class to instantiate a `CredentialProvider`, and then pass this argument
42+
to the initialize function.
43+
44+
```matlab
credProvider = aws.auth.CredentialProvider.getSessionCredentialProvider(...
    'A<REDACTED>Q', ... % id
    'Z<REDACTED>4', ... % key
    'F<REDACTED>F'); % token

ath = aws.athena.AthenaClient();
ath.initialize('credentialsprovider', credProvider);
```
53+
54+
If you need to add a region, add this as a text argument:
55+
```matlab
ath.initialize('region', 'eu-central-1', ...
    'credentialsprovider', credProvider)
```
59+
If none of the arguments are used, the methods will try to deduce the correct region and credentials provider with the underlying libraries.
60+
61+
The methods currently provided by this class are
62+
63+
* `getBasicCredentialProvider`
64+
* `getProfileCredentialProvider`
65+
* `getInstanceProfileCredentialProvider`
66+
* `getSessionCredentialProvider`
67+
* `getJsonFileCredentialProvider`
68+
69+
and they all return an object that implements the Java interface
70+
`software.amazon.awssdk.auth.credentials.AwsCredentialsProvider`
71+
This means that a user can easily instantiate and provide another
72+
object if needed.
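
As an illustration, a provider class from the AWS SDK for Java 2.x could be created directly and passed to `initialize`. This is a sketch under assumptions: the SDK JAR built during installation must already be on the Java class path, and the profile name `analytics` is hypothetical.

```matlab
% Fully qualified name of an SDK class that implements AwsCredentialsProvider.
providerClass = 'software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider';

if exist(providerClass, 'class') == 8
    % Static factory method of the SDK class; 'analytics' is a hypothetical profile.
    prov = javaMethod('create', providerClass, 'analytics');

    ath = aws.athena.AthenaClient();
    ath.initialize('credentialsprovider', prov);
else
    % The SDK classes are only available after the package startup has run.
    warning('Athena:auth:sdkNotLoaded', ...
        'The AWS SDK classes are not on the Java class path; run startup.m first.');
end
```

The guard with `exist` only keeps the sketch from failing with an undefined-class error when the package has not been initialized.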
73+
74+
75+
76+
77+
[//]: # (Copyright 2019 The MathWorks, Inc.)

Documentation/BasicUsage.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
1+
## Basic Usage
2+
3+
This example assumes a working AWS account and an S3 storage.
4+
It will execute a simple SQL statement on some data (located on S3) and
5+
write the results to another S3 bucket. An Athena database must also be configured.
6+
7+
This example uses the data from a MATLAB example file that can be found here (execute the following code in MATLAB):
8+
```matlab
which airlinesmall.csv
```
11+
12+
### Database setup
13+
To prepare for the demo, copy this data to a bucket on S3, and then
14+
create an Athena database from this data. In this example, it's called
15+
`MyAirlines.airlines`.
16+
Refer to the [AWS Athena](https://aws.amazon.com/athena/) pages for how to set up an Athena database.
17+
To facilitate creating this database, it can be helpful to look at the
18+
information from the CSV file providing the data.
19+
```matlab
ds = datastore('airlinesmall.csv');
% Map each column's text format to a database type
dbt = cellfun(@fmtToDBType, ds.TextscanFormats, 'UniformOutput', false);
names = ds.VariableNames;
both = [names; dbt];
% Print "name type, " pairs for use in the table definition
disp(sprintf('%s %s, ', both{:}))
```
26+
Use the above output for defining the types and columns of the Athena
27+
database (the `fmtToDBType` file is in the `Examples` directory).
28+
29+
Lastly, a bucket for storing the results is needed, e.g.
30+
```
s3://testingathena/outputs/
```
33+
### Authentication
34+
If the *AWS CLI* is available on the machine running MATLAB, there will probably be a file like `~/.aws/credentials` on the machine.
35+
If the credentials there are valid, it should be possible to start off without any issues. If not, there are other ways to
36+
authenticate (see [Authentication](Authentication.md)).
37+
38+
39+
### Running the code
40+
Set up variables
41+
```matlab
dbName = 'MyAirlines.airlines';
resultBucket = 's3://testingathena/outputs/';
distLimit = 1000;
```
46+
Connect to the client
47+
```matlab
ath = aws.athena.AthenaClient();
ath.Database = dbName;
ath.initialize
```
52+
53+
54+
Create and execute a query
55+
```matlab
queryFar = sprintf('SELECT UniqueCarrier, distance FROM %s WHERE distance > %d;', ...
    dbName, distLimit);
resultIDFar = ath.submitQuery(queryFar, resultBucket);
```
60+
This function will return quickly, with a result string like *94079584-26b3-4caa-92cc-91fa94291bd4*, but the request may still be running.
61+
The status of a running request can be checked like this:
62+
```matlab
status = char(ath.getStatusOfQuery(resultIDFar));
```
65+
which will show the current state of the query (**SUCCEEDED**, **RUNNING**, etc.).
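
Because `submitQuery` returns before the query has finished, a caller will usually poll until a final state is reached. The following is a minimal sketch and not part of the package: `waitForQuery`, `pauseSeconds` and `maxPolls` are hypothetical names, and the list of final states assumes the Athena states `SUCCEEDED`, `FAILED` and `CANCELLED`.

```matlab
function status = waitForQuery(getStatus, pauseSeconds, maxPolls)
% waitForQuery  Poll a status function until a final state is reported.
%   getStatus    - function handle returning the current query state
%   pauseSeconds - waiting time between two polls
%   maxPolls     - number of polls after which an error is raised
    finalStates = {'SUCCEEDED', 'FAILED', 'CANCELLED'};
    for k = 1:maxPolls
        status = char(getStatus());
        if any(strcmp(status, finalStates))
            return
        end
        pause(pauseSeconds);
    end
    error('Athena:waitForQuery:Timeout', ...
        'Query still in state %s after %d polls.', status, maxPolls);
end
```

Saved as `waitForQuery.m`, it could be called as `status = waitForQuery(@() ath.getStatusOfQuery(resultIDFar), 2, 60);`. Passing a function handle keeps the helper independent of the client object.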
66+
When the query has succeeded, the resulting files can be found in S3, but these
67+
files can also be retrieved directly from MATLAB. The result will have the name
68+
```matlab
% resultBucket already ends with '/', so no extra separator is added
resFile = sprintf('%s%s.csv', resultBucket, char(resultIDFar))
```
71+
```
resFile =
    's3://testingathena/outputs/94079584-26b3-4caa-92cc-91fa94291bd4.csv'
```
75+
This file can be read using a datastore. The datastore, however,
76+
will rely on having the AWS keys available in environment variables, so first
77+
do something like this:
78+
```matlab
setenv('AWS_REGION', 'eu-central-1')
setenv('AWS_ACCESS_KEY_ID', 'A<REDACTED>Z')
setenv('AWS_SECRET_ACCESS_KEY', 'B<REDACTED>X')
```
83+
The MATLAB documentation for how to *"Work with remote data"* describes this in more detail.
84+
85+
After this, the data can be read from the datastore.
86+
```matlab
ds = datastore(resFile);
ds.NumHeaderLines = 1;
farResult = ds.readall();
```
91+
92+
### Athena limits
93+
There are [limitations](https://docs.aws.amazon.com/athena/latest/ug/service-limits.html) to how many queries can be run in Athena.
94+
If the limit is exceeded,
95+
the submitted query will fail with a message similar to this one.
96+
97+
Problems executing Athena query:
98+
com.amazonaws.services.athena.model.AmazonAthenaException:
99+
Rate exceeded (Service: AmazonAthena; Status Code: 400;
100+
Error Code: ThrottlingException;
101+
Request ID: 5740d70a-e53d-4cb4-9c40-695cf31d828c)
102+
103+
This must be handled by the application.
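
One common way for an application to handle throttling is to retry with exponential backoff. The sketch below is not part of the package: `retryOnThrottle`, `maxAttempts` and `baseDelay` are hypothetical names, and it assumes that the MATLAB error raised for a throttled request contains the text `ThrottlingException`, as in the message shown above.

```matlab
function result = retryOnThrottle(fcn, maxAttempts, baseDelay)
% retryOnThrottle  Call fcn, retrying with exponential backoff when throttled.
%   fcn         - function handle performing the request
%   maxAttempts - total number of attempts before giving up
%   baseDelay   - delay in seconds before the first retry
    for attempt = 1:maxAttempts
        try
            result = fcn();
            return
        catch err
            % Assumption: a throttled request carries this text in its message.
            throttled = contains(err.message, 'ThrottlingException');
            if ~throttled || attempt == maxAttempts
                rethrow(err);
            end
            % Wait baseDelay, 2*baseDelay, 4*baseDelay, ... between attempts.
            pause(baseDelay * 2^(attempt - 1));
        end
    end
end
```

A call could then look like `resultIDFar = retryOnThrottle(@() ath.submitQuery(queryFar, resultBucket), 5, 1);`. Errors that are not recognized as throttling are rethrown immediately, so other failures are not hidden by the retry loop.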

Documentation/Installation.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
# Installation
2+
3+
## Installing on Windows®, macOS® and Linux
4+
The easiest way to install this package and all required dependencies is to clone the top-level repository using:
5+
6+
```bash
git clone --recursive https://github.com/mathworks-ref-arch/mathworks-aws-support.git
```
9+
10+
### Build the AWS SDK for Java components
11+
The MATLAB code uses the AWS SDK for Java. The required JAR file can be built using:
12+
```bash
cd matlab-aws-athena/Software/Java
mvn clean package
```
16+
17+
Once built, run the ```matlab-aws-athena/Software/MATLAB/startup.m``` function to initialize the interface, which will use the AWS Credentials Provider Chain to authenticate. Please see the [relevant documentation](Authentication.md) on how to specify the credentials.
18+
19+
The package is now ready for use. MATLAB can be configured to call ```startup.m``` on start if preferred so that the package is always available automatically. For further details see: [https://www.mathworks.com/help/matlab/ref/startup.html](https://www.mathworks.com/help/matlab/ref/startup.html)
20+
21+
[//]: # (Copyright 2019 The MathWorks, Inc.)

Documentation/Logging.md

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
# Logging - Controlling command output
2+
The Client uses a logging framework which is similar in many regards to the well known log4j framework. It supports logging levels of:
3+
* verbose - detailed messages that are useful during development and testing but are likely to be too detailed in day-to-day usage.
4+
* debug - default level with minimal output, recommended as a default output level.
5+
* warning - warnings indicative of potential problems, messages are displayed in red in the style of built-in MATLAB® warnings.
6+
* error - error messages with critical problems, messages trigger stack trace like output and execution is halted.
7+
8+
The default console logging level is *debug*. The logging library can log to both the MATLAB console and to a file. By default it logs to the MATLAB console only. One can set the levels used for logging separately for both, thus one could log in detail to a file and in less detail to the console and consult the detailed log only for postmortem purposes.
9+
10+
Once a Client, which creates a singleton logger object, has been created one can change the default values as follows:
11+
```
logObj = Logger.getLogger();
logObj.DisplayLevel = 'verbose';
write(logObj,'verbose','My verbose message');
My verbose message
```
17+
18+
To enable logging to a file a log file path and name must be set:
19+
```
% provide a path and name for the log file
logObj.LogFile = 'MyLogFile.log';

% set the log level for the log file
logObj.LogFileLevel = 'verbose';
```
26+
By default a filename is *not* set and no log file is produced. The logging level used in the file output can be set independently as shown. By default this level is set to *warning*.
27+
28+
29+
Logger methods are:
30+
* clearMessages(obj) - Clears the log messages currently stored in the Logger object.
31+
* clearLogFile(obj) - Clears the log messages currently stored in the log file.
32+
* write(obj,Level,MessageText) - Writes a message to the log.
33+
34+
Logger variables are:
35+
* LogFileLevel - The level of log messages that will be saved to the log file.
36+
* DisplayLevel - The level of log messages that will be displayed in the command window.
37+
* LogFile - The file name or path to the log file. If empty, nothing will be logged to file.
38+
* Messages - Structure array containing log messages.
39+
* MsgPrefix - This message prefix may be used in error logging if an errorStruct identifier is not set.
40+
41+
The errorStruct identifier is a character vector that specifies a component and a mnemonic label for an error or warning. The format of a simple identifier is: component:mnemonic
42+
43+
A colon separates the two parts of the identifier: component and mnemonic. If the identifier uses more than one component, then additional colons are required to separate them. A message identifier must always contain at least one colon, e.g.:
44+
* MATLAB:rmpath:DirNotFound
45+
* MATLAB:odearguments:InconsistentDataType
46+
47+
Both the component and mnemonic fields must adhere to the following syntax rules:
48+
* No white space (space or tab characters) is allowed anywhere in the identifier.
49+
* The first character must be alphabetic, either uppercase or lowercase.
50+
* The remaining characters can be alphanumeric or an underscore.
51+
* There is no length limitation to either the component or mnemonic.
52+
* The identifier can also be an empty character vector.
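
These rules can be expressed as a regular expression. The following is a minimal sketch; `idPattern` and `isValidId` are hypothetical helpers introduced here for illustration and are not part of `Logger.m`.

```matlab
% Each colon-separated part starts with a letter, followed by letters,
% digits or underscores; at least one colon is required.
idPattern = '^[A-Za-z]\w*(:[A-Za-z]\w*)+$';

% An empty character vector is also accepted as an identifier.
isValidId = @(id) isempty(id) || ~isempty(regexp(id, idPattern, 'once'));

isValidId('MATLAB:rmpath:DirNotFound')   % valid: three parts
isValidId('Athena:queryFailed')          % valid: component:mnemonic
isValidId('no colon')                    % invalid: white space and no colon
isValidId('')                            % valid: empty identifier
```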
53+
54+
Messages logged to the console with a level of warning are displayed as if they were native MATLAB warnings and similarly error level messages are displayed as if they were native errors.
55+
56+
For full details see: *matlab-aws-common/Software/MATLAB/app/functions/Logger.m*.
57+
58+
[//]: # (Copyright 2019 The MathWorks, Inc.)

Documentation/README.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
# MATLAB Interface *for AWS Athena*
2+
3+
## Contents
4+
1. [Installation](Installation.md)
5+
2. [Authentication](Authentication.md)
6+
3. [Basic Usage](BasicUsage.md)
7+
4. [Logging](Logging.md)
8+
5. [API Documentation](AWSAthenaApiDoc.md)
9+
10+
[//]: # (Copyright 2019 The MathWorks, Inc.)

LICENSE.TXT

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
Copyright (c) 2016, The MathWorks, Inc.
2+
All rights reserved.
3+
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
4+
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
5+
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
6+
3. In all cases, the software is, and all modifications and derivatives of the software shall be, licensed to you solely for use in conjunction with MathWorks products and service offerings.
7+
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
1+
2+
# MATLAB Interface *for AWS Athena*
3+
MATLAB® Interface for Amazon Web Services Athena™ Service.
4+
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. This package provides a basic interface to a subset of Athena features
5+
from within MATLAB.
6+
7+
## Requirements
8+
### MathWorks products
9+
* Requires MATLAB release R2017b or later.
10+
* AWS Common utilities found at https://github.com/mathworks-ref-arch/matlab-aws-common
11+
12+
### 3rd party products
13+
* Amazon Web Services account
14+
15+
To build a required JAR file:
16+
* [Maven](https://maven.apache.org/)
17+
* JDK 8+
18+
19+
## Getting Started
20+
Please refer to the [Documentation](Documentation/README.md) to get started.
21+
The [Installation Instructions](Documentation/Installation.md) and [Basic Usage](Documentation/BasicUsage.md) documents provide detailed instructions on setting up and using the interface. The easiest way to
22+
fetch this repository and all required dependencies is to clone the top-level repository using:
23+
24+
```bash
git clone --recursive https://github.com/mathworks-ref-arch/mathworks-aws-support.git
```
27+
28+
### Build the AWS SDK for Java components
29+
The MATLAB code uses the AWS SDK for Java. The required JAR file can be built using:
30+
```bash
cd matlab-aws-athena/Software/Java
mvn clean package
```
34+
35+
Once built, run the ```matlab-aws-athena/Software/MATLAB/startup.m``` function to initialize the interface, which will use the AWS Credentials Provider Chain to authenticate. Please see the [relevant documentation](Documentation/Authentication.md) on how to specify the credentials.
36+
37+
### Using the interface
38+
39+
40+
```matlab
% Create some data needed in the examples
dbName = 'MyAirlines.airlines';
resultBucket = 's3://testing/airlineresult';
distLimit = 1000;

% Create the client object and authenticate using
% the AWS Default Provider Chain
ath = aws.athena.AthenaClient();
ath.Database = dbName;
ath.initialize

% Create a SQL statement and execute it (asynchronously)
queryFar = sprintf('SELECT UniqueCarrier, distance FROM %s WHERE distance > %d;', ...
    dbName, distLimit);
resultIDFar = ath.submitQuery(queryFar, resultBucket);

% Check the status, and make sure it says 'SUCCEEDED'
status = char(ath.getStatusOfQuery(resultIDFar));

% At this point, we can read the results by using a MATLAB datastore
resFile = sprintf('%s/%s.csv', resultBucket, char(resultIDFar));
ds = datastore(resFile);
ds.NumHeaderLines = 1;
farResult = ds.readall();
```
68+
69+
## Supported Products:
70+
1. [MATLAB](https://www.mathworks.com/products/matlab.html) (R2017b or later)
71+
2. [MATLAB Compiler™](https://www.mathworks.com/products/compiler.html) and [MATLAB Compiler SDK™](https://www.mathworks.com/products/matlab-compiler-sdk.html) (R2017b or later)
72+
3. [MATLAB Production Server™](https://www.mathworks.com/products/matlab-production-server.html) (R2017b or later)
73+
4. [MATLAB Parallel Server™](https://www.mathworks.com/products/distriben.html) (R2017b or later)
74+
75+
## License
76+
The license for the MATLAB Interface *for AWS Athena* is available in the [LICENSE.TXT](LICENSE.TXT) file in this GitHub repository. This package uses certain third-party content which is licensed under separate license agreements. See the [pom.xml](Software/Java/pom.xml) file for third-party software downloaded at build time.
77+
78+
## Enhancement Request
79+
Provide suggestions for additional features or capabilities using the following link:
80+
https://www.mathworks.com/products/reference-architectures/request-new-reference-architectures.html
81+
82+
## Support
83+
84+
85+
[//]: # (Copyright 2019 The MathWorks, Inc.)

RELEASENOTES.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
# MATLAB Interface *for AWS Athena* - Release Notes
2+
3+
## Release 0.2.0 -- Initial public release
4+
5+
