Reddit Source downloader #40

Open. Wants to merge 8 commits into base: master
12 changes: 12 additions & 0 deletions source-reddit-fetcher/Dockerfile
@@ -0,0 +1,12 @@
FROM airbyte/python-connector-base:1.1.0

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . ./airbyte/integration_code
RUN pip install ./airbyte/integration_code


# The entrypoint and default env vars are already set in the base image
ENV AIRBYTE_ENTRYPOINT="python /airbyte/integration_code/main.py"
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]
80 changes: 80 additions & 0 deletions source-reddit-fetcher/README.md
@@ -0,0 +1,80 @@
# Reddit Source


## Usage

This connector fetches posts, comments, and their votes from a selected subreddit.

### Configuration

The connector takes the following input:

- `client_id` - the Reddit client ID for your account
- `client_secret` - the Reddit client secret for your account
- `username` - the Reddit username that was used to generate the `client_id` and `client_secret`
- `subreddit` - the name of the subreddit to monitor
- `days` - the number of days used to calculate the stop date. Votes and comments from before that date are not monitored. Calculation: $stop\ date = today - days$
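
The stop-date calculation can be sketched in Python as follows. This is a minimal illustration, not the connector's actual code: the function name `compute_stop_date` and the use of plain calendar dates (rather than timestamps in a particular time zone) are assumptions.

```python
from datetime import date, timedelta
from typing import Optional


def compute_stop_date(days: int, today: Optional[date] = None) -> date:
    """Return today minus `days`; items older than this date are not monitored."""
    if days < 0:
        raise ValueError("days must be non-negative")
    if today is None:
        today = date.today()
    return today - timedelta(days=days)


# With days=31 and a fixed "today" of 2024-01-31, the stop date is 2023-12-31.
print(compute_stop_date(31, today=date(2024, 1, 31)))  # 2023-12-31
```

Passing `today` explicitly keeps the calculation reproducible in tests.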

### Output

The connector will return the following:

* [Posts](./source_reddit_fetcher/schemas/posts.json)
* [PostVotes](./source_reddit_fetcher/schemas/posts_votes.json)
* [Comments](./source_reddit_fetcher/schemas/comments.json)
* [CommentsVotes](./source_reddit_fetcher/schemas/comments_votes.json)

## Local development

### Prerequisites

#### Activate Virtual Environment and install dependencies

From this connector directory, create a virtual environment:

```
python -m venv .venv
```

Then activate it and install the dependencies:

```
source .venv/bin/activate
pip install -r requirements.txt
```

### Locally running the connector

```
python main.py spec
python main.py check --config sample_files/config-example.json
python main.py discover --config sample_files/config-example.json
python main.py read --config sample_files/config-example.json --catalog sample_files/configured_catalog.json
```

### Locally running the connector docker image

```bash
docker build -t airbyte/source-reddit-fetcher:dev .
# Running the spec command against your patched connector
docker run airbyte/source-reddit-fetcher:dev spec
```

#### Run

Then run any of the connector commands as follows:

#### Linux / macOS

```
docker run --rm airbyte/source-reddit-fetcher:dev spec
docker run --rm -v $(pwd)/sample_files:/sample_files airbyte/source-reddit-fetcher:dev check --config /sample_files/config-example.json
docker run --rm -v $(pwd)/sample_files:/sample_files airbyte/source-reddit-fetcher:dev discover --config /sample_files/config-example.json
docker run --rm -v $(pwd)/sample_files:/sample_files airbyte/source-reddit-fetcher:dev read --config /sample_files/config-example.json --catalog /sample_files/configured_catalog.json
```

#### Windows

```
docker run --rm airbyte/source-reddit-fetcher:dev spec
docker run --rm -v "$PWD\sample_files:/sample_files" airbyte/source-reddit-fetcher:dev check --config /sample_files/config-example.json
docker run --rm -v "$PWD\sample_files:/sample_files" airbyte/source-reddit-fetcher:dev discover --config /sample_files/config-example.json
docker run --rm -v "$PWD\sample_files:/sample_files" airbyte/source-reddit-fetcher:dev read --config /sample_files/config-example.json --catalog /sample_files/configured_catalog.json
```
4 changes: 4 additions & 0 deletions source-reddit-fetcher/main.py
@@ -0,0 +1,4 @@
from source_reddit_fetcher.run import run

if __name__ == "__main__":
    run()
25 changes: 25 additions & 0 deletions source-reddit-fetcher/metadata.yaml
@@ -0,0 +1,25 @@
data:
  allowedHosts:
  registries:
    oss:
      enabled: false
    cloud:
      enabled: false
  connectorBuildOptions:
    baseImage: docker.io/airbyte/python-connector-base:1.0.0@sha256:dd17e347fbda94f7c3abff539be298a65af2d7fc27a307d89297df1081a45c27
  connectorSubtype: api
  connectorType: source
  definitionId: 1c448bfb-8950-478c-9ae0-f03aaaf4e920
  dockerImageTag: '0.0.1'
  dockerRepository: harbor.status.im/bi/airbyte/source-reddit-fetcher
  githubIssueLabel: source-reddit-fetcher
  icon: twitter-fetcher.svg
  license: MIT
  name: Reddit Data Extractor
  releaseDate: TODO
  supportLevel: community
  releaseStage: alpha
  documentationUrl: https://docs.bi.status.im/extractions/reddit.html
  tags:
    - language:python
metadataSpecVersion: "1.0"
1 change: 1 addition & 0 deletions source-reddit-fetcher/requirements.txt
@@ -0,0 +1 @@
pandas
7 changes: 7 additions & 0 deletions source-reddit-fetcher/sample_files/config-example.json
@@ -0,0 +1,7 @@
{
  "client_id": "your_reddit_client_id",
  "client_secret": "your_reddit_client_secret",
  "username": "your_reddit_username",
  "subreddit": "subreddit_name_to_be_monitored",
  "days": 31
}
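
A config file like the one above can be sanity-checked with a few lines of Python before running `check`. This is a minimal sketch: the helper `missing_config_keys` is hypothetical, and treating all five keys as required is an assumption, since the connector spec is not shown here.

```python
import json

# Keys taken from sample_files/config-example.json.
# Assumption: all of them are required by the connector.
EXPECTED_KEYS = ("client_id", "client_secret", "username", "subreddit", "days")


def missing_config_keys(config: dict) -> list:
    """Return the expected keys that are absent from the config, in a stable order."""
    return [key for key in EXPECTED_KEYS if key not in config]


# Example config with "subreddit" left out on purpose.
example = json.loads("""
{
  "client_id": "your_reddit_client_id",
  "client_secret": "your_reddit_client_secret",
  "username": "your_reddit_username",
  "days": 31
}
""")
print(missing_config_keys(example))  # ['subreddit']
```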
46 changes: 46 additions & 0 deletions source-reddit-fetcher/sample_files/configured_catalog.json
@@ -0,0 +1,46 @@
{
  "streams": [
    {
      "stream": {
        "name": "posts",
        "json_schema": {
          "$schema": "http://json-schema.org/draft-04/schema#",
          "type": "object"
        },
        "supported_sync_modes": [
          "full_refresh", "incremental"
        ]
      },
      "sync_mode": "incremental",
      "destination_sync_mode": "append"
    },
    {
      "stream": {
        "name": "posts_votes",
        "json_schema": {
          "$schema": "http://json-schema.org/draft-04/schema#",
          "type": "object"
        },
        "supported_sync_modes": [
          "incremental"
        ]
      },
      "sync_mode": "incremental",
      "destination_sync_mode": "append"
    },
    {
      "stream": {
        "name": "comments",
        "json_schema": {
          "$schema": "http://json-schema.org/draft-04/schema#",
          "type": "object"
        },
        "supported_sync_modes": [
          "full_refresh", "incremental"
        ]
      },
      "sync_mode": "full_refresh",
      "destination_sync_mode": "overwrite"
    }
  ]
}
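
Each entry in a configured catalog selects a `sync_mode` that should appear in the stream's `supported_sync_modes`. A minimal sketch of that consistency check, assuming the layout shown above; the helper `unsupported_sync_modes` is hypothetical, and the second stream in the sample data is deliberately misconfigured for illustration.

```python
def unsupported_sync_modes(catalog: dict) -> list:
    """Return (stream name, sync_mode) pairs whose selected mode is not listed as supported."""
    problems = []
    for entry in catalog.get("streams", []):
        stream = entry["stream"]
        if entry["sync_mode"] not in stream.get("supported_sync_modes", []):
            problems.append((stream["name"], entry["sync_mode"]))
    return problems


catalog = {
    "streams": [
        {
            "stream": {"name": "posts", "supported_sync_modes": ["full_refresh", "incremental"]},
            "sync_mode": "incremental",
            "destination_sync_mode": "append",
        },
        {
            "stream": {"name": "posts_votes", "supported_sync_modes": ["incremental"]},
            "sync_mode": "full_refresh",  # deliberately wrong, to show a reported problem
            "destination_sync_mode": "overwrite",
        },
    ]
}
print(unsupported_sync_modes(catalog))  # [('posts_votes', 'full_refresh')]
```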
35 changes: 35 additions & 0 deletions source-reddit-fetcher/setup.py
@@ -0,0 +1,35 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#


from setuptools import find_packages, setup

MAIN_REQUIREMENTS = [
    "airbyte-cdk~=0.2",
]

TEST_REQUIREMENTS = [
    "requests-mock~=1.9.3",
    "pytest~=6.2",
    "pytest-mock~=3.6.1",
    "connector-acceptance-test",
]

setup(
    name="source-reddit-fetcher",
    description="Source implementation for Reddit.",
    author="Status",
    author_email="[email protected]",
    packages=find_packages(),
    install_requires=MAIN_REQUIREMENTS,
    package_data={"": ["*.json", "*.yaml", "schemas/*.json", "schemas/shared/*.json"]},
    extras_require={
        "tests": TEST_REQUIREMENTS,
    },
    entry_points={
        "console_scripts": [
            "source-reddit-connector=source_reddit_fetcher.run:run",
        ],
    },
)
Empty file.
7 changes: 7 additions & 0 deletions source-reddit-fetcher/source_reddit_fetcher/run.py
@@ -0,0 +1,7 @@
import sys
from airbyte_cdk.entrypoint import launch
from .source import SourceRedditFetcher


def run():
    source = SourceRedditFetcher()
    launch(source, sys.argv[1:])
49 changes: 49 additions & 0 deletions source-reddit-fetcher/source_reddit_fetcher/schemas/comments.json
@@ -0,0 +1,49 @@
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "id": {
      "type": "string"
    },
    "post_id": {
      "type": "string"
    },
    "subreddit": {
      "type": "string"
    },
    "comment_id": {
      "type": "string"
    },
    "created_timestamp": {
      "type": "string",
      "format": "date-time"
    },
    "timezone": {
      "type": "string"
    },
    "parent_id": {
      "type": "string"
    },
    "author": {
      "type": "string"
    },
    "text": {
      "type": "string"
    },
    "html_text": {
      "type": "string"
    },
    "url": {
      "type": "string"
    },
    "ups": {
      "type": "integer"
    },
    "downs": {
      "type": "integer"
    },
    "score": {
      "type": "integer"
    }
  }
}
44 changes: 44 additions & 0 deletions source-reddit-fetcher/source_reddit_fetcher/schemas/posts.json
@@ -0,0 +1,44 @@
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "id": {
      "type": "string"
    },
    "kind_tag": {
      "type": "string"
    },
    "kind_name": {
      "type": "string"
    },
    "subreddit": {
      "type": "string"
    },
    "post_id": {
      "type": "string"
    },
    "post_url": {
      "type": "string",
      "format": "uri"
    },
    "created_timestamp": {
      "type": "string",
      "format": "date-time"
    },
    "timezone": {
      "type": "string"
    },
    "title": {
      "type": "string"
    },
    "text": {
      "type": "string"
    },
    "html_text": {
      "type": "string"
    },
    "author": {
      "type": "string"
    }
  }
}
@@ -0,0 +1,30 @@
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "id": {
      "type": ["null", "string"]
    },
    "post_id": {
      "type": ["null", "string"]
    },
    "kind_name": {
      "type": ["null", "string"]
    },
    "kind": {
      "type": ["null", "string"]
    },
    "ups": {
      "type": ["null", "integer"]
    },
    "downs": {
      "type": ["null", "integer"]
    },
    "upvote_ratio": {
      "type": ["null", "number"]
    },
    "score": {
      "type": ["null", "integer"]
    }
  }
}
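
Every property in this schema is declared as `["null", "<type>"]`, meaning the field may be null or hold a value of that type. A simplified sketch of that rule in plain Python, for illustration only: `matches_type` and `invalid_fields` are hypothetical helpers, the sample record is made up, and a real JSON Schema validator covers far more than the `type` keyword.

```python
def matches_type(value, declared) -> bool:
    """Check a value against a JSON Schema `type` keyword (a string or a list of strings)."""
    allowed = [declared] if isinstance(declared, str) else declared
    for name in allowed:
        if name == "null" and value is None:
            return True
        if name == "string" and isinstance(value, str):
            return True
        # bool is a subclass of int in Python, so exclude it explicitly.
        if name == "integer" and isinstance(value, int) and not isinstance(value, bool):
            return True
        if name == "number" and isinstance(value, (int, float)) and not isinstance(value, bool):
            return True
    return False


def invalid_fields(record: dict, properties: dict) -> list:
    """Return the names of record fields whose value does not match the declared type."""
    return [
        name
        for name, value in record.items()
        if name in properties and not matches_type(value, properties[name]["type"])
    ]


properties = {
    "post_id": {"type": ["null", "string"]},
    "ups": {"type": ["null", "integer"]},
    "upvote_ratio": {"type": ["null", "number"]},
}
# Made-up record: upvote_ratio is a string, which the schema does not allow.
record = {"post_id": "abc123", "ups": None, "upvote_ratio": "0.97"}
print(invalid_fields(record, properties))  # ['upvote_ratio']
```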