Commit 14a5b78

Merge pull request #31 from vsoch/add/headers-exporters
Adding user agent header, regex for URL watcher tasks
2 parents e444423 + 9ee4b70 commit 14a5b78

7 files changed, +129 -31 lines changed

.github/AUTHORS.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+# Maintainers
+
+- [@vsoch](https://www.github.com/vsoch)
+
+# Contributors
+
+- [@SCHKN](https://www.github.com/SCHKN)

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ Critical items to know are:
 - changed behaviour

 ## [master](https://github.com/vsoch/watchme/tree/master)
+- Adding option for regular expression for URL watchers, user agent header (0.0.16)
 - requests is missing from install dependencies (0.0.15)
 - small bug fixes (0.0.14)
 - added headers, params, and json args for post and get urls. (0.0.13)

docs/_docs/examples/index.md

Lines changed: 29 additions & 2 deletions
@@ -5,8 +5,9 @@ permalink: /examples/index.html
 order: 1
 ---

-We will have more examples and details, but for now, here are the example watcher
-repos:
+## Repository Examples
+
+Here you can find example watcher repos:

 - [system](https://github.com/vsoch/watchme-system) for system, sensors, users, and networking monitoring using psutils tasks.
 - [air-quality](https://github.com/vsoch/watchme-air-quality) for watching a metric across a few cities.
@@ -17,3 +18,29 @@ For either of the above, you can easily install and activate the watcher to run
 your machine! See [here](https://vsoch.github.io/watchme/getting-started/#how-do-i-get-a-watcher).
 For specific details about creating the watchers in question, see the README markdowns
 in the repositories.
+
+## Configuration Examples
+
+The following example configurations are contributed by users over time. If you
+have an example to contribute, please [open an issue](https://www.github.com/{{ site.repo }}/issues)
+to share it.
+
+### URL Watchers
+
+The following are examples for [URL watchers](https://vsoch.github.io/watchme/watchers/urls/).
+In the following example, the user is using the `get_url_selection` task to extract
+a number (note the regular expression) from the text resulting from selecting the
+class `.local-temp`. For this version of WatchMe the User-Agent header was not
+automatically added, so he added it here as a `header_*` parameter.
+
+```
+[task-temperature]
+url = https://www.accuweather.com/en/lu/luxembourg/228714/weather-forecast/228714
+selection = .local-temp
+get_text = true
+func = get_url_selection
+active = true
+type = urls
+regex = [0-9]+
+header_user-agent = Mozilla/5.0
+```
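
To make the effect of `regex = [0-9]+` in the configuration above concrete, here is a minimal sketch using Python's `re.search`, the same call the helpers in this commit use. The sample string is invented for illustration and is not real output from the page.

```python
import re

# Hypothetical text, standing in for what the `.local-temp` selection might contain.
selected_text = "Luxembourg 21° C"

# Same pattern as the `regex` value in the configuration.
match = re.search(r"[0-9]+", selected_text)

# re.search returns None when nothing matches, so guard before calling .group().
temperature = match.group() if match else None
print(temperature)
```

Only the first run of digits is returned, and it is returned as a string, so any numeric conversion would be up to whoever consumes the result.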

docs/_docs/watcher-tasks/urls.md

Lines changed: 10 additions & 2 deletions
@@ -38,6 +38,16 @@ A urls task has the following parameters shared across functions.
 | url | Yes |undefined|url@https://www.reddit.com/r/hpc| validated starts with http |
 | func | No |get_task |func@download_task| must be defined in tasks.py |

+
+#### Task Headers
+
+For some tasks, you can add one or more headers to the request by specifying `header_<name>`.
+For example, to add the header "Token" I could do `header_Token=123456`.
+By default, each task has the User-Agent header added, as it typically helps.
+If you want to disable this, add the header_User-Agent to be empty, or change
+it to something else.
+
+
 #### Lists of URL Parameters

 For the "Get" and "Get with selection" tasks, you might want to include url parameters. For example,
@@ -69,7 +79,6 @@ or to skip the third page call (page=3) for the name parameter, just leave it em
 url_param_name@V,V,,V,V,V,V
 ```

-
 ## Tasks Available

 - [Get Task](#1-get-a-url-task) appropriate if you want to perform a GET (e.g., download a page)
@@ -117,7 +126,6 @@ If you specify "save_as" to be json, you will get a results.json unless you spec
 file name.


-
 ### 2. Post to a URL Task

 This task will post to get changes from a URL, ideal for watching restful API
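
The "Task Headers" rules added to urls.md in this file can be sketched as a small standalone function. This is an illustration that follows the `get_headers` change in helpers.py from this commit, not the library code itself; the name `collect_headers` and the value `watchme-example` are hypothetical.

```python
def collect_headers(kwargs):
    """Build request headers from header_<name> entries (illustrative sketch)."""
    # A User-Agent is present by default, as the documentation describes.
    headers = {"User-Agent": "Mozilla/5.0"}
    for key, value in kwargs.items():
        if not key.startswith("header_"):
            continue
        name = key.replace("header_", "", 1)
        if value is not None:
            # A defined value adds the header or overrides the default.
            headers[name] = value
        else:
            # A value of None removes the header if it is present.
            headers.pop(name, None)
    return headers


added = collect_headers({"header_Token": "123456"})
removed = collect_headers({"header_User-Agent": None})
overridden = collect_headers({"header_User-Agent": "watchme-example"})
print(added, removed, overridden)
```

Name matching in this sketch is case-sensitive, so `header_user-agent` would add a second key instead of replacing the default `User-Agent`. Whether an empty value in a configuration file arrives as `None` depends on how the configuration is loaded, which the sketch does not model.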

watchme/version.py

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 # with this file, You can obtain one at http://mozilla.org/MPL/2.0/.


-__version__ = "0.0.15"
+__version__ = "0.0.16"
 AUTHOR = 'Vanessa Sochat'
 AUTHOR_EMAIL = '[email protected]'
 NAME = 'watchme'

watchme/watchers/urls/helpers.py

Lines changed: 62 additions & 6 deletions
@@ -30,10 +30,13 @@ def get_params(kwargs, key='url_param_'):
     names = [x for x in kwargs if x.startswith(key)]
     for n in range(len(names)):
         name = names[n]
+
         # Params are split by commas, with index corresponding to list index
         paramlist = kwargs.get(name).split(',')
+
         # Remove the "url_param"
         name = name.replace(key, '', 1)
+
         # Update the dictionary of dictionaries
         for i in range(len(paramlist)):

@@ -54,27 +57,75 @@ def get_params(kwargs, key='url_param_'):
     return params


+def parse_success_response(response, kwargs):
+    '''parse a successful response of 200, meaning we honor the user
+       request to return json, search for a regular expression, or return
+       raw text. This is used by the basic GET/POST functions. For parsing
+       with beautiful soup, see "get_results" and "get_url_selection"
+
+       Parameters
+       ==========
+       response: the requests (200) response
+       kwargs: dictionary of keyword arguments provided to function
+    '''
+    result = None
+    save_as = kwargs.get('save_as', 'json')
+    regex = kwargs.get('regex')
+
+    # Returning the result as json will detect dictionary, and save json
+    if save_as == "json":
+        result = response.json()
+
+    # As an alternative, search for a regular expression
+    elif regex not in ["", None]:
+        match = re.search(regex, response.text)
+        result = match.group()
+
+    # Otherwise, we return text
+    else:
+        result = response.text
+    return result
+
+
 def get_headers(kwargs):
-    '''Get a single set of headers from the kwargs dict.
+    '''Get a single set of headers from the kwargs dict. A user agent is added
+       as it is helpful in most cases.

        Parameters
        ==========
        kwargs: the dictionary of keyword arguments that may contain url
                parameters (format is url_param_<name>
     '''
-    headers = {}
+    headers = {"User-Agent": "Mozilla/5.0"}

     for key, value in kwargs.items():
         if key.startswith('header_'):
             name = key.replace('header_', '', 1)
-            headers[name] = value
+
+            # The header is defined with a value
+            if value != None:
+                headers[name] = value
+
+            # If the user wants to remove the User-Agent (or any) header
+            elif value == None and name in headers:
+                del headers[name]

     return headers


-def get_results(url, selector, func=None, attributes=None, params={}, get_text=False, headers={}):
-    '''given a url, a function, an optional selector, optional attributes, and a set (dict)
-       of parameters, perform a request.
+def get_results(url,
+                selector,
+                func=None,
+                attributes=None,
+                params={},
+                get_text=False,
+                headers={},
+                regex=None):
+
+    '''given a url, a function, an optional selector, optional attributes,
+       and a set (dict) of parameters, perform a request. This function is
+       used if the calling function needs special parsing of the html with
+       beautiful soup. If only a post/get is needed, this is not necessary.

        Parameters
        ==========
@@ -103,6 +154,11 @@ def get_results(url, selector, func=None, attributes=None, params={}, get_text=F
             if attributes != None:
                 [results.append(entry.get(x)) for x in attributes]

+            # Second priority for regular expression on text
+            elif regex not in [None, ""]:
+                match = re.search(regex, entry.text)
+                results.append(match.group())
+
             # Does the user want to get text?
             elif get_text == True:
                 results.append(entry.text)
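
The last hunk above gives each selected entry a priority order: attributes first, then the regular expression, then plain text. A minimal sketch of that ordering follows. `FakeEntry` is a hypothetical stand-in for a parsed HTML element, `extract` is an invented name, and the `if match` guard is an addition of this sketch (the diff calls `match.group()` directly, which would raise `AttributeError` when the pattern is not found). The fallback used when none of the three options is set is not visible in the diff, so the sketch returns an empty list there.

```python
import re


class FakeEntry:
    """Hypothetical stand-in for a parsed element exposing .text and .get()."""

    def __init__(self, text, attrs=None):
        self.text = text
        self.attrs = attrs or {}

    def get(self, name):
        return self.attrs.get(name)


def extract(entry, attributes=None, regex=None, get_text=False):
    results = []
    # First priority: explicit attributes.
    if attributes is not None:
        results.extend(entry.get(x) for x in attributes)
    # Second priority: a regular expression searched in the entry text.
    elif regex not in [None, ""]:
        match = re.search(regex, entry.text)
        if match:  # guard added in this sketch only
            results.append(match.group())
    # Third priority: the raw text.
    elif get_text:
        results.append(entry.text)
    return results


entry = FakeEntry("Now 21 degrees", {"class": "local-temp"})
by_attribute = extract(entry, attributes=["class"], regex="[0-9]+")
by_regex = extract(entry, regex="[0-9]+", get_text=True)
by_text = extract(entry, get_text=True)
print(by_attribute, by_regex, by_text)
```

One consequence of this ordering is that setting both `regex` and `get_text`, as the example configuration in this commit does, yields only the regular expression match.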

watchme/watchers/urls/tasks.py

Lines changed: 19 additions & 20 deletions
@@ -13,10 +13,12 @@
 from .helpers import (
     get_params,
     get_results,
-    get_headers
+    get_headers,
+    parse_success_response
 )
 from requests.exceptions import HTTPError
 import os
+import re
 import tempfile
 import requests

@@ -30,6 +32,10 @@ def get_task(url, **kwargs):

        REQUIRED:
            url: a url to return the page for
+
+       OPTIONAL
+           regex: a regular expression to search the text for (not used w/ json)
+           save_as: return the result to save as json
     '''
     results = []
     paramsets = get_params(kwargs)
@@ -39,16 +45,9 @@ def get_task(url, **kwargs):
         response = requests.get(url, params=params, headers=headers)

         if response.status_code == 200:
-            save_as = kwargs.get('save_as')
-
-            # Returning the result as json will detect dictionary, and save json
-            if save_as == "json":
-                result = response.json()
-
-            # Otherwise, we return text
-            else:
-                result = response.text

+            # Parse the response per the user's request
+            result = parse_success_response(response, kwargs)
             results.append(result)

     results = [x for x in results if x]
@@ -81,19 +80,15 @@ def post_task(url, **kwargs):
         response = requests.post(url, json=params, headers=headers)
         if response.status_code == 200:

-            save_as = kwargs.get('save_as', 'json')
-
-            # Returning the result as json will detect dictionary, and save json
-            if save_as == "json":
-                result = response.json()
-
-            # Otherwise, we return text
-            else:
-                result = response.text
+            # Parse the response per the user's request
+            result = parse_success_response(response, kwargs)
+            results.append(result)

         else:
             bot.error("%s: %s" %(response.status_code, response.reason))

+    results = [x for x in results if x]
+
     # Return None if no results found
     if len(results) == 0:
         results = None
@@ -179,6 +174,9 @@ def get_url_selection(url, **kwargs):
     if kwargs.get('get_text') != None:
         get_text = True

+    # Are we searching for a regular expression in the result?
+    regex = kwargs.get('regex')
+
     # Does the user want to get one or more attributes?
     attributes = kwargs.get('attributes', None)
     if attributes != None:
@@ -198,7 +196,8 @@ def get_url_selection(url, **kwargs):
                           headers=headers,
                           attributes=attributes,
                           params=params,
-                          get_text=get_text)
+                          get_text=get_text,
+                          regex=regex)

     # No results
     if len(results) == 0:
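
In tasks.py, both `get_task` and `post_task` now hand the response to `parse_success_response`. The branch order of that helper can be sketched without a network call, as below. `FakeResponse` is a hypothetical stand-in for a requests response, `parse_response` is an invented name, and the `None` fallback on a failed match is an addition of this sketch and not part of the commit.

```python
import json
import re


class FakeResponse:
    """Hypothetical stand-in for a requests response exposing .text and .json()."""

    def __init__(self, text):
        self.text = text

    def json(self):
        return json.loads(self.text)


def parse_response(response, kwargs):
    save_as = kwargs.get("save_as", "json")
    regex = kwargs.get("regex")
    # JSON is checked first, and it is the default when save_as is not given.
    if save_as == "json":
        return response.json()
    # A regular expression is only considered when save_as is not "json".
    if regex not in ["", None]:
        match = re.search(regex, response.text)
        return match.group() if match else None  # guard added in this sketch
    # Otherwise the raw text is returned.
    return response.text


as_json = parse_response(FakeResponse('{"temp": 21}'), {})
as_match = parse_response(FakeResponse("temp is 21"), {"save_as": "text", "regex": "[0-9]+"})
as_text = parse_response(FakeResponse("temp is 21"), {"save_as": "text"})
print(as_json, as_match, as_text)
```

Because `save_as` defaults to `json`, a `regex` takes effect only when `save_as` is set to something else. The removed lines also show that `get_task` previously read `save_as` without a default, so this refactor appears to change its default branch from text to JSON.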
