docs/dev/adding-feeds.md: 120 additions & 7 deletions
@@ -1,5 +1,5 @@
 <!-- comment
-SPDX-FileCopyrightText: 2015-2023 Sebastian Wagner, Filip Pokorný
+SPDX-FileCopyrightText: 2015-2021 nic.at GmbH, 2023 Filip Pokorný, 2025 Institute for Common Good Technology
 SPDX-License-Identifier: AGPL-3.0-or-later
 -->
@@ -30,12 +30,123 @@ Adding a feed doesn't necessarily require any programming experience. There are
If the data source utilizes some unusual way of distribution or uses a custom format for the data, it might be necessary to develop specialized bot(s) for this particular data source. Always try to use existing bots before you start developing your own. Please also consider extending an existing bot if your use case is close enough to its features. If you are unsure which way to take, start an [issue](https://github.com/certtools/intelmq/issues) and you will receive guidance.
## Howto

### Choosing the collector

### Choosing the parser

### Classification

### Other static fields

* Feed accuracy
* TLP
* Event Description
* Target
* Text
* URL
* Protocol
* Application Protocol
* Transport Protocol
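These static values correspond to fields in the IntelMQ harmonization. As a rough reference, here is a sketch of the field names they typically map to; the example values are purely illustrative and not taken from any real feed:

```
feed.accuracy: 90                   # Feed accuracy, as a percentage
tlp: WHITE                          # TLP level of the data
event_description.text: List of IP networks used for abuse   # Event Description / Text
event_description.target: Online forums                      # Target of the activity, if known
event_description.url: https://example.com/about-this-feed   # URL with more information
protocol.application: http          # Application Protocol
protocol.transport: tcp             # Transport Protocol
```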
## Example Feeds

### Simple List
As an example, let's add the - very simple - feed *Toxic IP Addresses (CIDR)* by StopForumSpam to the documentation. The data URL is https://www.stopforumspam.com/downloads/toxic_ip_cidr.txt and contains a list of IP network ranges in CIDR notation, separated by newlines.
As the resource is available via HTTP, we will use the [HTTP Collector](../user/bots.md#intelmq.bots.collectors.http.collector_http) for the data retrieval and the [Generic CSV Parser](../user/bots.md#intelmq.bots.parsers.generic.parser_csv) for parsing.

For the collector, we only specify the module to use (the HTTP collector, as seen in the bots documentation), an estimate of the feed accuracy (as it is a blacklist, not 100%, but still reasonably high), the resource URL to download, and a rate limit of 1 hour, as there might be frequent updates.

For the parser, we again specify the module name and the required parameter (`columns`) to map the input data field to the IntelMQ field `source.network`. Furthermore, we add some static field values which are equal for all data lines.
```
Stop Forum Spam:
  Toxic IP Addresses:
    description: IP Networks that are believed will only ever be used for abuse
```
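The feed entry then continues with the collector and parser definitions. Below is a minimal sketch of how they could look for this feed, based only on the parameters discussed above and nested under the entry shown before; the surrounding keys (`bots`, `collector`, `parser`, `parameters`) and the `accuracy` parameter are assumptions here, so compare against existing entries in the feeds documentation before copying:

```
    bots:
      collector:
        module: intelmq.bots.collectors.http.collector_http
        parameters:
          http_url: https://www.stopforumspam.com/downloads/toxic_ip_cidr.txt
          rate_limit: 3600    # 1 hour, expressed in seconds
          accuracy: 90        # estimate: a blacklist, so below 100, but still reasonably high
      parser:
        module: intelmq.bots.parsers.generic.parser_csv
        parameters:
          columns:
            - source.network  # each input line is one network range in CIDR notation
          # static field values that are equal for all data lines
          # (see "Other static fields" above) would also be configured here
```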
The feed's description is at https://cert.pl/en/warning-list/ and it says the list of blocked domains is updated about every 5 minutes. In IntelMQ we usually don't need such high refresh rates; setting the rate limit to half an hour is reasonable for most use cases.

The list is composed automatically and contains domains intended for warnings, so the accuracy is lower.

As the description says the listed domains are websites, we can again assume the protocol is HTTP/TCP. Although the list is about phishing websites, its use case is a warning/blacklist and therefore the classification is blacklist. In the event description we explain the kind of blacklist.
The most crucial part is the mapping of the columns to IntelMQ fields. In this case, the column names are given in Polish.

- `PozycjaRejestru`: Position in the Register. We do not need this in IntelMQ, so we save it as `extra.certpl_register`.
- `AdresDomeny`: The domain address, which lands in `source.fqdn`. This is the information we care about.
- `DataWpisu`: The date of entry, and
- `DataWykreslenia`: The date of deletion.
- This is a tricky situation, as we have no clear indication at which time the information is current. Based on the feed description, if the deletion date is not present, the time of fetching the data (`time.observation`) is closest to the meaning of `time.source`.
- Therefore, instead of using the Generic CSV Parser, a custom parser or a downstream expert would be required to accomplish this.
- For simplicity, we map these columns to `extra.first_seen` and `extra.expiration_date`, as shown in the sketch after this list. Both fields are already in use by other bots and feeds.
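A minimal sketch of how the column mapping described above could be expressed for the Generic CSV Parser; the surrounding keys and any further parameters (such as the delimiter) are assumptions and need to be checked against the actual feed:

```
parser:
  module: intelmq.bots.parsers.generic.parser_csv
  parameters:
    columns:
      - extra.certpl_register   # PozycjaRejestru: position in the register
      - source.fqdn             # AdresDomeny: the blocked domain
      - extra.first_seen        # DataWpisu: date of entry
      - extra.expiration_date   # DataWykreslenia: date of deletion
```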
This is a list with potentially interesting data sources, which are either currently not supported or the usage is not clearly documented in IntelMQ. If you want to **contribute** new feeds to IntelMQ, this is a great place to start!

!!! note
-    Some of the following data sources might better serve as an expert bot for enriching processed events.
+    Some of the following data sources might also serve as an expert bot for enriching processed events.

- Lists of feeds:
    - [threatfeeds.io](https://threatfeeds.io)
@@ -48,6 +159,7 @@ This is a list with potentially interesting data sources, which are either curre
- Some third party intelmq bots: [NRDCS IntelMQ fork](https://github.com/NRDCS/intelmq/tree/certlt/intelmq/bots)