Labels: enhancement, good first issue, help wanted
Description
We need to measure the storage requirements of each source+symbol in our system.
A "source" is an API endpoint that provides data we're interested in (e.g. the public Binance websocket server).
The task is to design and implement a system that automatically goes through all of the sources in the sources array (a global variable in collector/sources.go) and measures the size of incoming data over a certain period.
It should:
- Take into account all sources + symbols in the table.
- (If a source isn't working feel free to skip it.)
- Make use of the subscribe or join functions to listen to new data.
- Measure the throughput of the data for each symbol.
- Be as fast as possible, though a longer measurement window will obviously give more accurate results.
- Record the lengths (in bytes) of all messages received per source/symbol.
Ideas + advice:
- Could be written as a Go test or a standalone Go program in the tests folder.
- Most of the data from our sources will be in JSON format.
- Data throughput will depend on time of day (Some sources have busier periods).
- Feel free to extend the Source struct and add more members if necessary.
- The results can be output as a CSV or any open data format, no need to over-complicate displaying the results.
- Some sources have rate limits which will ban your client if exceeded. You'll have to avoid automatically subscribing to too many symbols on the same source at once.
- The sources documentation will explain the rate limits they've set.
- Avoid inheritance, it's not needed or wanted. We use function pointers for reusability.
Bonus:
- If you find any obvious flaws or changes you would make that would be great.
- Make a system to determine the busiest periods for any given source.
- E.g. Binance might be busiest from 5pm-10pm PST.
- All that's needed is for the measure tool to be able to record over very long periods of time.
- Sanity check for symbols in the table.
- Check if the source + symbol is valid (Doesn't return 404 or websocket subscription error) for all symbols in the table.
- Measure the impact of simple compression techniques.
- LZ4
- BSON
- Any automatic JSON->binary system.
- Light parsing with heuristics (e.g. most of the JSON data has a "results" member which holds the data we're actually interested in, so everything around it can be safely discarded).
- More if you want.
- Add an optional marshal function for the Source type.
- Marshal "compresses" a message from whatever format it arrives in into a smaller format while preserving all relevant data.
- For instance, JSON -> BSON would be a valid marshal function.
- Same thing for JSON -> LZ4.
- It should be customizable enough to allow even manual parsing (a source parsed by a custom function into a Go struct).
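One possible shape for the optional marshal hook, using the "results"-member heuristic from the compression bonus; the `MarshalFunc` name and signature are assumptions, not the real API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MarshalFunc is a hypothetical per-source hook: it "compresses" a raw
// message into a smaller form while preserving the relevant data.
type MarshalFunc func(raw []byte) ([]byte, error)

// extractResults is one possible marshal function: light parsing with the
// heuristic that the interesting data sits under a "results" member, so
// everything around it can be discarded.
func extractResults(raw []byte) ([]byte, error) {
	var outer map[string]json.RawMessage
	if err := json.Unmarshal(raw, &outer); err != nil {
		return nil, err
	}
	if res, ok := outer["results"]; ok {
		return res, nil
	}
	// No "results" member: fall back to the original message unchanged.
	return raw, nil
}

func main() {
	raw := []byte(`{"id":1,"ts":"2024-01-01T00:00:00Z","results":{"p":"42"}}`)
	small, err := extractResults(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("before: %d bytes, after: %d bytes\n", len(raw), len(small))
}
```

A JSON -> LZ4 or JSON -> BSON marshal function would plug into the same `MarshalFunc` signature, which is what makes the hook customizable enough for manual parsing into a Go struct as well.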