
Automatic measurement of source data usage. #1

@dowdt

Description

We need to measure the storage requirements of each source+symbol in our system.

A "source" is an API endpoint that gives us data we're interested in (ie. public binance websocket server).

The task is to design and implement a system that automatically goes through all of the sources in the sources array (a global variable in collector/sources.go) and measures the size of incoming data over a given period.

It should:

  • Take into account all sources + symbols in the table.
    • (If a source isn't working feel free to skip it.)
  • Make use of the subscribe or join functions to listen for new data.
  • Measure the data throughput for each symbol.
  • Be as fast as possible, though a longer measurement window will obviously give more accurate results.
  • Record the lengths (in bytes) of all messages received per source/symbol (see the sketch after this list).
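
A minimal sketch of what the measuring loop could look like. The Source struct here (Name, Symbols, and a Subscribe function pointer that delivers raw messages until a stop channel is closed) is an assumption for illustration only; the real definition lives in collector/sources.go and the harness should be adapted to it.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Source is an assumed shape for illustration only; the real struct lives in
// collector/sources.go and may differ. Subscribe is assumed to deliver each
// raw message for a symbol to onMsg until stop is closed.
type Source struct {
	Name      string
	Symbols   []string
	Subscribe func(symbol string, onMsg func(raw []byte), stop <-chan struct{}) error
}

// SymbolStats accumulates message sizes for one source+symbol pair.
type SymbolStats struct {
	Messages   int
	TotalBytes int64
}

// measure subscribes to every symbol of every source and records message
// lengths for the given window. Sources that fail to subscribe are skipped.
func measure(sources []Source, window time.Duration) map[string]*SymbolStats {
	stats := make(map[string]*SymbolStats)
	var mu sync.Mutex
	stop := make(chan struct{})
	var wg sync.WaitGroup

	for _, src := range sources {
		for _, sym := range src.Symbols {
			key := src.Name + "/" + sym
			st := &SymbolStats{}
			stats[key] = st

			wg.Add(1)
			go func(src Source, sym, key string, st *SymbolStats) {
				defer wg.Done()
				err := src.Subscribe(sym, func(raw []byte) {
					mu.Lock()
					st.Messages++
					st.TotalBytes += int64(len(raw))
					mu.Unlock()
				}, stop)
				if err != nil {
					fmt.Printf("skipping %s: %v\n", key, err)
				}
			}(src, sym, key, st)
		}
	}

	time.Sleep(window) // a longer window averages out bursty periods
	close(stop)
	wg.Wait()
	return stats
}

func main() {
	// In the real tool this would be the global sources slice from collector/sources.go.
	var sources []Source
	for key, s := range measure(sources, time.Minute) {
		fmt.Printf("%s,%d,%d\n", key, s.Messages, s.TotalBytes)
	}
}
```

The same per-symbol stats can be dumped as the CSV mentioned below; recording over a very long window is what the "busiest periods" bonus needs.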

Ideas + advice:

  • Could be written as a Go test or a standalone Go program in the tests folder.
  • Most of the data from our sources will be in JSON format.
  • Data throughput will depend on the time of day (some sources have busier periods).
  • Feel free to extend the Source struct and add more members if necessary.
  • The results can be output as CSV or any other open data format; no need to overcomplicate displaying the results.
  • Some sources have rate limits that will get your client banned, so avoid automatically tapping into too many symbols from the same source at once (see the sketch after this list).
    • Each source's documentation will explain the rate limits it sets.
  • Avoid inheritance; it's not needed or wanted. We use function pointers for reusability.
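
As a sketch of the struct-extension and rate-limit points above: the fields MaxConcurrentSymbols and SubscribeDelay below are hypothetical additions, not existing members, and the Subscribe signature is the same assumption as in the earlier sketch. The point is to keep the function-pointer style and pace subscriptions per source.

```go
package measure

import "time"

// Source with hypothetical extra members for the measuring tool; the real
// struct in collector/sources.go may already have some of these. Subscribe
// stays a plain function pointer (no inheritance), matching the existing style.
type Source struct {
	Name    string
	Symbols []string

	// Assumed existing subscribe/join hook: delivers raw messages for a
	// symbol to onMsg until stop is closed.
	Subscribe func(symbol string, onMsg func(raw []byte), stop <-chan struct{}) error

	// Hypothetical additions for measurement:
	MaxConcurrentSymbols int           // how many symbols may be listened to at once (0 = no limit)
	SubscribeDelay       time.Duration // pause between subscriptions to stay under rate limits
}

// subscribeAll starts listeners for a source's symbols while respecting its
// (assumed) rate-limit fields, and returns the symbols that were skipped
// because subscription failed. Symbols beyond MaxConcurrentSymbols are left
// out here; a real tool would measure them in later batches.
func subscribeAll(src Source, onMsg func(symbol string, raw []byte), stop <-chan struct{}) []string {
	limit := src.MaxConcurrentSymbols
	if limit <= 0 || limit > len(src.Symbols) {
		limit = len(src.Symbols)
	}

	var skipped []string
	for i, sym := range src.Symbols[:limit] {
		if i > 0 && src.SubscribeDelay > 0 {
			time.Sleep(src.SubscribeDelay) // space out subscriptions
		}
		sym := sym // capture for the closure
		if err := src.Subscribe(sym, func(raw []byte) { onMsg(sym, raw) }, stop); err != nil {
			skipped = append(skipped, sym)
		}
	}
	return skipped
}
```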

Bonus:

  • If you find any obvious flaws or have changes you would make, that would be great.
  • Make a system to determine the busiest periods for any given source.
    • E.g. Binance might be busiest from 5pm-10pm PST.
    • All that's needed is for the measure tool to be able to record over very long periods of time.
  • Sanity check for symbols in the table.
    • Check whether the source + symbol is valid (doesn't return a 404 or a websocket subscription error) for all symbols in the table.
  • Measure the impact of simple compression techniques.
    • LZ4
    • BSON
    • Any automatic JSON->binary system.
    • Light parsing with heuristics (e.g. most of the JSON data has a "results" member which contains the data we're actually interested in, so everything around it can safely be discarded).
    • More if you want.
  • Add an optional marshal function for the Source type (see the sketch after this list).
    • Marshal "compresses" a message from whatever format it arrives in into a smaller format while preserving all relevant data.
      • For instance, JSON -> BSON would be a valid marshal function.
      • Same thing for JSON -> LZ4.
      • It should be customizable enough even for manual parsing (source data parsed by a custom function into a Go struct).
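
A sketch of what the optional marshal hook could look like, using the "results" heuristic above as the example transformation. MarshalFunc, the Marshal field, keepResults, and encode are all illustrative names, not existing API.

```go
package measure

import (
	"encoding/json"
	"errors"
)

// MarshalFunc "compresses" a raw message into a smaller representation while
// preserving the relevant data. JSON -> BSON, JSON -> LZ4, or custom parsing
// into a Go struct would all fit this signature.
type MarshalFunc func(raw []byte) ([]byte, error)

// Source with a hypothetical optional Marshal member; nil means "keep raw".
type Source struct {
	Name    string
	Marshal MarshalFunc
}

// keepResults is one possible MarshalFunc, implementing the "results"
// heuristic: parse the message as JSON and keep only the "results" member,
// discarding everything around it.
func keepResults(raw []byte) ([]byte, error) {
	var msg map[string]json.RawMessage
	if err := json.Unmarshal(raw, &msg); err != nil {
		return nil, err
	}
	results, ok := msg["results"]
	if !ok {
		return nil, errors.New(`message has no "results" member`)
	}
	return results, nil
}

// encode applies a source's Marshal hook if it has one, otherwise passes the
// raw message through. Comparing len(raw) with len(out) per message gives the
// compression impact.
func encode(src Source, raw []byte) ([]byte, error) {
	if src.Marshal == nil {
		return raw, nil
	}
	return src.Marshal(raw)
}
```

A JSON -> LZ4 or JSON -> BSON marshal would use the same signature, just with a third-party package doing the conversion.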
