-
Notifications
You must be signed in to change notification settings - Fork 22
feat(dgw): basic network monitoring #1446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Let maintainers know that an action is required on their side
|
ac51029
to
ec20f65
Compare
devolutions-gateway/src/lib.rs
Outdated
@@ -54,6 +57,7 @@ pub struct DgwState { | |||
pub recordings: recording::RecordingMessageSender, | |||
pub job_queue_handle: job_queue::JobQueueHandle, | |||
pub credential_store: credential::CredentialStoreHandle, | |||
pub monitoring_state: Arc<network_monitor::State> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I avoided my reflex to implement all functionality inside of an object, and instead have an opaque state object that is passed to my functions, since that seemed to be what was done here. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems ok to me. Should the network monitor be feature gated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you mean adding a compile-time feature flag, then I would suggest we don’t. If you suggest to hide the feature behind the unstable flag, then I agree we could do that so we can tweak the API across the minor releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did mean compile-time feature flag. I agree with what you are saying now.
crates/network-monitor/src/lib.rs
Outdated
let cancellation_token = CancellationToken::new(); | ||
let cancellation_monitor = cancellation_token.clone(); | ||
|
||
let definition_clone = (*definition).clone(); // TODO: is there a nicer way to do this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember right, I have to clone definition
here because its owner will not outlive the end of the function. But I suspect there might be a way to do map
so that definition
is already a clone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can clone before collecting.
pub async fn set_config(config: MonitorsConfig, state: Arc<State>) -> Result<(), SetConfigError> {
let file = fs::OpenOptions::new()
.create(true)
.write(true)
.truncate(true)
.open(get_monitor_config_path(&state.cache_path))?;
let mut config_write = state.config.write().await;
serde_json::to_writer_pretty(&file, &config)?;
let old_config = mem::replace(&mut *config_write, config);
// clone here
let new_config_set = config_write.monitors.clone().into_iter().collect::<HashSet<_>>();
let old_config_set = old_config.monitors.clone().into_iter().collect::<HashSet<_>>();
// `Iterator::cloned` here
let added = new_config_set.difference(&old_config_set).cloned();
let deleted = old_config_set.difference(&new_config_set);
let (new_cancellation_tokens, new_monitors): (Vec<_>, Vec<_>) = added
.into_iter()
.map(|definition| {
let cancellation_token = CancellationToken::new();
let cancellation_monitor = cancellation_token.clone();
// my personal preference is to use `Arc::clone`
// https://rust-lang.github.io/rust-clippy/stable/index.html#/clone_on_ref_ptr
let state = Arc::clone(&state);
// clone id before moving definition into the async block
let definition_id = definition.id.clone();
let monitor = async move {
loop {
let start_time = Utc::now();
let monitor_result = match &definition.probe {
ProbeType::Ping => do_ping_monitor(&definition).await,
ProbeType::TcpOpen => do_tcpopen_monitor(&definition).await,
ProbeType::Unknown(_) => return, // TODO: shouldn't happen, they should be filtered out. Create a separate ProbeType enum without Unknown?
};
state.log.write(monitor_result);
let elapsed = Utc::now() - start_time;
let next_run_in = (definition.interval as f64 - elapsed.as_seconds_f64()).clamp(1.0, f64::INFINITY);
select! {
_ = cancellation_monitor.cancelled() => { () }
_ = tokio::time::sleep(Duration::from_secs_f64(next_run_in)) => { () }
}
}
};
(
(definition_id, cancellation_token),
Box::pin(monitor) as Pin<Box<dyn Future<Output = ()> + Send>>,
)
})
.unzip();
let mut cancellation_tokens_write = state.cancellation_tokens.lock().await;
for definition in deleted {
cancellation_tokens_write[&definition.id].cancel();
cancellation_tokens_write.remove(&definition.id);
}
for (monitor_id, cancellation_token) in new_cancellation_tokens {
cancellation_tokens_write.insert(monitor_id, cancellation_token);
}
for monitoring_task in new_monitors {
tokio::spawn(monitoring_task);
}
Ok(())
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, what is the magic that would make this work? I've called cloned()
with a d, which I hadn't noticed existed probably due to how close it looks like just plain clone
, but once I access definition
in the async move
closure, then I get a plethora of borrow errors. Is there a change in your code that I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function in my last comment should work as is.
router.with_state(state) | ||
} | ||
|
||
#[debug_handler] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: This macro is only for debugging purposes, please remove it and the macros
feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guides for axum recommended always adding this macro to get proper compiler diagnostic messages. It seems convenient to keep those messages on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember that it was stated to be only intended for debugging and should not be used in production to avoid the compile-time overhead, and losing other optimizations opportunities.
It is nice to get better diagnostics, but it’s also a typical write once thing, and we’ll be compiling again and again, so I still stand for removing it once you get the correct signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see in the doc that now it does not affect the runtime performance when built with --release, so my info may be outdated, but I would still prefer we don’t make the builds heavier
devolutions-gateway/Cargo.toml
Outdated
@@ -115,6 +116,7 @@ etherparse = "0.15" | |||
|
|||
# For KDC proxy | |||
portpicker = "0.1" | |||
network-monitor = { version = "0.1.0", path = "../crates/network-monitor" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Move to the "In-house" section (this is unrelated to the KDC proxy)
devolutions-gateway/Cargo.toml
Outdated
@@ -86,7 +87,7 @@ ngrok = "0.13" | |||
# HTTP | |||
hyper = "1.6" | |||
hyper-util = { version = "0.1", features = ["tokio", "server", "server-auto"] } | |||
axum = { version = "0.8", default-features = false, features = ["http1", "json", "ws", "query", "tracing", "tower-log"] } | |||
axum = { version = "0.8", default-features = false, features = ["http1", "json", "ws", "query", "tracing", "tower-log", "macros"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: This feature is only for development purposes, let’s not include it in prod.
axum = { version = "0.8", default-features = false, features = ["http1", "json", "ws", "query", "tracing", "tower-log", "macros"] } | |
axum = { version = "0.8", default-features = false, features = ["http1", "json", "ws", "query", "tracing", "tower-log"] } |
crates/network-monitor/src/lib.rs
Outdated
let monitor_result = match definition_clone.probe { | ||
ProbeType::Ping => do_ping_monitor(&definition_clone).await, | ||
ProbeType::TcpOpen => do_tcpopen_monitor(&definition_clone).await, | ||
ProbeType::Unknown(_) => return, // TODO: shouldn't happen, they should be filtered out. Create a separate ProbeType enum without Unknown? | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When parsing the config, I have an Unknown
case in the enum for the sake of forward-compatibility. I'd like to filter these out at the start of this function, in order to avoid dealing with the Unknown case here. Can I do that without having to manually write a duplicate of the ProbeType
enum? Can I start with an unknown with no unknown case, and then derive a version that has it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe remove the Unknown
variant?
If it shouldn't happen, you can handle the error when deserializing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will happen as we introduce new monitor types. In the long-term I'd like gateway to return some json-structured warning the client that some if its monitor types weren't supported, so I do want them to be represented in the array of monitors. But after filtering them out I feel like the unknown case should be gone so that I don't have to handle it in my match
. Maybe I can compose my ProbeType enum somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think you could have two different enums for that…!
#[derive(Deserialize)]
#[serde(untagged)]
enum ProbeType {
Known(KnownProbeType),
Unknown(String),
}
#[derive(Deserialize)]
#[serde(rename_all = "snake_case")]
enum KnownProbeType {
Ping,
TcpOpen,
}
This cleanly separates the supported values (KnownProbeType
) from fallbacks (Unknown(String)
), letting serde
match any unrecognized string to the Unknown
variant.
I’ve used a similar pattern in the token module:
devolutions-gateway/devolutions-gateway/src/token.rs
Lines 141 to 146 in 25213d1
#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq)] | |
#[serde(untagged)] | |
pub enum ApplicationProtocol { | |
Known(Protocol), | |
Unknown(SmolStr), | |
} |
But, for maximum flexibility, I think you should defer the parsing:
- Deserialize into a more permissive intermediate type (e.g.
String
orserde_json::Value
) - Convert to a strongly typed value using serde_json::from_value (or a manual match)
- Gracefully handle invalid or unknown values (log them at debug! level and proceed)
This pattern helps keep your data models lenient at the edges while maintaining strong typing internally, which seems to be exactly what you need.
In fact, I’ve preferred this approach in similar contexts, I think it often results in cleaner separation between raw input and trusted data.
You might find some inspiration in how the preflight API module handles this:
https://github.com/Devolutions/devolutions-gateway/blob/master/devolutions-gateway/src/api/preflight.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will happen as we introduce new monitor types. In the long-term I'd like gateway to return some json-structured warning the client that some if its monitor types weren't supported, so I do want them to be represented in the array of monitors.
By the way, the preflight API does that too.
Please have a look at the cookbook as well: https://github.com/Devolutions/devolutions-gateway/blob/master/docs/COOKBOOK.md#preflight-api
(I would appreciate if you could add a section for the monitor API there as well!)
devolutions-gateway/src/lib.rs
Outdated
let temp_dir = TempDir::new("dgw-network-monitor-test").expect("Could not create temp dir"); | ||
let temp_path: Utf8PathBuf = Utf8Path::from_path(temp_dir.path()) | ||
.expect("TempDir gave us a garbage path") | ||
.to_path_buf(); | ||
|
||
let monitoring_state = Arc::new(network_monitor::State::mock(temp_path)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: I would prefer we do not include an extra dependency (tempdir) just for mocking (testing). I’ll come back on this later. (EDIT: it would be fine as a "dev-dependency", but it’s not one right now)
) -> Result<Json<MonitoringLogResponse>, http::StatusCode> { | ||
Ok(Json( | ||
MonitoringLogResponse { | ||
entries: network_monitor::drain_log(monitoring_state).into() | ||
} | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Not faillible, so you don’t need to return a Result
SetConfigError::Io(_) => http::StatusCode::INTERNAL_SERVER_ERROR, | ||
SetConfigError::Serde(_) => http::StatusCode::INTERNAL_SERVER_ERROR | ||
} | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Use crate::http::HttpError
instead.
devolutions-gateway/devolutions-gateway/src/api/jrl.rs
Lines 37 to 41 in 25213d1
) -> Result<(), HttpError> { | |
let conf = conf_handle.get_conf(); | |
let jrl_json = serde_json::to_string_pretty(&claims) | |
.map_err(HttpError::internal().with_msg("failed to serialize JRL").err())?; |
This will solve your TODO as well, the returned error is already logged automatically in the IntoResponse
of HttpError
} | ||
|
||
#[debug_handler] | ||
async fn handle_set_monitoring_config( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Please, add the openapi/utoipa documentation. You can check the crate::api::session
module or another one, for reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's very cool! Some of my structs are in my network-monitor crate, so they need to have the I've just read some of your other comments which render this issue mootToSchema
trait. How should I proceed, should my crate also depend on utoipa? Should that dependency be gated behind a feature? If yes then how is the flag passed from the main application to the crates?
} | ||
|
||
#[derive(Debug, Clone, Serialize)] | ||
pub struct MonitoringLogResponse { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: Should not be part of the public API.
pub struct MonitoringLogResponse { | |
pub(crate) struct MonitoringLogResponse { |
(Or private, but you may need that in the openapi declaration.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job! I just did a very high level review, I’ll look at the details of your crate later 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR! I left some notes.
I'd also suggest running cargo clippy -p network-monitor
and fixing the lints.
} | ||
} | ||
|
||
pub fn mock(cache_path: Utf8PathBuf) -> State { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add #[cfg(test)]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mock is created by the mock function in DgwState
, which is not gated in this way. I'm not sure why though, maybe it should be? @CBenoit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one in Devolutions Gateway is #[doc(hidden)] to express a similar intention, but if we gate with #[cfg(test)], then the function is not available for the integration tests…
It’s not ideal though, I don’t really like that. But for the purpose of getting stuff done, I would say that it’s not a big deal, and we can surely do something about that later
MonitorsConfig { monitors: Vec::new() } | ||
} | ||
|
||
fn mock() -> MonitorsConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add #[cfg(test)]
.
crates/network-monitor/src/lib.rs
Outdated
let cancellation_token = CancellationToken::new(); | ||
let cancellation_monitor = cancellation_token.clone(); | ||
|
||
let definition_clone = (*definition).clone(); // TODO: is there a nicer way to do this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can clone before collecting.
pub async fn set_config(config: MonitorsConfig, state: Arc<State>) -> Result<(), SetConfigError> {
let file = fs::OpenOptions::new()
.create(true)
.write(true)
.truncate(true)
.open(get_monitor_config_path(&state.cache_path))?;
let mut config_write = state.config.write().await;
serde_json::to_writer_pretty(&file, &config)?;
let old_config = mem::replace(&mut *config_write, config);
// clone here
let new_config_set = config_write.monitors.clone().into_iter().collect::<HashSet<_>>();
let old_config_set = old_config.monitors.clone().into_iter().collect::<HashSet<_>>();
// `Iterator::cloned` here
let added = new_config_set.difference(&old_config_set).cloned();
let deleted = old_config_set.difference(&new_config_set);
let (new_cancellation_tokens, new_monitors): (Vec<_>, Vec<_>) = added
.into_iter()
.map(|definition| {
let cancellation_token = CancellationToken::new();
let cancellation_monitor = cancellation_token.clone();
// my personal preference is to use `Arc::clone`
// https://rust-lang.github.io/rust-clippy/stable/index.html#/clone_on_ref_ptr
let state = Arc::clone(&state);
// clone id before moving definition into the async block
let definition_id = definition.id.clone();
let monitor = async move {
loop {
let start_time = Utc::now();
let monitor_result = match &definition.probe {
ProbeType::Ping => do_ping_monitor(&definition).await,
ProbeType::TcpOpen => do_tcpopen_monitor(&definition).await,
ProbeType::Unknown(_) => return, // TODO: shouldn't happen, they should be filtered out. Create a separate ProbeType enum without Unknown?
};
state.log.write(monitor_result);
let elapsed = Utc::now() - start_time;
let next_run_in = (definition.interval as f64 - elapsed.as_seconds_f64()).clamp(1.0, f64::INFINITY);
select! {
_ = cancellation_monitor.cancelled() => { () }
_ = tokio::time::sleep(Duration::from_secs_f64(next_run_in)) => { () }
}
}
};
(
(definition_id, cancellation_token),
Box::pin(monitor) as Pin<Box<dyn Future<Output = ()> + Send>>,
)
})
.unzip();
let mut cancellation_tokens_write = state.cancellation_tokens.lock().await;
for definition in deleted {
cancellation_tokens_write[&definition.id].cancel();
cancellation_tokens_write.remove(&definition.id);
}
for (monitor_id, cancellation_token) in new_cancellation_tokens {
cancellation_tokens_write.insert(monitor_id, cancellation_token);
}
for monitoring_task in new_monitors {
tokio::spawn(monitoring_task);
}
Ok(())
}
crates/network-monitor/src/lib.rs
Outdated
let monitor_result = match definition_clone.probe { | ||
ProbeType::Ping => do_ping_monitor(&definition_clone).await, | ||
ProbeType::TcpOpen => do_tcpopen_monitor(&definition_clone).await, | ||
ProbeType::Unknown(_) => return, // TODO: shouldn't happen, they should be filtered out. Create a separate ProbeType enum without Unknown? | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe remove the Unknown
variant?
If it shouldn't happen, you can handle the error when deserializing.
devolutions-gateway/src/lib.rs
Outdated
@@ -54,6 +57,7 @@ pub struct DgwState { | |||
pub recordings: recording::RecordingMessageSender, | |||
pub job_queue_handle: job_queue::JobQueueHandle, | |||
pub credential_store: credential::CredentialStoreHandle, | |||
pub monitoring_state: Arc<network_monitor::State> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems ok to me. Should the network monitor be feature gated?
} | ||
|
||
#[cfg(test)] | ||
mod test { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically tests
crates/network-monitor/Cargo.toml
Outdated
version = "0.1.0" | ||
edition = "2021" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can copy the Cargo.toml of other crates.
Version 0.0.0
, publish = false
, edition = 2024
, and other fields.
crates/network-monitor/src/lib.rs
Outdated
let start_time = Utc::now(); | ||
|
||
let ping_result = async || -> anyhow::Result<TimeDelta> { | ||
let runtime = network_scanner_net::runtime::Socket2Runtime::new(None)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Start this runtime is a bit expensive, I think it's better to put it outside of a single request, but keep it alive when a monitoring is requested, and reuse it from place to place. It's an Arc, so you can safely pass it around as a parameter with Arc::clone
if necessary
}, | ||
} | ||
|
||
impl Into<Vec<u8>> for Icmpv6Message { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Prefer implementing From<Icmpv6Message> for Vec<u8>
when feasible (as in this case). This provides a more idiomatic interface and enables .into()
automatically via the blanket Into
implementation.
issue/style: However, using From
/Into
for binary encoding or parsing is considered unidiomatic. These traits are meant to express type conversions, not serialization or protocol-specific encodings. Encoding logic belongs in explicitly-named methods or dedicated traits (e.g., encode()
or to_bytes()
), which make the semantics clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: While .into()
is concise, I strongly discourage its use in non-generic, concrete code because it obscures intent. Prefer TargetType::from(source_value)
:
- The destination type is explicit at the call site.
- It immediately signals that a conversion is taking place and to which type.
- It removes the need for type inference and reduces coupling to surrounding context, aiding readability and future refactoring.
.into()
is better reserved for generic contexts or builder-style code where the target type is already constrained. Outside of those, the clarity benefit of from()
is worth the extra characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Highlighting this comment: https://github.com/Devolutions/devolutions-gateway/pull/1446/files#r2242524595
suggestion: Look in the config
module, there is a enable_unstable
option. For now, I suggest we gate this feature behind this option, so you can freely change the API as necessary until the 2025.3 release.
crates/network-monitor/src/lib.rs
Outdated
use camino::*; | ||
use chrono::{DateTime, TimeDelta, Utc}; | ||
use serde::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: I highly discourage using glob imports. When done without precise conventions it leads to code harder to review and explore. There are some exceptions, but we should not decide in an ad-hoc way.
crates/network-monitor/src/lib.rs
Outdated
#[derive(Eq, PartialEq, Hash, Clone, Serialize, Deserialize, Debug)] | ||
#[serde(rename_all = "camelCase")] | ||
pub enum ProbeType { | ||
Ping, | ||
TcpOpen, | ||
#[serde(untagged)] | ||
Unknown(String), | ||
} | ||
|
||
#[derive(Eq, PartialEq, Hash, Clone, Serialize, Deserialize, Debug)] | ||
pub struct MonitorDefinition { | ||
id: String, | ||
probe: ProbeType, | ||
address: String, | ||
interval: u64, | ||
timeout: u64, | ||
port: Option<i16>, | ||
} | ||
|
||
#[derive(PartialEq, Clone, Serialize, Deserialize, Debug)] | ||
pub struct MonitorResult { | ||
monitor_id: String, | ||
request_start_time: DateTime<Utc>, | ||
response_success: bool, | ||
response_messages: Option<String>, | ||
response_time: f64, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: So far, we’ve been separating serialization (models) from the actual business logic. In practice, this means that we don’t use serde
in the "library" crates, and we have separate types for the internal representations, and the "Gateway API boundary representation" ("DTO"). The idea is that the crate should be used like any other library, and should not take decision on how the data is represented between the systems (e.g.: DVLS <-> Gateway) at this level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good. My thought process was to keep everything tightly contained in my monitoring crate, but I see how keeping the DTOs in my crate goes against the current architecture. However I'm not sure how to proceed other than duplicating the structs and enums, would that be okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that’s okay! They may or may not diverge at some point in the future based on our needs; the benefit is that you can refactor the internal structs without affecting the system API by mistake, and you can modify the system API without altering the internal details. IIRC, the network scanner went through something like this.
} | ||
|
||
pub(crate) fn write(&self, data: T) { | ||
let mut entries = self.entries.lock().expect("Couldn't get LogQueue lock"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Rust errors should not be capitalized. Also, I suggest to stay concise for this specific failure point, explaining we got a poisoned mutex (highly unlikely, since we should not panic while holding the lock as far as I can see).
let mut entries = self.entries.lock().expect("Couldn't get LogQueue lock"); | |
let mut entries = self.entries.lock().expect("poisoned"); |
crates/network-monitor/src/lib.rs
Outdated
fn get_monitor_config_path(cache_path: &Utf8PathBuf) -> Utf8PathBuf { | ||
cache_path.join("monitors_cache.json") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: The current logic assumes that cache_path is a directory and appends a hardcoded filename internally. This creates ambiguity and limits flexibility, as the consumer cannot control the final file path or name. It also introduces hidden behavior that is not obvious to user of the API.
suggestion: This logic should be delegated to the consumer. The library should treat cache_path as the full path to the cache file, allowing consumers to specify both the location and the name.
rationale: Since the library only manages a single cache file, there's no strong reason to impose a fixed internal filename. This change would improve API clarity and make the library more adaptable to various use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree completely! My original code was reading the config dir path from the existing env variable, but I didn't rethink the implementation when I switched to passing the path through the state struct.
crates/network-monitor/Cargo.toml
Outdated
tokio = { version = "1.45", features= ["full"] } | ||
tokio-util = { version = "0.7", features = ["full"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: Do not use "full"
, especially in libraries, this brings in new dependencies and code we don’t need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did notice that "full"
is avoided in the codebase, but I wanted to ask: it feels tedious to toggle features with such granularity, why can't the compiler figure out what we're not using? I'm genuinely asking, probably I'm about to find out that the compiler is not that smart and that by doing this I'm going to murder our compile time due to monomorphisation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in the case of tokio, maybe we can enable "full"? Across our crates we currently enable:
fs, io-std, io-util, macros, net, parking_lot, process, rt, rt-multi-thread, signal, sync, time
I think that's every feature enabled by "full"!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consumers won't be including every crate in the workspace though.
I think you only need macros
for network-monitor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s a fair point for tokio, but as a good practice I still encourage you to specify only what is needed.
Allan point also holds true: all the features are enabled for Jetsocat I think, but not necessarily for all the other binaries.
Also, for tokio-util you are adding extra features which are bringing extra crates into the dependency graph.
Of course, the compiler, and more specifically the linker is able to remove most of the dead code.
Typically, it’s not really a problem about what is in the final artifacts we deliver.
However:
- I’m pretty sure the code still gets compiled first, before its removed by the linker
- For each build from scratch, we need to download more and more crates
- From scratch builds are done way more often than you may think (especially in the CI)
- Our from scratch build time is indeed already not ideal today, and adding a few seconds to the mess may seem like not much, but that’s part of the reason why the builds are getting significant longer on the long term
- I still have hopes to improve that someday, so this is saving time for future me
- As an aside, we also track ALL the dependencies for each releases of the Gateway (SBOM), because we care about security issues found in the dependencies, and even though the situation can already be considered not ideal (many, many dependencies in there), I still prefer minimizing as much as possible the risk of being alerted about a vulnerability or a supply chain attack in the one crate we didn’t actually need
I do think that the feature "full" is very useful though, especially when you’re prototyping or exploring and you don’t know exactly the feature set you need yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something seems wrong here. Let’s update the OpenAPI stuff in a separate PR.
@@ -42,5 +41,9 @@ pub fn make_router<S>(state: crate::DgwState) -> axum::Router<S> { | |||
); | |||
} | |||
|
|||
if state.conf_handle.get_conf().enable_network_monitoring { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Instead of adding a new configuration key, please use get_conf().debug.enable_unstable
. This is the generic flag we use for unstable stuff. For stabilizing, we just drop the related if blocks.
@@ -42,5 +41,9 @@ pub fn make_router<S>(state: crate::DgwState) -> axum::Router<S> { | |||
); | |||
} | |||
|
|||
if state.conf_handle.get_conf().enable_network_monitoring { | |||
router = router.nest("/jet/monitoring", monitoring::make_router(state.clone())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming: For the API naming, what about /jet/monitor
? This seems more aligned with the rest of the routes.
question: Would it make sense to put that under the /jet/net
endpoint? E.g.: /jet/net/monitor
? Or is it also intended to do something else outside of network monitoring?
devolutions-gateway/src/openapi.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: I’m okay with adjusting the small details with OpenAPI in a follow up PR. It’s fine to just brush up the documentation. It’s something hard to get right on the first try.
note: If you have a Jira ticket, add a |
babby's first professional rust commit
Place network monitoring behind a config flag, add swagger docs, return partial errors, more idomatic from/into usages, separte DTO from internal datastructures, clean up Cargo.toml changes,
…g from chrono to time
a0829fd
to
9782e2b
Compare
Introduces an endpoint monitor that checks the uptime of a list of hosts provided through a remotely-submitted configuration file. The monitor results are saved to a temporary in-memory buffer and can be fetched by means of a REST endpoint.
The monitor system is structured as an agent, meant to be driven by a third party (for example our DVLS), so the configuration is ephemeral and monitor results are deleted from the buffer after being fetched. Gateway is not itself the source of truth for the monitor configuration, and it does not persist the log entries.
Two authenticated endpoints are introduced:
Example requests for testing:
curl http://127.0.0.1:7171/jet/monitoring/log/drain -X POST -H "Authorization: Bearer $dgwkey"
Remaining tasks for a basic scanner:
Issue: DGW-302