This example demonstrates distributed tracing and metrics collection for a HTTP client and server using the OpenCensus Java libraries with the Stackdriver exporter running on Google Cloud Platform (GCP). The HTTP integration is based on the OpenCensus Jetty integration guide. There are two parts to the application: server and client. They will each be run on separate Compute Engine virtual machines. The client will send a continuous stream of HTTP requests to the server. The code here is the basis for the solution Identifying causes of app latency with Stackdriver and OpenCensus.
A common cause of high latency is large payload. With each HTTP request the application will send either a small or a large payload with a 5% probability of sending a large payload. The client sends both GET and POST requests with the POST requests sending the same data back to the client.
The application also performs some downstream processing of the payload to simulate 'business logic' in a typical application. These application characteristics simulate the distribution of latency in a real application and demonstrates tools for discovering causes of tail latency. To understand if something specific to the application code path is responsible for latency you may want to combine both trace and log data. The OpenCensus Stackdriver Log Correlation is used by the application for this purpose.
Retrying of HTTP requests is a common strategy for overcoming occasional request failures, such as from server restarts. Shorter timeouts can avoid the user having to wait for a long time on a stale connection but setting the timeout too short may not give sufficient time to process occasionally larger payloads. The application allows setting the HTTP timeout to experiment with the tradeoff and how the results are manifested in trace and monitoring instrumentation.
The trace and monitoring data may be viewed to analyze latency characteristics. See if you can correlate high latency with large payload. The example also allows for experimentation on optimizing efficiency by varying request load and measuring the effect on latency.
The steps described here can be run on a Linux or Mac OS command line or the GCP Cloud Shell.
- Install Java 8 or later OpenJDK and Maven.
- Select or create a GCP project. Go to the Project Selector page
- Make sure that billing is enabled for your Google Cloud Platform project. Learn how to enable billing.
- Download and install the Google Cloud SDK.
Clone this repository to your environment with the command
git clone https://github.com/GoogleCloudPlatform/opencensus-jetty-demo.gitEdit the environment variables in the file setup.env and import them into your development environment. Make sure that you change GOOGLE_CLOUD_PROJECT to the project id of your project.
cd opencensus-jetty-demo
source setup.envSet the GCP project to be the default
gcloud config set project $GOOGLE_CLOUD_PROJECTEnable the Stackdriver, Storage, logging, and BigQuery APIs:
gcloud services enable stackdriver.googleapis.com \
cloudtrace.googleapis.com \
compute.googleapis.com \
storage-api.googleapis.com \
logging.googleapis.com \
bigquery-json.googleapis.comThis example runs on Compute Engine using the credentials of the default Compute Engine service account. This service account has sufficient permissions to write the log, monitoring, and trace data.
Install the Java 8 OpenJDK and Maven:
sudo apt-get install maven openjdk-8-jdk -yThe code makes use of Google Cloud Storage (GCS). Create a bucket to store some test files
gcloud storage buckets create gs://$BUCKETGenerate the example JSON files for the test application.
cd util
python make_json.pyThis generates two JSON files that the test application will use to send from client to server. Sending the large file from client to server should show higher latency to to the payload size. Upload the files to the bucket just created.
gcloud storage cp small_file.json large_file.json gs://$BUCKET/
cd ..The server code and build file is contained in the server directory.
cd serverBuild the war file for to run the server.
mvn clean packageThe trace-log correlation on the server is done by a logging enhancer for the
Cloud Logging
Logback LoggingAppender,
which is configured via the file src/main/resources/logback.xml. That file is
bundled into the war archive in the Maven package command above. The logging
enhancer discovers the trace ID and adds it to the logs, which enables viewing
of log statements in the trace timelines. The project ID is also needed by
enhancer. If you run the example code in a location other than a GCE virtual
machine you may need to modify the logback.xml file as per the
OpenCensus Stackdriver Log Correlation instructions to include the project ID.
Create a virtual machine instance to run the Jetty server:
gcloud compute instances create $SERVER_INSTANCE \
--zone=$ZONE \
--boot-disk-size=200GBCopy the war file to the instance with SCP.
gcloud compute scp --zone=$ZONE target/jetty-server-tutorial-0.0.1.war \
$SERVER_INSTANCE:root.warSSH to the instance to configure the server.
gcloud compute ssh --zone=$ZONE $SERVER_INSTANCEInstall the Java 8 OpenJDK
sudo apt-get install openjdk-8-jdk -yInstall and configure Jetty. First download Jetty from Eclipse Jetty Downloads.
wget https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-distribution/9.4.19.v20190610/jetty-distribution-9.4.19.v20190610.tar.gz
tar -zxf jetty-distribution-9.4.19.v20190610.tar.gzSet JETTY_HOME:
export JETTY_HOME=$HOME/jetty-distribution-9.4.19.v20190610Create a Jetty base with a webapps directory
export JETTY_BASE=$HOME/oc_server_demo
mkdir -p $JETTY_BASE/webapps
cp root.war $JETTY_BASE/webapps/
cd $JETTY_BASEConfigure the Jetty base
java -jar $JETTY_HOME/start.jar --create-startd
java -jar $JETTY_HOME/start.jar --add-to-start=http,deploy
java -jar $JETTY_HOME/start.jar --add-to-start=logging-slf4j
java -jar $JETTY_HOME/start.jar --add-to-start=slf4j-simple-implThe final two of the commands above enable SF4J logging by Jetty.
Run the Jetty server with the web app
nohup java -jar $JETTY_HOME/start.jar &Send a request to check that it is working
curl http://localhost:8080/You should see the message 'OK' printed on a single line. If you see an error check the Jetty output in nohup.out:
tail -f nohup.outIf the server started successfully you should see a message including Server:main: Started.
Send a request that generates trace information
curl http://localhost:8080/testThis should return a JSON response with a series of numbers. Check that you can see the logs written to Google Cloud Logging in the Cloud Console Log Viewer. You should see a log entry that includes the text 'doGet'.
Navigate to the Trace timeline in the Cloud Console. You should see a few points in the scatter plot. Click on the points and find the one for the request to the /test endpoint sent using Curl above. It should be called Recv./test. Click the Show logs button. Notice that there is a log entry with the text 'doGet'.
Exit the jetty_server virtual machine, returning to the Cloud Shell.
exitFor the purposes of this tutorial, the code will be built on the client machine.
From the shell of your development environment, create a virtual machine instance to run the Jetty client:
gcloud compute instances create $CLIENT_INSTANCE \
--zone=$ZONE \
--boot-disk-size=200GBCopy the repository directory tree to the instance.
cd ../..
tar -zcf opencensus-jetty-demo.tar.gz opencensus-jetty-demo/
gcloud compute scp --zone=$ZONE opencensus-jetty-demo.tar.gz $CLIENT_INSTANCE:.SSH to the instance to set up the client.
gcloud compute ssh --zone=$ZONE $CLIENT_INSTANCEInstall the Java 8 OpenJDK and Maven, as above.
sudo apt-get install maven openjdk-8-jdk -yUnzip the bundle
tar -zxf opencensus-jetty-demo.tar.gz
cd opencensus-jetty-demoThe client code and build file is contained in the client directory.
source setup.env
cd client
mvn clean package appassembler:assembleRun the client application with the command
nohup target/appassembler/bin/JettyTestClient $SERVER_INSTANCE 8080 $BUCKET \
$NUM_THREADS $HTTP_TIMEOUT &This will send a continuous stream of HTTP requests to the server with name $SERVER_INSTANCE on port 8080. The payloads will be read from $BUCKET using $NUM_THREADS simultaneous threads with a timeout of $HTTP_TIMEOUT.
Monitor the nohup.out, checking for errors to the standard out
tail -f nohup.outLog messages will be sent to Google Cloud Logging. You can view these in the Log Viewer under GCE VM instances.
View trace data in the Trace list page. The application will send GET and POST requests alternately. Try clicking on a few trace points in the Trace list scatter plot. Notice that the HTTP method type is displayed in the trace detail.
Payload size can influence latency. 95% of requests will send a small payload and 5% will send a large payload. Clicking on the Show Events button on the Trace timeline will enable viewing of the payload size. Notice that the traces with higher latency tend to be the ones with larger payload.
In the trace timeline click on the Show logs button to display the logs correlated with a trace.
Try clicking on some of the high latency traces to see if any of the requests are retried. To see how retries are displayed you may try setting the $HTTP_TIMEOUT command line option to a lower value and restarting the client application to see how retries are demonstrated in the trace results.
To see metrics data open the Monitoring menu and select Add your project to a Workspace with a new workspace. Under the Resources | Metrics menu enter the string 'opencenus' in the Find resource type and metric textfield. You will see a list of metrics displayed as you type-ahead. Select any of the metrics to see a chart of the metric values. Specifically, look at the metrics opencensus/opencensus.io/octail/latency and opencensus/opencensus.io/http/client/completed_count.
You can save the chart as part of a new dashboard.
Load on both the client and server is another factor that can affect latency. Shrinking the size of the virtual machines is a great way to optimize efficiency if they can still operate with satisfactory latency. You can check the CPU load by creating a chart in the Stackdriver moniotring user interface using the metric instance/cpu/utilization. Create charts for both client and server. Increase the number of threads using the $NUM_THREADS parameter to run the application.
When increasing the number of threads you increase the request load on both client and server. To test the effect of client CPU on measured latency, create another client with a smaller VM, you can repeat the test with a smaller virtual machine instance.
From the shell of your development environment:
gcloud compute instances create $SMALL_CLIENT \
--machine-type=g1-small \
--zone=$ZONE \
--boot-disk-size=200GBThen follow the same instructions for the client above.
Delete the project.
- In the GCP Console, go to the Projects page. Go to the Project page.
- In the project list, select the project you want to delete and click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.