Dummy file for writing the review in the PR. #1

Open. Wants to merge 1 commit into master.
333 changes: 333 additions & 0 deletions runner-review.sh
@@ -0,0 +1,333 @@
#!/bin/bash

Author:
I would add -e here.
This flag will make your script stop on errors, which will unmask some issues (for example, when a command such as tcpdump is not available).
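i.e., something like this at the top of the script (just a sketch, either form works):
    #!/bin/bash -e          ## option 1: pass -e on the shebang line
    set -e                  ## option 2: enable it explicitly right after the shebang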

Owner (@mohamad7788, Jan 4, 2018):
Correct, this will solve a lot of issues (like command-not-found, etc.), but the script would then also stop when the input command execution fails, which is not good - the input command's execution status should be handled by the main script.
So the suggested options here are:

  1. Disable -e while the command executes, and re-enable it once it is done, like below:
     set +e
     ${command}
     set -e
  2. Run the command with a fallback via ||, so the whole line returns 0 and the script does not exit:
     ${command} || true ;


#****************************************************************************************************************\
# *
# Script Name : runner.sh *
# *
# Purpose : Get an input command and optional flags , and print a summary of the command executions *
# that include return-codes summary *
# *
# Inputs : Command for executions *
# Number of times to run the command *
# *
# Optional flags : Number of allowed failed command invocation *
# creating network traffic capture - for failed execution *
# creating system/resources metrics measured during command execution - for failed execution *
# creating log for all system calls ran by the command - for failed execution *
# creating command output logs (stdout , stderr) - for failed execution *
# Running with debug mode that show each instruction executed by the script *
# *
# *
# *
# Written by : Mohamad Abo-Ras ([email protected]) *
# *
# Date : Jan01,2018 *
# *
# Changes History: *
# *
# Date | By | Changes/New features *
# ----------+------------+----------------------------------------------- * *
# 01.01.2018 | Mohamad | script created for 1st version . * *
# 8 | Abo-Ras | * *
# ----------+------------+----------------------------------------------- * *
#****************************************************************************************************************/




## define global variables
export DEBUG="set +x" ## disabled debug
export execution_count=0 ## total executions counter
export DEFAULT_EXIT_CODE=255 ## default exit-code

Author:
It's a good practice to avoid global variables whenever possible.
For example, DEFAULT_EXIT_CODE is being used only in the finalize method, so it can be a local variable.

Also, the export command makes the variable an environment variable (which is then accessible by child processes), and I don't think it's needed in our case.
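For example (just a sketch): a plain assignment is enough for values that only this script reads.
    ## no export needed - nothing outside this script uses these
    DEFAULT_EXIT_CODE=255
    EXEC_NAME=$(basename "$0")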

export EXEC_NAME=$(basename $0) ## script name
export OUTPUT_DIR="`pwd`" ## path for output & generated files
export TIME_SIGNUTURE="date +"%Y%m%d_%H%M%S"" ## current date&time

Author:
I think that putting this logic in a function makes more sense.
It's a bit misleading because you have to remember to use $(...) to evaluate the time signature.
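Something along these lines (a sketch; time_signature is just an illustrative name):
    ## helper that evaluates the timestamp at call time
    time_signature() {
        date +"%Y%m%d_%H%M%S"
    }
    ## usage: NET_PCAP="${OUTPUT_DIR}/net_trace_$(time_signature).pcap"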

Owner (@mohamad7788, Jan 4, 2018):
This is a possible idea for improvement and code tuning.
All variables except TIME_SIGNATURE could be initialized by a function, but TIME_SIGNATURE is kept in memory as the date command with its time formatting (date+time, like 20180104_180400), and it is executed more than once, returning a unique value each time. So it is more efficient for the script to take the value from memory, compared to calling a separate function many times, which would take more time than a memory access.



## define local functions/methods

## print the summary report
# And exit the program with the most frequent return-code .
# in case no returned-codes (such case if the script killed while the first invocation) , then it will exit with default-return-code (predefined as 255)
finalize() {

frequent_rc=${DEFAULT_EXIT_CODE}

Author:
Use local to make the variable valid only in the function scope; otherwise, it will be declared globally.
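e.g. simply:
    local frequent_rc=${DEFAULT_EXIT_CODE}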

if [[ ! -z ${list_rc[@]} ]] ; then

printf "|------Return Codes Summary-----|\n"
printf "| Return-Code | Occurrence |\n"
printf "|-------------------------------|\n"
echo ${list_rc[@]} | tr ' ' '\n' | sort | uniq -c | sort -rnk 1 | awk '{ printf "|\t%2d\t|\t%2d\t|\n", $2, $1 }'

Author:
You have 5 subprocess calls here, which is a lot of overhead for finding the most frequent value in an array.
I recommend using a simple for loop for this task.
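Roughly like this inside finalize (an untested sketch, assuming bash 4+ for associative arrays, and ignoring the descending sort of the table); it also yields frequent_rc in the same pass, so the second pipeline below goes away:
    ## count occurrences and track the most frequent return code in one pass
    declare -A rc_count
    local frequent_rc=${DEFAULT_EXIT_CODE} max_seen=0
    for rc in "${list_rc[@]}" ; do
        (( rc_count[$rc]++ ))
        if (( rc_count[$rc] > max_seen )) ; then
            max_seen=${rc_count[$rc]}
            frequent_rc=$rc
        fi
    done
    for rc in "${!rc_count[@]}" ; do
        printf "|\t%2d\t|\t%2d\t|\n" "$rc" "${rc_count[$rc]}"
    done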

printf "|-------------------------------|\n"

frequent_rc=$(echo ${list_rc[@]} |tr ' ' '\n' | sort | uniq -c | sort -rnk 1 | head -1 | awk '{print $2}')

Author:
Most of this line is a code duplication of line 60.

fi

exit ${frequent_rc}

}

## signal handler
# catch system signals (SIGINT , SIGTERM)
# handler : call finalize
# note : trap will not catch SIGKILL signal as OS doesn't allow it
trap finalize SIGINT SIGTERM



## print script usage
usage() {

cat <<EOF
Usage: ${EXEC_NAME} [OPTION].... [command [arg ...]]

-c COUNT Number of times to run the given command
--failed-count N Number of allowed failed command invocation attempts before giving up
--net-trace For each failed execution, create a 'pcap' file with the network traffic during the execution.
--sys-trace For each failed execution, create a log for each of the following values, measured during command execution
(disk io,+memory,processes/threads,cpu usage of the command network card package counters)
--call-trace For each failed execution, add also a log with all the system calls ran by the command
--log-trace For each failed execution, add also the command output logs (stdout, stderr)
--debug Debug mode, show each instruction executed by the script
EOF

exit ${DEFAULT_EXIT_CODE}
}

# generate network traffic capture by calling to tcpdump
# default : collect 4000 packets for all the network interfaces formatted with default timestamp with extra verbose output
# note : those values is configurable and could be any value depend on the local machine/server and which details needed to capture from network traffic
exec_net_trace () {

export NET_PCAP="${OUTPUT_DIR}/net_trace_`${TIME_SIGNUTURE}`.pcap"
[[ ${NET_TRACE} ]] && /usr/sbin/tcpdump -i any -vvv -nnN -tttt -s0 -c 4000 -w ${NET_PCAP}

Author:
Can you please explain what each flag is doing and why it's needed?

Owner (@mohamad7788, Jan 6, 2018):
-i any = listen on all interfaces and capture packets from each of them.
Needs: we don't have any assumption about the current network interfaces.
-vvv = print more verbose output, e.g. ttl, identification, total length and options of an IP packet, and more fields from NFS reply packets, etc.
Needs: to collect more fields & data info.
-nnN = protocols, port numbers and the domain-name qualification of host names will not be converted to names either.
Needs: keep the sources/destinations/protocols without any name conversion.
-tttt = print a timestamp in default format preceded by the date on each dump line.
Needs: record the packet time & date.
-s0 = use the length required to capture whole packets (the default of 68 bytes is adequate for IP, ICMP, TCP and UDP but may truncate protocol information from name server and NFS packets).
-c 4000 = capture only 4000 packets.
In this case it is set to 4000 packets; this limit could be dropped if the executed command runs for a long time, by keeping the tcpdump child process running in the background until the command execution finishes and then killing the tcpdump process.
-w ${NET_PCAP} = capture and save the output in .pcap format.


}



# generate system/resources metrics measured by using OPVA-HP extract utility : disk io , memory , processes/threads and cpu usage of the command,network card package counters
# time frame to generate from start to end date with format : %m/%d/%Y %H:%M:%S" , e.g. 01/01/2018 12:15:00
# all the extracted data saved to common file "SYS_TRACE_OUT"
# + alternative utility is to use named sar "/usr/bin/sar" - Collect, report, or save system activity information
# time-frame(start to end) time format is "HH:MM:SS"
# example : sar -n ALL -P ALL -r -u -s ${START_TIME} -e ${END_TIME}
exec_sys_trace() {

export SYS_TRACE_OUT="${OUTPUT_DIR}/sys_trace_`${TIME_SIGNUTURE}`.out"
[[ ${SYS_TRACE} ]] && /opt/perf/bin/extract -GDNUY -xp -b \"${START_DATE}\" -e \"${END_DATE}\" -f ${SYS_TRACE_OUT}

Author:
I have sar installed, but the report is not being generated.

Owner (@mohamad7788, Jan 6, 2018):
sar collects all performance data on an ongoing basis, similar to extract, but with different flags and a different time format (it should be HH:MM:SS).
example: sar -n ALL -P ALL -r -u -s 12:00:00 -e 12:30:00


}







#************************************************************************************************************
# *

Author:
You could have just added all the code below to a function called main.
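Roughly (a sketch; parse_options and run_loop are hypothetical names for the argument-parsing and execution-loop blocks below):
    main() {
        parse_options "$@"   ## the getopt handling below
        run_loop             ## the command executions loop
        finalize
    }
    main "$@"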

# m a i n p r o g r a m f l o w *
# *
#************************************************************************************************************






# Test getopt : run getopt for test , if fail then exit
getopt --test > /dev/null
if [[ $? -ne 4 ]]; then
echo "getopt Test : `getopt --test` failed in this environment."
exit ${DEFAULT_EXIT_CODE}
fi




# setting the valid options : short&long options
# -c => mandatory flag & input argument required
# --failed-count => optional flag & input argument required
# --net-trace => optional flag & input argument not required
# --sys-trace => .... same as previous
# --call-trace => .... same as previous
# --log-trace => .... same as previous
# --debug => .... same as previous
SHORT_OPTIONS=c:
LONG_OPTIONS=failed-count:,net-trace,sys-trace,call-trace,log-trace,debug

OPTS=$(getopt --options=${SHORT_OPTIONS} --longoptions=${LONG_OPTIONS} --name "$0" -- "$@")
if [[ $? -ne 0 ]]; then
# e.g. $? == 1
# then getopt has complained about wrong arguments to stdout
exit 2
fi


# read getopt's output this way to handle the quoting right:
eval set -- "${OPTS}"



# parse each options from argument list until getting separate --
while true; do
case "$1" in

-c)
# Getting & store the execution count/times
export COUNT="$2"
shift 2
;;

--failed-count)
# Getting & store the failed execution count
export FAIL_COUNT="$2"
shift 2
;;

--debug)
# Setting the debug options
export DEBUG="set -xvf"

Author:
Why are the vf flags needed?

Owner (@mohamad7788, Jan 6, 2018):
Those flags were set in order to get more output for each executed line, so the line (from the code) could be compared with the output line (with debug); they were left over from testing the script, so I think the -x flag alone is enough.

-v
Print shell input lines as they are read.
-f
Disable filename expansion (globbing).

shift
;;

--net-trace)
# trigger tcpdump to generate network traffic : we used - /usr/sbin/tcpdump
export NET_TRACE="Y"
shift
;;
--sys-trace)
# triggering OPVA Extract utility to generate system resources measured matrices : we used /opt/perf/bin/extract
export SYS_TRACE="Y"
shift
;;

--call-trace)
# trigger the strace to generate call trace - we used /usr/bin/strace
export CALL_TRACE="Y"
shift
;;

--log-trace)
# trigger the log trace by printing the stdout & stderr to separate log files
export LOG_TRACE="Y"
export LOG_OUT="${OUTPUT_DIR}/log_trace_`${TIME_SIGNUTURE}`.out"
export LOG_ERR="${OUTPUT_DIR}/log_trace_`${TIME_SIGNUTURE}`.err"
shift
;;


--)
# once we arrived to -- this end of arguments parsing
shift
break
;;
*)
# when the input from OPTS no matching any of above , then it means the input not valid/or something wrong happened
usage
;;
esac
done


# enable/disable the debug option depend on the input options (if --debug was passed)
${DEBUG}

Author:
Too many blank lines, please keep the code compact.




# handle non-option arguments , check if the input argument(not for the OPTS) mount is only 1 argument(one argument for the command parameter)
if [[ $# -ne 1 ]]; then
usage
fi


# getting the command to run from command-line argument
COMMAND="$1"

# declare array to store all the returned value for each command execution
set -a list_rc







# command executions loop - main invocations steps
# Test and check the executions time and allowed fails counters
while [[ ${execution_count} != ${COUNT} ]] && [[ ${failed_executions} != ${FAIL_COUNT} ]] ; do

Author:
  • When FAIL_COUNT is not passed to the script, this loop will not be executed even once; for example:
    ./runner.sh -c 1 true will return code 255.

  • Comparing numbers is done using -eq, -ne, etc.; != and = are used for comparing strings.
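For example (a sketch only; the default value is just an illustration):
    : "${FAIL_COUNT:=${COUNT}}"          ## default when --failed-count is omitted: allow every run to fail
    execution_count=0 ; failed_executions=0
    ## -lt compares numbers; != and = compare strings
    while [[ ${execution_count} -lt ${COUNT} ]] && [[ ${failed_executions} -lt ${FAIL_COUNT} ]] ; do
        :                                ## loop body unchanged
        execution_count=$(( execution_count + 1 ))
    done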


export START_DATE=$(date +"%m/%d/%Y %H:%M:%S")
export CALL_TRACE_OUT="${OUTPUT_DIR}/call_trace_`${TIME_SIGNUTURE}`.out"

Author:
If strace is called more than once in less than a second, the log will be overwritten.
You can utilize mktemp for this task.

Owner (@mohamad7788, Jan 6, 2018):
For the file allocation/creation decision I thought about two options:

  1. Set a filename pattern for most of the generated files:
    "output-path/metric-name_timestamp.out"
  2. Use mktemp.

The only difference is that with mktemp the file is created immediately with zero size, while with the 1st option the file is created only when output is redirected to it; another advantage of the pattern is that we have more details about the file & its content, as the name includes the timestamp.

But we can use mktemp in this case, in order not to overwrite other files.
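For example (a sketch, keeping the timestamp in the name):
    ## mktemp appends a unique suffix, so two calls within the same second don't collide
    CALL_TRACE_OUT=$(mktemp "${OUTPUT_DIR}/call_trace_$(${TIME_SIGNUTURE})_XXXXXX")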



# if --net-trace enabled then trigger net traffic monitor
[[ ${NET_TRACE} ]] && exec_net_trace 1>/dev/null 2>&1 &
wait_pids="${wait_pids} $!"



# --log-trace enable
if [[ ${LOG_TRACE} ]] ; then

# --call-trace
if [[ ${CALL_TRACE} ]] ; then

/usr/bin/strace -ttT -o ${CALL_TRACE_OUT} ${COMMAND} 1>>${LOG_OUT} 2>>${LOG_ERR}

Author:
You shouldn't have an assumption about the location of strace's binary, it does not have to be the same on all systems.
Just use strace ... and let the user configure the PATH variable.
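i.e. just:
    strace -ttT -o ${CALL_TRACE_OUT} ${COMMAND} 1>>${LOG_OUT} 2>>${LOG_ERR}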

Owner (@mohamad7788, Jan 6, 2018):
Thanks for this comment, and I agree with you.


# getting the return code from the strace.out file : 'exit_group' is printed in the last line of strace.out and includes the exit-code as exit_group(exit-code)
this_rc="`grep exit_group ${CALL_TRACE_OUT} | tail -1 | cut -d'(' -f2 | cut -d')' -f1`"

Author:
strace returns the status of the command which had been executed.
No need for string manipulation.

Owner (@mohamad7788, Jan 6, 2018):
I already tried this in the design/development stage and checked the command's manual: strace returns 0 when the command is found, otherwise it returns a non-zero value. The strace version is strace -- version 4.5.18.

example:

$ ls x ; echo $?                                         
ls: x: No such file or directory
2
$ strace ls x 1>/dev/null  2>/dev/null ; echo $?
0

In this example the command for execution is ls x and it fails, but strace itself succeeds in executing the "failed command".
strace will only fail and return a non-zero value when the command is not found:

$ strace lss ; echo $? 
strace: lss: command not found
1



else
# redirect the stdout & stderr to separate file in filesystem
${COMMAND} 1>>${LOG_OUT} 2>>${LOG_ERR}

this_rc=$?
fi

else
# --call-trace
if [[ ${CALL_TRACE} ]] ; then

/usr/bin/strace -ttT -o ${CALL_TRACE_OUT} ${COMMAND}
this_rc="`grep exit_group ${CALL_TRACE_OUT} | tail -1 | cut -d'(' -f2 | cut -d')' -f1`"


else
${COMMAND}
this_rc=$?
fi



fi


wait ${wait_pids}

Author:
Don't you have to kill tcpdump here? Why wait?
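For example, something like (sketch):
    ## stop the background tcpdump once the command is done, instead of waiting
    ## for it to reach its 4000-packet limit
    kill ${wait_pids} 2>/dev/null
    wait ${wait_pids} 2>/dev/null   ## reap the background process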


export END_DATE=$(date +"%m/%d/%Y %H:%M:%S")


# save the returned-code of the command execution into the list
list_rc[execution_count]=${this_rc}

# return-code handler in two cases success/fail
# failure case : increase the failed execution counter to be updated for the next execution in while loop
# success case : delete the pcap file that the generated only if net-trace is enabled
[[ ${this_rc} != 0 ]] && ((failed_executions++))
[[ ${this_rc} == 0 ]] && [[ ${NET_TRACE} ]] && rm -f ${NET_PCAP}

# update the loop counter in any case
((execution_count++))

# another case : if the execution failed - then trigger the sys-trace generation
[[ ${this_rc} != 0 ]] && exec_sys_trace 1>/dev/null 2>&1

done


# calling to finalize to print the summary , when the script done
finalize