Binary file removed paper/template.pdf
Binary file not shown.
117 changes: 76 additions & 41 deletions paper/template.tex
@@ -58,26 +58,26 @@

\begin{document}

\title{Usage and Structure of continuous integration as configuration?}

\author{\IEEEauthorblockN{1\textsuperscript{st} Given Name Surname}
\IEEEauthorblockA{\textit{dept. name of organization (of Aff.)} \\
\textit{name of organization (of Aff.)}\\
City, Country \\
email address or ORCID}
\and
\IEEEauthorblockN{2\textsuperscript{nd} Given Name Surname}
\IEEEauthorblockA{\textit{dept. name of organization (of Aff.)} \\
\textit{name of organization (of Aff.)}\\
City, Country \\
email address or ORCID}
\and
\IEEEauthorblockN{3\textsuperscript{rd} Given Name Surname}
\IEEEauthorblockA{\textit{dept. name of organization (of Aff.)} \\
\textit{name of organization (of Aff.)}\\
City, Country \\
email address or ORCID}
}
\title{Usage and Structure of Continuous Integration as Configuration?}

% \author{\IEEEauthorblockN{1\textsuperscript{st} Given Name Surname}
% \IEEEauthorblockA{\textit{dept. name of organization (of Aff.)} \\
% \textit{name of organization (of Aff.)}\\
% City, Country \\
% email address or ORCID}
% \and
% \IEEEauthorblockN{2\textsuperscript{nd} Given Name Surname}
% \IEEEauthorblockA{\textit{dept. name of organization (of Aff.)} \\
% \textit{name of organization (of Aff.)}\\
% City, Country \\
% email address or ORCID}
% \and
% \IEEEauthorblockN{3\textsuperscript{rd} Given Name Surname}
% \IEEEauthorblockA{\textit{dept. name of organization (of Aff.)} \\
% \textit{name of organization (of Aff.)}\\
% City, Country \\
% email address or ORCID}
% }

\maketitle %% restore the layout

@@ -87,7 +87,10 @@
% NOTE: have to be careful when doing time comparisons, as currently I am doing them as time windows, so "last 2 years" doesn't work as that is 2 years of data, not 1 year

\begin{abstract}
\sm{is Sanner double blind?}
\sm{title needs to reflect the center piece of the paper.
How about something like:
Usage of Continuous Integration Services on GitHub
}

Continuous integration (CI) is widely popular
for instance as support for fast-paced development cycles.
@@ -96,7 +99,7 @@
Pose the question of whether things have changed since people looked at it last.}
This paper investigates how CI is currently used by projects on GitHub.
To our knowledge, the last major study was in 2016
by \citeauthor{Hilton2016}\citep{Hilton2016}
by Hilton et al.\citep{Hilton2016}
and we want to see how things changed in the last 5 years.
% In doing so, we compared our results against the work of \citet{Hilton2016} to see if there has been an increase in usage.

@@ -112,15 +115,16 @@
\sm{for the structure, I'd suggest to use something else than the comment bit}
% In terms of structure, we found that configuration files are normally written without comments. Finally, we suggest that further research is needed to get a better understanding of this growing field.

\jl{this is what I submitted last week as the abstract, it looks like we might be able to update it via the site}
Continuous integration (CI) is becoming more popular as software development moves to an Agile fast paced development life cycle.
This paper investigates how CI is currently used by projects on GitHub.
To our knowledge, the last major study in 2016 by Michael Hilton and we want to see how things have changed over the last 5 years.

We got 94,379 open source projects from GitHub to answer these questions.
We found a shift in CI services being used in particular the rise of GitHubActions and we found similar usage numbers.
Additionally, we looked at structure of CI configuration, we found that average configuration files remained small (under 100 lines) but over 50\%
used external scripts for additional functionality. We suggest at the end further research is needed to get a better understanding of this growing field.
%% This is the old abstract, the system can be updated, we don't really need it here
% \jl{this is what I submitted last week as the abstract, it looks like we might be able to update it via the site}
% Continuous integration (CI) is becoming more popular as software development moves to an Agile fast paced development life cycle.
% This paper investigates how CI is currently used by projects on GitHub.
% To our knowledge, the last major study in 2016 by Michael Hilton and we want to see how things have changed over the last 5 years.
%
% We got 94,379 open source projects from GitHub to answer these questions.
% We found a shift in CI services being used in particular the rise of GitHubActions and we found similar usage numbers.
% Additionally, we looked at structure of CI configuration, we found that average configuration files remained small (under 100 lines) but over 50\%
% used external scripts for additional functionality. We suggest at the end further research is needed to get a better understanding of this growing field.

\end{abstract}

@@ -129,34 +133,65 @@
\section{Introduction}
\label{Introduction}

Continuous integration (CI) is becoming more popular over the last few years. This can be seen by how major version control hosting services GitHub, Bitbucket and Gitlab have all released CI products or have been improving their CI products. In terms of research, Infrastructure as Code in \citet{Rahman2019} which does a systematic mapping of research in that area. For Continuous Integration with \citet{Shahin2017} which does another systematic review on how it is used. These two papers demonstrate some of breadth of research that has taken place. In addition you have papers like Google's Innovation Factory: Testing, Culture, and Infrastructure \citet{Copeland2010} which demonstrate some of the depth that the papers go into.
\sm{I don't think this here gives the user a good intro.
There are too many things going on, without actually saying much:}
Industry fully embraces continuous integration (CI)
and major code hosting services such as GitHub, Bitbucket, and GitLab
have CI products.
In terms of research, \citet{Rahman2019} present a systematic mapping of research on Infrastructure as Code, and \citet{Shahin2017} present a systematic review of how Continuous Integration is used. These two papers demonstrate some of the breadth of research that has taken place. In addition, papers such as Google's Innovation Factory: Testing, Culture, and Infrastructure~\citep{Copeland2010} demonstrate some of the depth that this research goes into.

Continuous Integration is a process of automatically compiling the product, running its tests, and checking that it works. This can be combined with Continuous Delivery, where the product is deployed or released after it has successfully gone through CI.

This can get complicated quickly; therefore, Configuration as Code (or Infrastructure as Code) is used to configure it. The main configuration format used for this is YAML, followed by XML and Java-based scripting formats.
\sm{:end}


\sm{this is an attempt to frame the goal of the paper:}

Hilton et al.\citep{Hilton2016} investigate in \sm{can you determine from the data set when this happened?}
how CI systems are used in open-source projects.
It has been \ins{X} years since then,
and with the rapid changes in industry usage,
we want to understand how CI usage has changed.

In order to look at our first theme CI usage we looked at In Usage, Costs, and Benefits of Continuous Integration Open-Source Projects \cite{Hilton2016}. They looked closely at the usage of CI as well. As we are looking at CI usage as well we are going answer the first three questions from their theme \enquote{Usage of CI}.
To this end, we revisit their research questions and,
based on their data set as well as a current one,
assess how things have changed.
Specifically, we will answer the following research questions,
based on the first three of Hilton et al.\citep{Hilton2016}:

% In order to look at our first theme CI usage we looked at In Usage, Costs, and Benefits of Continuous Integration Open-Source Projects \cite{Hilton2016}. They looked closely at the usage of CI as well. As we are looking at CI usage as well we are going answer the first three questions from their theme \enquote{Usage of CI}.
\begin{itemize}
\item \textbf{RQ1} What percentage of open-source projects use CI?
\item \textbf{RQ2} What is the breakdown of different CI services?
\item \textbf{RQ3} Do certain types of projects use CI more than others?
\item \textbf{RQ 3.5} Do more recently contributed projects use CI?
\item \textbf{RQ2} Which CI services are used?
\item \textbf{RQ3} Do certain types of projects use CI more than other types of projects?
\item \textbf{RQ 3.5} Are newer projects more likely to use CI?
\end{itemize}

We will be using doing a comparison with our corpus against theirs in order to work out what has changed over the last 4 years.

It would have been really interesting to do a full in depth analysis of each CI configuration format like \citet{Gallaba2018} does for Travis. However we can look at the general structure of all the CI configuration files allowing for comparisons to be made between configuration files. As that will allow comparisons to be made more easily otherwise comparing very specific features would have been harder.
\sm{I rephrased those questions, because before they didn't really make sense to me from a language perspective.
Do they still make sense?
Can we adapt things to this?}

% We will be using doing a comparison with our corpus against theirs in order to work out what has changed over the last 4 years.

In addition, we investigate how the CI configuration files are used.
Currently, the tooling for these configuration languages is minimal,
and from our own practice we know that maintaining them can take a lot of effort.
To get a high-level overview of how they are used,
we will answer the following research questions:
%
% It would have been really interesting to do a full in depth analysis of each CI configuration format like \citet{Gallaba2018} does for Travis. However we can look at the general structure of all the CI configuration files allowing for comparisons to be made between configuration files. As that will allow comparisons to be made more easily otherwise comparing very specific features would have been harder.
\begin{itemize}
% Could have "How are configuration files broken down?"
% syntax errors, line counts, comment analysis all available but it's not quite coherent
% so combining into one and maybe even deleting script usage might be for the best.... or just adding it in there....
% as it's about quality not quantity and there is a strict limit on words
\item \textbf{RQ4} What are the common errors when loading yaml configuration?
\item \textbf{RQ5} How are comments used in the configuration?
\item \textbf{RQ6} How are external scripts used within the configuration?
\item \textbf{RQ4} What are the common errors when loading YAML configuration?
\end{itemize}
\sm{reorder, should also be relabeled}

\sm{would be good to eliminate the use of vspace}
\section{Related Works}
\vspace*{-0.05in}
\subsection{Continuous Integration}