NAME

csvdb - Database Operations for CSV Files

VERSION

Version 0.0.2

LICENCE AND COPYRIGHT

Licensed under WTFPL.

INTRODUCTION

        The program was written to provide for database
        operations on csv files. You can run use it to
        analyze csv files as if they were database tables.

SYNOPSIS

./csvdb.pl [options]

csvdb interprets all .csv files in the current directory as tables.

You can of course have the whole git repository somewhere else and have its csvdb.pl on the path.

You can specify a data directory on the command line like so:

./csvdb.pl -d data/xyz/data/ -s "select id, name from employee"

This tells the program to look for employee.csv within the directory data/xyz/data.

Alternatively, you can also specify the xyz directory using an environment variable DATASET, which will make the data directory to default to, if DATASET is xyz, data/xyz/data:

export DATASET=xyz
./csvdb.pl -s "select id, name from employee"

Or, you can do it all in one line:

DATASET=xyz ./csvdb.pl -s "select id, name from employee"

So if, for example, you do only this:

./csvdb.pl -s "select distinct id, name from employee order by name"

Then this expects to find a file employee.csv (in the current directory, because a dataset directory was not defined), with at least a header line containing something like id, name, which are going to be the column headers.

  Notice that column headers are going to be simplified in the sense
  that all special characters, including spaces, are replaced by
  underscores. If in doubt, use the "-c" option to get a list of
  column headers for your query.

Command line parameters can take any order on the command line.

Options:

  General Options:

  -help            brief help message (alternatives: ?, -h)
  -man             full documentation (alternatives: -m)
  -d               debug (alternatives: -debug): Print out sql statements.

  csvdb Options:

  -dir data        The location of the csv files. Default: ./data/

  -c somefile      Show the csv columns in somefile.csv in alphabetical order
  -k somefile      Show the csv columns in somefile.csv in their original order

  -s "select..."   Execute a select statement

  -v somefile.sql  Execute a select statement in some file

  -p               Interpret the following as key=value pair for a query

  -r               Output the result of the query in csv format
  -h               When using -r, do not output the column headers
  -q               When using -q, quote all columns (also numbers)

  Other Options:

  -documentation   Recreate the README.md (needs pod2markdown)

OPTIONS

-help

Print a brief help message and exits.
-man

Prints the manual page and exits.
-documentation

Regenerate the README.md file
-dir

Specify the directory within which the data files (.csv files) are to be located. Default is a directory "data" under the current directory.
-d

Print the generated sql statement to stderr. This can be useful if you receive errors from the database engine, in order to understand what was actually tried to select.
-c

Show the columns in a given file, in alphabetical order.

For example:
```
  ./csvdb.pl -c employee
  id
  name
```
This assumes that there is a file employee.csv in the data directory.
-k

Show the columns in a given file, in their original order.

For example:
```
  ./csvdb.pl -k employee
  name
  id
```
This assumes that there is a file employee.csv in the data directory.
-s

Run a query from the command line. For example:
```
  ./csvdb.pl -s "select distinct id, name from employee order by name"
  2, Hinz
  1, Kunz
```
This assumes that there is a file employee.csv in the data directory.
-v

Run a query from a file. For example:
```
  ./CSVdb.pl -v employees.sql
```
This assumes there is an employees.sql (you can give a path to that file) which contains the actual query. This file is called a view.

Lines starting with a # are ignored, and all other lines are concatenated into one single line. If you are unsure about the resulting query, use the -d command line option.

For example:
```
  #
  # Select for EMPLOYEE
  #
  # Parameters:
  #
  # none
  #
  select
    e.id                      as EmpId,
    e.name                    as EmpName
  from employee e
  order by
    e.name
```

-p

Any query, be it on the command line (where it doesn't make too much sense) or from a view file, can contain parameters. These can be specified on the command line, and if they exist, they are going to be replaced into the query. Here is a more complex query which uses CUSTOMER that it replaces into the query, and also an optional _WHERE_ placeholder which, if not specified on the command line, will be removed. Also, the following example shows how to join multiple tables (in the given example, we want to see from some pipeline.csv file only the products which we find in a products.csv file):

  ./csvdb.pl -v customer.sql -p CUSTOMER="New York Times" -p _WHERE_="and p.acv_keur > 100"

Here is a more complex view:

  #
  # Select for CUSTOMER
  #
  # Parameters:
  #
  # CUSTOMER
  # _WHERE_ (optional)
  #
  select
    p.country                 as Country,
    p.bp_org_name             as Customer,
    p.opportunity_owner_name  as Opp_Owner,
    p.opportunity_id          as Opp_Id,
    p.closing_date            as Close_Date,
    p.opp_phase               as Phase,
    p.fc_qualification        as Category,
    p.opportunity_description as Opp_Desc,
    p.product                 as Product,
    p.product_desc            as Product_Desc,
    p.acv_keur                as ACV,
    p.tcv_keur                as TCV
  from pipeline p
  join hcp on p.product = hcp.product
  where
        p.revenue_type   = 'New Software'
    and p.opp_status     = 'In process'
    and p.bp_org_name    like '%CUSTOMER%'
    _WHERE_
  order by
    bp_org_name,
    tcv_keur desc

-r

Output the result of a query not in tabular, but in csv format. This is useful if you want to run further queries on the result of a given query. Notice that the column headers are going to be potentially different from the original table; special characters can be escaped using underscores, and also, you may have used the "AS" statement in the query.
-h

When using -r, do not output the column headers. This can be useful if you want to collect the results of multiple queries into one target file, appending as you go along.
-q

When using -r, quote all columns in the output - this may or may not be useful, as numbers are going to be quoted too with this option; when importing the resulting file into, for example, Excel, Excel will interpret them as text.

INSTALLATION

There are three ways to run csvdb: First, you can run it natively on a Linux or MacOS system. Second, you can run it as a Docker image. Third, you can run it inside a virtual machine, e.g. using VirtualBox.

RUN NATIVELY

The program uses a number of perl modules. You can install those modules, if you are on a Linux or MacOS system, using the script install/src/configure/configure_perl.sh. This will attempt to install the modules for you.

Here is a walkthrough of the installation under Ubuntu:

root[14:48:58]@ltest:~# export LANG=C
root[14:49:03]@ltest:~# apt-get install -y git
Reading package lists... Done
Building dependency tree
Reading state information... Done
git is already the newest version (1:2.7.4-0ubuntu1.3).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
root[14:49:09]@ltest:~# cd /usr/local/bin
root[14:49:20]@ltest:/usr/local/bin# git clone https://github.com/mnott/csvdb csvdb-bin
Cloning into 'csvdb-bin'...
remote: Counting objects: 440, done.
remote: Compressing objects: 100% (60/60), done.
remote: Total 440 (delta 45), reused 79 (delta 36), pack-reused 344
Receiving objects: 100% (440/440), 133.57 KiB | 0 bytes/s, done.
Resolving deltas: 100% (228/228), done.
Checking connectivity... done.
root[14:49:33]@ltest:/usr/local/bin#
root[14:49:33]@ltest:/usr/local/bin# ln -s csvdb-bin/csvdb.pl csvdb
root[14:49:33]@ltest:/usr/local/bin# cd csvdb/install/src/configure
root[14:52:06]@ltest:/usr/local/bin/csvdb/install/src/configure# ./configure_perl.sh
root[15:12:45]@ltest:/usr/local/bin/csvdb/install/src/configure# cd /tmp
root[15:12:47]@ltest:/tmp# echo "a,b" >test.csv
root[15:12:57]@ltest:/tmp# echo 1,2>>test.csv
root[15:13:02]@ltest:/tmp# echo 3,4>>test.csv
root[15:13:06]@ltest:/tmp# csvdb -s "select a, b from test where a=3"
a b
3 4

RUN AS A DOCKER IMAGE

Running csvdb via Docker is very easy. You typically need to install only two things:

1. Install [Git](https://git-scm.com/download)

2. Install [Docker for Mac] (https://docs.docker.com/docker-for-mac/install/), or Install [Docker for Win] (https://docs.docker.com/docker-for-windows/install/)

Note: On Linux, use your distribution's preferred way to install docker. For the subsequent commands, also install docker-compose, which is, for example, a separate package on Ubuntu.

Very important: If you are working on a Windows system, make sure to configure Git, when it installs, to *not automatically* convert Line Ends ("CR/LF"). Shouly you have configured it wrongly, you can do this on the command line:

git config --global core.autocrlf false

Also, if you are using a proxy, you would want to set the proxy for git. For example:

git config --global http.proxy proxy.wdf.sap.corp:8080

To remove a proxy configuration, you might do:

git config --unset --global http.proxy

Then, before continuing, make sure that you have switched on Virtualization in your BIOS (the feature is often under either Configuration or Security, and is often called Intel Virtualization Technology and VT-d Feature: Enable both). If you fail to do this, the image may not start up.

Finally, you open a command line, e.g. on your Desktop, and do this:

git clone https://github.com/mnott/csvdb
cd csvdb
docker-compose up

The last process will typically take about 5 - 10 minutes.

You should then be able to open [the web application](http://localhost:8080/).

RUN WITHIN A VIRTUAL MACHINE

Alternatively, if you do not want to mess with your local installation, you can install the whole package as to run inside a virtual machine (only install what you have not yet got):

1. Install [VirtualBox](https://www.virtualbox.org/wiki/Downloads)

2. Install [Vagrant](https://www.vagrantup.com/downloads.html)

3. Install [Git](https://git-scm.com/download)

4. Install [Vagrant Manager](https://github.com/deveres/vagrant-manager-windows/releases/download/1.0.0.7/vagrant-manager-windows-1.0.0.7.exe)

If you don't want to deal with Vagrant via the command line, there is an excellent frontend about it. The link above for Vagrant Manager is a special version for Windows that has resolved an issue where Vagrant Manager would get stuck on refreshing the list of virtual machines. If you run on Mac or others, you can download the default version: [Vagrant Manager](http://vagrantmanager.com) It is highly recommendable to use Vagrant Manager.

When you download VirtualBox, don't forget to also download the extension pack (on the same page).

Very important: If you are working on a Windows system, make sure to configure Git, when it installs, to *not automatically* convert Line Ends ("CR/LF"). Shouly you have configured it wrongly, you can do this on the command line:

git config --global core.autocrlf false

Also, if you are using a proxy, you would want to set the proxy for git. For example:

git config --global http.proxy proxy.wdf.sap.corp:8080

To remove a proxy configuration, you might do:

git config --unset --global http.proxy

Then, before continuing, make sure that you have switched on Virtualization in your BIOS (the feature is often under either Configuration or Security, and is often called Intel Virtualization Technology and VT-d Feature: Enable both). If you fail to do this, the virtual machine will not start up, and also may be recognized wrongly as 32bit.

Finally, you open a command line, e.g. on your Desktop, and do this:

git clone https://github.com/mnott/csvdb
cd csvdb
vagrant up

The last process will typically take about 5 - 10 minutes.

You should then be able to open [the web application](http://localhost:8080/).

To SSH into your virtual machine, you can just use

vagrant ssh

and then, to become root,

sudo su -

You can stop the virtual machine using, from the csvdb directory,

vagrant suspend

You can delete the virtual machine using, from the csvdb directory,

vagrant destroy

Name		Name	Last commit message	Last commit date
Latest commit History 130 Commits
data		data
docker		docker
html		html
install		install
lib		lib
scripts		scripts
.gitignore		.gitignore
README.md		README.md
Vagrantfile		Vagrantfile
csvdb.pl		csvdb.pl
docker-compose.yaml		docker-compose.yaml
index.pl		index.pl
log4p.ini		log4p.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

NAME

VERSION

LICENCE AND COPYRIGHT

INTRODUCTION

SYNOPSIS

OPTIONS

INSTALLATION

RUN NATIVELY

RUN AS A DOCKER IMAGE

RUN WITHIN A VIRTUAL MACHINE

About

Uh oh!

Releases

Packages

Languages

mnott/csvdb

Folders and files

Latest commit

History

Repository files navigation

NAME

VERSION

LICENCE AND COPYRIGHT

INTRODUCTION

SYNOPSIS

OPTIONS

INSTALLATION

RUN NATIVELY

RUN AS A DOCKER IMAGE

RUN WITHIN A VIRTUAL MACHINE

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages