Daemons
The job-processing pipeline involves several other daemons that are application-independent: i.e., they're part of BOINC, and you don't have to modify them.
The feeder is supplied by BOINC and is application-independent. It creates a shared-memory segment used to pass database information to scheduler processes. This data includes applications, application versions, and 'work items' (an unsent result and its corresponding workunit). It has the following command-line options:
- Sets the verbosity level. 1=low ... 4=high.
- Handle only results for which mod(id, n) = i. This lets you run the feeder on arbitrarily many machines.
- Handle only workunits for which mod(id, n) = i. This lets you run the feeder on arbitrarily many machines.
- Enumerate work items in order of increasing result.random.
- Use rand(sysdate()) to order work items from the DB (only available in MySQL 5+).
- Enumerate work items in order of decreasing priority.
- Enumerate work items in order of decreasing priority, then increasing create time.
- Enumerate work items in order of increasing priority.
- Sleep N seconds if there is nothing to do.
- Interleave work items from all applications. Weight applications according to the value of their 'weight' field; if all weights are zero, results are interleaved uniformly. Without this option, runnable results are enumerated in an indeterminate order, and there may be periods when only results from one application are available for sending.
- Interleave work items uniformly across batches.
- Remove work items from the shared-memory segment that have been there for longer than X minutes.
- Get work only for the given apps. Useful if your project uses both locality and job-cache scheduling. Can't be used in combination with --allapps, but you can get the same effect by using --allapps and assigning nonzero weights only to the apps you want.
If a user's project preferences include elements of the form <app_id>N</app_id> then the scheduler will send the user work only from those applications.
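For reference, here is a minimal sketch of a config.xml daemon entry that runs the feeder. The flag spellings (-d, --allapps, --priority_order_create_time) are assumptions based on recent BOINC versions and correspond to options described above; check the usage message of your feeder binary.
<daemons>
    <daemon>
        <cmd>feeder -d 2 --allapps --priority_order_create_time</cmd>
    </daemon>
</daemons>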
This has the same function as the feeder, but it's designed for batch-oriented multi-user projects. It divides computing power among users based on their quotas, and (for a given user) it orders jobs as required for batch acceleration.
Options:
- Sets the verbosity level. 1=low ... 4=high.
- Remove work items from the shared-memory segment that have been there for longer than X minutes.
The transitioner handles state transitions of workunits and results. It generates initial results for workunits, and generates more results when timeouts or errors occur. Options:
- Sleep N seconds if there is nothing to do (default 5).
- Do one pass, then exit.
- Set the debug level to N (1 .. 4 = most verbose).
- Process only workunits with (id mod n) == i.
The transitioner works asynchronously from work generators and schedulers, and in high-load situations it can fall behind (i.e. a large queue of workunits waiting to be processed can build up). The transitioner backlog is shown in the server status page. If the backlog becomes large (> 1 hour) you can do the following:
- Stop the project
- Run sched/transitioner_catchup.php
- Start the project
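For example, assuming the project lives in ~/projects/myproject (a hypothetical path) and is controlled with the standard bin/start and bin/stop scripts, the catch-up sequence is roughly:
cd ~/projects/myproject                # hypothetical project directory
bin/stop                               # stop the project's daemons
php sched/transitioner_catchup.php     # drain the transitioner backlog
bin/start                              # restart the project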
Files are deleted from the data server's upload and download directories by two programs:
- the file_deleter daemon deletes input and output files as jobs are completed.
- The antique file deleter periodic task deletes files that were missed by the file_deleter and "fell through the cracks".
The default file deletion policy is:
- A workunit's input files are deleted when all results are 'over' (reported or timed out) and the workunit is assimilated.
- A result's output files are deleted after the workunit is assimilated. The canonical result is handled differently, since its output files may be needed to validate results that are reported after assimilation; hence its files are deleted only when all results are over, and all successful results have been validated.
- If <delete_delay_hours> is specified in config.xml, file deletion is delayed by that interval.
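For example, a sketch of a config.xml fragment that delays all file deletion by 24 hours (the surrounding structure is abbreviated; the value is illustrative):
<boinc>
    <config>
        <delete_delay_hours>24</delete_delay_hours>
        ...
    </config>
</boinc>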
Command-line options:
- Set the debug output level (1/2/3/4).
- Handle only WUs with ID mod M == R.
- Exit after one pass through the DB.
- Don't update the DB (for debugging only).
- Override download_dir from the project config with D.
- Sleep for N seconds between scans (default 5).
- Only process workunits with appid=N.
- Only process workunits of the app with name S.
- Don't retry file deletions that failed previously.
- Update the DB, but don't delete input files.
- Update the DB, but don't delete output files.
- Don't delete anything with a positive batch number.
- Don't delete output files.
- Don't delete input files.
- Only process workunits where xml_doc LIKE 'L'.
If you store input and output files on different servers, you can improve performance by running separate file deleters, each one on the machine where the corresponding files are stored.
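For instance, a sketch of the corresponding config.xml daemon entries. The hostnames are hypothetical, and the flag names --input_files_only and --output_files_only (for the 'don't delete output files' and 'don't delete input files' options above) are assumptions based on recent BOINC versions:
<daemons>
    <daemon>
        <host>download.example.org</host>
        <cmd>file_deleter --input_files_only</cmd>
    </daemon>
    <daemon>
        <host>upload.example.org</host>
        <cmd>file_deleter --output_files_only</cmd>
    </daemon>
</daemons>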
In some cases you may not want files to be deleted. There are three ways to accomplish this:
- Use the --preserve_wu_files and/or the --preserve_result_files command-line options.
- Include <no_delete/> in the <file_info> element for a file in a workunit or result template. This lets you suppress deletion on a file-by-file basis.
- Include nodelete in the workunit name.
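For example, a sketch of a <file_info> element in a workunit template that marks its file as permanent (the <number> element is whatever your template already uses for that file):
<file_info>
    <number>0</number>
    <no_delete/>
</file_info>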
As a BOINC project operates, the size of its workunit and result tables increases. To limit this growth, BOINC provides a utility, db_purge, that writes result and WU records to XML-format archive files, then deletes them from the database. Workunits are purged only when their input files have been deleted. Because of BOINC's file-deletion policy, this implies that all results are completed. So when a workunit is purged, all its results are purged too.
db_purge creates an archive/ directory and stores archive files there.
db_purge is normally run as a daemon, specified in the project configuration file. It has the following command-line options:
- Purge only workunits of the given app.
- Set logging verbosity to N (1,2,3,4).
- Write archives in a new directory (YYYY_MM_DD) each day.
- Don't delete from the DB; for testing only.
- Compress archive files using gzip.
- Purge at most N WUs, then exit.
- Write at most N WUs to each archive file. Recommended value: 10,000 or so.
- Purge only WUs with mod_time at least X (can be < 1) days in the past. Recommended value: 7 or so. This lets users examine their recent results.
- Process only WUs with ID mod N == R. This lets you run multiple instances for increased throughput. Must be used with --no_archive.
- Don't archive workunits or results.
- Do one pass, then quit.
- Compress archive files using zip.
- Delete retired batches from the batch table.
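A sketch of a config.xml daemon entry for db_purge. The flag spellings (--min_age_days, --max_wu_per_file, --gzip) are assumptions based on recent BOINC versions and correspond to options described above:
<daemons>
    <daemon>
        <cmd>db_purge --min_age_days 7 --max_wu_per_file 10000 --gzip</cmd>
    </daemon>
</daemons>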
You may want to keep a record of which user returned a valid result for each workunit (e.g. so that you can give them credit for specific discoveries). If you use db_purge, this information is removed from the database; it can be extracted from the XML archives, but it's slow.
Alternatively, you can store the user/workunit association in a separate table called credited_job. The records in this table are small - they store only a user ID and a 64-bit integer which is taken from the workunit's 'opaque' field (typically an index in a separate database). By default, the table has indices that allow you to efficiently enumerate workunits for a given user and users for a given workunit.
To use the credited_job table, pass the --update_credited_job flag to your validator.
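Assuming the two-column layout described above (a userid column plus a workunitid column holding the value taken from the workunit's opaque field), typical queries look like this sketch (the IDs are hypothetical):
-- which users were credited for the job with opaque value 1234567?
SELECT userid FROM credited_job WHERE workunitid = 1234567;
-- which jobs was user 42 credited for?
SELECT workunitid FROM credited_job WHERE userid = 42;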
The archive files have names of the form wu_archive_TIME and result_archive_TIME where TIME is the Unix time the file was created. In addition, db_purge generates index files 'wu_index' and 'result_index' associating each WU and result ID with the timestamp of its archive file.
The format of both types of index files is a number of rows, each containing:
ID TIME NAME
The ID field of the WU or result, 5 spaces, the timestamp part of the archive filename where the record with that ID can be found, 4 spaces, and the name of the WU/result.
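Given that layout, standard text tools are enough to locate the archive file holding a particular record; a sketch (the result ID is hypothetical, and you'll need zcat instead of less if you archive with --gzip):
# find the archive timestamp for result ID 123456
TS=$(awk '$1 == 123456 { print $2 }' archive/result_index)
# inspect the matching archive file
less "archive/result_archive_$TS"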
The format of a record in the result archive file is:
<result_archive>
<id>%d</id>
<create_time>%d</create_time>
<workunitid>%d</workunitid>
<server_state>%d</server_state>
<outcome>%d</outcome>
<client_state>%d</client_state>
<hostid>%d</hostid>
<userid>%d</userid>
<report_deadline>%d</report_deadline>
<sent_time>%d</sent_time>
<received_time>%d</received_time>
<name>%s</name>
<cpu_time>%.15e</cpu_time>
<xml_doc_in>%s</xml_doc_in>
<xml_doc_out>%s</xml_doc_out>
<stderr_out>%s</stderr_out>
<batch>%d</batch>
<file_delete_state>%d</file_delete_state>
<validate_state>%d</validate_state>
<claimed_credit>%.15e</claimed_credit>
<granted_credit>%.15e</granted_credit>
<opaque>%f</opaque>
<random>%d</random>
<app_version_num>%d</app_version_num>
<appid>%d</appid>
<exit_status>%d</exit_status>
<teamid>%d</teamid>
<priority>%d</priority>
<mod_time>%s</mod_time>
</result_archive>
The format of a record in the WU archive file is:
<workunit_archive>
<id>%d</id>
<create_time>%d</create_time>
<appid>%d</appid>
<name>%s</name>
<xml_doc>%s</xml_doc>
<batch>%d</batch>
<rsc_fpops_est>%.15e</rsc_fpops_est>
<rsc_fpops_bound>%.15e</rsc_fpops_bound>
<rsc_memory_bound>%.15e</rsc_memory_bound>
<rsc_disk_bound>%.15e</rsc_disk_bound>
<need_validate>%d</need_validate>
<canonical_resultid>%d</canonical_resultid>
<canonical_credit>%.15e</canonical_credit>
<transition_time>%d</transition_time>
<delay_bound>%d</delay_bound>
<error_mask>%d</error_mask>
<file_delete_state>%d</file_delete_state>
<assimilate_state>%d</assimilate_state>
<hr_class>%d</hr_class>
<opaque>%f</opaque>
<min_quorum>%d</min_quorum>
<target_nresults>%d</target_nresults>
<max_error_results>%d</max_error_results>
<max_total_results>%d</max_total_results>
<max_success_results>%d</max_success_results>
<result_template_file>%s</result_template_file>
<priority>%d</priority>
<mod_time>%s</mod_time>
</workunit_archive>
To extract old statistics, one has to recover the XML archive files (records are periodically removed from the result table by db_purge). This section explains a quick hack to recover the XML archive into a database. For now, only the result_archive is recreated, without XML data.
Note: This procedure relies on MySQL's LOAD XML INFILE command, which was introduced (approximately) in MySQL version 6.0 alpha. The 6.0 version was later removed from the MySQL roadmap, so it must be resurrected from somewhere on the net. Try looking for mysql-6.0.11-alpha-linux-x86_64-glibc23.tar.gz; it might be found, e.g., at ftp://ftp.fu-berlin.de/unix/databases/mysql/Downloads/MySQL-6.0/ . MySQL 6.0 should not be used for the production DB, but only on a sandbox machine (called 'local' henceforth).
Prerequisites:
- A local working installation of mysql 6.0 alpha
- A local image of the current BOINC database (e.g. restored from a recent dump)
Procedure:
- Put the following script, boinc_xml_import, where it can be executed:
#!/bin/sh
# iterate over all arguments on the command line, unzip them,
# strip XML-in-XML and feed them to mysql for import
tmpf=/tmp/boinc_import.$$
for f in "$@"; do
    echo "Processing $f ..."
    # Remove the 3 embedded XML snippets of each result
    # (xml_doc_in through stderr_out). Treat the file as one huge string.
    zcat -f "$f" | perl -w -e '
        use strict;
        my @a;
        { local $/ = undef;
          @a = <>; }
        my $l = "@a";
        $l =~ s|<xml_doc_in>.+?</stderr_out>| |sg;
        print "$l";
    ' > "$tmpf"
    # replace YOURPROJECT with the name of your project database
    mysql YOURPROJECT <<EOF
CREATE TABLE IF NOT EXISTS result_archive LIKE result;
SET AUTOCOMMIT=0;
SET UNIQUE_CHECKS=0;
SET FOREIGN_KEY_CHECKS=0;
LOAD XML CONCURRENT INFILE '$tmpf' IGNORE INTO TABLE result_archive
ROWS IDENTIFIED BY '<result_archive>';
SET FOREIGN_KEY_CHECKS=1;
SET UNIQUE_CHECKS=1;
COMMIT;
EOF
    echo "... done"
done
# clean up the temporary file
rm -f "$tmpf"
- Copy the .xml.gz archive files to the local machine, and run the following command (it will take a long time):
boinc_xml_import *.xml.gz
- At the end of the run, a result_archive table will have been reconstructed on the local server. To add the most recent results, still contained in the result table, run the following (it will also take a while):
INSERT INTO result_archive SELECT * FROM result;
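Once result_archive is populated, old statistics can be extracted with ordinary SQL; for example, a sketch that lists the top 20 users by total granted credit (column names as in the result record format above):
SELECT userid, COUNT(*) AS n_results, SUM(granted_credit) AS total_credit
FROM result_archive
GROUP BY userid
ORDER BY total_credit DESC
LIMIT 20;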