Daemons
The job-processing pipeline involves several other daemons that are application-independent: i.e., they're part of BOINC, and you don't have to modify them.
The feeder is supplied by BOINC and is application-independent. It creates a shared-memory segment used to pass database information to scheduler processes. This data includes applications, application versions, and 'work items' (an unsent result and its corresponding workunit). It has the following command-line options:
- Sets the verbosity level. 1=low ... 4=high.
- Handle only results for which mod(id, n) = i. This lets you run the feeder on arbitrarily many machines.
- Handle only workunits for which mod(id, n) = i. This lets you run the feeder on arbitrarily many machines.
- Enumerate work items in order of increasing result.random.
- Use rand(sysdate()) to order work items from the DB (only available in MySQL 5+).
- Enumerate work items in order of decreasing priority.
- Enumerate work items in order of decreasing priority, then increasing create time.
- Enumerate work items in order of increasing priority.
- Sleep N seconds if there is nothing to do.
- Interleave work items from all applications. Weight applications according to the value of their 'weight' field; if all weights are zero, results are interleaved uniformly. Without this option, runnable results are enumerated in an indeterminate order, and there may be periods when only results from one application are available for sending.
- Interleave work items uniformly across batches.
- Remove work items from the shared-memory segment that have been there for longer than X minutes.
- Get work only for the given apps. Useful if your project uses both locality and job-cache scheduling. Can't be used in combination with --allapps, but you can get the same effect by using --allapps and assigning nonzero weights only to the apps you want.
If a user's project preferences include elements of the form <app_id>N</app_id> then the scheduler will send the user work only from those applications.
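For reference, here is a minimal sketch of a config.xml daemon entry that runs the feeder. The flag spellings (-d, --allapps, --priority_order_create_time) are assumptions based on recent BOINC versions and correspond to options described above; check the usage message of your feeder binary.
<daemons>
    <daemon>
        <cmd>feeder -d 2 --allapps --priority_order_create_time</cmd>
    </daemon>
</daemons>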
This has the same function as the feeder, but it's designed for batch-oriented multi-user projects. It divides computing power among users based on their quotas, and (for a given user) it orders jobs as required for batch acceleration.
Options:
- Sets the verbosity level. 1=low ... 4=high.
- Remove work items from the shared-memory segment that have been there for longer than X minutes.
The transitioner handles state transitions of workunits and results. It generates initial results for workunits, and generates more results when timeouts or errors occur. Options:
- Sleep N seconds if there is nothing to do (default 5).
- Do one pass, then exit.
- Set the debug level to N (1 .. 4 = most verbose).
- Process only workunits with (id mod n) == i.
The transitioner works asynchronously from work generators and schedulers, and in high-load situations it can fall behind (i.e. a large queue of workunits waiting to be processed can build up). The transitioner backlog is shown in the server status page. If the backlog becomes large (> 1 hour) you can do the following:
- Stop the project
- Run sched/transitioner_catchup.php
- Start the project
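For example, assuming the project lives in ~/projects/myproject (a hypothetical path) and is controlled with the standard bin/start and bin/stop scripts, the catch-up sequence is roughly:
cd ~/projects/myproject                # hypothetical project directory
bin/stop                               # stop the project's daemons
php sched/transitioner_catchup.php     # drain the transitioner backlog
bin/start                              # restart the project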
Files are deleted from the data server's upload and download directories by two programs:
- the file_deleter daemon deletes input and output files as jobs are completed.
- The antique file deleter periodic task deletes files that were missed by the file_deleter and "fell through the cracks".
The default file deletion policy is:
- A workunit's input files are deleted when all results are 'over' (reported or timed out) and the workunit is assimilated.
- A result's output files are deleted after the workunit is assimilated. The canonical result is handled differently, since its output files may be needed to validate results that are reported after assimilation; hence its files are deleted only when all results are over, and all successful results have been validated.
- If <delete_delay_hours> is specified in config.xml, file deletion is delayed by that interval.
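For example, a sketch of a config.xml fragment that delays all file deletion by 24 hours (the surrounding structure is abbreviated; the value is illustrative):
<boinc>
    <config>
        <delete_delay_hours>24</delete_delay_hours>
        ...
    </config>
</boinc>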
Command-line options:
- Set the debug output level (1/2/3/4).
- Handle only WUs with ID mod M == R.
- Exit after one pass through the DB.
- Don't update the DB (for debugging only).
- Override download_dir from the project config with D.
- Sleep for N seconds between scans (default 5).
- Only process workunits with appid=N.
- Only process workunits of the app with name S.
- Don't retry file deletions that failed previously.
- Update the DB, but don't delete input files.
- Update the DB, but don't delete output files.
- Don't delete anything with a positive batch number.
- Don't delete output files.
- Don't delete input files.
- Only process workunits where xml_doc LIKE 'L'.
If you store input and output files on different servers, you can improve performance by running separate file deleters, each one on the machine where the corresponding files are stored.
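For instance, a sketch of the corresponding config.xml daemon entries. The hostnames are hypothetical, and the flag names --input_files_only and --output_files_only (for the 'don't delete output files' and 'don't delete input files' options above) are assumptions based on recent BOINC versions:
<daemons>
    <daemon>
        <host>download.example.org</host>
        <cmd>file_deleter --input_files_only</cmd>
    </daemon>
    <daemon>
        <host>upload.example.org</host>
        <cmd>file_deleter --output_files_only</cmd>
    </daemon>
</daemons>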
In some cases you may not want files to be deleted. There are three ways to accomplish this:
- Use the --preserve_wu_files and/or the --preserve_result_files command-line options.
- Include <no_delete/> in the <file_info> element for a file in a workunit or result template. This lets you suppress deletion on a file-by-file basis.
- Include nodelete in the workunit name.
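For example, a sketch of a <file_info> element in a workunit template that marks its file as permanent (the <number> element is whatever your template already uses for that file):
<file_info>
    <number>0</number>
    <no_delete/>
</file_info>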
As a BOINC project operates, the size of its workunit and result tables increases. To limit this growth, BOINC provides a utility, db_purge, that writes result and WU records to XML-format archive files, then deletes them from the database. Workunits are purged only when their input files have been deleted. Because of BOINC's file-deletion policy, this implies that all results are completed. So when a workunit is purged, all its results are purged too.
db_purge creates an archive/ directory and stores archive files there.
db_purge is normally run as a daemon, specified in the project configuration file. It has the following command-line options:
- Purge only workunits of the given app.
- Set logging verbosity to N (1,2,3,4).
- Write archives in a new directory (YYYY_MM_DD) each day.
- Don't delete from the DB; for testing only.
- Compress archive files using gzip.
- Purge at most N WUs, then exit.
- Write at most N WUs to each archive file. Recommended value: 10,000 or so.
- Purge only WUs with mod_time at least X (can be < 1) days in the past. Recommended value: 7 or so. This lets users examine their recent results.
- Process only WUs with ID mod N == R. This lets you run multiple instances for increased throughput. Must be used with --no_archive.
- Don't archive workunits or results.
- Do one pass, then quit.
- Compress archive files using zip.
- Delete retired batches from the batch table.
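A sketch of a config.xml daemon entry for db_purge. The flag spellings (--min_age_days, --max_wu_per_file, --gzip) are assumptions based on recent BOINC versions and correspond to options described above:
<daemons>
    <daemon>
        <cmd>db_purge --min_age_days 7 --max_wu_per_file 10000 --gzip</cmd>
    </daemon>
</daemons>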
You may want to keep a record of which user returned a valid result for each workunit (e.g. so that you can give them credit for specific discoveries). If you use db_purge, this information is removed from the database; it can be extracted from the XML archives, but it's slow.
Alternatively, you can store the user/workunit association in a separate table called credited_job. The records in this table are small - they store only a user ID and a 64-bit integer which is taken from the workunit's 'opaque' field (typically an index in a separate database). By default, the table has indices that allow you to efficiently enumerate workunits for a given user and users for a given workunit.
To use the credited_job table, pass the --update_credited_job flag to your validator.
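Assuming the two-column layout described above (a userid column plus a workunitid column holding the value taken from the workunit's opaque field), typical queries look like this sketch (the IDs are hypothetical):
-- which users were credited for the job with opaque value 1234567?
SELECT userid FROM credited_job WHERE workunitid = 1234567;
-- which jobs was user 42 credited for?
SELECT workunitid FROM credited_job WHERE userid = 42;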
The archive files have names of the form wu_archive_TIME and result_archive_TIME where TIME is the Unix time the file was created. In addition, db_purge generates index files 'wu_index' and 'result_index' associating each WU and result ID with the timestamp of its archive file.
The format of both types of index files is a number of rows, each containing:
ID TIME NAME
The ID field of the WU or result, 5 spaces, the timestamp part of the archive filename where the record with that ID can be found, 4 spaces, and the name of the WU/result.
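Given that layout, standard text tools are enough to locate the archive file holding a particular record; a sketch (the result ID is hypothetical, and you'll need zcat instead of less if you archive with --gzip):
# find the archive timestamp for result ID 123456
TS=$(awk '$1 == 123456 { print $2 }' archive/result_index)
# inspect the matching archive file
less "archive/result_archive_$TS"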
The format of a record in the result archive file is:
<result_archive>
<id>%d</id>
<create_time>%d</create_time>
<workunitid>%d</workunitid>
<server_state>%d</server_state>
<outcome>%d</outcome>
<client_state>%d</client_state>
<hostid>%d</hostid>
<userid>%d</userid>
<report_deadline>%d</report_deadline>
<sent_time>%d</sent_time>
<received_time>%d</received_time>
<name>%s</name>
<cpu_time>%.15e</cpu_time>
<xml_doc_in>%s</xml_doc_in>
<xml_doc_out>%s</xml_doc_out>
<stderr_out>%s</stderr_out>
<batch>%d</batch>
<file_delete_state>%d</file_delete_state>
<validate_state>%d</validate_state>
<claimed_credit>%.15e</claimed_credit>
<granted_credit>%.15e</granted_credit>
<opaque>%f</opaque>
<random>%d</random>
<app_version_num>%d</app_version_num>
<appid>%d</appid>
<exit_status>%d</exit_status>
<teamid>%d</teamid>
<priority>%d</priority>
<mod_time>%s</mod_time>
</result_archive>
The format of a record in the WU archive file is:
<workunit_archive>
<id>%d</id>
<create_time>%d</create_time>
<appid>%d</appid>
<name>%s</name>
<xml_doc>%s</xml_doc>
<batch>%d</batch>
<rsc_fpops_est>%.15e</rsc_fpops_est>
<rsc_fpops_bound>%.15e</rsc_fpops_bound>
<rsc_memory_bound>%.15e</rsc_memory_bound>
<rsc_disk_bound>%.15e</rsc_disk_bound>
<need_validate>%d</need_validate>
<canonical_resultid>%d</canonical_resultid>
<canonical_credit>%.15e</canonical_credit>
<transition_time>%d</transition_time>
<delay_bound>%d</delay_bound>
<error_mask>%d</error_mask>
<file_delete_state>%d</file_delete_state>
<assimilate_state>%d</assimilate_state>
<hr_class>%d</hr_class>
<opaque>%f</opaque>
<min_quorum>%d</min_quorum>
<target_nresults>%d</target_nresults>
<max_error_results>%d</max_error_results>
<max_total_results>%d</max_total_results>
<max_success_results>%d</max_success_results>
<result_template_file>%s</result_template_file>
<priority>%d</priority>
<mod_time>%s</mod_time>
</workunit_archive>
To extract old statistics, one has to recover the XML archive files (records are periodically removed from the result table by db_purge). This section explains a quick hack to recover the XML archive into a database. For now, only the result_archive is recreated, without XML data.
Note: This procedure relies on MySQL's LOAD XML INFILE command, which was introduced (approximately) in MySQL version 6.0 alpha. The 6.0 version was later removed from the MySQL roadmap, so it must be resurrected from somewhere on the net. Try looking for mysql-6.0.11-alpha-linux-x86_64-glibc23.tar.gz; it might be found, e.g., at ftp://ftp.fu-berlin.de/unix/databases/mysql/Downloads/MySQL-6.0/ . MySQL 6.0 should not be used for the production DB, but only on a sandbox machine (called 'local' henceforth).
Prerequisites:
- A local working installation of mysql 6.0 alpha
- A local image of the current BOINC database (e.g. restored from a recent dump)
Procedure:
- Put the following script, boinc_xml_import, where it can be executed:
#!/bin/sh
# iterate over all arguments on the command line, unzip them,
# strip XML-in-XML and feed them to mysql for import
tmpf=/tmp/boinc_import.$$
for f in "$@"; do
    echo "Processing $f ..."
    # Remove the 3 embedded XML snippets of each result
    # (xml_doc_in through stderr_out). Treat the file as one huge string.
    zcat -f "$f" | perl -w -e '
        use strict;
        my @a;
        { local $/ = undef;
          @a = <>; }
        my $l = "@a";
        $l =~ s|<xml_doc_in>.+?</stderr_out>| |sg;
        print "$l";
    ' > "$tmpf"
    # replace YOURPROJECT with the name of your project database
    mysql YOURPROJECT <<EOF
CREATE TABLE IF NOT EXISTS result_archive LIKE result;
SET AUTOCOMMIT=0;
SET UNIQUE_CHECKS=0;
SET FOREIGN_KEY_CHECKS=0;
LOAD XML CONCURRENT INFILE '$tmpf' IGNORE INTO TABLE result_archive
ROWS IDENTIFIED BY '<result_archive>';
SET FOREIGN_KEY_CHECKS=1;
SET UNIQUE_CHECKS=1;
COMMIT;
EOF
    echo "... done"
done
# clean up the temporary file
rm -f "$tmpf"
- Copy the .xml.gz archive files to the local machine, and run the following command (it will take a long time):
boinc_xml_import *.xml.gz
- At the end of the run, a result_archive table will have been reconstructed on the local server. To add the most recent results, still contained in the result table, run the following (it will also take a while):
INSERT INTO result_archive SELECT * FROM result;
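Once result_archive is populated, old statistics can be extracted with ordinary SQL; for example, a sketch that lists the top 20 users by total granted credit (column names as in the result record format above):
SELECT userid, COUNT(*) AS n_results, SUM(granted_credit) AS total_credit
FROM result_archive
GROUP BY userid
ORDER BY total_credit DESC
LIMIT 20;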