The TaskRun created time is only set correctly under some schedulers: the random scheduler does not set it correctly, the breadth first scheduler is subject to race conditions when other users are active, and the default depth first scheduler sets it correctly only with a little luck.
Before Pybossa.js presents a task, it pre-loads the next task. At the time of pre-load, the back-end sets a Redis key for the pre-loaded task with a time stamp, even though the user hasn't started the task yet.
You can verify this from a Python shell in your Pybossa back-end environment:
import redis
r = redis.StrictRedis()
r.keys("pybossa:task_requested:*")
# load a task in the browser, then list the keys again
r.keys("pybossa:task_requested:*")
You will see that the current task and the next task are both set, as expected.
But they have almost identical time stamps, so we need to check whether, and when, the time stamp of the pre-loaded task is updated, so that it reflects when the user sees the task rather than when it was pre-loaded.
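To compare the stamps rather than just list the keys, a small helper along these lines may be useful. It is only a sketch: it assumes each key's value is an ISO 8601 time stamp string, and the function names stamp_times and spread_seconds are made up for this example. It works with any client that offers keys() and get(), such as the r object above.

```python
from datetime import datetime


def stamp_times(client, pattern="pybossa:task_requested:*"):
    """Return {key: datetime} for every stamp key matching the pattern.

    Assumption: each value is an ISO 8601 string. Adjust the parsing if
    your deployment stores something else.
    """
    stamps = {}
    for key in client.keys(pattern):
        raw = client.get(key)
        if raw is None:  # key expired between keys() and get()
            continue
        # redis-py returns bytes unless decode_responses=True was set
        if isinstance(key, bytes):
            key = key.decode()
        if isinstance(raw, bytes):
            raw = raw.decode()
        stamps[key] = datetime.fromisoformat(raw)
    return stamps


def spread_seconds(stamps):
    """Seconds between the oldest and the newest stamp."""
    times = sorted(stamps.values())
    return (times[-1] - times[0]).total_seconds() if times else 0.0
```

Calling spread_seconds(stamp_times(r)) right after a task is presented should give a value close to zero if the current and pre-loaded tasks were stamped together.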
When you save the first task with the default depth first scheduler, the call to _fetchNewTask here returns the task that is currently being presented, because the user has not completed it yet.
https://github.com/Scifabric/pybossa.js/blob/master/pybossa.js#L147
var xhr = (taskId && (previousTask === undefined)) ? _fetchTask(taskId) : _fetchNewTask(project.id, offset);
xhr.done(function(task) {
    if (previousTask && task.id === previousTask.id) {
        var secondTry = _fetchNewTask(project.id, offset+1)
            .done(function(secondTask){
                _resolveNextTaskLoaded(secondTask, def);
            });
    }
    else {
        _resolveNextTaskLoaded(task, def);
    }
});
Luckily, /project/<project_id>/newtask calls guard.stamp(task, get_user_id_or_ip()) which updates the Redis key that will be used as the current task's timestamp when it is saved.
The pre-load then checks if (previousTask && task.id === previousTask.id), which says: oops, same task, let's get the one after.
So with a depth first scheduler, the current task just being presented will be loaded (a second time), and secondTry will always be run to get the actual next task.
This is good because it has the effect of setting the time correctly for the current task right when it is presented. But it relies on the scheduler returning the same task again: a random scheduler will not happen to refresh the created time of the current task.
Breadth first may return a different task than the first time if, in the meantime, another user has completed tasks and shuffled what is next in line.
In either case, we would expect the TaskRun created time stamp to be saved incorrectly as the time of the pre-load, so the recorded duration of the task now being saved also includes all the time spent on the prior task.
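The effect can be illustrated with a toy model. This is not Pybossa code: recorded_durations and its parameters are invented for the example, and the model only tracks when each task's stamp is written relative to when the user saves it.

```python
def recorded_durations(work_seconds, restamps_current):
    """Return the duration that would be recorded for each task.

    work_seconds: real time the user spends on each task, in order.
    restamps_current: True if the new-task request made at presentation
        time returns the presented task again (the depth first case),
        False if it returns some other task (random, or breadth first
        with other users active).
    """
    stamps = {0: 0.0}  # first task is fetched and stamped on page load
    recorded = []
    clock = 0.0
    n = len(work_seconds)
    for task, seconds in enumerate(work_seconds):
        # The task is presented now and the next one is pre-loaded.
        if restamps_current:
            stamps[task] = clock      # stamp refreshed by luck
        if task + 1 < n:
            stamps[task + 1] = clock  # pre-loaded task stamped early
        clock += seconds              # user works, then saves
        recorded.append(clock - stamps[task])
    return recorded


print(recorded_durations([30, 45, 20], restamps_current=True))
# [30.0, 45.0, 20.0]
print(recorded_durations([30, 45, 20], restamps_current=False))
# [30.0, 75.0, 65.0]
```

In the second case every task after the first is charged with the time spent on the task before it, which is the error described above.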
Possible Solution
Some API call has to be made at the time the task is presented to update the Redis key with the created time stamp. The task presenter view below is the only other routine on the back-end that also sets the time stamp, so it could be called just to refresh the time stamp, with the result discarded.
https://github.com/Scifabric/pybossa/blob/master/pybossa/view/projects.py#L852
@blueprint.route('/<short_name>/task/<int:task_id>')
def task_presenter(short_name, task_id):
    …
    guard.stamp(task, get_user_id_or_ip())
Or a new lightweight API endpoint could be added that returns nothing and just refreshes the time stamp. It would be a bad idea to make /api/task/<task_id> set the time stamp, as that would break existing expectations and would not be backward compatible with older versions of Pybossa.js. For example, it is called in pybossa.saveTask, and you would not want to set the created stamp at the moment you are saving the TaskRun.
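As a rough sketch of what the lightweight option might look like, here is a framework-free version. Everything in it is hypothetical: InMemoryGuard stands in for the Redis-backed guard, and refresh_stamp and task_exists are names invented for this example, not existing Pybossa functions.

```python
import time


class InMemoryGuard:
    """Stand-in for the Redis-backed guard: keeps stamps in a dict."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.stamps = {}

    def stamp(self, task_id, user):
        self.stamps[(user, task_id)] = self.clock()


def refresh_stamp(guard, task_id, user, task_exists):
    """Refresh the stamp for (user, task) and return an empty response.

    Returns a (body, status) pair in the style of a Flask view:
    204 when the stamp was refreshed, 404 when the task is unknown.
    """
    if not task_exists(task_id):
        return "", 404
    guard.stamp(task_id, user)
    return "", 204
```

In Pybossa itself the body of such a view would presumably reduce to the same guard.stamp(task, get_user_id_or_ip()) call that task_presenter already makes, and Pybossa.js would call it at the moment the task is shown.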