workflow
Workflow dichotomy (having a graph will help you express parallel execution):
- Makefile-style dependency (reverse dependency graph)
  - Know your endpoint but not your beginning
  - Strong idempotency
  - Doesn't like to be dependent on data
- Forward-facing workflow graph
  - Know your beginning but not your endpoint (may have many choices)
  - Know your direction/velocity
For any defined abstraction layer:
- Only important that the contract is adhered to
- No implication that there are lower level abstraction layers
- May show a forward-looking vision of elegant lower level abstractions
chain :twitter_parse do
  wukong_rb 'parse_api.rb'
  pig       'uniq_and_unsplice.pig'
end
Wukong.workflow(:launch) do
  task :aim do
    # ...
  end
  task :enter do
  end
  task :commit do
    # ...
  end
end

Wukong.workflow(:recall) do
  task :smash_with_rock do
    # ...
  end
  task :reprogram do
    # ...
  end
end
Wukong workflows work somewhat differently than tools you may be familiar with, such as Rake.
In wukong, a stage corresponds to a resource; you can then act on that resource.
Consider first compiling a C program:
- to build the executable, run `cc -o cake eggs.o milk.o flour.o sugar.o -I./include -L./lib`
- to build files like `{file}.o`, run `cc -c -o {file}.o {file}.c -I./include`
In this case, you define the steps, implying the resources.
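The Make-style rules above can be written in Rake, the tool this contrasts with; this is a minimal illustrative sketch, not part of Wukong:

```ruby
# Rake sketch of the Make-style rules above (illustrative; not Wukong code).
# `file` declares a resource and its dependencies; `rule` teaches Rake how
# to build any '.o' from the matching '.c'. You define steps; the resources
# are implied by the task names.
require 'rake'
include Rake::DSL

file 'cake' => %w[eggs.o milk.o flour.o sugar.o] do |t|
  sh "cc -o #{t.name} #{t.prerequisites.join(' ')} -I./include -L./lib"
end

rule '.o' => '.c' do |t|
  sh "cc -c -o #{t.name} #{t.source} -I./include"
end
```

Running `rake cake` would walk the reverse dependency graph: each missing `.o` is built from its `.c` first, then the executable is linked.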
Something rake can't do (but we should be able to): make it so I can define a dependency that runs last
A run
is the event that ensues when you invoke a workflow. Invoking the `bake_pie` workflow at 01:20:55 on Jan 30, 2012 results in the `bake_pie-20120130012055` run.
A stage
is a data process having:
- one input: an array of length one called `inputs`. (later: multiple inputs, named inputs)
- one output, called `output` (later: multiple outputs, named outputs)
- (later) an error channel named `:error`.
Any stage can be invoked by name; only that stage is executed.
A chain
runs a sequence of stages, one after the other, in order. A chain is itself a stage; it has an array of sub-stages (called `steps`) that it will execute in order.
- the input to the chain becomes the input to the first stage, and the output of the last stage becomes the output of the chain.
You can of course invoke either the chain or one of its steps.
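The piping rule above can be sketched in plain Ruby; the `Chain` class here is illustrative, not Wukong's actual implementation:

```ruby
# Plain-Ruby sketch of chain semantics (illustrative; not Wukong's API):
# each stage is a callable, and the chain threads the output of each
# stage into the input of the next.
class Chain
  attr_reader :steps

  def initialize(*steps)
    @steps = steps
  end

  # The chain's input feeds the first stage; the last stage's output
  # becomes the chain's output.
  def call(input)
    steps.reduce(input) { |data, stage| stage.call(data) }
  end
end

parse = ->(lines) { lines.map(&:strip) }
uniq  = ->(lines) { lines.uniq }
Chain.new(parse, uniq).call([' a ', 'b', ' a '])  # => ["a", "b"]
```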
A shell_process
invokes the swineherd runner. It has:
- a hash of config variables
- (?ordered?) inputs
- one output, named `:output`, and an error channel named `:error`
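A shell-backed stage with one output and an error channel might look like the following; `ShellProcess` and its interface here are my invention for illustration, not swineherd's actual runner:

```ruby
require 'open3'

# Illustrative sketch (not swineherd's actual runner): a stage that shells
# out, exposing stdout as its single :output and stderr as the error channel.
class ShellProcess
  def initialize(cmd, config = {})
    @cmd    = cmd
    @config = config  # hash of config variables
  end

  # inputs: ordered list of arguments; returns the stage's output
  def call(inputs)
    out, err, status = Open3.capture3(@cmd, *inputs.map(&:to_s))
    raise "stage failed: #{err}" unless status.success?
    out
  end
end

ShellProcess.new('echo').call(['hello']).strip  # => "hello"
```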
By default, a stage’s inputs are specified by the outputs of its dependencies.
The output asset names are constructed from the stage's metadata. There is a small set of pathname templates (in fact, only one):
- Development mode output pathname template, built somehow from: %{user}, %{run_id}, %{session}, %{run_index}, %{prod|dev|test}
  (?implement a template that you think works; those are some possible ingredients we'll codify &/or fix?)
- (later) Automated mode output pathname template (used when deployment class is `prod` and `test`): `/%{project_path}/%{run_id}/%{transformed_stage_name}-%{deployment_class}`
  (just implement something sensible; we'll figure out the details)
  somehow: %{user}, %{session}, %{run_index}, %{prod|dev|test}, %{timestamp}
- project_path: a container for runs for the same purpose/project
- session: a temporally close, connected set of runs
- run_index: an auto-incremented counter for the runs
- deployment_class: the type of deployment instantiation. These may be used for more than one granularity of sets of runs.
- run_id: the time the run started, plus some other information to uniquely identify this specific invocation of the workflow. (?complete as you find natural?)
- timestamp: timestamp of the run; everything in this invocation will have the same timestamp.
- user: username; `ENV['USER']` by default
- sources: basenames of job inputs, minus extension, non-`\w` characters replaced with '_', joined by '-', max 50 chars.
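The `sources` rule above can be sketched as follows; `sources_slug` is my own implementation of the stated rule, not Wukong's code:

```ruby
# Sketch of the `sources` naming rule (illustrative implementation):
# basenames minus extension, non-word characters replaced with '_',
# joined by '-', truncated to 50 characters.
def sources_slug(inputs, max_len = 50)
  inputs.map { |path| File.basename(path, File.extname(path)).gsub(/\W/, '_') }
        .join('-')[0, max_len]
end

sources_slug(['/data/tweets 2012.json', '/data/users.tsv'])  # => "tweets_2012-users"
```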
Normally, one should not rename inputs and outputs. However, there are some (hopefully rare) cases where they may be renamed. Example cases include:
You can override the default input name to adapt to external processes:
- (show how)
- (make sure I can still inject an explicit name at execution time)
You can also inject an explicit name:
- (show how)
...
- handled by configliere: `nukes launch --launch_code=GLG20`
- TODO: configliere needs context-specific config vars, so I only get information about the `launch` action in the `nukes` job when I run `nukes launch --help`
- when files are generated or removed, relocate them to a timestamped location
- a file `/path/to/file.txt` is relocated to `~/.wukong/backups/path/to/file.txt.wukong-20110102120011`, where `20110102120011` is the job timestamp
- accepts a `max_size` param
- raises if it can't write to the directory -- must explicitly say `--safe_file_ops=false`
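The relocation rule can be sketched as follows; `backup_path` is an illustrative name and implementation, not Wukong's code:

```ruby
# Sketch of the safe-file-ops relocation rule above (illustrative, not
# Wukong's code): mirror the file's absolute path under ~/.wukong/backups
# and append the job timestamp.
def backup_path(path, timestamp)
  File.join(Dir.home, '.wukong', 'backups',
            "#{path.sub(%r{\A/}, '')}.wukong-#{timestamp}")
end

backup_path('/path/to/file.txt', '20110102120011')
# ends with ".wukong/backups/path/to/file.txt.wukong-20110102120011"
```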
each action
- the default action is `call`
- all stages respond to `nothing`, and, like ze goggles, do nothing.
- `clobber` -- run, but clear all dependencies
- `undo` --
- `clean` --
The primitives correspond closely with those of Rake and Chef. However, they extend them in many ways, fail to cover all of their functionality, and are incompatible in several ways.
The concrete swineherd runnables each have eponymous stage names.
Any simple Rake task should work as a swineherd flow:
- task
- desc
- namespace
- ... (flesh out)