pipeline Module

exception pipelet.pipeline.PipeException[source]

Bases: exceptions.Exception

Extension of the Exception class.

class pipelet.pipeline.Pipeline(seg_list, code_dir='./', prefix='./', sqlfile=None, matplotlib=False, matplotlib_interactive=False, env=pipelet.environment.Environment, permissive=False)[source]

A segment-oriented graph.

A pipeline is a list of code segments with their corresponding repository and dependency tree.

Setting matplotlib to True switches the matplotlib backend to Agg, which allows execution in a non-interactive environment.
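
As an illustration of what selecting the Agg backend involves, here is a minimal sketch using matplotlib's public API. The helper name `use_noninteractive_backend` is hypothetical and is not part of pipelet; how Pipeline applies the setting internally is not documented here.

```python
def use_noninteractive_backend():
    """Switch matplotlib to the Agg backend and return the active backend name.

    Returns None when matplotlib is not installed.
    Hypothetical helper for illustration, not part of pipelet.
    """
    try:
        import matplotlib
    except ImportError:
        return None
    # Agg renders to in-memory buffers and files, so no display is required.
    matplotlib.use("Agg")
    return matplotlib.get_backend()


backend = use_noninteractive_backend()
```

Selecting the backend before matplotlib.pyplot is imported is the conventional ordering.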

compute_hash()[source]

Compute the hashkey for all segments of the tree once and for all.
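
The idea of computing every key once can be sketched as follows. This is an assumed scheme, not pipelet's actual one: the function `compute_hashes`, the use of SHA-1, and the rule that a segment's key covers its own code plus its parents' keys are all illustrative choices.

```python
import hashlib


def compute_hashes(code, parents):
    """Return {segment: hex digest} for an acyclic dependency graph.

    code    -- {segment: source text} (assumed input format)
    parents -- {segment: [parent segments]}
    Each digest covers the segment's code and the digests of its parents,
    so a change upstream changes every key downstream.
    """
    hashes = {}

    def key(seg):
        if seg not in hashes:  # computed once, then reused
            h = hashlib.sha1(code[seg].encode("utf-8"))
            for parent in sorted(parents[seg]):
                h.update(key(parent).encode("ascii"))
            hashes[seg] = h.hexdigest()
        return hashes[seg]

    for seg in parents:
        key(seg)
    return hashes


parents = {"a": [], "b": ["a"]}
before = compute_hashes({"a": "x = 1", "b": "y = 2"}, parents)
after = compute_hashes({"a": "x = 3", "b": "y = 2"}, parents)
```

In this sketch, editing the code of `a` changes the keys of both `a` and `b`, while editing only `b` leaves the key of `a` untouched.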

connect(seg1, p, seg2)[source]

Connect two pipes.

First segment becomes parent of second segment.

find_seg(seg, expr)[source]

Find the segment whose name matches the regular expression expr.
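
Name matching of this kind can be sketched with the standard `re` module. The function `find_matching` is hypothetical, and whether pipelet anchors the pattern at the start of the name (as `re.match` does here) is an assumption.

```python
import re


def find_matching(names, expr):
    """Return the names that match the regular expression expr.

    Uses re.match, so the pattern is anchored at the start of each name.
    Hypothetical helper for illustration, not part of pipelet.
    """
    pattern = re.compile(expr)
    return [name for name in names if pattern.match(name)]


segments = ["first", "second", "third", "fourth"]
starting_with_f = find_matching(segments, r"f")
ending_with_th = find_matching(segments, r".*th$")
```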

flatten()[source]

Return a generator delivering segments in execution order.

Perform a topological sort of the dependency graph. For this to succeed the graph MUST be acyclic.

Uses the iterative algorithm from Kahn (1962); see http://en.wikipedia.org/wiki/Topological_sorting

>>> T = Pipeline('''first->second->fourth;
...                 third->fourth;''', permissive=True)
>>> for e in T.flatten():
...    print e
third
first
second
fourth
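
For readers unfamiliar with Kahn's algorithm, the following is a minimal standalone sketch. It is not pipelet's implementation: the function `kahn_order` and its input format (a dict mapping every node to the list of its parents) are assumptions made for illustration.

```python
from collections import deque


def kahn_order(parents):
    """Yield node names so that every node comes after all of its parents.

    parents -- {node: [parent nodes]}; every parent must also be a key.
    Raises ValueError if the graph contains a cycle.
    """
    # Number of parents of each node that have not been emitted yet.
    pending = {node: len(ps) for node, ps in parents.items()}
    # Invert the parent map to obtain the children of each node.
    children = {node: [] for node in parents}
    for node, ps in parents.items():
        for parent in ps:
            children[parent].append(node)
    # Start from the nodes without parents (sorted for a reproducible order).
    ready = deque(sorted(n for n, count in pending.items() if count == 0))
    emitted = 0
    while ready:
        node = ready.popleft()
        emitted += 1
        yield node
        for child in children[node]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if emitted != len(parents):
        raise ValueError("dependency graph contains a cycle")


graph = {"first": [], "second": ["first"], "third": [], "fourth": ["second", "third"]}
order = list(kahn_order(graph))  # ['first', 'third', 'second', 'fourth']
```

The order differs from the doctest output above because ties between independent segments are broken differently; any order in which parents precede their children is a valid topological sort.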

from_dot(s)[source]

Parse a subset of the graphviz dot language.

>>> p = Pipeline('')
>>> p.from_dot('a;')
>>> print p._parents == {'a':[]}, p._children == {'a':[]}
True True
>>> p.from_dot('''a->b;
... c->b;
... c->d;''')
>>> print p._children 
{'a': ['b'], 'c': ['b', 'd'], 'b': [], 'd': []}
>>> print p._parents
{'a': [], 'c': [], 'b': ['a', 'c'], 'd': ['c']}
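
A parser for such a subset can be sketched in a few lines. This is an illustrative reimplementation, not pipelet's code: the function `parse_dot_subset` is hypothetical, and it assumes statements are separated by semicolons or newlines and contain only node names joined by `->`.

```python
import re


def parse_dot_subset(text):
    """Return (parents, children) dicts for a chain-style dot description."""
    parents, children = {}, {}
    for statement in re.split(r"[;\n]", text):
        # 'a->b->c' becomes ['a', 'b', 'c']; empty fragments are dropped.
        names = [n.strip() for n in statement.split("->")]
        names = [n for n in names if n]
        for name in names:
            parents.setdefault(name, [])
            children.setdefault(name, [])
        # Each consecutive pair in the chain is one edge.
        for src, dst in zip(names, names[1:]):
            if dst not in children[src]:
                children[src].append(dst)
                parents[dst].append(src)
    return parents, children


parents, children = parse_dot_subset("a->b;\nc->b;\nc->d;")
```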

get_childrens(seg)[source]

Return a list of child segments.

get_curr_dir(seg)[source]

Return the segment directory.

get_data_dir(seg, prod=None)[source]

Return the data directory for the segment, or the full name of a product if prod is given.

get_full_seg_name(seg)[source]

Return segment full name (segment name + hashkey).

get_log_dir()[source]

Return the pipe log directory.

get_log_file(seg)[source]

Return the segment log filename.

get_meta_file(seg, prod=-1)[source]

Return the meta data filename.

This routine is called for segment meta data storage and task meta data storage.

In the first case, meta data are stored in the segment curr_dir.

In the second case, meta data are stored in the task directory (prod may be None).

get_param_file(seg)[source]

Return the segment parameter filename.

get_parents(seg, nophantom=False)[source]

Return a list of parent segments.

>>> a = Pipeline('''first->second->fourth
...                 third->fourth''',
...              permissive=True)
>>> print(a.get_parents('fourth'))
['second', 'third']

get_tag_file(seg)[source]

Return the segment tag filename.

push(**args)[source]

Add input to an orphan segment.

strseg(seg)[source]

Convert a segment to a string.

>>> p = Pipeline('a->b', permissive=True)
>>> print p.strseg('a')
a -> 
>>> print p.strseg('b')
a -> b -> 

to_dot(filename=None)[source]

Return a string representation of the graph.

Written using a subset of the graphviz dot syntax.

>>> p = Pipeline("")
>>> p.from_dot('''a->b->c;
... d->c;''')
>>> print p.to_dot()
digraph pipelet {
b -> c;
d -> c;
a -> b;
}
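
The serialization direction can be sketched in the same spirit. The function `to_dot_text` is hypothetical and takes a children dict like the one shown for from_dot; unlike the doctest above, it sorts the source nodes so that the output does not depend on dict ordering.

```python
def to_dot_text(children, name="pipelet"):
    """Return a dot 'digraph' string for a {node: [child nodes]} mapping."""
    # Nodes that appear as the target of at least one edge.
    has_parent = {dst for dsts in children.values() for dst in dsts}
    lines = ["digraph %s {" % name]
    for src in sorted(children):
        # Isolated nodes have no edge, so they are listed on their own.
        if not children[src] and src not in has_parent:
            lines.append("%s;" % src)
        for dst in children[src]:
            lines.append("%s -> %s;" % (src, dst))
    lines.append("}")
    return "\n".join(lines)


text = to_dot_text({"a": ["b"], "b": ["c"], "c": [], "d": ["c"]})
```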
