This document discusses bridging TensorFlow to run on Intel nGraph backends. It summarizes various optimization passes used in the nGraph-TensorFlow integration, including passes to liberate nodes from placement constraints, confirm placement, cluster the graph, and encapsulate clusters. Key points:
- NGraphLiberatePass and NGraphConfirmPass run during the PRE_PLACEMENT phase to handle nGraph placement
- NGraphClusterPass runs during POST_REWRITE_FOR_EXEC to cluster the graph into subgraphs, similar to XLA partitioning
- NGraphEncapsulatePass encapsulates clusters into NGraphEncapsulateOp nodes, analogous to XLA's use of _XlaLaunchOp
11. Bridge TensorFlow* to run on Intel® nGraph™ backends
https://github.com/NervanaSystems/ngraph-tf
https://github.com/NervanaSystems/ngraph-tf/tree/r0.4/
13. First, checking with cpu:
def test_cpu(self):
    with tf.Session() as sess:
        x = tf.placeholder(tf.float32, [2], name="x")
        with tf.device("cpu"):
            y = x * 2
        result = sess.run(y, {x: [1.5, 0.5]})
18. Next, changing the device to gpu:
def test_gpu(self):
    with tf.Session() as sess:
        x = tf.placeholder(tf.float32, [2], name="x")
        with tf.device("gpu"):
            y = x * 2
        result = sess.run(y, {x: [1.5, 0.5]})
31. Next, changing the device to NGRAPH:
def test_ngraph(self):
    with tf.Session() as sess:
        x = tf.placeholder(tf.float32, [2], name="x")
        with tf.device("NGRAPH"):
            y = x * 2
        result = sess.run(y, {x: [1.5, 0.5]})
39. src/ngraph_liberate_pass.cc
tf::Status LiberateNGraphPlacement(tf::Graph* graph) {
  int i = 0;
  for (auto node : graph->op_nodes()) {
    if (node->IsOp() && IsNGraphNode(node)) {
      std::vector<std::string> colo;
      if (tf::GetNodeAttr(node->attrs(), tf::kColocationAttrName, &colo) ==
          tf::Status::OK()) {
        for (auto& s : colo) {
          std::stringstream ss;
          ss << s << "/LIBERATED_" << (i++);
          s = ss.str();
        }
        node->ClearAttr(tf::kColocationAttrName);
        node->AddAttr(tf::kColocationAttrName, colo);
      }
    }
  }
  return tf::Status::OK();
}
NGraphLiberatePass
40. src/ngraph_liberate_pass.cc
// At graph construction time, TensorFlow likes to place colocation constraints
// that force variables onto the same device as their initializers. For nGraph
// this doesn't work very well, because we don't yet support RNG ops, and this
// results in randomly-initialized variables being forced onto the host.
//
// The workaround implemented here is to "liberate" nGraph-placed ops from
// colocation constraints. This pass only applies to nodes with a requested
// placement on NGRAPH, meaning that the graph will be unchanged except
// where the user has explicitly requested nGraph.
NGraphLiberatePass
41. src/ngraph_liberate_pass.cc
// General algorithm:
//
// i := 0
// For each node n in the graph:
// If n has been placed on device NGRAPH:
// For each colocation constraint s on n:
// Append the string ("/LIBERATED_" + i) to s
// i++
//
// (Note that simply blanking out the colocation constraints does not work,
// because this causes the placer to act as if the node is subject to an
// eponymous colocation constraint, which happens to be exactly the name that
// the variable construction stuff will assign to it anyway.)
NGraphLiberatePass
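The general algorithm above can be sketched in a few lines of Python. This is only an illustration on plain dictionaries, not the ngraph-tf code: the node layout (`device` and `colocation` keys) and the function name `liberate_ngraph_placement` are assumptions made for the example.

```python
def liberate_ngraph_placement(nodes):
    """Make every colocation constraint on an NGRAPH-placed node unique.

    `nodes` is a list of dicts (hypothetical layout for this sketch):
    {"name": str, "device": str, "colocation": [str, ...]}.
    """
    counter = 0
    for node in nodes:
        # Only nodes explicitly requested on NGRAPH are touched.
        if node.get("device") != "NGRAPH":
            continue
        # Nodes without a colocation attribute are left as they are.
        if "colocation" not in node:
            continue
        liberated = []
        for constraint in node["colocation"]:
            # Appending a unique suffix means the constraint no longer
            # matches the constraint carried by any other node.
            liberated.append(f"{constraint}/LIBERATED_{counter}")
            counter += 1
        node["colocation"] = liberated
    return nodes


example = [
    {"name": "var", "device": "NGRAPH", "colocation": ["loc:@var"]},
    {"name": "init", "device": "", "colocation": ["loc:@var"]},
]
liberate_ngraph_placement(example)
```

After the call, only the NGRAPH-placed node has a renamed constraint, so the placer no longer groups it with its initializer.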
44. src/ngraph_confirm_pass.cc
// In some cases, we require more complex placement constraints than
// TensorFlow's native "soft-placement" machinery is capable of handling. To
// handle this, we insert a pass called the "confirmation" pass during the
// pre-placement phase.
// For example, we can only handle Reshape if the "shape" input is a constant,
// so this is okay:
//
// ... Const[2,4,2]
// /
// Reshape (1)
//
// but this is not:
//
// ... Placeholder
// /
// Reshape (2)
NGraphConfirmPass
45. src/ngraph_confirm_pass.cc
// We want to reject placement of Reshape on NGRAPH for the second graph, but
// allow it for the first. We also want to attach some more metadata to the
// Reshape node so that we can remember the requested output shape even if the
// Const node winds up being placed in a different subgraph.
//
// This pass exploits a feature of the placement engine that allows a kernel
// builder registration request to restrict use of the kernel to nodes that
// have a particular value set for the "_kernel" attribute. In this case, we
// will check every node that has a requested placement on NGRAPH, and make
// sure that it conforms to certain (op-dependent) constraints. If the
// constraints are satisfied, we will tag the node with a "_kernel" value of
// "ngraph", along with some op-specific metadata (if applicable). The stub
// kernels, in turn, are registered with the constraint that _kernel="ngraph".
// This means that during the placement pass, our kernels will not be allowed
// for nodes we did not mark during this pass, and placement will fall back on
// CPU.
NGraphConfirmPass
46. src/ngraph_confirm_pass.cc
// Taking Reshape as an example, the pass ensures that the "shape" input is
// constant, and if so, it adds to the Reshape node the "_kernel=ngraph"
// attribute, along with some metadata recording the value of the constant.
// Thus graph (1) is transformed as follows:
//
// ... Const[2,4,2][_kernel="ngraph"]
// /
// Reshape[_kernel="ngraph",
// _ngraph_reshape_static_shape={2,4,2}]
//
// while graph (2) would be left unchanged, meaning that soft placement will
// fall back on non-nGraph implementations.
NGraphConfirmPass
47. src/ngraph_confirm_pass.cc
// Internally, there are two pieces. The first is a type constraint checker,
// which supplants the type checking machinery usually used with
// REGISTER_KERNEL_BUILDER. This ensures that any constraints on the data types
// of input tensors are satisfied---for example, we do not support DT_STRING.
// The second part is a set of finer-grained per-op checks called "confirmation
// functions", implementing more specific checks like the one described for
// Reshape above.
//
// The confirmation functions are implemented as callbacks of the type:
//
// std::function<tf::Status(tf::Node*, bool*)>.
NGraphConfirmPass
48. src/ngraph_confirm_pass.cc
// A confirmation function returns true/false by reference through its second
// parameter: true if placement is "accepted", and false if it is "rejected".
// For example, the confirmation function for "Reshape" will return true
// for (1) above, and false for (2).
//
// A confirmation function can also, as a side effect, add attributes to the
// node being checked, which can be used later in ngraph_builder. (Note that in
// general such attributes will need to start with "_" to mark them as
// "internal" or "system" attributes, as otherwise TensorFlow attempts to
// validate them against the op schema.)
NGraphConfirmPass
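The confirmation mechanism described on the slides above can be modelled roughly as follows. This is a hedged sketch on plain dictionaries: the real callbacks return a tf::Status and report accept/reject through a bool pointer, whereas here the function simply returns a bool. The graph layout, the names `confirm_reshape`, `CONFIRMATION_FUNCTIONS`, and `confirm_placement`, and the choice to reject ops with no registered function are all assumptions made for the example.

```python
def confirm_reshape(node, graph):
    """Accept Reshape only when its "shape" input (input 1) is a Const."""
    shape_input = graph[node["inputs"][1]]
    if shape_input["op"] != "Const":
        return False
    # Side effect: remember the static shape in an internal ("_"-prefixed)
    # attribute so it survives even if the Const ends up in another subgraph.
    node["attrs"]["_ngraph_reshape_static_shape"] = list(shape_input["value"])
    return True


# Hypothetical registry: op type -> confirmation function.
CONFIRMATION_FUNCTIONS = {
    "Const": lambda node, graph: True,
    "Reshape": confirm_reshape,
}


def confirm_placement(graph):
    """Tag accepted NGRAPH-requested nodes with _kernel="ngraph"."""
    for node in graph.values():
        if node.get("device") != "NGRAPH":
            continue
        confirm = CONFIRMATION_FUNCTIONS.get(node["op"])
        # Assumption of this sketch: ops without a registered function
        # are rejected and left for soft placement to handle.
        if confirm is not None and confirm(node, graph):
            node["attrs"]["_kernel"] = "ngraph"
    return graph


def make_graph(shape_op):
    """Build graph (1) with shape_op="Const" or graph (2) with "Placeholder"."""
    return {
        "data": {"op": "Placeholder", "device": "", "inputs": [], "attrs": {}},
        "shape": {"op": shape_op, "device": "NGRAPH", "inputs": [],
                  "attrs": {}, "value": [2, 4, 2]},
        "reshape": {"op": "Reshape", "device": "NGRAPH",
                    "inputs": ["data", "shape"], "attrs": {}},
    }


accepted = confirm_placement(make_graph("Const"))
rejected = confirm_placement(make_graph("Placeholder"))
```

In `accepted`, the Reshape node carries both `_kernel` and `_ngraph_reshape_static_shape`; in `rejected`, the Reshape node is left untouched.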
54. src/ngraph_cluster.h
class NGraphEncapsulatePass : public tensorflow::GraphOptimizationPass {
 public:
  tf::Status Run(const tf::GraphOptimizationPassOptions& options) {
    if (std::getenv("NGRAPH_TF_SKIP_ENCAPSULATION") != nullptr) {
      NGRAPH_VLOG(0)
          << "NGRAPH_TF_SKIP_ENCAPSULATION is set. Skipping encapsulation "
             "step.";
      return tf::Status::OK();
    }
    return EncapsulateFunctions(options.graph->get());
  }
  …….
};
NGraphEncapsulatePass
55. src/ngraph_encapsulate_pass.cc
tf::Status EncapsulateFunctions(tf::Graph* graph) {
// Pass 1: Populate the cluster-index-to-device name map for each existing
// cluster.
// Pass 2: Find all nodes that are feeding into/out of each cluster, and
// add inputs for them to the corresponding FunctionDef(s).
// (Note: if a cluster has inputs, input nodes are added?)
NGraphEncapsulatePass::EncapsulateFunctions
56. src/ngraph_encapsulate_pass.cc
// Pass 2: Find all nodes that are feeding into/out of each cluster, and
// add inputs for them to the corresponding FunctionDef(s).
// (Note: if a cluster has inputs, input nodes / output nodes are added?)
auto new_input_node_def =
    NGraphClusterManager::GetClusterGraph(dst_cluster_idx)->add_node();
new_input_node_def->set_name(new_input_name);
new_input_node_def->set_op("_Arg");
SetAttrValue(dt, &((*(new_input_node_def->mutable_attr()))["T"]));
SetAttrValue(arg_index_count[dst_cluster_idx],
             &((*(new_input_node_def->mutable_attr()))["index"]));
NGraphEncapsulatePass::EncapsulateFunctions
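A rough Python model of this step may help: for every data edge that enters a cluster from outside, an `_Arg` node with a type attribute `T` and a running `index` is added to that cluster's graph. The edge tuple layout, the `cluster_of` mapping, the generated names such as `ngraph_input_0`, and the function name `add_cluster_inputs` are hypothetical. The sketch also creates one `_Arg` per crossing edge without any de-duplication, which may differ from the actual pass.

```python
from collections import defaultdict


def add_cluster_inputs(edges, cluster_of):
    """Create one _Arg node description per edge that enters a cluster.

    edges:      list of (src_node, dst_node, dtype) tuples.
    cluster_of: dict mapping node name -> cluster index (unclustered
                nodes are simply absent).
    Returns a dict: cluster index -> list of _Arg node descriptions.
    """
    arg_index_count = defaultdict(int)   # next free index per cluster
    cluster_args = defaultdict(list)
    for src, dst, dtype in edges:
        dst_cluster = cluster_of.get(dst)
        # Skip edges into unclustered nodes and edges inside one cluster.
        if dst_cluster is None or cluster_of.get(src) == dst_cluster:
            continue
        index = arg_index_count[dst_cluster]
        cluster_args[dst_cluster].append({
            "name": f"ngraph_input_{index}",  # hypothetical naming scheme
            "op": "_Arg",
            "attr": {"T": dtype, "index": index},
        })
        arg_index_count[dst_cluster] += 1
    return dict(cluster_args)


edges = [
    ("x", "mul", "float32"),    # enters cluster 0
    ("two", "mul", "float32"),  # enters cluster 0
    ("mul", "add", "float32"),  # internal to cluster 0
    ("add", "out", "float32"),  # leaves cluster 0
]
args = add_cluster_inputs(edges, {"mul": 0, "add": 0})
```

Here `args[0]` holds two `_Arg` descriptions with indices 0 and 1, one for each edge that crosses into the cluster.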
58. src/ngraph_encapsulate_pass.cc
// Pass 4: Remap all non-clustered inputs that are reading from
// encapsulated edges, and all control edges that cross cluster
// boundaries.
// Pass 5: Make copies of all clustered nodes inside the cluster graphs,
// rewiring the inputs in their NodeDefs as we go.
// Pass 6: Remove clustered nodes from the graph.
NGraphEncapsulatePass::EncapsulateFunctions
59. src/ngraph_encapsulate_pass.cc
// Pass 7 (optional, only run if environment variable
// NGRAPH_TF_VALIDATE_CLUSTER_GRAPHS is set):
// validate the graph def, and make sure we can construct a graph from it.
NGraphEncapsulatePass::EncapsulateFunctions
72. src/ngraph_builder.cc
// Output part
vector<shared_ptr<ng::Node>> ng_result_list(tf_ret_vals.size());
for (auto n : tf_ret_vals) {
  tf::Node* tf_input_node;
  n->input_node(0, &tf_input_node);
  int index;
  tf::GetNodeAttr(n->attrs(), "index", &index);
  auto item = ng_op_map.find(tf_input_node->name());
  if (item != ng_op_map.end()) {
    ng_result_list[index] = item->second;
  } else {
    return tf::errors::InvalidArgument("Cannot find return node: ",
                                       tf_input_node->name());
  }
}
Builder::TranslateGraph
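The output handling above boils down to placing each translated node into the result list at the position given by the return node's `index` attribute. A minimal Python sketch of that idea follows; the dictionary layout and the name `collect_results` are assumptions for illustration, and a Python exception stands in for the `InvalidArgument` status.

```python
def collect_results(ret_vals, op_map):
    """Order translated ops by the "index" attribute of each return node.

    ret_vals: list of dicts {"input": producer_name, "index": int}
              (hypothetical stand-in for the TensorFlow return nodes).
    op_map:   dict mapping a TensorFlow node name to its translated op.
    """
    results = [None] * len(ret_vals)
    for ret in ret_vals:
        producer = ret["input"]
        if producer not in op_map:
            # Mirrors the InvalidArgument error in the C++ excerpt.
            raise ValueError(f"Cannot find return node: {producer}")
        results[ret["index"]] = op_map[producer]
    return results


ordered = collect_results(
    [{"input": "add", "index": 1}, {"input": "mul", "index": 0}],
    {"mul": "ng_mul", "add": "ng_add"},
)
```

The return nodes can be visited in any order, because the `index` attribute alone decides where each result lands.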
73. src/ngraph_builder.cc
vector<shared_ptr<ng::Node>> ng_result_list(tf_ret_vals.size());
for (auto n : tf_ret_vals) {
  tf::Node* tf_input_node;
  n->input_node(0, &tf_input_node);
  int index;
  tf::GetNodeAttr(n->attrs(), "index", &index);
  auto item = ng_op_map.find(tf_input_node->name());
  if (item != ng_op_map.end()) {
    ng_result_list[index] = item->second;
  }
}
// Pointer to the (nGraph) function
ng_function = make_shared<ng::Function>(ng_result_list, ng_parameter_list);
return tf::Status::OK();
}
Builder::TranslateGraph
74. src/ngraph_builder.cc
// Now create the nGraph ops from TensorFlow ops.
//
for (auto op : tf_ops) {
  NGRAPH_VLOG(2) << "Constructing op " << op->name() << " which is "
                 << op->type_string();
  // NOTE: The following cases should be kept in alphabetical order.
  // (Note: the handling for each of the various ops goes here.)
}
Ops supported by Builder::TranslateGraph
84. Switch to deviceless (#117)
Large PR ("never again", I tell myself) to implement "deviceless" support for
nGraph. To make a long story short:
* The `NGRAPH` device goes away.
* `NGraphEncapsulateOp` now runs on the `CPU` device (no more sends/recvs)
* No more stub kernels or copied implementations of TF core ops like `Enter`/`Exit`
* Clustering, encapsulation, etc. is moved to an all-at-once pass in
`POST_REWRITE_FOR_EXEC` (so the weirdness we've seen where a confirmed
op gets rewritten without required attributes will not happen anymore).
https://github.com/NervanaSystems/ngraph-tf/commit/ddba671ba23dda4e4e0f6e045936e05a624bb962