Memgraph enables easier development and production serving of your machine learning models based on graph data by allowing you to query Memgraph directly from TensorFlow using the Memgraph TensorFlow op.
A TensorFlow op (operation) is a fundamental building block of all TensorFlow models. Memgraph TensorFlow op wraps the high-performance Memgraph client for use with TensorFlow, allowing natural data transfer between Memgraph and TensorFlow at any point in the model.
See the TensorFlow Graphs and Sessions guide for more information.
Memgraph TensorFlow op API consists of inputs, attributes, and outputs.
There are two inputs:
- query
- input list
query is a string and represents an
Memgraph TensorFlow op has some limitations on the query.
The input list is a query parameter. The name of this parameter is $input_list.
Let's see one simple example:
Query execution replaces $input_list with the provided op input (see the Python example for more).
$input_list is the only query parameter used by Memgraph TensorFlow op.
Memgraph TensorFlow op attributes:
use_ssl are attributes used for
connecting to Memgraph. The only different attribute is output_dtype.
output_dtype has no default value and determines the type of the output tensor. Note that all data in the output tensor must be of the same type.
output_dtype can be
TensorFlow op does not support other output types.
Memgraph TensorFlow op has two outputs:
The header is a string list. The list contains headers provided by query execution:
Rows data represents the query result as a matrix of dimension |rows| x |headers|.
If the query returns no results (an empty set), the matrix has dimension 0 x 0.
Let's see the following example:
This query returns
n.id and list of
Memgraph TensorFlow op returns a matrix, so all elements in the matrix must be of the same type.
Memgraph TensorFlow op expands each list into its row, so the matrix dimension is |rows| x |(standard headers + sum of list sizes)|.
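The list expansion described above can be sketched in plain Python. This is only an illustration of the resulting matrix shape, not the op's implementation: the function names, the header names `features` and `neighbors`, and the zero-based row index in the error message are assumptions made for the example. The size check mirrors the documented "List has wrong size" error.

```python
def expand_row(row):
    """Flatten one result row: scalars are kept, lists are spliced in place."""
    flat = []
    for value in row:
        if isinstance(value, list):
            flat.extend(value)
        else:
            flat.append(value)
    return flat


def expand_rows(headers, rows):
    """Expand every row, checking that corresponding lists have equal sizes."""
    expected_sizes = None
    matrix = []
    for row_index, row in enumerate(rows):
        # None marks a scalar column; an int is the size of a list column.
        sizes = [len(v) if isinstance(v, list) else None for v in row]
        if expected_sizes is None:
            expected_sizes = sizes
        for header, size, expected in zip(headers, sizes, expected_sizes):
            if size != expected:
                raise ValueError(
                    f"List has wrong size, row: {row_index}, header: {header}"
                )
        matrix.append(expand_row(row))
    return matrix


# Hypothetical result: one standard header and two list headers.
headers = ["n.id", "features", "neighbors"]
rows = [
    [1, [10, 11, 12], [2, 3]],
    [2, [20, 21, 22], [1, 3]],
]
matrix = expand_rows(headers, rows)
# 2 rows, each with 1 standard header + (3 + 2) list elements = 6 columns.
print(matrix)  # [[1, 10, 11, 12, 2, 3], [2, 20, 21, 22, 1, 3]]
```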
Memgraph TensorFlow op also supports more than one list in the output:
Input list (
$input_list) can contain only elements of
The output matrix contains only elements with the same data type.
The data type can be
Null is not allowed in matrix output.
The string data type is an exception. In this case, the query result
can contain different types, and all data is converted into strings.
Be careful here, because converting data to strings
and back can hurt performance.
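The type rules above can be illustrated with a small plain-Python sketch. It is not the op's code: Python types stand in for output_dtype, the function name `build_matrix` is hypothetical, and the error messages are simplified versions of the documented ones.

```python
def build_matrix(rows, output_dtype):
    """Validate rows against one element type, or convert everything to str."""
    matrix = []
    for row in rows:
        out_row = []
        for value in row:
            if value is None:
                # Null is not allowed in matrix output.
                raise ValueError("Null is not allowed in matrix output")
            if output_dtype is str:
                # String output accepts mixed types and converts each value.
                out_row.append(str(value))
            elif type(value) is not output_dtype:
                raise TypeError(
                    f"Wrong type: {type(value).__name__} ({value!r})"
                )
            else:
                out_row.append(value)
        matrix.append(out_row)
    return matrix


mixed = [[1, "a", 2.5], [2, "b", 3.5]]
print(build_matrix(mixed, str))  # [['1', 'a', '2.5'], ['2', 'b', '3.5']]
```

Requesting a non-string type for the same mixed rows, for example `build_matrix(mixed, int)`, raises a `TypeError` in this sketch, which corresponds to the "Wrong type" error the op reports.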
If the query contains a list as output, the list expands into the row. All corresponding lists must have the same size.
Memgraph TensorFlow op reports internal errors:
- Cannot create a client instance
  - Memgraph TensorFlow op is missing the system resources needed to create a connection to Memgraph.
- Cannot connect to Memgraph: \<message>
  - Connection issue (wrong host name, wrong port, SSL problem, ...).
- Query error: \<message>
  - The query is not valid.
- Internal error: \<message>
  - A non-specific error occurred during communication between the op and Memgraph.
- List has wrong size, row: \<row>, header: \<header>
  - An output list has the wrong size. The size must be the same for all corresponding lists.
- Wrong type: \<header> = \<type> (\<value>)
  - The matrix output contains an element with the wrong data type.
Memgraph Parallel TensorFlow Op speeds up queries in a data-parallel way. The parallelization is done by splitting the input list into chunks, running the query on each chunk independently, and concatenating the results into a single tensor.
The inputs, outputs, and errors are all equivalent to the regular Memgraph TensorFlow Op, except that the parallel op has one additional attribute, num_workers.
num_workers determines how many parallel connections to Memgraph the parallel
TensorFlow Op will maintain and into how many chunks the input list is broken.
Under the hood, the Parallel TensorFlow Op runs each of your queries as several
independent queries. The exact number matches the num_workers attribute.
Your input list is split into chunks, such that every worker gets a chunk of approximately equal size. The only way to utilize parallelism is to use input lists.
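A minimal sketch of such a split, in plain Python, is shown here. The exact chunking strategy used by the op is not specified in this document, so contiguous chunks with the leftover elements going to the first workers is an assumption; it does agree with the five-element, two-worker case discussed in this section (chunk sizes three and two). The function name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(input_list, num_workers):
    """Split input_list into num_workers contiguous, near-equal chunks."""
    base, extra = divmod(len(input_list), num_workers)
    chunks = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers each take one additional element.
        size = base + (1 if worker < extra else 0)
        chunks.append(input_list[start:start + size])
        start += size
    return chunks


print(split_into_chunks([1, 2, 3, 4, 5], 2))  # [[1, 2, 3], [4, 5]]
```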
Since the queries are independent, the queries' semantics can change depending on the number of workers. Running with a single worker is semantically equivalent to using the regular Memgraph TensorFlow Op. With multiple workers, any query that assumes it sees all the results is likely to produce unexpected results.
For example, a query that sorts results will only sort results within its chunk.
If this is the result of an imaginary query with
num_workers = 1:
This might be the result with
num_workers = 2:
The first worker is assigned a chunk of size three and the second worker a
chunk of size two.
Hence the first three elements are sorted among themselves and the last
two elements are sorted among themselves, but the entire result is not sorted.
A query with a limit clause only limits the results within its
chunk, so the total result might have up to
num_workers * limit rows.
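The effect of per-chunk sorting and limiting can be simulated in plain Python. This is a sketch under stated assumptions, not the op itself: `sort_query` and `limit_query` are hypothetical stand-ins for a query with ordering and a query with a limit, and the contiguous chunking is assumed.

```python
def run_in_chunks(input_list, num_workers, query):
    """Apply query to each chunk independently and concatenate the results."""
    base, extra = divmod(len(input_list), num_workers)
    results = []
    start = 0
    for worker in range(num_workers):
        size = base + (1 if worker < extra else 0)
        results.extend(query(input_list[start:start + size]))
        start += size
    return results


def sort_query(chunk):
    """Stand-in for a query that sorts its results."""
    return sorted(chunk)


def limit_query(chunk, limit=2):
    """Stand-in for a query that limits its results to `limit` rows."""
    return chunk[:limit]


ids = [5, 3, 4, 1, 2]
print(run_in_chunks(ids, 1, sort_query))   # [1, 2, 3, 4, 5]
print(run_in_chunks(ids, 2, sort_query))   # [3, 4, 5, 1, 2]
print(run_in_chunks(ids, 1, limit_query))  # [5, 3]
print(run_in_chunks(ids, 2, limit_query))  # [5, 3, 1, 2]
```

With two workers, the sorted output is ordered only within each chunk, and the limited output has four rows (2 workers * limit of 2) instead of two.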
A condition such as WHERE something IN $input_list will cause unexpected results.
The parallel Memgraph TensorFlow op is best used when the input list is full of "ids" of nodes to be found and something independent has to be done for each found node, such as returning its features or its neighbors.