How to Implement Query Modules?

We are going to examine how the query module example is implemented using the C API and the Python API. Both query modules can be found in the /usr/lib/memgraph/query_modules directory.

Using Docker With Query Modules

If you are using Docker to run Memgraph you will have to create a volume and mount it to access the query_modules directory. This can be done by creating an empty directory ~modules and executing the following command:

docker volume create --driver local --opt type=none --opt device=~modules --opt o=bind modules

Now, you can start Memgraph and mount the created volume:

docker run -it --rm -v modules:/usr/lib/memgraph/query_modules -p 7687:7687 memgraph

Everything from the directory /usr/lib/memgraph/query_modules will be visible/editable in your mounted modules volume and vice versa.

Python API

Query modules can be implemented using the Python API provided by Memgraph. If you wish to write your own query modules using the Python API, you need to have Python version 3.5.0 or above installed.

Let's take a look at the py_example.py file.

import mgp

On the first line, we import the mgp module, which contains definitions of the public Python API provided by Memgraph. In essence, this is a wrapper around the C API described in the next section. This file (mgp.py) can be found in the Memgraph installation directory, under python_support. On the standard Debian installation, this will be under /usr/lib/memgraph/python_support.

Next, we have a procedure function. This function will serve as the callback for our py_example.procedure invocation through openCypher.

@mgp.read_proc
def procedure(context: mgp.ProcCtx,
required_arg: mgp.Nullable[mgp.Any],
optional_arg: mgp.Nullable[mgp.Any] = None
) -> mgp.Record(args=list,
vertex_count=int,
avg_degree=mgp.Number,
props=mgp.Nullable[mgp.Map]):
‚Äč
...

This procedure needs to be callable which optionally takes ProcCtx as the first argument. Other arguments will be bound to values passed in the cypher query. The full signature of this procedure needs to be annotated with types. The return type must be Record(field_name=type, ...) and the procedure must produce either a complete Record or None. As you can see, the procedure is passed to a read_proc decorator which handles read-only procedures. You can also inspect the definition of said decorator in the mgp.py file or take a look at the Python API reference guide.

In our case, the example procedure returns 4 fields:

  • args: a copy of arguments passed to the procedure.

  • vertex_count: number of vertices in the database.

  • avg_degree: average degree of vertices.

  • props: properties map of the Vertex or Edge object passed in required_arg.

    In case a Path instance is passed, the procedure returns the properties map of

    the starting vertex.

This procedure can be invoked in openCypher as follows:

MATCH (n) WITH n LIMIT 1 CALL py_example.procedure(n, 1) YIELD * RETURN *;

The following lines create the properties map for a received Edge, Vertex or Path instance:

if isinstance(required_arg, (mgp.Edge, mgp.Vertex)):
props = dict(required_arg.properties.items())
elif isinstance(required_arg, mgp.Path):
start_vertex, = required_arg.vertices
props = dict(start_vertex.properties.items())

As you can see, in the case of mgp.Edge and mgp.Vertex, we obtain an instance of mgp.Properties class which holds the respective properties by accessing the properties property of our mgp.Edge or mgp.Vertex instance. Once we have access to mgp.Properties instance, we can simply invoke the items() method which returns an Iterable that contains mgp.Property objects. Since the type of mgp.Property is a simple collections.namedtuple containing name and value, we can easily pass it to a dict constructor.

We go on to counting the number of vertices and edges in our graph:

vertex_count = 0
edge_count = 0
for v in context.graph.vertices:
vertex_count += 1
edge_count += sum(1 for e in v.in_edges)
edge_count += sum(1 for e in v.out_edges)

As you can see, we can access the mgp.Graph instance through context.graph. This instance contains the state of our database when executing the cypher query that called our procedure. A mgp.Graph instance has a property vertices which allows us to access a mgp.Vertices object which can be iterated upon.

Similarly, each mgp.Vertex object has in_edges and out_edges properties which allow us to iterate over the corresponding mgp.Edge objects. The rest of the code logic from the previous snippet is self-explanatory, we simply increase the adequate variables on each traversed vertex or edge.

After that we calculate the average degree and obtain a copy of the passed arguments:

avg_degree = 0 if vertex_count == 0 else edge_count / vertex_count
args_copy = [copy.deepcopy(required_arg), copy.deepcopy(optional_arg)]

Finally, we return a mgp.Record with all the calculated values:

return mgp.Record(args=args_copy, vertex_count=vertex_count,
avg_degree=avg_degree, props=props)

In conclusion, Python API provided by Memgraph can be a very powerful, yet simple tool when implementing query modules. Therefore, we strongly suggest that all users thoroughly inspect the mgp.py source file.

NOTE: You should not globally store any graph elements when writing your own query modules with the intent to use them in a different procedure invocation.

C API

Query modules can be implemented using the C API provided by Memgraph. Such modules need to be compiled to a shared library so that they can be loaded when Memgraph starts. This means that you can write the procedures in any programming language which can work with C and can be compiled to the ELF shared library format.

WARNING: If your programming language of choice throws exceptions, these exceptions must never leave the scope of your module! You should have a top-level exception handler which returns with an error value and potentially logs the error message. Exceptions which cross the module boundary will cause all sorts of unexpected issues.

Let's take a look at the example.c file.

#include "mg_procedure.h"

On the first line, we include mg_procedure.h, which contains declarations of all functions that can be used to implement a query module procedure. This file is found in the Memgraph installation directory, under include/memgraph. On the standard Debian installation, this will be under /usr/include/memgraph. To compile the module, you will have to pass the appropriate flags to the compiler. For example, using clang:

clang -Wall -shared -fPIC -I /usr/include/memgraph example.c -o example.so

Next, we have a procedure function. This function will serve as the callback for our example.procedure invocation through openCypher.

static void procedure(const struct mgp_list *args, const struct mgp_graph *graph,
struct mgp_result *result, struct mgp_memory *memory) {
...
}

If this were C++ you'd probably write the function as such:

namespace {
void procedure(const mgp_list *args, const mgp_graph *graph,
mgp_result *result, mgp_memory *memory) {
try {
...
} catch (const std::exception &e) {
// We must not let any exceptions out of our module.
mgp_result_set_error_msg(result, e.what());
return;
}
}
}

The procedure function will receive the list of arguments (args) which are passed in the query. The parameter result is used to fill in the resulting records of the procedure. Parameters graph and memory are context parameters of the procedure, and they are used in some parts of the provided C API. For more information on what exactly is possible via C API, take a look at the mg_procedure.h file or at the C API reference guide, as well as the example.c found in /usr/lib/memgraph/query_modules/src

Then comes the required mgp_init_module function. Its primary purpose is to register procedures which can then be invoked through openCypher. Although the example registers a single procedure, you can register multiple different procedures in a single module. Each of these can be invoked using CALL <module>.<procedure> ... syntax. The <module-name> will correspond to the name of the shared library. Since we compile our example into example.so, then the module is called example. Procedure names can be different than their corresponding implementation callbacks because the procedure name is defined when registering a procedure.

int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
// Register our `procedure` as a read procedure with the name "procedure".
struct mgp_proc *proc =
mgp_module_add_read_procedure(module, "procedure", procedure);
// Return non-zero on error.
if (!proc) return 1;
// Additional code for better specifying the procedure (omitted here).
...
// Return 0 to indicate success.
return 0;
}

The omitted part specifies the signature of the registered procedure. The signature specification states what kind of arguments a procedure accepts and what will be the resulting set of the procedure. For information on signature specification API, take a look at mg_procedure.h file and read the documentation on functions prefixed with mgp_proc_.

The passed in memory argument is only alive throughout the execution of mgp_init_module, so you must not allocate any global resources with it. If you really need to set up some global state, you may do so in the mgp_init_module but using the standard global allocators.

Consequently, you may want to reset any global state or release global resources in the following function.

int mgp_shutdown_module() {
// Return 0 to indicate success.
return 0;
}

As previously mentioned, no exceptions should leave your module. If you are writing the module in a language that throws them, you probably want exception handlers in mgp_init_module and mgp_shutdown_module as well.