This article is a part of a series intended to show users how to use Memgraph on real-world data and, by doing so, retrieve some interesting and useful information.
We highly recommend checking out the other articles from this series which are listed in our tutorial overview section.
This example shows how to implement a simple recommendation system with openCypher in Memgraph. First, we will show how to perform simple operations, and then we will implement a query for movie recommendation.
In this example, we will use a reduced MovieLens dataset (fewer than 1000 movies).
There are three different types of nodes: movies, users, and genres. The queries in this tutorial use a movie's `title`, a user's `name`, and a genre's `name`.
Each movie can be connected to different genres with an `:ofGenre` edge. A user can rate a movie. A rating is modeled as an edge between the user and the movie, and this edge has a `score` property, a float between 0 and 5.
To import the dataset, download the Memgraph Lab desktop application and navigate to the Datasets tab in the sidebar. From there, choose the dataset "MovieLens: Movies, genres and users" and continue with the tutorial.
1) List first 10 movies sorted by title
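A minimal sketch of such a query, assuming movies carry a `:Movie` label and a `title` property (the label is an assumption, not confirmed by the text above):

```cypher
// Assumption: movie nodes are labeled :Movie and have a title property.
MATCH (m:Movie)
RETURN m.title AS title
ORDER BY title
LIMIT 10;
```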
2) List last 15 users sorted by name
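One way to read "last 15" is to sort names in descending order and keep the first 15 rows. The sketch below assumes a `:User` label and a `name` property:

```cypher
// Assumption: user nodes are labeled :User and have a name property.
// Descending order puts the alphabetically last names first.
MATCH (u:User)
RETURN u.name AS name
ORDER BY name DESC
LIMIT 15;
```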
3) List 10 movies that have Comedy and Action genres and sort them by title
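A possible query, assuming the genre names are stored exactly as "Comedy" and "Action" and that `:ofGenre` edges point from a movie to a genre:

```cypher
// Two MATCH clauses on the same variable m keep only movies
// that are connected to both genres.
MATCH (m:Movie)-[:ofGenre]->(:Genre {name: "Comedy"})
MATCH (m)-[:ofGenre]->(:Genre {name: "Action"})
RETURN m.title AS title
ORDER BY title
LIMIT 10;
```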
4) Uniqueness constraint for genre:
Let's create a new unique constraint:
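In Memgraph, a uniqueness constraint on a label and property can be declared as sketched below, assuming genres use the `:Genre` label and the `name` property:

```cypher
// Every :Genre node must have a distinct name after this constraint is created.
CREATE CONSTRAINT ON (g:Genre) ASSERT g.name IS UNIQUE;
```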
Now we can try to create a new `Genre` node with the existing name "Comedy":
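A sketch of the offending statement, under the same `:Genre` and `name` assumptions:

```cypher
// Attempts to add a second genre named "Comedy".
CREATE (:Genre {name: "Comedy"});
```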
This query returns an error because genre "Comedy" already exists.
5) Average score for Star Wars movie:
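The following sketch assumes that rating edges have the type `:Rating` (an assumption; the edge type is not named in the text above) and that the movie's title is stored exactly as "Star Wars":

```cypher
// Assumption: ratings are :Rating edges with a score property.
// The pattern is undirected, so it matches either edge direction.
MATCH (:User)-[r:Rating]-(:Movie {title: "Star Wars"})
RETURN avg(r.score) AS average_score;
```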
6) Average scores for first 10 movies:
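Here "first 10" is read as the first 10 movies by title. This is an interpretation, and the `:Rating` edge type is again an assumption:

```cypher
// avg() aggregates the scores per movie, because m.title is the grouping key.
MATCH (:User)-[r:Rating]-(m:Movie)
RETURN m.title AS title, avg(r.score) AS average_score
ORDER BY title
LIMIT 10;
```

Ordering by `average_score DESC` instead would list the 10 best-rated movies.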
7) Create a new user and rate some movies:
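A minimal sketch that creates the user Aladin, the target user of the recommendation step later in this tutorial. Only the `name` property is set here; any other properties the dataset's users carry are omitted:

```cypher
// Assumption: users are :User nodes identified by their name.
CREATE (:User {name: "Aladin"});
```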
Check that the new user was created:
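Assuming the user was created with the name "Aladin", a lookup by name should return exactly one node:

```cypher
MATCH (u:User {name: "Aladin"})
RETURN u;
```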
Rate some movies:
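The sketch below rates a single movie. The title "Star Wars" and the score 4.5 are illustrative values, and the edge type `:Rating` and its direction from user to movie are assumptions:

```cypher
// MERGE avoids creating a second rating edge if one already exists;
// SET then writes (or overwrites) the score.
MATCH (u:User {name: "Aladin"}), (m:Movie {title: "Star Wars"})
MERGE (u)-[r:Rating]->(m)
SET r.score = 4.5;
```

Repeat the statement with other titles and scores. The recommendation step needs several rated movies to find users with overlapping ratings.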
8) Recommendation system:
The idea is to implement simple memory-based collaborative filtering.
Let's recommend some movies for user Aladin:
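The query below is a sketch of this approach rather than a definitive implementation. It assumes `:User`, `:Movie`, and `:Rating` as label and edge type names, and it identifies the target user by name:

```cypher
// Part 1: find users who rated the same movies as Aladin with similar scores.
MATCH (u:User {name: "Aladin"})-[r:Rating]-(m:Movie)-[other_r:Rating]-(other:User)
WITH other,
     avg(abs(r.score - other_r.score)) AS similarity,
     count(*) AS similar_user_count
WHERE similar_user_count >= 2      // at least 2 movies in common
WITH other, similarity
ORDER BY similarity                // smaller distance means more similar
LIMIT 10                           // keep at most 10 similar users
WITH collect(other) AS similar_user_set
// Part 2: predict a score for each movie as the average score
// given by the similar users.
MATCH (some_movie:Movie)-[fellow_rate:Rating]-(fellow_user:User)
WHERE fellow_user IN similar_user_set
WITH some_movie, avg(fellow_rate.score) AS prediction_score
RETURN some_movie.title AS title, prediction_score
ORDER BY prediction_score DESC;
```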
How does this query work?
This query has two parts:
- Finding similar users
- Predicting the score for some movie (recommendation)
In the first part, we are looking for similar users. First, we need to define what "similar" means: two users are considered similar if they tend to give similar scores to the same movies. For the target user (Aladin) and some other user, we search for the movies that both have rated:
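This step can be sketched as a single pattern that walks from the target user through a rated movie to another user. The `RETURN` clause is added here only so that the fragment can be run and inspected on its own, and the `:Rating` edge type is an assumption:

```cypher
// Each row pairs one of Aladin's ratings with another user's rating
// of the same movie.
MATCH (u:User {name: "Aladin"})-[r:Rating]-(m:Movie)-[other_r:Rating]-(other:User)
RETURN other.name AS other_user,
       m.title AS movie,
       r.score AS target_score,
       other_r.score AS other_score
LIMIT 20;
```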
But this is not enough for finding similar users. We need to choose users with the same movies and similar scores:
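A sketch of the similarity step, again assuming the `:Rating` edge type. It can be run on its own to inspect which users would be selected:

```cypher
MATCH (u:User {name: "Aladin"})-[r:Rating]-(m:Movie)-[other_r:Rating]-(other:User)
WITH other,
     // average absolute score difference over the shared movies
     avg(abs(r.score - other_r.score)) AS similarity,
     // number of movies rated by both users
     count(*) AS similar_user_count
WHERE similar_user_count >= 2
RETURN other.name AS other_user, similarity, similar_user_count
ORDER BY similarity
LIMIT 10;
```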
Here we calculate similarity as the average distance between the target user's score and the other user's score on the same set of movies, so a smaller value means a more similar user. There are two parameters: the similar user count limit (2) and the similar user set size limit (10). The similar user count limit keeps only users who have at least 2 movies in common with the target user. The similar user set size limit picks the 10 most similar users (or fewer, if not enough users qualify).
Now we have a set of similar users. We will use those users to calculate the average score for all movies in the database.
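To try this second part in isolation, the similar user set can be supplied by hand. In the sketch below, the list entries are placeholders to be replaced with names found in the first part, and `:Rating` is an assumed edge type:

```cypher
// Placeholder names: replace them with the similar users found earlier.
WITH ["<similar user 1>", "<similar user 2>"] AS similar_user_names
MATCH (some_movie:Movie)-[fellow_rate:Rating]-(fellow_user:User)
WHERE fellow_user.name IN similar_user_names
// The predicted score is the mean score given by the similar users.
WITH some_movie, avg(fellow_rate.score) AS prediction_score
RETURN some_movie.title AS title, prediction_score
ORDER BY prediction_score DESC;
```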
We encourage you to experiment with the parameters, such as the similar user count limit and the similar user set size limit. You can also try different similarity functions, for example Euclidean distance:
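A sketch of the similarity step with Euclidean distance in place of the average absolute difference, under the same naming assumptions (`:User`, `:Movie`, `:Rating`):

```cypher
MATCH (u:User {name: "Aladin"})-[r:Rating]-(m:Movie)-[other_r:Rating]-(other:User)
WITH other,
     // collect() gathers the squared differences into a list,
     // reduce() sums them starting from 0, and sqrt() gives the distance.
     sqrt(reduce(a = 0,
                 x IN collect((r.score - other_r.score) * (r.score - other_r.score))
                 | a + x)) AS similarity,
     count(*) AS similar_user_count
WHERE similar_user_count >= 2
RETURN other.name AS other_user, similarity
ORDER BY similarity
LIMIT 10;
```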
Here we use the `reduce` function, which accumulates list elements into a single result by applying an expression. In our query, this function starts with 0 and sums up the squared differences. The `collect` function puts the squared differences into a list.