Package Documentation

Popular (not Personalised) Recommender

rs_course.popular.popular_recommender(ratings: DataFrame, warm_users_only: bool, top_k: int, train_percentage: float) → None

Build a non-personalised recommender based on item popularity.

>>> popular_recommender(
...     getfixture("test_dataset").ratings,  # noqa: F821
...     False,
...     10,
...     0.95
... )
1.0

Parameters:

ratings – a dataset of user-item interactions
warm_users_only – test only on users who were in the training set
top_k – the number of items to recommend
train_percentage – percentage of user-item pairs to leave in the training set
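
A popularity baseline can be sketched in plain Python. This is a minimal sketch, not the rs_course implementation. It assumes interactions arrive as (user_id, item_id) pairs and that popularity means the raw interaction count. The helper names popular_items and recommend_popular are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def popular_items(train: List[Tuple[int, int]], top_k: int) -> List[int]:
    """Return the top_k items with the most interactions in train.

    Ties are broken by the smaller item ID so the result is deterministic.
    """
    counts = Counter(item for _, item in train)
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return ranked[:top_k]


def recommend_popular(
    users: List[int], train: List[Tuple[int, int]], top_k: int
) -> Dict[int, List[int]]:
    # Every user receives the same list: the model is not personalised.
    top = popular_items(train, top_k)
    return {user: list(top) for user in users}
```

Because the list is identical for everyone, it only has to be computed once, no matter how many test users there are.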

Content-based item2item KNN Recommender

rs_course.content_based_knn.get_content_based_recommender(movielens: MovieLens, split_test_users_into: int, top_k: int, train_percentage: float) → ItemItemRecommender

Build a content-based recommender.

>>> _ = get_content_based_recommender(
...     getfixture("test_dataset"),  # noqa: F821
...     1,
...     10,
...     0.95
... )
Content-Based Hit-Rate: 1.0

Parameters:

movielens – a MovieLens dataset
split_test_users_into – into how many chunks to split the test set
top_k – the number of items to recommend
train_percentage – percentage of user-item pairs to leave in the training set

Returns:

a trained recommender
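
Content-based item-to-item similarity can be illustrated with binary tag vectors. This is a hedged sketch under assumptions: each item is described by a set of tags, similarity is the cosine between binary tag vectors, and the names tag_cosine and nearest_items are hypothetical. The actual recommender (an implicit ItemItemRecommender, according to the return type) may weight features differently.

```python
import math
from typing import Dict, List, Set


def tag_cosine(a: Set[str], b: Set[str]) -> float:
    # Cosine similarity of two binary tag vectors: |A & B| / sqrt(|A| * |B|).
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def nearest_items(item: int, features: Dict[int, Set[str]], k: int) -> List[int]:
    """Return the k items whose tag sets are most similar to `item`."""
    scores = [
        (tag_cosine(features[item], tags), other)
        for other, tags in features.items()
        if other != item
    ]
    # Highest similarity first; smaller item ID wins ties.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scores[:k]]
```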

Collaborative Filtering KNN Recommender

rs_course.collaborative_filtering_knn.collaborative_filtering_knn(ratings: DataFrame, number_of_neighbours: int, split_test_users_into: int, top_k: int, train_percentage: float) → None

Build a collaborative filtering KNN model.

>>> collaborative_filtering_knn(
...     getfixture("test_dataset").ratings,  # noqa: F821
...     7,
...     1,
...     top_k=10,
...     train_percentage=0.95
... )
1.0

Parameters:

ratings – a dataset of user-item interactions
number_of_neighbours – number of neighbours for KNN
split_test_users_into – the number of chunks to split the test set into
top_k – the number of items to recommend
train_percentage – percentage of user-item pairs to leave in the training set
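
The item-to-item KNN idea behind collaborative filtering can be sketched without any library. Assumptions for this sketch: interactions are (user_id, item_id) pairs, item similarity is the cosine between the sets of users who interacted with each item, only the number_of_neighbours most similar items are kept, and a user's score for an unseen item is the sum of its similarities to items the user already has. The helpers item_neighbours and recommend are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Neighbours = Dict[int, List[Tuple[int, float]]]


def item_neighbours(
    interactions: List[Tuple[int, int]], number_of_neighbours: int
) -> Neighbours:
    """For each item, keep its most similar items by cosine over user sets."""
    users_of: Dict[int, Set[int]] = defaultdict(set)
    for user, item in interactions:
        users_of[item].add(user)
    neighbours: Neighbours = {}
    for i, users_i in users_of.items():
        sims = []
        for j, users_j in users_of.items():
            overlap = len(users_i & users_j)
            if i != j and overlap:
                sims.append((j, overlap / math.sqrt(len(users_i) * len(users_j))))
        sims.sort(key=lambda pair: (-pair[1], pair[0]))
        neighbours[i] = sims[:number_of_neighbours]
    return neighbours


def recommend(user_items: Set[int], neighbours: Neighbours, top_k: int) -> List[int]:
    # Score unseen items by summing similarities to the items the user has.
    scores: Dict[int, float] = defaultdict(float)
    for item in user_items:
        for other, sim in neighbours.get(item, []):
            if other not in user_items:
                scores[other] += sim
    return sorted(scores, key=lambda it: (-scores[it], it))[:top_k]
```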

Collaborative Filtering ALS Recommender

rs_course.cf_als.als_recommendations(ratings: DataFrame, model_params: Dict[str, Any], split_test_users_into: int, top_k: int, train_percentage: float) → Tuple[csr_matrix, AlternatingLeastSquares, float, DataFrame, DataFrame]

Build an ALS recommender using the implicit package.

>>> ratings = getfixture("test_dataset").ratings  # noqa: F821
>>> import os
>>> model_params = {
...      "factors": 1,
...      "use_gpu": os.environ.get("TEST_ON_GPU", False),
...      "random_state": 0,
...      "iterations": 1,
... }
>>> _, _, hit_rate, _, _ = als_recommendations(
...     ratings=ratings,
...     model_params=model_params,
...     split_test_users_into=1,
...     top_k=10,
...     train_percentage=0.95
... )
>>> print(hit_rate)
1.0

Parameters:

ratings – a dataset of user-item interactions
model_params – ALS training parameters
split_test_users_into – the number of chunks to split the test set into
top_k – the number of items to recommend
train_percentage – percentage of user-item pairs to leave in the training set

Returns:

a tuple of the train set in sparse format, the trained recommender, hitrate@top_k, and the train and test sets in pandas format

Pure SVD Recommender

rs_course.cf_svd.get_svd_recs(recommender: TruncatedSVD, sparse_train: csr_matrix, test: DataFrame, split_test_users_into: int, top_k: int) → Dict[int, List[int]]

Get recommendations given a truncated SVD decomposition.

Parameters:

recommender – a truncated SVD decomposition
sparse_train – a csr_matrix representation of the train data
test – test data
split_test_users_into – the number of chunks to split the test set into
top_k – the number of items to recommend

Returns:

recommendations in an rs_metrics-compatible format
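
The Dict[int, List[int]] return type suggests a mapping from each test user to a ranked list of item IDs. The sketch below shows one way to build such a mapping while scoring test users in split_test_users_into chunks. It assumes that items a user already saw in training are excluded, and it takes the scorer as a callable; for a scikit-learn TruncatedSVD model, one option would be recommender.inverse_transform(recommender.transform(rows)) applied to the users' rows of sparse_train. The name chunked_top_k is hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set


def chunked_top_k(
    test_users: Sequence[int],
    score_items: Callable[[List[int]], List[List[float]]],
    seen: Dict[int, Set[int]],
    split_test_users_into: int,
    top_k: int,
) -> Dict[int, List[int]]:
    """Build {user_id: [item_id, ...]} recommendations chunk by chunk."""
    recs: Dict[int, List[int]] = {}
    # Ceiling division, so that at most split_test_users_into chunks are made.
    chunk_size = max(1, -(-len(test_users) // split_test_users_into))
    for start in range(0, len(test_users), chunk_size):
        chunk = list(test_users[start:start + chunk_size])
        # One call per chunk keeps only a chunk of score rows in memory.
        for user, scores in zip(chunk, score_items(chunk)):
            already = seen.get(user, set())
            candidates = [i for i in range(len(scores)) if i not in already]
            candidates.sort(key=lambda i: (-scores[i], i))
            recs[user] = candidates[:top_k]
    return recs
```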

rs_course.cf_svd.pure_svd_recommender(ratings: DataFrame, split_test_users_into: int, model_config: Dict[str, Any], top_k: int, train_percentage: float) → None

Build an example SVD recommender based on scikit-learn's TruncatedSVD.

>>> pure_svd_recommender(
...     getfixture("test_dataset").ratings,  # noqa: F821
...     1,
...     {"n_components": 1, "random_state": 0},
...     10,
...     0.95
... )
1.0

Parameters:

ratings – a dataset of user-item interactions
split_test_users_into – the number of chunks to split the test set into
model_config – a dict of TruncatedSVD arguments for model training
top_k – the number of items to recommend
train_percentage – percentage of user-item pairs to leave in the training set

DNN Recommender Example

rs_course.dnn_rs.dnn_recommender(ratings: DataFrame, model_config: Dict[str, Any], split_test_users_into: int, top_k: int, train_percentage: float) → float

Build a RecBole model.

>>> import os
>>> model_config = {
...     "data_path": ".",
...     "eval_step": 0,
...     "embedding_size": 1,
...     "cnn_channels": [1, 1],
...     "cnn_kernels": [1],
...     "cnn_strides": [1],
...     "epochs": 1,
...     "use_gpu": os.environ.get("TEST_ON_GPU", False),
... }
>>> test_ratings = getfixture("recbole_test_data").ratings  # noqa: F821
>>> isinstance(
...     dnn_recommender(test_ratings, model_config, 100, 10, 0.95),
...     float
... )
True

Parameters:

ratings – a dataset of user-item interactions
model_config – a RecBole config_dict for the model
split_test_users_into – split test by users into several chunks to fit into memory
top_k – the number of items to recommend
train_percentage – percentage of user-item pairs to leave in the training set

Returns:

hitrate@top_k

rs_course.dnn_rs.get_dnn_predictions(recommender: Any, sparse_train: csr_matrix, test: DataFrame, top_k: int, split_test_users_into: int) → Dict[int, List[int]]

Get recommendations given a DNN recommender.

Parameters:

recommender – a recommender
sparse_train – a csr_matrix representation of the train data
test – test data
top_k – how many recommendations to return for each user
split_test_users_into – split test by users into several chunks to fit into memory

Returns:

recommendations in an rs_metrics-compatible format

rs_course.dnn_rs.get_recbole_trained_recommender(config: Config, train_data: Dataset) → ConvNCF

Train a DNN recommender from RecBole.

Parameters:

config – a recbole config
train_data – a training dataset in the recbole format

Returns:

a trained model ready for evaluation

rs_course.dnn_rs.prepare_recbole_data(data_name: str, dataframe: DataFrame, config: Config) → Dataset

Create a directory and write an interactions ‘Atomic File’ there.

Warning: if a directory named data_name already exists, it is deleted without confirmation.
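
A RecBole Atomic File for interactions is a tab-separated .inter file whose header declares each column as name:type. The sketch below writes such a file with the standard library only. It assumes RecBole's layout of data_path/data_name/data_name.inter and a (user_id, item_id, rating) column set; the helper write_inter_file is hypothetical. Like the documented function, it deletes an existing directory without asking.

```python
import csv
import os
import shutil
from typing import Iterable, Tuple


def write_inter_file(
    data_path: str, data_name: str, rows: Iterable[Tuple[int, int, float]]
) -> str:
    """Write (user_id, item_id, rating) rows as a RecBole-style .inter file.

    WARNING: deletes data_path/data_name if it exists, without asking.
    """
    folder = os.path.join(data_path, data_name)
    shutil.rmtree(folder, ignore_errors=True)  # destructive: no confirmation
    os.makedirs(folder)
    path = os.path.join(folder, f"{data_name}.inter")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        # Header fields follow the assumed "name:type" convention.
        writer.writerow(["user_id:token", "item_id:token", "rating:float"])
        writer.writerows(rows)
    return path
```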

Parameters:

data_name – a name for the folder and the main file
dataframe – a Pandas dataframe to convert to recbole’s Atomic Files
config – a recbole config

Returns:

a RecBole dataset

Cold Start Recommender Example

rs_course.cold_start.cold_start(movielens: MovieLens, als_config: Dict[str, Any], split_test_users_into: int, top_k: int, train_percentage: float) → None

Build and test an ALS-based recommender with cold start.

>>> import os
>>> als_config = {
...      "factors": 1,
...      "use_gpu": os.environ.get("TEST_ON_GPU", False),
...      "random_state": 0
... }
>>> cold_start(
...     getfixture("test_dataset"),  # noqa: F821
...     als_config,
...     1,
...     10,
...     0.08
... )
Collaborative Filtering Hit-Rate: 0.0
Content-Based Hit-Rate: 1.0
cold items percentage in test: 1.0
cold rows percentage in test: 1.0
users with cold items percentage in test: 1.0
users with no cold items percentage in test: 0.0
Hybrid Hit-Rate: 1.0

Parameters:

movielens – a MovieLens dataset
als_config – training parameters for the ALS (collaborative filtering) model
split_test_users_into – the number of chunks to split the test set into
top_k – the number of items to recommend
train_percentage – percentage of user-item pairs to leave in the training set

rs_course.cold_start.compute_cold_factors(cold_items: List[int], content_based_recommender: ItemItemRecommender, recommender: AlternatingLeastSquares) → Dict[int, ndarray]

Compute latent factors for cold items.

Parameters:

cold_items – a list of cold items for which to compute factors
content_based_recommender – a content-based recommender to induce new factors
recommender – an ALS recommender with known factors (for non-cold items)

Returns:

a dictionary with cold item IDs as keys and their computed factors as values
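
One possible way to induce factors for a cold item is to average the ALS factors of its content-based neighbours, weighted by similarity. This is an assumption made for illustration, not necessarily what compute_cold_factors does. The helper cold_item_factor is hypothetical and uses plain lists instead of NumPy arrays.

```python
from typing import Dict, List, Tuple


def cold_item_factor(
    neighbours: List[Tuple[int, float]],
    warm_factors: Dict[int, List[float]],
) -> List[float]:
    """Similarity-weighted average of the factors of warm neighbours."""
    # Only neighbours that have ALS factors and a positive similarity count.
    known = [(item, sim) for item, sim in neighbours if item in warm_factors and sim > 0]
    if not known:
        raise ValueError("no warm neighbours with positive similarity")
    dim = len(warm_factors[known[0][0]])
    total = sum(sim for _, sim in known)
    return [
        sum(sim * warm_factors[item][d] for item, sim in known) / total
        for d in range(dim)
    ]
```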

rs_course.cold_start.get_cold_items(train: DataFrame, test: DataFrame) → List[int]

Get a list of cold items.

Parameters:

train – train set
test – test set

Returns:

a list of cold items in the test set
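
A cold item is one that occurs in the test set but never in the train set. Assuming that definition, and using (user_id, item_id) pairs instead of DataFrames, the check reduces to a set difference; get_cold_items_sketch is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def get_cold_items_sketch(
    train: Iterable[Tuple[int, int]], test: Iterable[Tuple[int, int]]
) -> List[int]:
    # Items present in test but absent from train, sorted for stable output.
    warm = {item for _, item in train}
    return sorted({item for _, item in test} - warm)
```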

Factorisation Machines with Vowpal Wabbit

rs_course.factorisation_machines.factorisation_machines(ratings: DataFrame, num_epochs: int, verbose: bool, seed: int, bit_precision: int) → float

Build a factorisation machine.

>>> factorisation_machines(
...     getfixture("test_dataset").ratings, 1, False, 0, 1  # noqa: F821
... )
0.8333333333333333
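
Vowpal Wabbit reads one example per line: a label followed by namespaces introduced with |. A factorisation-machine-style model can be requested by putting users and items into separate namespaces and enabling VW's low-rank quadratic option, for example --lrq ui8 for rank 8. Whether rs_course uses exactly this setup is an assumption, and the helper to_vw_lines is hypothetical. With bit_precision set to b, VW hashes features into a table of 2**b weights.

```python
from typing import Iterable, List, Tuple


def to_vw_lines(ratings: Iterable[Tuple[int, int, float]]) -> List[str]:
    """Format (user_id, item_id, rating) triples as VW examples.

    Users go to namespace "u" and items to namespace "i", so that a
    low-rank interaction between the two namespaces can be requested.
    """
    return [f"{rating} |u user_{user} |i item_{item}" for user, item, rating in ratings]
```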

Parameters:

ratings – ratings dataset
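num_epochs – number of epochs (VW passes)
verbose – whether to print VW output (the opposite of VW's --quiet option)
seed – a random seed for testing
bit_precision – VW's --bit_precision (-b) argument, the number of bits in the feature hash table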

Returns:

Useful Functions for the Whole Course

rs_course.utils.enumerate_users_and_items(ratings: DataFrame) → None

Replace user and item IDs with sequential integer indices, in place.

Parameters:

ratings – ratings dataset
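
Sequential indices let IDs double as row and column numbers of a sparse matrix. The sketch below shows the mapping on a plain list, assuming indices are assigned in order of first appearance (pandas.factorize does the same for a column by default). Unlike the documented function, it returns a new list instead of modifying a DataFrame in place, and the name enumerate_ids is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def enumerate_ids(ids: Iterable[int]) -> Tuple[List[int], Dict[int, int]]:
    """Map arbitrary IDs to 0..n-1 in order of first appearance."""
    mapping: Dict[int, int] = {}
    encoded: List[int] = []
    for raw in ids:
        if raw not in mapping:
            mapping[raw] = len(mapping)  # next unused index
        encoded.append(mapping[raw])
    return encoded, mapping
```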

rs_course.utils.evaluate_implicit_recommender(recommender: ItemItemRecommender, train: csr_matrix, test: DataFrame, split_test_users_into: int, top_k: int) → float

Compute the hit rate for a recommender from the implicit package.

Parameters:

recommender – a recommender from the implicit package
train – sparse matrix of ratings
test – pandas dataset of ratings for testing
split_test_users_into – split test by users into several chunks to fit into memory
top_k – how many items to recommend to each user

Returns:

hitrate@top_k
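
Hit rate is commonly defined per user: the share of test users with at least one test item among their top_k recommendations. The sketch below assumes that definition (rs_metrics may differ in details) and takes recommendations in the Dict[int, List[int]] shape used elsewhere in this package; the name hit_rate is hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def hit_rate(
    recommendations: Dict[int, List[int]],
    test: Iterable[Tuple[int, int]],
    top_k: int,
) -> float:
    """Share of test users with at least one test item in their top_k list."""
    test_items: Dict[int, Set[int]] = {}
    for user, item in test:
        test_items.setdefault(user, set()).add(item)
    hits = sum(
        1
        for user, items in test_items.items()
        if items & set(recommendations.get(user, [])[:top_k])
    )
    return hits / len(test_items) if test_items else 0.0
```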

rs_course.utils.filter_users_and_items(ratings: DataFrame, min_items_per_user: int | None, min_users_per_item: int | None) → DataFrame

Keep only items rated by at least min_users_per_item users, and only users who rated at least min_items_per_user items.

Parameters:

ratings – ratings dataset
min_items_per_user – if None, users are not filtered
min_users_per_item – if None, items are not filtered

Returns:

filtered ratings dataset
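
The filter can be sketched as a single pass over (user_id, item_id) pairs, assuming each pair occurs at most once (so row counts equal user or item counts). The order used here (items first, then users) is an assumption: dropping users can push an item back below its threshold, and whether rs_course repeats the pass is not stated. The name filter_interactions is hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def filter_interactions(
    ratings: List[Tuple[int, int]],
    min_items_per_user: Optional[int],
    min_users_per_item: Optional[int],
) -> List[Tuple[int, int]]:
    """One filtering pass: first drop rare items, then drop light users."""
    if min_users_per_item is not None:
        item_counts = Counter(item for _, item in ratings)
        ratings = [(u, i) for u, i in ratings if item_counts[i] >= min_users_per_item]
    if min_items_per_user is not None:
        user_counts = Counter(user for user, _ in ratings)
        ratings = [(u, i) for u, i in ratings if user_counts[u] >= min_items_per_user]
    return ratings
```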

rs_course.utils.get_sparse_item_features(movielens: MovieLens, ratings: DataFrame) → Tuple[csr_matrix, DataFrame]

Extract item features from the tags dataset.

Parameters:

movielens – full MovieLens dataset (only for tags and genres)
ratings – ratings data (can differ from the one in the MovieLens object)

Returns:

sparse matrix and a pandas DataFrame of item features (tags)

rs_course.utils.movielens_split(ratings: DataFrame, train_percentage: float, warm_users_only: bool = False) → Tuple[csr_matrix, DataFrame, Tuple[int, int]]

Split a ratings dataset into train and test sets.

Parameters:

ratings – ratings dataset from MovieLens
train_percentage – percentage of data to put into training dataset
warm_users_only – test only on users who were in the training set

Returns:

a sparse matrix for training, a pandas dataset for testing, and the shape of the full user-item matrix as a (rows, columns) pair
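
A split of this kind can be sketched as a seeded random shuffle of interactions, with an optional filter that keeps only warm users in the test set. Whether movielens_split splits randomly or, for example, by timestamp is not stated, so random splitting is an assumption here; random_split and its seed parameter are hypothetical, and the sketch returns plain lists rather than a sparse matrix.

```python
import random
from typing import List, Tuple

Pairs = List[Tuple[int, int]]


def random_split(
    ratings: Pairs,
    train_percentage: float,
    warm_users_only: bool = False,
    seed: int = 0,
) -> Tuple[Pairs, Pairs]:
    """Randomly split interactions; optionally keep only warm users in test."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_percentage)
    train, test = shuffled[:cut], shuffled[cut:]
    if warm_users_only:
        train_users = {user for user, _ in train}
        test = [(u, i) for u, i in test if u in train_users]
    return train, test
```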

rs_course.utils.pandas_to_scipy(pd_dataframe: DataFrame, data_name: str, rows_name: str, cols_name: str, shape: Tuple[int, int]) → csr_matrix

Transform a three-column pandas DataFrame into a sparse matrix.

Parameters:

pd_dataframe – an input pandas dataframe
data_name – column name with values for the matrix cells
rows_name – column name with row numbers of the cells
cols_name – column name with column numbers of the cells
shape – a pair (total number of rows, total number of columns)

Returns:

a csr_matrix
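
The CSR layout stores nonzero values row by row in data, their column numbers in indices, and the start offset of each row in indptr. The pure-Python sketch below builds these arrays from (value, row, column) triples to show the structure; in practice scipy.sparse.csr_matrix((data, (rows, cols)), shape=shape) does this directly. Unlike that constructor, this sketch does not sum duplicate (row, column) pairs, and the name to_csr is hypothetical.

```python
from typing import List, Sequence, Tuple


def to_csr(
    data: Sequence[float],
    rows: Sequence[int],
    cols: Sequence[int],
    shape: Tuple[int, int],
) -> Tuple[List[float], List[int], List[int]]:
    """Return (data, indices, indptr) of a CSR matrix built from triples."""
    n_rows, _ = shape
    # Order the cells by row, then by column within each row.
    order = sorted(range(len(data)), key=lambda k: (rows[k], cols[k]))
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1  # count cells per row
    for r in range(n_rows):
        indptr[r + 1] += indptr[r]  # prefix sums give row start offsets
    return [data[k] for k in order], [cols[k] for k in order], indptr
```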