Package Documentation

rs_course.popular.popular_recommender(ratings: DataFrame, warm_users_only: bool, top_k: int, train_percentage: float) → None

Build a non-personalised recommender based on item popularity.

>>> popular_recommender(
...     getfixture("test_dataset").ratings,  # noqa: F821
...     False,
...     10,
...     0.95
... )
1.0
Parameters:
  • ratings – a dataset of user-item interactions

  • warm_users_only – test only on users who were in the training set

  • top_k – the number of items to recommend

  • train_percentage – percentage of user-item pairs to leave in the training set
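
The idea of a non-personalised recommender can be sketched as follows. This is an illustration only, not the package's implementation; the column names user_id and item_id and the helper name popular_recommendations are assumptions.

```python
import pandas as pd


def popular_recommendations(train: pd.DataFrame, users: list, top_k: int) -> dict:
    """Recommend the same top_k most frequently rated items to every user."""
    # value_counts sorts item IDs by number of interactions, most frequent first
    top_items = train["item_id"].value_counts().head(top_k).index.tolist()
    # Non-personalised: every user gets the identical list
    return {user: top_items for user in users}
```

The result maps each user ID to a list of item IDs, the same shape as the Dict[int, List[int]] returned by other functions in this package.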

Content-based item2item KNN Recommender

rs_course.content_based_knn.get_content_based_recommender(movielens: MovieLens, split_test_users_into: int, top_k: int, train_percentage: float) → ItemItemRecommender

Build a content-based recommender.

>>> _ = get_content_based_recommender(
...     getfixture("test_dataset"),  # noqa: F821
...     1,
...     10,
...     0.95
... )
Content-Based Hit-Rate: 1.0
Parameters:
  • movielens – a MovieLens dataset

  • split_test_users_into – into how many chunks to split the test set

  • top_k – the number of items to recommend

  • train_percentage – percentage of user-item pairs to leave in the training set

Returns:

a trained recommender

Collaborative Filtering KNN Recommender

rs_course.collaborative_filtering_knn.collaborative_filtering_knn(ratings: DataFrame, number_of_neighbours: int, split_test_users_into: int, top_k: int, train_percentage: float) → None

Build a collaborative filtering KNN model.

>>> collaborative_filtering_knn(
...     getfixture("test_dataset").ratings,  # noqa: F821
...     7,
...     1,
...     top_k=10,
...     train_percentage=0.95
... )
1.0
Parameters:
  • ratings – a dataset of user-item interactions

  • number_of_neighbours – number of neighbours for KNN

  • split_test_users_into – a number of chunks for testing

  • top_k – the number of items to recommend

  • train_percentage – percentage of user-item pairs to leave in the training set

Collaborative Filtering ALS Recommender

rs_course.cf_als.als_recommendations(ratings: DataFrame, model_params: Dict[str, Any], split_test_users_into: int, top_k: int, train_percentage: float) → Tuple[csr_matrix, AlternatingLeastSquares, float, DataFrame, DataFrame]

Build an ALS recommender using the implicit package.

>>> ratings = getfixture("test_dataset").ratings  # noqa: F821
>>> import os
>>> model_params = {
...      "factors": 1,
...      "use_gpu": os.environ.get("TEST_ON_GPU", False),
...      "random_state": 0,
...      "iterations": 1,
... }
>>> _, _, hit_rate, _, _ = als_recommendations(
...     ratings=ratings,
...     model_params=model_params,
...     split_test_users_into=1,
...     top_k=10,
...     train_percentage=0.95
... )
>>> print(hit_rate)
1.0
Parameters:
  • ratings – a dataset of user-item interactions

  • model_params – ALS training parameters

  • split_test_users_into – a number of chunks for testing

  • top_k – the number of items to recommend

  • train_percentage – percentage of user-item pairs to leave in the training set

Returns:

a tuple of the train set in sparse format, the trained recommender, hit-rate@top_k, and the train and test sets in pandas format

Pure SVD Recommender

rs_course.cf_svd.get_svd_recs(recommender: TruncatedSVD, sparse_train: csr_matrix, test: DataFrame, split_test_users_into: int, top_k: int) → Dict[int, List[int]]

Get recommendations given a truncated SVD decomposition.

Parameters:
  • recommender – a truncated SVD decomposition

  • sparse_train – a csr_matrix representation of the train data

  • test – test data

  • split_test_users_into – a number of chunks for testing

  • top_k – the number of items to recommend

Returns:

recommendations in rs_metrics compatible format
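
One plausible way to turn a truncated SVD into recommendations is sketched below. It is an assumption about the approach, not the package's code: svd_top_k is a hypothetical helper that accepts any fitted object exposing transform and components_ (as sklearn's TruncatedSVD does) and scores items by projecting a user's row into the latent space and back.

```python
import numpy as np
from scipy.sparse import csr_matrix


def svd_top_k(recommender, sparse_train: csr_matrix, users: list, top_k: int) -> dict:
    """Recommend top_k items per user, skipping items seen in training."""
    recs = {}
    for user in users:
        row = sparse_train[user]  # 1 x n_items sparse row
        # Project the row to latent space, then back to item scores
        scores = recommender.transform(row) @ recommender.components_
        scores = np.asarray(scores, dtype=float).ravel()
        # Items the user already interacted with must not be recommended
        scores[row.indices] = -np.inf
        recs[user] = np.argsort(-scores)[:top_k].tolist()
    return recs
```

Processing users one at a time keeps memory use small; the split_test_users_into parameter of the documented function suggests that the package scores users in chunks for the same reason.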

rs_course.cf_svd.pure_svd_recommender(ratings: DataFrame, split_test_users_into: int, model_config: Dict[str, Any], top_k: int, train_percentage: float) → None

Build an example SVD recommender based on sklearn.

>>> pure_svd_recommender(
...     getfixture("test_dataset").ratings,  # noqa: F821
...     1,
...     {"n_components": 1, "random_state": 0},
...     10,
...     0.95
... )
1.0
Parameters:
  • ratings – a dataset of user-item interactions

  • split_test_users_into – a number of chunks for testing

  • model_config – a dict of TruncatedSVD arguments for model training

  • top_k – the number of items to recommend

  • train_percentage – percentage of user-item pairs to leave in the training set

DNN Recommender Example

rs_course.dnn_rs.dnn_recommender(ratings: DataFrame, model_config: Dict[str, Any], split_test_users_into: int, top_k: int, train_percentage: float) → float

Build a RecBole model.

>>> import os
>>> model_config = {
...     "data_path": ".",
...     "eval_step": 0,
...     "embedding_size": 1,
...     "cnn_channels": [1, 1],
...     "cnn_kernels": [1],
...     "cnn_strides": [1],
...     "epochs": 1,
...     "use_gpu": os.environ.get("TEST_ON_GPU", False),
... }
>>> test_ratings = getfixture("recbole_test_data").ratings  # noqa: F821
>>> isinstance(
...     dnn_recommender(test_ratings, model_config, 100, 10, 0.95),
...     float
... )
True
Parameters:
  • ratings – a dataset of user-item interactions

  • model_config – config_dict of a RecBole model

  • split_test_users_into – split test by users into several chunks to fit into memory

  • top_k – the number of items to recommend

  • train_percentage – percentage of user-item pairs to leave in the training set

Returns:

hitrate@top_k

rs_course.dnn_rs.get_dnn_predictions(recommender: Any, sparse_train: csr_matrix, test: DataFrame, top_k: int, split_test_users_into: int) → Dict[int, List[int]]

Get recommendations given a DNN recommender.

Parameters:
  • recommender – a recommender

  • sparse_train – a csr_matrix representation of the train data

  • test – test data

  • top_k – how many recommendations to return for each user

  • split_test_users_into – split test by users into several chunks to fit into memory

Returns:

recommendations in rs_metrics compatible format

rs_course.dnn_rs.get_recbole_trained_recommender(config: Config, train_data: Dataset) → ConvNCF

Train a DNN recommender from RecBole.

Parameters:
  • config – a recbole config

  • train_data – a training dataset in the recbole format

Returns:

a trained model ready for evaluation

rs_course.dnn_rs.prepare_recbole_data(data_name: str, dataframe: DataFrame, config: Config) → Dataset

Create a directory and write an interactions ‘Atomic File’ there.

Attention! Any existing directory named data_name will be removed without confirmation.

Parameters:
  • data_name – a name for the folder and the main file

  • dataframe – a Pandas dataframe to convert to recbole’s Atomic Files

  • config – a recbole config

Returns:

a dataset in the RecBole format

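The directory handling and file layout described above can be sketched as follows. This is a hedged illustration, not the package's code: write_inter_file is a hypothetical helper, the input column names user_id, item_id and rating are assumptions, and the sketch stops at writing the file instead of building a RecBole Dataset.

```python
import shutil
from pathlib import Path

import pandas as pd


def write_inter_file(data_name: str, dataframe: pd.DataFrame) -> Path:
    """Write dataframe as <data_name>/<data_name>.inter, replacing the folder."""
    folder = Path(data_name)
    # Destructive on purpose: any existing folder with this name is deleted
    shutil.rmtree(folder, ignore_errors=True)
    folder.mkdir(parents=True)
    # Atomic Files are tab-separated, with "name:type" column headers
    typed = dataframe.rename(
        columns={
            "user_id": "user_id:token",
            "item_id": "item_id:token",
            "rating": "rating:float",
        }
    )
    path = folder / f"{folder.name}.inter"
    typed.to_csv(path, sep="\t", index=False)
    return path
```

Because the folder is deleted first, data_name should never point at a directory that holds anything worth keeping.
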
Cold Start Recommender Example

rs_course.cold_start.cold_start(movielens: MovieLens, als_config: Dict[str, Any], split_test_users_into: int, top_k: int, train_percentage: float) → None

Build and test an ALS-based recommender with cold start.

>>> import os
>>> als_config = {
...      "factors": 1,
...      "use_gpu": os.environ.get("TEST_ON_GPU", False),
...      "random_state": 0
... }
>>> cold_start(
...     getfixture("test_dataset"),  # noqa: F821
...     als_config,
...     1,
...     10,
...     0.08
... )
Collaborative Filtering Hit-Rate: 0.0
Content-Based Hit-Rate: 1.0
cold items percentage in test: 1.0
cold rows percentage in test: 1.0
users with cold items percentage in test: 1.0
users with no cold items percentage in test: 0.0
Hybrid Hit-Rate: 1.0
Parameters:
  • movielens – MovieLens dataset

  • als_config – collaborative model training params

  • split_test_users_into – a number of chunks for testing

  • top_k – the number of items to recommend

  • train_percentage – percentage of user-item pairs to leave in the training set

rs_course.cold_start.compute_cold_factors(cold_items: List[int], content_based_recommender: ItemItemRecommender, recommender: AlternatingLeastSquares) → Dict[int, ndarray]

Compute latent factors for cold items.

Parameters:
  • cold_items – a list of cold items for which to compute factors

  • content_based_recommender – a content-based recommender to induce new factors

  • recommender – an ALS recommender with known factors (for non-cold items)

Returns:

a dictionary with cold item IDs as keys and their computed factors as values
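
The documentation does not say how the new factors are induced. One common approach, shown here purely as an assumption, is to average the latent factors of the warm items that are most similar in content. The helper induce_cold_factors and all of its arguments are hypothetical, and it works on plain NumPy arrays instead of the recommender objects used by the package.

```python
import numpy as np


def induce_cold_factors(cold_items, warm_items, item_features, warm_factors, n_neighbours=2):
    """Average the factors of each cold item's most content-similar warm items.

    item_features: (n_items, n_features) content matrix indexed by item ID.
    warm_factors: (len(warm_items), n_factors), rows aligned with warm_items.
    """
    # L2-normalise rows so that dot products are cosine similarities
    norms = np.linalg.norm(item_features, axis=1, keepdims=True)
    unit = item_features / np.maximum(norms, 1e-12)
    result = {}
    for item in cold_items:
        sims = unit[warm_items] @ unit[item]
        # Positions (within warm_items) of the most similar warm items
        nearest = np.argsort(-sims)[:n_neighbours]
        result[item] = warm_factors[nearest].mean(axis=0)
    return result
```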

rs_course.cold_start.get_cold_items(train: DataFrame, test: DataFrame) → List[int]

Get a list of cold items.

Parameters:
  • train – train set

  • test – test set

Returns:

a list of cold items
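
Assuming that "cold" means an item that occurs in the test set but never in the train set, the selection can be sketched like this. The helper name cold_items and the item_id column are assumptions.

```python
import pandas as pd


def cold_items(train: pd.DataFrame, test: pd.DataFrame) -> list:
    """Item IDs that occur in test but never in train."""
    seen_in_train = set(train["item_id"])
    return sorted(set(test["item_id"]) - seen_in_train)
```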

Factorisation Machines with Vowpal Wabbit

rs_course.factorisation_machines.factorisation_machines(ratings: DataFrame, num_epochs: int, verbose: bool, seed: int, bit_precision: int) → float

Build a factorisation machine.

>>> factorisation_machines(
...     getfixture("test_dataset").ratings, 1, False, 0, 1  # noqa: F821
... )
0.8333333333333333
Parameters:
  • ratings – ratings dataset

  • num_epochs – number of epochs (VW passes)

  • verbose – the opposite of VW's quiet flag

  • seed – a random seed for testing

  • bit_precision – VW's bit_precision argument (number of bits in the feature table)

Returns:

Useful Functions for the Whole Course

rs_course.utils.enumerate_users_and_items(ratings: DataFrame) → None

In-place change of user and item IDs into numbers.

Parameters:
  • ratings – ratings dataset
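
A minimal sketch of such an in-place re-numbering, assuming columns named user_id and item_id (the package's actual column names are not shown here):

```python
import pandas as pd


def enumerate_ids(ratings: pd.DataFrame) -> None:
    """Replace user and item IDs with consecutive integers, in place."""
    for column in ("user_id", "item_id"):
        # factorize assigns codes 0, 1, 2, ... in order of first appearance
        codes, _ = pd.factorize(ratings[column])
        ratings[column] = codes
```

Consecutive integer IDs can be used directly as row and column indices of a sparse matrix.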

rs_course.utils.evaluate_implicit_recommender(recommender: ItemItemRecommender, train: csr_matrix, test: DataFrame, split_test_users_into: int, top_k: int) → float

Compute hit-rate for a recommender from the implicit package.

Parameters:
  • recommender – a recommender from the implicit package

  • train – sparse matrix of ratings

  • test – pandas dataset of ratings for testing

  • split_test_users_into – split test by users into several chunks to fit into memory

  • top_k – how many items to recommend to each user

Returns:

hitrate@top_k
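
The sketch below shows one way to compute a hit-rate while scoring users in chunks. It assumes hit-rate means the share of test users who receive at least one of their test items, and it replaces the implicit recommender with a hypothetical callable recommend(users, top_k) that returns a dict of item lists. The column names are assumptions as well.

```python
import numpy as np
import pandas as pd


def hit_rate_in_chunks(recommend, test: pd.DataFrame, split_test_users_into: int, top_k: int) -> float:
    """Share of test users with at least one test item among their recommendations."""
    # Items each user interacted with in the test set
    relevant = {user: set(items) for user, items in test.groupby("user_id")["item_id"]}
    users = list(relevant)
    hits = 0
    # Only one chunk of users is scored at a time to limit memory use
    for chunk in np.array_split(users, split_test_users_into):
        chunk_users = chunk.tolist()
        recs = recommend(chunk_users, top_k)
        hits += sum(bool(relevant[user] & set(recs[user])) for user in chunk_users)
    return hits / len(users)
```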

rs_course.utils.filter_users_and_items(ratings: DataFrame, min_items_per_user: int | None, min_users_per_item: int | None) → DataFrame

Keep only items rated by at least min_users_per_item users, and only users who rated at least min_items_per_user items.

Parameters:
  • ratings – ratings dataset

  • min_items_per_user – if None then don’t filter

  • min_users_per_item – if None then don’t filter

Returns:

filtered ratings dataset
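
A single-pass version of this filtering might look as follows. The helper name filter_sparse and the column names are assumptions, and the order of the two filters is a guess.

```python
from typing import Optional

import pandas as pd


def filter_sparse(
    ratings: pd.DataFrame,
    min_items_per_user: Optional[int],
    min_users_per_item: Optional[int],
) -> pd.DataFrame:
    """Drop rarely rated items, then users with too few ratings."""
    if min_users_per_item is not None:
        # Number of distinct users per item, broadcast back to every row
        users_per_item = ratings.groupby("item_id")["user_id"].transform("nunique")
        ratings = ratings[users_per_item >= min_users_per_item]
    if min_items_per_user is not None:
        items_per_user = ratings.groupby("user_id")["item_id"].transform("nunique")
        ratings = ratings[items_per_user >= min_items_per_user]
    return ratings
```

Note that one pass does not guarantee both thresholds at once: removing users in the second step can push some items back below min_users_per_item.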

rs_course.utils.get_sparse_item_features(movielens: MovieLens, ratings: DataFrame) → Tuple[csr_matrix, DataFrame]

Extract item features from the tags dataset.

Parameters:
  • movielens – full MovieLens dataset (only for tags and genres)

  • ratings – ratings data (can differ from one in MovieLens object)

Returns:

sparse matrix and a pandas DataFrame of item features (tags)

rs_course.utils.movielens_split(ratings: DataFrame, train_percentage: float, warm_users_only: bool = False) → Tuple[csr_matrix, DataFrame, Tuple[int, int]]

Split the ratings dataset into train and test sets.

Parameters:
  • ratings – ratings dataset from MovieLens

  • train_percentage – percentage of data to put into the training dataset

  • warm_users_only – test only on users who were in the training set

Returns:

sparse matrix for training and pandas dataset for testing
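
Whether the package splits at random or by time is not stated here. The sketch below assumes a random split and shows how train_percentage and warm_users_only could interact; random_split, its seed argument and the user_id column are hypothetical, and both parts are returned as DataFrames for brevity.

```python
import pandas as pd


def random_split(ratings: pd.DataFrame, train_percentage: float, warm_users_only: bool = False, seed: int = 0):
    """Randomly put a train_percentage share of rows into train, the rest into test."""
    train = ratings.sample(frac=train_percentage, random_state=seed)
    test = ratings.drop(train.index)
    if warm_users_only:
        # Keep only test rows whose user also appears in train
        test = test[test["user_id"].isin(train["user_id"])]
    return train, test
```

The examples in this documentation pass values such as 0.95, so train_percentage is treated as a fraction between 0 and 1.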

rs_course.utils.pandas_to_scipy(pd_dataframe: DataFrame, data_name: str, rows_name: str, cols_name: str, shape: Tuple[int, int]) → csr_matrix

Transform a pandas DataFrame with three columns into a sparse matrix.

Parameters:
  • pd_dataframe – an input pandas dataframe

  • data_name – column name with values for the matrix cells

  • rows_name – column name with row numbers of the cells

  • cols_name – column name with column numbers of the cells

  • shape – a pair (total number of rows, total number of columns)

Returns:

a csr_matrix
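
The conversion maps directly onto scipy's coordinate-style csr_matrix constructor. A minimal sketch, with to_csr as a hypothetical stand-in for the documented function:

```python
import pandas as pd
from scipy.sparse import csr_matrix


def to_csr(frame: pd.DataFrame, data_name: str, rows_name: str, cols_name: str, shape: tuple) -> csr_matrix:
    """Build a sparse matrix from (value, row, column) triples stored in frame."""
    values = frame[data_name].to_numpy()
    rows = frame[rows_name].to_numpy()
    cols = frame[cols_name].to_numpy()
    # Entries that share the same (row, column) pair are summed by scipy
    return csr_matrix((values, (rows, cols)), shape=shape)
```

Passing shape explicitly matters when some users or items have no rows in the DataFrame, because the matrix dimensions cannot then be inferred from the data.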