Package Documentation
Popular (not Personalised) Recommender
- rs_course.popular.popular_recommender(ratings: DataFrame, warm_users_only: bool, top_k: int, train_percentage: float) → None
Build a non-personalised recommender based on item popularity.
>>> popular_recommender(
...     getfixture("test_dataset").ratings,  # noqa: F821
...     False,
...     10,
...     0.95
... )
1.0
- Parameters:
ratings – a dataset of user-item interactions
warm_users_only – test only on users who were in the training set
top_k – the number of items to recommend
train_percentage – percentage of user-item pairs to leave in the training set
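To illustrate what "popularity" means here, the sketch below ranks items by their number of interactions in the training part and gives every user the same top_k list. It then computes hit rate as the share of test users with at least one of their test items in that list. This is a minimal sketch, not the package's code: the user_id and item_id column names, the toy data and this hit-rate definition are all assumptions.

```python
import pandas as pd


def popular_top_k(train: pd.DataFrame, top_k: int) -> list:
    """Return the top_k item IDs with the most interactions in train."""
    # value_counts sorts items by frequency, most popular first
    return train["item_id"].value_counts().head(top_k).index.tolist()


def hit_rate(recommended: list, test: pd.DataFrame) -> float:
    """Share of test users with at least one test item among the recommendations."""
    is_hit = test["item_id"].isin(recommended)
    # a user counts as a hit if any of their test items was recommended
    return float(is_hit.groupby(test["user_id"]).any().mean())


# toy data (hypothetical): item 10 has 3 interactions, 11 has 2, 12 has 1
train = pd.DataFrame({"user_id": [0, 0, 1, 1, 2, 2], "item_id": [10, 11, 10, 12, 10, 11]})
test = pd.DataFrame({"user_id": [0, 1, 1], "item_id": [12, 11, 13]})
top = popular_top_k(train, top_k=2)
print(top, hit_rate(top, test))
```

Because the list is identical for everyone, a single value_counts call over the training set is all the "model" there is.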
Content-based item2item KNN Recommender
- rs_course.content_based_knn.get_content_based_recommender(movielens: MovieLens, split_test_users_into: int, top_k: int, train_percentage: float) → ItemItemRecommender
Build a content-based recommender.
>>> _ = get_content_based_recommender(
...     getfixture("test_dataset"),  # noqa: F821
...     1,
...     10,
...     0.95
... )
Content-Based Hit-Rate: 1.0
- Parameters:
movielens – a MovieLens dataset
split_test_users_into – into how many chunks to split the test set
top_k – the number of items to recommend
train_percentage – percentage of user-item pairs to leave in the training set
- Returns:
a trained recommender
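The core idea of an item-to-item content-based KNN recommender can be sketched with plain numpy. Items are compared by the cosine similarity of their feature vectors (for example, tag indicators), and the unseen items most similar to what the user already has are recommended. This is an illustration only. The package builds on implicit's ItemItemRecommender, and the toy feature matrix and summed-similarity scoring here are assumptions.

```python
import numpy as np


def cosine_similarity(features: np.ndarray) -> np.ndarray:
    """Item-item cosine similarity of the rows of a dense feature matrix."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normalised = features / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return normalised @ normalised.T


def recommend_by_content(user_items: list, features: np.ndarray, top_k: int) -> list:
    """Rank unseen items by their summed similarity to the user's items."""
    similarity = cosine_similarity(features)
    scores = similarity[user_items].sum(axis=0)
    scores[user_items] = -np.inf  # never recommend already seen items
    return np.argsort(-scores)[:top_k].tolist()


# rows are items, columns are (hypothetical) binary tags
item_features = np.array([
    [1, 1, 0],  # item 0
    [1, 0, 0],  # item 1
    [0, 0, 1],  # item 2
    [1, 1, 1],  # item 3
], dtype=float)
print(recommend_by_content([0], item_features, top_k=2))
```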
Collaborative Filtering KNN Recommender
- rs_course.collaborative_filtering_knn.collaborative_filtering_knn(ratings: DataFrame, number_of_neighbours: int, split_test_users_into: int, top_k: int, train_percentage: float) → None
Build a collaborative filtering KNN model.
>>> collaborative_filtering_knn(
...     getfixture("test_dataset").ratings,  # noqa: F821
...     7,
...     1,
...     top_k=10,
...     train_percentage=0.95
... )
1.0
- Parameters:
ratings – a dataset of user-item interactions
number_of_neighbours – number of neighbours for KNN
split_test_users_into – the number of chunks to split the test users into
top_k – the number of items to recommend
train_percentage – percentage of user-item pairs to leave in the training set
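In collaborative filtering the item vectors come from the interaction matrix itself, and number_of_neighbours limits how many similar items each item may vote for. Below is a minimal numpy sketch that assumes cosine similarity between item columns and a dense toy matrix. The package's similarity measure and data structures may differ.

```python
import numpy as np


def keep_top_neighbours(similarity: np.ndarray, number_of_neighbours: int) -> np.ndarray:
    """Zero out all but the number_of_neighbours largest similarities in each row."""
    candidates = similarity.copy()
    np.fill_diagonal(candidates, -np.inf)  # an item is not its own neighbour
    pruned = np.zeros_like(similarity)
    for item in range(similarity.shape[0]):
        neighbours = np.argsort(-candidates[item])[:number_of_neighbours]
        pruned[item, neighbours] = similarity[item, neighbours]
    return pruned


# rows are users, columns are items (1 = interaction); toy data
user_item = np.array([
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)
norms = np.linalg.norm(user_item, axis=0)
item_similarity = (user_item.T @ user_item) / np.outer(norms, norms)  # cosine
neighbours = keep_top_neighbours(item_similarity, number_of_neighbours=1)

# score items for user 0 from the neighbours of the items they interacted with
scores = user_item[0] @ neighbours
scores[user_item[0] > 0] = -np.inf  # do not recommend seen items
print(int(np.argmax(scores)))
```

Pruning to a few neighbours keeps the similarity matrix sparse and stops many weakly similar items from drowning out the strong ones.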
Collaborative Filtering ALS Recommender
- rs_course.cf_als.als_recommendations(ratings: DataFrame, model_params: Dict[str, Any], split_test_users_into: int, top_k: int, train_percentage: float) → Tuple[csr_matrix, AlternatingLeastSquares, float, DataFrame, DataFrame]
Build an ALS recommender from implicit.
>>> ratings = getfixture("test_dataset").ratings  # noqa: F821
>>> import os
>>> model_params = {
...     "factors": 1,
...     "use_gpu": os.environ.get("TEST_ON_GPU", False),
...     "random_state": 0,
...     "iterations": 1,
... }
>>> _, _, hit_rate, _, _ = als_recommendations(
...     ratings=ratings,
...     model_params=model_params,
...     split_test_users_into=1,
...     top_k=10,
...     train_percentage=0.95
... )
>>> print(hit_rate)
1.0
- Parameters:
ratings – a dataset of user-item interactions
model_params – ALS training parameters
split_test_users_into – the number of chunks to split the test users into
top_k – the number of items to recommend
train_percentage – percentage of user-item pairs to leave in the training set
- Returns:
a tuple of the train set in sparse format, the trained recommender, the hit rate at top_k, and the train and test sets in pandas format
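To see what ALS optimises, here is a plain-numpy sketch of alternating least squares on a small dense matrix. With the item factors fixed, the user factors have a closed-form ridge-regression solution, and vice versa. This is a simplification for illustration. implicit's AlternatingLeastSquares uses the confidence-weighted variant for implicit feedback on sparse data. The reg value and toy matrix here are assumptions, and factors, iterations and random_state only mirror the model_params keys in the doctest above.

```python
from typing import Tuple

import numpy as np


def als(matrix: np.ndarray, factors: int, iterations: int,
        reg: float = 0.1, random_state: int = 0) -> Tuple[np.ndarray, np.ndarray]:
    """Factorise a dense matrix as matrix ~ user_factors @ item_factors.T."""
    rng = np.random.default_rng(random_state)
    n_users, n_items = matrix.shape
    user_factors = rng.normal(scale=0.1, size=(n_users, factors))
    item_factors = rng.normal(scale=0.1, size=(n_items, factors))
    ridge = reg * np.eye(factors)
    for _ in range(iterations):
        # fix item factors: (V^T V + reg I) U^T = V^T R^T is a ridge regression per user
        user_factors = np.linalg.solve(
            item_factors.T @ item_factors + ridge, item_factors.T @ matrix.T
        ).T
        # fix user factors: solve for the item factors in the same way
        item_factors = np.linalg.solve(
            user_factors.T @ user_factors + ridge, user_factors.T @ matrix
        ).T
    return user_factors, item_factors


toy = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]], dtype=float)
users, items = als(toy, factors=2, iterations=20)
print(np.round(users @ items.T, 2))
```

The regularisation shrinks the reconstruction slightly towards zero, so the printed matrix is close to, but not exactly, the toy input.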
Pure SVD Recommender
- rs_course.cf_svd.get_svd_recs(recommender: TruncatedSVD, sparse_train: csr_matrix, test: DataFrame, split_test_users_into: int, top_k: int) → Dict[int, List[int]]
Get recommendations given a truncated SVD decomposition.
- Parameters:
recommender – a truncated SVD decomposition
sparse_train – a csr_matrix representation of the train data
test – test data
split_test_users_into – the number of chunks to split the test users into
top_k – the number of items to recommend
- Returns:
recommendations in rs_metrics-compatible format
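PureSVD-style scoring can be sketched as follows. A user's interaction row x is projected onto the top item components V and back, scores = x V Vᵀ. Already seen items are then masked and the top_k remaining items are kept. The sketch uses numpy.linalg.svd on a dense toy matrix instead of a fitted TruncatedSVD. With sklearn, the same scores would come from recommender.transform(x) @ recommender.components_. It processes users in split_test_users_into chunks and returns a Dict[int, List[int]] like get_svd_recs, but it is an illustration under these assumptions, not the package's implementation.

```python
import numpy as np


def pure_svd_recs(train: np.ndarray, n_components: int, top_k: int,
                  split_test_users_into: int) -> dict:
    """Score items by projecting each user row onto the top item components."""
    # rows of vt are the item-space singular vectors, strongest first
    _, _, vt = np.linalg.svd(train, full_matrices=False)
    components = vt[:n_components]
    recs = {}
    users = np.arange(train.shape[0])
    for chunk in np.array_split(users, split_test_users_into):
        scores = train[chunk] @ components.T @ components
        scores[train[chunk] > 0] = -np.inf  # hide already seen items
        top = np.argsort(-scores, axis=1)[:, :top_k]
        recs.update({int(user): row.tolist() for user, row in zip(chunk, top)})
    return recs


train = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
], dtype=float)
print(pure_svd_recs(train, n_components=1, top_k=1, split_test_users_into=2))
```

The sign ambiguity of singular vectors does not matter here, because each component appears twice in x V Vᵀ.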
- rs_course.cf_svd.pure_svd_recommender(ratings: DataFrame, split_test_users_into: int, model_config: Dict[str, Any], top_k: int, train_percentage: float) → None
Build an example SVD recommender based on sklearn.
>>> pure_svd_recommender(
...     getfixture("test_dataset").ratings,  # noqa: F821
...     1,
...     {"n_components": 1, "random_state": 0},
...     10,
...     0.95
... )
1.0
- Parameters:
ratings – a dataset of user-item interactions
split_test_users_into – the number of chunks to split the test users into
model_config – a dict of TruncatedSVD arguments for model training
top_k – the number of items to recommend
train_percentage – percentage of user-item pairs to leave in the training set
DNN Recommender Example
- rs_course.dnn_rs.dnn_recommender(ratings: DataFrame, model_config: Dict[str, Any], split_test_users_into: int, top_k: int, train_percentage: float) → float
Build a RecBole model.
>>> import os
>>> model_config = {
...     "data_path": ".",
...     "eval_step": 0,
...     "embedding_size": 1,
...     "cnn_channels": [1, 1],
...     "cnn_kernels": [1],
...     "cnn_strides": [1],
...     "epochs": 1,
...     "use_gpu": os.environ.get("TEST_ON_GPU", False),
... }
>>> test_ratings = getfixture("recbole_test_data").ratings  # noqa: F821
>>> isinstance(
...     dnn_recommender(test_ratings, model_config, 100, 10, 0.95),
...     float
... )
True
- Parameters:
ratings – a dataset of user-item interactions
model_config – config_dict of a recbole model
split_test_users_into – split test by users into several chunks to fit into memory
top_k – the number of items to recommend
train_percentage – percentage of user-item pairs to leave in the training set
- Returns:
- rs_course.dnn_rs.get_dnn_predictions(recommender: Any, sparse_train: csr_matrix, test: DataFrame, top_k: int, split_test_users_into: int) → Dict[int, List[int]]
Get recommendations given a DNN recommender.
- Parameters:
recommender – a recommender
sparse_train – a csr_matrix representation of the train data
test – test data
top_k – how many recommendations to return for each user
split_test_users_into – split test by users into several chunks to fit into memory
- Returns:
recommendations in rs_metrics-compatible format
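Splitting the test users into chunks bounds memory use, because only one chunk's (users × items) score matrix exists at a time. The sketch below shows that loop. A hypothetical score_fn stands in for the trained DNN, and numpy.argpartition picks the top_k items without fully sorting every row. Both are assumptions, not necessarily what get_dnn_predictions does.

```python
from typing import Callable, Dict, List

import numpy as np


def predict_in_chunks(
    score_fn: Callable[[np.ndarray], np.ndarray],
    test_users: np.ndarray,
    top_k: int,
    split_test_users_into: int,
) -> Dict[int, List[int]]:
    """Score test users chunk by chunk so only one chunk's scores are in memory."""
    recommendations: Dict[int, List[int]] = {}
    for chunk in np.array_split(test_users, split_test_users_into):
        scores = score_fn(chunk)  # shape: (len(chunk), n_items)
        # argpartition finds the top_k columns of each row without a full sort...
        top = np.argpartition(-scores, top_k - 1, axis=1)[:, :top_k]
        # ...which are then ordered by decreasing score
        order = np.argsort(-np.take_along_axis(scores, top, axis=1), axis=1)
        ranked = np.take_along_axis(top, order, axis=1)
        for user, items in zip(chunk, ranked):
            recommendations[int(user)] = items.tolist()
    return recommendations


def toy_scores(users: np.ndarray) -> np.ndarray:
    """A hypothetical model: every user prefers items with larger IDs."""
    return np.tile(np.arange(5, dtype=float), (len(users), 1))


print(predict_in_chunks(toy_scores, np.array([7, 8, 9]), top_k=2, split_test_users_into=2))
```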
- rs_course.dnn_rs.get_recbole_trained_recommender(config: Config, train_data: Dataset) → ConvNCF
Train a DNN recommender from RecBole.
- Parameters:
config – a recbole config
train_data – a training dataset in the recbole format
- Returns:
a trained model ready for evaluation
- rs_course.dnn_rs.prepare_recbole_data(data_name: str, dataframe: DataFrame, config: Config) → Dataset
Create a directory and write an interactions ‘Atomic File’ there.
Attention! The directory data_name will be removed, no questions asked.
- Parameters:
data_name – a name for the folder and the main file
dataframe – a Pandas dataframe to convert to recbole’s Atomic Files
config – a recbole config
- Returns:
a dataset in the recbole format
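RecBole reads interactions from a tab-separated Atomic File named <dataset>.inter inside <data_path>/<dataset>/. The file's header declares each column as field_name:type, with token for IDs and float for numbers. Below is a minimal pandas sketch of writing such a file. The column names (user_id, item_id, rating) and the write_inter_file helper are hypothetical, and prepare_recbole_data may write different fields. The rmtree call mirrors the warning that an existing directory is removed.

```python
import os
import shutil
import tempfile

import pandas as pd


def write_inter_file(data_path: str, data_name: str, dataframe: pd.DataFrame) -> str:
    """Write dataframe to data_path/data_name/data_name.inter as a RecBole atomic file."""
    folder = os.path.join(data_path, data_name)
    shutil.rmtree(folder, ignore_errors=True)  # any previous contents are deleted
    os.makedirs(folder)
    path = os.path.join(folder, f"{data_name}.inter")
    # RecBole headers are "field_name:type"; IDs are tokens, ratings are floats
    dataframe[["user_id", "item_id", "rating"]].to_csv(
        path,
        sep="\t",
        header=["user_id:token", "item_id:token", "rating:float"],
        index=False,
    )
    return path


interactions = pd.DataFrame({"user_id": [1, 2], "item_id": [10, 20], "rating": [4.0, 5.0]})
with tempfile.TemporaryDirectory() as tmp:
    with open(write_inter_file(tmp, "toy", interactions)) as inter_file:
        print(inter_file.read())
```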
Cold Start Recommender Example
- rs_course.cold_start.cold_start(movielens: MovieLens, als_config: Dict[str, Any], split_test_users_into: int, top_k: int, train_percentage: float) → None
Build and test an ALS-based recommender with cold start.
>>> import os
>>> als_config = {
...     "factors": 1,
...     "use_gpu": os.environ.get("TEST_ON_GPU", False),
...     "random_state": 0
... }
>>> cold_start(
...     getfixture("test_dataset"),  # noqa: F821
...     als_config,
...     1,
...     10,
...     0.08
... )
Collaborative Filtering Hit-Rate: 0.0
Content-Based Hit-Rate: 1.0
cold items percentage in test: 1.0
cold rows percentage in test: 1.0
users with cold items percentage in test: 1.0
users with no cold items percentage in test: 0.0
Hybrid Hit-Rate: 1.0
- Parameters:
movielens – a MovieLens dataset
als_config – ALS (collaborative filtering) model training parameters
split_test_users_into – the number of chunks to split the test users into
top_k – the number of items to recommend
train_percentage – percentage of user-item pairs to leave in the training set
- rs_course.cold_start.compute_cold_factors(cold_items: List[int], content_based_recommender: ItemItemRecommender, recommender: AlternatingLeastSquares) → Dict[int, ndarray]
Compute latent factors for cold items.
- Parameters:
cold_items – a list of cold items for which to compute factors
content_based_recommender – a content-based recommender to induce new factors
recommender – an ALS recommender with known factors (for non-cold items)
- Returns:
a dictionary with cold item IDs as keys and their computed factors as values
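One way to induce factors for a cold item is to borrow them from the warm items it resembles by content. The idea is to take its most content-similar warm items and average their ALS factors, weighted by similarity. The sketch below assumes that scheme, a dense content-similarity matrix and a hypothetical n_neighbours parameter. compute_cold_factors itself works with an ItemItemRecommender and an AlternatingLeastSquares model and may weight neighbours differently.

```python
from typing import Dict, List

import numpy as np


def cold_item_factors(
    cold_items: List[int],
    content_similarity: np.ndarray,
    item_factors: np.ndarray,
    warm_items: List[int],
    n_neighbours: int = 2,
) -> Dict[int, np.ndarray]:
    """Estimate factors of cold items as a similarity-weighted mean of warm neighbours."""
    warm = np.asarray(warm_items)
    factors = {}
    for item in cold_items:
        similarities = content_similarity[item, warm]
        nearest = np.argsort(-similarities)[:n_neighbours]  # most similar warm items
        weights = similarities[nearest]
        if weights.sum() <= 0:
            # no content overlap with warm items: fall back to a zero vector
            factors[item] = np.zeros(item_factors.shape[1])
        else:
            factors[item] = weights @ item_factors[warm[nearest]] / weights.sum()
    return factors


# toy data: items 0-2 are warm (factors known), item 3 is cold
item_factors = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 0.0]])
content_similarity = np.eye(4)
content_similarity[3, :3] = [0.9, 0.1, 0.3]  # item 3 resembles items 0 and 2
print(cold_item_factors([3], content_similarity, item_factors, warm_items=[0, 1, 2]))
```

Once such vectors are written into the cold rows of the item factor matrix, the collaborative model can score cold items like any other.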
- rs_course.cold_start.get_cold_items(train: DataFrame, test: DataFrame) → List[int]
Get a list of cold items.
- Parameters:
train – train set
test – test set
- Returns:
a list of cold items in the test set
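Cold items are those that appear in the test set but have no interactions in the training set, so a collaborative model has no factors for them. Below is a minimal pandas sketch that assumes an item_id column (a hypothetical name). The last line computes the kind of "cold rows percentage" statistic that cold_start prints, though the package may compute it differently.

```python
import pandas as pd


def cold_items(train: pd.DataFrame, test: pd.DataFrame) -> list:
    """Items that occur in the test set but never in the train set."""
    return sorted(set(test["item_id"]) - set(train["item_id"]))


train = pd.DataFrame({"user_id": [0, 1, 1], "item_id": [1, 2, 3]})
test = pd.DataFrame({"user_id": [0, 0, 1, 2], "item_id": [2, 4, 5, 4]})
cold = cold_items(train, test)
# share of test rows that involve a cold item
print(cold, test["item_id"].isin(cold).mean())
```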
Factorisation Machines with Vowpal Wabbit
- rs_course.factorisation_machines.factorisation_machines(ratings: DataFrame, num_epochs: int, verbose: bool, seed: int, bit_precision: int) → float
Build a factorisation machine.
>>> factorisation_machines(
...     getfixture("test_dataset").ratings, 1, False, 0, 1  # noqa: F821
... )
0.8333333333333333
- Parameters:
ratings – ratings dataset
num_epochs – number of epochs (vw passes)
verbose – the opposite of vw quiet
seed – a random seed for testing
bit_precision – a VW argument
- Returns:
Useful Functions for the Whole Course
- rs_course.utils.enumerate_users_and_items(ratings: DataFrame) → None
In-place change of user and item IDs into numbers.
- Parameters:
ratings – ratings dataset
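Sparse matrices need row and column indices in 0..n-1, which is why raw IDs are renumbered. Below is a minimal sketch of such an in-place renumbering with pandas.factorize. It assumes user_id and item_id columns (hypothetical names), and enumerate_users_and_items may assign the numbers in a different order.

```python
import pandas as pd


def enumerate_ids(ratings: pd.DataFrame) -> None:
    """Replace user and item IDs with consecutive integers 0..n-1, in place."""
    for column in ("user_id", "item_id"):
        # factorize numbers distinct values in order of first appearance
        ratings[column] = pd.factorize(ratings[column])[0]


ratings = pd.DataFrame({"user_id": ["a", "b", "a"], "item_id": [100, 7, 100]})
enumerate_ids(ratings)
print(ratings)
```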
- rs_course.utils.evaluate_implicit_recommender(recommender: ItemItemRecommender, train: csr_matrix, test: DataFrame, split_test_users_into: int, top_k: int) → float
Compute hit-rate for a recommender from the implicit package.
- Parameters:
recommender – a recommender from the implicit package
train – a sparse matrix of ratings
test – pandas dataset of ratings for testing
split_test_users_into – split test by users into several chunks to fit into memory
top_k – how many items to recommend to each user
- Returns:
the hit-rate
- rs_course.utils.filter_users_and_items(ratings: DataFrame, min_items_per_user: int | None, min_users_per_item: int | None) → DataFrame
Leave only items with at least min_users_per_item users who rated them (and only users who rated at least min_items_per_user items).
- Parameters:
ratings – ratings dataset
min_items_per_user – if None, then don’t filter users
min_users_per_item – if None, then don’t filter items
- Returns:
filtered ratings dataset
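The filtering rule can be sketched in pandas as two passes. The first pass drops items with fewer than min_users_per_item distinct users. The second drops users with fewer than min_items_per_user distinct items among the remaining rows. The order of the passes and the counting of distinct users and items are assumptions. Note also that a single pass of this kind can leave some items below their threshold once users are removed.

```python
from typing import Optional

import pandas as pd


def filter_ratings(
    ratings: pd.DataFrame,
    min_items_per_user: Optional[int],
    min_users_per_item: Optional[int],
) -> pd.DataFrame:
    """Keep popular-enough items, then active-enough users; None disables a filter."""
    filtered = ratings
    if min_users_per_item is not None:
        users_per_item = filtered.groupby("item_id")["user_id"].transform("nunique")
        filtered = filtered[users_per_item >= min_users_per_item]
    if min_items_per_user is not None:
        items_per_user = filtered.groupby("user_id")["item_id"].transform("nunique")
        filtered = filtered[items_per_user >= min_items_per_user]
    return filtered


ratings = pd.DataFrame({
    "user_id": [0, 0, 1, 1, 2, 2, 3],
    "item_id": [10, 11, 10, 12, 10, 11, 13],
})
result = filter_ratings(ratings, min_items_per_user=2, min_users_per_item=2)
print(result)
```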
- rs_course.utils.get_sparse_item_features(movielens: MovieLens, ratings: DataFrame) → Tuple[csr_matrix, DataFrame]
Extract item features from the tags dataset.
- Parameters:
movielens – full MovieLens dataset (only for tags and genres)
ratings – ratings data (can differ from the one in the MovieLens object)
- Returns:
sparse matrix and a pandas DataFrame of item features (tags)
- rs_course.utils.movielens_split(ratings: DataFrame, train_percentage: float, warm_users_only: bool = False) → Tuple[csr_matrix, DataFrame, Tuple[int, int]]
Split the ratings dataset into train and test sets.
- Parameters:
ratings – ratings dataset from MovieLens
train_percentage – percentage of data to put into the training dataset
warm_users_only – test only on users who were in the training set
- Returns:
a sparse matrix for training, a pandas dataset for testing, and the shape of the full user-item matrix
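Below is a pandas sketch of a uniformly random split of user-item pairs with optional warm-user filtering. The random_split name and seed parameter are hypothetical. Unlike movielens_split, it returns two DataFrames instead of a csr_matrix, a DataFrame and a shape, and the package may split differently.

```python
import pandas as pd


def random_split(
    ratings: pd.DataFrame,
    train_percentage: float,
    warm_users_only: bool = False,
    seed: int = 0,
):
    """Randomly send train_percentage of the rows to train and the rest to test."""
    shuffled = ratings.sample(frac=1.0, random_state=seed)
    n_train = int(round(train_percentage * len(shuffled)))
    train, test = shuffled.iloc[:n_train], shuffled.iloc[n_train:]
    if warm_users_only:
        # drop test rows of users the model never saw during training
        test = test[test["user_id"].isin(train["user_id"])]
    return train, test


ratings = pd.DataFrame({"user_id": [0] * 5 + [1] * 4 + [2], "item_id": range(10)})
train, test = random_split(ratings, train_percentage=0.8, warm_users_only=True)
print(len(train), len(test))
```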
- rs_course.utils.pandas_to_scipy(pd_dataframe: DataFrame, data_name: str, rows_name: str, cols_name: str, shape: Tuple[int, int]) → csr_matrix
Transform pandas dataset with three columns to a sparse matrix.
- Parameters:
pd_dataframe – an input pandas dataframe
data_name – column name with values for the matrix cells
rows_name – column name with row numbers of the cells
cols_name – column name with column numbers of the cells
shape – a pair (total number of rows, total number of columns)
- Returns:
a csr_matrix
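Converting a three-column DataFrame to a sparse matrix is usually done with the (data, (row_ind, col_ind)) form of the scipy.sparse.csr_matrix constructor. Below is a minimal sketch with hypothetical column names. Note that this constructor sums duplicate (row, column) pairs rather than keeping the last one.

```python
from typing import Tuple

import pandas as pd
from scipy.sparse import csr_matrix


def to_csr(
    pd_dataframe: pd.DataFrame,
    data_name: str,
    rows_name: str,
    cols_name: str,
    shape: Tuple[int, int],
) -> csr_matrix:
    """Build a CSR matrix whose cell (row, col) holds the value from data_name."""
    return csr_matrix(
        (
            pd_dataframe[data_name].to_numpy(),
            (pd_dataframe[rows_name].to_numpy(), pd_dataframe[cols_name].to_numpy()),
        ),
        shape=shape,
    )


ratings = pd.DataFrame({"user": [0, 2], "item": [1, 0], "rating": [5.0, 3.0]})
matrix = to_csr(ratings, "rating", "user", "item", shape=(3, 2))
print(matrix.toarray())
```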