Easy_Rec package

Submodules

callbacks

class easy_rec.callbacks.DynamicNegatives(dataloader, neg_key='out_sid', id_key='uid', padding_idx=0)[source]

Bases: Callback

PyTorch Lightning callback that dynamically updates a buffer of hard negatives for each user based on model predictions during training.

Parameters:

dataloader (DataLoader) – The training dataloader that will receive the updated negatives.
neg_key (str) – The key used to access negatives in the batch.
id_key (str) – The key used to access user IDs in the batch.
padding_idx (int) – Index used for padding, ignored in negative selection.

init_vars()[source]: Initializes or resets internal tracking variables used to collect predictions and sampled negatives to prepare for the next epoch’s data collection.

on_train_batch_end(trainer, pl_module, model_outputs, batch_input, batch_idx)[source]

Collects model predictions and sampled negatives at the end of each training batch.

Parameters:

trainer (Trainer) – The PyTorch Lightning trainer.
pl_module (LightningModule) – The model being trained.
model_outputs (dict) – Output dictionary from the model’s forward pass. Must include “model_output”.
batch_input (dict) – The batch data input to the model, typically from the dataloader.
batch_idx (int) – Index of the current batch.

on_train_epoch_end(trainer, pl_module)[source]

Processes accumulated predictions to identify hard negatives and update the negatives buffer at the end of each training epoch.

Parameters:

trainer (Trainer) – The PyTorch Lightning trainer.
pl_module (LightningModule) – The model being trained.
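The epoch-end selection step can be pictured with a small, dependency-free sketch. It assumes that "hard negatives" are the highest-scored items that are neither positives nor padding; the function name `select_hard_negatives` and its per-user dictionary input are hypothetical and are not taken from the callback's actual implementation.

```python
def select_hard_negatives(scores, positives, num_negatives, padding_idx=0):
    """Pick the highest-scored items that are neither positives nor padding.

    `scores` maps item id -> predicted score for a single user; this input
    format is a hypothetical simplification, not the callback's real state.
    """
    candidates = [
        (score, item)
        for item, score in scores.items()
        if item != padding_idx and item not in positives
    ]
    # Highest score first; ties are broken by item id to stay deterministic.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in candidates[:num_negatives]]


user_scores = {0: 9.0, 1: 0.2, 2: 0.9, 3: 0.7, 4: 0.8}
hard_negatives = select_hard_negatives(user_scores, positives={2}, num_negatives=2)
print(hard_negatives)
```

Item 0 is skipped as padding and item 2 as a positive, so the two remaining items with the largest scores are kept.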

data_generation_utils

easy_rec.data_generation_utils.download_dataset(dataset_name, dataset_raw_folder, additional_file_name=None)[source]

Downloads the requested dataset from predefined sources (e.g., HuggingFace, GroupLens, Amazon).

Parameters:

dataset_name (str) – Name of the dataset to download.
dataset_raw_folder (str) – Folder path to save the downloaded dataset.
additional_file_name (str, optional) – Additional file name required for some datasets.

Returns:

None

Return type:

None

easy_rec.data_generation_utils.preprocess_dataset(name, data_folder='../data/raw', min_rating=None, min_items_per_user=0, min_users_per_item=0, densify_index=True, split_method='leave_n_out', split_keys={'rating': ['train_rating', 'val_rating', 'test_rating'], 'sid': ['train_sid', 'val_sid', 'test_sid'], 'timestamp': ['train_timestamp', 'val_timestamp', 'test_timestamp']}, test_sizes=[1, 1], random_state=None, del_after_split=True, **kwargs)[source]

Preprocesses the dataset for recommender systems:

- Loads and filters ratings
- Densifies user/item indices
- Converts data into sequence format
- Splits into train/val/test

Parameters:

name (str) – Name of the dataset.
data_folder (str) – Base folder path where raw data resides.
min_rating (float, optional) – Minimum rating to retain.
min_items_per_user (int) – Min number of items per user.
min_users_per_item (int) – Min number of users per item.
densify_index (bool) – Whether to remap user/item IDs to 0-based dense indices.
split_method (str) – Data split strategy (e.g., “leave_n_out”).
split_keys (Dict) – Keys to split and their resulting keys.
test_sizes (List[int]) – Sizes of the held-out test/validation splits (the default [1, 1] holds out one element for each).
random_state (int, optional) – Seed for reproducibility.
del_after_split (bool) – Delete original keys after splitting.

Returns:

Processed dataset and mappings (e.g., user/item index mappings).

Return type:

Tuple[Dict, Dict]
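A minimal sketch of what a "leave_n_out" split could look like for one user's chronologically ordered sequence. The helper name `leave_n_out`, the (validation, test) ordering of `test_sizes`, and the handling of sequences that are too short are all assumptions made for illustration, not behavior confirmed by the library.

```python
def leave_n_out(sequence, test_sizes=(1, 1)):
    """Split one ordered sequence into train/val/test parts.

    Assumption: test_sizes is (n_val, n_test) and the held-out elements are
    taken from the end of the sequence, test last.
    """
    n_val, n_test = test_sizes
    held_out = n_val + n_test
    if len(sequence) <= held_out:
        # Assumption: sequences too short to split stay entirely in train.
        return list(sequence), [], []
    train_end = len(sequence) - held_out
    val_end = len(sequence) - n_test
    return (
        list(sequence[:train_end]),
        list(sequence[train_end:val_end]),
        list(sequence[val_end:]),
    )


train_sid, val_sid, test_sid = leave_n_out([10, 20, 30, 40, 50])
print(train_sid, val_sid, test_sid)
```

The three returned lists play the role of the per-key outputs named in split_keys, for example train_sid, val_sid, and test_sid.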

easy_rec.data_generation_utils.maybe_preprocess_raw_dataset(dataset_raw_folder, dataset_name)[source]

Checks if preprocessed CSV data exists in the raw folder. If not, runs the specific preprocessing routine.

Parameters:

dataset_raw_folder (str) – Path to the raw dataset folder.
dataset_name (str) – Name of the dataset.

Returns:

None

Return type:

None

easy_rec.data_generation_utils.map_dataset_name()[source]

Returns a dictionary mapping dataset names to their primary data file.

Returns: Dataset name to file name mapping.
Return type: Dict[str, str]

easy_rec.data_generation_utils.get_rating_files_per_dataset(dataset_name)[source]

Gets the path or URL to the rating file associated with the dataset.

Parameters: dataset_name (str) – Name of the dataset.
Returns: Path or URL to the dataset’s rating file.
Return type: str
Raises: ValueError – If the dataset is not recognized.

easy_rec.data_generation_utils.specific_preprocess(dataset_raw_folder, dataset_name)[source]

Performs dataset-specific preprocessing and stores a standardized CSV file.

Parameters:

dataset_raw_folder (str) – Path to the raw dataset folder.
dataset_name (str) – Name of the dataset (e.g., ‘steam’, ‘amazon_books’, ‘behance’).

Raises:

NotImplementedError – If the dataset name is unknown or not supported.

Return type:

None

easy_rec.data_generation_utils.load_ratings_df(dataset_raw_folder, dataset_name)[source]

Loads the ratings DataFrame for the specified dataset.

Parameters:

dataset_raw_folder (str) – Path to the raw dataset folder.
dataset_name (str) – Name of the dataset.

Returns:

Ratings DataFrame with columns [‘uid’, ‘sid’, ‘rating’, ‘timestamp’].

Return type:

pd.DataFrame

easy_rec.data_generation_utils.filter_ratings(df, min_rating)[source]

Parameters:

df (DataFrame)
min_rating (float)

Return type:

DataFrame

easy_rec.data_generation_utils.filter_by_frequence(df, min_items_per_user, min_users_per_item)[source]

Parameters:

df (DataFrame)
min_items_per_user (int)
min_users_per_item (int)

Return type:

DataFrame
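The two thresholds can be illustrated on plain (uid, sid) pairs. This is only a sketch under stated assumptions: counts are taken over interaction rows, both filters are applied in a single pass, and the helper name `filter_by_frequency` is hypothetical; the library function works on a DataFrame and may differ in these details.

```python
from collections import Counter


def filter_by_frequency(rows, min_items_per_user=0, min_users_per_item=0):
    """Keep (uid, sid) rows whose user and item are both frequent enough."""
    user_counts = Counter(uid for uid, _ in rows)
    item_counts = Counter(sid for _, sid in rows)
    return [
        (uid, sid)
        for uid, sid in rows
        if user_counts[uid] >= min_items_per_user
        and item_counts[sid] >= min_users_per_item
    ]


interactions = [("a", 1), ("a", 2), ("b", 1), ("c", 3)]
kept = filter_by_frequency(interactions, min_items_per_user=2)
print(kept)
```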

easy_rec.data_generation_utils.densify_index_method(df, vars=['uid', 'sid'])[source]

Parameters: df (DataFrame)
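Densifying an index means replacing arbitrary raw IDs with consecutive integers. The sketch below shows the idea on a plain list; the helper name `densify`, the sorted ordering of the mapping, and the `start` argument are assumptions for illustration (the default of 0 follows the "0-based" wording used for densify_index in preprocess_dataset).

```python
def densify(ids, start=0):
    """Map raw ids to consecutive integers and return (dense_ids, mapping)."""
    # Assumption: dense indices follow the sorted order of the raw ids.
    mapping = {raw: dense for dense, raw in enumerate(sorted(set(ids)), start=start)}
    return [mapping[raw] for raw in ids], mapping


dense_ids, id_map = densify([105, 7, 105, 42])
print(dense_ids, id_map)
```

The returned mapping is the kind of object that would let dense indices be translated back to the original IDs.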

easy_rec.data_generation_utils.df_to_sequences(df, keep_vars=['uid'], seq_vars=['sid', 'rating', 'timestamp'], user_var='uid', time_var='timestamp')[source]

Parameters: df (DataFrame)
Return type: dict
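Converting interactions to sequences amounts to grouping rows by user and ordering each group by time. A minimal sketch on (uid, sid, timestamp) tuples follows; the helper name `rows_to_sequences` and the tuple layout are hypothetical, and the real function also carries the other seq_vars (such as rating) along.

```python
from collections import defaultdict


def rows_to_sequences(rows):
    """Group (uid, sid, timestamp) rows by user, ordered by timestamp."""
    events_by_user = defaultdict(list)
    for uid, sid, timestamp in rows:
        events_by_user[uid].append((timestamp, sid))
    # Sorting the (timestamp, sid) pairs orders each user's items by time.
    return {
        uid: [sid for _, sid in sorted(events)]
        for uid, events in events_by_user.items()
    }


rows = [(1, 10, 300), (2, 20, 100), (1, 11, 100), (1, 12, 200)]
sequences = rows_to_sequences(rows)
print(sequences)
```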

easy_rec.data_generation_utils.print_stats(complete_set, keep_time)[source]

Parameters:

complete_set (dict)
keep_time (bool)

easy_rec.data_generation_utils.split_rec_data(data, split_method, split_keys, test_sizes, **kwargs)[source]

Parameters:

data (dict)
split_method (str)
split_keys (dict)
test_sizes (list)

Return type:

dict

easy_rec.data_generation_utils.get_max_number_of(maps, key)[source]

losses

class easy_rec.losses.SequentialBCEWithLogitsLoss(*args, **kwargs)[source]

Bases: BCEWithLogitsLoss

Custom loss function for sequential binary classification tasks that extends PyTorch’s BCEWithLogitsLoss and ignores NaN values in the target tensor in the loss calculation.

Inherits: torch.nn.BCEWithLogitsLoss

forward(input, target)[source]

Computes the binary cross-entropy loss with logits, ignoring any targets that are NaN.

Parameters:

input (Tensor) – Predicted logits.
target (Tensor) – Target tensor of the same shape as input. NaN values are ignored.

Returns:

The computed scalar loss, averaged over non-NaN elements.

Return type:

Tensor
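The NaN-masking idea can be shown without PyTorch. The sketch below is written in plain Python so the arithmetic is visible; it assumes the loss is a simple mean over non-NaN targets, and the function name `masked_bce_with_logits` is hypothetical rather than part of the library.

```python
import math


def masked_bce_with_logits(logits, targets):
    """Mean binary cross-entropy with logits over targets that are not NaN."""
    losses = []
    for x, y in zip(logits, targets):
        if math.isnan(y):
            continue  # missing target: contributes nothing to the loss
        # Numerically stable form of -[y*log(s(x)) + (1-y)*log(1-s(x))].
        losses.append(max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x))))
    return sum(losses) / len(losses) if losses else float("nan")


loss = masked_bce_with_logits([0.0, 2.0, -1.0], [1.0, float("nan"), 0.0])
print(loss)
```

The second position has a NaN target, so the result is the average of the first and third terms only.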

class easy_rec.losses.SequentialBPR(clamp_max=20, *args, **kwargs)[source]

Bases: Module

Sequential version of the Bayesian Personalized Ranking (BPR) loss for recommendation over sequences. It encourages the model to rank positive items above negative items within the same timestep.

Parameters: clamp_max (float, optional) – Maximum value for clamping the logit differences to prevent numerical instability.

forward(input, target)[source]

Computes the Sequential BPR loss as the pairwise BPR loss between positive and negative items within each timestep.

Parameters:

input (Tensor) – Predicted item scores of shape (batch_size, timesteps, num_items). Contains the model’s predictions for each item at each timestep.
target (Tensor) – Target relevance tensor of shape (batch_size, timesteps, num_items). Binary relevance scores where 1 indicates positive items, 0 indicates negative items, and NaN values are ignored.

Returns:

Scalar BPR loss averaged over all valid timesteps, i.e., those with both positive and negative items present.

Return type:

Tensor
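A plain-Python sketch of the pairwise computation for a single sequence of shape (timesteps, num_items). Several details are assumptions rather than confirmed behavior: the per-pair loss is taken as -log(sigmoid(positive - negative)), the difference is clamped symmetrically to [-clamp_max, clamp_max], and pairs are averaged within a timestep before averaging across timesteps. The function name `sequential_bpr` is hypothetical.

```python
import math


def sequential_bpr(scores, targets, clamp_max=20.0):
    """Average pairwise BPR loss over timesteps that have positives and negatives."""
    step_losses = []
    for step_scores, step_targets in zip(scores, targets):
        # NaN targets match neither 1 nor 0, so they are ignored.
        positives = [s for s, t in zip(step_scores, step_targets) if t == 1]
        negatives = [s for s, t in zip(step_scores, step_targets) if t == 0]
        if not positives or not negatives:
            continue  # timestep is not valid for a pairwise comparison
        pair_losses = []
        for pos in positives:
            for neg in negatives:
                diff = max(-clamp_max, min(clamp_max, pos - neg))
                # -log(sigmoid(diff)) written as log(1 + exp(-diff)).
                pair_losses.append(math.log1p(math.exp(-diff)))
        step_losses.append(sum(pair_losses) / len(pair_losses))
    return sum(step_losses) / len(step_losses) if step_losses else float("nan")


nan = float("nan")
bpr = sequential_bpr(
    scores=[[2.0, 0.0, 5.0], [1.0, 1.0, 1.0]],
    targets=[[1, 0, nan], [nan, nan, 1]],
)
print(bpr)
```

Only the first timestep has both a positive and a negative item, so it alone determines the result.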

class easy_rec.losses.SequentialCrossEntropyLoss(*args, **kwargs)[source]

Bases: CrossEntropyLoss

Custom cross-entropy loss for sequential classification tasks in which some targets may be missing (represented as NaN). The loss is applied only to valid (non-NaN) target positions.

Inherits: torch.nn.CrossEntropyLoss

forward(input, target)[source]

Computes the cross-entropy loss for sequential data. Timesteps where all target values are NaN are filtered out, and the standard cross-entropy loss is computed on the remaining valid timesteps. NaN values within valid timesteps are set to 0 before the loss is computed.

Parameters:

input (Tensor) – Predicted logits of shape (batch_size, timesteps, num_items). Contains the model’s predictions for each item at each timestep.
target (Tensor) – Target tensor of shape (batch_size, timesteps, num_items). Target probabilities or class indices.

Returns:

The computed cross-entropy loss, averaged over valid timesteps.

Return type:

Tensor
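The filtering described above can be sketched in plain Python for a single sequence with probability-style targets. The function name `sequential_cross_entropy` is hypothetical, and the use of a log-softmax with target weights is an assumption about how the "standard cross-entropy loss" is applied here.

```python
import math


def sequential_cross_entropy(logits, targets):
    """Cross-entropy averaged over timesteps that have at least one non-NaN target."""
    step_losses = []
    for step_logits, step_targets in zip(logits, targets):
        if all(math.isnan(t) for t in step_targets):
            continue  # whole timestep is missing: drop it
        # Remaining NaN entries are replaced by 0 so they carry no weight.
        clean = [0.0 if math.isnan(t) else t for t in step_targets]
        top = max(step_logits)
        log_norm = top + math.log(sum(math.exp(x - top) for x in step_logits))
        step_losses.append(
            -sum(t * (x - log_norm) for x, t in zip(step_logits, clean))
        )
    return sum(step_losses) / len(step_losses) if step_losses else float("nan")


nan = float("nan")
ce = sequential_cross_entropy(
    logits=[[0.0, 0.0], [1.0, 2.0]],
    targets=[[1.0, nan], [nan, nan]],
)
print(ce)
```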

class easy_rec.losses.SequentialGeneralizedBCEWithLogitsLoss(beta, eps=1e-06, *args, **kwargs)[source]

Bases: SequentialBCEWithLogitsLoss

Generalized Binary Cross-Entropy loss with logits for sequential data that applies different treatments to positive and negative samples based on a beta parameter.

Inherits NaN handling capabilities from SequentialBCEWithLogitsLoss.

Parameters:

beta (float) – Beta parameter controlling the gamma transformation strength. When beta = 0, only negative samples are used. When beta > 0, positive samples undergo gamma transformation.
eps (float, optional) – Small epsilon value to prevent numerical instability in the gamma transformation.

forward(input, target)[source]

Computes the generalized binary cross-entropy loss with logits.

Parameters:

input (Tensor) – Predicted logits.
target (Tensor) – Target tensor of the same shape as input. Values > 0.5 are considered positive samples. NaN values are ignored.

Returns:

The computed scalar loss.

Return type:

Tensor

gamma_transformation(scores)[source]

Applies the gamma transformation to the input scores, adjusting the contribution of positive samples to the loss based on the beta parameter.

Parameters: scores (Tensor) – Input logits to transform.
Returns: Transformed logits of the same shape as input.
Return type: Tensor

metrics

easy_rec.metrics.prepare_rank_corrections(metrics_info, num_negatives=None, num_items=None, put_uncorrected=True, split_keys={'test': 1, 'train': 1, 'val': 2})[source]

Prepares a structured metrics configuration with rank correction functions for recommendation evaluation metrics.

Parameters:

metrics_info (dict or list) – Configuration for metrics to compute.
num_negatives (int, dict, optional) – Number of negative samples used during evaluation.
num_items (int, dict, optional) – Total number of items in the catalog.
put_uncorrected (bool, dict, optional) – Whether to include uncorrected metrics.
split_keys (dict, optional) – Configuration of data splits and number of dataloaders per split. Format: {split_name: num_dataloaders}.

Returns:

Nested dictionary, where:

Outer keys are split names (e.g., “train”, “val”, “test”).
Each value is a list of dictionaries, one per dataloader.
Each metric can include a rank_corrections dictionary containing:
- “”: identity function (no correction)
- “corrected”: correction function that multiplies scores by num_items / num_negatives

Return type:

dict

Raises:

NotImplementedError – If metrics_info is neither a list nor a dict.
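The rank_corrections dictionary described above can be pictured as a mapping from a name to a function. The sketch below builds such a mapping for one dataloader; the helper name `build_rank_corrections` is hypothetical, and it illustrates only the two documented entries, not the nested per-split structure that the real function returns.

```python
def build_rank_corrections(num_items, num_negatives, put_uncorrected=True):
    """Return {name: function} with the identity and the rescaling correction."""
    corrections = {}
    if put_uncorrected:
        corrections[""] = lambda value: value  # identity: no correction
    # Rescale by the ratio between catalog size and sampled negatives.
    corrections["corrected"] = lambda value: value * num_items / num_negatives
    return corrections


corrections = build_rank_corrections(num_items=1000, num_negatives=100)
print(sorted(corrections), corrections["corrected"](3))
```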

class easy_rec.metrics.RecMetric(top_k=[5, 10, 20], batch_metric=False, rank_corrections={'': <function RecMetric.<lambda>>})[source]

Bases: Metric

Base class for recommendation system metrics with support for top-k evaluation and rank corrections.

Parameters:

top_k (list) – List of integers representing top-k values for evaluation.
batch_metric (bool) – Whether to compute metrics at the batch level.
rank_corrections (dict, optional) – Dictionary mapping correction names to correction functions.

compute()[source]

Computes and returns the metric values.

Returns: Dictionary containing metric values for each combination of top-k and rank correction.
Return type: dict

not_nan_subset(**kwargs)[source]

Subsets input tensors where the ‘relevance’ tensor is not NaN.

Returns: Subset of input tensors where ‘relevance’ is not NaN.
Return type: dict

class easy_rec.metrics.RLS_Jaccard(rbo_p=0.9, *args, **kwargs)[source]

Bases: RecMetric

Jaccard similarity-based metric for evaluating the overlap between the top-k items of two ranked score tensors. This metric is used in recommendation systems to assess how much agreement there is between two sets of rankings, at different top-k thresholds.

Parameters:

rbo_p (float) – A persistence parameter.
args – Positional arguments passed to the base RecMetric.

update(scores, other_scores, relevance)[source]

Updates the metric values based on the input scores and relevance tensors.

Parameters:

scores (torch.Tensor) – Tensor containing prediction scores.
other_scores (torch.Tensor) – Tensor containing other prediction scores to compare against.
relevance (torch.Tensor) – Tensor containing relevance values.
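For a single pair of score vectors, the top-k Jaccard overlap can be sketched as follows. The helper names are hypothetical, ties are broken by item index as an assumption, and the sketch ignores the relevance argument and the batching that the metric class handles.

```python
def top_k_items(scores, k):
    """Indices of the k highest scores (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def jaccard_at_k(scores, other_scores, k):
    """Size of the intersection over size of the union of the two top-k sets."""
    first = set(top_k_items(scores, k))
    second = set(top_k_items(other_scores, k))
    return len(first & second) / len(first | second)


overlap = jaccard_at_k([0.9, 0.8, 0.1, 0.3], [0.1, 0.9, 0.8, 0.2], k=2)
print(overlap)
```

Here the top-2 sets are {0, 1} and {1, 2}: one shared item out of three distinct items.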

class easy_rec.metrics.RLS_RBO(rbo_p=0.9, *args, **kwargs)[source]

Bases: RecMetric

Computes the Rank-Biased Overlap (RBO), a ranked list similarity (RLS) measure, between two ranked score tensors, placing greater weight on agreement at the top of the rankings.

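Parameters: rbo_p (float) – Persistence parameter controlling the top-heaviness of the RBO computation. Must be in the range (0, 1). In the standard RBO definition, smaller values concentrate the weight on the top positions, while values closer to 1 give deeper positions more influence.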

update(scores, other_scores, relevance)[source]

Updates the metric values based on the input scores and relevance tensors.

Parameters:

scores (torch.Tensor) – Tensor containing prediction scores.
other_scores (torch.Tensor) – Tensor containing other prediction scores to compare against.
relevance (torch.Tensor) – Tensor containing relevance values.
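As a point of reference, the standard RBO sum truncated at depth k is (1 - p) times the sum over depths d = 1..k of p^(d-1) multiplied by the fraction of items shared by the two top-d prefixes. The sketch below implements that textbook formula for one pair of score vectors; it is not taken from the library, the helper names are hypothetical, and it assumes k does not exceed the number of items.

```python
def ranking(scores):
    """Item indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))


def truncated_rbo(scores, other_scores, k, p=0.9):
    """Standard RBO sum truncated at depth k (no extrapolation or normalization)."""
    first, second = ranking(scores), ranking(other_scores)
    seen_first, seen_second = set(), set()
    total = 0.0
    for depth in range(1, k + 1):
        seen_first.add(first[depth - 1])
        seen_second.add(second[depth - 1])
        agreement = len(seen_first & seen_second) / depth
        total += p ** (depth - 1) * agreement
    return (1 - p) * total


identical = truncated_rbo([3, 2, 1], [3, 2, 1], k=3, p=0.9)
reversed_lists = truncated_rbo([4, 3, 2, 1], [1, 2, 3, 4], k=2, p=0.9)
print(identical, reversed_lists)
```

For identical rankings the truncated sum equals 1 - p^k rather than 1, which is the gap that a finite, normalized variant is meant to close.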

class easy_rec.metrics.RLS_FRBO(rbo_p=0.9, *args, **kwargs)[source]

Bases: RecMetric

Computes the Finite Rank-Biased Overlap (FRBO) between two ranked lists of scores. FRBO is a normalized variant of Rank-Biased Overlap (RBO) that limits computation to a finite depth top_k, making it more appropriate for practical use cases where only the top portion of rankings matters.

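Parameters: rbo_p (float) – Persistence parameter controlling the top-heaviness of the FRBO computation. Must be in the range (0, 1). In the standard RBO definition, smaller values concentrate the weight on the top positions, while values closer to 1 give deeper positions more influence.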

update(scores, other_scores, relevance)[source]

Updates the metric values based on the input scores and relevance tensors.

Parameters:

scores (torch.Tensor) – Tensor containing prediction scores.
other_scores (torch.Tensor) – Tensor containing other prediction scores to compare against.
relevance (torch.Tensor) – Tensor containing relevance values.

class easy_rec.metrics.NDCG(*args, **kwargs)[source]

Bases: RecMetric

Normalized Discounted Cumulative Gain (NDCG) assesses the performance of a ranking system by considering where relevant items fall within the top K positions of the ranked list. The underlying principle is that relevant items near the top of the ranking should contribute more than those positioned lower in the list, because the top is where a user’s attention is usually focused.

compute()[source]

Computes and returns the metric values.

Returns: Dictionary containing metric values for each combination of top-k and rank correction.
Return type: dict

update(scores, relevance)[source]

Updates the metric values based on the input scores and relevance tensors.

Parameters:

scores (torch.Tensor) – Tensor containing prediction scores.
relevance (torch.Tensor) – Tensor containing relevance values.
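A minimal NDCG@k sketch for one user, using the common log2 discount. The function name `ndcg_at_k` is hypothetical, and the sketch leaves out the batching, NaN handling, and rank corrections that the metric class provides.

```python
import math


def ndcg_at_k(scores, relevance, k):
    """DCG of the predicted top-k divided by the DCG of the ideal top-k."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    dcg = sum(relevance[item] / math.log2(pos + 2) for pos, item in enumerate(order))
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


ndcg = ndcg_at_k(scores=[0.1, 0.9, 0.5], relevance=[1, 0, 0], k=3)
print(ndcg)
```

The only relevant item is ranked third, so its gain is discounted by log2(4) = 2, while the ideal ranking would place it first with no discount.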

class easy_rec.metrics.MRR(*args, **kwargs)[source]

Bases: RecMetric

Mean Reciprocal Rank (MRR) evaluates the efficacy of a ranking system by considering the placement of the first relevant item within the ranked list. It is calculated by taking the reciprocal of the rank of the first relevant item. It emphasizes that the position of the first relevant item is more important than the placement of the other relevant items.

update(scores, relevance)[source]

Updates the metric values based on the input scores and relevance tensors.

Parameters:

scores (torch.Tensor) – Tensor containing prediction scores.
relevance (torch.Tensor) – Tensor containing relevance values.
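The reciprocal-rank computation for a single user can be sketched as below. The function name `reciprocal_rank_at_k` is hypothetical; returning 0 when no relevant item appears in the top k is a common convention assumed here. Averaging this value over users gives the mean reciprocal rank.

```python
def reciprocal_rank_at_k(scores, relevance, k):
    """1 / rank of the first relevant item in the top k, or 0 if there is none."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    for rank, item in enumerate(order, start=1):
        if relevance[item] > 0:
            return 1.0 / rank
    return 0.0


rr = reciprocal_rank_at_k(scores=[0.2, 0.9, 0.5, 0.7], relevance=[0, 0, 1, 1], k=3)
print(rr)
```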

class easy_rec.metrics.Precision(*args, **kwargs)[source]

Bases: RecMetric

It computes the proportion of accurately identified relevant items among all the items recommended within a list of length K. It is used to explicitly count the number of recommended, or retrieved, items that are truly relevant.

update(scores, relevance)[source]

Updates the metric values based on the input scores and relevance tensors.

Parameters:

scores (torch.Tensor) – Tensor containing prediction scores.
relevance (torch.Tensor) – Tensor containing relevance values.

class easy_rec.metrics.Recall(*args, **kwargs)[source]

Bases: RecMetric

It assesses the fraction of correctly identified relevant items among the top K recommendations, relative to the total number of relevant items in the dataset. It measures the effectiveness of the method in capturing relevant items among all of those present in the dataset.

update(scores, relevance)[source]

Updates the metric values based on the input scores and relevance tensors.

Parameters:

scores (torch.Tensor) – Tensor containing prediction scores.
relevance (torch.Tensor) – Tensor containing relevance values.
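Precision and Recall at K differ only in their denominator, which a short single-user sketch makes explicit. The helper names are hypothetical, and the sketch counts relevant items per user, which is an assumption about what "total number of relevant items" refers to.

```python
def top_k(scores, k):
    """Indices of the k highest scores (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def precision_at_k(scores, relevance, k):
    """Relevant items in the top k, divided by k."""
    hits = sum(1 for item in top_k(scores, k) if relevance[item] > 0)
    return hits / k


def recall_at_k(scores, relevance, k):
    """Relevant items in the top k, divided by all relevant items."""
    total_relevant = sum(1 for rel in relevance if rel > 0)
    if total_relevant == 0:
        return 0.0
    hits = sum(1 for item in top_k(scores, k) if relevance[item] > 0)
    return hits / total_relevant


scores = [0.9, 0.8, 0.7, 0.1]
relevance = [1, 0, 1, 1]
precision = precision_at_k(scores, relevance, k=2)
recall = recall_at_k(scores, relevance, k=2)
print(precision, recall)
```

One of the two recommended items is relevant, and that item is one of three relevant items overall.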

class easy_rec.metrics.F1(*args, **kwargs)[source]

Bases: RecMetric

The F1 score is the harmonic mean of precision and recall. It is a single metric that combines both precision and recall to provide a single measure of the quality of a ranking system.

update(scores, relevance)[source]

Updates internal Precision and Recall metrics based on the input scores and relevance.

Parameters:

scores (torch.Tensor) – Tensor containing prediction scores.
relevance (torch.Tensor) – Tensor containing relevance values.

compute()[source]

Computes and returns the metric values.

Returns: Dictionary containing metric values for each combination of top-k and rank correction.
Return type: dict
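Given precision and recall values at the same cutoff, the harmonic mean can be computed as in the sketch below. The function name `f1_score` is hypothetical, and returning 0 when both inputs are 0 is an assumed convention to avoid dividing by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(precision=0.5, recall=0.25)
print(f1)
```

The harmonic mean stays close to the smaller of the two values, so a ranking cannot score well on F1 by maximizing only one of them.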

class easy_rec.metrics.PrecisionWithRelevance(*args, **kwargs)[source]

Bases: RecMetric

It computes the proportion of accurately identified relevant items among all the items recommended within a list of length K. It is used to explicitly count the number of recommended, or retrieved, items that are truly relevant.

update(scores, relevance)[source]

Updates the internal metric state using the provided prediction scores and relevance labels.

Parameters:

scores (torch.Tensor) – Tensor containing prediction scores.
relevance (torch.Tensor) – Tensor containing relevance values.

class easy_rec.metrics.MAP(*args, **kwargs)[source]

Bases: RecMetric

Mean Average Precision (MAP) evaluates the efficacy of a ranking system by averaging the precision over the top R recommendations, for R ranging from 1 to K. Precision values at every cutoff within the top K positions contribute to the overall assessment, so the order of the ranking is also taken into account. Unlike NDCG, this metric does not explicitly assign a different importance to different slots.

update(scores, relevance)[source]

Updates the internal precision metrics needed to compute MAP.

Parameters:

scores (torch.Tensor) – Tensor containing prediction scores.
relevance (torch.Tensor) – Tensor containing relevance values.

compute()[source]

Computes and returns the metric values.

Returns: Dictionary containing metric values for each combination of top-k and rank correction.
Return type: dict
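Following the description of MAP given for this class, the per-user value can be sketched as the mean of Precision@R for R = 1..K. Note that this follows the wording here and differs from the other common definition of average precision, which only averages over ranks that hold a relevant item. The function name `average_precision_at_k` is hypothetical, and k is assumed not to exceed the number of items.

```python
def average_precision_at_k(scores, relevance, k):
    """Mean of Precision@R for R = 1..k, for a single user."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    hits = 0
    precisions = []
    for cutoff, item in enumerate(order, start=1):
        if relevance[item] > 0:
            hits += 1
        precisions.append(hits / cutoff)  # Precision@cutoff
    return sum(precisions) / k


ap = average_precision_at_k(scores=[0.9, 0.8, 0.7], relevance=[1, 0, 1], k=3)
print(ap)
```

The running precisions are 1, 1/2, and 2/3, so a relevant item placed earlier raises every later cutoff as well, which is how the ordering enters the metric.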

reset()[source]: Resets all internal states of the MAP metric and its dependent precision tracker.
