Easy_Rec package
Submodules
callbacks
- class easy_rec.callbacks.DynamicNegatives(dataloader, neg_key='out_sid', id_key='uid', padding_idx=0)[source]
Bases:
Callback
PyTorch Lightning callback that dynamically updates a buffer of hard negatives for each user, based on model predictions during training.
- Parameters:
- init_vars()[source]
Initializes or resets the internal variables that collect predictions and sampled negatives, in preparation for the next epoch’s data collection.
- on_train_batch_end(trainer, pl_module, model_outputs, batch_input, batch_idx)[source]
Collects model predictions and sampled negatives at the end of each training batch.
- Parameters:
trainer (Trainer) – The PyTorch Lightning trainer.
pl_module (LightningModule) – The model being trained.
model_outputs (dict) – Output dictionary from the model’s forward pass. Must include “model_output”.
batch_input (dict) – The batch data input to the model, typically from the dataloader.
batch_idx (int) – Index of the current batch.
data_generation_utils
- easy_rec.data_generation_utils.download_dataset(dataset_name, dataset_raw_folder, additional_file_name=None)[source]
Downloads the requested dataset from predefined sources (e.g., HuggingFace, GroupLens, Amazon, etc.).
- easy_rec.data_generation_utils.preprocess_dataset(name, data_folder='../data/raw', min_rating=None, min_items_per_user=0, min_users_per_item=0, densify_index=True, split_method='leave_n_out', split_keys={'rating': ['train_rating', 'val_rating', 'test_rating'], 'sid': ['train_sid', 'val_sid', 'test_sid'], 'timestamp': ['train_timestamp', 'val_timestamp', 'test_timestamp']}, test_sizes=[1, 1], random_state=None, del_after_split=True, **kwargs)[source]
Preprocesses the dataset for recommender systems:
- Loads and filters ratings
- Densifies user/item indices
- Converts data into sequence format
- Splits into train/val/test
- Parameters:
name (str) – Name of the dataset.
data_folder (str) – Base folder path where raw data resides.
min_rating (float, optional) – Minimum rating to retain.
min_items_per_user (int) – Min number of items per user.
min_users_per_item (int) – Min number of users per item.
densify_index (bool) – Whether to remap user/item IDs to 0-based dense indices.
split_method (str) – Data split strategy (e.g., “leave_n_out”).
split_keys (Dict) – Keys to split and their resulting keys.
test_sizes (List[int]) – Sizes of the test and validation splits.
random_state (int, optional) – Seed for reproducibility.
del_after_split (bool) – Delete original keys after splitting.
- Returns:
Processed dataset and mappings (e.g., user/item index mappings).
- Return type:
Tuple[Dict, Dict]
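The sequence conversion and the default “leave_n_out” split can be pictured with a small plain-Python sketch. It assumes that test_sizes=[1, 1] holds out the last interaction of each user for testing and the one before it for validation; the helper names to_sequences and leave_n_out are hypothetical and are not part of easy_rec.

```python
from collections import defaultdict

def to_sequences(rows):
    """Group (uid, sid, timestamp) rows into per-user item lists ordered by time."""
    by_user = defaultdict(list)
    for uid, sid, ts in rows:
        by_user[uid].append((ts, sid))
    return {uid: [sid for _, sid in sorted(events)] for uid, events in by_user.items()}

def leave_n_out(seq, test_sizes=(1, 1)):
    """Hold out the last items of a sequence: first for test, then for validation.

    Assumes len(seq) > n_test + n_val, so the training part is never empty.
    """
    n_test, n_val = test_sizes
    cut_test = len(seq) - n_test
    cut_val = cut_test - n_val
    return seq[:cut_val], seq[cut_val:cut_test], seq[cut_test:]

rows = [(0, 10, 3), (0, 11, 1), (0, 12, 2), (0, 13, 4),
        (1, 20, 1), (1, 21, 2), (1, 22, 3)]
sequences = to_sequences(rows)
splits = {uid: leave_n_out(seq) for uid, seq in sequences.items()}
print(splits[0])  # ([11, 12], [10], [13])
```

The real function works on DataFrames and returns dictionaries keyed by the names in split_keys; the sketch only shows the ordering and hold-out logic.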
- easy_rec.data_generation_utils.maybe_preprocess_raw_dataset(dataset_raw_folder, dataset_name)[source]
Checks if preprocessed CSV data exists in the raw folder. If not, runs the specific preprocessing routine.
- easy_rec.data_generation_utils.map_dataset_name()[source]
Returns a dictionary mapping dataset names to their primary data file.
- easy_rec.data_generation_utils.get_rating_files_per_dataset(dataset_name)[source]
Gets the path or URL to the rating file associated with the dataset.
- Parameters:
dataset_name (str) – Name of the dataset.
- Returns:
Path or URL to the dataset’s rating file.
- Return type:
- Raises:
ValueError – If the dataset is not recognized.
- easy_rec.data_generation_utils.specific_preprocess(dataset_raw_folder, dataset_name)[source]
Performs dataset-specific preprocessing and stores a standardized CSV file.
- Parameters:
- Raises:
NotImplementedError – If the dataset name is unknown or not supported.
- Return type:
None
- easy_rec.data_generation_utils.load_ratings_df(dataset_raw_folder, dataset_name)[source]
Loads the ratings DataFrame for the specified dataset.
- easy_rec.data_generation_utils.filter_ratings(df, min_rating)[source]
- Parameters:
df (DataFrame)
min_rating (float)
- Return type:
DataFrame
- easy_rec.data_generation_utils.filter_by_frequence(df, min_items_per_user, min_users_per_item)[source]
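No description is given for this function. Going by the min_items_per_user and min_users_per_item parameters of preprocess_dataset, one plausible reading is a filter that drops users and items with too few interactions. The sketch below is an assumption, not the library’s code: it works on plain (uid, sid) tuples, assumes each pair appears once, and applies both thresholds in a single pass.

```python
from collections import Counter

def filter_by_frequency(rows, min_items_per_user=0, min_users_per_item=0):
    """Keep (uid, sid) rows whose user and item both reach the minimum counts."""
    items_per_user = Counter(uid for uid, _ in rows)
    users_per_item = Counter(sid for _, sid in rows)
    return [
        (uid, sid)
        for uid, sid in rows
        if items_per_user[uid] >= min_items_per_user
        and users_per_item[sid] >= min_users_per_item
    ]

rows = [(0, "a"), (0, "b"), (1, "a"), (2, "c")]
kept = filter_by_frequency(rows, min_items_per_user=2, min_users_per_item=2)
print(kept)  # [(0, 'a')]
```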
- easy_rec.data_generation_utils.densify_index_method(df, vars=['uid', 'sid'])[source]
- Parameters:
df (DataFrame)
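Dense re-indexing, described under densify_index in preprocess_dataset as remapping IDs to 0-based dense indices, can be sketched without pandas. The function name densify and the choice to number IDs in order of first appearance are assumptions made for illustration.

```python
def densify(ids):
    """Map arbitrary IDs to consecutive integers starting at 0."""
    mapping = {}
    for raw in ids:
        # A new ID receives the next free index; known IDs keep theirs.
        mapping.setdefault(raw, len(mapping))
    return [mapping[raw] for raw in ids], mapping

dense, mapping = densify([42, 7, 42, 1003])
print(dense)    # [0, 1, 0, 2]
print(mapping)  # {42: 0, 7: 1, 1003: 2}
```

Keeping the mapping around makes it possible to translate model outputs back to the original IDs.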
- easy_rec.data_generation_utils.df_to_sequences(df, keep_vars=['uid'], seq_vars=['sid', 'rating', 'timestamp'], user_var='uid', time_var='timestamp')[source]
- Parameters:
df (DataFrame)
- Return type:
losses
- class easy_rec.losses.SequentialBCEWithLogitsLoss(*args, **kwargs)[source]
Bases:
BCEWithLogitsLoss
Custom loss function for sequential binary classification tasks. It extends PyTorch’s BCEWithLogitsLoss and ignores NaN values in the target tensor when computing the loss.
- Inherits:
torch.nn.BCEWithLogitsLoss
- forward(input, target)[source]
Computes the binary cross-entropy loss with logits, ignoring any targets that are NaN.
- Parameters:
input (Tensor) – Predicted logits.
target (Tensor) – Target tensor of the same shape as input. NaN values are ignored.
- Returns:
The computed scalar loss, averaged over non-NaN elements.
- Return type:
Tensor
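The NaN-masking idea can be illustrated in plain Python, so that it runs without PyTorch. This is a sketch of the behaviour described above, not the class’s implementation; the function name is hypothetical and inputs are flat lists rather than tensors.

```python
import math

def bce_with_logits_ignore_nan(logits, targets):
    """Mean binary cross-entropy over positions whose target is not NaN."""
    losses = []
    for x, y in zip(logits, targets):
        if math.isnan(y):
            continue  # masked position: contributes nothing to the loss
        # Numerically stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
        losses.append(max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x))))
    # Assumes at least one target is not NaN.
    return sum(losses) / len(losses)

loss = bce_with_logits_ignore_nan([0.0, 2.0, -1.0], [1.0, float("nan"), 0.0])
```

The second position is skipped entirely, so the mean is taken over two elements rather than three.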
- class easy_rec.losses.SequentialBPR(clamp_max=20, *args, **kwargs)[source]
Bases:
Module
Sequential version of the Bayesian Personalized Ranking (BPR) loss for recommendation tasks over sequences. It encourages the model to rank positive items higher than negative items within the same timestep.
- Parameters:
clamp_max (float, optional) – Maximum value for clamping the logit differences to prevent numerical instability.
- forward(input, target)[source]
Computes the Sequential BPR loss as the pairwise BPR loss between positive and negative items within each timestep.
- Parameters:
input (Tensor) – Predicted item scores of shape (batch_size, timesteps, num_items). Contains the model’s predictions for each item at each timestep.
target (Tensor) – Target relevance tensor of shape (batch_size, timesteps, num_items). Binary relevance scores where 1 indicates positive items, 0 indicates negative items, and NaN values are ignored.
- Returns:
Scalar BPR loss averaged over all valid timesteps with both positive and negative items present.
- Return type:
Tensor
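A plain-Python sketch of the pairwise computation may help. It assumes that pair losses are averaged within a timestep before averaging over timesteps, and that clamp_max bounds the score difference symmetrically; both details are assumptions, and the function name is hypothetical.

```python
import math

def sequential_bpr(scores, targets, clamp_max=20.0):
    """scores/targets: nested lists shaped (batch, timesteps, items)."""
    step_losses = []
    for user_scores, user_targets in zip(scores, targets):
        for s_t, y_t in zip(user_scores, user_targets):
            pos = [s for s, y in zip(s_t, y_t) if y == 1]
            neg = [s for s, y in zip(s_t, y_t) if y == 0]  # NaN matches neither list
            if not pos or not neg:
                continue  # a timestep needs at least one positive and one negative
            pair_losses = []
            for p in pos:
                for n in neg:
                    diff = max(-clamp_max, min(clamp_max, p - n))
                    pair_losses.append(math.log1p(math.exp(-diff)))  # -log(sigmoid(diff))
            step_losses.append(sum(pair_losses) / len(pair_losses))
    # Assumes at least one valid timestep exists.
    return sum(step_losses) / len(step_losses)

nan = float("nan")
scores = [[[2.0, 0.0, 1.0], [0.5, 0.5, 0.5]]]
targets = [[[1.0, 0.0, nan], [nan, nan, nan]]]
loss = sequential_bpr(scores, targets)
```

Only the first timestep contributes here: it has one positive (score 2.0) and one negative (score 0.0), while the second timestep is fully masked.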
- class easy_rec.losses.SequentialCrossEntropyLoss(*args, **kwargs)[source]
Bases:
CrossEntropyLoss
Custom cross-entropy loss function for sequential classification tasks. It handles sequences where some targets are missing (represented as NaN) by applying the loss only to valid (non-NaN) target positions.
- Inherits:
torch.nn.CrossEntropyLoss
- forward(input, target)[source]
Computes the cross-entropy loss for sequential data. Timesteps where all target values are NaN are filtered out, and the standard cross-entropy loss is computed on the remaining valid timesteps. NaN values within valid timesteps are set to 0 before the loss is computed.
- Parameters:
input (Tensor) – Predicted logits of shape (batch_size, timesteps, num_items). Contains the model’s predictions for each item at each timestep.
target (Tensor) – Target tensor of shape (batch_size, timesteps, num_items). Target probabilities or class indices.
- Returns:
The computed cross-entropy loss, averaged over valid timesteps.
- Return type:
Tensor
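The filtering steps can be sketched in plain Python. The example assumes probability-style targets over items and uses a hypothetical function name; it mirrors the description above rather than the actual tensor code.

```python
import math

def sequential_cross_entropy(logits, targets):
    """logits/targets: nested lists shaped (batch, timesteps, items)."""
    step_losses = []
    for user_logits, user_targets in zip(logits, targets):
        for x_t, y_t in zip(user_logits, user_targets):
            if all(math.isnan(y) for y in y_t):
                continue  # fully masked timestep: dropped before the loss
            y_t = [0.0 if math.isnan(y) else y for y in y_t]  # remaining NaNs become 0
            # log-sum-exp with the maximum subtracted for numerical stability
            m = max(x_t)
            log_z = m + math.log(sum(math.exp(x - m) for x in x_t))
            step_losses.append(-sum(y * (x - log_z) for x, y in zip(x_t, y_t)))
    # Assumes at least one valid timestep exists.
    return sum(step_losses) / len(step_losses)

nan = float("nan")
logits = [[[0.0, 0.0], [1.0, 2.0]]]
targets = [[[1.0, nan], [nan, nan]]]
loss = sequential_cross_entropy(logits, targets)
```

With uniform logits over two items and the first item as target, the single valid timestep yields a loss of log 2.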
- class easy_rec.losses.SequentialGeneralizedBCEWithLogitsLoss(beta, eps=1e-06, *args, **kwargs)[source]
Bases:
SequentialBCEWithLogitsLoss
Generalized Binary Cross-Entropy loss with logits for sequential data. It treats positive and negative samples differently, as controlled by a beta parameter.
Inherits NaN handling capabilities from SequentialBCEWithLogitsLoss.
- Parameters:
- forward(input, target)[source]
Computes the generalized binary cross-entropy loss with logits.
- Parameters:
input (Tensor) – Predicted logits.
target (Tensor) – Target tensor of the same shape as input. Values > 0.5 are considered positive samples. NaN values are ignored.
- Returns:
The computed scalar loss.
- Return type:
Tensor
- gamma_transformation(scores)[source]
Applies the gamma transformation to the input scores. The transformation adjusts the contribution of positive samples to the loss based on the beta parameter.
- Parameters:
scores (Tensor) – Input logits to transform.
- Returns:
Transformed logits of the same shape as input.
- Return type:
Tensor
metrics
- easy_rec.metrics.prepare_rank_corrections(metrics_info, num_negatives=None, num_items=None, put_uncorrected=True, split_keys={'test': 1, 'train': 1, 'val': 2})[source]
Prepares a structured metrics configuration with rank correction functions for recommendation evaluation metrics.
- Parameters:
metrics_info (dict or list) – Configuration for metrics to compute.
num_negatives (int, dict, optional) – Number of negative samples used during evaluation.
num_items (int, dict, optional) – Total number of items in the catalog.
put_uncorrected (bool, dict, optional) – Whether to include uncorrected metrics.
split_keys (dict, optional) – Configuration of data splits and number of dataloaders per split. Format: {split_name: num_dataloaders}.
- Returns:
- Nested dictionary, where:
Outer keys are split names (e.g., “train”, “val”, “test”).
Each value is a list of dictionaries, one per dataloader.
- Each metric can include a rank_corrections dictionary containing:
“”: identity function (no correction)
“corrected”: correction function that multiplies scores by num_items / num_negatives
- Return type:
- Raises:
NotImplementedError – If metrics_info is neither a list nor a dict.
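The shape of the returned structure can be illustrated with a short sketch. The helper names, the exact nesting below each dataloader entry, and the handling of put_uncorrected are assumptions made for illustration; only the split/dataloader layout and the two correction functions come from the description above.

```python
def build_rank_corrections(num_items, num_negatives, put_uncorrected=True):
    """Return the correction functions keyed by their suffix."""
    corrections = {}
    if put_uncorrected:
        corrections[""] = lambda values: values  # identity: no correction
    corrections["corrected"] = lambda values: values * num_items / num_negatives
    return corrections

def build_config(metric_names, split_keys, num_items, num_negatives):
    """One list entry per dataloader, one rank_corrections dict per metric."""
    return {
        split: [
            {name: {"rank_corrections": build_rank_corrections(num_items, num_negatives)}
             for name in metric_names}
            for _ in range(num_dataloaders)
        ]
        for split, num_dataloaders in split_keys.items()
    }

config = build_config(["NDCG", "MRR"], {"train": 1, "val": 2, "test": 1},
                      num_items=1000, num_negatives=100)
corrected = config["test"][0]["NDCG"]["rank_corrections"]["corrected"]
print(corrected(5))  # 50.0
```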
- class easy_rec.metrics.RecMetric(top_k=[5, 10, 20], batch_metric=False, rank_corrections={'': <function RecMetric.<lambda>>})[source]
Bases:
Metric
Base class for recommendation system metrics with support for top-k evaluation and rank corrections.
- Parameters:
- class easy_rec.metrics.RLS_Jaccard(rbo_p=0.9, *args, **kwargs)[source]
Bases:
RecMetric
Jaccard similarity-based metric for evaluating the overlap between the top-k items of two ranked score tensors. It is used in recommendation systems to assess how much two rankings agree at different top-k thresholds.
- Parameters:
rbo_p (float) – A persistence parameter.
args – Positional arguments passed to the base RecMetric.
- update(scores, other_scores, relevance)[source]
Updates the metric values based on the input scores and relevance tensors.
- Parameters:
scores (torch.Tensor) – Tensor containing prediction scores.
other_scores (torch.Tensor) – Tensor containing other prediction scores to compare against.
relevance (torch.Tensor) – Tensor containing relevance values.
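As a minimal illustration of the top-k overlap, the sketch below computes the Jaccard index between the top-k item sets of two score lists. It is plain Python with hypothetical helper names, and it ignores the relevance argument and rbo_p, whose roles in this class are not documented here.

```python
def top_k_items(scores, k):
    """Indices of the k highest scores, best first (ties go to the lower index)."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

def jaccard_at_k(scores, other_scores, k):
    """Size of the intersection of the two top-k sets divided by the size of their union."""
    a, b = set(top_k_items(scores, k)), set(top_k_items(other_scores, k))
    return len(a & b) / len(a | b)

scores = [0.9, 0.1, 0.8, 0.3]
other_scores = [0.2, 0.7, 0.9, 0.1]
print(jaccard_at_k(scores, other_scores, 2))  # top-2 sets {0, 2} and {2, 1} share one of three items
```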
- class easy_rec.metrics.RLS_RBO(rbo_p=0.9, *args, **kwargs)[source]
Bases:
RecMetric
Computes the Rank-Biased Overlap (RBO) between two ranked score tensors, placing greater weight on agreement at the top of the rankings.
- Parameters:
rbo_p (float) – Persistence parameter controlling the top-heaviness of the RBO computation. Must be in the range (0, 1). Lower values concentrate the weight on the top of the ranking; higher values give more weight to deeper positions.
- update(scores, other_scores, relevance)[source]
Updates the metric values based on the input scores and relevance tensors.
- Parameters:
scores (torch.Tensor) – Tensor containing prediction scores.
other_scores (torch.Tensor) – Tensor containing other prediction scores to compare against.
relevance (torch.Tensor) – Tensor containing relevance values.
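For reference, RBO truncated at depth k is commonly written as (1 - p) times the sum over d = 1..k of p^(d-1) times the fraction of shared items in the two top-d lists. The sketch below implements that textbook formula in plain Python; the function names are hypothetical and the class itself may differ in details such as tie handling.

```python
def ranking(scores):
    """Item indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])

def rbo_at_k(scores, other_scores, k, p=0.9):
    """Truncated RBO: (1 - p) * sum_{d=1..k} p**(d-1) * |overlap of top-d lists| / d."""
    a, b = ranking(scores), ranking(other_scores)
    total = 0.0
    for d in range(1, k + 1):
        overlap = len(set(a[:d]) & set(b[:d]))
        total += p ** (d - 1) * overlap / d
    return (1 - p) * total

same = rbo_at_k([3.0, 2.0, 1.0, 0.0], [3.0, 2.0, 1.0, 0.0], k=3)
opposite = rbo_at_k([3.0, 2.0, 1.0, 0.0], [0.0, 1.0, 2.0, 3.0], k=2)
```

Two identical rankings reach 1 - p^k rather than 1 at finite depth, which is the motivation for normalized variants.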
- class easy_rec.metrics.RLS_FRBO(rbo_p=0.9, *args, **kwargs)[source]
Bases:
RecMetric
Computes the Finite Ranked Biased Overlap (FRBO) between two ranked lists of scores. FRBO is a normalized variant of Rank-Biased Overlap (RBO) that limits computation to a finite depth top_k, which suits practical use cases where only the top portion of the rankings matters.
- Parameters:
rbo_p (float) – Persistence parameter controlling the top-heaviness of the FRBO computation. Must be in the range (0, 1). Lower values concentrate the weight on the top of the ranking; higher values give more weight to deeper positions.
- update(scores, other_scores, relevance)[source]
Updates the metric values based on the input scores and relevance tensors.
- Parameters:
scores (torch.Tensor) – Tensor containing prediction scores.
other_scores (torch.Tensor) – Tensor containing other prediction scores to compare against.
relevance (torch.Tensor) – Tensor containing relevance values.
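One way to read “normalized” is that the truncated RBO sum is divided by its maximum attainable value at depth k, which is 1 - p^k, so that identical rankings score exactly 1. The sketch below follows that assumption; it takes already-ranked lists of item IDs, and the function name is hypothetical.

```python
def frbo_at_k(ranked_a, ranked_b, k, p=0.9):
    """Truncated RBO divided by its maximum at depth k (assumed normalization)."""
    total = 0.0
    for d in range(1, k + 1):
        overlap = len(set(ranked_a[:d]) & set(ranked_b[:d]))
        total += p ** (d - 1) * overlap / d
    rbo = (1 - p) * total
    return rbo / (1 - p ** k)

identical = frbo_at_k([1, 2, 3], [1, 2, 3], k=3)
swapped = frbo_at_k([1, 2, 3], [1, 3, 2], k=3)
```

Swapping the second and third items lowers the score only through the depth-2 term, since the top-1 and top-3 sets still agree.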
- class easy_rec.metrics.NDCG(*args, **kwargs)[source]
Bases:
RecMetric
Normalized Discounted Cumulative Gain (NDCG) assesses the performance of a ranking system by considering the placement of the relevant items within the top K of the ranked list. The underlying principle is that items higher in the ranking should receive a higher score than those positioned lower in the list, because the top of the list is where a user’s attention is usually focused.
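A plain-Python sketch of NDCG at cutoff k for a single user is given here for orientation. It uses the common log2 discount and a hypothetical function name, and it is not taken from the class’s source.

```python
import math

def ndcg_at_k(scores, relevance, k):
    """DCG of the top-k items by score, divided by the DCG of the ideal ordering."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    dcg = sum(relevance[i] / math.log2(rank + 2) for rank, i in enumerate(order))
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# The only relevant item (index 2) is ranked second, so its gain is discounted by log2(3).
value = ndcg_at_k([0.9, 0.1, 0.8, 0.3], [0, 0, 1, 0], k=2)
```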
- class easy_rec.metrics.MRR(*args, **kwargs)[source]
Bases:
RecMetric
Mean Reciprocal Rank (MRR) evaluates the efficacy of a ranking system by considering the placement of the first relevant item within the ranked list. It is calculated as the reciprocal of the rank of the first relevant item, so the position of that item matters more than the placement of the other relevant items.
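The reciprocal-rank computation can be sketched as follows, in plain Python with hypothetical function names. Users with no relevant item in the top k are assumed to contribute 0.

```python
def reciprocal_rank_at_k(scores, relevance, k):
    """1 / rank of the first relevant item in the top k, or 0 if there is none."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    for rank, i in enumerate(order, start=1):
        if relevance[i] > 0:
            return 1.0 / rank
    return 0.0

def mrr_at_k(batch_scores, batch_relevance, k):
    """Average of the per-user reciprocal ranks."""
    rr = [reciprocal_rank_at_k(s, r, k) for s, r in zip(batch_scores, batch_relevance)]
    return sum(rr) / len(rr)

batch_scores = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.1]]
batch_relevance = [[0, 0, 1], [0, 1, 0]]
print(mrr_at_k(batch_scores, batch_relevance, k=3))  # 0.75
```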
- class easy_rec.metrics.Precision(*args, **kwargs)[source]
Bases:
RecMetric
Computes the proportion of relevant items among all the items recommended within a list of length K. It explicitly counts how many of the recommended, or retrieved, items are truly relevant.
- class easy_rec.metrics.Recall(*args, **kwargs)[source]
Bases:
RecMetric
Assesses the fraction of correctly identified relevant items among the top K recommendations, relative to the total number of relevant items in the dataset. It measures how effectively the method captures the relevant items that are present in the dataset.
- class easy_rec.metrics.F1(*args, **kwargs)[source]
Bases:
RecMetric
The F1 score is the harmonic mean of precision and recall. It combines the two into a single measure of the quality of a ranking system.
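Precision, Recall, and F1 at cutoff K are closely related, so a single plain-Python sketch covers all three. The function name is hypothetical and the code handles one user at a time; it is meant as an illustration of the definitions above.

```python
def precision_recall_f1_at_k(scores, relevance, k):
    """Precision@k, Recall@k and their harmonic mean for a single user."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    hits = sum(1 for i in order if relevance[i] > 0)
    total_relevant = sum(1 for r in relevance if r > 0)
    precision = hits / k
    recall = hits / total_relevant if total_relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Top-2 items are indices 0 and 2; only index 0 is relevant, out of 3 relevant items.
precision, recall, f1 = precision_recall_f1_at_k(
    [0.9, 0.1, 0.8, 0.3, 0.5], [1, 1, 0, 0, 1], k=2
)
```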
- class easy_rec.metrics.PrecisionWithRelevance(*args, **kwargs)[source]
Bases:
RecMetric
Computes the proportion of relevant items among all the items recommended within a list of length K. It explicitly counts how many of the recommended, or retrieved, items are truly relevant.
- class easy_rec.metrics.MAP(*args, **kwargs)[source]
Bases:
RecMetric
Mean Average Precision (MAP) evaluates the efficacy of a ranking system by averaging the precision over the top R recommendations, for R ranging from 1 to K. Precision values at every position within the top K contribute to the overall assessment, so the order of the ranking matters. Unlike NDCG, this metric does not explicitly assign a different importance to different slots.
- update(scores, relevance)[source]
Updates the internal precision metrics needed to compute MAP.
- Parameters:
scores (torch.Tensor) – Tensor containing prediction scores.
relevance (torch.Tensor) – Tensor containing relevance values.
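Following the class description literally, MAP at cutoff K can be sketched as the mean of Precision@R for R from 1 to K. Note that the classical average precision averages only at positions holding a relevant item, so the sketch below is one reading of the description, with a hypothetical function name, rather than a statement of what the class computes.

```python
def map_at_k(scores, relevance, k):
    """Mean of Precision@R for R = 1..k, for a single user."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    hits = 0
    precisions = []
    for r, i in enumerate(order, start=1):
        hits += 1 if relevance[i] > 0 else 0
        precisions.append(hits / r)  # Precision@r
    return sum(precisions) / k

# Ranking is items 0, 2, 3 with relevance 1, 0, 1: Precision@1..3 = 1, 1/2, 2/3.
value = map_at_k([0.9, 0.1, 0.8, 0.3], [1, 0, 0, 1], k=3)
```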