Recommendation systems play a vital role in personalising content on e-commerce and e-reading platforms. Traditional methods, such as collaborative filtering and supervised learning, often fail to capture users' long-term preferences or the sequential nature of their interactions. Reinforcement learning (RL) provides a principled framework for modelling recommendation as a sequential decision-making process aimed at maximising long-term user satisfaction. However, collecting online interaction data to train RL models is costly and risky, prompting the use of offline reinforcement learning (offline RL), which learns from static historical data. This study compares several offline RL algorithms, including Batch-Constrained Q-Learning (BCQ), Conservative Q-Learning (CQL), and Implicit Q-Learning (IQL), in the context of book recommendations. Experiments were conducted on the Goodreads dataset, which captures real-world interactions between users and books. The results show that conservative approaches such as CQL achieve greater stability and higher recommendation accuracy, as measured by NDCG@10 and Recall@10, than less constrained methods. The analysis also reveals a strong relationship between dataset diversity and policy performance. The research highlights offline RL as a promising avenue for building data-efficient, privacy-preserving recommendation systems without the need for costly online exploration.