24 October 2021

An Improved Collaborative Filtering Using Multi-Armed Bandits Algorithm


A recommendation system processes input data and produces output based on contextual similarity or user activity. Collaborative filtering is one of the techniques most commonly used in recommendation systems. It evaluates several features of the data, but the results are sometimes not personal enough for the user. To raise the personal level of collaborative filtering, the process can be enhanced by implementing a Multi-Armed Bandits algorithm. This research takes duration (in minutes), rating, and genre from the Netflix movie dataset as the personal-level parameters. It focuses on contextual-based recommendation and uses the Upper Confidence Bound (UCB) method for the Multi-Armed Bandits. The goal is to produce data that satisfies the user's personal level. The criteria for the personal level are that the user watches more than 75% of the movie's duration, gives a rating of 4 stars or above, and that the recommended films share the same genre.


Keywords: Recommendation System, Collaborative Filtering, Multi-Armed Bandits, Upper Confidence Bound, Personal Level

1. Introduction

Nowadays, the internet is an essential part of daily life. It serves as a medium for almost every field, from government and politics to culture, sport, and many others. Moreover, people can access it so easily that they can obtain an enormous amount of information. However, this causes a problem: people are often confused while surfing the internet because there is so much information, and sometimes that information is not relevant to what the user wants. Essentially, every individual has their own personalization for the information they want to know.

To serve relevant information or data, people need a system that can give them recommendations based on their personalization. A recommendation system is the answer. Such a system works in two ways: contextual-based and user-based. This article uses the collaborative filtering technique, which belongs to the contextual-based category of recommendation systems.

Even though collaborative filtering can produce relevant recommendations for the user, its accuracy is sometimes around 50% or below. The multi-armed bandit is an algorithm that tries to optimize the result for given data using mathematical computation. This article implements a multi-armed bandit algorithm to improve the accuracy of the collaborative filtering technique, so that the recommendation system can produce data that is more relevant to the user.

Previous research conducted interactive collaborative filtering using a multi-armed bandit on a movie dataset and also implemented a clustering method (Wang et al., 2019). This research, in contrast, uses the Upper Confidence Bound method within the multi-armed bandit algorithm.

2. Formulations of Problem

  1. The output data is sometimes not close enough to the goal, i.e. it still has a low personal level. Can a multi-armed bandit increase the personal level of the output data?
  2. Can a multi-armed bandit speed up the collaborative filtering process?

3. Boundaries of Problem

  1. The user watches the video with no internet interruption.
  2. The data is based on one user only.
  3. Only released movie data is used.

4. Literature Review

Previous research (Wang et al., 2019) identified a problem in the collaborative filtering process: it faces limited information from the user. This lack of information can cause low accuracy in the recommendation results. However, the problem can be solved by implementing another algorithm.

This article focuses on implementing collaborative filtering with a multi-armed bandit. As Cañamares et al. (2019) note, the multi-armed bandit has many benefits. This research therefore aims to prove those benefits and create an improved collaborative filtering. The multi-armed bandit will be implemented within the collaborative filtering process itself; additionally, it will evaluate more parameters and eventually determine the most relevant result.

A multi-armed bandit can make the parameters of the evaluation process more dynamic. Tekin & Turgay (2018) prove that a multi-armed bandit can establish a correspondence between two parameters even with little information at the beginning. The interesting aspect of this algorithm is that it performs two processes before producing rewards or results. The first is exploration, which tries every option and estimates the probability of reward for each. It then exploits those probabilities in order to optimize the result.
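The two phases can be illustrated with a toy simulation. The three arms and their hidden reward probabilities below are invented purely for demonstration and are not values from this research:

```python
import random

# Toy bandit: three "arms" with hidden reward probabilities
# (assumed values, for demonstration only).
random.seed(42)
true_probs = [0.2, 0.5, 0.8]

# Exploration: pull every arm a fixed number of times and estimate
# its reward probability from the observed outcomes.
estimates = []
for p in true_probs:
    pulls = [1 if random.random() < p else 0 for _ in range(200)]
    estimates.append(sum(pulls) / len(pulls))

# Exploitation: from now on, keep choosing the arm whose estimated
# reward is the highest.
best_arm = max(range(len(estimates)), key=estimates.__getitem__)
```

With enough exploration pulls, the estimates converge to the true probabilities and exploitation reliably picks the best arm; UCB, discussed in Section 5, interleaves these two phases instead of separating them.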

5. Method

This research uses several methods to obtain its result. It begins by collecting data from the Netflix Movies and TV Shows dataset, of which only the movie data is used as the object. The dataset contains several pieces of information about each movie. The features to be evaluated are movie genre, rating, movie duration, and an additional feature, watch duration. The additional feature is obtained from the user's side while watching a certain movie.
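As a rough illustration, the feature selection above might be sketched as follows. The column names and sample values here are assumptions for the sketch, not the actual dataset schema, and the watch minutes would in practice come from the user's viewing logs:

```python
import pandas as pd

# Hypothetical sample of the Netflix dataset (assumed schema).
df = pd.DataFrame({
    "title": ["Movie A", "Show B", "Movie C"],
    "type": ["Movie", "TV Show", "Movie"],
    "duration_min": [120, None, 90],
    "rating_stars": [4.5, None, 3.0],
    "genre": ["Drama", "Comedy", "Drama"],
})

# Keep only movies, since the research restricts itself to released films.
movies = df[df["type"] == "Movie"].copy()

# watch_minutes is the additional feature collected from the user side.
movies["watch_minutes"] = [100, 30]
movies["watch_ratio"] = movies["watch_minutes"] / movies["duration_min"]
```

The `watch_ratio` column directly supports the 75%-of-duration criterion mentioned in the abstract.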

The data is separated into two sets: a training set and a test set. The watch duration parameter is computed per movie, and then the mean and standard deviation of each movie are found. After obtaining the watch duration standard deviation, the process moves to the next parameter, movie rating. The rating is treated the same way as the watch duration: the mean of the movie ratings and its standard deviation are computed as well.
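The per-movie mean and standard deviation computation can be sketched with NumPy. The sample watch durations and ratings below are made up for illustration; real values would come from the training split:

```python
import numpy as np

# Simulated per-session watch durations (minutes) and star ratings
# for a single movie (illustrative values only).
watch_minutes = np.array([95, 110, 102, 88])
ratings = np.array([4, 5, 4, 3])

# Mean and sample standard deviation for each parameter.
watch_mean = watch_minutes.mean()
watch_std = watch_minutes.std(ddof=1)
rating_mean = ratings.mean()
rating_std = ratings.std(ddof=1)
```

Repeating this per movie yields one (mean, standard deviation) pair per parameter, which the next step combines.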

Once both the watch duration and rating standard deviations have been computed, a diagram or graph is created that combines the two standard deviations using a linear equation. From the graph, the combination of those two standard deviations can be determined. This final standard deviation is then used in the next computation, the Upper Confidence Bound method.
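The text does not specify the coefficients of the linear equation; the sketch below assumes a simple weighted average of the two standard deviations after scaling each by an assumed maximum, purely as an illustration of one possible combination:

```python
def combined_std(dur_std, rating_std, dur_max, rating_max, w=0.5):
    """Linearly combine two standard deviations.

    The weight w and the scaling by dur_max / rating_max are
    assumptions; the paper only states that a linear equation is used.
    """
    return w * (dur_std / dur_max) + (1 - w) * (rating_std / rating_max)

# Illustrative inputs: duration std 9.43 min (max 15), rating std 0.82
# stars (max 1 after normalization).
sigma = combined_std(9.43, 0.82, dur_max=15.0, rating_max=1.0)
```

Whatever form the actual linear equation takes, the output is a single scalar per movie that feeds into the UCB computation.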

Essentially, the Upper Confidence Bound balances the exploration process against the exploitation process. Exploration is when the system tries the available options and produces a result or reward probability for each. Exploitation then selects the highest reward from those probabilities. UCB works in a simple way: it takes the option with the higher mean, but before deciding to continue exploiting, it checks how many times that option has been explored. If the number of explorations is still low, UCB will explore that option again in order to make sure whether its mean has changed or not.
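A minimal UCB1-style arm selection might look like the sketch below, where each arm is a candidate movie and the reward could be, for instance, the watch ratio. The exploration constant `c` is an assumption, not a value from this research:

```python
import math

def ucb_select(counts, rewards, t, c=2.0):
    """Pick the arm (movie) with the highest upper confidence bound.

    counts[i]  -- how many times arm i has been recommended so far
    rewards[i] -- cumulative reward collected from arm i
    t          -- current round number, starting at 1
    """
    # Explore every arm at least once before trusting any mean.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # Empirical mean plus an exploration bonus that shrinks as the
    # arm is played more often -- this is the "check how many times
    # it has been explored" behaviour described above.
    ucb = [rewards[i] / counts[i] + math.sqrt(c * math.log(t) / counts[i])
           for i in range(len(counts))]
    return max(range(len(counts)), key=ucb.__getitem__)
```

An arm with a slightly lower mean but far fewer pulls can still win the selection through its larger bonus, which is exactly the re-exploration behaviour the paragraph describes.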

The final result of this method is a value called regret. Collaborative filtering will be considered successfully improved, i.e. to have a high personal level, if this method yields a low regret value. To validate the result, it will be tested with several people who like to watch movies on Netflix.
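Cumulative regret can be sketched as the gap between what the best arm would have paid on average and the rewards actually collected; the numbers below are illustrative only:

```python
def cumulative_regret(best_mean, collected_rewards):
    """Regret after T rounds: best possible total reward minus the
    total reward the bandit actually collected. Lower is better."""
    T = len(collected_rewards)
    return best_mean * T - sum(collected_rewards)

# e.g. the best arm pays 0.9 on average; the bandit collected these
# rewards over five rounds while it was still exploring:
r = cumulative_regret(0.9, [0.2, 0.5, 0.9, 0.9, 0.9])
```

A bandit that converges quickly stops accumulating regret once it locks onto the best arm, which is the success criterion stated above.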


6. References

Cañamares, R., Redondo, M., & Castells, P. (2019). Multi-armed recommender system bandit ensembles. RecSys 2019 – 13th ACM Conference on Recommender Systems, 432–436. https://doi.org/10.1145/3298689.3346984

Tekin, C., & Turgay, E. (2018). Multi-objective Contextual Multi-armed Bandit with a Dominant Objective. IEEE Transactions on Signal Processing, 66(14), 3799–3813. https://doi.org/10.1109/TSP.2018.2841822

Wang, Q., Zeng, C., Zhou, W., Li, T., Iyengar, S. S., Shwartz, L., & Grabarnik, G. Y. (2019). Online Interactive Collaborative Filtering Using Multi-Armed Bandit with Dependent Arms. IEEE Transactions on Knowledge and Data Engineering, 31(8), 1569–1580. https://doi.org/10.1109/TKDE.2018.2866041
