In this module we implement revenue maximization using Thompson sampling.
To keep things easy to read, the explanation of the code is given in its comment lines.
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import random
# Set parameters
N = 10000  # number of rounds in the simulation
d = 9      # number of strategies
# Create the simulation: X[i, j] = 1 if strategy j yields a reward in round i
conversion_rates = [0.05, 0.13, 0.09, 0.16, 0.11, 0.04, 0.20, 0.08, 0.01]
X = np.zeros((N, d))
for i in range(N):
    for j in range(d):
        if np.random.rand() <= conversion_rates[j]:
            X[i, j] = 1
# Implementing a Random Strategy and Thompson Sampling
strategies_selected_rs = []
strategies_selected_ts = []
total_reward_rs = 0
total_reward_ts = 0
numbers_of_rewards_1 = [0] * d  # times each strategy returned reward 1
numbers_of_rewards_0 = [0] * d  # times each strategy returned reward 0
for n in range(0, N):
    # Random Strategy
    strategy_rs = random.randrange(d)
    strategies_selected_rs.append(strategy_rs)
    reward_rs = X[n, strategy_rs]
    total_reward_rs = total_reward_rs + reward_rs
    # Thompson Sampling: draw from each strategy's Beta distribution
    # and select the strategy with the largest draw
    strategy_ts = 0
    max_random = 0
    for i in range(0, d):
        random_beta = random.betavariate(numbers_of_rewards_1[i] + 1,
                                         numbers_of_rewards_0[i] + 1)
        if random_beta > max_random:
            max_random = random_beta
            strategy_ts = i
    # Observe the reward and update the counts of the selected strategy
    reward_ts = X[n, strategy_ts]
    if reward_ts == 1:
        numbers_of_rewards_1[strategy_ts] = numbers_of_rewards_1[strategy_ts] + 1
    else:
        numbers_of_rewards_0[strategy_ts] = numbers_of_rewards_0[strategy_ts] + 1
    strategies_selected_ts.append(strategy_ts)
    total_reward_ts = total_reward_ts + reward_ts
# Computing the Relative Return
relative_return = (total_reward_ts - total_reward_rs) / total_reward_rs * 100
print("Relative Return: {:.0f} %".format(relative_return))
# Plotting the Histogram of Selections
plt.hist(strategies_selected_ts)
plt.title('Histogram of Selections')
plt.xlabel('Strategy')
plt.ylabel('Number of times the strategy was selected')
plt.show()
The code above prints the relative return and displays the histogram plotted with matplotlib.
Relative Return: 87 %
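As a rough sanity check on this number, the sketch below applies the same relative-return formula to expected rewards instead of simulated ones. It assumes that a uniformly random strategy earns the average conversion rate per round and that the best any strategy can do is to pick the highest-rate strategy in every round. The variable names `expected_random`, `expected_best`, and `upper_bound_pct` are illustrative and do not appear in the script above.

```python
# Conversion rates and number of rounds copied from the simulation above.
conversion_rates = [0.05, 0.13, 0.09, 0.16, 0.11, 0.04, 0.20, 0.08, 0.01]
N = 10000

# Expected total reward when a strategy is picked uniformly at random each round.
expected_random = N * sum(conversion_rates) / len(conversion_rates)
# Expected total reward when the best strategy is picked in every round.
expected_best = N * max(conversion_rates)

# Same formula as relative_return in the script, applied to expectations.
upper_bound_pct = (expected_best - expected_random) / expected_random * 100

print("Expected reward, random strategy: {:.0f}".format(expected_random))
print("Expected reward, always best:     {:.0f}".format(expected_best))
print("Approximate ceiling on relative return: {:.0f} %".format(upper_bound_pct))
```

Under these assumptions the ceiling comes out at roughly 107 %, so a result in the neighborhood of 87 % is plausible: Thompson sampling spends some rounds exploring weaker strategies before it concentrates on the best one. Individual runs fluctuate because both strategies are evaluated on random data.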
As expected, the strategy selected most often is the one at index 6 (that is, the seventh strategy, because the indices on the histogram start at zero), which is the strategy with the highest conversion rate (0.20).
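The same observation can be checked numerically rather than by reading the histogram. The following is a minimal, self-contained sketch, not the script above: the helper `thompson_selection_counts` is a hypothetical name, it simulates each reward on the fly instead of precomputing the matrix `X`, and it uses a fixed seed (an assumption made here for repeatability).

```python
import random


def thompson_selection_counts(conversion_rates, n_rounds, seed=0):
    """Run Bernoulli Thompson sampling; return how often each strategy was picked."""
    rng = random.Random(seed)  # local generator so the run is repeatable
    d = len(conversion_rates)
    wins = [0] * d     # rewards of 1 observed per strategy
    losses = [0] * d   # rewards of 0 observed per strategy
    counts = [0] * d   # number of times each strategy was selected
    for _ in range(n_rounds):
        # One draw per strategy from Beta(wins + 1, losses + 1).
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(d)]
        chosen = max(range(d), key=lambda i: draws[i])
        # Simulate the reward of the chosen strategy only.
        if rng.random() < conversion_rates[chosen]:
            wins[chosen] += 1
        else:
            losses[chosen] += 1
        counts[chosen] += 1
    return counts


rates = [0.05, 0.13, 0.09, 0.16, 0.11, 0.04, 0.20, 0.08, 0.01]
counts = thompson_selection_counts(rates, 10000)
most_selected = max(range(len(counts)), key=lambda i: counts[i])
print("Selections per strategy:", counts)
print("Most selected strategy index:", most_selected)
```

Printing the counts per index makes the zero-based numbering explicit, which avoids the off-by-one confusion that can arise when reading bar positions on the histogram.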
The next module introduces the concept of regret. The code above will be modified slightly to compute it.
The code above is available on Google Colab: https://colab.research.google.com/drive/16Vv78x68oQotMHe2i7xklp_ZzDIrBzk_?usp=sharing