# Gittins.py: README

This script implements two bandit models used in the advisor experiment: the **infinite-horizon Gittins index** (single arm) and the **finite 20-round two-armed bandit** (optimal policy and value). It uses the same accuracy levels and priors as the experiment (Dots & Co. vs PixelHouse).

---

## What it does

1. **Infinite-horizon Gittins index**
   For a single arm (one advisor), it computes the **Gittins index** at each belief state: the retirement value λ such that you are indifferent between “retire and get λ every period forever” and “continue with this arm.” The script builds a heatmap of this index over a grid of (successes, failures) for the Dots prior.

2. **20-round finite-horizon DP**
   For the **two-armed bandit** (choose Dots or Pixel each round, 20 rounds total), it solves the Bellman equation via dynamic programming and reports the **optimal expected number of correct answers** from the start of the block.

3. **Figures**
   If matplotlib and seaborn are installed, it saves:
   - `gittins_map.png`: Gittins index for each (wins, losses) cell.
   - `information_bonus.png`: Gittins index minus myopic value (expected accuracy), i.e. the “information bonus” from learning.

---

## Parameters (match the experiment)

| Variable      | Meaning                                                        |
|---------------|----------------------------------------------------------------|
| `accuracies`  | Possible advisor accuracies: 90%, 75%, 60%, 50%.               |
| `prior_dots`  | Dots & Co. prior over those accuracies: (0.3, 0.3, 0.2, 0.2).  |
| `prior_pixel` | PixelHouse prior: (0.2, 0.2, 0.3, 0.3).                        |
| `gamma`       | Discount factor for the infinite-horizon problem (0.95).       |
| `T_MAX`       | Number of rounds in the finite problem (20).                   |

Beliefs are 4-dimensional probability vectors over the four accuracy levels. After \(s\) successes and \(f\) failures, the posterior is computed by Bayes’ rule (binomial likelihood for each accuracy).

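The belief update can be illustrated with a minimal sketch. It assumes the posterior weight on accuracy \(a_k\) is proportional to \(p_k\,a_k^{s}(1-a_k)^{f}\), which is what a binomial likelihood gives once the constant binomial coefficient cancels. The names `ACCURACIES`, `PRIOR_DOTS`, `expected_accuracy`, and `posterior` are illustrative and are not taken from the script.

```python
# Illustrative sketch only: names are hypothetical, not the script's own.
ACCURACIES = (0.90, 0.75, 0.60, 0.50)   # possible advisor accuracies
PRIOR_DOTS = (0.3, 0.3, 0.2, 0.2)       # Dots & Co. prior from the table above


def expected_accuracy(belief):
    """Probability that the next prediction is correct under `belief`."""
    return sum(p * a for p, a in zip(belief, ACCURACIES))


def posterior(prior, s, f):
    """Belief after s successes and f failures, starting from `prior`."""
    # Unnormalized weights: prior times likelihood a^s * (1 - a)^f.
    weights = [p * a**s * (1 - a)**f for p, a in zip(prior, ACCURACIES)]
    total = sum(weights)
    return [w / total for w in weights]


print(f"{expected_accuracy(PRIOR_DOTS):.3f}")  # prints 0.715
after_wins = posterior(PRIOR_DOTS, 3, 0)
print([round(p, 3) for p in after_wins])
print(round(expected_accuracy(after_wins), 3))
```

Three successes in a row shift weight toward the 90% and 75% levels, so the expected accuracy rises above the prior value of 0.715; failures move it the other way.
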

---

## Main functions

### Belief and reward

- **`get_mu(p)`**
  Expected immediate reward (probability of a correct prediction) under belief `p`: \(\mu = \sum_k p_k \cdot \text{accuracy}_k\).

- **`update_p(p, s, f)`**
  Bayesian update of belief `p` after `s` successes and `f` failures (binomial likelihood; posterior is normalized).

### Infinite-horizon Gittins

- **`get_gittins(p_init)`**
  Returns the Gittins index for belief `p_init`.
  - **Idea:** You can either “retire” and receive a constant λ per period (present value λ/(1−γ)), or “continue” with this arm: get μ this period and then the discounted continuation value.
  - **Bellman:** \(V(\lambda, p) = \max\bigl\{ \lambda/(1-\gamma),\ \mu(p) + \gamma\bigl[\mu(p)\,V(\lambda,p_{\text{succ}}) + (1-\mu(p))\,V(\lambda,p_{\text{fail}})\bigr]\bigr\}\).
  - **Implementation:** Binary search on λ for the point of indifference, where the value of continuing equals the value of retiring. Memoization over (λ, rounded p) keeps runs fast.

### 20-round finite-horizon DP

- **`get_dp_val_discrete(s1, f1, s2, f2, rounds_left)`**
  Optimal expected total correct answers from the current state to the end of the 20 rounds.
  - **State:**
    - \((s_1, f_1)\): successes and failures so far for arm 1 (Dots).
    - \((s_2, f_2)\): same for arm 2 (Pixel).
    - `rounds_left`: number of rounds remaining.
  - **Beliefs:** Reconstructed from these counts and the initial priors `prior_dots` and `prior_pixel`.
  - **Recurrence:** At each step, choose arm 1 or 2; reward is 1 if correct, 0 if wrong; no discounting (total undiscounted correct answers over 20 rounds).

- **`get_dp_val(p1, p2, rounds_left)`**
  Wrapper used at **block start** (no observations yet): calls `get_dp_val_discrete(0, 0, 0, 0, rounds_left)`.

---

## Output

When you run the script:

1. **Console:**
   - Gittins index at the Dots prior.
   - 20-round optimal expected correct (from start, both arms).
   - Message that figures were saved (if plotting is available).

2. **Files (if matplotlib/seaborn installed):**
   - `gittins_map.png`: Heatmap of Gittins index vs (wins, losses).
   - `information_bonus.png`: Heatmap of (Gittins − myopic value).

If matplotlib or seaborn is missing, the script still computes and prints the two numbers, then asks you to install the missing packages to get the figures.

---

## How to run

From the `advisor_experiment` directory:

```bash
python Gittins.py
```

**Dependencies:**

- `numpy` (required).
- `matplotlib` and `seaborn` (optional; only needed for the two PNG figures).

Install with:

```bash
pip install numpy matplotlib seaborn
```

(or use the same Python environment you use for the rest of the project).

---

## Relation to the experiment

- **Gittins index:** Theoretical benchmark for the infinite-horizon “which advisor to use” problem; the heatmaps show how the value of an arm (and the information bonus) changes as you observe more outcomes.
- **20-round DP:** Matches the **block length** in the experiment (20 rounds per block). The reported value is the expected number of correct answers under the **optimal** switching policy between the two advisors (Dots vs Pixel) over one block.
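
As a rough illustration of the 20-round recurrence described under Main functions, the sketch below solves the undiscounted two-armed problem by memoized recursion over the counts \((s_1, f_1, s_2, f_2)\). It is a standalone approximation of the idea, not the script's code: the function names `posterior_mean` and `dp_value` are hypothetical, and it uses `functools.lru_cache` in place of whatever memoization the script uses.

```python
from functools import lru_cache

# Illustrative sketch only: names are hypothetical, not the script's own.
ACCURACIES = (0.90, 0.75, 0.60, 0.50)
PRIOR_DOTS = (0.3, 0.3, 0.2, 0.2)
PRIOR_PIXEL = (0.2, 0.2, 0.3, 0.3)


def posterior_mean(prior, s, f):
    """Probability the next answer is correct after s successes, f failures."""
    weights = [p * a**s * (1 - a)**f for p, a in zip(prior, ACCURACIES)]
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, ACCURACIES)) / total


@lru_cache(maxsize=None)
def dp_value(s1, f1, s2, f2, rounds_left):
    """Optimal expected number of correct answers over the remaining rounds."""
    if rounds_left == 0:
        return 0.0
    mu1 = posterior_mean(PRIOR_DOTS, s1, f1)
    mu2 = posterior_mean(PRIOR_PIXEL, s2, f2)
    r = rounds_left - 1
    # Value of choosing arm 1 (Dots): reward 1 on success, 0 on failure.
    q1 = (mu1 * (1.0 + dp_value(s1 + 1, f1, s2, f2, r))
          + (1.0 - mu1) * dp_value(s1, f1 + 1, s2, f2, r))
    # Value of choosing arm 2 (Pixel).
    q2 = (mu2 * (1.0 + dp_value(s1, f1, s2 + 1, f2, r))
          + (1.0 - mu2) * dp_value(s1, f1, s2, f2 + 1, r))
    return max(q1, q2)


print(f"1 round left:   {dp_value(0, 0, 0, 0, 1):.3f}")  # prints 0.715
print(f"20 rounds left: {dp_value(0, 0, 0, 0, 20):.3f}")
```

With one round left the optimal choice is simply the arm with the higher prior mean (Dots, 0.715). With 20 rounds the value is at least 20 × 0.715 = 14.3, the payoff from always choosing Dots, because the optimal policy can also switch arms in response to what it observes.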