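Contextual bandits are algorithms that choose an action (such as displaying an ad or recommending a product) based on the current situation, called the context, in order to maximize cumulative reward (such as clicks or sales). They observe the reward only for the action they actually take, so they learn over time which actions work best in each context, balancing exploration of less-tried actions against exploitation of those already known to perform well.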
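
To make the learning loop concrete, here is a minimal sketch that assumes an epsilon-greedy strategy over a small set of discrete contexts. The text above does not name a specific algorithm, so this is only one simple possibility. The class name, the ad names, and the click probabilities are all hypothetical and chosen purely for illustration.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Illustrative epsilon-greedy bandit keeping one reward estimate per (context, action)."""

    def __init__(self, actions, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon              # probability of trying a random action
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)      # times each (context, action) was tried
        self.values = defaultdict(float)    # running mean reward per (context, action)

    def best_action(self, context):
        # Exploit: the action with the highest estimated reward in this context.
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def choose(self, context):
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(context)

    def update(self, context, action, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


# Hypothetical click-through rates, used only to simulate user feedback.
CLICK_PROB = {
    ("mobile", "ad_a"): 0.2, ("mobile", "ad_b"): 0.8,
    ("desktop", "ad_a"): 0.7, ("desktop", "ad_b"): 0.3,
}


def simulate(rounds=5000, seed=0):
    env_rng = random.Random(seed)
    bandit = EpsilonGreedyContextualBandit(["ad_a", "ad_b"], epsilon=0.1, seed=seed + 1)
    for _ in range(rounds):
        context = env_rng.choice(["mobile", "desktop"])   # observe the situation
        action = bandit.choose(context)                   # pick an ad to show
        clicked = env_rng.random() < CLICK_PROB[(context, action)]
        bandit.update(context, action, 1.0 if clicked else 0.0)
    return bandit


if __name__ == "__main__":
    trained = simulate()
    for ctx in ("mobile", "desktop"):
        print(ctx, "->", trained.best_action(ctx))
```

In this sketch `epsilon` controls the balance: a higher value spends more rounds on new trials, a lower value leans on the options that have already proven themselves. Practical systems usually replace the lookup table with a model that generalizes across contexts, which this example does not attempt.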