Abstract: |
This study evaluates the performance of traffic control systems based on reinforcement learning (RL), also known as approximate dynamic programming (ADP). Two algorithms were selected for testing: 1) Q-learning and 2) ADP with a post-decision state variable. The algorithms were tested in increasingly complex scenarios: an oversaturated isolated intersection, an arterial in undersaturated conditions, a 2x5 network in both undersaturated and oversaturated conditions, and finally a 4x5 network in oversaturation with even and uneven directional demands. Potential benefits of these algorithms include signal systems that not only respond quickly to the actual conditions found in the field, but also learn from them and truly adapt through flexible, cycle-free strategies. Moreover, these signal systems are decentralized, providing greater scalability and lower vulnerability at the network level.