Keywords:
Afghanistan conflict, Computer science, Artificial intelligence, Computer programming, Equations, Improvised explosive devices, Logistics, Transportation, Vehicles, Algorithms, Warfare, Dynamic programming, Air force, Aircraft, Operations research, Unmanned aerial vehicles, Markov models, Inventory control, ADP (approximate dynamic programming), MILIRP (military inventory routing problem), Least squares temporal differences, CUAV (cargo unmanned aerial vehicles), FOB (forward operating bases), SIRP (stochastic inventory routing problem)
Abstract:
A brigade combat team must resupply forward operating bases (FOBs) within its area of operations from a central location, mainly via ground convoy operations, in a manner that closely resembles vendor-managed inventory practices. Military logisticians routinely decide when and how much inventory to distribute to each FOB. Technology currently exists that makes cargo unmanned aerial vehicles (CUAVs) an attractive resupply alternative, given the dangers of convoy operations. However, enemy actions, austere conditions, and inclement weather pose a significant risk to a CUAV's ability to safely deliver supplies to a FOB. We develop a Markov decision process model of the military inventory routing problem that allows for multiple supply classes and explicitly accounts for the possible loss of CUAVs during resupply operations. The large size of the motivating problem instance renders exact dynamic programming techniques computationally intractable. To overcome this challenge, we employ approximate dynamic programming (ADP) techniques to obtain high-quality resupply policies. Our algorithmic strategy is approximate policy iteration with least squares temporal differences (LSTD) for policy evaluation. We construct a representative problem instance based on an austere combat environment to demonstrate the efficacy of our model formulation and solution methodology. Because our ADP algorithm has many tunable features, we perform a robust, designed computational experiment to determine which feature settings yield the highest-quality ADP policy. Results indicate that least squares temporal differences with a first-order basis function is insufficient to approximate the value function when stochastic demand and penalty functions are implemented.
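Note: The following is a minimal sketch of LSTD policy evaluation with a first-order (linear) basis, the approximation scheme named in the abstract. The feature definition, discount factor, and sampled transitions are illustrative assumptions, not the paper's actual MILIRP model or data.

```python
import numpy as np

def first_order_basis(state):
    """First-order basis: a constant term plus the raw state components
    (e.g., inventory levels at each FOB and the number of available CUAVs).
    This feature choice is an assumption for illustration."""
    return np.concatenate(([1.0], np.asarray(state, dtype=float)))

def lstd(transitions, gamma=0.95, ridge=1e-6):
    """Estimate weights w so that V(s) ~ phi(s)^T w from sampled
    (state, contribution, next_state) transitions generated by a fixed policy.
    Solves A w = b with
        A = sum phi(s) (phi(s) - gamma * phi(s'))^T,   b = sum phi(s) * r."""
    k = first_order_basis(transitions[0][0]).size
    A = ridge * np.eye(k)          # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, r, s_next in transitions:
        phi, phi_next = first_order_basis(s), first_order_basis(s_next)
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * r
    return np.linalg.solve(A, b)

# Hypothetical usage: states are (FOB-1 inventory, FOB-2 inventory, CUAVs available).
sample = [((40, 55, 3), -12.0, (35, 50, 3)),
          ((35, 50, 3), -30.0, (60, 45, 2))]   # second transition: one CUAV lost en route
w = lstd(sample)
print("approximate value of state (40, 55, 3):",
      first_order_basis((40, 55, 3)) @ w)
```

Within approximate policy iteration, a step like this would be repeated for each candidate policy: simulate transitions under the current policy, fit w by LSTD, then improve the policy greedily with respect to the fitted value function.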