Competitive Advantages:
Adaptable learning, flexibility in realistic situations, real-time decision making.
Systems and methods for generating a trajectory of a dynamical system are described herein. An example method includes modeling a policy from demonstration data. The method also includes generating a first set of action particles by sampling from the policy, where each action particle in the first set includes a respective system action; predicting a respective outcome of the dynamical system in response to each action particle in the first set; and weighting each action particle in the first set according to its respective probability of achieving a desired outcome. The method further includes generating a second set of action particles by resampling from the weighted action particles, where each action particle in the second set includes a respective system action, and selecting the next system action in the trajectory of the dynamical system from the action particles in the second set.

USF inventors have created a trajectory generation approach that learns a broad, stochastic policy from human demonstrations. The approach can generate a trajectory of arbitrary length and handles changes in constraints naturally. Based on the magnitude of each action and the desired future state, the system decides whether to keep, reject, or execute sampled actions. It weights the action particles drawn from the policy by their respective likelihoods of achieving the desired future outcome, then obtains the optimal action from the weighted actions.
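The sample-predict-weight-resample loop described above can be sketched as a simple particle-based action selector. This is an illustrative toy, not the inventors' implementation: the policy, dynamics model, likelihood width (`sigma`), and the choice of the resampled mean as the selected action are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_policy(state, n_particles):
    # Hypothetical stochastic policy learned from demonstrations:
    # candidate actions drawn around a state-dependent mean.
    return state + rng.normal(0.0, 0.5, size=n_particles)

def predict_outcome(state, actions):
    # Hypothetical dynamics model: predicted next state for each action.
    return state + actions

def select_next_action(state, desired_outcome, n_particles=1000, sigma=0.3):
    # 1) First set of action particles, sampled from the policy.
    actions = sample_policy(state, n_particles)
    # 2) Predict the outcome of the system under each candidate action.
    outcomes = predict_outcome(state, actions)
    # 3) Weight each particle by the likelihood of reaching the desired outcome.
    weights = np.exp(-0.5 * ((outcomes - desired_outcome) / sigma) ** 2)
    weights /= weights.sum()
    # 4) Second set of action particles: resample in proportion to the weights.
    resampled = rng.choice(actions, size=n_particles, p=weights)
    # 5) Select the next action from the second set (here, its mean).
    return resampled.mean()

state, goal = 0.0, 1.0
action = select_next_action(state, goal)
```

Because the weighting pulls the resampled set toward actions whose predicted outcomes lie near the goal, the selected action lands between the policy's prior mean and the desired outcome.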