In this paper, we propose and study an energy-efficient trajectory optimization scheme for unmanned aerial vehicle (UAV)-assisted Internet of Things (IoT) networks. In such networks, a single UAV is powered by both solar energy and charging stations (CSs), which enables sustainable communication services while avoiding energy outage. In particular, we optimize the UAV trajectory by jointly considering the average data rate, the total energy consumption, and the fairness of coverage for the IoT terminals. A dynamic spatial-temporal configuration scheme is applied to terminals operating in discontinuous reception (DRX) mode. Model-free, action-confined on-policy and off-policy reinforcement learning approaches are proposed and jointly applied to solve the formulated optimization problem. We evaluate the effectiveness of the proposed strategy by comparing it with other dynamic benchmark algorithms. Extensive simulation results reveal that the proposed scheme outperforms the benchmarks in terms of data transmission, energy efficiency, and adaptivity in avoiding battery depletion. With the proposed trajectory scheme, the UAV is able to adapt to the time-varying and dynamic conditions of the communication network.