The considerable growth in demand for wireless services has led to the spectrum scarcity challenge. Cognitive radio was introduced to address this scarcity by granting cognitive users access to the licensed spectrum. However, this solution requires efficient power allocation strategies that guarantee QoS for the cognitive system, reduce power consumption, and protect primary users from the interference caused by cognitive users. In this paper, we investigate the energy-efficient power allocation problem for cognitive radio networks operating in underlay mode. We propose a novel approximated online Q-learning scheme for power allocation in which cognitive users learn with a conjecture feature to select the most appropriate power level. The power allocation problem is formulated as an optimization problem whose goal is to maximize energy efficiency under QoS and interference constraints. The scheme is evaluated using a software-defined radio (SDR) testbed and simulations. The evaluation results demonstrate the scheme's ability to guarantee the SINR of both the primary and cognitive systems and to mitigate interference while consuming less power than the other schemes compared.
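To make the learning loop concrete, the sketch below shows plain tabular epsilon-greedy Q-learning in which a single cognitive user picks one of a few discrete power levels, with energy efficiency (bit/s/Hz per watt) as the reward when both the cognitive SINR threshold and the primary-user interference limit hold. It is a minimal illustration under stated assumptions, not the approximated online scheme with the conjecture feature proposed here: the single-link channel model, every constant (`G_CU`, `G_CU_PU`, `I_MAX`, `P_CIRCUIT`, and so on), the two-state "last transmission violated a constraint" state space, and the fixed penalty of -1 are all hypothetical choices made for the example.

```python
import math
import random

# --- Hypothetical single-link toy model (all constants are illustrative assumptions) ---
POWER_LEVELS = [0.01, 0.05, 0.1, 0.2, 0.5]  # candidate CU transmit powers (W)
G_CU = 1e-6            # assumed gain: CU transmitter -> CU receiver
G_CU_PU = 1e-7         # assumed gain: CU transmitter -> PU receiver
NOISE_PLUS_PU = 2e-9   # assumed noise + primary interference at the CU receiver (W)
SINR_MIN = 10.0        # assumed CU QoS threshold (linear scale)
I_MAX = 3e-8           # assumed tolerable interference at the PU receiver (W)
P_CIRCUIT = 0.1        # assumed static circuit power (W)


def step(action):
    """Return (reward, next_state) for transmitting at POWER_LEVELS[action].

    Reward is energy efficiency (bit/s/Hz per watt) when the SINR and
    interference constraints both hold, and a fixed penalty of -1 otherwise.
    Hypothetical state: 1 if this transmission violated a constraint, else 0.
    """
    p = POWER_LEVELS[action]
    sinr = G_CU * p / NOISE_PLUS_PU
    feasible = sinr >= SINR_MIN and G_CU_PU * p <= I_MAX
    if not feasible:
        return -1.0, 1
    return math.log2(1.0 + sinr) / (p + P_CIRCUIT), 0


def q_learning(steps=20000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Plain tabular epsilon-greedy Q-learning over the two violation states."""
    rng = random.Random(seed)
    n_actions = len(POWER_LEVELS)
    q = [[0.0] * n_actions for _ in range(2)]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore a random power level
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])  # exploit
        reward, next_state = step(action)
        # Standard Q-learning update toward r + gamma * max_a' Q(s', a')
        td_target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
    return q


if __name__ == "__main__":
    q = q_learning()
    for s in (0, 1):
        best = max(range(len(POWER_LEVELS)), key=lambda a: q[s][a])
        print(f"state {s}: choose {POWER_LEVELS[best]} W")
```

With these assumed constants, 0.01 W misses the SINR target and 0.5 W exceeds the interference limit, while among the feasible levels the lowest one (0.05 W) gives the highest energy efficiency, so the greedy policy is expected to settle there in both states. This mirrors the trade-off stated above: the reward favours the smallest power that still satisfies both the QoS and the interference constraints.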