Non-orthogonal multiple access (NOMA) was considered as a study item in 3GPP for 5G new radio (NR). However, it was decided not to continue with it as a work item, and to leave it for possible use beyond 5G. In this paper, we first review the discussions that led to this decision. In particular, we present simulation comparisons between Welch-bound equality spread multiple access (WSMA)-based NOMA and multi-user multiple-input multiple-output (MU-MIMO), which show that the possible gain of WSMA-based NOMA over MU-MIMO is negligible. Then, we summarize the 3GPP discussions on NOMA and propose a number of methods to reduce the implementation complexity and delay of both uplink (UL) and downlink (DL) NOMA-based transmission, as different ways to improve its efficiency. Here, particular attention is paid to reducing the receiver complexity, the cost of hybrid automatic repeat request (HARQ), and the user pairing complexity. As demonstrated, these techniques can improve the energy efficiency and the end-to-end transmission delay of NOMA-based systems.