A vision-based obstacle detection system is a key enabler for the development of autonomous robots and vehicles and intelligent transportation systems. This paper addresses the problem of urban scene monitoring and tracking of obstacles based on unsupervised, deep-learning approaches. Here, we design an innovative hybrid encoder that integrates deep Boltzmann machines (DBM) and auto-encoders (AE). This hybrid auto-encode (HAE) model combines the greedy learning features of DBM with the dimensionality reduction capacity of AE to accurately and reliably detect the presence of obstacles. We combine the proposed hybrid model with the one-class support vector machines (OCSVM) to visually monitor an urban scene. We also propose an efficient approach to estimating obstacles location and track their positions via scene densities. Specifically, we address obstacle detection as an anomaly detection problem. If an obstacle is detected by the OCSVM algorithm, then localization and tracking algorithm is executed. We validated the effectiveness of our approach by using experimental data from two publicly available dataset, the Malaga stereovision urban dataset (MSVUD) and the Daimler urban segmentation dataset (DUSD). Results show the capacity of the proposed approach to reliably detect obstacles.