We consider the problem of inferring a layered representation, its depth ordering and motion segmentation from video in which objects may undergo 3D non-planar motion relative to the camera. We generalize layered inference to that case and corresponding self-occlusion phenomena. We accomplish this by introducing a flattened 3D object representation, which is a compact representation of an object that contains all visible portions of the object seen in the video, including parts of an object that are self-occluded (as well as occluded) in one frame but seen in another. We formulate the inference of such flattened representations and motion segmentation, and derive an optimization scheme. We also introduce a new depth ordering scheme, which is independent of layered inference and addresses the case of self-occlusion. It requires little computation given the flattened representations. Experiments on benchmark datasets show the advantage of our method over existing layered methods, which do not model 3D motion and self-occlusion.