Digital breast tomosynthesis (DBT) is an emerging breast cancer screening and diagnostic modality that uses quasi-three-dimensional breast images to provide detailed assessments of the dense tissue within the breast. In this study, a framework of a 3D-Mask region-based convolutional neural network (3D-Mask RCNN) computer-aided diagnosis (CAD) system was developed for mass detection and segmentation with a comparative analysis of performance on patient subgroups with different clinicopathological characteristics. To this end, 364 samples of DBT data were used and separated into a training dataset (n = 201) and a testing dataset (n = 163). The detection and segmentation results were evaluated on the testing set and on subgroups of patients with different characteristics, including different age ranges, lesion sizes, histological types, lesion shapes and breast densities. The results of our 3D-Mask RCNN framework were compared with those of the 2D-Mask RCNN and Faster RCNN methods. For lesion-based mass detection, the sensitivity of 3D-Mask RCNN-based CAD was 90% with 0.8 false positives (FPs) per lesion, whereas the sensitivity of the 2D-Mask RCNN- and Faster RCNN-based CAD was 90% at 1.3 and 2.37 FPs/lesion, respectively. For breast-based mass detection, the 3D-Mask RCNN generated a sensitivity of 90% at 0.83 FPs/breast, and this framework is better than the 2D-Mask RCNN and Faster RCNN, which generated a sensitivity of 90% with 1.24 and 2.38 FPs/breast, respectively. Additionally, the 3D-Mask RCNN achieved significantly (p < 0.05) better performance than the 2D methods on subgroups of samples with characteristics of ages ranged from 40 to 49 years, malignant tumors, spiculate and irregular masses and dense breast, respectively. Lesion segmentation using the 3D-Mask RCNN achieved an average precision (AP) of 0.934 and a false negative rate (FNR) of 0.053, which are better than those achieved by the 2D methods. The results suggest that the 3D-Mask RCNN CAD framework has advantages over 2D-based mass detection on both the whole data and subgroups with different characteristics.