We propose a new biclustering method for binary data matrices using the maximum penalized Bernoulli likelihood estimation. Our method applies a multi-layer model defined on the logits of the success probabilities, where each layer represents a simple bicluster structure and the combination of multiple layers is able to reveal complicated, multiple biclusters. The method allows for non-pure biclusters, and can simultaneously identify the 1-prevalent blocks and 0-prevalent blocks. A computationally efficient algorithm is developed and guidelines are provided for specifying the tuning parameters, including initial values of model parameters, the number of layers, and the penalty parameters. Missing-data imputation can be handled in the EM framework. The method is tested using synthetic and real datasets and shows good performance. © 2013 Springer Science+Business Media New York.