A new study shows deep learning models must be carefully tested across multiple environments before being put into clinical practice.