Reducing distributional uncertainty by mutual information maximisation and transferable feature learning