Activation Atlases
Activation atlases build on feature visualization, a technique for studying what the hidden layers of neural networks can represent. With activation atlases, humans can discover unanticipated issues in neural networks, for example places where the network relies on spurious correlations to classify images, or where reusing a feature between two classes leads to strange bugs. Activation atlases worked better than we anticipated, and they strongly suggest that neural network activations can be meaningful to humans.
Source: openai.com