If you could use some additional explanation of attention, this post is here to help.
This post explains why we sometimes get runtime errors when creating views of tensors.