Further, the feature amount of each frame of an image of a content for highlight detection of interest, which is a content from which a highlight scene is to be detected, is extracted, and the feature amount of each frame of the content for highlight detection of interest is clustered into one of the plurality of clusters using the cluster information, thereby converting the tim