
[1912.09253v1] Towards a Philological Metric through a Topological Data Analysis Approach
The use of Topological Data Analysis (TDA) techniques is a new research area that provides tools for comparing properties of point clouds in high-dimensional spaces and, therefore, for comparing the datasets represented by such point clouds.
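To make "comparing properties of point clouds" concrete, the following Python sketch computes the 0-dimensional (H0) part of a Vietoris–Rips persistence diagram. It is not the authors' code. It assumes that an edge enters the filtration at the Euclidean distance between its endpoints (some libraries use half that distance), it drops the single infinite bar, and it ignores H1, which requires a full TDA library. The function name `h0_persistence` is hypothetical.

```python
import itertools
import math


def h0_persistence(points):
    """Finite (birth, death) pairs of the H0 Vietoris-Rips persistence diagram.

    Assumed convention: an edge between two points appears at their Euclidean
    distance. Every point is born at 0, and a connected component dies when it
    merges with another one, so the deaths are the edge lengths of a minimum
    spanning tree (built here with Kruskal's algorithm). The one component
    that never dies (the infinite bar) is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by the scale at which they enter the filtration.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    diagram = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them dies at this scale.
            parent[root_i] = root_j
            diagram.append((0.0, length))
    return diagram


# Four corners of a unit square plus one distant point.
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(h0_persistence(square + [(10, 0)]))
# [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 9.0)]
```

The long bar dying at 9 records the isolated point. This is the kind of feature a persistence diagram summarizes, and it is what a distance between diagrams later compares.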
The canon of baroque Spanish literature has been thoroughly studied with philological techniques. The major representatives of the poetry of this epoch are Francisco de Quevedo and Luis de Góngora y Argote. Literary experts commonly classify them into two different streams: Quevedo belongs to Conceptismo and Góngora to Culteranismo. Moreover, although Quevedo is traditionally considered the most representative author of Conceptismo, Lope de Vega is also considered to be at least closely related to this literary trend. In this paper, we use Topological Data Analysis techniques to provide a first approach to a metric distance between the literary styles of these poets. Our results agree with the literary experts' criteria, locating the literary style of Lope de Vega closer to that of Quevedo than to that of Góngora.
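The figure captions identify the metric used to compare styles as the bottleneck distance between persistence diagrams, d_B(X, Y) = inf over matchings γ of sup over x of ‖x − γ(x)‖∞, where points may also be matched to the diagonal. The Python sketch below computes it exactly for small diagrams using a standard reduction. Each diagram is padded with diagonal copies of the other's points, and a binary search over candidate costs finds the smallest threshold that admits a perfect bipartite matching. This is an illustrative sketch under stated assumptions (finite points only, L∞ ground distance, hypothetical function names), not the pipeline used in the paper. Dedicated TDA libraries provide faster implementations.

```python
import math


def _has_perfect_matching(cost, t):
    """Kuhn's augmenting-path check: is there a perfect matching using only edges with cost <= t?"""
    size = len(cost)
    match_right = [-1] * size

    def try_assign(u, seen):
        for v in range(size):
            if cost[u][v] <= t and not seen[v]:
                seen[v] = True
                if match_right[v] == -1 or try_assign(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    return all(try_assign(u, [False] * size) for u in range(size))


def bottleneck_distance(diag_a, diag_b):
    """Bottleneck distance between two finite persistence diagrams of (birth, death) pairs."""
    n, m = len(diag_a), len(diag_b)
    size = n + m
    if size == 0:
        return 0.0
    inf = math.inf
    cost = [[inf] * size for _ in range(size)]
    # Rows 0..n-1: points of A; rows n..n+m-1: diagonal copies of B's points.
    # Cols 0..m-1: points of B; cols m..m+n-1: diagonal copies of A's points.
    for i, (b1, d1) in enumerate(diag_a):
        for j, (b2, d2) in enumerate(diag_b):
            cost[i][j] = max(abs(b1 - b2), abs(d1 - d2))  # L-infinity distance
        cost[i][m + i] = (d1 - b1) / 2  # send a point of A to the diagonal
    for j, (b2, d2) in enumerate(diag_b):
        cost[n + j][j] = (d2 - b2) / 2  # send a point of B to the diagonal
        for i in range(n):
            cost[n + j][m + i] = 0.0  # diagonal-to-diagonal pairs are free
    # The optimum equals one of the finite edge costs: binary search over them.
    candidates = sorted({c for row in cost for c in row if c != inf})
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if _has_perfect_matching(cost, candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]


print(bottleneck_distance([(0.0, 4.0)], [(1.0, 4.0)]))  # 1.0
```

Matching every point to the diagonal is always feasible, so the largest candidate admits a perfect matching and the binary search is well defined. The matching that attains the optimum corresponds to the arrows drawn in Figure 10.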
Figure 1: The skip-gram neural network architecture. The input layer has as many neurons as the length of the one-hot vectors that encode the words of the corpus, i.e., the number of words in the vocabulary of the corpus, N in this case. The size of the projection layer equals the dimension M of the space in which we want to embed the corpus. Finally, the output layer has N · S neurons, where S is the size of the window, i.e., the number of surrounding words that the model tries to predict. This image is inspired by the image of the skip-gram model in [18].
Figure 2: A two-dimensional point cloud sampling a circumference.
Figure 3: A two-dimensional point cloud sampling a noisy circumference.
Figure 4: A two-dimensional point cloud sampling two circumferences.
Figure 5: Three datasets: a circumference, a noisy circumference, and two circumferences.
Figure 6: Persistence diagram of Figure ??.
Figure 7: Persistence diagram of Figure ??.
Figure 8: Persistence diagram of Figure ??.
Figure 9: Two persistence diagrams of the Vietoris–Rips filtration, showing the H0 and H1 homology classes, computed respectively from a random selection of points from a circumference and from two circumferences. Note that in Figure ?? there are two points in H1, corresponding to the two holes. To appreciate the colors in the images, please visit the online version of the paper.
Figure 10: The set of arrows represents the optimal bijection between the black and white points, which belong respectively to two different persistence diagrams, shown overlaid here.
Figure 11: Boxplot of the bottleneck distances obtained from the sonnets of the three poets. Boxplot (1) corresponds to the comparison between the sonnets of Quevedo and Lope de Vega, (2) to the comparison between the sonnets of Quevedo and Góngora, and (3) to the comparison between the sonnets of Lope de Vega and Góngora. From this boxplot, we can expect that the literary styles of Quevedo and Lope de Vega are significantly closer to each other than to that of Góngora.
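The skip-gram architecture in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the embedding code used in the paper, which is not described here. It shows how (center, context) training pairs come from a window, and it shows that multiplying a one-hot input by the input weight matrix just selects one row, namely the M-dimensional word embedding. The function names, the toy sentence, and the random weights are hypothetical. Training (backpropagation, negative sampling) is omitted.

```python
import numpy as np


def skipgram_pairs(tokens, half_window):
    """Return (center, context) training pairs for a skip-gram model.

    Each word is paired with up to `half_window` words on each side, so away
    from the edges of the text the model predicts S = 2 * half_window
    surrounding words per center word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - half_window)
        stop = min(len(tokens), i + half_window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def skipgram_forward(center_id, w_in, w_out):
    """One forward pass for a single center word.

    w_in:  (N, M) weights from the one-hot input layer to the projection layer.
    w_out: (M, N) weights from the projection layer to one output block.
    A one-hot vector times w_in selects row `center_id`, the word's embedding.
    In the standard model the S output blocks share w_out, so one softmax over
    the N vocabulary words serves every context position.
    """
    hidden = w_in[center_id]          # shape (M,)
    scores = hidden @ w_out           # shape (N,)
    scores = scores - scores.max()    # subtract the max for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return hidden, probs


# Toy usage on a hypothetical example line with random, untrained weights.
tokens = "polvo serán mas polvo enamorado".split()
vocab = {word: k for k, word in enumerate(sorted(set(tokens)))}
pairs = skipgram_pairs(tokens, half_window=2)
rng = np.random.default_rng(0)
n_words, dim = len(vocab), 3
w_in = rng.normal(size=(n_words, dim))
w_out = rng.normal(size=(dim, n_words))
embedding, probs = skipgram_forward(vocab["polvo"], w_in, w_out)
```

After training, the rows of `w_in` give one point per word in M-dimensional space. Point clouds of this kind are the inputs that persistence diagrams and the bottleneck distance then compare.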

