Citations: 1

Analysis of institutional authors: Fresno V (Author)

February 13, 2025

LyricSIM: A novel dataset and benchmark for similarity detection in Spanish song lyrics

Published in: Procesamiento De Lenguaje Natural, (71): 149-163, 2023-09-01. DOI: 10.26342/2023-71-12

Authors: Benito-Santos A; Ghajari A; Hernández P; Fresno V; Ros S; González-Blanco E

Affiliations

IE Universidad - Author
Universidad de Salamanca - Author
Universidad Nacional de Educación a Distancia - Author

Abstract

In this paper, we present a new dataset and benchmark tailored to the task of semantic similarity in song lyrics. Our dataset, originally consisting of 2,775 pairs of Spanish songs, was annotated by 63 native annotators in a collective annotation experiment. After collecting and refining the data to ensure a high degree of consensus and data integrity, we obtained 676 high-quality annotated pairs, which we used to evaluate the performance of various state-of-the-art monolingual and multilingual language models. In doing so, we established baseline results that we hope will be useful to the community in future academic and industrial applications in this context.
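The abstract does not spell out how model performance is scored. Semantic textual similarity benchmarks are commonly evaluated by computing a similarity score for each pair (for example, the cosine similarity between the two texts' embeddings) and measuring Spearman's rank correlation against the human ratings. The sketch below illustrates that common setup in plain Python. It assumes cosine similarity and Spearman correlation as the scoring scheme, which may differ from the metric actually used for LyricSIM; the helper names (`cosine_similarity`, `average_ranks`, `pearson`, `spearman`) and the toy data are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant inputs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy data: embedding pairs for three lyric pairs and their
    # mean human similarity ratings (NOT taken from the LyricSIM dataset).
    pairs = [
        ([1.0, 0.0], [0.9, 0.1]),
        ([1.0, 0.0], [0.5, 0.5]),
        ([1.0, 0.0], [0.0, 1.0]),
    ]
    human_scores = [4.5, 3.0, 1.0]
    model_scores = [cosine_similarity(a, b) for a, b in pairs]
    print(round(spearman(model_scores, human_scores), 3))  # → 1.0
```

Rank correlation is a common choice in this setting because it only checks whether the model orders pairs the same way the annotators do, so the model's similarity scores need not share the scale of the human ratings.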

Keywords

Annotation task; Benchmark; Cultural heritage; Dataset; Semantic textual similarity; Song lyrics

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

The work was published in the journal Procesamiento De Lenguaje Natural, which, according to Scopus (SJR), has become a reference in its field owing to its progression and the impact it has achieved in recent years. In 2023, the year of publication, the journal was in position , placing it in Q1 (first quartile) of the Linguistics and Language category.

Impact and social visibility

From the perspective of influence or social adoption, and based on metrics of mentions and interactions provided by agencies that specialize in calculating so-called "Alternative or Social Metrics," the following can be highlighted as of 2025-08-02:

  • Bookmarks, code forks, additions to favorite lists for recurrent reading, and general views of this contribution suggest that someone is using the publication as a basis for their current work. This may be an early indicator of future formal academic citations. The PlumX "Capture" indicator, which measures this kind of use, totals 7 for this publication.