X-TAIL: eXtraction and eXploitation
of long-TAIL Knowledge with LLMs and KGs
2nd Workshop co-located with EKAW 2026, University of Turin, Turin, Italy

About

Large Language Models (LLMs) encode world knowledge through pre-training on massive datasets,[1] making them the backbone of knowledge extraction applications. Their reliability degrades on long-tail knowledge: low-popularity knowledge that occurs infrequently in pre-training data.[2] Popularity is not neutral: pre-training datasets are predominantly web-crawled and, as such, are generalist, English-centric, and mostly produced over the past 30 years by Western, High-income, Educated, Liberal, Male-dominated (WHELM)[3] communities. This raises the risk of models underperforming on specialised domains, non-English languages, non-contemporary sources, and knowledge belonging to marginalised social groups. Retrieval-Augmented Generation (RAG) has been proposed as a mitigation, but corpora used for retrieval may still be biased. Knowledge Graphs (KGs) provide a more transparent and deterministic alternative, yet open-domain KGs such as Wikidata exhibit coverage gaps along the same dimensions.[4] The X-TAIL workshop aims to advance research on extracting, exploiting, and ultimately preserving long-tail knowledge, blending the strengths of LLMs and KGs.

The previous edition drew strong audience engagement and insightful discussion on the challenges of dealing with long-tail knowledge. Papers were published in the Joint Proceedings of Posters, Demos, Workshops, and Tutorials of EKAW 2024. The invited talk, "What do Large Language Models know about the World?", was delivered by Jan-Christoph Kalo (University of Amsterdam). The invited talk notes, accepted papers, and slides can be accessed and downloaded from the previous edition webpage.

[1] Petroni, Fabio, et al. "Language Models as Knowledge Bases?" In EMNLP-IJCNLP, Association for Computational Linguistics, 2019. https://doi.org/10.18653/v1/D19-1250

[2] Kandpal, Nikhil, et al. "Large Language Models Struggle to Learn Long-Tail Knowledge." Proceedings of the 40th International Conference on Machine Learning (Honolulu, Hawaii, USA), ICML’23, vol. 202 (July 2023): 15696–707. https://dl.acm.org/doi/10.5555/3618408.3619049

[3] Daryani, Yalda, et al. "The Homogenizing Engine: AI’s Role in Standardizing Culture and the Path to Policy." Policy Insights from the Behavioral and Brain Sciences 13, no. 1 (2026): 14–27. https://doi.org/10.1177/23727322251406591

[4] Kraft, Angelie, and Soulier, Eloïse. "Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI." Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (New York, NY, USA), FAccT ’24, June 5, 2024, 1433–45. https://doi.org/10.1145/3630106.3658981

Call for Papers

The link to the Call for Papers will be available soon.

The main topics of interest include:

  • Head and tail knowledge definition: operationalisation and computation of knowledge popularity
  • Knowledge Extraction from long-tail sources:
    • Tasks: Relation Extraction, Entity Linking, Knowledge Graph generation, Question Answering
    • Non-conventional source types, such as: domain-specific, multilingual, historical, related to marginalised social groups
  • Context-augmented methods optimised for long-tail knowledge, such as Retrieval-Augmented Generation (RAG) and Knowledge-Augmented Generation (KAG)
  • Knowledge Graph generation and completion for mitigating coverage gaps in existing knowledge bases
  • Computational methods for long-tail knowledge processing in specialised domains, such as Health, Law, Finance, Sustainability, Digital Humanities, Computational Literary Studies, Computational History, etc.
  • Impact of multilingualism on model performance on long-tail knowledge
  • Long-tail knowledge representation in ontologies and Knowledge Graphs (KGs)
  • Benchmarks that systematically address biases of both Large Language Models and Knowledge Graphs on knowledge related to, for example, specialised domains, low-resourced languages and cultures, marginalised social groups, and historical data
  • Error analysis of system performance stratified by knowledge popularity and its intersection with domain, language, social group, and time dimensions
  • Negative results on methods developed to mitigate underperformance on long-tail knowledge
  • Studies of harms perpetuated by popularity-driven knowledge hierarchies learned by Large Language Models and reflected in Knowledge Graphs

Submission format and guidelines:

  • Papers must be submitted in PDF format according to the CEUR-WS template published in the CEUR-WS guidelines.
  • Long papers should be up to 8 pages, excluding references.
  • Short papers should be up to 5 pages, excluding references.
  • Workshop papers must be self-contained and in English.

Important Dates

All deadlines are 23:59 AoE (Anywhere on Earth).

  • Workshop Papers Submission Deadline: July 26, 2026
  • Workshop Papers Notification: August 31, 2026
  • Workshop Papers Camera Ready: September 20, 2026
  • Workshop Day: September 29, 2026

Organising Committee

Lia Draetta

University of Turin
Italy

Lia Draetta is a PhD student in the Computer Science Department at the University of Turin, Italy, with a background in linguistics. Her research focuses on under-representation and bias in data sources, and on how large language models handle rare entities and marginalised communities. She is also interested in knowledge graphs and methodologies to integrate structured knowledge into LLMs, with a focus on long-tail and underrepresented entities.
Arianna Graciotti

University of Groningen
Netherlands

Arianna Graciotti is a post-doctoral researcher in Natural Language Processing, Semantic Web and Digital Humanities at the University of Groningen, Netherlands. With a background in computational linguistics, her research focuses on investigating which and whose knowledge risks being lost in a world in which LLMs become central knowledge technologies, with the mission of developing resources and methods to advance equitable knowledge access and cultural heritage preservation across communities of knowers. She has co-organised SemDH 2025 and the X-TAIL workshop at EKAW 2024.
Enrico Daga

The Open University
United Kingdom

Enrico Daga is a Senior Research Fellow at the Knowledge Media Institute of The Open University, UK, focusing on combining LLMs and KGs for knowledge extraction in digital humanities and social sciences. He researches and promotes façade-based data access for querying heterogeneous data in SPARQL, and chairs the W3C Data Façades Community Group. He has co-organised numerous workshops within the Semantic Web community.
Aidan Hogan

DCC, Universidad de Chile
Chile

Aidan Hogan is a professor in the Department of Computer Science (DCC) at the Universidad de Chile, and an Associate Researcher of the Millennium Institute for Foundational Research on Data (IMFD). His research focuses on querying Knowledge Graphs, with emphasis on indexing, query optimization, reasoning, and interfaces. He has participated in the organization of dozens of workshops and conferences on related topics.

Program Committee

More members to be announced.

  • Andrea Schimmenti, University of Bologna, Bologna, Italy
  • Arianna Muti, Bocconi University, Milan, Italy
  • Antonello Meloni, University of Cagliari, Cagliari, Italy
  • Beatrice Fiumanò, University of Bologna, Bologna, Italy
  • Célian Ringwald, University of Bologna, Bologna, Italy
  • Cristian Santini, University of Macerata, Macerata, Italy
  • Delfina Sol Martinez Pandiani, University of Amsterdam, Amsterdam, Netherlands
  • Federico Pianzola, University of Groningen, Groningen, Netherlands
  • Filip Ilievski, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
  • Gianmarco Pappacoda, University of Bologna, Bologna, Italy
  • Luana Bulla, University of Bologna, Bologna, Italy
  • Jan-Christoph Kalo, University of Amsterdam, Amsterdam, Netherlands
  • Nicolas Lazzari, University of Pisa / University of Bologna, Pisa / Bologna, Italy
  • Rossana Damiano, University of Turin, Turin, Italy
  • Ruhi Mahadeshwar, University of Groningen, Groningen, Netherlands
  • Stefano De Giorgis, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
  • Stella Verkijk, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
  • Teresa Paccosi, KNAW, Amsterdam, Netherlands

Workshop Program

Workshop program to be announced.

Keynote Speakers

To be announced soon!

Contact

For any inquiries, please contact Lia Draetta and Arianna Graciotti at: lia.draetta@unito.it & a.graciotti@rug.nl.