X-TAIL Workshop 2026

About

Large Language Models (LLMs) encode world knowledge through pre-training on massive datasets,^[1] making them the backbone of knowledge extraction applications. Their reliability degrades on long-tail knowledge: low-popularity knowledge that occurs infrequently in pre-training data.^[2] Popularity is not neutral. Pre-training datasets are predominantly web-crawled and, as such, are generalist, English-centric, and mostly produced over the past 30 years by Western, High-income, Educated, Liberal, Male-dominated (WHELM)^[3] communities. This raises the risk of models underperforming on sources from specialised domains, in non-English languages, or from non-contemporary periods, and on knowledge belonging to marginalised social groups. Retrieval-Augmented Generation (RAG) has been proposed as a mitigation, but the corpora used for retrieval may still be biased. Knowledge Graphs (KGs) provide a more transparent and deterministic alternative, yet open-domain KGs such as Wikidata exhibit coverage gaps along the same dimensions.^[4] The X-TAIL workshop aims to advance research on extracting, exploiting, and ultimately preserving long-tail knowledge by blending the strengths of LLMs and KGs.

The previous edition drew strong audience engagement and prompted insightful discussion of the challenges of dealing with long-tail knowledge. Accepted papers were published in the Joint Proceedings of Posters, Demos, Workshops, and Tutorials of EKAW 2024. The invited talk, "What do Large Language Models know about the World?", was delivered by Jan-Christoph Kalo (University of Amsterdam). The talk notes, accepted papers, and slides can be accessed and downloaded from the previous edition's webpage.

^[1] Petroni, Fabio, et al. "Language Models as Knowledge Bases?" In EMNLP-IJCNLP, Association for Computational Linguistics, 2019. https://doi.org/10.18653/v1/D19-1250

^[2] Kandpal, Nikhil, et al. "Large Language Models Struggle to Learn Long-Tail Knowledge." Proceedings of the 40th International Conference on Machine Learning (Honolulu, Hawaii, USA), ICML’23, vol. 202 (July 2023): 15696–707. https://dl.acm.org/doi/10.5555/3618408.3619049

^[3] Daryani, Yalda, et al. "The Homogenizing Engine: AI’s Role in Standardizing Culture and the Path to Policy." Policy Insights from the Behavioral and Brain Sciences 13, no. 1 (2026): 14–27. https://doi.org/10.1177/23727322251406591

^[4] Kraft, Angelie, and Eloïse Soulier. "Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI." Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (New York, NY, USA), FAccT ’24, June 5, 2024, 1433–45. https://doi.org/10.1145/3630106.3658981

Call for Papers

The Call for Papers is now available: https://easychair.org/cfp/x-tail26

The main topics of interest include:

- Head and tail knowledge definition: operationalisation and computation of knowledge popularity
- Knowledge Extraction from long-tail sources:
  - Tasks: Relation Extraction, Entity Linking, Knowledge Graph generation, Question Answering
  - Non-conventional source types, such as domain-specific, multilingual, and historical sources, or sources related to marginalised social groups
- Context-augmented methods optimised for long-tail knowledge, such as Retrieval-Augmented Generation (RAG) and Knowledge-Augmented Generation (KAG)
- Knowledge Graph generation and completion for mitigating coverage gaps in existing knowledge bases
- Computational methods for long-tail knowledge processing in specialised domains, such as Health, Law, Finance, Sustainability, Digital Humanities, Computational Literary Studies, and Computational History
- Impact of multilingualism on model performance for long-tail knowledge
- Long-tail knowledge representation in ontologies and Knowledge Graphs (KGs)
- Benchmarks that systematically address biases of both Large Language Models and Knowledge Graphs on knowledge related, for example, to specialised domains, low-resource languages and cultures, marginalised social groups, and historical data
- Error analysis of system performance stratified by knowledge popularity and its intersection with the domain, language, social group, and time dimensions
- Negative results on methods developed to mitigate underperformance on long-tail knowledge
- Studies of harms perpetuated by popularity-driven knowledge hierarchies learned by Large Language Models and reflected in Knowledge Graphs

Submission format and guidelines:

Papers must be submitted in PDF format according to the CEUR-WS template published in the CEUR-WS guidelines.
Long papers should be up to 8 pages, excluding references.
Short papers should be up to 5 pages, excluding references.
Workshop papers must be self-contained and in English.

Important Dates

All deadlines are at 23:59 AoE (Anywhere on Earth).

Workshop Papers Submission Deadline: July 31, 2026 (extended from July 26, 2026)
Workshop Papers Notification: August 31, 2026
Workshop Papers Camera Ready: September 20, 2026
Workshop Day: September 29, 2026

Every paper accepted to the workshop must be covered by one registration, either student or regular.

Organising Committee

Lia Draetta

University of Turin
Italy

Lia Draetta is a PhD student in the Computer Science Department at the University of Turin, Italy, with a background in linguistics. Her research focuses on under-representation and bias in data sources, and on how large language models handle rare entities and marginalised communities. She is also interested in knowledge graphs and methodologies to integrate structured knowledge into LLMs, with a focus on long-tail and underrepresented entities.

Arianna Graciotti

University of Groningen
Netherlands

Arianna Graciotti is a post-doctoral researcher in Natural Language Processing, Semantic Web and Digital Humanities at the University of Groningen, Netherlands. With a background in computational linguistics, her research focuses on investigating which and whose knowledge risks being lost in a world in which LLMs become central knowledge technologies, with the mission of developing resources and methods to advance equitable knowledge access and cultural heritage preservation across communities of knowers. She has co-organised SemDH 2025 and the X-TAIL workshop at EKAW 2024.

Enrico Daga

The Open University
United Kingdom

Enrico Daga is a Senior Research Fellow at the Knowledge Media Institute of The Open University, UK, focusing on combining LLMs and KGs for knowledge extraction in digital humanities and social sciences. He researches and promotes façade-based data access for querying heterogeneous data in SPARQL, and chairs the W3C Data Façades Community Group. He has co-organised numerous workshops within the Semantic Web community.

Aidan Hogan

DCC, Universidad de Chile
Chile

Aidan Hogan is a professor in the Department of Computer Science (DCC) at the Universidad de Chile, and an Associate Researcher of the Millennium Institute for Foundational Research on Data (IMFD). His research focuses on querying Knowledge Graphs, with emphasis on indexing, query optimization, reasoning, and interfaces. He has participated in the organization of dozens of workshops and conferences on related topics.

Program Committee

Andrea Schimmenti, University of Bologna, Bologna, Italy
Arianna Muti, Bocconi University, Milan, Italy
Antonello Meloni, University of Cagliari, Cagliari, Italy
Beatrice Fiumanò, University of Bologna, Bologna, Italy
Célian Ringwald, University of Bologna, Bologna, Italy
Cristian Santini, University of Macerata, Macerata, Italy
Delfina Sol Martinez Pandiani, University of Amsterdam, Amsterdam, Netherlands
Federico Pianzola, University of Groningen, Groningen, Netherlands
Filip Ilievski, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
Gianmarco Pappacoda, University of Bologna, Bologna, Italy
Luana Bulla, University of Bologna, Bologna, Italy
Jan-Christoph Kalo, University of Amsterdam, Amsterdam, Netherlands
Marco Stranisci, IT University of Copenhagen (ITU), Copenhagen, Denmark
Nicolas Lazzari, University of Pisa / University of Bologna, Pisa / Bologna, Italy
Rossana Damiano, University of Turin, Turin, Italy
Ruhi Mahadeshwar, University of Groningen, Groningen, Netherlands
Stefano De Giorgis, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
Stella Verkijk, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
Teresa Paccosi, KNAW, Amsterdam, Netherlands

Workshop Program

Workshop program to be announced.

Keynote Speakers

To be announced soon!

Contact

For any inquiries, please contact Lia Draetta and Arianna Graciotti at lia.draetta@unito.it and a.graciotti@rug.nl.