
Guest lecture by Dr. Péter Király

We cordially invite you to a guest lecture entitled "Quality assessment, estimation of lost books and finding patterns of translations. Three bibliographic data science use cases", held as part of the ATRIUM project (Advancing fronTier Research In the arts and hUManities). The lecture will be delivered by Dr. Péter Király on 21 March 2025 at 13:30 in room C1.

Three ways of handling library data for different aspects of humanities studies will be presented.

First, quality assessment and exploratory data analysis, i.e. how to decide whether the quality of the data is good enough and how to get a glimpse of the nature of the data. Library catalogues differ from one another, and the standardized data structures ("metadata schemas") give libraries many possibilities for recording information about books, manuscripts or other kinds of library holdings. Moreover, these schemas are encoded, and for a researcher without proper training it is not easy to interpret the data in its native form.

Second, a historical statistical use case will be presented. We know of about 6000 publications printed in Hungary or in Hungarian before 1700. A new database allows us to explore the metadata of this corpus quantitatively. What does a quick look at the data reveal? And is it possible to estimate how many publications we know nothing about? What percentage of all printed publications have disappeared without a trace? What is the ratio of this bibliographical dark matter to the surviving body of printed material? We use mathematical formulas borrowed from biostatistics and already applied in archaeology to estimate this dark matter.

Finally, the last use case concerns patterns of translation flow that can be detected from bibliographical data. Library catalogues contain translated literary works. We have to extract information such as the title of the original work, the translated titles, both languages (and intermediary languages, if there are any), and similar information. Once we have these data, we can use data science techniques to trace trends, such as how the source or target languages have changed, how the popularity of a given author, group or genre has changed, which authors have similar patterns, which literatures are trendsetters and which are followers, and what forces could affect the trends. If we have enough data, we can compare distinct literary traditions and find common regional (e.g. Central European) patterns.

The presentation builds on open source tools and mostly on open access data, so the approaches could be replicated for other literatures; however, I mainly use Hungarian data. Examples from the European Literary Bibliography will also be presented.

Biography

Péter Király is a software developer and researcher at GWDG, the data, computing and IT research center for the Max Planck Society and the University of Göttingen. He has an MA in History (University of Miskolc, Hungary) and a PhD in Comparative Studies – General and Comparative Literature and Cultural Studies (University of Göttingen, Germany). His main research interests are the quality assessment of cultural heritage metadata and the analysis of these metadata as historical sources. He is an editor of Code4Lib Journal, co-chair of the LIBER Data Science in Libraries working group, a member of library and digital humanities related groups, and a maker and supporter of open source and open data projects. He collaborates with the British Library, Belgian National Library, Europeana, Deutsche Digitale Bibliothek, Victoria and Albert Museum, Harvard University, University of Helsinki, University of Mainz and other research and cultural heritage organizations worldwide.