Impresso’s mission – enabling exploration of historical newspapers and radio broadcasts across languages, media, and national borders – depends on a lot of invisible work. Much attention rightly goes to the semantic enrichments, the interfaces, the historical research they enable. But before any of that can happen, there is a more fundamental question to answer: how do you actually represent and manipulate the data at scale? Not how you enrich, analyse or display it, but how you hold it together in the first place, across hundreds of sources, in different formats, with varying digitization quality and refinement levels across decades of digitization campaigns.