
Arquivo.pt preserved online data from European H2020 projects

Arquivo.pt, a service managed by FCCN, the Scientific Computing Unit of FCT, recently preserved 197 million web files documenting research and development projects funded by the European Horizon 2020 program. This digital preservation safeguards approximately 17 terabytes of information and prevents it from being lost forever.

After identifying and preserving websites of research and development projects funded by the European Union under the FP4, FP5, FP6, and FP7 programs (1994 to 2013), Arquivo.pt has now saved valuable online information from the Horizon 2020 program (2014 to 2021) that was at risk of disappearing.

In recent years, research projects have increasingly used websites to document their activities. These websites provide relevant scientific information that complements the published literature, such as open data sets, presentations given at events, and software developed by the projects. Once the projects ended, this information was at risk of being irretrievably lost.

Identifying the research projects involved several methodologies and the use of the European Union's open data portal. However, this portal does not provide all the information, and many project records did not include a website. It was therefore necessary to use tools developed by Arquivo.pt to fill in the missing information. For example, the website of the Extended Model of Organic Semiconductors (EXTMOS) project, formerly available at extmos.eu, is no longer active, but its content remains fully accessible via Arquivo.pt.

Arquivo.pt provides further information about this work and continues to invite all users to suggest websites that could be preserved.

Arquivo.pt is a public service, free of charge and freely accessible to all web users. Every day, millions of pages are published on the web, but 80% of this information disappears one year after publication and becomes inaccessible. Arquivo.pt aims to counteract this trend and enable the search and retrieval of information from old websites.