Development of a python-based NISO-STS document crawler for the creation of NLP pipeline input data

Most requirements for construction practices and buildings are laid down in standards and regulations. However, most of these requirements are not available in a machine-readable format. Some approaches, such as the Initiative Smart Standards of the German Institute for Standardization (DIN), aim to address this issue by presenting the data as NISO Standards Tag Suite (NISO-STS), an XML model for publishing and exchanging the full-text content and metadata of standards. DIN EN ISO 23386 is a civil engineering standard that proposes a methodology for describing building element properties as interconnected networks via property groups. The IDDO ontology implements the data model of ISO 23386. NISO-STS files are used to enrich the properties and property groups with textual information. To extract this data, the textual content within the NISO-STS XML documents needs to be scraped and the extracted results must be validated. This paper presents an approach to extracting textual information from NISO-STS XML files to enrich the IDDO ontology properties and property groups. To illustrate the applicability of the Python-based crawler, the extracted textual information is further processed using different Natural Language Processing (NLP) algorithms.
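The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' crawler. It assumes a heavily simplified, hypothetical NISO-STS fragment that uses only the JATS-derived `sec`, `title`, and `p` elements, and the helper names `extract_sections` and `validate` are invented for this illustration. It uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified NISO-STS fragment (an assumption,
# not taken from a real standard).
SAMPLE = """<standard>
  <body>
    <sec id="sec_1">
      <label>1</label>
      <title>Scope</title>
      <p>This document specifies <bold>properties</bold> of building elements.</p>
    </sec>
    <sec id="sec_2">
      <label>2</label>
      <title>Terms</title>
      <p>A property group collects related properties.</p>
      <p>   </p>
    </sec>
  </body>
</standard>"""


def extract_sections(xml_text):
    """Return one dict per <sec> with its id, title, and paragraph texts."""
    root = ET.fromstring(xml_text)
    sections = []
    for sec in root.iter("sec"):
        title = sec.find("title")
        paragraphs = []
        for p in sec.findall("p"):
            # itertext() flattens inline markup such as <bold> into plain text;
            # split/join normalizes runs of whitespace.
            text = " ".join("".join(p.itertext()).split())
            if text:  # skip paragraphs that contain only whitespace
                paragraphs.append(text)
        sections.append({
            "id": sec.get("id"),
            "title": "".join(title.itertext()).strip() if title is not None else "",
            "paragraphs": paragraphs,
        })
    return sections


def validate(sections):
    """Very basic check: return the ids of sections that yielded no text."""
    return [s["id"] for s in sections if not s["paragraphs"]]


if __name__ == "__main__":
    for section in extract_sections(SAMPLE):
        print(section["id"], section["title"], section["paragraphs"])
```

The resulting list of plain-text paragraphs per section is the kind of input an NLP pipeline could consume. Real NISO-STS documents contain many more element types and metadata, so a production crawler would need a richer traversal and stricter validation than this sketch.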

Metadata
Author:Kilian Jonathan Kandt, Sven Zentgraf
URN:urn:nbn:de:hbz:294-101291
DOI:https://doi.org/10.13154/294-10129
Parent Title (English):34th Forum Bauinformatik / 34. Forum Bauinformatik (Bochum, 06. - 08.09.2023)
Document Type:Part of a Book
Language:English
Date of Publication (online):2023/09/07
Date of first Publication:2023/09/07
Publishing Institution:Ruhr-Universität Bochum, Universitätsbibliothek
Tag:BIM; IFC; Algorithms; Design Automation; Energy Simulation
First Page:259
Last Page:266
Institutes/Facilities:Lehrstuhl für Informatik im Bauwesen
Dewey Decimal Classification:Technology, medicine, applied sciences / Civil engineering, environmental engineering
open_access (DINI-Set):open_access
faculties:Fakultät für Bau- und Umweltingenieurwissenschaften
Conference proceedings/Edited volumes:34th Forum Bauinformatik / 34. Forum Bauinformatik (Bochum, 06. - 08.09.2023)
Licence (German):Creative Commons - CC BY 4.0 - Attribution 4.0 International