Development of a python-based NISO-STS document crawler for the creation of NLP pipeline input data
Most of the information on requirements for construction practices and buildings is found in standards and regulations. However, most of these requirements are not available in a machine-readable format. Approaches such as the Initiative Smart Standards of the German Institute for Standardization (DIN) aim to address this issue by providing the data as NISO Standards Tag Suite (NISO-STS), an XML model for publishing and exchanging the full-text content and metadata of standards. DIN EN ISO 23386 is a civil engineering standard that proposes a methodology for describing building element properties as interconnected networks via property groups. The IDDO ontology implements the data model of ISO 23386. NISO-STS files are used to enrich its properties and property groups with textual information. This requires scraping the textual content of the NISO-STS XML documents and validating the extracted results. This paper presents an approach to extracting textual information from NISO-STS XML files in order to enrich the properties and property groups of the IDDO ontology. To illustrate the applicability of the Python-based crawler, the extracted textual information is further processed using different Natural Language Processing (NLP) algorithms.
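
The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' crawler: the sample document, the function names `extract_sections` and `element_text`, and the output record layout are assumptions made for illustration. The sketch assumes that sections are encoded as nested `sec` elements with `label`, `title` and `p` children and without an XML namespace, which is how NISO-STS inherits them from JATS. Real standards files are far richer (term entries, tables, formulas) and would need additional handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified NISO-STS-like fragment (illustration only).
SAMPLE = """<standard>
  <body>
    <sec id="sec_1">
      <label>1</label>
      <title>Scope</title>
      <p>This document describes <italic>properties</italic> of building elements.</p>
      <sec id="sec_1.1">
        <label>1.1</label>
        <title>General</title>
        <p>Properties are organised in groups.</p>
      </sec>
    </sec>
  </body>
</standard>"""


def element_text(element):
    """Return all text inside an element (including inline markup),
    with runs of whitespace collapsed; empty string if element is None."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def extract_sections(xml_text):
    """Return one record per <sec>, in document order.

    Only paragraphs that are direct children of a section are attached
    to it, so text of nested subsections is not duplicated in the parent.
    """
    root = ET.fromstring(xml_text)
    records = []
    for sec in root.iter("sec"):
        records.append({
            "id": sec.get("id"),
            "label": element_text(sec.find("label")),
            "title": element_text(sec.find("title")),
            "paragraphs": [element_text(p) for p in sec.findall("p")],
        })
    return records


if __name__ == "__main__":
    for record in extract_sections(SAMPLE):
        print(record["label"], record["title"], record["paragraphs"])
```

Records of this shape keep the section identifier next to the text, so that a later validation step or an NLP pipeline can trace each paragraph back to its place in the standard.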
Author: | Kilian Jonathan Kandt, Sven Zentgraf |
---|---|
URN: | urn:nbn:de:hbz:294-101291 |
DOI: | https://doi.org/10.13154/294-10129 |
Parent Title (English): | 34th Forum Bauinformatik / 34. Forum Bauinformatik (Bochum, 06. - 08.09.2023) |
Document Type: | Part of a Book |
Language: | English |
Date of Publication (online): | 2023/09/07 |
Date of first Publication: | 2023/09/07 |
Publishing Institution: | Ruhr-Universität Bochum, Universitätsbibliothek |
Tag: | BIM; IFC Algorithms; Design Automation; Energy Simulation |
First Page: | 259 |
Last Page: | 266 |
Institutes/Facilities: | Lehrstuhl für Informatik im Bauwesen |
Dewey Decimal Classification: | Technology, medicine, applied sciences / Civil engineering, environmental engineering |
open_access (DINI-Set): | open_access |
Faculties: | Fakultät für Bau- und Umweltingenieurwissenschaften |
Conference proceedings / edited volumes: | 34th Forum Bauinformatik / 34. Forum Bauinformatik (Bochum, 06. - 08.09.2023) |
Licence: | Creative Commons - CC BY 4.0 - Attribution 4.0 International |