More details, including all training code and model weights, will be released upon acceptance of the paper. Thank you for your interest!
Abstract
The development of scalable models for automated Building Information Modeling (BIM) element classification is hindered by a reliance on supervised learning, which requires expensive and laborious manual data annotation. This paper introduces BIM-JEPA, a foundation model that uses a Joint Embedding Predictive Architecture (JEPA) for self-supervised pre-training on unlabeled 3D point cloud representations of individual BIM elements. By predicting the latent representations of masked regions of an element's geometry, BIM-JEPA learns semantically rich features that achieve competitive accuracy on a downstream classification task, outperforming existing supervised methods without requiring heavy data augmentation and excelling in data-scarce scenarios. This work mitigates the data annotation bottleneck and establishes a path toward a foundation model for BIM geometry, enabling more scalable, data-efficient, and generalizable representation learning in the Architecture, Engineering, and Construction (AEC) domain.
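To make the pre-training objective concrete, the following is a minimal sketch of a JEPA-style training step on point clouds, written in PyTorch. Everything here is an illustrative assumption rather than the released BIM-JEPA implementation: `PointEncoder` is a toy per-point MLP standing in for a real point cloud backbone, the random per-point mask stands in for whatever masking strategy the model actually uses, and the embedding size, mask ratio, and EMA rate are placeholder values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointEncoder(nn.Module):
    """Toy per-point MLP encoder; a stand-in for a real point cloud backbone."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        return self.mlp(pts)  # (B, N, 3) -> (B, N, dim) per-point latents

context_enc = PointEncoder()
target_enc = PointEncoder()
target_enc.load_state_dict(context_enc.state_dict())
for p in target_enc.parameters():  # the target encoder is not trained by backprop
    p.requires_grad_(False)

predictor = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))
opt = torch.optim.AdamW(
    list(context_enc.parameters()) + list(predictor.parameters()), lr=1e-4
)

def jepa_step(pts: torch.Tensor, mask_ratio: float = 0.6, ema: float = 0.996) -> float:
    B, N, _ = pts.shape
    # Random per-point mask (True = masked); a real model would likely mask
    # contiguous spatial regions of the element instead.
    mask = torch.rand(B, N) < mask_ratio

    with torch.no_grad():
        targets = target_enc(pts)  # latents of the full, unmasked element

    # Zero out masked points so the context encoder gets no information from
    # them (a simplification; real models typically drop masked tokens).
    visible = pts * (~mask).unsqueeze(-1).float()
    preds = predictor(context_enc(visible))

    # JEPA loss: predict the *latent* targets at masked locations,
    # not the raw point coordinates.
    loss = F.smooth_l1_loss(preds[mask], targets[mask])
    opt.zero_grad()
    loss.backward()
    opt.step()

    # The target encoder slowly tracks the context encoder via an
    # exponential moving average, as is standard in JEPA-style training.
    with torch.no_grad():
        for pt, pc in zip(target_enc.parameters(), context_enc.parameters()):
            pt.mul_(ema).add_(pc, alpha=1.0 - ema)
    return loss.item()

# Example: one pre-training step on a random batch of 4 elements, 1024 points each.
print(jepa_step(torch.randn(4, 1024, 3)))
```

The key design choice the sketch preserves is that the loss is computed in latent space: the predictor regresses the target encoder's embeddings of the masked geometry rather than reconstructing raw point coordinates, which is what pushes the learned features toward semantics instead of low-level detail.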