
d4SU Server

Overview

The d4SU Server platform is focused on the management of Building and Infrastructure Data in the context of Spatial Units subject to Public Sector regulations. The current focus is on Building Data; services relevant to Infrastructure Data are not available at this moment.

By 'focused on data', it is also meant 'not providing' other functionalities such as integration into a broader landscape. That is the responsibility of an Integration Platform, which is not in scope of d4SU.

The diagram hereunder is built with Mermaid.js integrated with Material for MkDocs. At present, it is not quite legible in dark mode and a switch to standard mode (top menu bar) is recommended.


C4Container
    System_Ext(Integration, "Integration server", "A system that sends processing Request to the <br />Backend server and gets responses on behalf of a client system")

    System_Boundary(b1, "Backend Server") {
        System_Boundary(b2, "Server") {
            System(FastAPI, "FastAPI", "Processes requests<br /> Delegates heavy tasks to Celery")

            Component(Tools, "Tools", "IfcOpenShell<br />IfcJSON")

            Component(Logging, "Logging","")

            System(Celery, "Celery Worker", "Processes heavy tasks")
        } 
        System_Boundary(b3, "Broker") {
            SystemQueue(Broker, "Broker and <br />Result Store (Redis)", "Acts as a Message Broker and <br />a Result Store between FastAPI and Celery")
        }    
        System_Boundary(b4, "Database", "Database") {
            SystemDb(Database, "Postgres Database", "Stores the D4SU data") 
        }
        System_Boundary(b5, "File Storage", "") {
            SystemDb(FileStorage, "File Storage (Apache Arrow FS)", "Provides file storage services for IFC and IfcJSON files")   
        }

    }
    BiRel(Integration, FastAPI , "Request / Response or Notification")
    BiRel(FastAPI, Broker, "Queue Task (Chain), Read Result")
    Rel(Celery, Broker, "Fetch from Queue and Write Result")
    Rel(FastAPI, Tools, "uses")
    Rel(Celery, Tools, "uses")
    Rel(FastAPI, Database, "CRUD Data")
    Rel(Celery, Database, "CRUD Data") 
    Rel(Celery, FileStorage, "Read/Write")

    UpdateElementStyle(Integration, $fontColor="white", $bgColor="black", $borderColor="white")

    UpdateRelStyle(FastAPI, Integration, $offsetX="20", $offsetY="20")
    UpdateLayoutConfig($c4ShapeInRow="3", $c4BoundaryInRow="2")

The d4SU Platform has 4 main technology components:

  • FastAPI
  • Celery
  • Redis
  • PostgreSQL

The d4SU Platform also leverages:

  • IfcOpenShell and IfcJSON for working with IFC data
  • Apache Arrow FS for file storage
  • SQLModel (on top of SQLAlchemy) for database access

Rationale for tools

The d4SU platform is at the early stage of exploration, concept validation and specification. Speed and ease of implementation are key.

  • Why Python? Python is very easy to comprehend and has a huge ecosystem. Additionally, IfcOpenShell has a Python version that makes IFC much more accessible. With Jupyter, everything can be tried before moving the code to a server platform. And pandas is always welcome to make things simpler.
  • Why FastAPI? Incredibly simple and efficient.
  • Why Celery? Can't do without something to manage heavy tasks and avoid blocking the requester. Celery is a perfect fit, powerful and easy to use.
  • Why Redis? Again, powerful and easy to use. With Celery, there are multiple options for the Broker and Result Store; Redis caters for both (see the sketch after this list).
  • Why PostgreSQL? The question is rather: why another db? It has all that is needed. It also supports geometry and graph via extensions. For extremely large databases, offloading to HBase with Phoenix could be considered, but that is not needed at this stage.
  • Why SQLModel (on top of SQLAlchemy) to access PostgreSQL? Again, ease of use.
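
A minimal sketch of how Celery and Redis fit together (names and connection strings are illustrative assumptions, not the actual d4SU configuration): a single Redis instance serves as both the Celery message broker and the result store, using two logical databases.

from celery import Celery

# Redis database 0 as the message broker, database 1 as the result store;
# host, port and the task below are illustrative assumptions.
celery_app = Celery(
    "d4su",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)

@celery_app.task
def heavy_task(bundle_id: str) -> str:
    # Placeholder for a long-running job (e.g., IFC processing)
    return f"processed {bundle_id}"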

When working with IFC, there is always a need for a Viewer. I am using:

Why several viewers? With the differences in versions of the IFC files (2x3, 4.0, ...) and possible rendering issues in tools, I found it necessary to control results with several viewers. They each have distinctive capabilities.

For coding, I use Visual Studio Code and GitHub Copilot. GenAI for coding is a marvel. It does not always provide the solution to the problem but it wonderfully seeds the process that leads to the solution.

For the documentation, I use Material for MkDocs. It's a breeze!

Simple flow

sequenceDiagram
    autonumber
    Integration ->> FastAPI: RESTful request
    FastAPI ->> PostgreSQL: CRUD 
    FastAPI -->> Integration: response
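
As an illustration of this round trip, here is a minimal sketch of a FastAPI endpoint doing CRUD on PostgreSQL through SQLModel. The Bundle model, route and connection string are hypothetical, not the actual d4SU code.

from fastapi import FastAPI
from sqlmodel import Field, Session, SQLModel, create_engine

class Bundle(SQLModel, table=True):
    # Illustrative model; the real data model is outlined in the Data Model section
    id: int | None = Field(default=None, primary_key=True)
    name: str

engine = create_engine("postgresql+psycopg2://d4su:secret@localhost/d4su")
app = FastAPI()

@app.get("/bundles/{bundle_id}")
def get_bundle(bundle_id: int):
    # 1. RESTful request in, 2. CRUD against PostgreSQL, 3. response out
    with Session(engine) as session:
        return session.get(Bundle, bundle_id)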

Within FastAPI, the design is layered in a standard way, with all layers sharing the same models.

%%{init: {"flowchart": {"htmlLabels": true}} }%%
flowchart
    direction RL
    subgraph fastapi ["FastAPI"]
        direction TB
        web
        service
        data
        model
        web --> service
        service --> data
        data --> service
        service --> web
    end
    subgraph storage
        direction TB
        database
        filestorage
    end
    fastapi --> storage
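
A minimal sketch of that layering, collapsed into a single module for brevity (in the actual server each layer would live in its own package; all names are illustrative):

from fastapi import FastAPI
from pydantic import BaseModel

class Bundle(BaseModel):
    # Model shared by all layers
    id: int
    name: str

def read_bundle(bundle_id: int) -> Bundle:
    # Data layer: stands in for a real database or file storage read
    return Bundle(id=bundle_id, name="demo")

def get_bundle(bundle_id: int) -> Bundle:
    # Service layer: business logic between web and data
    return read_bundle(bundle_id)

app = FastAPI()

@app.get("/bundles/{bundle_id}")
def web_get_bundle(bundle_id: int) -> Bundle:
    # Web layer: delegates to the service layer
    return get_bundle(bundle_id)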

Flow with long running background tasks

For relatively 'long running' tasks, the caller will not hang waiting for completion; it will be notified asynchronously.

Mermaid.js layout is sometimes a mystery!

%%{init: {"flowchart": {"htmlLabels": true}} }%%
flowchart
    direction TB
    integration
    subgraph d4su-server ["d4su Platform"]
        direction TB
        subgraph front-server
            direction TB
            fastapi[FastAPI] 
        end
        subgraph task-server
            celery["Celery Worker(s) for background tasks"]
        end
        subgraph broker ["Broker and Result Store"]
            direction TB
            redis[Redis]
        end
        subgraph Storage
            direction TB
            db[PostgreSQL]
            fs["File Storage<br />with Apache Arrow FS (local fs, S3, ...)"]
        end
    end   
    integration -- "*1.* request" --> fastapi
    fastapi -- "*2.* rsponse with follow-up id" --> integration
    fastapi -- "*3.* instructions" --> redis
    celery -- "*4.* get instructions" --> redis
    celery -- "*5.* get and write files" --> fs
    celery -- "*6.* CRUD Model" --> db
    celery -- "*7.* write task results" --> redis
    celery -- "*8.* notify result" --> fastapi
    fastapi -- "*9.* notify result for follow-up id" --> integration

sequenceDiagram
    autonumber
    Integration ->> FastAPI: request
    FastAPI -->> Integration: response with follow-up id
    FastAPI ->> Redis: instructions
    Celery Worker ->> Redis: get instructions
    Celery Worker ->> File Storage: get and write files
    Celery Worker ->> PostgreSQL : CRUD Model
    Celery Worker ->> Redis: write task results
    Celery Worker -->> FastAPI: notify task completion
    FastAPI -->> Integration: notify result for follow-up id 

This illustrative flow is as follows:

  1. A request is made by the Integration platform and received by the web layer in FastAPI. The request can come as a RESTful request or as a WebSocket message.
  2. A response is provided by FastAPI with an id for the follow-up.
  3. The request is serviced by the service layer in FastAPI. There, the request is associated with a single task, with a group of tasks that can be executed in parallel, or with a chain of tasks that must be performed in sequence. The execution is triggered by sending a message to the broker (Redis); a sketch follows this list.
  4. Celery has started a number of workers (by default 2 workers per core). A Celery Worker takes the task from the broker and executes it.
  5. The task execution may, for example, get an IFC file, transform the file, and filter the content.
  6. The task execution may, e.g., store the filtered content in Postgres, ... One big task can be split into independent tasks that are executed in parallel or in sequence as a chain.
  7. Each task writes its result (such as status and parameters to be transferred to the next task) in the Result Store, which here is also Redis.
  8. At the end, the completion (of the single task, the group, or the chain) is notified back to FastAPI. This can be done by a POST back to FastAPI. Alternatively, FastAPI could poll the broker every second to get the status of the task.
  9. FastAPI notifies the Integration Platform of the result of the request. This can be done via WebSockets or via a 'callback' POST.
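
A minimal sketch of steps 2 to 4, with the polling variant of steps 8 and 9 (endpoints, task bodies and connection strings are illustrative assumptions, not the actual d4SU code):

from celery import Celery, chain
from fastapi import FastAPI

celery_app = Celery(
    "d4su",
    broker="redis://localhost:6379/0",   # message broker (step 3)
    backend="redis://localhost:6379/1",  # result store (step 7)
)

@celery_app.task
def import_ifc(bundle_id: str) -> str:
    return f"/files/{bundle_id}.ifc"     # stands in for step 5

@celery_app.task
def store_content(path: str) -> str:
    return "stored"                      # stands in for step 6

app = FastAPI()

@app.post("/bundles/{bundle_id}/import")
def start_import(bundle_id: str):
    # Steps 3-4: queue a chain of tasks; a Celery worker picks it up
    result = chain(import_ifc.s(bundle_id), store_content.s()).apply_async()
    # Step 2: respond immediately with a follow-up id
    return {"follow_up_id": result.id}

@app.get("/follow-up/{task_id}")
def follow_up(task_id: str):
    # Polling variant of steps 8-9
    return {"state": celery_app.AsyncResult(task_id).state}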

Deployment view

architecture-beta
    group d4su(cloud)[d4SU]
    service server1(server)[FastAPI] in d4su
    service server2(server)[Redis] in d4su
    service server3(server)[Celery] in d4su
    service db(database)[PostgreSQL] in d4su
    service disk(disk)[FileStore] in d4su
    server2:L -- R:server1
    server3:T -- B:server2
    db:R -- L:server3
    db:T -- B:server1
    disk:L -- R:server3

The proposed platform relies on PostgreSQL for the storage of data, with a specific data model that will be outlined in the Data Model section. The data model is designed to accommodate IFC data but also other formats.

There are alternative solutions or approaches for storing data that can, at some point in time, complement the proposed approach. The solutions hereunder have IFC as their main or exclusive focus, with the exception of Speckle, which provides a complete integration platform for the AEC industry.

ifc2sql - in IfcOpenShell / IfcPatch / Recipes

ifc2sql converts an IFC-SPF model to SQLite or MySQL. It is part of IfcOpenShell IfcPatch as a 'recipe'. The code is available on GitHub. ifc2sql creates a table for each IFC entity.

Development language: Python
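
A hypothetical invocation through IfcPatch's Python API; ifcpatch.execute and its argument dictionary follow the IfcPatch conventions, but the recipe arguments shown here are an assumption to be checked against the recipe's documentation:

import ifcopenshell
import ifcpatch

model = ifcopenshell.open("model.ifc")
output = ifcpatch.execute({
    "input": "model.ifc",
    "file": model,
    "recipe": "Ifc2Sql",
    "arguments": ["sqlite"],  # assumed: the target database type
})
ifcpatch.write(output, "model.db")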

IfcSQL

ifcSQL is a database schema for storing IFC-based models, including the IFC data model schema of buildingSMART International. ifcSQL requires Microsoft SQL Server. It creates one table for entities and a distinct table for each base type of attribute.

Development language: C#

BIMserver

BIMserver enables you to store and manage the information of a construction (or other building-related) project. Data is stored in the open data standard IFC. The BIMserver is not a file server; it uses a model-driven architecture approach. This means that IFC data is stored as objects. You could see BIMserver as an IFC database, with special extra features like model checking, versioning, project structures, merging, etc. It uses Berkeley DB, which is an effective key-value store.

Development language: Java

IFC-Graph

IFC-Graph is based on the work described in the paper IFC-graph for facilitating building information access and query and enables converting an IFC model to a Labeled Property Graph (LPG) stored in the graph database Neo4j. It builds the graph with nodes for both entities and relationships (referred to respectively as 'full entities' and 'bridging entities'). Expectedly, processing time increases dramatically with the number of nodes.

Development language: Python
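
A minimal sketch of the idea, not the IFC-Graph code: one node per entity, one 'bridging' node per IfcRel* relationship entity, shown here for spatial containment only (connection details are placeholders):

import ifcopenshell
from neo4j import GraphDatabase

model = ifcopenshell.open("model.ifc")
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

with driver.session() as session:
    # Full entities: one node per product (walls, storeys, spaces, ...)
    for e in model.by_type("IfcProduct"):
        session.run(
            "MERGE (n:Entity {gid: $gid}) SET n.class = $cls, n.name = $name",
            gid=e.GlobalId, cls=e.is_a(), name=e.Name,
        )
    # Bridging entities: one node per relationship, linked to both sides
    for rel in model.by_type("IfcRelContainedInSpatialStructure"):
        session.run("MERGE (r:Bridge {gid: $gid}) SET r.class = $cls",
                    gid=rel.GlobalId, cls=rel.is_a())
        session.run("MATCH (r:Bridge {gid: $r}), (s:Entity {gid: $s}) "
                    "MERGE (r)-[:RELATING]->(s)",
                    r=rel.GlobalId, s=rel.RelatingStructure.GlobalId)
        for elem in rel.RelatedElements:
            session.run("MATCH (r:Bridge {gid: $r}), (e:Entity {gid: $e}) "
                        "MERGE (r)-[:RELATED]->(e)",
                        r=rel.GlobalId, e=elem.GlobalId)
driver.close()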

Speckle

Speckle Server positions itself as the data infrastructure for the AEC industry. Speckle is an open-source platform designed for the architecture, engineering, and construction (AEC) industries. It facilitates the exchange of data between various software applications and stakeholders involved in building design and construction projects. Speckle uses PostgreSQL as its primary database for storing and managing data. Speckle provides a Python SDK, specklepy.

Development languages: C#, TypeScript, Python
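
A minimal sketch of connecting with specklepy; the host and token are placeholders and the client calls follow the specklepy docs, but the exact API may differ across versions:

from specklepy.api.client import SpeckleClient

# Placeholder host and token; verify against the installed specklepy version
client = SpeckleClient(host="speckle.xyz")
client.authenticate_with_token("your-speckle-token")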