Introduction to Property Data Automation


Finally, it’s time to break the silence! We’ve had all hands on deck developing new proptech features, which we are unveiling bit by bit, starting with property data automation. CubiCasa’s long-term vision is an “AI-powered computer vision solution to bring all indoor spaces into the cloud”, and each new release brings us (and you) closer to a new-generation property data platform. We have already converted and normalized over 100,000,000 sqft of indoor space data to provide the necessary input for automated valuation models (AVM) and comparative market analyses (CMA). This post gives a first look at CubiCasa property data automation: how we go from floor plans, point clouds, and neural networks to uniform datasets.


Background and proptech industry development


We’d say the industry is in a similar phase to where social media was 15 years ago. A number of early adopters noticed the new possibilities. They all realized the market is enormous; after all, human connection is one of the few things that never goes out of fashion. Some made it big, most didn’t find success, and some found their perfect niche and survived in one of the most competitive markets ever. Sooner or later, after lots of trial and error, the winning pattern will be found. We won’t dwell on first-mover advantage, because there might not be such a privilege. Instead, there will be those who improve existing ideas as far as possible, while competition itself lowers the costs.


The concept of real estate will remain forever. Everyone needs shelter, ownership is valuable, and, in words attributed to Mark Twain, “buy land, they’re not making it anymore.” At some point, the earliest innovators noticed the opportunity to bring digital standards into a business that is traditional by nature. Of course, the actual properties are a little tough to digitize: bits and bytes can’t provide a roof on a rainy day. Yet everything else around the properties can be digitized: the search process, marketing materials, paperwork, the archives of public housing departments, and later even the transaction itself with blockchain technology. CubiCasa started as a property marketing and visualization provider and will remain one, and it is also developing a new world of opportunities with a new kind of property data.


First, we must understand what a property is. We’ll take a “lightweight”, analytical approach: every property has its own characteristics, layout, spaces, and purpose. With floor plans, we focus on the geometry (walls, openings, windows, and doors) and the semantic data (types of rooms, fixed furniture, area size, dimensions, …) that can be extracted from the source blueprint.


The neural network and automation processes explained


Some time ago we started to explore neural networks and what they could unlock in the world of real estate and floor plans. Neural networks, in general, have proven to be good at image recognition and at classifying image content. What if we trained a neural net to “understand” floor plans? With a little technology, the conversion from a raster image to a BIM model could become a pretty simple process. With that idea in mind, our deep learning team accepted the challenge and began a wide and thorough research period.


We tested many kinds of architectures and configurations until we arrived at one that works for our use case. We also created a dataset, which we later used to experiment with and train the neural network. The first larger iteration round gave us the first visual results. These results show that predicting floor plan elements with convolutional neural networks is possible.



Images above show what the neural network ‘sees’ when a floor plan image is fed through it.


In the output pictures, green areas represent doors, blue areas walls, red areas windows, and gray areas fixed furniture. From the left image to the right, the number of training iterations grows (from thousands to hundreds of thousands), and the progress in recognition can be seen even within a single dataset.


Datasets are a combination of floor plan images and matching label images (also known as “ground truths”). Each pixel in the ground truth corresponds to the pixel at the same position in the floor plan. This way, the pixels representing an element in a floor plan are connected to a certain label, and the network can learn which pixel patterns belong to which label.
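The pairing of floor plan pixels with ground-truth labels can be sketched roughly as follows. This is a minimal illustration, not our production pipeline: the exact RGB values, the background class, and the function names are assumptions made for the example. Only the color-to-element mapping (green, blue, red, gray) comes from the description above.

```python
# Hypothetical label colours: the post names the colours, not exact RGB values.
LABEL_COLOURS = {
    (0, 0, 0): 0,        # background (assumed class)
    (0, 0, 255): 1,      # blue  = wall
    (0, 255, 0): 2,      # green = door
    (255, 0, 0): 3,      # red   = window
    (128, 128, 128): 4,  # gray  = fixed furniture
}
CLASS_NAMES = ["background", "wall", "door", "window", "fixed_furniture"]


def labels_from_ground_truth(ground_truth):
    """Map each RGB pixel of a ground-truth image to a class index.

    ground_truth is a list of rows, each row a list of (r, g, b) tuples.
    Unknown colours fall back to the background class.
    """
    return [[LABEL_COLOURS.get(pixel, 0) for pixel in row] for row in ground_truth]


def paired_samples(floor_plan, ground_truth):
    """Yield (pixel_value, class_index) pairs; both images must share a shape."""
    labels = labels_from_ground_truth(ground_truth)
    if len(floor_plan) != len(labels) or any(
        len(plan_row) != len(label_row)
        for plan_row, label_row in zip(floor_plan, labels)
    ):
        raise ValueError("floor plan and ground truth must have the same size")
    for plan_row, label_row in zip(floor_plan, labels):
        for pixel, label in zip(plan_row, label_row):
            yield pixel, label


# A 2x2 toy example: grayscale floor plan pixels and their colour-coded labels.
toy_plan = [[0, 255], [40, 200]]
toy_truth = [[(0, 0, 255), (0, 0, 0)], [(0, 255, 0), (128, 128, 128)]]
toy_pairs = list(paired_samples(toy_plan, toy_truth))
```

Each pair ties one floor plan pixel to the class of the pixel at the same position in the label image, which is the form a per-pixel classifier is trained on.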


With these datasets, convolutional neural networks can be trained to recognize and learn different characteristics from the images. The neural network consists of a series of layers, each with a certain number of nodes. These nodes take data as input, apply certain weights to it, and output the result, which is then passed to the next layer. The final layer produces the network’s prediction. The prediction and the ground truth are then compared, and based on the difference between the two images, the weights inside the hidden layers are adjusted. This is called backpropagation, and it gradually narrows the difference between the prediction and the actual ground truth.
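To make the compare-and-adjust loop concrete, here is a deliberately tiny sketch of backpropagation in plain Python. It is not a convolutional network and not our architecture: it is a one-input network with two hidden nodes, and the toy data, initial weights, and learning rate are all invented for illustration.

```python
import math

# Toy data: normalised pixel intensity -> 1 if the pixel is "wall", else 0.
# Dark pixels count as walls here; this rule is invented for illustration.
SAMPLES = [(0.0, 1), (0.1, 1), (0.9, 0), (1.0, 0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Return the hidden activations and the predicted wall probability."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    prob = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, prob


def mean_loss(params):
    """Mean binary cross-entropy between predictions and ground truth."""
    total = 0.0
    for x, y in SAMPLES:
        _, p = forward(params, x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(SAMPLES)


def train_step(params, lr=0.5):
    """One full-batch gradient-descent step using backpropagation."""
    w1, b1, w2, b2 = params
    n = len(SAMPLES)
    g_w1, g_b1, g_w2, g_b2 = [0.0] * len(w1), [0.0] * len(b1), [0.0] * len(w2), 0.0
    for x, y in SAMPLES:
        hidden, p = forward(params, x)
        d_out = (p - y) / n  # difference between prediction and ground truth
        g_b2 += d_out
        for j, h in enumerate(hidden):
            g_w2[j] += d_out * h
            d_hidden = d_out * w2[j] * (1.0 - h * h)  # error passed back one layer
            g_w1[j] += d_hidden * x
            g_b1[j] += d_hidden
    return (
        [w - lr * g for w, g in zip(w1, g_w1)],
        [b - lr * g for b, g in zip(b1, g_b1)],
        [w - lr * g for w, g in zip(w2, g_w2)],
        b2 - lr * g_b2,
    )


# Arbitrary starting weights: (hidden weights, hidden biases, output weights, output bias).
params = ([0.5, -0.3], [0.0, 0.0], [0.4, -0.6], 0.0)
initial_loss = mean_loss(params)
for _ in range(300):
    params = train_step(params)
final_loss = mean_loss(params)
```

Comparing `initial_loss` with `final_loss` shows the difference between prediction and ground truth shrinking as the weights are adjusted, which is the same principle a real network applies across millions of weights.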


To conclude this section: this is a first glance at how the neural network sees floor plans and what characteristics it actually learns. The actual optimization, testing the network with different configurations, and how the results change are a topic for another blog post.




In this example, we have a schematic 2D CubiCasa floor plan and the data interpretation of this property.


Then we have a snapshot of the underlying dataset:



This is not a full example of a one-property information model, but it is definitely a part of the final output extracted from the source floor plan. It is roughly divided into three sections: the building-related information; the location-related information, such as the universal property identifier (UPID); and the actual indoor space with all its various metadata.
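As a rough sketch of that three-part structure, the record could be modeled like this. The class names, field names, and values are hypothetical placeholders chosen for the example. They do not reproduce the actual CubiCasa schema.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Space:
    """One indoor space; nesting models e.g. apartment -> room (assumed layout)."""
    name: str
    space_type: str
    area: float = 0.0
    children: list = field(default_factory=list)


@dataclass
class PropertyRecord:
    """The three sections named in the text: building, location, indoor space."""
    building: dict
    location: dict
    indoor_space: Space


record = PropertyRecord(
    building={"building_type": "apartment_building"},  # placeholder value
    location={
        "postal_address": "<address from the order form>",
        "upid": "<universal property identifier>",
    },
    indoor_space=Space(
        "Apartment A",
        "apartment",
        children=[
            Space("Kitchen", "kitchen", area=10.0),
            Space("Living room", "living_room", area=20.0),
        ],
    ),
)

# asdict() turns the nested dataclasses into plain dicts, e.g. for JSON export.
record_dict = asdict(record)
```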


From our information models, which we generate from normal raster floor plans, we can extract a lot of useful information and metadata. The exchange format for the actual geometry is IFC (.ifc). The data may include all kinds of indoor space information, such as areas, window counts, and kitchen appliance types.


The location needs to be taken from an outside source. After all, the postal address is the best kind of unique ID for each property, dating back to a time before any kind of data analytics. We have solved this by asking for the postal address in the order form, which gives us the unique location ID and other relevant information.


The parameters marked with red dots can be used to filter more detailed indoor space information. In this example, there are filters for types of rooms, fixed furniture, element ID, or indoor space.


filter_names: returns the names of a specific filter in a space, e.g.
* room_names in an apartment space = names of all rooms in the apartment
* apartment_names in a level_space = names of the apartments on the level

filter_count: counts the occurrences of a specific filter in a space, e.g.
* room_count in an apartment space = count of rooms in the apartment
* window_count in a living_room space = count of windows in the living room
* baseCabinet_count in a kitchen space = count of base cabinets in the kitchen

filter_area: returns the area of a specific filter in a space, either as a list or as a total, e.g.
* room_area in an apartment space with list = a list of all room names and areas in the apartment
* window_area in a living_room space with a total = total area of the windows in the living room
* countertop_area in a kitchen space with a total = total area of the countertop in the kitchen
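The three filter families above could be sketched as plain functions over a nested dictionary. This is an illustration of the idea, not the actual API: the function signatures, the `items` layout, the `mode` argument, and the sample values are assumptions.

```python
def filter_names(space, kind):
    """Names of all items of one kind in a space, e.g. room_names."""
    return [item["name"] for item in space["items"].get(kind, [])]


def filter_count(space, kind):
    """Number of items of one kind in a space, e.g. window_count."""
    return len(space["items"].get(kind, []))


def filter_area(space, kind, mode="list"):
    """Areas of one kind of item: (name, area) pairs, or a single total."""
    items = space["items"].get(kind, [])
    if mode == "total":
        return sum(item["area"] for item in items)
    return [(item["name"], item["area"]) for item in items]


# Hypothetical sample spaces; areas are in arbitrary, made-up units.
apartment = {
    "type": "apartment",
    "items": {
        "room": [
            {"name": "Kitchen", "area": 10.0},
            {"name": "Living room", "area": 20.0},
        ]
    },
}
living_room = {
    "type": "living_room",
    "items": {
        "window": [
            {"name": "W1", "area": 1.5},
            {"name": "W2", "area": 2.0},
        ]
    },
}

room_names = filter_names(apartment, "room")
window_count = filter_count(living_room, "window")
window_area_total = filter_area(living_room, "window", mode="total")
room_area_list = filter_area(apartment, "room")
```

Keeping every filter keyed by the same `kind` string means that room, window, and cabinet queries all go through one code path, which is one simple way to keep outputs uniform across properties.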


This data serves part of one of the most sought-after needs in digital real estate: AVMs and CMAs. Using our method, the indoor space characteristics of all properties are normalized to make the comparison as accurate as possible. Naturally, this data says nothing about external market expectations, detailed area analytics, or similar outputs; instead, we provide a uniform method for comparing properties’ indoor characteristics with each other. This data complements AVMs and CMAs by adding new information for making better predictions and suggestions.


See more on property data automation and the free BETA


There are a number of different kinds of valuation models (some examples at the end of this blog). So far there hasn’t been a universal method since, as always, real estate is a hyperlocal business. Yet a pattern will emerge of what should be included in an outstanding AVM or CMA. Who will be the lucky winner?

We don’t have a comprehensive answer to that question (not for now, at least), but we can provide our best option. Go to the CubiCasa Property Data site and check out all the use cases that our solution enables!




Written by Tuomas Aarni, Sales Manager & Markus Häikiö, Project Manager.


AVM examples:

Author: Aarne Huttunen

Aarne is the Chief Product Officer at CubiCasa. His main priority is to ensure that CubiCasa's users love to use the CubiCasa App and related APIs. Most likely you'll spot him next to a coffee cup in Helsinki or meet him in a conference running a wild scanning demo!
