A Comparison of House Price Classification with Structured and Unstructured Text Data

Erika Cardenas; Connor Shorten; Taghi M. Khoshgoftaar; Borivoje Furht

Conference Proceedings, Open Access

Proceedings of the International Florida Artificial Intelligence Research Society Conference, FLAIRS (2022) 35

DOI: 10.32473/flairs.v35i.130668

Abstract

Purchasing a home is one of the largest investments most people make. House price prediction allows individuals to be informed about their asset wealth. Transparent pricing on homes allows for a more efficient market and economy. We report the performance of machine learning models trained with structured tabular representations and unstructured text descriptions. We collected a dataset of 200 descriptions of houses which include meta-information, as well as text descriptions. We test logistic regression and multi-layer perceptron (MLP) classifiers on dividing these houses into binary buckets based on fixed price thresholds. We present an exploration into strategies to represent unstructured text descriptions of houses as inputs for machine learning models. This includes a comparison of term frequency-inverse document frequency (TF-IDF), bag-of-words (BoW), and zero-shot inference with large language models. We find the best predictive performance with TF-IDF representations of house descriptions. Readers will gain an understanding of how to use machine learning models optimized with structured and unstructured text data to predict house prices.
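The two count-based text representations named in the abstract, and the binary price labels, can be illustrated with a minimal sketch. The listings, the price cutoff, and all function names below are hypothetical and are not taken from the paper's dataset or code. The sketch uses the plain textbook TF-IDF formula (term frequency times the log of inverse document frequency); library implementations usually differ in smoothing and normalization.

```python
import math
from collections import Counter

# Hypothetical toy listings (description, price in USD); NOT the paper's data.
listings = [
    ("spacious pool home with ocean view", 950_000),
    ("cozy starter home near school", 210_000),
    ("luxury ocean view home with pool", 1_800_000),
    ("small home near highway", 150_000),
]
PRICE_THRESHOLD = 500_000  # assumed cutoff for the binary price bucket


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


docs = [tokenize(description) for description, _ in listings]
vocab = sorted({term for doc in docs for term in doc})

# Inverse document frequency: log(N / df). Every vocab term occurs in at
# least one document, so df is never zero.
idf = {
    term: math.log(len(docs) / sum(1 for doc in docs if term in doc))
    for term in vocab
}


def bow_vector(tokens):
    """Bag-of-words: raw count of each vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


def tfidf_vector(tokens):
    """TF-IDF: relative term frequency scaled by inverse document frequency."""
    counts = Counter(tokens)
    return [(counts[term] / len(tokens)) * idf[term] for term in vocab]


bow = [bow_vector(doc) for doc in docs]
tfidf = [tfidf_vector(doc) for doc in docs]

# Binary label: 1 if the price is at or above the threshold, else 0.
labels = [int(price >= PRICE_THRESHOLD) for _, price in listings]
```

In this toy example the word "home" appears in every listing, so its TF-IDF weight is zero while its bag-of-words count stays at one. That down-weighting of uninformative terms is the main practical difference between the two representations; either set of vectors, paired with `labels`, could then be passed to a classifier such as logistic regression.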

Cite

Cardenas, E., Shorten, C., Khoshgoftaar, T. M., & Furht, B. (2022). A Comparison of House Price Classification with Structured and Unstructured Text Data. In Proceedings of the International Florida Artificial Intelligence Research Society Conference, FLAIRS (Vol. 35). Florida Online Journals, University of Florida. https://doi.org/10.32473/flairs.v35i.130668
