Since 1987, the University of Buenos Aires has gathered professors and researchers from all over the world for the annual School of Computer Science event. This year, by building a recommendation system, Hexacta won one of the machine learning competitions.
Machine learning is part of our lives, even if we do not notice it. From spam filtering in Gmail to Siri or a simple Google search, machine learning is everywhere. Following this industry trend, the courses at the School of Computer Science (ECI, for its acronym in Spanish) shift further toward data science and machine learning topics every year.
At this year’s 30th edition, the shift toward data science can be seen in some of the course topics: Big Data Systems, Bayesian Models in Machine Learning, Machine Learning for analyzing Neuroimaging Data from Natural Stimulus Experiments, and Automatic Behavior Composition of Behaviors. Hexacta took part in this and won one of the contests!
To highlight this trend even more, ECI organized two machine learning competitions using Kaggle, the de facto platform for running data science competitions. The outcome was remarkable: the competitions attracted many researchers, companies, students, and hobbyists, who competed in 50 teams.
A team from Hexacta participated in one of these competitions, “Property recommendations in an online search system,” where the challenge was to build a recommendation engine for a real estate search website.
Navent provided two months of user behavior data together with information about each property on the website. Based on these data, participants had to recommend a list of properties most likely to be contacted by each user. To define the winner, the recommendations were compared with the properties that the users actually contacted the next month.
In this era of personalization, recommendation engines are a subject of particular interest. They boost sales and customer engagement in all kinds of industries. For example, the consulting firm McKinsey reports in its article “How retailers can keep up with consumers” that 35% of Amazon purchases and 75% of what people watch on Netflix come from recommendations. It is not surprising that Netflix offered a million-dollar prize to the first team able to improve its recommendations by 10%.
Recommendation engines come in two main types: content-based filtering and collaborative filtering. Content-based filtering uses only information about the items: if you like a two-bedroom house, it will recommend other two-bedroom houses.
Collaborative filtering, for its part, focuses only on the interaction between users and items. When several users like the same two items, collaborative filtering identifies those items as similar, and when you like one of them, it recommends the other.
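The difference between the two approaches can be sketched on toy data. Everything here is an assumption made for illustration: the property attributes, the user names, and the two helper functions `content_based` and `collaborative` are hypothetical and are not taken from the competition data or from our solution.

```python
from collections import Counter

# Hypothetical toy data: property attributes and the properties each user liked.
properties = {
    "p1": {"bedrooms": 2, "type": "house"},
    "p2": {"bedrooms": 2, "type": "house"},
    "p3": {"bedrooms": 1, "type": "apartment"},
    "p4": {"bedrooms": 3, "type": "house"},
}
likes = {
    "ana": {"p1", "p3"},
    "ben": {"p1", "p3"},
    "eva": {"p1"},
}


def content_based(liked, properties):
    """Recommend unseen properties whose attributes match a liked one exactly."""
    liked_attrs = [properties[p] for p in liked]
    return sorted(
        p for p, attrs in properties.items()
        if p not in liked and attrs in liked_attrs
    )


def collaborative(user, likes):
    """Recommend properties liked by users who share at least one like with `user`."""
    mine = likes[user]
    votes = Counter()
    for other, theirs in likes.items():
        if other != user and mine & theirs:
            votes.update(theirs - mine)  # one vote per overlapping user
    return [p for p, _ in votes.most_common()]


content_recs = content_based(likes["eva"], properties)
collab_recs = collaborative("eva", likes)
```

For the user `eva`, the content-based helper looks only at attributes and suggests the other two-bedroom house, while the collaborative helper ignores attributes and suggests the apartment that other fans of `p1` also liked.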
Our solution was a case of collaborative filtering. We focused only on the user’s behavior: what properties a user had visited or contacted. We ignored all the information about the properties such as the number of rooms, location, area, price, and so on.
Getting a bit more technical, the information can be represented as a bipartite network whose nodes are users and properties. An edge connects a user to a property when the user interacted with it.
This network can then be projected onto another network containing only properties, where the weighted links between properties represent the similarity between them.
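As a rough illustration of that projection, the sketch below uses the simplest possible weight: the number of users who interacted with both properties. The interaction data and the function name `project_onto_properties` are hypothetical, and this plain co-occurrence count is only a baseline, not the weighting we used in the competition.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical interactions: each user maps to the properties visited or contacted.
interactions = {
    "u1": ["p1", "p2"],
    "u2": ["p1", "p2", "p3"],
    "u3": ["p3", "p4"],
}


def project_onto_properties(interactions):
    """Return {(a, b): weight}, where weight counts users linked to both a and b."""
    weights = defaultdict(int)
    for props in interactions.values():
        # Every pair of properties seen by the same user gains one unit of weight.
        for a, b in combinations(sorted(set(props)), 2):
            weights[(a, b)] += 1
    return dict(weights)


property_links = project_onto_properties(interactions)
```

In this toy network, `p1` and `p2` end up with the strongest link because two users interacted with both, while `p1` and `p4` share no users and are not linked at all.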
The crucial part of collaborative filtering is how to define the weight of each edge. What worked best for us was the network-based inference method. We also used a neural network classifier to predict when a property would be removed from the website, since that information was not provided.
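Network-based inference is commonly described as a two-step resource spreading process on the bipartite network: each property a user interacted with splits one unit of resource among its users, and each user then splits what they received among their properties. The sketch below assumes that common formulation. The function name `nbi_scores` and the toy data are hypothetical, and the exact variant and tuning used in our submission are not shown here.

```python
from collections import defaultdict

# Hypothetical interactions: each user maps to the set of properties they touched.
interactions = {
    "u1": {"p1", "p2"},
    "u2": {"p1", "p2", "p3"},
    "u3": {"p3", "p4"},
}


def nbi_scores(interactions, target_user):
    """Score unseen properties for target_user by two-step resource spreading."""
    # Index the users linked to each property (its degree is the set size).
    prop_users = defaultdict(set)
    for user, props in interactions.items():
        for p in props:
            prop_users[p].add(user)

    # Step 1: each property the target user touched holds one unit of resource
    # and splits it equally among all users linked to that property.
    user_resource = defaultdict(float)
    for p in interactions[target_user]:
        for user in prop_users[p]:
            user_resource[user] += 1.0 / len(prop_users[p])

    # Step 2: each user splits the received resource equally among their properties.
    scores = defaultdict(float)
    for user, amount in user_resource.items():
        props = interactions[user]
        for p in props:
            scores[p] += amount / len(props)

    # Keep only properties the target user has not interacted with yet.
    seen = interactions[target_user]
    return {p: s for p, s in scores.items() if p not in seen}


scores_u1 = nbi_scores(interactions, "u1")
```

Compared with a plain co-occurrence count, this weighting gives less influence to very popular properties and very active users, because their resource is divided among more neighbors.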
Combining both models, we submitted a solution that reached first place on the leaderboard and won the competition.
Machine learning at Hexacta
This project is part of Hexacta’s effort to keep up with the growing data science ecosystem in order to provide better tools and solutions for our clients. We have a team of engineers working not only on recommendation systems, but also on tasks such as fraud detection, natural language processing, time series forecasting, sales price prediction, and many more!