Case Study: Sales Forecasting with Geo AI

Jan 20 2023 Published by under Blog

Geo AI to support retail location planning

When expanding a business, especially when opening a new store in a new region, the biggest questions are whether the store can be expected to perform well and how much revenue it could generate. Revenue prediction is therefore central to site selection and to the decisions that make an expansion successful, and several approaches aim to deliver accurate forecasts.

Two machine learning algorithms commonly used to predict revenue are Random Forest and XGBoost. In this revenue prediction case study, we compare the results of these two popular models with those of Targomo’s Geo AI model. Geo AI is a location-based model that aims to better understand and evaluate the characteristics of a shop’s location.

The basis for this case study is the publicly available liquor sales data for the US state of Iowa. For the prediction, we selected the brand Hy-Vee and treated the liquor sales of other brands as competitor data.

Who is Hy-Vee? Hy-Vee is an employee-owned supermarket chain located mostly in the Midwestern and Southern US, with its headquarters in West Des Moines, Iowa. Hy-Vee has over 285 locations in 8 US states, including Iowa, Illinois and Missouri. Like all liquor sales data in Iowa, Hy-Vee’s data is publicly available.

Three Methods compared: Random Forest, XGBoost, and Geo AI

The three models Random Forest, XGBoost and Targomo’s Geo AI were all used for the revenue prediction. Random Forest and XGBoost are often applied to prediction tasks because they handle high-dimensional data efficiently. Both are decision-tree-based models and easy to work with: there is no need to normalize the data, and Python/R packages are readily available. They can produce relatively robust models even with outliers and missing values. The drawbacks, however, are also obvious: potential overfitting and low interpretability.

Geo AI is a location-based method. The foundations of the model were developed in a research project conducted by Targomo in collaboration with the Hasso-Plattner-Institut and funded by the Deutsches Zentrum für Luft- und Raumfahrt.

Geo AI, gravitational model and attraction strength

One major component of Geo AI is a gravitational model, which applies the idea of Newton’s law of gravitation to markets. According to Newton’s law, large masses and short distances lead to strong attraction; in a market model, the “mass” is a store’s attractiveness and the distance is how far customers have to travel. Attractiveness is measured by the store’s characteristics and its environment: store characteristics such as store size, parking space or product range, and location factors such as complementary shops, competitors or other points of interest nearby.

In our model, these factors are called attraction strength. In Geo AI, distance is measured as travel time, and catchment areas are defined by travel time and travel mode, including car, bike, public transit and walking. A travel-time-based catchment area gives a more accurate and realistic assessment of the population than a naïve distance-based catchment.
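A travel-time catchment (an isochrone) can be computed as a shortest-path search on a road or transit network that stops once travel time exceeds a limit. Targomo’s routing engine is not described here, so the following is only a minimal sketch using Dijkstra’s algorithm on a small hypothetical graph whose edge weights are travel times in minutes for a single travel mode; a separate graph per mode (car, bike, transit, walking) would be assumed.

```python
import heapq

def catchment(graph, origin, max_minutes):
    """Return {node: travel time in minutes} for every node reachable within max_minutes.

    graph: hypothetical adjacency dict {node: [(neighbour, minutes), ...]} for one travel mode.
    """
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        time, node = heapq.heappop(queue)
        if time > best.get(node, float("inf")):
            continue  # stale queue entry: a faster route to this node was already found
        for neighbour, minutes in graph.get(node, []):
            t = time + minutes
            # Only keep nodes inside the time limit that improve on known travel times
            if t <= max_minutes and t < best.get(neighbour, float("inf")):
                best[neighbour] = t
                heapq.heappush(queue, (t, neighbour))
    return best

# Hypothetical driving network around a store; weights are minutes
roads = {
    "store": [("A", 4), ("B", 7)],
    "A": [("C", 5), ("B", 2)],
    "B": [("D", 9)],
    "C": [("D", 3)],
}
print(catchment(roads, "store", 10))  # → {'store': 0.0, 'A': 4.0, 'B': 6.0, 'C': 9.0}
```

Note that B is reached faster via A (6 minutes) than directly (7 minutes), and D falls outside the 10-minute catchment although it is only two road segments away; a straight-line radius would not capture either effect.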

CASE STUDY ‘Geo AI for retail sales forecasting’ – Summary

 

Step 1: Data collection

To begin, Targomo’s data team compiled data from various sources to calculate the general demand and the attractiveness of stores:

  • Publicly available data on liquor purchases: The data covers shops with Iowa Class “E” licences, such as grocery shops, liquor shops and convenience stores. The dataset records store-level liquor purchases from 2012 to the present and includes the store name, store id, store address and coordinates, order date, the item ordered, the item category, and the sales in dollars. The sales dollar value is the number of bottles multiplied by the state retail price, and it is the target variable predicted in this case study.
  • Sociodemographic data: Age group, income, occupation, family structure, methods of transportation to work, and more.
  • Mobility data: Foot traffic data indicating how many people are at a location over a given period of time.
  • Store attractiveness data: If customers perceive the stores within a catchment area as differing in attractiveness, this affects which store they choose. To account for attractiveness in the prediction model, we included direct customer feedback in the form of Google reviews, using both the average Google rating and the number of ratings. In addition, we considered the opening hours, expressed as the total number of opening hours per year.
  • Competitor data: As the publicly available data covers all liquor sales, stores other than Hy-Vee were treated as its competitors.
  • Which data was missing? Store size and sales area size usually have a significant influence on a store’s revenue and should always be part of the prediction. In this case, the data was not available and could not be taken into account.
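The target variable and the opening-hours feature described above are simple arithmetic on the raw records. The sketch below uses hypothetical field names (`bottles_sold`, `state_bottle_retail`) and invented values; the public dataset’s actual column names and Targomo’s exact feature definitions may differ. Annual opening hours are approximated here as weekly hours × 52, ignoring holidays, which is an assumption.

```python
from collections import defaultdict

def sales_dollars(orders):
    """Total sales per store: number of bottles x state retail price per bottle."""
    totals = defaultdict(float)
    for o in orders:
        totals[o["store"]] += o["bottles_sold"] * o["state_bottle_retail"]
    return dict(totals)

def annual_opening_hours(weekly_schedule, weeks=52):
    """Approximate opening hours per year from {weekday: (open_hour, close_hour)}; ignores holidays."""
    weekly = sum(close - open_ for open_, close in weekly_schedule.values())
    return weekly * weeks

# Invented order lines with hypothetical field names
orders = [
    {"store": "store_a", "bottles_sold": 12, "state_bottle_retail": 15.50},
    {"store": "store_a", "bottles_sold": 6, "state_bottle_retail": 22.00},
    {"store": "store_b", "bottles_sold": 24, "state_bottle_retail": 9.75},
]
print(sales_dollars(orders))  # → {'store_a': 318.0, 'store_b': 234.0}

every_day_8_to_22 = {day: (8, 22) for day in ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")}
print(annual_opening_hours(every_day_8_to_22))  # → 5096
```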

 

Step 2: Data cleaning – The foundation to build a meaningful model

Data cleaning is one of the first and most important steps in modelling, as clean data is the foundation of a correct model and its output. Data cleaning involves detecting inaccurate, incomplete and duplicated records, and then fixing or removing them.

  • Detecting inaccurate, incomplete and duplicated records: Many records in the data lacked addresses and coordinates (Figure a, highlighted in orange boxes), so the data science team filled in the missing data from known entries or from geocoding results. In addition, some stores appeared twice with different store ids (Figure b, Store.Number 3648 and 5875). To identify these duplicates, Targomo’s data science team calculated a distance matrix between all shops. Shops less than 50 m apart were grouped into a cluster, and all shops in each cluster were then checked manually in Google to determine whether they were the same shop. If they were, their ‘Store.Number’ values were set to the same number.
  • Identifying outliers and handling missing values: After the inaccurate records were corrected, the monthly sales dollars for 2019 were summed up for each store, and the numbers of unique liquor categories and liquor items bought were counted. If a store did not report sales data for every month, it was removed from the training data set. It was still used in the analysis as a competitor, however: a store that potentially attracts customers but whose sales are not known.
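The duplicate check described above, a pairwise distance matrix with a 50 m threshold followed by manual review, could look roughly like this. It is a minimal sketch, not Targomo’s code: it computes great-circle (haversine) distances between all pairs of shops and, as an implementation choice of this sketch, uses a union-find structure so that chains of nearby shops end up in one candidate cluster. Store ids and coordinates are invented.

```python
import math
from itertools import combinations

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres, assuming a spherical Earth (radius 6,371 km)."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def candidate_clusters(shops, threshold_m=50):
    """Group shops closer than threshold_m; return clusters with >1 member for manual review."""
    parent = {sid: sid for sid in shops}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in combinations(shops, 2):
        if haversine_m(*shops[a], *shops[b]) < threshold_m:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for sid in shops:
        clusters.setdefault(find(sid), []).append(sid)
    return [sorted(c) for c in clusters.values() if len(c) > 1]

shops = {  # hypothetical store ids -> (lat, lon)
    101: (41.5868, -93.6250),
    102: (41.5869, -93.6251),  # about 14 m away: possibly the same shop
    103: (41.6000, -93.6000),  # a few km away
}
print(candidate_clusters(shops))  # → [[101, 102]]
```

The output only lists candidates; as in the case study, deciding whether 101 and 102 are really the same shop still requires a manual check.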

 

Furthermore, stores with extremely high or low annual sales were carefully reviewed, and some turned out to be beverage and liquor retailers that mainly serve other stores. Since these stores cater to a different customer group, these outliers were removed from the data set. Casinos, hotels, inns and distilleries were also removed.
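The rule-based part of this filtering can be sketched as follows: aggregate 2019 sales per store and month, keep stores that reported in all 12 months as training data, move the rest to the competitor set, and drop excluded store types. The record layout and `store_type` labels are hypothetical, and the review of extreme sales values, which the case study describes as a careful manual check, is not automated here.

```python
from collections import defaultdict

EXCLUDED_TYPES = {"casino", "hotel", "inn", "distillery"}  # hypothetical labels

def split_training_and_competitors(records, year=2019):
    """records: dicts with store, year, month, sale_dollars, store_type (hypothetical layout).

    Returns (training, competitors): training maps store -> annual sales for stores that
    reported every month; competitors lists stores with incomplete reporting.
    """
    monthly = defaultdict(lambda: defaultdict(float))
    store_type = {}
    for r in records:
        if r["year"] != year:
            continue
        monthly[r["store"]][r["month"]] += r["sale_dollars"]
        store_type[r["store"]] = r["store_type"]

    training, competitors = {}, []
    for store, by_month in monthly.items():
        if store_type[store] in EXCLUDED_TYPES:
            continue  # removed from the data set entirely
        if len(by_month) == 12:
            training[store] = sum(by_month.values())
        else:
            competitors.append(store)  # attracts customers, but its sales are unknown
    return training, sorted(competitors)

recs = [{"store": "a", "year": 2019, "month": m, "sale_dollars": 100.0, "store_type": "grocery"} for m in range(1, 13)]
recs += [{"store": "b", "year": 2019, "month": m, "sale_dollars": 50.0, "store_type": "convenience"} for m in range(1, 7)]
recs += [{"store": "c", "year": 2019, "month": m, "sale_dollars": 9e5, "store_type": "casino"} for m in range(1, 13)]
print(split_training_and_competitors(recs))  # → ({'a': 1200.0}, ['b'])
```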

Examples of incomplete and duplicated records in the data

 

Step 3: Train the Predictive Model with Data

Over many iterations, the Geo AI engine is fed with data and continuously learns which combination of data best represents Hy-Vee’s store performance.

The training consists of two main components:

  • Key Driver Analysis: We detect the attraction strength factors, the location variables that make a location attractive for the target group and draw customers to the site. The model is built in a continuous process of testing and learning.
  • Validating & Fine Tuning: Over many iterations, we validate the model against additional store data and fine-tune the prediction model.

 

Training error vs. Testing error

Two values measured during the learning process indicate the prediction quality: the training error and the testing error.

  • The training error measures how accurately a model predicts the same data it was trained on. This helps us further train and tweak the model in the direction we want.
  • The testing error is measured on predictions for one or more data sets the model has not seen before. It shows how well the model performs outside the training environment and helps to detect and prevent issues such as overfitting.
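To make the two errors concrete, the sketch below fits a deliberately simple model (an ordinary least-squares line) on a few training stores and reports the error on those same stores and on held-out stores. The case study does not state which error metric it uses; mean absolute percentage error (MAPE) is assumed here, and the data are invented.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent (assumed metric; actual values must be non-zero)."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Invented data: footfall (thousands) vs. annual sales (thousand $)
train_x, train_y = [1, 2, 3, 4], [10, 21, 29, 40]
test_x, test_y = [5, 6], [48, 63]  # stores the model never saw during fitting

a, b = fit_line(train_x, train_y)
train_err = mape(train_y, [a + b * x for x in train_x])
test_err = mape(test_y, [a + b * x for x in test_x])
print(f"training MAPE {train_err:.1f}%, testing MAPE {test_err:.1f}%")  # → training MAPE 2.8%, testing MAPE 4.5%
```

As expected, the error on unseen stores is higher than on the training stores; it is this second number that says how the model would do at a new address.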

 

When it comes to prediction results, the testing error is the more meaningful of the two, because it indicates how well the model will perform on new, unseen data, such as the address of a potential new shop location. A low training error is a precondition for a successful project, but not a sufficient indicator on its own.

In our revenue prediction projects we usually go through two phases during the model training:

  1. In the first phase, we want to bring the training error down to prove that the model is expressive (complex) enough for the specific use case and that we have all the required input. A low training error is a precondition for a successful project, but not sufficient on its own.
  2. In the second phase, we want to generalise the model so that it works for new, unknown locations. To do so, we bring the testing error down, often by reducing the model complexity again.
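The two phases can be illustrated with a model whose complexity is controlled by a single setting. The sketch below is not Geo AI: it uses nearest-neighbour regression on one invented feature, where `n_neighbors = 1` memorises the training data (zero training error) and larger values give a smoother, simpler model. A 4-fold cross-validation estimates the testing error for each setting. MAPE is again assumed as the error metric.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent (assumed metric)."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def knn_predict(train_x, train_y, x, n_neighbors):
    """Average target of the n_neighbors training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return sum(train_y[i] for i in nearest) / n_neighbors

def training_error(xs, ys, n_neighbors):
    """Error on the same points the model was fitted on."""
    return mape(ys, [knn_predict(xs, ys, x, n_neighbors) for x in xs])

def cv_error(xs, ys, n_neighbors, folds=4):
    """Average held-out error over interleaved folds (k-fold cross-validation)."""
    errors = []
    for f in range(folds):
        test = [i for i in range(len(xs)) if i % folds == f]
        train = [i for i in range(len(xs)) if i % folds != f]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        preds = [knn_predict(tx, ty, xs[i], n_neighbors) for i in test]
        errors.append(mape([ys[i] for i in test], preds))
    return sum(errors) / folds

# Invented data: one location feature vs. sales
xs = list(range(1, 13))
ys = [12, 19, 33, 38, 52, 57, 71, 79, 92, 98, 112, 118]
for n in (1, 2, 3, 5):
    print(f"n_neighbors={n}: training {training_error(xs, ys, n):.1f}%, testing {cv_error(xs, ys, n):.1f}%")
```

Phase 1 corresponds to confirming that the training error can be driven down (`n_neighbors = 1` reaches zero); phase 2 picks the setting with the lowest cross-validated testing error, which is generally not the one with the lowest training error.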

Results: These are the location variables driving liquor sales

The table compares the model accuracy and the detected success drivers of three liquor sales prediction models for Hy-Vee stores. The training error of the Geo AI model is 16% and its testing error is 19.8%, while the testing errors of the Random Forest and XGBoost models are 38.1% and 30%, respectively. Of the three models, Geo AI therefore has the lowest testing error and yields the highest accuracy on unseen data.

Which ‘success drivers’ were identified? The table compares three Hy-Vee liquor sales prediction models

The attributes identified by the Random Forest and XGBoost models are rather limited: the number of items, the number of visitors, and one or two socio-demographic attributes.

In contrast, the Geo AI model uses a wide range of features, including store features (item count, shop type), the surrounding environment (footfall at the location), and socio-demographic characteristics of potential customers. Item count, which indicates the variety of liquors offered in the store, shop type, and shopping footfall in the immediate neighbourhood are used to measure store attractiveness in the gravitational model.
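To illustrate how such features could feed a gravitational model, here is a minimal sketch of the classic Huff model, in which the share of zone i’s demand that goes to store j is P_ij = A_j · t_ij^(-β) / Σ_k A_k · t_ik^(-β), with A_j the store’s attraction and t_ij the travel time from zone i. The case study does not publish Geo AI’s functional form: the product-of-powers attraction, the shop-type factors, the weights and β = 2 below are all hypothetical, as are the store and zone data.

```python
# Hypothetical multipliers per shop type (not Geo AI's values)
TYPE_FACTOR = {"supermarket": 1.0, "liquor": 1.2, "convenience": 0.6}

def attraction(store, w_items=0.5, w_footfall=0.3):
    """Hypothetical attraction A_j from item count, shop type and nearby footfall."""
    return TYPE_FACTOR[store["type"]] * store["items"] ** w_items * store["footfall"] ** w_footfall

def huff_sales(stores, zones, travel_min, beta=2.0):
    """Allocate each zone's demand to stores in proportion to A_j * t_ij ** -beta."""
    sales = {name: 0.0 for name in stores}
    for zone, demand in zones.items():
        weights = {
            name: attraction(s) * travel_min[zone][name] ** -beta
            for name, s in stores.items()
        }
        total = sum(weights.values())
        for name, w in weights.items():
            sales[name] += demand * w / total  # demand x choice probability
    return sales

stores = {
    "hy_vee": {"type": "supermarket", "items": 900, "footfall": 4000},
    "rival": {"type": "convenience", "items": 150, "footfall": 2500},
}
zones = {"zone_1": 100_000.0, "zone_2": 60_000.0}  # annual liquor demand in $ (invented)
travel_min = {"zone_1": {"hy_vee": 5, "rival": 8}, "zone_2": {"hy_vee": 12, "rival": 4}}
print(huff_sales(stores, zones, travel_min))
```

Because each zone’s demand is fully allocated, store sales always add up to total demand. Zone 2, which is much closer to the rival, sends most of its demand there even though the Hy-Vee store has the higher attraction.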

The scatter plot shows the Geo AI output: the x-axis is the actual liquor sales, and the y-axis is the predicted liquor sales. The dots lie close to the diagonal line, which indicates good prediction performance of Targomo’s Geo AI.

Scatter plot of the Geo AI prediction model results: Dots closely distributed along the diagonal line indicate high model precision.
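Closeness to the diagonal can also be summarised in a single number. One common choice, assumed here since the case study reports percentage errors instead, is the coefficient of determination measured against the identity line y = x: 1 minus the squared deviations from the diagonal divided by the total variance of actual sales. A value of 1 means every dot lies exactly on the diagonal. The sales figures below are invented.

```python
def r2_against_diagonal(actual, predicted):
    """1 - SS_res / SS_tot, with residuals measured from the line predicted == actual."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [120.0, 340.0, 560.0, 800.0]  # thousand $ per store (invented)
predicted = [135.0, 310.0, 590.0, 770.0]
print(round(r2_against_diagonal(actual, predicted), 3))  # → 0.989
```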

Visualisation of the Geo AI Success Driver Profile

Income is a major driver of liquor sales in Hy-Vee shops: 25.8% of sales come from households with incomes below $40,000, and 14.5% from households with incomes above $150,000. Many studies have found that low-income individuals are at higher risk of heavy and hazardous drinking, while higher income is associated with more frequent light drinking.

Which location variables drive liquor sales for the supermarket brand Hy-Vee? The success driver profile gives detailed answers

In addition to income, certain occupations also have a positive influence on liquor sales in Hy-Vee stores in Iowa. The results show that people working in construction, wholesale trade and the arts, as well as self-employed people, are strong drivers. According to Statista, the top five US occupations with high alcohol consumption (15 or more alcoholic drinks per week) in 2016 were construction or mining; installation or repair; farming, fishing or forestry; manufacturing or production; and business owner.

Among them, construction is one of the drivers identified by the model. The other occupations also correlate strongly with the groups identified by our Geo AI: people in installation or repair, farming, fishing/forestry and manufacturing/production are often self-employed, as are business owners. So some of the drivers found by Geo AI align with empirical results from research studies.

Dynamic Analytics of Hy-Vee’s Geo AI model:

Integrated into the TargomoLOOP platform, the Geo AI model delivers immediate prediction results for any potential address.

Insights

Insight #1 A good estimate of attraction strength is key

Attraction strength indicates how attractive our stores are and how attractive/competitive our competitors are. A good estimate of attraction strength gives a better estimate of the market size and of potential customers, and thus a more accurate revenue prediction. When estimating attraction strength, information about the store itself is key, for example store size, sales area and opening hours. In this case study, the limited public data was not enough to capture the whole picture of store attractiveness; information about store size, for example, was missing. To work around this limitation, building rooftop data was used as a substitute, but it could not be obtained for all stores.
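Where a building footprint is available, its area is a simple proxy for store size. A minimal sketch, assuming the rooftop polygon has already been projected into a metric coordinate system (for example a UTM zone) so that the shoelace formula returns square metres; the footprint is invented. Note that a rooftop area overstates the sales area wherever the building also holds storage or other tenants.

```python
def polygon_area_m2(vertices):
    """Shoelace formula for a simple polygon given as [(x, y), ...] in projected metres."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2

# Hypothetical L-shaped roof footprint in projected metres
footprint = [(0, 0), (60, 0), (60, 40), (30, 40), (30, 70), (0, 70)]
print(polygon_area_m2(footprint))  # → 3300.0
```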

Insight #2 An attractive shop location is defined by its surrounding business environment AND access to its target group

When selecting a new location to open a store, investigating the surrounding businesses is critical. What products/services do they offer? What is their vibe? Are they competitive or complementary? However, the geographic distribution of the target demographic is easily overlooked in the decision-making process. It does not matter how well suited a location is to a store if the target customer group is unable or unwilling to travel to it.

With Random Forest and XGBoost models, it is hard to control which variables are used in the decision trees. In this case study, these two models focused mostly on the nearby environment and gave only minor consideration to access to target demographics. The Geo AI model put a greater emphasis on demographics and took into account a wider array of demographic and environmental factors. We have abundant data to measure various aspects of demographic structure to support the model and future decision making.

Insight #3 Expert knowledge could be a double-edged sword in key driver analysis. 

When selecting variables for the driver analysis, expert knowledge can indicate which key drivers to expect, based on experience or existing literature. At the same time, relying too heavily on expert knowledge and excluding data we assume to be unrelated at the outset can lead into a bias trap.

Instead, by letting the model and the data speak, we can gain new or unexpected insights. We may find that some variables we thought important have little or no significance, and vice versa. We should then investigate such findings, figure out why and explain them. Our assumptions should not interfere with the model.

The feature selection mechanism in Geo AI is designed to keep overfitting low, even when a large number of features is used as input.

Conclusion

This case study tests and compares three models, Random Forest, XGBoost and Geo AI, as revenue prediction methods, using liquor sales data from Iowa, USA. Geo AI performs best of the three in predicting liquor sales for Hy-Vee: it shows the least overfitting and the lowest testing error, and the important features in the model are more comprehensive.

Compared with the Random Forest and XGBoost models, Geo AI gives greater consideration to shop features, the shop’s surrounding environment, the local demographic structure and mobility data, which makes the model more reliable. In addition, Targomo’s Geo AI, a location-based model, offers a better understanding and evaluation of the characteristics of the shop location. It is possible to track the model and retrieve the prediction for each store to identify key revenue drivers. Thanks to its low overfitting and low testing error, Geo AI can better predict the revenue of a new location and analyse revenue drivers.

Liquor sales prediction is one example of a Geo AI model. Geo AI can be applied in various industries, including retail, delivery services, the food industry and health care. Are you interested in learning more about Targomo’s Geo AI forecasting for retailers and restaurant brands, and in seeing dynamic analytics of the Hy-Vee Geo AI model? Book a demo here

Authors: Yue Luo (Spatial Data Scientist), Gideon Cohen (Software Engineer), Luisa Sieveking (Marketing & Communications)

The Geo AI model was developed by Targomo’s Data Science Team, which is dedicated to evaluating which location factors influence the business success of brands and companies, and to what extent they contribute to revenue.

Targomo unveils Geo AI for retail sales forecasting

Sep 08 2022 Published by under Blog

Site decisions are critical to the success of brick-and-mortar stores and branch networks. With Geo AI, Targomo now helps these companies improve their location decisions by accurately predicting their sales figures.

More than 80% of the success of individual retail stores depends on their location. Companies can now predict this success and forecast relevant sales metrics such as store revenue or guest count thanks to Geo AI by Targomo. The new solution is integrated with the TargomoLOOP location analytics platform and provides forecasts for every potential location in the sales area. 

The Geo AI prediction is based on a bespoke prediction model built by Targomo’s data science team. It combines machine learning and geo algorithms with neighbourhood information and a company’s location data to develop and train a spatial predictive model. The results are reliable sales forecasts for any location and insights into success drivers that let the company understand what makes a location good for them.

Geo AI provides heatmaps and forecasts of turnover or number of customers for any location

Forecast accuracy of up to 80-90%

“The success driver analysis is at the core of our Geo AI offering,” explains Henning Hollburg, Founder and Managing Director of Targomo. “We find out which environmental and competitive factors are critical to a brand’s success, and to what extent. With our analysis, a brick-and-mortar business finally knows how much of their sales are based on each location factor. This allows for reliable forecasting of sales metrics, where we usually achieve an accuracy of up to 80-90%.” (Read full interview with Henning)

In addition to sales forecasting and success driver analysis, heatmaps form the third pillar of the comprehensive Geo AI offering. Heatmaps visualize areas with untapped sales opportunities and let clients discover how much additional revenue their network could generate by opening branches in any part of the country. With these tools, companies can easily find the geographic areas with the highest potential to grow their business and their brand.

Are you interested in learning more about the Geo AI services and integration? Get in touch

Predictive Analytics: The Science behind the Art

Jul 11 2022 Published by under Blog

In a previous blog post we explored the potential of predictive analytics to help retailers plan future locations. Now, we lift the hood to find out more about how predictive analysis works and how it can benefit your business. Our guide is David Redlich, team lead for API Services & Data teams at Targomo.

David, simply put, what is predictive analytics?

Imagine if you could predict the performance of new branches with a high degree of accuracy based on hard facts rather than gut feeling. This is what we call predictive analytics. We combine machine learning and geo algorithms with socio-demographic, network and performance data to create unique predictive models that help our customers answer questions like: What makes my business successful? How many guests will come to my restaurant? How much revenue can I expect from my outlet store? Where should I expand to next? All of these questions can now be answered in a fraction of the usual time and effort, and with total control and transparency for you.

Before predictive analytics, what did location analysis or planning look like?

Typically, analytics focused on sub-aspects and couldn’t handle such large amounts of data. Take consultancies, for example: their analyses are static and usually limited to a few locations. This makes it impossible to run scenarios that take into account the interactions between different locations or that look into the future. As the name suggests, predictive analytics is forward-looking. Once the model is in place, customers can perform in-depth analyses on any number of locations without noticeably changing costs. With TargomoLOOP, we have a really nice tool that allows our customers to do their own analysis and get powerful insights into the future based on data. The platform is scalable, as the model can be used worldwide (depending on data availability). We’re confident we’re best in class at predicting a location’s future performance. Plus, we’re continually improving our models and tools.

What kind of data do you use in predictive analysis and where does it come from?

In general, we use four sources: public, commercial, corporate and self-engineered data.

Publicly available data includes network data such as transit and traffic information, which we use to create mobility patterns that show how people get to your potential new location. Socio-demographic and points-of-interest (POI) data are also part of the public data.

Commercial data comprises additional fine-grained socio-demographic data, additional POI data, economic data like spending power or other relevant target group data such as Limbic Types. We obtain these from our reputable worldwide data partners.

An important part of the predictive model is, of course, our customers’ corporate data: location-based data that provides insight into different store characteristics and performance. First and foremost are the sales figures. If we are to develop a predictive model for a specific KPI, we naturally need that data from the existing stores. Information such as different closing times and sales areas is also important, in order to examine their impact or to explain deviations.

Another relevant source, and one that differentiates us from our competitors, is our own engineered domain-specific data. We provide foot traffic broken down by time of day and night and by intention, such as shopping or restaurant visits. We also offer car and home ownership ratios and custom POI sets that are particularly interesting for brick-and-mortar businesses to see which complementary businesses or services are around their sites.

What are some of the challenges in creating an accurate predictive analytics model?

At the beginning of a project, a considerable amount of time still goes into collecting, scraping and cleansing the data. This step is often underestimated but crucial for providing our customers with reliable results. The main challenge, however, is that AI and machine learning algorithms are usually not optimised for geospatial data (geo-information, POIs, network data and store locations). Getting these things to fit together is challenging. One way we solved this is by using established economic models, like the gravitational or logit models. In these models, every location has an “attraction” from which you can calculate the probability that certain demographics come to your place, taking into account competing locations and their attraction. The complexity of the problem is incredible. Many people have tried to solve it, with different levels of success. But I believe none have managed to do so in the generalised manner we have. Our predictive analytics works across domains: you can just plug in the data and find individual success drivers without any domain-specific or company-specific customisation. That is a pretty big challenge we overcame.
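For the logit variant mentioned here, each location’s attraction is turned into a choice probability with a softmax over all competing locations. The following multinomial logit sketch assumes a hypothetical utility made of log attraction minus a travel-time penalty; the coefficients, store names and numbers are invented, and Targomo’s actual model specification is not described in the interview.

```python
import math

def logit_choice(utilities):
    """Multinomial logit: P_j = exp(U_j) / sum_k exp(U_k), computed in a numerically stable way."""
    m = max(utilities.values())  # subtract the max before exponentiating to avoid overflow
    exps = {j: math.exp(u - m) for j, u in utilities.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}

def utility(attraction, minutes, b_attr=1.0, b_time=0.15):
    """Hypothetical utility: higher attraction helps, longer travel time hurts."""
    return b_attr * math.log(attraction) - b_time * minutes

options = {
    "our_store": utility(300, 10),
    "competitor_a": utility(200, 6),
    "competitor_b": utility(80, 4),
}
print(logit_choice(options))
```

With `b_attr = 1`, exp(U_j) equals A_j · e^(−0.15 t_j), so the probabilities are those of a gravitational model with exponential distance decay. This is one reason the two model families are often used side by side.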

How does TargomoLOOP address customers’ needs around predictive analytics?

One key influence for us is the user experience. For instance, we want to put TargomoLOOP entirely into the hands of our customers. We don’t want to act as the middleman doing the analysis for them; we want to give them a tool with which they can play through many different scenarios by themselves. It’s a fine balance, and we’re still learning how to communicate complex results while giving our customers a lot of user control. We have always worked closely with test users and collected feedback from existing customers in order to achieve a good user experience and develop it further.

What is the role of the client in predictive analytics?

Customers play an important role before and during development. At the beginning, they provide us with the store and company data (e.g. KPIs) as well as environment-independent store facts. 

They also play an active role during the evaluation. When we have initial results on the success drivers, we run them past the customer to check whether they make sense. The success drivers themselves are usually not a complete surprise; what is new is how important each one is. After all, we quantify exactly how important car and pedestrian traffic on site are and what role competitors play. After the analysis, customers know how much of their sales are based on each factor. Sometimes we also find that factors seen as important have no impact on sales at all.

Predictive analytics is always a team effort: David and team discussing the analytics results.

What are the skills needed to create predictive analytics models?

It has been a steep learning curve for us and we created a research project together with universities to draw in knowledge that we didn’t have. We also expanded our teams with very capable data scientists. Together with our great software developers we created a successful symbiosis to offer cutting edge Geo AI as an enterprise product. I wouldn’t limit the needed skill set to that of data scientists or software developers, though. Marketing, business development and customer success are very important as well. It’s crucial the concepts and results are communicated in an understandable manner. That’s a difficult task.

David, thank you for taking us through predictive analysis. While none of us possess a crystal ball, as we’ve seen with predictive analysis, it really can help predict future outcomes in order to successfully plan retail locations and minimize risks. And the best bit is, the tool is intuitive to use, allowing our clients to plug-and-play to find the scenarios that work best for them.

Are you interested to find out how Targomo’s powerful tools can empower you to make better business decisions? Get in touch!
