As presented in the previous post, we have at our disposal a huge dataset containing the food purchases made in the Tesco shops within the boundaries of London. Faced with all this data, we are a little lost and ask ourselves where to begin. And suddenly we remember that we learned, during our Padawan formation, how to become a data vizard. We take out our most beautiful tools and begin to make some visualizations to understand better our data. We have many features available for the purchases . We notice that the Tesco paper did not deeply explore the different food categories. Therefore, we decide to focus on these ones.

What we want to visualize

We wonder if the Tesco consumers buy the same products and in same quantity all over the city. In particular, we wonder :

  • Are there some food categories that are preferred in certain regions ?
  • Are there some others that are equally bought in the city ?
  • Can we draw a pattern from this visualization ? In order to try to answer all these questions, we make plots of the distribution of a certain food category all over London. We consider a mean of the fraction of purchase of each food category in a given ward.

Visualizations

Each plot below represents the mean proportion of purchase of a certain food category per ward. To see all the plots, you can click on this link. On top of that, by moving your mouse all over the map you can discover the exact mean proportion of purchase of a certain food category, the mean income and the median income in each ward.

Bokeh Plot

Observations

We see that the distributions of proportion of purchases of food category are pretty different. Some food categories shows uniform distributions whereas some others show very disparate distributions. For instance, the comsuption of beer, soft drinks and tea/coffee seem to be the same all over the city. In contrary, the consumption of meat is 6 times higher in the northern and centered wards of London compared to some southern ward. We observe almost the same thing for the poultry. The distribution of ready made, sweets and grains seem to be more important in the southern, eastern and western edge of the city. It is particularly true for ready made where the consumption of this category is almost 50 time higher between some edge parts and some city centered parts.

What are the causes of these observations ?

Many factors can influence these observations. First, let’s not forget that the data we use comes from the Tesco shops only . Indeed, people coming to Tesco does not represent all the inhabitants of a ward and the Tesco paper has shown that the representativeness of this data is to some extent limited. Therefore, we should not exclude the hypothesis that if we had the data of all food retail markets, we would observe different results. However, some patterns seem to appear clearly in our plots. For instance, we can’t help but notice that the distribution of meat (read meat and poultry) is higher in the northern and centered parts of London. Meat is a pretty expensive product and we wonder if there can be a link between the income of people and their consumption of meat (at least at Tesco shops). It would not be so surprising to observe that in the wards where people are wealthier, they buy more meat. In addition, it is possible that the observation about the important readymade comsuption in some wards is also linked with the income of the people. Middle-class people generally have healthier diets than lower-class people and would then consume less readymade products.

A new hope

Intrigued by these observations, we will try to see in another post if there exist a correlation between the income of people and the proportion of purchases of some food category in the Tesco shops.