What is an interesting segmentation?

Let’s consider a situation in which you have a new data set. Perhaps you bought it, or you got it from some external source. The important fact is that this data set is new and you are not familiar with it. In fact, you barely understand the semantics of its columns.
Where would you begin? Yes, you count the rows, run basic selectivity checks on each column, and count the nulls. In other words, you begin with data profiling.
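To make these first profiling steps concrete, here is a minimal sketch in plain Python. It assumes the data set is already loaded as a list of dictionaries, one per row, and it treats "selectivity" as the ratio of distinct non-null values to the row count. The function name `profile` and the sample columns are hypothetical and chosen only for illustration.

```python
def profile(rows):
    """Return the row count plus, per column, null count, distinct count and selectivity."""
    # Collect every column name that appears in at least one row.
    columns = sorted({key for row in rows for key in row})
    report = {"row_count": len(rows), "columns": {}}
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        distinct = len(set(non_null))
        report["columns"][col] = {
            "nulls": len(values) - len(non_null),
            "distinct": distinct,
            # Assumed definition: distinct non-null values divided by row count.
            "selectivity": distinct / len(rows) if rows else 0.0,
        }
    return report


# Hypothetical sample data, not taken from any real data set.
sample_rows = [
    {"country": "US", "clicks": 3},
    {"country": "US", "clicks": None},
    {"country": "DE", "clicks": 5},
]
sample_report = profile(sample_rows)
```

A column with selectivity close to 1 behaves like an identifier, while a column with only a handful of distinct values is a natural candidate for segmentation.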
We want to put your profiling process on steroids. We suggest selecting a few candidate segmentations and studying them in depth, in a bid to find one that is interesting and informative.
OK, what is an interesting segmentation? Let us define what a segmentation is and look at an example. Then we will try to deduce from it what “interesting segmentation” means. So let’s get started.
Segmentation is a way to split your data set into several smaller ones. For example, you can split traffic by country of origin. It is an important step in the “divide and conquer” approach; strictly speaking, it is the “divide” part. We get one data set per country and can draw country-specific insights from each. As a result, we find unusual countries, similar countries, and so on. This is exactly what we need.
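The split by country described above can be sketched in a few lines of plain Python. This is only an illustration, again assuming rows stored as dictionaries; the helper name `segment_by` and the `country` and `clicks` columns are hypothetical.

```python
from collections import defaultdict


def segment_by(rows, column):
    """Split rows into smaller data sets, one per distinct value of `column`."""
    segments = defaultdict(list)
    for row in rows:
        # Rows that lack the column end up in a segment keyed by None.
        segments[row.get(column)].append(row)
    return dict(segments)


# Hypothetical traffic records.
traffic = [
    {"country": "US", "clicks": 3},
    {"country": "DE", "clicks": 5},
    {"country": "US", "clicks": 7},
]
by_country = segment_by(traffic, "country")
```

Each value in `by_country` is a smaller data set that can be profiled on its own and then compared with the others.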
Therefore, a good segmentation is one that produces mild differentiation.
We want the segments to be neither all alike nor all different from one another.
Neither of these cases is informative. What we want is a few groups of segments: a couple of outliers and a mass of “regular” ones.
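One possible way to check for this pattern is to compute a single metric per segment and see which segments sit far from the rest. The sketch below uses the median absolute deviation for that purpose. This is our own assumption for illustration, not a description of how any particular tool works; the function names, the threshold of 3 and the sample numbers are all hypothetical.

```python
from statistics import median


def split_outliers(metric_by_segment, threshold=3.0):
    """Separate segments into 'regular' and 'outlier' by distance from the median."""
    values = list(metric_by_segment.values())
    center = median(values)
    # Median absolute deviation: a robust measure of the typical spread.
    mad = median(abs(v - center) for v in values)
    regular, outliers = {}, {}
    for name, value in metric_by_segment.items():
        deviation = abs(value - center)
        # If the spread is zero, any segment that differs at all counts as an outlier.
        is_outlier = deviation > threshold * mad if mad > 0 else deviation > 0
        (outliers if is_outlier else regular)[name] = value
    return regular, outliers


def looks_interesting(regular, outliers):
    """Assumed heuristic: a few outliers against a larger mass of regular segments."""
    return 0 < len(outliers) < len(regular)


# Hypothetical average clicks per visit for each country segment.
avg_clicks = {"US": 10.0, "DE": 11.0, "FR": 9.5, "GB": 10.5, "XX": 40.0}
regular, outliers = split_outliers(avg_clicks)
```

With these sample numbers only `XX` lands among the outliers, so the segmentation would pass the heuristic. If every segment had the same value, or if the values were scattered with no common mass, it would not.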
Nestlogic ADP aims to visualize the differences between segments from various perspectives. We believe this is an excellent way to judge whether a given segmentation is interesting or whether we need to try another one.

