# Sub-groups

The standard client survey gives you data about the client population as a whole. Within that population are various sub-groups: for example, clients under 25 years old, or clients taking up family planning services. You may be particularly interested in some of these sub-groups and want information about them specifically. For example, if you have a strategy to reach poor clients with family planning services, you may want to look at the wealth quintiles among family planning clients as well as among the broader client population.

Getting data for sub-groups complicates things and will end up costing more, but it is possible. We would advise against it unless you have a strong understanding of sampling.

If you decide to look at a sub-group, the key is that you will need a certain number of respondents from that sub-group to achieve the precision you need for the sub-group analysis. To determine how many, you can use the same method as for the overall sample size calculation. The problem is that if you increase the number of respondents from just one group, this distorts your sample, because that sub-group is now over-represented. There are three methods to get around this: increase the total sample; over-sample the sub-group and exclude the additional respondents when analysing the overall client population; or over-sample the sub-group and weight the dataset.

**Increase total sample**

Here you simply increase the total sample so that it contains enough respondents from the sub-group to analyse them separately. This is the least complicated approach, but it may be the most expensive, because it can make the overall sample size quite large.

Example: you are interested in a sub-group that represents 10% of clients. You need at least 100 respondents from this group to analyse them separately, because 100 gives you the minimum precision you need. So you could increase your overall sample to 1000 respondents. This way, you can expect around 100 respondents (10% of 1000) to be from the sub-group.
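The arithmetic in this example can be sketched in Python. This is a minimal sketch assuming simple random sampling: the precision calculation uses the common normal-approximation formula for a proportion, n = z²p(1−p)/e², which may differ from the method you used for your overall sample size (for example, if that included a design effect). All function names are hypothetical.

```python
import math


def sample_size_for_proportion(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Respondents needed to estimate a proportion to within +/- margin.

    Assumes simple random sampling and the normal approximation,
    n = z^2 * p * (1 - p) / margin^2, rounded up. p = 0.5 is the most
    conservative choice when the true proportion is unknown.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def total_sample_for_subgroup(subgroup_needed: int, subgroup_percent: int) -> int:
    """Total sample that yields `subgroup_needed` sub-group respondents on average
    when the sub-group is `subgroup_percent`% of clients (rounded up)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-subgroup_needed * 100 // subgroup_percent)


def prob_at_least(target: int, n: int, share: float) -> float:
    """Chance that a random sample of n clients contains at least `target`
    sub-group members, treating the count as binomial(n, share)."""
    below = sum(
        math.comb(n, k) * share ** k * (1 - share) ** (n - k) for k in range(target)
    )
    return 1 - below


print(sample_size_for_proportion(0.10))   # 97 (about +/- 10 percentage points)
n = total_sample_for_subgroup(100, 10)
print(n)                                  # 1000
print(prob_at_least(100, n, 0.10))        # slightly above 0.5
```

With p = 0.5, a margin of about ±10 percentage points needs 97 respondents, close to the 100 in the example. Because the number of sub-group respondents varies from sample to sample, a total of 1000 gives only about an even chance of reaching 100 or more; if 100 is a hard minimum, you may want to build in a margin.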

**Over-sample the sub-group**

With this approach you would add respondents from your sub-group in addition to your overall sample.

Example: the sample you need for the overall client population is 300 people, but you are also interested in a sub-group, for which you need 100 respondents. The sub-group is 10% of your clients, so only about 30 respondents in your sample will be from the sub-group. To reach the 100 respondents you require, you would interview an additional 70 people from the sub-group, bringing your total sample to 370 people. The problem comes when you analyse the overall client population: the sub-group is over-represented because of the 70 you added. You can either:

**Exclude the additional respondents’ data when analysing the client population**

One approach is to only use the extra respondents’ data when analysing your sub-group. In the example above, when analysing data about the sub-group, you would use the 100 respondents from that sub-group, but when analysing the main sample to get information about the overall client population you would exclude the 70 that you added from the sub-group (so you would just use the original 300 respondents).

You might think of this as two different samples – one with 300 respondents representing the overall client population and another with 100 respondents representing the sub-group. You will need a field in the questionnaire which records which respondents were added, so that you can exclude or include them as needed during the analysis.
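As a sketch of the bookkeeping, assuming a hypothetical dataset held as a list of Python dictionaries with a hypothetical `oversample_extra` field marking the added respondents, the example's numbers can be reproduced and the two analysis samples separated as follows. To mirror the text, the base sample is fixed at exactly 30 sub-group members; in a real survey it would be roughly, not exactly, 30.

```python
def oversample_extra_needed(base_sample: int, subgroup_percent: int, subgroup_needed: int) -> int:
    """Additional sub-group interviews needed on top of the base sample."""
    expected_in_base = base_sample * subgroup_percent // 100  # 300 * 10% = 30
    return max(0, subgroup_needed - expected_in_base)


extra = oversample_extra_needed(300, 10, 100)  # 70

# Hypothetical records: "subgroup" marks sub-group membership and
# "oversample_extra" is the field recording who was added by over-sampling.
respondents = (
    [{"id": i, "subgroup": i < 30, "oversample_extra": False} for i in range(300)]
    + [{"id": 300 + i, "subgroup": True, "oversample_extra": True} for i in range(extra)]
)

# Overall client population: the original 300 respondents only
population_sample = [r for r in respondents if not r["oversample_extra"]]

# Sub-group analysis: every sub-group respondent, original or added
subgroup_sample = [r for r in respondents if r["subgroup"]]

print(extra, len(respondents))                        # 70 370
print(len(population_sample), len(subgroup_sample))   # 300 100
```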

**Weight the dataset**

This is the most complicated approach, but also the most efficient. You would need to calculate a weight to apply to each respondent in the sample, which will depend on whether they are in the sub-group or not. One easy way to do this is as follows:

- Create a variable in your dataset called ‘weight’ which is set to 1.
- Work out how many respondents from the sub-group would have been in the sample without over-sampling, and how many there are with over-sampling. In the example above, there would have been 30 sub-group respondents without over-sampling, but with over-sampling there are 100.
- Divide the smaller number by the larger one. In this example, it would be 30/100, which is 0.3.
- For the respondents from the sub-group, replace the ‘1’ in the variable ‘weight’ with this number, in this example 0.3.
- Now tell the analysis software to use the ‘weight’ variable to weight the dataset.
- Now you can analyse the dataset as usual.
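The steps above can be sketched in Python, using a hypothetical list-of-dictionaries dataset with 270 respondents outside the sub-group and 100 inside it. The `weighted_share` helper is a hypothetical stand-in for what your analysis software does once you tell it to use the ‘weight’ variable; it is not a particular package's function.

```python
# Hypothetical dataset matching the example: 270 respondents outside the
# sub-group plus 100 in it (30 from the base sample + 70 over-sampled).
respondents = [{"subgroup": False} for _ in range(270)] + [{"subgroup": True} for _ in range(100)]

# Step 1: a 'weight' variable set to 1 for everyone
for r in respondents:
    r["weight"] = 1.0

# Steps 2-3: sub-group count without over-sampling divided by the count with it
without_oversampling = 30
with_oversampling = sum(r["subgroup"] for r in respondents)  # 100
sub_weight = without_oversampling / with_oversampling        # 0.3

# Step 4: give sub-group respondents the smaller weight
for r in respondents:
    if r["subgroup"]:
        r["weight"] = sub_weight


def weighted_share(rows, condition):
    """Weighted proportion of rows meeting `condition`."""
    total = sum(r["weight"] for r in rows)
    return sum(r["weight"] for r in rows if condition(r)) / total


print(sub_weight)                                                      # 0.3
print(round(weighted_share(respondents, lambda r: r["subgroup"]), 3))  # 0.1
```

Unweighted, the sub-group makes up 100/370, about 27%, of the sample; weighted, it is back to 10%, its share of the client population.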

Note that if you have a good understanding of weighting, you can apply your own knowledge and calculate the weights as you normally would.
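For readers who already weight data, one common convention (assumed here, not taken from this guide) sets each group's weight to its population share divided by its sample share. The short check below suggests this gives the same ratio between the two weights as the simple method above, so point estimates such as proportions and means come out the same; only the scale differs, since these weights sum to the sample size of 370 rather than 300. How standard errors are computed can depend on how your software interprets the weights.

```python
population_share = {"subgroup": 0.10, "other": 0.90}
sample_counts = {"subgroup": 100, "other": 270}
n = sum(sample_counts.values())  # 370

# Weight = population share / sample share, for each group
weights = {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}

print({g: round(w, 3) for g, w in weights.items()})      # {'subgroup': 0.37, 'other': 1.233}
print(round(weights["subgroup"] / weights["other"], 3))  # 0.3, as in the simple method
print(round(sum(weights[g] * sample_counts[g] for g in sample_counts), 6))  # 370.0
```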