Many of the patent analysis and landscape projects I see organize their corpus using what I call One Document Per Family (ODPF). With this method, a single member of an INPADOC patent family, usually the most recent document from a specific country, is used to represent the whole family. This removes the bias that arises when the same invention is covered in many countries, which can inflate the importance of an idea or an assignee within a patent document collection.
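To make the reduction concrete, here is a minimal sketch of ODPF in Python. It assumes each document is a plain dictionary, and the field names (`family_id`, `country`, `pub_date`, `pub_number`) are hypothetical, as is the sample data. Dates are assumed to be ISO strings so that they sort chronologically. Falling back to the newest document overall when the preferred country is absent is my assumption, not a rule of the method.

```python
def one_doc_per_family(docs, preferred_country):
    """Keep one representative per family.

    Preference order: newest document from preferred_country,
    otherwise the newest document in the family (an assumed fallback).
    """
    best = {}
    for doc in docs:
        # Tuples compare element by element, so a preferred-country
        # document always outranks any other, then newer beats older.
        rank = (doc["country"] == preferred_country, doc["pub_date"])
        kept = best.get(doc["family_id"])
        if kept is None or rank > kept[0]:
            best[doc["family_id"]] = (rank, doc)
    return [doc for _rank, doc in best.values()]


# Hypothetical sample: two families, five documents.
docs = [
    {"family_id": "F1", "country": "US", "pub_date": "2010-03-04", "pub_number": "US-1-A1"},
    {"family_id": "F1", "country": "US", "pub_date": "2012-06-05", "pub_number": "US-1-B2"},
    {"family_id": "F1", "country": "EP", "pub_date": "2013-01-09", "pub_number": "EP-1"},
    {"family_id": "F2", "country": "EP", "pub_date": "2011-02-02", "pub_number": "EP-2"},
    {"family_id": "F2", "country": "JP", "pub_date": "2012-08-08", "pub_number": "JP-2"},
]

odpf = one_doc_per_family(docs, "US")
print(sorted(d["pub_number"] for d in odpf))  # ['JP-2', 'US-1-B2']
```

Five documents collapse to two, one per family, regardless of how many distinct applications each family contains.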
In some types of analysis, especially those in which the relative density of documents from a particular organization or topic is not particularly relevant, an ODPF approach to building a corpus works well and can dramatically simplify the task of putting together a patent data collection that spans a large number of countries.
Unfortunately, in many studies I see, the relative number of documents is used as a means of comparing two items. In these cases an ODPF approach can dramatically underrepresent the effort applied to an area and should be avoided.
In cases where document density matters or a comparison is going to be made, I prefer to build my corpora based on a concept I refer to as One Document Per Invention (ODPI). Using this method, a primary country is selected and all potential inventions in it are identified by eliminating redundant documents, such as published applications that have since progressed into granted patents. This can usually be done by sorting the documents by application number and keeping one document from each group that shares a number.
Once the primary country has been processed for inventions, the original corpus is reduced to ODPF by INPADOC family, making sure that the most recent document from the primary country is kept as the representative wherever the family has one. The ODPF set is then combined with the primary country's set of all potential inventions to produce a simple ODPI corpus.
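The two steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: documents are plain dictionaries with hypothetical field names (`family_id`, `country`, `application_number`, `pub_date`, `pub_number`), dates are ISO strings, and the sample data is invented. Because the newest primary-country document in a family is always also the newest document for its own application, the sketch builds the combined set directly: every primary-country invention, plus one representative for each family that has no primary-country member.

```python
def one_doc_per_invention(docs, primary_country):
    """Build a simple ODPI corpus around a single primary country."""
    # Step 1: in the primary country, keep the newest publication for
    # each application number (e.g. the grant instead of the earlier
    # published application).
    inventions = {}
    for doc in docs:
        if doc["country"] != primary_country:
            continue
        app = doc["application_number"]
        if app not in inventions or doc["pub_date"] > inventions[app]["pub_date"]:
            inventions[app] = doc

    covered = {doc["family_id"] for doc in inventions.values()}

    # Step 2: ODPF for families with no primary-country member, using
    # the newest document as the representative (an assumed rule).
    representatives = {}
    for doc in docs:
        fam = doc["family_id"]
        if fam in covered:
            continue
        if fam not in representatives or doc["pub_date"] > representatives[fam]["pub_date"]:
            representatives[fam] = doc

    return list(inventions.values()) + list(representatives.values())


# Hypothetical sample: family F1 holds two US applications
# (a parent and a divisional), family F2 has no US member.
docs = [
    {"family_id": "F1", "country": "US", "application_number": "US100",
     "pub_date": "2010-03-04", "pub_number": "US-1-A1"},
    {"family_id": "F1", "country": "US", "application_number": "US100",
     "pub_date": "2012-06-05", "pub_number": "US-1-B2"},
    {"family_id": "F1", "country": "US", "application_number": "US200",
     "pub_date": "2013-09-10", "pub_number": "US-3-B1"},
    {"family_id": "F1", "country": "EP", "application_number": "EP900",
     "pub_date": "2011-01-09", "pub_number": "EP-1"},
    {"family_id": "F2", "country": "EP", "application_number": "EP901",
     "pub_date": "2009-02-02", "pub_number": "EP-2"},
    {"family_id": "F2", "country": "JP", "application_number": "JP500",
     "pub_date": "2010-08-08", "pub_number": "JP-2"},
]

odpi = one_doc_per_invention(docs, "US")
print(sorted(d["pub_number"] for d in odpi))  # ['JP-2', 'US-1-B2', 'US-3-B1']
```

In this sample, ODPF would return two documents, while ODPI returns three, because both US applications in family F1 are counted.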
Organized this way, the set to be analyzed contains every invention from the primary country, while families that were never filed in the primary country remain available for analysis through their single representative document. This is especially important when the primary country is the United States or Japan, where there is a higher likelihood of multiple inventions being present in the same family.
In its simplest implementation, only one country is used for the reduction. To add rigor, an analyst can also designate secondary and tertiary countries: when the primary country is not part of a family, all potential inventions from the secondary and tertiary countries are used instead of the family being represented by a single member, as it would be in the simplest form of ODPI. There is a point of diminishing returns, however, when using more than one or two countries in this type of approach.
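One way to read the multi-country variant is as an ordered priority list: for each family, take all potential inventions from the highest-priority country that appears in it, and fall back to a single representative only when none of the listed countries is present. The sketch below follows that reading, which is my assumption; an analyst could equally keep inventions from both the secondary and tertiary countries. Field names and sample data are hypothetical, and dates are assumed to be ISO strings.

```python
def odpi_with_priority(docs, country_priority):
    """ODPI with an ordered list of countries, e.g. ["US", "JP"]."""
    # Group the documents by family first.
    families = {}
    for doc in docs:
        families.setdefault(doc["family_id"], []).append(doc)

    corpus = []
    for members in families.values():
        present = {doc["country"] for doc in members}
        # Highest-priority listed country found in this family, if any.
        chosen = next((c for c in country_priority if c in present), None)

        if chosen is None:
            # No listed country: one representative, newest document.
            corpus.append(max(members, key=lambda d: d["pub_date"]))
            continue

        # Keep the newest publication per application in that country.
        newest = {}
        for doc in members:
            if doc["country"] != chosen:
                continue
            app = doc["application_number"]
            if app not in newest or doc["pub_date"] > newest[app]["pub_date"]:
                newest[app] = doc
        corpus.extend(newest.values())
    return corpus


# Hypothetical sample: F1 has a US member, F2 has only JP and EP
# members, and F3 has neither US nor JP members.
priority_docs = [
    {"family_id": "F1", "country": "US", "application_number": "US100",
     "pub_date": "2012-06-05", "pub_number": "US-1-B2"},
    {"family_id": "F1", "country": "EP", "application_number": "EP900",
     "pub_date": "2013-01-09", "pub_number": "EP-1"},
    {"family_id": "F2", "country": "JP", "application_number": "JP500",
     "pub_date": "2011-04-01", "pub_number": "JP-2-A"},
    {"family_id": "F2", "country": "JP", "application_number": "JP500",
     "pub_date": "2013-04-01", "pub_number": "JP-2-B"},
    {"family_id": "F2", "country": "JP", "application_number": "JP600",
     "pub_date": "2012-07-01", "pub_number": "JP-4-A"},
    {"family_id": "F2", "country": "EP", "application_number": "EP901",
     "pub_date": "2014-02-02", "pub_number": "EP-2"},
    {"family_id": "F3", "country": "DE", "application_number": "DE700",
     "pub_date": "2008-05-05", "pub_number": "DE-3"},
    {"family_id": "F3", "country": "FR", "application_number": "FR800",
     "pub_date": "2009-05-05", "pub_number": "FR-3"},
]

corpus = odpi_with_priority(priority_docs, ["US", "JP"])
print(sorted(d["pub_number"] for d in corpus))
# ['FR-3', 'JP-2-B', 'JP-4-A', 'US-1-B2']
```

Each extra country in the list only affects families that none of the earlier countries cover, which is why the benefit of adding more countries tails off quickly.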
An ODPI approach is especially critical when the US is used as the primary country, since divisionals in particular are always separate inventions that get grouped into a single family. Stay tuned for an example of why, in Europe for instance, ODPF works fairly well and ODPI may be overkill.