Why I Don’t Use Simple Families When Counting Patents Either

Update: Since publishing this post, it has been brought to my attention that PatSeer and Relecura provide family information that was not covered originally. Please see the comments for details on the offerings from these organizations.

Thinking back over the past nine months, an argument can be made that the single most important topic in patent analytics is which method should be used for conducting a patent family reduction. Previous posts have looked at the issues associated with using extended families, provided some background on vendor-specific patent families, and proposed a hybrid domestic and extended family method, referred to as One Document Per Invention (ODPI). The method chosen for performing a patent family reduction has far-reaching impacts on the statistics generated during an analysis project, so all possible options associated with this decision should be explored fully. This post will look at the implications of using simple families as a means of conducting a reduction.

Recently, during the WIPO Regional Workshop on Patent Analytics, it was suggested that the ODPI method was essentially the same as using simple families for patent family reductions. By way of definition, the European Patent Office defines a simple patent family as:

All documents having exactly the same priority or combination of priorities belong to one patent family.

 

A representative example is provided:

A Representative Example of an EPO Simple Family


So, in this case Documents D2 and D3 are together in the same simple family, but otherwise the remaining documents reside in other simple families. As discussed previously, all the documents would be encompassed by a single extended family, such as an INPADOC family.
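The two grouping rules can be sketched in a few lines of Python. The priority assignments below (P1, P2, P3) are assumed for illustration, chosen so that D2 and D3 share an identical set of priorities as described above, and the function names are hypothetical.

```python
from collections import defaultdict

# Assumed priority claims for the five documents in the example.
PRIORITIES = {
    "D1": {"P1"},
    "D2": {"P1", "P2"},
    "D3": {"P1", "P2"},
    "D4": {"P2", "P3"},
    "D5": {"P3"},
}


def simple_families(priorities):
    """Group documents whose sets of priorities are exactly identical."""
    groups = defaultdict(list)
    for doc, prios in priorities.items():
        groups[frozenset(prios)].append(doc)
    return sorted(sorted(members) for members in groups.values())


def extended_families(priorities):
    """Group documents linked, directly or indirectly, by any shared priority."""
    families = []  # list of (documents, priorities) pairs, mutually disjoint
    for doc, prios in priorities.items():
        docs, merged = {doc}, set(prios)
        remaining = []
        for fam_docs, fam_prios in families:
            if fam_prios & merged:
                # Shares a priority with the new document: merge the groups.
                docs |= fam_docs
                merged |= fam_prios
            else:
                remaining.append((fam_docs, fam_prios))
        families = remaining + [(docs, merged)]
    return sorted(sorted(docs) for docs, _ in families)


print(simple_families(PRIORITIES))
print(extended_families(PRIORITIES))
```

Under these assumed priorities, the simple-family rule yields four groups, with only D2 and D3 together, while the extended rule places all five documents in one family.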

For a function such as “also published as”, simple families work very well for identifying equivalents, and they can be calculated fairly easily behind the scenes, assuming a database has all the relevant data and that the data has been standardized. Unfortunately, when conducting patent analytics from a practitioner’s point of view, simple families can be difficult to calculate or determine for a variety of reasons.

One of the biggest impediments to working with simple families is the apparent lack of a unique key to identify them. When working with INPADOC families, most database providers have a field for the INPADOC Patent Family Number. When sorting a spreadsheet full of patents, this key can be used to quickly organize a worldwide collection by extended family. The European Patent Office and Espacenet are the primary users of simple families, but when exporting data from this system there does not appear to be a unique identifier associated with this property. It appears that the only way to identify the discrete simple families within an extended family is by looking at the individual members one by one, or by using the Common Citation Document database (CCD), which will provide the number and members of the individual simple families within an extended family if the user inputs a single patent number. Theoretically, information about the simple families can be copied out of CCD and pasted into a table, but this can be quite cumbersome and time consuming.
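Where an export does include standardized priority numbers for each record, one possible workaround is to derive a surrogate key from them. The sketch below assumes a hypothetical export in which each record carries a semicolon-separated priority field. The field layout, sample numbers, and function name are illustrative and not taken from any specific database.

```python
def simple_family_key(priority_field):
    """Build an order-independent key from a semicolon-separated priority list."""
    numbers = {p.strip().upper() for p in priority_field.split(";") if p.strip()}
    return "|".join(sorted(numbers))


# Hypothetical exported records: (publication number, priority field).
records = [
    ("PUB-A", "US 111; US 222"),
    ("PUB-B", "us 222;US 111"),
    ("PUB-C", "US 222"),
]

# Sorting on the derived key places equivalents next to each other,
# much as an INPADOC family number does for extended families.
keyed = sorted((simple_family_key(prio), pub) for pub, prio in records)
for key, pub in keyed:
    print(key, pub)
```

The key is only as good as the priority data behind it: if the same priority is formatted differently across records in ways the normalization does not catch, equivalents will be split.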

Simple families are also difficult to work with, from a practical perspective, since most commercial databases don’t support them. In Thomson Innovation, for instance, patent document collections can be collapsed by INPADOC and DWPI families, or by application number. Collapsing by simple family is not supported. Some might think that collapsing by application number would address this, but this function only works with domestic application numbers that match exactly, so it doesn’t collapse the equivalents from different countries.

So again, from a practical perspective, simple patent families seem difficult to work with from within commercial tools, or when data is exported into a spreadsheet, and the analyst attempts to identify and sort on them. Beyond the practical ability to reasonably conduct a reduction using simple families, when working with a large collection of patent documents, they can also be problematic based on how they organize the content. Using the Aliphcom Up portfolio, as introduced in the post on extended families, these issues can be exemplified.

When this portfolio was last studied, in April 2013, there were 80 worldwide patent documents associated with it. Since then, as of September 27th 2013, there are 97 worldwide documents within the portfolio, all of which are covered in a single extended INPADOC family. Looking at the major patent families associated with this collection the following is found:

INPADOC Families – 1

DWPI Families – 36

FamPat Families – 29

Simple Families – 35

One Document per Invention – 27

Before going any further with the analysis it is important to point out that while it looks as if the DWPI families are aligning with the simple families, and the FamPat families are aligning with the ODPI results, this is simply a coincidence. When the families are looked at individually, between the systems, there is not much overlap between which documents are in which families from case to case. The table below illustrates this with one of the simple families that actually has more overlap between the members, using the other systems, than most do:

 

Simple Family #2 Publication Number | DWPI Family Found Within | FamPat Family Found Within | ODPI Family Found Within
US2012315382 | Family 1 | Family 28 | Family 1
US2012315379 | Family 2 | Family 29 | Family 2
US8529811 | Family 2 | Family 29 | Family 2
CA2810735 | Family 3 | Family 29 | N/A
CA2810714 | Family 4 | Family 29 | N/A
CA2814743 | Family 5 | Family 29 | N/A
AU2012266891 | Family 4 | Family 29 | N/A
AU2012266890 | Family 3 | Family 29 | N/A
AU2012268411 | Family 5 | Family 29 | N/A
WO2012170108 | Family 4 | Family 29 | N/A
WO2012170107 | Family 3 | Family 29 | N/A
WO2012170362 | Family 5 | Family 29 | N/A

 

To summarize, looking at a single simple family covering some of the protective overmolding documents found in the Aliphcom portfolio, we find the same documents distributed over five DWPI families, and over two families each for FamPat and ODPI. In the case of these twelve documents, but not all the documents within the extended family, the FamPat One Document Per Family (ODPF) result would be the same as the ODPI one.
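The split described above can be reproduced from the table. The following is a minimal sketch that transcribes the twelve rows and groups them by family under each system. The data structure and function name are illustrative choices, and None stands in for the N/A entries.

```python
from collections import defaultdict

# Rows from the table: publication -> (DWPI, FamPat, ODPI) family number.
SIMPLE_FAMILY_2 = {
    "US2012315382": (1, 28, 1),
    "US2012315379": (2, 29, 2),
    "US8529811": (2, 29, 2),
    "CA2810735": (3, 29, None),
    "CA2810714": (4, 29, None),
    "CA2814743": (5, 29, None),
    "AU2012266891": (4, 29, None),
    "AU2012266890": (3, 29, None),
    "AU2012268411": (5, 29, None),
    "WO2012170108": (4, 29, None),
    "WO2012170107": (3, 29, None),
    "WO2012170362": (5, 29, None),
}
SYSTEMS = ("DWPI", "FamPat", "ODPI")


def split_by_system(rows):
    """Map each system to {family number: [publications]}, skipping N/A."""
    result = {name: defaultdict(list) for name in SYSTEMS}
    for pub, families in rows.items():
        for name, family in zip(SYSTEMS, families):
            if family is not None:
                result[name][family].append(pub)
    return result


split = split_by_system(SIMPLE_FAMILY_2)
counts = {name: len(groups) for name, groups in split.items()}
print(counts)  # {'DWPI': 5, 'FamPat': 2, 'ODPI': 2}
```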

In the ODPI case, there is an N/A next to the documents from the other countries because, under this reduction method, all of the granted patents and pending applications from the primary country are included, and documents from additional countries are only included if they are part of an additional extended family without a member from the primary country. Since all of these documents are part of the same extended family, the domestic family reduction provides the same result as the ODPI set. An immediate concern with the ODPI method might be that there are three WO documents in the collection while there are only two US documents. In this case, though, a look in US Public PAIR shows that the three WO documents all claim priority to four US applications. The ‘811 grant and the ‘382 application are accounted for in simple family number 2, but priority is also being claimed to US2012313296 and US2012313272, which appear in other simple families. The ‘296 and ‘272 documents are pending applications from the primary country, so they will show up as additional members within the ODPI reduction. For the sake of completeness, ‘296 and ‘272 showed up with the ‘382 application in FamPat Family number 28, so in an ODPF analysis using FamPat they would have been thrown out of the final count.
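As a rough illustration of the reduction logic described above, the sketch below keeps one entry per primary-country application within each extended family, and a single representative for any extended family that has no primary-country member. The record layout, the sample data, the grouping of a grant with its pre-grant publication by application number, and the choice of the first member as representative are all assumptions made for the example, not a formal specification of ODPI.

```python
from collections import defaultdict


def odpi_reduce(records, primary="US"):
    """records: dicts with 'pub', 'country', 'app' and 'ext_family' keys."""
    by_family = defaultdict(list)
    for rec in records:
        by_family[rec["ext_family"]].append(rec)

    kept = []
    for family in by_family.values():
        domestic = [r for r in family if r["country"] == primary]
        if domestic:
            # Assumption: a grant and its pre-grant publication share an
            # application number, so they count as one invention.
            seen = set()
            for r in domestic:
                if r["app"] not in seen:
                    seen.add(r["app"])
                    kept.append(r["pub"])
        else:
            # No primary-country member: keep one representative (assumed
            # here to be the first record encountered).
            kept.append(family[0]["pub"])
    return kept


# Hypothetical records, not taken from the Aliphcom portfolio.
records = [
    {"pub": "US-A1", "country": "US", "app": "US-APP-1", "ext_family": "F1"},
    {"pub": "US-B2", "country": "US", "app": "US-APP-1", "ext_family": "F1"},
    {"pub": "US-C1", "country": "US", "app": "US-APP-2", "ext_family": "F1"},
    {"pub": "WO-D1", "country": "WO", "app": "WO-APP-1", "ext_family": "F1"},
    {"pub": "CN-E1", "country": "CN", "app": "CN-APP-1", "ext_family": "F2"},
    {"pub": "JP-F1", "country": "JP", "app": "JP-APP-1", "ext_family": "F2"},
]
print(odpi_reduce(records))  # ['US-A1', 'US-C1', 'CN-E1']
```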

Another way of looking at this would be to examine all the worldwide documents within this extended family on a particular invention, and see how many documents would still be represented after reduction. The table below provides this analysis for the 19 documents in the Aliphcom portfolio on coating or overmolding associated with the Up product:

 

Worldwide Overmolding Document Number | Simple Family Found Within | DWPI Family Found Within | FamPat Family Found Within | ODPI Family Found Within
AU2012266890A1 | Family 2 | Family 3 | Family 29 | N/A
AU2012266891A1 | Family 2 | Family 4 | Family 29 | N/A
AU2012267464A1 | Family 5 | Family 8 | Family 28 | N/A
AU2012268411A1 | Family 2 | Family 5 | Family 29 | N/A
CA2810714A1 | Family 2 | Family 4 | Family 29 | N/A
CA2810717A1 | Family 5 | Family 8 | Family 28 | N/A
CA2810735A1 | Family 2 | Family 3 | Family 29 | N/A
CA2814743A1 | Family 2 | Family 5 | Family 29 | N/A
CN203004181 | Family 5 | N/A | Family 3 | N/A
CN203004205 | Family 5 | N/A | Family 3 | N/A
US20120313272A1 | Family 22 | Family 6 | Family 28 | Family 3
US20120313296A1 | Family 5 | Family 7 | Family 28 | Family 4
US20120315379A1 | Family 2 | Family 2 | Family 29 | Family 2
US20120315382A1 | Family 2 | Family 1 | Family 28 | Family 1
US8529811B2 | Family 2 | Family 2 | Family 29 | Family 2
WO2012170107A1 | Family 2 | Family 3 | Family 29 | N/A
WO2012170108A1 | Family 2 | Family 4 | Family 29 | N/A
WO2012170362A1 | Family 2 | Family 5 | Family 29 | N/A
WO2012171037A1 | Family 5 | Family 8 | Family 28 | N/A

 

So, to summarize, the following was found when looking at just the coating/overmolding documents:

INPADOC Families – 1

DWPI Families – 8, four US and four WOs (assuming WO and not AU or CA are primary)

FamPat Families – 3, but only two from the US, the third would be one of the CN docs

Simple Families – 3, all US

One Document per Invention – 4, all US

In the case of the INPADOC, simple, and FamPat families, the inventive output, based on the priority documents present, would have been under-represented. The DWPI families would have over-represented the output, and only the domestic method, as incorporated by ODPI, would have produced an accurate inventive output for this one example.
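The per-system counts, and the country of the document that would represent each family, can be tallied directly from the 19-row table. This is a minimal sketch: the rows are transcribed from the table with None for N/A, while the rule for picking a representative (prefer a US member, then a WO member, then the alphabetically first country code) is an assumption made for the example.

```python
from collections import defaultdict

# (publication, simple, DWPI, FamPat, ODPI); None stands for N/A in the table.
ROWS = [
    ("AU2012266890A1", 2, 3, 29, None),
    ("AU2012266891A1", 2, 4, 29, None),
    ("AU2012267464A1", 5, 8, 28, None),
    ("AU2012268411A1", 2, 5, 29, None),
    ("CA2810714A1", 2, 4, 29, None),
    ("CA2810717A1", 5, 8, 28, None),
    ("CA2810735A1", 2, 3, 29, None),
    ("CA2814743A1", 2, 5, 29, None),
    ("CN203004181", 5, None, 3, None),
    ("CN203004205", 5, None, 3, None),
    ("US20120313272A1", 22, 6, 28, 3),
    ("US20120313296A1", 5, 7, 28, 4),
    ("US20120315379A1", 2, 2, 29, 2),
    ("US20120315382A1", 2, 1, 28, 1),
    ("US8529811B2", 2, 2, 29, 2),
    ("WO2012170107A1", 2, 3, 29, None),
    ("WO2012170108A1", 2, 4, 29, None),
    ("WO2012170362A1", 2, 5, 29, None),
    ("WO2012171037A1", 5, 8, 28, None),
]
SYSTEMS = ("Simple", "DWPI", "FamPat", "ODPI")
PREFERRED = ("US", "WO")  # assumed order for picking a representative


def representative_countries(rows):
    """Per system, count families by the country code of their representative."""
    summary = {}
    for col, name in enumerate(SYSTEMS, start=1):
        members = defaultdict(set)
        for row in rows:
            if row[col] is not None:
                members[row[col]].add(row[0][:2])
        tally = defaultdict(int)
        for countries in members.values():
            # First preferred country present, else the alphabetically first.
            pick = next((c for c in PREFERRED if c in countries), min(countries))
            tally[pick] += 1
        summary[name] = dict(tally)
    return summary


summary = representative_countries(ROWS)
for name, tally in summary.items():
    print(name, sum(tally.values()), tally)
```

With that representative rule, the tallies match the list above: three simple families and four ODPI families represented by US documents, eight DWPI families split evenly between US and WO, and three FamPat families of which one is represented by a CN document.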

Having said that, the purpose of this exercise is not to say that one family reduction method is better than another, but it can be clearly seen that in this example each one of these methods would have given a separate collection of patents, and in almost all cases a different statistical value. In this case, using all 97 worldwide documents would not be appropriate, and neither would using an extended family approach, where all of them would be represented by a single document. Absolutely, using any one of the other methods will produce a more accurate result. Pre-processing patent collections for statistical analysis always requires a family reduction step, but it is critical that analysts consider the type of method they will use, and the impact this decision will have on the values generated downstream.

Personally, I still prefer the ODPI method, where the primary country is either the US or is determined by looking at which priority country is most frequently seen within the collection being analyzed. When the primary country is the US, I wondered if there might be an issue with WO documents where the US was the priority country but the WO might be the first filing. Under these circumstances a US publication might not appear until much later, and the PCT application might be the only published document on the pending invention. The posts on the relationship between WO and US filings, and the last one on WO representation in INPADOC families, were written to generate evidence on whether modifications to the ODPI methodology were needed based on this.
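Picking the primary country by frequency is a simple tally. The sketch below assumes that each priority number begins with its two-letter country code. The sample numbers and the function name are hypothetical.

```python
from collections import Counter


def primary_country(priority_numbers, default="US"):
    """Return the most frequent two-letter priority country code."""
    codes = [p.strip()[:2].upper() for p in priority_numbers if p.strip()]
    if not codes:
        return default
    # On a tie, most_common keeps the code that was encountered first.
    return Counter(codes).most_common(1)[0][0]


# Hypothetical priority numbers for a small collection.
sample = ["US 111", "US 222", "JP 333", "US 444", "DE 555"]
print(primary_country(sample))  # US
```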

Referring to the last two posts we can likely conclude the following:

When the US is not the priority country, WO documents will not over- or under-represent inventive output using an extended family reduction method or one of the narrower definitions, including simple, DWPI, and FamPat families. The ODPI method will also not be impacted.

When the US is the priority country, ~5-7% of the time the WO document will be the only publication when a PCT application is filed, but a National Stage application is never filed in the US.

When the US is the priority country, 20% of the time there will be multiple WO documents found in the same extended family. These can also be situations where additional inventive output is not captured if WO documents are not considered beyond a domestic family reduction.

At most 25% of the potential cases might under-represent the inventive output if WO documents are not considered when the US is the primary country used in an ODPI family reduction. The actual percentage is likely much smaller, but based on these observations it is worth taking a few moments to look at the major applicants in a collection and check whether they use PCT applications as the first filing. Experience has shown that in the majority of cases the US publications will still represent an accurate measure of inventive output, but this should be confirmed before generating statistics.
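One way to run this check on an exported collection is to compute, for each applicant, the share of extended families that contain a WO publication but no US publication. The sketch below assumes a hypothetical export reduced to (applicant, extended family ID, country code) tuples. The sample data and function name are illustrative only.

```python
from collections import defaultdict


def wo_only_share(records, primary="US"):
    """records: (applicant, extended_family_id, country_code) tuples.

    Returns {applicant: fraction of its families that contain a WO
    publication but no publication from the primary country}.
    """
    countries = defaultdict(set)
    for applicant, family, country in records:
        countries[(applicant, family)].add(country)

    totals, flagged = defaultdict(int), defaultdict(int)
    for (applicant, _), codes in countries.items():
        totals[applicant] += 1
        if "WO" in codes and primary not in codes:
            flagged[applicant] += 1
    return {a: flagged[a] / totals[a] for a in totals}


# Hypothetical export rows.
records = [
    ("Applicant A", "F1", "US"),
    ("Applicant A", "F1", "WO"),
    ("Applicant A", "F2", "WO"),
    ("Applicant B", "F3", "US"),
    ("Applicant B", "F4", "US"),
]
print(wo_only_share(records))  # {'Applicant A': 0.5, 'Applicant B': 0.0}
```

An applicant with a high share is a candidate for a closer look before relying on US publications alone. This check does not detect multiple WO documents within a family that also has US members.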

This post has looked at the use of simple patent families when performing a family reduction pre-processing step. While simple patent families usually give similar statistical values to what is seen with the ODPI method, and with DWPI and FamPat families, the use of simple families is not recommended, primarily because they are not supported on major commercial platforms and because it is difficult to generate them with data that can be easily exported from most patent search systems. Based on input from the previous two posts, the relationship between US and WO documents and its impact on the ODPI method were also explored. While the impact is not likely to be significant in most cases, it is still prudent to look at major applicants within a collection to ensure they are not using PCT applications as the first filing and waiting until the last moment to file a National Stage application in the US.

Comments 5

  1. Very well explained.

    I guess RELECURA has the ability to collapse patent documents based on simple family across jurisdictions

    1. Hello Shanmukha,

      Thank you for the comment. Relecura does indeed allow users to collapse by equivalents but with the 19 coating/overmolding patents it generated 8 of them. In this case it does not appear that the definition of equivalents on Relecura is quite the same as the simple families on Espacenet since there are three of them associated with these documents.

      Could you, or someone from Relecura help with the formal definition of equivalents on that system?

      Thanks,
      Tony

  2. Tony,

    Excellent analysis as always. ODPI hybrid model sounds interesting but I think flexibility is key I guess. Since you mention limitations of working with Simple Families, I would like to add that in PatSeer one can collapse by 3 types: Application Number, Simple Family and INPADOC Family. I ran up the above set of coating/overmolding records on PatSeer database and came up with 3 Simple Families and 1 Extended Family. Finally to aid researchers in offline exports (and I guess other databases also perhaps do this) the unique family id for _either_ Family definition can be exported along with each individual publication and this solves some of the challenges of Simple Families mentioned.

    Best,
    Manish

    1. Hello Manish,

      Excellent, it is good to hear that PatSeer is supplying the same simple families as EPO, and even better to see that there is a simple family ID, along with an extended family ID associated with each record.

      It has been my experience that most databases don’t provide family IDs, and it would be really useful if they started to do so. It is good to see that PatSeer is supplying this data for users to work with. I am impressed with the number of fields that can be exported from PatSeer overall.

      Thanks,
      Tony
