Fourteen years ago, I was working at Procter & Gamble as a patent searcher. As a large multinational with several nearly discrete business units, the company was being inundated by organizations offering to take large collections of technical documents and help P&G researchers gain insight from these materials. Patent documents were highlighted as a particularly rich source of content, but many companies also featured tools that could analyze corporate documents and non-patent literature as well.
None of the tools worked together, and they were almost exclusively deployed on individual desktops. Client-server tools were beginning to emerge, and the idea of software-as-a-service (the forerunner of cloud computing) was a year or two away. The area was confusing to almost everyone involved. From the user's perspective, they had to know something about content and how it could be manipulated; they needed to know something about software deployment, since most corporate computers were locked down and installing third-party software was prohibited; and a course in computer science might have been nice, since the tools used exotic calculation methods and algorithms that most people had never worked with.
The tools were also generally quite expensive. The open source movement hadn't taken hold yet, and the people who controlled the budgets were almost always different from the people who would actually be using the tools. The marketers and salespeople, as marketers and salespeople do, pitched the managers who controlled the budgets, and while the demos always performed beautifully and generated interesting results, the realities of using the tools in practice were sometimes less than satisfying.
This is not to say that anyone was being dishonest, or that buyers were simply drawn to pretty pictures. Most providers did the best they could to explain a relatively new area to a collection of people with limited experience. Intentions aside, the situation was chaotic and lacked structure.
I got involved at P&G because I had expressed a desire to learn about text mining, and the management of the group I worked for thought my interest made me well suited to lead an effort to explore the analytics area. They also thought it was important to start thinking about these tools at a corporate level, as opposed to the business-unit level, where the sales were being made. This started me on the path to becoming Technical Intelligence Manager for the company and toward my eventual focus on patent analytics.
Since there were so many tools, so many individual methods, and an almost endless set of combinations associated with patent analytics, I thought a framework for conducting analysis projects amid all this variety would help practitioners with their work. I was inspired by the scientific method, which starts with a hypothesis, follows up with experiments, and then analyzes the results to see whether they support the original hypothesis. If the experimental results don't support the hypothesis, then a new one is generated, and the method repeats until reproducible scientific insight is achieved.
Keep in mind that, prior to that point, companies and analysts tended to purchase a single tool and attempt to answer all of their questions with it, whether it was appropriate to do so or not. A former P&G colleague, Kathy Flynn, liked to quote the old saying that "to a man with a hammer, everything looks like a nail" to describe this nearly myopic dependence on a single tool when doing patent analytics. So, with this backdrop, I proposed the Linear Law of Patent Analysis as a framework for performing patent analytics. The steps in the process are:
- Create a toolkit of analysis tools
- Understand the business need and the need behind the need
- The need drives the question
- The question drives the data
- The data drives the tool
This is referred to as a linear law because, in this framework, the steps have to be followed in order to provide the best results. As mentioned, companies and analysts often started with the purchase of a tool, and once that was accomplished, since a significant investment had been made, they would use it exclusively to conduct all of their analysis projects. In the suggested framework, the choice of which tool to use is left as the last step, once all of the other parameters of the analysis have been worked out.
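To make the ordering concrete, here is a minimal sketch of the law as a pipeline in Python. Everything in it is a hypothetical illustration: the toolkit entries, the example business need, and the selection logic are stand-ins for whatever an analyst actually has on hand, not part of the original framework.

```python
# A minimal, hypothetical sketch of the Linear Law as an ordered pipeline.
# Tool names, the business need, and the selection logic are illustrative.

TOOLKIT = {
    "citation_network": "hypothetical citation-analysis tool",
    "text_clustering": "hypothetical semantic/linguistic tool",
}

def derive_question(business_need: str) -> str:
    """The need (and the need behind the need) drives the question."""
    return f"Who is most active, and most cited, in {business_need}?"

def specify_data(question: str) -> dict:
    """The question drives the data: type, origin, and time period."""
    return {"type": "forward citations", "origin": ["US"], "years": (1995, 2002)}

def select_tool(data_spec: dict) -> str:
    """The data drives the tool -- chosen last, never first."""
    key = "citation_network" if "citations" in data_spec["type"] else "text_clustering"
    return TOOLKIT[key]

# The steps only make sense in this order; starting from the tool
# re-creates the hammer-and-nail problem the law is meant to avoid.
need = "thermal barrier coatings"  # hypothetical business need
question = derive_question(need)
data_spec = specify_data(question)
print(select_tool(data_spec))
```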
The law starts with gathering a collection of tools, or a toolkit. There is no one tool that can work with all sorts of data and conduct all types of analysis, so it is important for the analyst to have options. Some projects require semantic or linguistic analysis of text, others require the study of citation patterns and networks, and others still require studying the changes that take place within the text of a patent as it progresses through its life cycle. So, within reason and given budget constraints, a suite of tools should be collected.
The next step speaks to understanding the business requirements that the analysis will satisfy. Under ideal circumstances, the analyst should know precisely what decision a business leader will be making with the analysis provided. They should also have a good idea of the situation the organization finds itself in, why it is an issue, and what a preferred path forward might look like. Analytical results should be presented as a narrative to have the greatest impact on the decision maker, and understanding all of this context allows the analyst to craft their results into a compelling story that drives decision-making.
Only after the needs are thoroughly understood can the analyst start suggesting questions, and potentially hypotheses, to be explored during the course of the project. The questions, at least, can be confirmed with the decision maker, which provides confidence that the analyst understands the needs and is thinking about ways to address them. Depending on the needs, either one or several questions can be addressed.
Now that the questions have been established, experiments can be developed that will either confirm or discredit the hypotheses associated with them. In the case of patent analytics, experiments are designed by considering the data that will be analyzed. The following are just a few of the items to consider when deciding what attributes the data should have (a sketch of such a specification follows the list):
- Type – bibliographic data, claims text, forward or backward citations
- Origin – does it come from the US or other countries?
- Time period – will the analysis incorporate as large a time frame as possible, or should it be limited to a specific period?
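One way to picture this step is as a small data specification that the later tool choice must satisfy. The sketch below is hypothetical; the field names and example values are illustrative only.

```python
from dataclasses import dataclass

# A hypothetical specification for the data behind one analysis
# "experiment". Field names and example values are illustrative.

@dataclass
class PatentDataSpec:
    data_type: str                      # e.g. "bibliographic", "claims text", "forward citations"
    origin: list[str]                   # jurisdictions, e.g. ["US"] or ["US", "EP", "JP"]
    year_range: tuple[int, int] | None  # None = as large a time frame as possible

# An experiment limited to US claims text from a specific window:
spec = PatentDataSpec(data_type="claims text", origin=["US"], year_range=(1995, 2002))
print(spec)
```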
Finally, now that all of the other details have been worked out, a decision can be made on which tool will provide the proper insight into the appropriate data, to either support or dismiss the hypothesis. The right tool is often critical to the success of an analysis, but it must be applied under the proper circumstances to provide critical insight.
I was reminded of an excellent example of this in a recent article in World Patent Information (Integration of Software Tools in Patent Analysis). The authors, Piotr Masiakowski and Sunny Wang, from the pharmaceutical company Sanofi, discussed a custom tool they designed to extract data from a database, massage it to fit their purposes, send it to a first tool for initial processing, and then send it to a second tool for subsequent visualization. While the focus of the article is on the efficiencies in Extract, Transform, and Load (ETL) that Sanofi was able to achieve with their custom system, I read it as an example of meeting a business-critical need by understanding which questions needed to be addressed, which database contained the required content, and then applying the appropriate tools to ensure that the results would be compelling and understood by the decision maker.
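By way of illustration only, a pipeline of that general shape might look like the following sketch. The article does not describe Sanofi's implementation at this level of detail, so every function here is a hypothetical stand-in.

```python
# A hypothetical Extract-Transform-Load (ETL) chain of analysis tools.
# None of these functions correspond to Sanofi's actual system.

def extract(database: str, query: str) -> list[dict]:
    """Pull raw patent records from a source database (stub)."""
    return [{"id": "US1234567", "claims": "..."}]  # placeholder records

def transform(records: list[dict]) -> list[dict]:
    """Massage the raw records into the shape the first tool expects."""
    return [{"doc_id": r["id"], "text": r["claims"]} for r in records]

def first_tool(records: list[dict]) -> list[dict]:
    """Initial processing, e.g. clustering or term extraction (stub)."""
    return records

def second_tool(records: list[dict]) -> None:
    """Final visualization for the decision maker (stub)."""
    print(f"visualizing {len(records)} record(s)")

second_tool(first_tool(transform(extract("hypothetical_db", "coatings"))))
```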
While the field of patent analytics has matured significantly in the last twelve years, I recently heard an analyst share at a conference that they had only just come to the conclusion that focusing on the business needs of the client, as opposed to analyzing data for its own sake, was critical to their success. Frankly, I was a little surprised to hear someone say this in 2013 and would have thought that analysts had already learned this lesson. The danger of using tools or analyzing data out of context was exposed many years ago; analysis should always be steeped in the needs or goals of the business. Hopefully this refresher on an early framework for patent analytics will be a useful reminder to experienced analysts and an important lesson for those who are new to the field.
The Linear Law of Patent Analysis was first published in October 2002 in an article entitled "Patinformatics: Identifying Haystacks from Space," which appeared in Searcher magazine.