Machine Learning in Patent Analytics – Part 1: Clustering, Classification, and Spatial Concept Maps, Oh My!

One of the most polarizing collections of tasks associated with patent analytics is the use of machine learning methods for organizing and prioritizing documents. While these methods have caught on and are used in many industries, adoption in the patent information space has been sporadic. Opponents are concerned about the peculiarities of the language used within patent documents and how these methods can deal with its inherent ambiguities, while proponents see a potential tool for assisting with time-consuming review tasks, even if it isn't fully automatic. Regardless of an individual's perspective on the value of these methods, though, there is little doubt that significant attention is being paid to them. It is in the best interest of all patent practitioners to have a basic understanding of how these methods work and how they are being applied to patents. This post provides some background on machine learning methods and how they apply to the tasks of clustering, classification, and spatial concept maps. It is the first in a series on machine learning methods for patent analytics; additional posts will focus on each task individually and provide practical tips on applying it to the analysis of patent documents.

Wikipedia provides the following definition of machine learning:

Machine learning, a branch of Statistical Learning, is about the construction and study of systems that can learn from data. For example, a machine learning system could be trained on email messages to learn to distinguish between spam and non-spam messages. After learning, it can then be used to classify new email messages into spam and non-spam folders.

The core of machine learning deals with representation and generalization. Representation of data instances and functions evaluated on these instances are part of all machine learning systems. Generalization is the property that the system will perform well on unseen data instances; the conditions under which this can be guaranteed are a key object of study in the subfield of computational learning theory also referred to as statistical learning theory.

Continuing with some additional definitions, the terms clustering and classification are often used interchangeably but are actually quite different from one another. Clustering is normally associated with unsupervised methods of organizing document collections based on a similarity comparison between each member. With a fixed number of clusters identified at the outset, documents that meet a similarity threshold are grouped together. Ideally, the documents within a cluster should be similar to one another but dissimilar to documents in the other clusters.

Classification, on the other hand, is usually accomplished with a supervised machine learning method that uses "learning sets" to identify key attributes of documents in a category. The learning sets are small sub-collections, one for each category, generated by the analyst, who decides which test documents should appear in each class. New documents are compared to the learning collections and assigned to a class based on their similarity to the documents already in that category.
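
To make the "learning set" idea concrete, here is a minimal sketch (not taken from any particular tool) of similarity-based assignment: documents are reduced to word-count vectors and a new document goes to the category whose learning-set documents it most resembles. The sample documents and category names are made up for illustration.

```python
from collections import Counter
import math

def vectorize(doc):
    """Bag-of-words vector: term -> count."""
    return Counter(doc.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(doc, learning_sets):
    """Assign doc to the category whose learning-set documents
    it most resembles, by mean cosine similarity."""
    vec = vectorize(doc)
    scores = {
        label: sum(cosine(vec, vectorize(d)) for d in docs) / len(docs)
        for label, docs in learning_sets.items()
    }
    return max(scores, key=scores.get)

# Hypothetical learning sets chosen by an analyst, one per category.
learning_sets = {
    "battery": ["lithium ion battery electrode", "battery cell anode cathode"],
    "optics":  ["laser beam optical lens", "optical fiber light waveguide"],
}
print(classify("a lithium battery with improved anode", learning_sets))  # battery
```

Real patent tools use far richer document representations, but the assign-by-similarity-to-examples step is the same in spirit.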

Spatial concept mapping is related to clustering and classification, since it generally begins with one of those methods, but it adds an extra component to the task: identifying the relative similarity between the categories created. The tools involved take the document clusters or classes and arrange them in two-dimensional space according to the similarity of the documents, or clusters, relative to one another over the entire collection. Documents that share elements in common are placed closer together spatially, while less similar ones are placed further apart.
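
The "similar items end up close together" layout can be sketched with a simple stress-minimization loop, a rough stand-in for what the commercial mapping tools do. The pairwise similarities below are invented, and the target distance of `1 - similarity` is just one convenient choice.

```python
import math
import random

def layout_2d(sim, iters=500, seed=0):
    """Place items in 2-D so that similar pairs end up close together.
    sim maps (item_a, item_b) -> similarity in [0, 1]; the target
    distance for a pair is 1 - similarity, and positions are nudged
    toward those targets by small gradient steps."""
    rng = random.Random(seed)
    names = sorted({a for pair in sim for a in pair})
    pos = {n: [rng.random(), rng.random()] for n in names}
    for t in range(iters):
        step = 0.1 * (1 - t / iters)          # decaying step size
        for (a, b), s in sim.items():
            target = 1.0 - s
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            d = math.hypot(dx, dy) or 1e-9
            # Pull the pair together (or push apart) toward the target.
            f = step * (d - target) / d
            pos[a][0] += f * dx; pos[a][1] += f * dy
            pos[b][0] -= f * dx; pos[b][1] -= f * dy
    return pos

# Hypothetical similarities: A and B alike, C unlike both.
sim = {("A", "B"): 0.9, ("A", "C"): 0.1, ("B", "C"): 0.1}
pos = layout_2d(sim)
dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
print(dist(pos["A"], pos["B"]) < dist(pos["A"], pos["C"]))  # True
```

Production tools add refinements (repulsion terms, annealing noise, cluster-level layout before document-level layout), but the objective is the same: spatial distance should mirror dissimilarity.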

Now that the tasks associated with machine learning methods have been identified, let's look at some of the algorithms used to perform them. Knowing a little about these will help analysts understand and evaluate the tools they decide to use.

When it comes to clustering, the unsupervised machine learning task, the two algorithms most often used in patent analysis tools are k-means and force-directed placement:

  • K-means – a method of cluster analysis, which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k sets (k ≤ n) S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS).
  • Force Directed Placement – At the most basic level the algorithm tries to place similar objects close together and dissimilar objects far apart. The process is achieved by moving the objects randomly around the solution space via a technique similar to ‘simulated annealing’. The criterion for moving a node is the minimization of energy.

Both methods are unsupervised, so they are referred to as clustering, but they take very different approaches to grouping documents into categories. K-means creates a fixed number of clusters and assigns each new document to the cluster whose existing documents it most resembles. Force-directed placement doesn't generate clusters per se, but looks for a "local" energy minimum where additional perturbation would increase the tension in the collection. Chemists can relate to this method, since it resembles the electrostatic and steric forces that lead to the most favored conformations in small-molecule 3-D modeling and protein folding.

Readers interested in additional details, or the math behind the operations, are encouraged to explore the link provided for each algorithm.

Moving to classification, the supervised machine learning task, two frequently applied algorithms are Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs):

  • Artificial Neural Networks – In computer science and related fields, artificial neural networks are models inspired by animal central nervous systems (in particular the brain) that are capable of machine learning and pattern recognition. They are usually presented as systems of interconnected “neurons” that can compute values from inputs by feeding information through the network.
  • Support Vector Machines – supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes forms the output, making it a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one category or the other.
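
To illustrate the binary-classifier idea, here is a minimal linear SVM trained with the Pegasos sub-gradient method on made-up, linearly separable data (no bias term, for simplicity). This is a sketch of the technique, not the implementation inside any patent tool.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=300, seed=0):
    """Linear SVM (no bias term) trained with the Pegasos
    sub-gradient method.  X: list of feature lists; y in {-1, +1}."""
    rng = random.Random(seed)
    w, t = [0.0] * len(X[0]), 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # random order
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Always shrink w (regularization); also step toward the
            # example if it violates the margin (hinge-loss sub-gradient).
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    """Which side of the separating hyperplane x falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy data: class +1 near (2, 2), class -1 near (-2, -2).
X = [[2, 2], [2.5, 1.8], [1.7, 2.2], [-2, -2], [-1.8, -2.4], [-2.2, -1.9]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

In document classification the feature vectors would be term weights rather than 2-D points, and multi-class problems are handled by combining several binary classifiers.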

As applied to patent analytics, the most frequently used sources of content for both clustering and classification exercises are patent classification codes and raw or standardized text from the source documents.

Looking at spatial concept maps, the FAQ section for the IN-SPIRE tool (a close cousin of ThemeScape; both were originally developed at Pacific Northwest National Laboratory) provides the following explanation of the process used to create spatial maps, starting with a clustering step:

In brief, IN-SPIRE™ creates mathematical representations of the documents, which are then organized into clusters and visualized into “maps” that can be interrogated for analysis.

More specifically, IN-SPIRE™ performs the following steps:

  1. The text engine scans through the document collection and automatically determines the distinguishing words or “topics” within the collection, based upon statistical measurements of word distribution, frequency, and co-occurrence with other words. Distinguishing words are those that help describe how each document in the dataset is different from any other document. For example, the word “and” would not be considered a distinguishing word, because it is expected to occur frequently in every document. In a dataset where every document mentions Iraq, “Iraq” wouldn’t be a distinguishing word either.
  2. The text engine uses these distinguishing words to create a mathematical signature for each document in the collection. Then it does a rough similarity comparison of all the signatures to create cluster groupings.
  3. IN-SPIRE™ compares the clusters against each other for similarity, and arranges them in high-dimensional space (about 200 axes) so that similar clusters are located close together. The clusters can be thought of as a mass of bubbles, but in 200-dimensional space instead of just 3.
  4. That high-dimensional arrangement of clusters is then flattened down to a comprehensible 2-dimensions—trying to preserve a picture where similar clusters are located close to each other, and dissimilar clusters are located far apart. Finally, the documents are added to the picture by arranging each within the invisible “bubble” of their respective cluster.
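
Steps 1 and 2 above can be sketched in miniature with toy documents, using a TF-IDF-style weighting as a stand-in for IN-SPIRE's statistical measurements (the actual measures are proprietary; everything below is illustrative):

```python
import math
from collections import Counter

# Four toy "documents"; "device" appears in all of them.
docs = {
    "d1": "lithium battery electrode device improves battery capacity",
    "d2": "battery anode device material with lithium cells",
    "d3": "optical lens device assembly focuses laser light",
    "d4": "laser light device guided through optical fiber",
}

# Step 1: distinguishing words -- drop terms that occur in every
# document (like "and", or "Iraq" in the FAQ's example).
tokenized = {k: v.lower().split() for k, v in docs.items()}
doc_freq = Counter(t for toks in tokenized.values() for t in set(toks))
n = len(docs)
vocab = sorted(t for t, df in doc_freq.items() if df < n)

# Step 2: signature = TF-IDF-weighted vector over distinguishing words.
def signature(tokens):
    tf = Counter(tokens)
    return [tf[t] * math.log(n / doc_freq[t]) for t in vocab]

sigs = {k: signature(toks) for k, toks in tokenized.items()}

# Rough similarity comparison of signatures; the clustering and 2-D
# layout of steps 3-4 would build on this similarity structure.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

print(cosine(sigs["d1"], sigs["d2"]) > cosine(sigs["d1"], sigs["d3"]))  # True
```

The two battery documents end up with more similar signatures than a battery document and an optics document, which is exactly the structure the clustering and map layout steps exploit.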

Spatial concept maps can also be made using neural network methods. Arguably, the most famous of these is the Kohonen Self-Organizing Map (SOM):

Kohonen Self Organizing Maps – a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps are different from other artificial neural networks in the sense that they use a neighborhood function to preserve the topological properties of the input space.
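
A tiny SOM in plain Python shows the mechanics the definition describes: each grid node holds a weight vector, and the best-matching node plus its grid neighbours are nudged toward each sample, so the flat grid gradually mirrors the structure of the input space. The 2-D sample data is invented; real tools would feed in high-dimensional document vectors.

```python
import math
import random

def train_som(data, grid_w=4, grid_h=4, epochs=400, seed=0):
    """Minimal 4x4 self-organizing map with a Gaussian
    neighbourhood function and linearly decaying parameters."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = {(i, j): [rng.random() for _ in range(dim)]
             for i in range(grid_w) for j in range(grid_h)}
    radius0, lr0 = max(grid_w, grid_h) / 2.0, 0.5
    for t in range(epochs):
        frac = t / epochs
        radius = radius0 * (1 - frac) + 0.5 * frac   # shrinking neighbourhood
        lr = lr0 * (1 - frac) + 0.01 * frac          # decaying learning rate
        x = rng.choice(data)
        # Best-matching unit: node whose weights are closest to x.
        bmu = min(nodes, key=lambda p: math.dist(nodes[p], x))
        for p, w in nodes.items():
            d = math.dist(p, bmu)                    # distance on the grid
            if d <= radius:
                h = math.exp(-d * d / (2 * radius * radius))
                nodes[p] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return nodes

# Two well-separated groups; after training, their best-matching
# units should land on different parts of the grid.
data = [(0.1, 0.1), (0.15, 0.05), (0.9, 0.9), (0.85, 0.95)]
nodes = train_som(data)
bmu_a = min(nodes, key=lambda p: math.dist(nodes[p], (0.1, 0.1)))
bmu_b = min(nodes, key=lambda p: math.dist(nodes[p], (0.9, 0.9)))
print(bmu_a != bmu_b)  # True
```

The neighbourhood function is what preserves topology: because neighbours move together, inputs that are similar end up mapped to nearby grid positions, which is why SOMs produce readable two-dimensional maps.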

Machine learning methods provide organizational and prioritization functions that can be applied to patent documents and, if used properly, can deliver great value to analysts. This post has provided an introduction to the tasks associated with machine learning methods in patent analytics and distinguished them from one another. In future posts, each of the three primary tasks, clustering, classification, and spatial concept maps, will be covered in detail, using tools designed for the analysis of patent documents, by way of a relevant case study.

Comments (10)

    1. Hello Alfred,

      Thank you for the comment. The next post will be on binary classification with a Support Vector Machine.

      Have a nice weekend,
      Tony

  1. Hi Tony,
    This is a good summary of two of the methods for clustering and classifying patents.

    There are three techniques that we use to determine patent “similarity”. The first is very close to your K-means approach. We first determine vectors between patents within a set based on the number of semantically important keywords in common with a target patent. We then take this n-dimensional set and do a dimensional reduction to a 3-dimensional cluster, which allows visualization of the directional similarity between patents; it gives meaning to a patent that is “out in left field”, which can be good or bad, depending on your perspective. The second method is to run a 2-generation forward/reverse citation analysis around the target patent and group patents based on the number of citations in common. The third method is to look at the US classes and group similar patents based on the number of US classes in common beyond the main class. I’d like to know your thoughts and experiences with these methods.
    Best regards,
    Jim Adams
    Chief Technology Officer
    TAEUS International

    1. Hello Jim,

      Thank you for the comment and I am happy that you liked the post.

      Regarding the methods you referred to, in the n-space instance this sounds a lot like what ThemeScape and many of the other spatial content map solutions do. I think there is a lot of value in this method, and to tease the post I will do on this task, I think the choice of terms that comprise the vector is the real key to making this work well.

      I also like the citation approach and would love to see more organizations look at citations from a network analysis perspective. Amberscope does this as well as the Assignee Citation Network tool provided in Orbit.com. A great deal can be learned with this method.

      Finally, a co-occurrence analysis using US classes works nicely in some circumstances and not so well in others, based on my experience. It mostly depends on how granular the codes are in the technology of interest and how they are applied by the patent offices associated with the individual documents that are being organized. I will say more about classification codes in the clustering post.

      Thanks again for the great question,
      Tony

  2. Hi Tony, I like your blog post about Machine Learning in Patent Analytics. It’s not just your introduction to these methods; it’s also the fact that this is a big opportunity for IP analytics departments to make machine learning a valuable instrument in their daily IP discovery work and increase the value they deliver to their – internal – customers.

    Having worked with various types of IP-intensive organisations in Pharma, High Tech and IP service providers ourselves, we’ve experienced that – if embedded in the portfolio of IP discovery methods – machine learning technologies will deliver the strongest ROI (in time, in costs and even in increased accuracy compared to traditional – human – search). Look forward to your second blog!

    Jeroen Kleinhoven
    CEO Treparel.com

    1. Hello Jeroen,

      I appreciate the comment and agree with you completely about the potential ROI for these methods.

      I will be covering binary classification in the next post in this series.

      Thanks,
      Tony

    1. Hey Kurt,

      Thanks for the compliment. I will certainly be looking at some of the CAS tools during the course of this series.

      Thanks again,
      Tony
