Those of us in the United States will be celebrating Thanksgiving this week, and part of the festivities will be the overconsumption of a certain tryptophan-laden, flightless bird and plenty of starchy tubers of one sort or another, with lots of butter and assorted accoutrements. Those responsible for preparing the aforementioned feasts will be toiling away over a hot stove for many hours in anticipation of the delicacies to be enjoyed with family and friends, and many of them will be referring to a weathered copy of the Better Homes and Gardens Cookbook to assist with the preparations.
Being a synthetic organic chemist by training, I like to consider myself a pretty good cook, since it has been said that any chemist worth their salt can cook. In chemistry, as in the culinary arts, there is a certain amount of artistry and creativity in developing novel creations, but to learn the basics, new practitioners follow a recipe. Now mind you, chemists don't talk about recipes when describing their work or communicating it to their peers. No, chemists talk about experimental procedures and synthetic methods when explaining how to reproduce their efforts, but whatever they're called, these are still recipes. Reagents (ingredients) are added in measured quantities, in a specified order, and under certain conditions; no matter how complicated the procedure, chemists write instructions for other chemists to follow in order to obtain the same results they did.
Lately, I have noticed that certain groups in the analytics world are starting to follow a similar pattern, using recipes and cookbooks to teach newcomers to the field how to produce interesting and useful work. For instance, I recently saw the following description of a new data science book entitled IBM SPSS Modeler Cookbook:
[A] new book by leading analytics professionals shows how to get the most from IBM SPSS Modeler, using detailed step-by-step examples to help you build the models you can deploy in your business.
Go beyond mere insight and build models that you can deploy in the day-to-day running of your business
Save time and effort while getting more value from your data than ever before
Loaded with detailed step-by-step examples that show you exactly how it’s done by the best in the business
Similarly, OpenRefine users can turn to a page of useful recipes for achieving certain tasks in the tool:
This page collects OpenRefine recipes, small workflows and code fragments that show you how to achieve specific things with OpenRefine.
What follows are instructions for accomplishing many useful tasks in OpenRefine, including:
- Trim whitespace from beginning and end of values
- Titlecase that works on hyphenated names
- Pad with leading zeroes
- Remove duplicate rows when exact values are found in a column
And so on. There is also a book for learning OpenRefine entitled, aptly enough, Using OpenRefine, which likewise uses the cookbook analogy to explain how to use the tool.
Using OpenRefine takes you on a practical tour of all the handy features of this well-known data transformation tool. It is a hands-on recipe book that teaches you data techniques by example. Starting from the basics, it gradually transforms you into an OpenRefine expert.
The book is styled on a Cookbook, containing recipes – combined with free datasets – which will turn readers into proficient OpenRefine users in the fastest possible way.
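To give a flavor of what the four OpenRefine recipes listed above actually do to the data, here is a plain Python sketch of each transformation. It is not a reproduction of the recipes themselves, which are written with OpenRefine's own expression language and menu operations; the function names and the list-of-dicts row layout are hypothetical, chosen only for illustration. The duplicate-removal function in particular mirrors only the end result (keep the first row for each exact value), not the steps OpenRefine uses to get there.

```python
# Hypothetical plain-Python equivalents of four OpenRefine recipes.
# Rows are assumed to be dicts mapping column names to string values.


def trim(value: str) -> str:
    """Remove whitespace from the beginning and end of a value."""
    return value.strip()


def titlecase_hyphenated(value: str) -> str:
    """Title-case each word, also capitalizing each part of a hyphenated name."""
    words = []
    for word in value.split(" "):
        # Capitalize the first letter of every hyphen-separated part.
        parts = [p[:1].upper() + p[1:].lower() for p in word.split("-")]
        words.append("-".join(parts))
    return " ".join(words)


def pad_zeros(value: str, width: int) -> str:
    """Left-pad a value with zeroes up to the given width."""
    return value.rjust(width, "0")


def drop_duplicate_rows(rows: list[dict], column: str) -> list[dict]:
    """Keep only the first row seen for each exact value in `column`."""
    seen = set()
    kept = []
    for row in rows:
        key = row[column]
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


if __name__ == "__main__":
    print(trim("  Smith  "))
    print(titlecase_hyphenated("jean-luc PICARD"))
    print(pad_zeros("42", 5))
    rows = [{"id": "001"}, {"id": "001"}, {"id": "002"}]
    print(drop_duplicate_rows(rows, "id"))
```

Like a kitchen recipe, each function does one small, well-defined job, so they can be combined in whatever order a particular cleanup calls for.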
In the world of patent analytics, I was recently introduced to our own cookbook for creating tasty analyses and visualizations. Developed by BizInt Solutions in collaboration with Search Technology, the BizInt Smart Charts team's publication, "Cookbook of Reports and Visualizations Created with the BizInt Smart Charts Product Family," contains several recipes specific to patent analysis.
One of the recipes included is called Filtering the Patent Family by Authorities [Patents]:
In this example, we’ve used VantagePoint – BizInt Smart Charts Edition to filter the Patent Family subtable to include only US, EP, WO patents. We’ve also filtered the patent numbers in the Patent Family to create a new column with only Chinese patents.
The corresponding recipe walks step by step through accomplishing this task with patent data from a variety of sources, using the Combine feature to create a single chart. The process involves running the Identify Common Patent Family name tool, sorting by the Common Patent Family field, and displaying the "Patent Family" column in BizInt Smart Charts, then transferring the resulting file to VantagePoint – BizInt Smart Charts Edition (VP-SCE).
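BizInt Smart Charts and VP-SCE are point-and-click tools, so the recipe itself involves no code. For readers who want to see the underlying filtering logic, here is a minimal Python sketch under stated assumptions: the Patent Family field is assumed to be a semicolon-separated string of publication numbers, each beginning with a two-letter authority code (US, EP, WO, CN, and so on). The function names and the "CN Family" column name are hypothetical, and this is not how BizInt's software implements the feature.

```python
# Hypothetical sketch of filtering a patent family by authority.
# ASSUMPTION: "Patent Family" holds semicolon-separated publication numbers,
# each starting with a two-letter authority code such as US, EP, WO, or CN.

KEEP_AUTHORITIES = {"US", "EP", "WO"}


def split_family(field: str) -> list[str]:
    """Split a semicolon-separated family field into clean publication numbers."""
    return [n.strip() for n in field.split(";") if n.strip()]


def filter_by_authority(field: str, authorities: set[str]) -> str:
    """Return only the family members whose authority code is in `authorities`."""
    kept = [n for n in split_family(field) if n[:2].upper() in authorities]
    return "; ".join(kept)


def apply_recipe(record: dict) -> dict:
    """Filter the family to US/EP/WO and add a new column of CN members only."""
    family = record["Patent Family"]
    result = dict(record)  # copy so the original record is left untouched
    result["Patent Family"] = filter_by_authority(family, KEEP_AUTHORITIES)
    result["CN Family"] = filter_by_authority(family, {"CN"})  # hypothetical column
    return result


if __name__ == "__main__":
    # Placeholder publication numbers, invented for illustration only.
    record = {"Patent Family": "US9876543B2; EP1234567A1; CN102345678A; WO2015012345A1; JP2016123456A"}
    print(apply_recipe(record))
```

Both new values are computed from the original family string, so building the Chinese-only column does not depend on the US/EP/WO filter having run first.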
The full recipe, and the entire cookbook, can be downloaded from the BizInt Solutions website. As an added bonus, several recipes are also available for working with drug pipelines, so those of you in the pharmaceutical field have even more to look forward to in this publication.
Recipes and cookbooks provide a powerful analogy for teaching complicated subjects to people entering a new field, or to experienced practitioners looking to expand the techniques at their disposal. Having a proven procedure to refer to while working through a detailed example is always helpful, even when the recipe is not followed exactly but is customized to fit a specific task. The use of recipes and cookbooks as a teaching tool in data science, analysis, and visualization has been accelerating over the past year, and there are now several excellent examples of how analysts can use them to communicate their techniques to their peers. In particular, the cookbook from the people at BizInt Solutions provides a nice example of how complicated step-by-step instructions can be shared this way.
In closing, I would like to wish all of our US readers and their families a Happy Thanksgiving! If you think about it, save me a piece of pie (pumpkin is my favorite this time of year).