To know or not to know, that is the question

Over the past few months, we held a number of interesting workshops with the Data Mining experts from Bosch Corporate Research. We discussed how data-driven and rule-based approaches complement each other. You will find some of our considerations below.

Admittedly, the starting point of our discussion (i.e. ideas that I came up with some years ago) was rather fragmentary. I am a Computational Linguist, and people in my field are quite comfortable with data-driven methods. So when I started to work with Bosch Software Innovations, I was honestly surprised to find that people would model a classifier by hand when they could run a Data Mining toolkit to train one automatically. After a lot of thinking and discussing, I answered the question “When to use what approach?” back then as follows: Business Rules are best suited for explicit knowledge, while Data Mining is best suited for implicit knowledge. Pretty straightforward, isn’t it? Alas, that’s only half the story.

Biking: Implicit Knowledge

The distinction between explicit and implicit knowledge was established in the 1960s by M. Polanyi. It means that some knowledge can be encoded in the form of, for instance, texts, figures, or formulas and can thus be communicated, while some knowledge is tacit, that is to say, it is part of a person’s skills but cannot be communicated. Have a look at the pictures: How to cook a certain meal can be communicated with a recipe or a sequence of photos. It is thus a very good example of explicit knowledge. In contrast, although many people know how to ride a bike, they cannot explain how they do it. Hence, biking is a very good example of implicit knowledge.

But why is this distinction not enough? Let’s have a look at some more examples: You know your mother tongue, don’t you? You can most probably even read and write. OK, write a rule model that makes a computer understand and produce human language. Impossible? Waste of time? Not your task, but that of a linguist? Well, linguists (and computer scientists) have been trying for as long as there have been computers. The result, as you know, is still pretty disappointing. The faculty of speech thus falls under the notion of implicit knowledge, like the biking example. In addition, language seems to be an observable but not yet fully explained mechanism. There are lots of such mechanisms in the world out there: the eruption of a volcano and the weather, many mechanisms in the human body (think of genetic predisposition and the like), most processes of cognition and creativity, etc. If you try to model such mechanisms in terms of rules, you will soon fall into despair (or you will have found your life’s work).

Here is another example: Let us suppose that, in a certain country, a traffic regulation generally allows you to travel at up to x km/h in town (with x = 50 km/h). Now, try to work out x merely on the basis of a set of examples, i.e. with a data-driven approach. For sure, most of your examples won’t keep to the speed limit. Some examples were perhaps recorded in a traffic-calmed area with a special speed limit of 30 km/h, some on an urban highway with a speed limit of 120 km/h. And even if the data suggests that x = 58 km/h, this won’t help you much when you are caught by a speed camera. The same holds not only for most regulations and laws but also for scientific or other basic principles. For instance, if you want to automatically learn the shape of a room, let the learner assume that there are right angles; otherwise the outcome may look like a Hundertwasser house. Hence, if you try to learn such things with a data-driven approach, you will waste your time and money, and what’s more, you will probably be off the mark (or you will create something artistically interesting).
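To make the speed limit example a bit more tangible, here is a minimal Python sketch. The observations are made up for illustration, and taking the median as the "learned" limit is just one naive assumption about what a data-driven approach might do; the names are hypothetical as well.

```python
from statistics import median

# The explicit rule, taken straight from the (hypothetical) traffic regulation.
SPEED_LIMIT_IN_TOWN_KMH = 50

# Made-up speed observations in km/h: mostly in-town traffic, plus a few
# readings from a 30 km/h zone and from an urban highway.
OBSERVED_SPEEDS_KMH = [48, 52, 55, 58, 61, 63, 57, 59, 28, 31, 110, 118, 56, 60, 62]


def estimate_limit_from_data(speeds):
    """Guess the speed limit as the median observed speed (a naive choice)."""
    return median(speeds)


def is_speeding(speed_kmh, limit_kmh):
    """Return True if the speed exceeds the given limit."""
    return speed_kmh > limit_kmh


learned_limit = estimate_limit_from_data(OBSERVED_SPEEDS_KMH)
print("learned limit:", learned_limit)
print("55 km/h vs. learned limit:", is_speeding(55, learned_limit))
print("55 km/h vs. regulation:", is_speeding(55, SPEED_LIMIT_IN_TOWN_KMH))
```

With this toy data, the learned limit comes out above the regulation, so a driver at 55 km/h looks fine to the learner but not to the speed camera.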

Finally, we need to address the psychological aspect of the whole story: Let us assume that you have taught yourself to speak Vietnamese. Let us further assume that you only used a grammar and a dictionary; you have never listened to a native speaker or read a text (e.g. a newspaper article or a book) written by one. All the same, since you are a strong autodidact, you are completely convinced that you are very good at it. A native speaker, however, would find that you make (systematic) grammatical errors and that your Vietnamese sounds unidiomatic (i.e. unnatural, sometimes even weird). In short, regardless of whether your task involves (1) implicit knowledge, (2) explicit knowledge, (3) unexplained mechanisms, or (4) laws/principles, you need to check if there are “unknown unknowns” (which is admittedly difficult, though not impossible).

Here is an authentic example: Our colleagues from Bosch Corporate Research were asked to improve the performance of a rule-based system that predicts courses of disease. Although the individual rules were derived from scientific studies, the system’s overall performance was not much better than chance. The Data Mining experts implemented a data-driven algorithm that had been published only very recently and were thus able to improve the performance significantly by selecting the most predictive rules. The medical experts, however, were astonished to find that someone with little to no domain knowledge was able to improve their system. In other words: They simply did not know that they did not know.
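The algorithm our colleagues used is not described here, so the following Python sketch is only a stand-in for the general idea of letting data pick the most predictive rules. It simply scores each hand-written rule by its accuracy on labeled examples and keeps those above a threshold. The rules, features, data, and threshold are all hypothetical.

```python
def rule_accuracy(rule, examples):
    """Fraction of labeled examples for which the rule predicts the label."""
    hits = sum(1 for features, label in examples if rule(features) == label)
    return hits / len(examples)


def select_rules(rules, examples, min_accuracy=0.6):
    """Keep only the rules whose accuracy reaches the (assumed) threshold."""
    scores = {name: rule_accuracy(rule, examples) for name, rule in rules.items()}
    return {name: acc for name, acc in scores.items() if acc >= min_accuracy}


# Hypothetical expert rules: each maps a patient's features to a prediction.
RULES = {
    "high_marker": lambda f: f["marker"] > 5,
    "older_than_60": lambda f: f["age"] > 60,
}

# Made-up labeled examples: (features, did the condition progress?)
EXAMPLES = [
    ({"marker": 7, "age": 70}, True),
    ({"marker": 8, "age": 40}, True),
    ({"marker": 2, "age": 65}, False),
    ({"marker": 3, "age": 30}, False),
    ({"marker": 6, "age": 55}, True),
    ({"marker": 1, "age": 72}, False),
]

selected = select_rules(RULES, EXAMPLES)
print(selected)
```

On this toy data, only the marker rule survives; the age rule, however plausible it may sound to an expert, does worse than chance.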

What this all amounts to is that there are only two (more or less) clear-cut cases:

  • If you have regulations, laws, basic or scientific principles, which implies that there are no unknown unknowns or that the latter do not matter much, you should go for a rule-based approach.
  • If you have observable but not yet fully explained mechanisms, which implies that there are known and unknown unknowns, you should go for a data-driven approach.
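Since the two cases above are themselves explicit knowledge, they can be written down as a tiny rule model. This is only a sketch of the rule of thumb; the function and parameter names are hypothetical, and the two yes/no questions are a deliberate simplification.

```python
def choose_approach(has_codified_rules: bool, has_unexplained_mechanism: bool) -> str:
    """Apply the rule of thumb from the two clear-cut cases."""
    # Regulations, laws, or principles and no unexplained mechanism involved.
    if has_codified_rules and not has_unexplained_mechanism:
        return "rule-based"
    # An observable but unexplained mechanism and nothing codified to rely on.
    if has_unexplained_mechanism and not has_codified_rules:
        return "data-driven"
    # Everything else is not clear-cut.
    return "combination"


print(choose_approach(True, False))   # e.g. a speed limit
print(choose_approach(False, True))   # e.g. the weather
print(choose_approach(True, True))    # e.g. the disease prediction system
```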

In any other case, you will most probably need a combination of both approaches. Incidentally, this will be the topic of one of my next posts – so stay tuned.

Last but not least, I would very much like to know if you can think of further criteria that help you decide when to use what approach. What do you think?

 

About the author

Irene Cramer

Irene Cramer is chief product owner of the data analytics cloud services that are part of the Bosch IoT endeavor. For more than five years, she has been part of interdisciplinary teams that foster the application of data-driven methods within Bosch. Previously, she worked as a research engineer on two AI projects within a large network of academic and industrial partners. Irene holds a Master’s degree and a PhD in Computational Linguistics and has published various papers and monographs on language technology, data mining, and business rules.