Blog

We’re a lot like everyone else until you get to know us; then we’re like nothing else you’ve seen before.

posted Sunday, August 10, 2014 at 7:05 pm

Our competitors use basic solutions to ingest data, score it, and index it for retrieval and analysis. They generally don’t have sophisticated methods to clean and disambiguate the data, their sentiment accuracy and recall are generally poor, and there is no other company that I am aware of that can create custom detection models on the fly and re-score ALL data in real time at the direction of the user.

Most companies leverage the same handful of solutions to analyze text. These solutions range from natural language processing (NLP) to machine learning, and most of them depend on dictionaries, taxonomies or ontologies. Think of NLP and other dictionary, taxonomy and ontology approaches as ones in which you predefine everything you are looking for; therefore, if you have not defined it, you will not find it. A number of NLP toolkits on the market can be licensed, most notably OpenNLP from the Apache Software Foundation; Lexalytics is another text analytics solution that is often embedded in products like DataSift, Lithium, Radian6 and others.
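To make the “if you have not defined it, you will not find it” point concrete, here is a minimal, hypothetical sketch of a dictionary-based scorer; the lexicon and example post are invented for illustration and are not drawn from any vendor’s product:

```python
# Hypothetical dictionary-based scorer: if a word is not in the lexicon, it is invisible.
SENTIMENT_LEXICON = {"love": 1, "great": 1, "hate": -1, "terrible": -1}

def dictionary_score(text):
    """Sum the scores of any lexicon words found in the post."""
    words = text.lower().split()
    hits = [SENTIMENT_LEXICON[w] for w in words if w in SENTIMENT_LEXICON]
    return sum(hits), len(hits)

score, hit_count = dictionary_score("the battery life is a dealbreaker for me")
print(score, hit_count)  # 0 0 -- "dealbreaker" was never defined, so nothing is found
```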

There is also a growing number of vendors that use machine learning techniques primarily based on volume metrics, like co-occurrences of language features, that generally tell you the obvious, with little regard for the weaker signals, which are often more valuable. Yet other companies, like Autonomy and Clarabridge, have sophisticated and complex approaches, but they require very expensive client services models due to the complexity of bringing their solutions to life.

Initially, the competition was only talking about sentiment: thumbs-up and thumbs-down. Now companies have caught on to what we are doing with emotions, and everyone is jumping on the bandwagon. The problem is that they are trying to derive emotions using inferior solutions like OpenNLP or Lexalytics. It is simply impossible to measure a person’s emotions, state of mind or persona with keywords or basic machine learning approaches and still produce granular, accurate and actionable insights.

Most competitive solutions are also expensive from a processing perspective, so data scientists and developers game the system and reduce complexity by dumbing the process down: they eliminate stop-words, perform stemming, perform lemmatization and do many other similar tasks to keep things simple (definitions below). The problem is that while this makes processing faster, it strips valuable texture from the data and compromises the quality of the analysis. Machine learning solutions suffer the same challenges: developers need to determine the best “language features” to model, and small changes in the number of features analyzed can cause significant swings in performance, accuracy and recall.

Beyond the items discussed here, there are many other factors that influence the quality of the analysis: spam filtering, promotion filtering, disambiguation and negation, to name just a few (definitions below). The fact is, each vendor modulates all of these dials independently, so it’s no wonder clients lack confidence in text analytics solutions when they see dramatically different results from each vendor. Therefore, it should be no surprise that companies like Coke, AT&T and Dell have been forced to invest in human moderation factories to read individual posts every day… most platforms simply cannot be trusted.

Definitions:

Stop-words – Any group of words can be chosen as the stop words for a given purpose. For some search engines, these are some of the most common, short function words, such as the, is, at, which, and on. Other search engines remove some of the most common words—including lexical words, such as “want”—from a query in order to improve performance.
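For illustration only, here is a minimal sketch of stop-word removal using NLTK’s standard English stop-word list (one common open-source toolkit, not necessarily what any particular vendor uses); note how the negation disappears along with the filler words:

```python
# Minimal stop-word removal sketch with NLTK; the stop-word list must be
# downloaded once via nltk.download("stopwords").
from nltk.corpus import stopwords

STOP_WORDS = set(stopwords.words("english"))

text = "I do not want this phone at all"
kept = [w for w in text.lower().split() if w not in STOP_WORDS]
print(kept)  # ['want', 'phone'] -- the negation "not" is thrown away with the rest
```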

Stemming – In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.

Lemmatisation – Closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words that have different meanings depending on part of speech. However, stemmers are typically easier to implement and run faster, and the reduced accuracy may not matter for some applications (a short sketch contrasting the two follows the examples below).

For instance:

    • The word “better” has “good” as its lemma. This link is missed by stemming, as it requires a dictionary look-up.
    • The word “walk” is the base form for the word “walking”, and hence this is matched in both stemming and lemmatisation.
    • The word “meeting” can be either the base form of a noun or a form of a verb (“to meet”) depending on the context, e.g., “in our last meeting” or “We are meeting again tomorrow”. Unlike stemming, lemmatisation can, in principle, select the appropriate lemma depending on the context.
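
To make that contrast concrete, here is a small sketch using NLTK’s Porter stemmer and WordNet lemmatizer (again, a readily available open-source toolkit chosen purely for illustration):

```python
# Stemming vs. lemmatisation sketch with NLTK; the WordNet data must be
# downloaded once via nltk.download("wordnet").
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("better"))                    # 'better' -- the link to 'good' is missed
print(lemmatizer.lemmatize("better", pos="a"))   # 'good'   -- found via a dictionary look-up

print(stemmer.stem("walking"))                   # 'walk'
print(lemmatizer.lemmatize("walking", pos="v"))  # 'walk'

# "meeting" depends on part of speech; the lemmatizer must be told which one applies
print(lemmatizer.lemmatize("meeting", pos="n"))  # 'meeting'
print(lemmatizer.lemmatize("meeting", pos="v"))  # 'meet'
```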

Disambiguation – The process of detecting the true meaning of an ambiguous word in context, for example: jaguar (car, football team, animal).
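As a rough illustration of a classic dictionary-overlap approach to disambiguation, here is the Lesk algorithm as implemented in NLTK; the word “bank” is used because WordNet lists several senses for it (WordNet does not carry the car or football-team senses of “jaguar”):

```python
# Word-sense disambiguation sketch using the classic Lesk algorithm in NLTK;
# requires the WordNet corpus (nltk.download("wordnet")).
from nltk.wsd import lesk

context = "I went to the bank to deposit my money".split()
sense = lesk(context, "bank")
print(sense, "-", sense.definition())  # prints whichever WordNet sense of "bank" Lesk selects
```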

Until Decooda arrived on the scene, no one had really challenged the thesis that the only way to analyze text was NLP, Lexalytics or the vast array of machine learning tools. Decooda stretched the boundaries of conventional wisdom and developed a completely new method to analyze text – we call it cognitive-linguistics. We don’t use a typical dictionary-, taxonomy- or ontology-based approach. We don’t get rid of stop-words, and we don’t do stemming or lemmatization. We start by applying the most sophisticated spam filtering, promotion filtering and disambiguation engine on the fly. Next we combine linguistics, cognitive psychology and artificial intelligence in order to listen for both strong and weak signals and identify the biggest “ah-ha’s.” The result is richer, more accurate and actionable data that can help you understand how consumers think, feel and act toward your product or brand. Most importantly, with four hours of training, virtually anyone can become self-sufficient.

Decooda Framework Image

From a business perspective, the breadth of our solution capabilities allows us to serve everyone with a shared interest in the voice of the customer. Further, the inherent flexibility of our platform allows Decooda to adapt dynamically to our clients’ evolving needs on the fly. In addition, unlike our competitors, Decooda provides its clients with a big data platform (cloud, on-premise or hybrid) where all of their social and enterprise content can reside and be analyzed side by side. In fact, we tell our clients that a relationship with Decooda puts them in a position to control their big data asset, which over time could prove to be one of the most significant assets of the enterprise.

We are a lot like everyone else until you get to know us; then we’re like nothing else you’ve ever seen before.