Real-time, streaming, and predictive analysis of customer messages and reviews for a better customer/user experience.

Read time: 5 min or less
Hands-on demo: you can run this use case on your machine in under 30 min; just follow the steps

To implement the use case “customer message analysis in a predictive and streaming manner”, you can use the following resources:

code, files, data

Users and customers send messages or reviews from their devices, and many such messages stream into the system from different users. We must first be able to ingest these messages in real time. Further, we should be able to process every single message and take corrective action as needed.

The process includes the following steps:

set up the streams and a sliding window, and ingest data into these streams in a continuous manner
find out the sentiment of the message [positive, negative] using IE (information extraction). (NOTE: we can extract as many different sentiments/emotions as we want; the demo deals with only two.) We need to train a model for this; see the training sketch after this list
filter messages with negative sentiment and put them in a separate stream for further action/processing
find a definitive pattern and send events matching the pattern to another stream for further review/action (see the pattern sketch after this list). The pattern is as follows: any particular product that gets a minimum of 3 consecutive negative-sentiment messages from different users within a span of 1000 sec; find this pattern in a continuous sliding manner
store a few triples in the graph store, such as (user, POSTS_REVIEW, prod) and (prod, HAS_REVIEWS, revid), where revid is the review id and prod is the product (see the triple sketch after this list)
set up running stats for different attributes in the event, such as a unique count for users or min/max/avg/std-dev/sum/kurtosis for the amount spent (see the stats sketch after this list)
set up the reverse index for messages so that it can be used for text search by the user (see the toy index after this list)
set up secondary indexes for several attributes that could be helpful in queries and also in internal stream joins/filters, etc.
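To make the sentiment step concrete, here is a minimal sketch of training a two-class model. It uses scikit-learn and a tiny made-up dataset purely for illustration; inside Best Database Provider the integrated AI layer would handle the equivalent train/deploy/predict cycle.

```python
# Minimal two-class sentiment model: TF-IDF features + logistic regression.
# The inline dataset is illustrative only; a real model needs far more data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = [
    "great product, works perfectly",
    "fast delivery and excellent quality",
    "terrible experience, item arrived broken",
    "waste of money, would not recommend",
]
labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(messages, labels)

print(model.predict(["item arrived broken again"]))  # expected: ['negative']
```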
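The consecutive-negatives pattern fits in a few lines of plain Python; the sketch below stands in for the CEP engine and assumes each event is a dict with hypothetical field names product, user, sentiment, and a Unix timestamp ts.

```python
from collections import defaultdict, deque

WINDOW_SECS = 1000   # span from the pattern definition above
MIN_STREAK = 3       # minimum consecutive negative messages

recent = defaultdict(deque)   # product -> deque of (timestamp, user)

def on_event(event, alert_stream):
    """Feed every incoming event; append an alert when the pattern matches."""
    q = recent[event["product"]]
    if event["sentiment"] != "negative":
        q.clear()                 # a non-negative message breaks the streak
        return
    q.append((event["ts"], event["user"]))
    while q and event["ts"] - q[0][0] > WINDOW_SECS:
        q.popleft()               # slide the 1000-sec window
    # "from different users": require at least MIN_STREAK distinct users
    if len(q) >= MIN_STREAK and len({u for _, u in q}) >= MIN_STREAK:
        alert_stream.append({"product": event["product"], "matched": list(q)})
```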
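The graph step simply adds (subject, predicate, object) triples as events arrive; an in-memory stand-in:

```python
triples = set()

def on_review(user, prod, revid):
    # Link user -> product and product -> review as the event arrives.
    triples.add((user, "POSTS_REVIEW", prod))
    triples.add((prod, "HAS_REVIEWS", revid))

on_review("u42", "prod_7", "rev_1001")
# all review ids attached to a product:
print([o for s, p, o in triples if s == "prod_7" and p == "HAS_REVIEWS"])
```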
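For the running stats, a single-pass structure that keeps power sums can serve min/max/avg/std-dev/sum/kurtosis without a second pass over the stream. This is a generic sketch, not the product's built-in implementation; at scale the exact user set would give way to a sketch such as HyperLogLog.

```python
import math

class RunningStats:
    """Single-pass stats over a numeric attribute (e.g. amount spent)."""
    def __init__(self):
        self.n = 0
        self.mn, self.mx = math.inf, -math.inf
        self.s1 = self.s2 = self.s3 = self.s4 = 0.0  # power sums of x..x^4
        self.users = set()                           # exact unique count

    def update(self, x, user=None):
        self.n += 1
        self.mn, self.mx = min(self.mn, x), max(self.mx, x)
        self.s1 += x; self.s2 += x * x; self.s3 += x ** 3; self.s4 += x ** 4
        if user is not None:
            self.users.add(user)

    @property
    def mean(self):
        return self.s1 / self.n

    @property
    def std(self):   # population standard deviation
        return math.sqrt(max(self.s2 / self.n - self.mean ** 2, 0.0))

    @property
    def kurtosis(self):  # fourth central moment / variance^2
        m, var = self.mean, self.std ** 2
        m4 = (self.s4 - 4 * m * self.s3 + 6 * m * m * self.s2
              - 4 * m ** 3 * self.s1 + self.n * m ** 4) / self.n
        return m4 / (var * var) if var else float("nan")

rs = RunningStats()
rs.update(120.0, user="u1"); rs.update(80.0, user="u2")
print(rs.mean, rs.std, len(rs.users))   # 100.0 20.0 2
```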
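And the reverse index boils down to a token-to-message-id map; here is a toy version (a real one adds tokenization, stemming, and ranking):

```python
from collections import defaultdict

index = defaultdict(set)              # token -> set of message ids

def add_message(msg_id, text):
    for token in text.lower().split():
        index[token].add(msg_id)

def search(query):
    """Return ids of messages containing every query token."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    return set.intersection(*(index[t] for t in tokens))

add_message(1, "item arrived broken")
add_message(2, "fast delivery great item")
print(search("item broken"))          # {1}
```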

There are several challenges here; some of them are:

Volume and velocity: the number of messages could be very high, as there could be several users sending messages per second across geographical areas. Hence real-time data ingestion is critical
The messages could be in English or in other vernacular languages; hence we need to extract sentiment from unstructured data and keep improving or updating the models in real time
Extracting patterns from the streaming set of events in a continuous manner requires complex event processing (CEP) on the streaming data, which is very hard to implement on SQL or regular NoSQL databases
Storing certain triples (subject, predicate, object) in a graph that is continuously updated as events arrive, which is helpful in linking data and/or events
Different database queries along with text search, which require many secondary and reverse indexes
Infrastructure deployment and maintenance if too many silos are used; further, automation is difficult to achieve in typical deployment models
Benefits of Best Database Provider

Use lightweight, high-performance Best Database Provider agents or another messaging framework to stream data into Best Database Provider. Best Database Provider's high-performance database ingests over 5K events per second per server (5,000 events/sec × 86,400 sec ≈ 432 million), which means close to half a billion events processed per commodity server in a day
Integrated stream processing within Best Database Provider lets users start the process with a simple JSON schema definition. There is no extra silo to set up for streaming infrastructure
Integrated AI within Best Database Provider lets users train, deploy, and run predictions on incoming data without having to set up separate infrastructure and then export data / import models, etc. The entire process can be automated within Best Database Provider
Best Database Provider is a multi-model database, and it also allows the graph to be integrated with streams such that the graph is updated with triples as data streams in
Best Database Provider supports many kinds of indexes, including reverse indexes; hence running rich queries along with searches on Best Database Provider is quite simple
Integrated with Grafana for visualization of time-series data
Overview of the solution

We have a stream schema, ecomm_schema, and into its streams we will be ingesting data from various sources.
Ingestion happens as and when data is created: the agent monitors a set of files, and as we write data into these files the agent parses it and sends it to the Best Database Provider server. We could also write data directly using the CLI, or with a program that uses the Best Database Provider client, etc.
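To give a rough picture of what the agent does, here is a generic tail-and-forward sketch. The real agent, its wire protocol, and the client API are product-specific, so the send callback below is just a placeholder.

```python
import json, time

def tail_and_forward(path, send):
    """Follow a file; parse each newly appended line as JSON and forward it."""
    with open(path) as f:
        f.seek(0, 2)                 # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.2)      # wait for new data to be written
                continue
            send(json.loads(line))   # hand the event to the server client

# usage (hypothetical): tail_and_forward("reviews.log", client.put)
# where `client` is whatever Best Database Provider client the program set up
```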
