Dear community. We apologize for keeping you waiting – it’s been a long time since our last article.
To be honest – we apologize, but we are not sorry. We are not sorry because we spent the last couple of months more focused than ever on putting together a plan for how to be of the best service to YOU – our followers and the business.
It has been a very productive beginning of 2019, with an emphasis on actions rather than simply ideas.
In this series we would like to introduce another field of our expertise – visualization and BI. In fact, we want to combine Data Science & BI and give you the cutting-edge technology and strategy you want, in the flexible and comprehensible manner you deserve.
We start by showing you an example framework approach and then elaborate on each element:
1) Understand the business
We have always prided ourselves on the industry knowledge we possess. Every now and then, though, you encounter a new opportunity and embrace it. As they say – there is always a first time. 🙂
Thus, we always spend good quality time understanding what makes our client strong and sustainable, and what changes they have experienced – and are experiencing – in their external and internal circumstances. As a next step, we target their pain-point areas, which in our case are the important KPIs they want to monitor – the ones on which important decisions can be made, or which can afterwards be dug into more deeply.
2) Know the angles and slices
As with our previous business cases focused on Data Science & ML, we need to know the potential segmentation dimensions. Which products, types of services, etc. do we have, and can they serve as slicers to create another, deeper visualization view that the client will benefit from?
Some of these facets can be provided by the business (talking to the business as much as possible is your best bet, since they are the experts at the end of the day), and others we manage to bring to the surface through the magnifying glass of analytics (e.g. exploratory analysis, unsupervised learning, etc.).
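As a toy illustration of that second route, here is a minimal sketch of surfacing candidate segments with unsupervised learning. The column names (monthly_spend, visits, tenure_months) and figures are hypothetical – substitute your client's actual usage metrics:

```r
library(dplyr)

# Hypothetical customer usage data standing in for a real extract
set.seed(42)
customers <- data.frame(
  monthly_spend = c(rnorm(50, 100, 15), rnorm(50, 400, 40)),
  visits        = c(rpois(50, 3), rpois(50, 12)),
  tenure_months = sample(1:60, 100, replace = TRUE)
)

# Scale features so no single metric dominates the distance measure
km <- kmeans(scale(customers), centers = 3, nstart = 25)
customers$segment <- factor(km$cluster)

# Inspect per-segment averages – candidates for new dashboard slicers
customers %>%
  group_by(segment) %>%
  summarise(across(everything(), mean), n = n())
```

If a cluster turns out to be interpretable (say, "high-spend frequent visitors"), it is a strong candidate to become a first-class slicer in the dashboard.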
3) Data
Data, Data, Data – as always, this is crucial. There are three dimensions here we would like to touch upon based on our experience:
- Sources & availability
What data do we have? Where does it come from – one source or many? What is its perceived and actual quality, and do we need to validate it and apply some version of data contracts along the project pipeline?
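A lightweight, hand-rolled version of such a data-contract check can be as simple as the sketch below – the schema shown is for a hypothetical sales extract, so adapt the expected columns and types to your own sources:

```r
# Expected schema for a hypothetical sales extract
expected_schema <- c(
  order_id   = "character",
  order_date = "Date",
  revenue    = "numeric"
)

validate_extract <- function(df, schema) {
  # Fail loudly on missing columns before anything downstream breaks
  missing_cols <- setdiff(names(schema), names(df))
  if (length(missing_cols) > 0)
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  # Check each column's class against the contract
  for (col in names(schema)) {
    if (!inherits(df[[col]], schema[[col]]))
      stop(sprintf("Column '%s' should be %s, got %s",
                   col, schema[[col]], class(df[[col]])[1]))
  }
  if (anyNA(df$revenue)) warning("NA values found in 'revenue'")
  invisible(TRUE)
}

# validate_extract(sales_extract, expected_schema)
```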
- Mapping
How do your data sources interact with each other – what interrelations are available? Sometimes it turns out to be harder than expected to merge data from different sources and create a combined visualization. Be advised that good ideas (dashboard contents, charts, interactive visualizations) are sometimes not feasible due to data and relationship constraints. Be prepared with workaround solutions and alternative approaches for the client.
Do your best to map and validate corresponding data within your BI tool – if this is wrong, all the rest will suffer.
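One cheap way to validate a mapping before trusting it inside the BI tool is to check for keys that will silently fall out of a join. A sketch, using two hypothetical sources (orders from an ERP, customers from a CRM):

```r
library(dplyr)

# Hypothetical extracts from two different systems
orders    <- data.frame(customer_id = c("C1", "C2", "C9"), revenue = c(100, 250, 80))
customers <- data.frame(customer_id = c("C1", "C2", "C3"), region  = c("EU", "US", "EU"))

# anti_join exposes order rows that would silently drop out of a combined visual
unmatched <- anti_join(orders, customers, by = "customer_id")
if (nrow(unmatched) > 0)
  warning(nrow(unmatched), " order rows have no matching customer record")

combined <- left_join(orders, customers, by = "customer_id")
```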
- Artificial data
Clients are busy, but you do not want to keep them waiting. Sometimes there is a gap between the start of the project and actually being able to get real data (due to different sources, mapping complexity, etc.). Our experience is that it is often easier to obtain an initial idea of the data schemas than the real data itself. So one approach is to fill some data tables with artificial data yourself, and thus be able to test visualization solutions and present demos to the client for preliminary feedback.
Please be advised that the actual data may twist your solution in unexpected ways, so give yourself time to breathe and amend it as required prior to delivery.
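Generating such a placeholder table takes only a few lines. A sketch, assuming a simple monthly-revenue-per-product-group schema (the schema and numbers are made up for the demo):

```r
set.seed(2019)
months <- seq(as.Date("2018-01-01"), as.Date("2018-12-01"), by = "month")

# One row per month and hypothetical product group
demo_revenue <- expand.grid(month = months,
                            product_group = c("A", "B", "C"))

# Trend + seasonality + noise, so demo charts look plausible rather than flat
demo_revenue$revenue <- 1000 +
  20 * as.integer(format(demo_revenue$month, "%m")) +
  rnorm(nrow(demo_revenue), sd = 50)

write.csv(demo_revenue, "demo_revenue.csv", row.names = FALSE)
```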
4) Environment
Even though you may stick to the same analytics software, your underlying environment may vary throughout the project. Let us give you a couple of examples:
- Working (offline) environment
As with data, you can always do something proactive for the client. Start building demo solutions, mash-ups, etc. on your local machine until you get access to the client/project-preferred environment.
- Client environment
Once all is set up, you will be able to shift your efforts to the client-preferred working environment – an AWS EC2 instance, a sandbox, another cloud solution, etc. It's always good to be prepared for unexpected developments when you shift between environments and change data connection sources.
The typical setup here is three environment instances: development (where you develop your tool), quality (where you test the solution) and production (where you deploy your effort to go live). Know that these three instances often map to three corresponding instances of the data warehouse, into which you will also have to look when experiencing issues.
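To keep those shifts painless, it helps to parameterize the data connections so the same code runs against all three instances. A sketch – the host names are made up, and RPostgres is shown only as an example driver; use whatever your warehouse requires:

```r
library(DBI)

# Hypothetical connection settings per environment instance
connections <- list(
  dev  = list(host = "dw-dev.internal",  dbname = "analytics_dev"),
  qa   = list(host = "dw-qa.internal",   dbname = "analytics_qa"),
  prod = list(host = "dw-prod.internal", dbname = "analytics")
)

# Select the target environment via an environment variable
env <- Sys.getenv("PROJECT_ENV", unset = "dev")
cfg <- connections[[env]]

con <- dbConnect(RPostgres::Postgres(),
                 host = cfg$host, dbname = cfg$dbname,
                 user = Sys.getenv("DW_USER"),
                 password = Sys.getenv("DW_PASS"))
```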
Hence, communication is crucial: if something is missing – a data connection, unexpected tables landing in your working zone, the unavailability of some tools – do make sure to connect the relevant parties and tackle the issues. Examples are: DevOps, data engineers, business analysts, etc. Making a difference leveraging data science and analytics is not necessarily a one-person job – we strongly advise that communication and collaborative effort not be neglected.
5) Good idea vs. feasibility tradeoff
In this modern world of visualization capabilities, you naturally want to couple meaningful information with beautiful graphs, charts and interaction capabilities. Sometimes we tend to get ahead of ourselves and agree on visualization elements that may be neither informative nor technically feasible.
An example is when we were moving our churn solution (link to the article) from R Shiny to QlikSense – chained reactivity works quite differently between the two platforms. We will discuss that in the next section.
6) Finish it: best of both worlds – data science & visualization
This is where your advanced analytics solution can actually stand out. Let us give you some examples that we have incorporated as working solutions for our projects:
- Utilization of tailored PowerBI visualizations leveraging R capabilities.
Our example (bottom left graph in the image above): revenue prediction utilizing ARIMA models. We placed an interactive R script that reacts to dashboard filters and returns a monthly revenue prediction per group. This was further enhanced with a dynamic chart that places pointers and marker lines.
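In rough outline, such an R visual script looks like the sketch below. PowerBI passes the rows already filtered by the report's slicers in as a data frame named `dataset`; the month/revenue column names and the forecast horizon here are hypothetical, not our production code:

```r
library(forecast)

# `dataset` is assumed to hold one row per month for the selected group,
# sorted by month; column names are illustrative
first_month <- min(as.Date(dataset$month))
revenue <- ts(dataset$revenue,
              start = c(as.integer(format(first_month, "%Y")),
                        as.integer(format(first_month, "%m"))),
              frequency = 12)

fit <- auto.arima(revenue)   # let the algorithm pick (p, d, q)
fc  <- forecast(fit, h = 6)  # predict six months ahead

plot(fc, main = "Monthly revenue forecast")  # PowerBI renders this plot
abline(v = tsp(revenue)[2], lty = 2)         # marker line at forecast start
```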
- Creation of even more advanced and interactive visualizations within PowerBI using R plotly widgets.
Our example (bottom right graph in the image above): we managed to develop a bespoke waterfall diagram with interactive capabilities (e.g. zoom) that are not available off the shelf within PowerBI.
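For the flavor of it, here is a minimal plotly waterfall in R – the kind of widget that can be embedded in PowerBI; the revenue-bridge labels and figures below are illustrative only:

```r
library(plotly)

plot_ly(
  type    = "waterfall",
  x       = c("2018 revenue", "New clients", "Upsell", "Churn", "2019 revenue"),
  measure = c("absolute", "relative", "relative", "relative", "total"),
  y       = c(1200, 300, 150, -250, 0)  # y is ignored for the "total" bar
) %>%
  layout(title = "Revenue bridge (illustrative)")
```

The zooming and hover interactivity come for free with the plotly widget, which is exactly what the stock PowerBI visuals lacked.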
- Migration between platforms – R ML solution with Shiny to QlikSense:
Our example (the dashboard at the top of the image above): we managed to move our churn Machine Learning solution – both the underlying complex algorithms and the visualization part, which was executed with Shiny – to a QlikSense server. Our struggle was with dynamic chained reactivity, where a drop-down menu split a numeric value into equal ranges on which some charts depended, while the overall range itself depended on another control – a check-box group selection.
Since R and QlikSense do not think alike, the solution was not that straightforward, but our senior Qlik consultants helped us with a brainstorming push and we leaped over the hurdle. Well – expertise and dedication do pay off.
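To make the chained reactivity concrete, here is a stripped-down Shiny sketch of the original pattern (with made-up data): the check-box group drives the overall numeric range, and the drop-down splits that range into equal bins that the downstream chart depends on:

```r
library(shiny)

ui <- fluidPage(
  checkboxGroupInput("groups", "Groups", choices = c("A", "B", "C"),
                     selected = "A"),
  selectInput("bins", "Number of ranges", choices = 2:5),
  plotOutput("hist")
)

server <- function(input, output) {
  # Hypothetical data – the selected groups determine the value range
  values <- reactive({
    req(input$groups)
    df <- data.frame(group = rep(c("A", "B", "C"), each = 100),
                     value = c(rnorm(100, 10), rnorm(100, 50), rnorm(100, 90)))
    df$value[df$group %in% input$groups]
  })
  # Equal-width breaks over a range that itself reacts to the check-boxes
  breaks <- reactive(seq(min(values()), max(values()),
                         length.out = as.integer(input$bins) + 1))
  output$hist <- renderPlot(hist(values(), breaks = breaks()))
}

shinyApp(ui, server)
```

Reproducing this two-level dependency in QlikSense's associative model, rather than Shiny's reactive graph, was the crux of the migration effort.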
Truthfully yours,
SeedSet Group
________________________________________________________
With this series, our aim is to increase data science coverage and to make data-driven decision-making an integral part of more and more companies around the globe.