Python, R, and Analytics
SQL is a critical skill for business intelligence. From accessing to transforming to reporting on data, SQL gives you the power to get the job done. It can help you discover exactly how your company is performing, what your customers are doing, or how people have reacted to your marketing campaigns.
Unfortunately, while SQL can tell you what has happened, it can’t tell you what will happen. What if you have questions like:
- How valuable is a lead, based on its company attributes and behavior on our website?
- How much MRR will we generate in the next 30 days?
- Which customers are likely to churn next month?
These are the types of questions that take a company to the next level of business intelligence: predictive analytics.
To perform the most advanced business intelligence, you need to move past the KPIs and metrics possible in SQL and begin to use more powerful and flexible tools such as Python and R. These are full-fledged languages used for advanced statistical analysis and modeling, and learning to harness them will enable you to grow your business far faster and more efficiently.
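As a rough illustration of the difference, the sketch below fits a straight-line trend to past MRR and extends it one month forward. The monthly figures and the `fit_line` helper are made up for this example, and a least-squares line is only one of many possible forecasting approaches.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical MRR (in dollars) for the last six months; SQL can report these.
months = [0, 1, 2, 3, 4, 5]
mrr = [10000, 10400, 11100, 11450, 12050, 12500]

# The predictive step: estimate a trend and project it onto month 6.
slope, intercept = fit_line(months, mrr)
forecast = slope * 6 + intercept
print(f"Projected MRR for next month: ${forecast:,.0f}")
```

The historical numbers are the kind of result a SQL query returns; the projection is the part that needs a modeling language.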
But it’s not easy.
The Way Things Were
In order to get the most from your work, Python and R need to be a part of your analytics stack, and they need to be integrated in a way that makes them as easy to use as SQL.
Unfortunately, Python and R typically live separate lives from the rest of your analytics tools. Even if your company is already using them, they are often used through Jupyter Notebooks or RStudio. This means that every exercise is a complex data engineering challenge, and even when the work is done, the results are disconnected from your visualization and reporting solutions.
Working in Python or R often means moving between multiple tools. In fact, we’ve seen that analysts frequently have to use three or more tools to finish a single analysis when Python or R is involved.
This has led to innumerable challenges. Predictive analyses are slow to complete, hard to keep updated, and often fail to drive the business impact the analyst imagines once their results are generated.
A New Paradigm
What’s been missing is a way to natively integrate Python and R with the rest of the data analytics stack. Database access and data modeling in SQL should happen within the same platform where Python and R are used, so that analysts can rapidly iterate on both datasets and models simultaneously. Data visualization should be easy and flexible, allowing the analyst to explore their data at the speed of thought.
An integrated process would look something like this:
- Transform your data directly in SQL, using native cross-database joins to combine data from across your company. The results are automatically version-controlled and kept up to date.
- Automatically import your new dataset to a native Python and R editor as a data frame. No more data engineering!
- Model your data using the libraries and packages you already know. Your work is automatically version-controlled and shareable with your team.
- Immediately visualize the results of your Python and R code in widgets and share them with your colleagues.
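The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular platform implements them: an in-memory SQLite database stands in for the company's data sources, a dictionary of columns stands in for a data frame, a simple midpoint threshold stands in for a real churn model, and a text bar chart stands in for a widget. The table names, columns, and data are all hypothetical, and platform features such as cross-database joins and automatic version control are not shown.

```python
import sqlite3

# Hypothetical tables and data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, churned INTEGER);
    CREATE TABLE logins (customer_id INTEGER, day INTEGER);
    INSERT INTO customers VALUES
        (1, 'Acme', 0), (2, 'Globex', 1), (3, 'Initech', 0),
        (4, 'Umbrella', 1), (5, 'Hooli', NULL);
    INSERT INTO logins VALUES
        (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
        (2, 1),
        (3, 1), (3, 2), (3, 3), (3, 4),
        (4, 1), (4, 2),
        (5, 1), (5, 2);
""")

# Step 1: transform in SQL by joining customers to their login counts.
rows = conn.execute("""
    SELECT c.name, c.churned, COUNT(l.day) AS login_count
    FROM customers AS c
    LEFT JOIN logins AS l ON l.customer_id = c.id
    GROUP BY c.id, c.name, c.churned
    ORDER BY c.id
""").fetchall()
conn.close()

# Step 2: load the result into a column-oriented structure
# (a stand-in for a data frame).
frame = {
    "name": [r[0] for r in rows],
    "churned": [r[1] for r in rows],
    "login_count": [r[2] for r in rows],
}


def mean(values):
    return sum(values) / len(values)


# Step 3: model. Place a threshold halfway between the average login counts
# of customers known to have churned and those known to have stayed.
labeled = list(zip(frame["login_count"], frame["churned"]))
churned_mean = mean([n for n, c in labeled if c == 1])
retained_mean = mean([n for n, c in labeled if c == 0])
threshold = (churned_mean + retained_mean) / 2

# Step 4: visualize. Print a text bar per customer and flag unlabeled
# customers (churned is NULL) whose activity falls below the threshold.
at_risk = []
for name, churned, count in zip(frame["name"], frame["churned"], frame["login_count"]):
    flag = ""
    if churned is None and count < threshold:
        at_risk.append(name)
        flag = "  <- at risk"
    print(f"{name:<10}{'#' * count}{flag}")
```

Because the query, the model, and the output live in one script, changing the SQL and rerunning updates everything downstream, which is the kind of iteration the integrated workflow is meant to make routine.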
This new type of analytics workflow means advanced analytics can happen faster, with accurate and up-to-date data. Businesses will be able to integrate predictive analytics directly into their metrics, KPIs, and dashboards, and gain a far better understanding not only of where their business has been, but also of where it’s going. In addition, it consolidates analyses into one location where security can be maintained, version control is automatic, and fellow analysts can easily find each other's work.