For most non-technical users, data analysis with MySQL seems like a complicated task reserved for IT specialists. Indeed, SQL stands for 'Structured Query Language', the most common standardized programming language used to access databases. A database is broadly defined as a structured collection of data: it may be anything from a simple shopping list to a picture gallery or the vast data holdings of an international corporation. To add, access, and process data stored in a computer database, you need a database management system.
However, data analysis in MySQL isn't much more difficult than what you do every time you build tables in Excel or Word. It always starts with collecting data, then cleaning it of any mistakes, and finally visualizing it in an analysis-ready format. In the following article we will perform a short data analysis using MySQL and the Workbench tool to illustrate this process in 3 easy steps.
1) Collect your data
Most companies experience a massive influx of data that ends up stored in different places, split across different files and formats. The sheer volume of data means some insights may be overlooked or lost. MySQL serves to aggregate data from these various sources and facilitate a fruitful analysis.
In today’s blog post we will analyze the data of a company that has split it into 3 different files: a CSV file containing sales data, an Excel file listing the customers, and an XML file containing the products.
We start by creating three tables in our database, one for each file, setting a primary key and the expected column types for each. Then we import every file into our database so that all the data sits in the same place and in the same format.
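As a sketch of this first step, the sales table might be created as follows. The column names and types here are assumptions for illustration; adapt them to the actual columns in your files:

```sql
-- Hypothetical schema for the sales CSV; adjust columns to your own data
CREATE TABLE sales (
    sale_id     INT           NOT NULL,  -- unique identifier for each sale
    customer_id INT           NOT NULL,  -- links to the customers table
    product_id  INT           NOT NULL,  -- links to the products table
    sale_date   DATE          NOT NULL,
    amount      DECIMAL(10,2) NOT NULL,  -- fixed-point type avoids rounding issues with money
    PRIMARY KEY (sale_id)
);
```

The customers and products tables are created the same way, each with its own primary key (for example `customer_id` and `product_id`).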
MySQL syntax for importing files
CSV and Excel file:
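A minimal sketch of the import statements, assuming the files have been placed in a directory the MySQL server may read from (Excel files are typically exported to CSV first, since `LOAD DATA INFILE` reads delimited text; the file paths and the `<product>` row tag are assumptions):

```sql
-- Import the sales CSV (also works for an Excel sheet saved as CSV)
LOAD DATA INFILE '/var/lib/mysql-files/sales.csv'
INTO TABLE sales
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;  -- skip the header row

-- Import the products XML; each <product> element becomes one row
LOAD XML INFILE '/var/lib/mysql-files/products.xml'
INTO TABLE products
ROWS IDENTIFIED BY '<product>';
```

Note that the server's `secure_file_priv` setting restricts which directories `LOAD DATA INFILE` may read from; placing the files there (or using `LOAD DATA LOCAL INFILE`) avoids permission errors.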
2) Clean the tables
Now that we have loaded our data into the MySQL database, we can move on to the next step – cleaning the datasets. This is the important process of correcting or deleting inaccurate records in a database: we first identify incomplete, irrelevant or incorrect parts of the data, and then replace, modify or delete this “dirty data”. In this step we focus on improving data quality.
Let’s go back to our example. We now add a “Country” column to the table “Customers”. Suppose the column contains several different spellings for the same country, such as “United Kingdom”, “UK”, “England” and “UNITED KINGDOM”.
Although it seems obvious which country each record refers to, these inconsistent values may bias our future SQL analysis. How? If we query the total sales from the United Kingdom, we assume the country is recorded literally as “United Kingdom”, and consequently we lose all the records formulated differently. Of course, you could add the other variations (‘UK’, ‘England’, ‘UNITED KINGDOM’…) to your query, but the point of cleaning the data is to prepare it for a smooth and fast analysis and to avoid errors in the future.
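One way to normalize these variants is a single `UPDATE` that maps every known spelling to the canonical value. This is a sketch assuming the table is named `customers` with a `country` column; list the variants you actually find in your data:

```sql
-- Normalize all known spellings to a single canonical value.
-- TRIM/UPPER make the match robust to stray spaces and casing.
UPDATE customers
SET country = 'United Kingdom'
WHERE TRIM(UPPER(country)) IN ('UK', 'ENGLAND', 'UNITED KINGDOM');
```

After this, a query filtering on `country = 'United Kingdom'` catches every record in one pass.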
In the cleaning process, watch out for errors caused by:
- Different formats (e.g. an id stored as a string instead of an integer)
- Different thousands or decimal separators (. or ,)
- NULL values where they are not allowed
These SQL functions are useful during the cleaning process:
- LIKE –> Simple pattern matching
- TRIM() –> Remove leading and trailing spaces
- REPLACE() –> Replace occurrences of a specified substring
- LEFT() –> Return the specified number of characters from the left
- RIGHT() –> Return the specified number of characters from the right
- CONCAT() –> Concatenate strings
- UPPER() and LOWER() –> Convert a value to uppercase or lowercase
- FORMAT() (numeric field) –> Format a number to a specified number of decimal places
- CASE WHEN field IS NULL THEN xxx ELSE field END –> The functionality of an IF-THEN-ELSE statement: evaluate conditions and return a value when the first condition is met.
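A quick sketch of how several of these functions combine in practice, again assuming a `customers` table with `first_name`, `last_name` and `country` columns (hypothetical names for illustration):

```sql
SELECT
    TRIM(country)                               AS country_trimmed,   -- strip stray spaces
    UPPER(country)                              AS country_upper,     -- unify casing before comparing
    REPLACE(country, 'UK', 'United Kingdom')    AS country_replaced,  -- substitute a known variant
    CONCAT(first_name, ' ', last_name)          AS full_name,         -- build one display field
    CASE
        WHEN country IS NULL OR country = '' THEN 'Unknown'           -- handle missing values explicitly
        ELSE country
    END                                         AS country_clean
FROM customers;
```

Running such a `SELECT` first lets you preview the cleaned values before committing them with an `UPDATE`.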
3) Analyze and visualize your data
After the arduous cleaning we’ve finally reached the step where we can take advantage of our data by pulling meaningful conclusions out of our cross-datasource analysis. With the data visualized and transformed into business insights, you can find relevant answers to specific questions. In our example we compiled data about sales, customers and products, so the resulting information could tell us, for example, which product is the bestseller, in which country or shop we sell the most, who our top 10 customers are, which customers haven’t bought anything in the last 6 months, how many new buyers we acquire every month, and so on.
You can also structure your data to pull out KPIs. In order to do that you simply prepare some SQL queries and line them up in tables.
Some analysis examples include:
- Order your query with ORDER BY … DESC and limit the results to 5 to get a top five.
- Calculate some rates, evolution, growth, retention.
- Display your sales per country, gender, product.
- Calculate a running total to get your cumulated new customers over the months (using subqueries).
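The first and last items in the list above can be sketched as follows. The table and column names (`sales`, `products`, `customers`, `signup_date`, `amount`) are assumptions standing in for your actual schema:

```sql
-- Top 5 bestselling products: order descending and keep 5 rows
SELECT p.product_name, SUM(s.amount) AS total_sales
FROM sales s
JOIN products p ON p.product_id = s.product_id
GROUP BY p.product_name
ORDER BY total_sales DESC
LIMIT 5;

-- Running total of new customers per month, via a correlated subquery:
-- for each month, sum the monthly counts up to and including that month
SELECT m.month,
       m.new_customers,
       (SELECT SUM(m2.new_customers)
        FROM (SELECT DATE_FORMAT(signup_date, '%Y-%m') AS month,
                     COUNT(*) AS new_customers
              FROM customers
              GROUP BY month) m2
        WHERE m2.month <= m.month) AS cumulated_customers
FROM (SELECT DATE_FORMAT(signup_date, '%Y-%m') AS month,
             COUNT(*) AS new_customers
      FROM customers
      GROUP BY month) m
ORDER BY m.month;
```

On MySQL 8.0 and later, the running total can also be written more concisely with a window function, `SUM(new_customers) OVER (ORDER BY month)`.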
If after this short overview you still feel you are in the dark about data analysis using MySQL, we have some good news for you. With the help of innovative self-service tools, you can visualize your data in a meaningful way without the hassle of writing SQL queries manually. How is that possible? With datapine’s MySQL Query Builder, for example, you can create queries through an intuitive drag-and-drop interface that turns your data into meaningful and actionable charts. You can then style the charts as you like, drill down for more detailed data, or share them with chosen coworkers. Choose the data analysis tool that best fits the needs and capabilities of your company and start benefiting from your data today.