create a dataset for rate of petrol consumption using hiveql
Answers
Practice Query Creation
This blog of Big Data will be a good practice for Hive Beginners, for practicing query creation. At the end, you will be able to create a table, load data to the table and perform analytical analysis on the dataset provided in Hive real life use cases.
The Topic for this blog as referring to the real life dataset of Petrol suppliers. And in the second part, we are covering real world Olympic records.
The sample dataset is made in reference to today’s rate of petrol consumption.But is not an actual data. It has been modified and published in terms to learning only.
Here in this blog we will compare top petrol sellers of the world and get some hands-on experience in Hive. Go to the link If you need to download and setup Hive.
PETROL:
DATA SET:https://drive.google.com/open?id=0B1QaXx7tpw3SMTBqLUQwX0lOWnM
DATA SET DESCRIPTION:
ColumnNO. Name Example DataType
Column1: District.ID I4N 1M1 varchar
Column2: , Distributor name shell varchar
Column3: Buy rate (million) $957.70 varchar
Column4: Sell rate(million) $5779.92 varchar
Column5: volumeIN(millioncubic litter) 933 int
Column6: volumeOUT(millioncubic litter) 843, int
Column7: Year 1624 int
Creation of Table in Hive and Loading of data
create table petrol (distributer_id STRING,distributer_name STRING,amt_IN STRING,amy_OUT STRING,vol_IN INT,vol_OUT INT,year INT) row format delimited fields terminated by ‘,’ stored as textfile;
load data local inpath ‘/home/acadgild/Downloads/petrol.txt’ into table petrol;
1)In real life what is the total amount of petrol in volume sold by every distributor?
SELECT distributer_name,SUM(vol_OUT) FROM petrol GROUP BY distributer_name;
2)Which are the top 10 distributors ID’s for selling petrol and also display the amount of petrol sold in volume by them individually?
SELECT distributer_id,vol_OUT FROM petrol order by vol_OUT desc limit 10;
3)Find real life 10 distributor name who sold petrol in the least amount.
SELECT distributer_id,vol_OUT FROM petrol order by vol_OUT limit 10;
4)Try One yourself
The constraint to this query is the difference between volumeIN and volumeOuT is illegal in real life if greater than 500. As we see all distributors are receiving patrols on every next cycle.
List all distributors who have this difference, along with the year and the difference which they have in that year.
Hint: (vol_IN-vol_OUT)>500
——————————xxx xxx xxx—————————
We all know how much interesting Olympics held every time, and we all love it.In real life Olympic Games are considered the world’s foremost sports competition with more than 200 nations participating. The Olympic Games are held every four years, with the Summer and Winter Games alternating by occurring every four years but two years apart.
The sample dataset is made in reference to real life Olympic competition.But is not an actual data. It has been modified and published in terms to learning only.
Olympic:
DATA SET:https://drive.google.com/file/d/0B1QaXx7tpw3SaEE3bEFTQTMzNzg/view?usp=sharing
Olympic Data analysis using Hive
ColumnNO. Name Example DataType
Column1: AthleteName Michael Phelps STRING
Column2: , Age 23 INT
Column3: Country United States STRING
Column4: Year 2008 INT
Column5: Closing Date 8/24/2008 STRING
Column6: Sport Swimming STRING
Column7: Gold Medals 8 INT
Column8: Silver Medals 0 INT
Column9: Bronze Medals 0 INT
Column10: Total Medals 8 INT
Creation of Table in Hive and Loading of data
create table olympic (athelete STRING,age INT,country STRING,year STRING,closing STRING,sport STRING,gold INT,silver INT,bronze INT,total INT) row format delimited fields terminated by ‘\t’ stored as textfile;
load data local inpath ‘/home/acadgild/Downloads/olympic_data.csv’ into table olympic;
Using the dataset list the total number of medals won by each country in swimming.
select country,SUM(total) from olympic where sport = “Swimming” GROUP BY country;
2)Display real life number of medals India won year wise.
select year,SUM(total) from olympic where country = “India” GROUP BY year;
3)Find the total number of medals each country won display the name along with total medals.
select country,SUM(total) from olympic GROUP BY country;
4)Find the real life number of gold medals each country won.
select country,SUM(gold) from olympic GROUP BY country;
5)Try One yourself
Which country got medals for Shooting, year wise classification?
Hope this blog helped you in learning Hive with real life scenario, which we come through our everyday life.
While I leave you with a simple query to solve, keep visiting our site ACADGILD for more practice, queries on Hive and other trending technologies.