Science, asked by monika1873, 10 months ago

create a dataset for rate of petrol consumption using hiveql​

Answers

Answered by rs750421
0

Practice Query Creation

This blog of Big Data will be a good practice for Hive Beginners, for practicing query creation. At the end, you will be able to create a table, load data to the table and perform analytical analysis on the dataset provided in Hive real life use cases.

The Topic for this blog as referring to the real life dataset of Petrol suppliers. And in the second part, we are covering real world Olympic records.

The sample dataset is made in reference to today’s rate of petrol consumption.But is not an actual data. It has been modified and published in terms to learning only.

Here in this blog we will compare top petrol sellers of the world and get some hands-on experience in Hive. Go to the link If you need to download and setup Hive.

PETROL:

DATA SET:https://drive.google.com/open?id=0B1QaXx7tpw3SMTBqLUQwX0lOWnM

DATA SET DESCRIPTION:

ColumnNO.  Name  Example  DataType

Column1:  District.ID  I4N 1M1  varchar

Column2: ,  Distributor name  shell  varchar

Column3:  Buy rate (million)  $957.70  varchar

Column4:  Sell rate(million)  $5779.92  varchar

Column5:  volumeIN(millioncubic litter)  933  int

Column6:  volumeOUT(millioncubic litter)  843,  int

Column7:  Year  1624  int

Creation of Table in Hive and Loading of data

create table petrol (distributer_id STRING,distributer_name STRING,amt_IN STRING,amy_OUT STRING,vol_IN INT,vol_OUT INT,year INT) row format delimited fields terminated by ‘,’ stored as textfile;

load data local inpath ‘/home/acadgild/Downloads/petrol.txt’ into table petrol;

1)In real life what is the total amount of petrol in volume sold by every distributor?

SELECT distributer_name,SUM(vol_OUT) FROM petrol GROUP BY distributer_name;

2)Which are the top 10 distributors ID’s for selling petrol and also display the amount of petrol sold in volume by them individually?

SELECT distributer_id,vol_OUT FROM petrol order by vol_OUT desc limit 10;

3)Find real life 10 distributor name who sold petrol in the least amount.

SELECT distributer_id,vol_OUT FROM petrol order by vol_OUT limit 10;

4)Try One yourself

The constraint to this query is the difference between volumeIN and volumeOuT is illegal in real life if greater than 500. As we see all distributors are receiving patrols on every next cycle.

List all distributors who have this difference, along with the year and the difference which they have in that year.

Hint: (vol_IN-vol_OUT)>500

——————————xxx xxx xxx—————————

We all know how much interesting Olympics held every time, and we all love it.In real life Olympic Games are considered the world’s foremost sports competition with more than 200 nations participating. The Olympic Games are held every four years, with the Summer and Winter Games alternating by occurring every four years but two years apart.

The sample dataset is made in reference to real life Olympic competition.But is not an actual data. It has been modified and published in terms to learning only.

Olympic:

DATA SET:https://drive.google.com/file/d/0B1QaXx7tpw3SaEE3bEFTQTMzNzg/view?usp=sharing

Olympic Data analysis using Hive

ColumnNO.  Name  Example  DataType

Column1:  AthleteName  Michael Phelps  STRING

Column2: ,  Age  23  INT

Column3:  Country  United States  STRING

Column4:  Year  2008  INT

Column5:  Closing Date  8/24/2008  STRING

Column6:  Sport  Swimming  STRING

Column7:  Gold Medals  8  INT

Column8:  Silver Medals  0  INT

Column9:  Bronze Medals  0  INT

Column10:  Total Medals  8  INT

Creation of Table in Hive and Loading of data

create table olympic (athelete STRING,age INT,country STRING,year STRING,closing STRING,sport STRING,gold INT,silver INT,bronze INT,total INT) row format delimited fields terminated by ‘\t’ stored as textfile;

load data local inpath ‘/home/acadgild/Downloads/olympic_data.csv’ into table olympic;

   Using the dataset list the total number of medals won by each country in swimming.

select country,SUM(total) from olympic where sport = “Swimming” GROUP BY country;

2)Display real life number of medals India won year wise.

select year,SUM(total) from olympic where country = “India” GROUP BY year;

3)Find the total number of medals each country won display the name along with total medals.

select country,SUM(total) from olympic GROUP BY country;

4)Find the real life number of gold medals each country won.

select country,SUM(gold) from olympic GROUP BY country;

5)Try One yourself

Which country got medals for Shooting, year wise classification?

Hope this blog helped you in learning Hive with real life scenario, which we come through our everyday life.

While I leave you with a simple query to solve, keep visiting our site ACADGILD  for more practice, queries on Hive and other trending technologies.

Similar questions