Name: CSC370 Assignment 4 Solved
SKU: 62773
Price: 30.00 USD
Availability: InStock

Description

Rate this product

Some of the queries in this assignment involve floating point data. In the expected output shown below, there might be slight differences in floating point values due to rounding and formatting issues. You may assume that floating point values in your query results match the expected output if they are equivalent to four decimal places (e.g. if the expected output is written as ‘0.1234’, the values ‘0.123398’ and ‘0.1234567’ would both be considered correct, but the value ‘0.123’ would not).

Question 1: IMDB Queries

Create queries for each of the data retrieval problems below, using the imdb database. In the questions below, any reference to ‘films’ refers to titles with title_type = ‘movie’.

For each year between 2000 and 2017 (inclusive), list the primary name, production year, rating and number of votes of the film or films which attained the highest rating among all movies produced in that year which received at least 10000 votes. Both the rating and number of votes are stored in the ratings

Expected Query Result
primary name	year	rating	votes
Memento	2000	8.5000	942432
Gladiator	2000	8.5000	1095891
The Lord of the Rings: The Fellowship of the Ring	2001	8.8000	1370113
The Lord of the Rings: The Two Towers	2002	8.7000	1221886
The Lord of the Rings: The Return of the King	2003	8.9000	1349934
Anbe Sivam	2003	8.9000	10116
Black Friday	2004	8.6000	13513
Earthlings	2005	8.7000	14649
The Prestige	2006	8.5000	959259
The Lives of Others	2006	8.5000	290278
The Departed	2006	8.5000	975612
Like Stars on Earth	2007	8.5000	113421
The Dark Knight	2008	9.0000	1864795
Home	2009	8.6000	19621
Inception	2010	8.8000	1653611
The Intouchables	2011	8.6000	591006
The Dark Knight Rises	2012	8.4000	1269644
Django Unchained	2012	8.4000	1087769
CM101MMXI Fundamentals	2013	9.3000	38106
Interstellar	2014	8.6000	1120400
RangiTaranga	2015	8.7000	10286
The Mountain II	2016	9.6000	93071
Ayla: The Daughter of War	2017	9.1000	15807

Select the primary name and episode count of each TV series (contained in the tv_series table) for which at least 6000 episodes have been produced.

Expected Query Result
series name	episode count
Days of Our Lives	10240
Ohayou Tokushima	9502
Coronation Street	9316
Neighbours	8911
Six O’Clock News	8646
The Price Is Right	8461
One O’Clock News	8100
Jeopardy!	7599
EastEnders	7393
The Bold and the Beautiful	7236
Home and Away	7081
Wheel of Fortune	6564
Gute Zeiten, schlechte Zeiten	6413
The Tonight Show Starring Johnny Carson	6037
The Young and the Restless	6024

Question 2: BC Ferries Queries [14 marks]

The queries you write below should work correctly on any of the BC Ferries databases (ferries_1month, ferries_3months, ferries_6months, ferries_9months, ferries_1year or ferries_3years). For comparison, sample output is shown for both ferries_1month and ferries_3years.

The following definitions apply to all of the parts below unless otherwise stated.

The ‘duration’ of a sailing is defined to be the number of minutes between the scheduled departure (not the actual departure) and the arrival of that sailing.
Any reference to the ‘day of a sailing’ refers to the calendar day of the scheduled departure (not the actual departure) of that sailing.

For many routes, there are simultaneous sailings in both directions at the same time. For example, on a typical day at 9:00am, a ferry leaves Tsawwassen for Swartz Bay and a different ferry leaves Swartz Bay for Tsawwassen. Not all routes or sailings have this property. We will define two vessels as ‘paired up’ if they have both served the same route number at the same (scheduled) departure time/date. Construct a query to count the number of times each distinct pair of ferries have been paired up. Your result must not contain counts for pairs of vessels which have never been paired up. Each distinct pair of ferries should appear only once in your result, and in the result rows, the vessel names of each pair must be ordered alphabetically (so the alphabetically lowest vessel name will be listed first).

	Expected Query Result (ferries 1month)
	vessel1	vessel2	num pairings
	Coastal Inspiration	Queen of Alberni	158
	Spirit of British Columbia	Spirit of Vancouver Island	10
Expected Query Result (ferries 3years)
vessel1		vessel2		num pairings
Coastal Inspiration		Queen of Alberni		5449
Spirit of British Columbia		Spirit of Vancouver Island		3552
Queen of Cowichan		Queen of Oak Bay		3048
Coastal Celebration		Queen of New Westminster		2261
Coastal Renaissance		Queen of Oak Bay		1825
Coastal Celebration		Spirit of British Columbia		1653
Coastal Celebration		Spirit of Vancouver Island		1247
Coastal Renaissance		Spirit of Vancouver Island		847
Coastal Renaissance		Queen of Alberni		814
Coastal Celebration		Coastal Renaissance		622
Coastal Renaissance		Queen of New Westminster		594
Coastal Inspiration		Queen of New Westminster		592
Queen of Coquitlam		Queen of Cowichan		548
Queen of Coquitlam		Queen of Oak Bay		481
Queen of Alberni		Queen of Coquitlam		400
Queen of Coquitlam		Queen of New Westminster		152
Coastal Renaissance		Spirit of British Columbia		53
Coastal Celebration		Coastal Inspiration		24
Queen of New Westminster		Spirit of Vancouver Island		17
Coastal Inspiration		Spirit of British Columbia		8
Queen of Coquitlam		Queen of Surrey		3
Coastal Renaissance		Queen of Cowichan		1

The routes table contains the ‘nominal duration’ of each route, which is the expected crossing time. The nominal duration is determined by BC Ferries based on the average marine and traffic conditions, along with information like loading times and the speed of the vessels. We can test the accuracy of this calculation by computing the average time of each crossing. Construct a query to find, for each route number, the nominal duration (in minutes) and the average duration (in minutes) of a crossing based on all available data for that route. For this question, assume that the ‘duration’ of a particular sailing is the time between its scheduled departure and its arrival.

Expected Query Result (ferries 1month)
route number	nominal duration	avg duration
1	95	101.2424
3	40	44.9415
4	35	33.2479
8	20	20.5182
30	120	129.0709
Expected Query Result (ferries 3years)
route number	nominal duration	avg duration
1	95	92.8867
2	100	106.8488
3	40	45.1520
4	35	31.8939
8	20	21.9507
30	120	123.0126

Suppose we define a sailing to be ‘late’ if the duration is at least five minutes longer^[1] than the nominal duration in the routes Construct a query to find, for each month, the number of days for which at least one sailing occurred on Route 1 but no late sailings occurred on route number 1.

Expected Query Result (ferries 1month)
month	count
4	1
Expected Query Result (ferries 3years)
month	count
1	65
2	53
3	58
4	29
5	19
6	23
7	22
8	18
9	43
10	45
11	51
12	60

Construct a query to find, for each vessel with any sailings, the total number of sailings it has made, the number of late sailings it has made (which may be zero) and the fraction of its sailings that were late (that is, the number of late sailings divided by the total number of sailings). You should be careful with this query: a vessel may be involved in multiple routes, each with a different nominal duration.

	Expected Query Result (ferries 1month)
	vessel name	total sailings		late sailings		late fraction
	Coastal Inspiration	232		166		0.7155
	Coastal Renaissance	39		14		0.3590
	Queen of Alberni	163		43		0.2638
	Queen of Capilano	440		30		0.0682
	Queen of Coquitlam	68		16		0.2353
	Queen of Surrey	274		100		0.3650
	Skeena Queen	234		22		0.0940
	Spirit of British Columbia	164		98		0.5976
	Spirit of Vancouver Island	94		23		0.2447
Expected Query Result (ferries 3years)
vessel name			total sailings		late sailings		late fraction
Bowen Queen			1268		83		0.0655
Coastal Celebration			6506		406		0.0624
Coastal Inspiration			6545		2502		0.3823
Coastal Renaissance			5637		1220		0.2164
Mayne Queen			1		0		0.0000
Queen of Alberni			6903		1164		0.1686
Queen of Capilano			14277		2806		0.1965
Queen of Coquitlam			6650		2230		0.3353
Queen of Cowichan			7071		2946		0.4166
Queen of Cumberland			974		113		0.1160
Queen of New Westminster			3966		439		0.1107
Queen of Oak Bay			6609		4592		0.6948
Queen of Surrey			12759		4555		0.3570
Quinitsa			8		8		1.0000
Skeena Queen			7312		378		0.0517
Spirit of British Columbia			5539		1013		0.1829
Spirit of Vancouver Island			5916		791		0.1337

Usually, you would expect that when the beginning of a sailing is delayed (that is, when actual_departure is much later than scheduled_departure), the arrival is also delayed and the sailing is late. However, in some cases, a vessel that departs late may still arrive on time. Define a ‘made up sailing’ to be any sailing which leaves at least 15 minutes after its scheduled departure but arrives less than (or equal to) five minutes late. Write a query to list the number of made up sailings in the dataset for each vessel. Only vessels with at least one made up sailing should be listed.

	Expected Query Result (ferries 1month)
	vessel name	made up sailings
	Coastal Inspiration	4
	Coastal Renaissance	2
	Spirit of British Columbia	3
Expected Query Result (ferries 3years)
vessel name			made up sailings
Coastal Celebration			109
Coastal Inspiration			70
Coastal Renaissance			53
Queen of Alberni			11
Queen of Coquitlam			7
Queen of Cowichan			1
Queen of New Westminster			28
Queen of Oak Bay			1
Skeena Queen			2
Spirit of British Columbia			57
Spirit of Vancouver Island			75

Define a ‘good day’ for a particular route number to be a day where at least one sailing occurred and no late sailings occurred. For each route, find the maximum number of good days in a row over the entire dataset.

Note that the rather unimpressive results for the ferries_1month dataset compared to the other datasets seem to be the result of COVID-19 related service reductions and extra precautions (which may be increasing the number of late sailings).

Hint: This is probably easiest when you use a large number of CTE expressions (that is, subqueries in the WITH clause).

Expected Query Result (ferries 1month)
route number	max consecutive good days
1	1
2	0
3	1
4	3
8	4
30	0
Expected Query Result (ferries 3years)
route number	max consecutive good days
1	15
2	2
3	4
4	27
8	8
30	6

For each route, output all date ranges where the maximum number of consecutive good days (as defined in part (f) above) was achieved. For some routes, only one date range will meet this criteria, but for others (e.g. route 2 in the ferries_3years dataset) there may be multiple date ranges of the same size. The columns containing the start and end dates should contain values of type DATE (created, for example, by the make_date function).

Expected Query Result (ferries 1month)
route number	start date	end date	max consecutive good days
1	2020-04-26	2020-04-26	1
3	2020-04-15	2020-04-15	1
4	2020-04-19	2020-04-21	3
8	2020-04-29	2020-05-02	4
Expected Query Result (ferries 3years)
route number	start date	end date	max consecutive good days
1	2017-12-21	2018-01-04	15
2	2017-05-20	2017-05-21	2
2	2017-09-26	2017-09-27	2
3	2018-02-24	2018-02-27	4
4	2018-09-02	2018-09-28	27
8	2019-03-06	2019-03-13	8
30	2017-09-23	2017-09-28	6

Question 3: VWSN Queries [10 marks]

Create queries for each of the data retrieval problems below, using the vwsn 1year database.

Find the highest observed temperature in the dataset, along with the station number, station name and observation time of all cases where that temperature was reported.

Note: The expected output below is correct (that is, the observation shown is actually in the dataset, even though it appears to be unseasonably warm).

	Expected Query Result
station id	name	temperature	observation time
180	Captain Meares Elementary Secondary School	42.6000	2019-10-05 15:11:00

For each station with station ID between 1 and 10 (inclusive), list the station ID, station name, maximum temperature observed at that station and observation time of all observations in the dataset in which the maximum temperature was attained at that station. Only include stations which actually have recorded observations in the dataset.

Expected Query Result
station id	name	max temperature	observation time
1	Ian Stewart Complex/Mt. Douglas High School	29.8000	2019-08-05 16:31:00
3	Strawberry Vale Elementary School	30.2000	2019-06-12 15:15:00
3	Strawberry Vale Elementary School	30.2000	2019-06-12 15:21:00
4	Oaklands Elementary School	30.3000	2019-06-12 14:46:00
4	Oaklands Elementary School	30.3000	2019-06-12 14:52:00
5	Cedar Hill Middle School	30.2000	2019-06-12 17:12:00
6	Marigold Elementary School/Spectrum High School	29.8000	2019-06-12 14:41:00
7	Campus View Elementary	29.6000	2019-06-12 17:22:00
8	Victoria High School	29.7000	2019-06-12 14:41:00
9	Frank Hobbs Elementary School	29.2000	2019-08-05 17:27:00
10	Macaulay Elementary School	27.3000	2019-06-12 12:26:00

Find the IDs and names of all stations which have reported at least one observation at some point, but which did not report any observations in January 2020.

Expected Query Result
station id	name
13	Shoreline Middle School
16	Tillicum Elementary School
31	Sangster Elementary School
32	Colwood Elementary School
36	Hans Helgesen Elementary School
57	Wishart Elementary School
62	Deep Cove Elementary School
70	Parkland Secondary School
81	Braefoot Elementary School
94	Pender Islands Elementary and Secondary School
103	Frances Kelsey Secondary School
105	Port Renfrew Elementary School
108	Alberni Weather
109	Brentwood Elementary School
112	Saturna Elementary School
113	Mayne Island School
117	Saltspring Elementary and Middle Schools
124	Seaview Elementary School
131	Glenlyon Norfolk Junior School
133	Discovery Elementary School
136	Pleasant Valley Elementary School
160	Shawnigan Lake School
161	Bamfield Marine Sciences Centre
166	Maquinna Elementary School
180	Captain Meares Elementary Secondary School
195	North Island Distance Education School
196	Valley View Elementary School
226	Ditidaht Community School

In this question (and the next question), define the ‘daily average temperature’ for a particular day to be the average of all observations from all stations on that day. Furthermore, for each month, define the ‘10 hottest days’ to be the top 10 days (by daily average temperature) and define the ‘10 coolest days’ to be the bottom 10 days (by daily average temperature). For each month/year pair in the dataset, compute the average daily average temperature across the ten hottest days and the average daily average temperature across the ten coolest days. Notice that you are computing an average of averages (first, find the average daily temperatures of each of the ten hottest days, then average those temperatures to produce the result). Hint:

Use the rank() function with an appropriate over() clause (maybe in multiple places)

Expected Query Result
year	month	hottest10 average	coolest10 average
2019	5	16.8834	12.0071
2019	6	17.8665	13.8339
2019	7	19.0044	15.9400
2019	8	19.7309	16.5326
2019	9	17.0929	12.3717
2019	10	10.8171	6.5475
2019	11	9.2537	3.6983
2019	12	7.3156	4.1378
2020	1	7.8368	0.9990
2020	2	6.3289	3.1054
2020	3	7.3842	3.7168
2020	4	11.5621	7.1426

List the day, month and year (in separate columns) of all days whose daily average temperature (see previous question) was lower than the daily average temperature of each of the previous 28 days in the dataset. The result should not list any of the first 28 days in the dataset, since the metric cannot be computed for those days, but the data for those days should be used to compute the rest of the result. Hint: You may want to use both the min() and count() aggregation functions in combination with an over() clause that includes both partitioning and windowing.

Expected Query Result
year	month	day
2019	8	21
2019	8	23
2019	9	12
2019	9	14
2019	9	15
2019	9	16
2019	9	17
2019	9	23
2019	9	27
2019	9	30
2019	10	1
2019	10	8
2019	10	9
2019	10	10
2019	10	28
2019	10	29
2019	10	30
2019	11	26
2019	11	28
2019	11	29
2020	1	9
2020	1	12
2020	1	13
2020	1	14
2020	3	14

[1] . A duration which is exactly five minutes longer is still considered late.

Assignment-4-xvqtgl.zip

CSC370 Assignment 4 Solved

If Helpful Share:

Description

Related products

CSC370 Assignment1-Keys and Superkeys Solved

CSC370 Assignment3-IMDB Queries Solved

CSC370 Assignment 6 Solved