B561 Assignment 2 Solved

35.00 $

Category:

Description

5/5 - (1 vote)

This assignment relies on the lectures

  • SQL Part 1 and SQL Part 2 (Pure SQL);
  • Views;
  • Relational Algebra (RA); and
  • Joins and semijoins.

Furthermore, the lecture on the correspondence between Tuple Relational Calculus and SQL should help you to solve problems in this assignment.

To turn in your assignment, you will need to upload to Canvas a single file with name assignment2.sql which contains the necessary SQL statements that solve the problems in this assignment. The assignment2.sql file must be so that the AI’s can run it in their PostgreSQL environment. You should use the Assignment2Script.sql file to construct the assignment2.sql file. (Note that the data to be used for this assignment is included in this file.) In addition, you will need to upload a separate assignment2.txt file that contains the results of running your queries. Finally, you need to upload a file assignment2.pdf that contains the solutions to the problems that require it.

The problems that need to be included in the assignment2.sql are marked with a blue bullet •. The problems that need to be included in the assignment2.pdf are marked with a red bullet •. (You should aim to use

Latex to construct your .pdf file.)

Database schema and instances

For the problems in this assignment we will use the following database schema:[1]

Person(pid, pname, city)

Company(cname,headquarter) Skill(skill) worksFor(pid, cname, salary) companyLocation(cname,city) personSkill(pid,skill) hasManager(eid,mid) Knows(pid1,pid2)

In this database we maintain a set of persons (Person), a set of companies (Company), and a set of (job) skills (Skill). The pname attribute in Person is the name of the person. The city attribute in Person specifies the city in which the person lives. The cname attribute in Company is the name of the company. The headquarter attribute in Company is the name of the city wherein the company has its headquarter. The skill attribute in Skill is the name of a (job) skill.

A person can work for at most one company. This information is maintained in the worksFor relation. (We permit that a person does not work for any company.) The salary attribute in worksFor specifies the salary made by the person.

The city attribute in companyLocation indicates a city in which the company is located. (Companies may be located in multiple cities.)

A person can have multiple job skills. This information is maintained in the personSkill relation. A job skill can be the job skill of multiple persons. (A person may not have any job skills, and a job skill may have no persons with that skill.)

A pair (e,m) in hasManager indicates that person e has person m as one of his or her managers. We permit that an employee has multiple managers and that a manager may manage multiple employees. (It is possible that an employee has no manager and that an employee is not a manager.) We further require that an employee and his or her managers must work for the same company.

The relation Knows maintains a set of pairs (p1,p2) where p1 and p2 are pids of persons. The pair (p1,p2) indicates that the person with pid p1 knows the person with pid p2. We do not assume that the relation Knows is symmetric: it is possible that (p1,p2) is in the relation but that (p2,p1) is not.

The domain for the attributes pid, pid1, pid2, salary, eid, and mid is integer. The domain for all other attributes is text. We assume the following foreign key constraints:

  • pid is a foreign key in worksFor referencing the primary key pid in Person;
  • cname is a foreign key in worksFor referencing the primary key cname in Company;
  • cname is a foreign key in companyLocation referencing the primary key cname in Company;
  • pid is a foreign key in personSkill referencing the primary key pid in Person;
  • skill is a foreign key in personSkill referencing the primary key skill in Skill;
  • eid is a foreign key in hasManager referencing the primary key pid in Person;
  • mid is a foreign key in hasManager referencing the primary key pid in Person;
  • pid1 is a foreign key in Knows referencing the primary key pid in Person; and
  • pid2 is a foreign key in Knows referencing the primary key pid in Person

The file Assignment2Script.sql contains the data supplied for this assignment.

Pure SQL and RA SQL

In this assignemt, we distinguish between Pure SQL and RA SQL. Below we list the only features that are allowed in Pure SQL and in RA SQL.

In particular notice that

  • JOIN, NATURAL JOIN, and CROSS JOIN are not allowed in Pure SQL.
  • The predicates [NOT] IN, SOME, ALL, [NOT] EXISTS are not allowed in RA SQL.

The only features allowed in Pure SQL

SELECT … FROM … WHERE WITH …

UNION, INTERSECT, EXCEPT operations

EXISTS and NOT EXISTS predicates

IN and NOT IN predicates

ALL and SOME predicates

VIEWs that can only use the above RA SQL features

The only features allowed in Pure SQL features

SELECT … FROM … WHERE WITH …

UNION, INTERSECT, EXCEPT operations

JOIN … ON …, NATURAL JOIN, and CROSS JOIN operations VIEWs that can only use the above RA SQL features commas in the FROM clause are not allowed

1        Formulating queries in Pure SQL with and without set predicates

An important consideration in formulating queries is to contemplate how they can be written in different, but equivalent, ways. In fact, this is an aspect of programming in general and, as can expected, is also true for SQL. A learning outcome of this course is to acquire skills for writing queries in different ways. One of the main motivations for this is to learn that different formulations of the same query can dramatically impact performance, sometimes by orders of magnitude.

For the problems in this section, you will need to formulate queries in Pure SQL with and without set predicates. You can use the SQL operators INTERSECT, UNION, and EXCEPT, unless otherwise specified. You are however allowed and encouraged to use views including temporary and user-defined views.

To illustrate what you need to do, consider the following example.

Example 1 Consider the query “Find the pid and name of each person who (a) works for a company located in Bloomington and (b) knows a person who lives in Chicago.

  • Formulate this query in Pure SQL by only using the EXISTS or NOT EXISTS set predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT. A possible solution is

select p.pid, p.pname from          Person p where exists (select 1 from      worksFor w where p.pid = w.pid and exists (select 1 from    companyLocation c where w.cname = c.cname and c.city = ’Bloomington’)) and

exists (select

from              Knows k where p.pid = k.pid1 and exists (select 1 from         Person p2 where k.pid2 = p2.pid and p2.city = ’Chicago’));

  • Formulate this query in Pure SQL by only using the IN, NOT IN, SOME, or ALL set membership predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT. A possible solution is

select p.pid, p.pname from          Person p where p.pid in (select w.pid from             worksFor w where w.cname in (select c.cname from     companyLocation c where c.city = ’Bloomington’)) and

p.pid in (select k.pid1 from           Knows k where k.pid2 in (select p2.pid

from   Person p2 where p2.city = ’Chicago’)); Another possible solution using the SOME and IN predicates

select p.pid, p.pname from          Person p where p.pid = some (select w.pid from   worksFor w where w.cname = some (select c.cname from   companyLocation c where c.city = ’Bloomington’)) and

p.pid in (select k.pid1 from           Knows k where k.pid2 in (select p2.pid

from               Person p2 where p2.city = ’Chicago’));

(c) Formulate this query in Pure SQL without using set predicates. A possible solution is

select p.pid, p.pname

from              Person p, worksFor w, companyLocation c where p.pid = w.pid and

w.cname = c.cname and

c.city = ’Bloomington’

intersect select p1.pid, p1.pname from      Person p1, Knows k, Person p2 where k.pid1 = p1.pid and k.pid2 = p2.pid and p2.city = ’Chicago’;

We now turn to the problems for this section.

  1. Consider the query “Find the pid and name of each person who works for Google and who has a strictly higher salary than some other person who he or she knows and who also works for Google.”
    • Formulate this query in Pure SQL by only using the EXISTS or NOT EXISTS set predicates.
    • Formulate this query in Pure SQL by only using the IN, NOT IN, SOME, or ALL set membership predicates.
    • Formulate this query in Pure SQL without using set predicates.
  2. Consider the query “Find the cname of each company with headquarter in Cupertino, but that is not located in Indianapolis, along with the pid, name, and salary of each person who works for that company and who has the next-to-lowest salary at that company.[2]
    • Formulate this query in Pure SQL by only using the EXISTS or NOT EXISTS set predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT.
    • Formulate this query in Pure SQL by only using the IN, NOT IN, SOME, or ALL set membership predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT.
    • Formulate this query in Pure SQL without using set predicates.
  3. Consider the query “Find each (c,p) pair where c is the cname of a company and p is the pid of a person who works for that company and who is known by all other persons who work for that company.
    • Formulate this query in Pure SQL by only using the EXISTS or NOT EXISTS set predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT.
    • Formulate this query in Pure SQL by only using the IN, NOT IN, SOME, or ALL set membership predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT.
    • Formulate this query in Pure SQL without using set predicates.
  4. Consider the query “Find each skill that is not a jobskill of any person who works for Yahoo or for Netflix.”
  • Formulate this query in Pure SQL using subqueries and set predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT.
  • Formulate this query in Pure SQL without using predicates.
  1. Consider the query “Find the pid and name of each person who manages all-but-one person who works for Google.
  • Formulate this query in Pure SQL using subqueries and set predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT.
  • Formulate this query in Pure SQL without using set predicates.

2              Formulating queries in Relational Algebra and RA SQL

Reconsider the queries in Section 1. The goal of the problems in this section is to formulate these queries in Relational Algebra in standard notation and in RA SQL.

There are some further restrictions:

  • You can only use WHERE clauses that use conditions involving constants. For example conditions of the form “’a’” are allowed, but conditions of the form ’t.Aθ s.B’ are not allowed. The latter conditions can be absorbed in JOIN operations in the FROM clause.
  • You can not use commas in any FROM Rather, you should use JOIN operations.

You can use the following letters, or indexed letters, to denote relation names in RA expressions:

P, P1, P2,···Person

C, C1, C2,···Company

S, S1, S2,···Skill

W, W1, W2,···worksFor cL, cL1, cL2,···companyLocation pS, pS1, pS2,···personSkill

M, M1, M2,···hasManager

K, K1, K2,···Knows

To illustrate what you need to do reconsider the query in Example 1 in Section 1.

Example 2 Consider the query “Find the pid and name of each person who (a) works for a company located in Bloomington and (b) knows a person who lives in Chicago.”

(a) Formulate this query in Relational Algebra in standard notation.

A possible solution is

πpid,pname(Person ./ worksFor ./ πcname(σcity=Bloomington(companyLocation)))∩ πPerson1.pid,Person1.pname(Person1 ./Person1.pid=pid1 Knows ./pid2=Person2.pid πPerson2.pid(σcity=Chicago(Person2)))

If we use the letters in the above box, this expression becomes more succinct:

πpid,pname(P ./ W ./ πcname(σcity=Bloomington(cL)))∩ πP1.pid,P1.pname(P1 ./P1.pid=pid1 K ./pid2=P2.pid πP2.pid(σcity=Chicago(P2)))

You are also allowed to introduce letters that denote expressions. For example, let E and F denote the expression πpid,pname(P ./ W ./ πcname(σcity=Bloomington(cL))) and

πP1.pid,P1.pname(P1 ./P1.pid=pid1 K ./pid2=P2.pid πP2.pid(σcity=Chicago(P2))), respectively. Then we can write the solution as follows:

E F.

(b) Formulate this query in RA SQL. A possible solution is

select pid, pname from                 Person

NATURAL JOIN worksFor

NATURAL JOIN (select cname from               companyLocation where city = ’Bloomington’) C

INTERSECT select P1.pid, P1.pname from  Person P1

JOIN Knows ON (P1.pid = pid1)

JOIN (select pid from    Person

where city = ’Chicago’) P2 ON (pid2 = P2.pid) order by 1,2;

Observe that the WHERE clauses only use conditions involving constants.

We now turn to the problems in this section.

  1. Reconsider Problem 1. Find the pid and name of each person who works for Google and who has a strictly higher salary than some other person who he or she knows and who also works for

Google.

  • Formulate this query in Relational Algebra in standard notation.
  • Formulate this query in RA SQL.
  1. Reconsider Problem 2. Find the cname of each company with headquarter in Bloomington, but not located in Indianapolis, along with the pid, name, and salary of each person who works for that company and who has the next-to-lowest salary at that company.
    • Formulate this query in Relational Algebra in standard notation.
    • Formulate this query in RA SQL.
  2. Reconsider Problem 3. Find each (c,p) pair where c is the cname of a company and p is the pid of a person who works for that company and who is known by all other persons who work for that company.
    • Formulate this query in Relational Algebra in standard notation.
    • Formulate this query in RA SQL.
  3. Reconsider Problem 4. Find each skill that is not a jobskill of any person who works for Yahoo or for Netflix.
    • Formulate this query in Relational Algebra in standard notation.
    • Formulate this query in RA SQL.
  4. Reconsider Problem 5. Find the pid and name of each person who manages all-but-one person who works for Google.
  • Formulate this query in Relational Algebra in standard notation.
  • Formulate this query in RA SQL.

3        Formulating constraints using Relational Algebra

Recall that it is possible to express constraints in TRC and as boolean SQL queries. It is also possible to write constraints using RA expressions. Let E, F, and G denote RA expressions. Then we can write RA expression comparisons that express constraints:

E 6= ∅      which is true if E evaluates to an non-empty relation

  • = ∅ which is true if E evaluates to the empty relation
  • G which is true if F evaluates to a relation that is a subset of the relation obtained from G

F 6⊆ G    which is true if F evaluates to a relation that is not a subset of the relation obtained from G

Here are some examples of writing constraints in this manner.

Example 3 Some person works for Google.” This constraint can be written as follows:

πpid(σcname=Google(worksFor)) 6= ∅.

Indeed, the RA expression

πpid(σcname=Google(worksFor))

computes the set of all person pids who work for Google. If this set is not empty then there are indeed persons who work for Google.

Incidentally, the constraint “No one works for Google” can be written as follows:

πpid(σcname=Google(worksFor)) = ∅.

Example 4 Each person has at least two skills. This constraint can be written as follows:

πpid(P) ⊆ πpS1.pid(σpS1.skill6=pS2.skill(pS1 ./pS1.pid=pS2.pid pS2)).

Indeed,

πpid(P)

computes the set of all person pids and

πpS1.pid(σpS1.skill6=pS2.skill(pS1 ./pS1.pid=pS2.pid pS2))

computes the set of all pids of persons who have at least two skills. When the first set is contained in the second we must have that each person has at least two skills.

Incidentally, the constraint “Some person has fewer than two skills” can be written as follows:

πpid(P) 6⊆ πpS1.pid(σpS1.skill6=pS2.skill(pS1 ./pS1.pid=pS2.pid pS2)).

We now turn to the problems in this section.

Formulate each of the following constraints using RA expressions as illustrated in Example 3 and Example 4.

  1. Some person has a salary that is strictly lower than that of each of the persons who he or she manages.
  2. No person knows all persons who work for Google.
  3. Each person knows all of his or her managers.
  4. Each employee and his or her managers work for the same company.
  5. The attribute pid is a primary key of the Person relation.

4        Formulating queries in SQL using views

Formulate the following views and queries in SQL. You are allowed to combine the features of both Pure SQL and RA SQL.

  1. Create a view Triangle that contains each triple of pids of different persons (p1,p2,p3) that mutually know each other.

Then test your view.

  1. Define a parameterized view SalaryBelow(cname text, salary integer) that returns, for a given company identified by cname and a given salary value, the subrelation of Person of persons who work for company cname and whose salary is strictly below salary.

Test your view for the parameter values ( ’IBM’,60000), (’IBM’,50000), and (’Apple’,65000).

  1. Define a parameterized view KnowsPersonAtCompany(p integer, c text) that returns for a person with pid p the subrelation of Person of persons who p knows and who work for the company with cname c.

Test your view for the parameters (1001, ‘Amazon’), (1001,‘Apple’), and (1015,‘Netflix’).

  1. Define a parameterized view KnownByPersonAtCompany(p integer, c text) that returns the subrelation of Person of persons who know the person with pid p and who work for the company with cname c.

Test your view for the parameters (1001, ‘Amazon’), (1001,‘Apple’), and (1015,‘Netflix’).

  1. Let Tree(parent : integer,child : integer) be a rooted parentchild tree. So a pair (n,m) in Tree indicates that node n is a parent of node m. The sameGeneration(n1, n2) binary relation is inductively defined using the following two rules:
    • If n is a node in T, then the pair (n,n) is in the sameGeneration (Base rule)
    • If n1 is the parent of m1 in Tree and n2 is the parent of m2 in Tree and (n1,n2) is a pair in the sameGeneration relation then (m1,m2) is a pair in the sameGeneration

(Inductive Rule)

Write a recursive view for the sameGeneration relation.

Test your view.

[1] The primary key, which may consist of one or more attributes, of each of these relations is underlined.

[2] By definition, a salary s is next-to-lowest if there exists salary s1 with s1 < s, but there do not exist two salaries s1 and s2 with s2 <s1 <s.

  • Assignment_2-d4vn13.zip