CS2400 Homework 6: Email Addresses

Search engine companies such as Google routinely crawl websites to collect data. Programs called web scrapers gather this information to index the sites and to collect other details such as email addresses and phone numbers. You have been asked to read a website file and extract only the email addresses from it. The site file often has the extension “.html”. A sample site that you can use is described below.

Email addresses are tagged within the site as follows (note that other formats may be included):

 

<a href="mailto:[email protected]"> Send email </a>

<a href="mailto:[email protected]"> Send email </a>

<a href="mailto:[email protected]"> Send email </a>

 

Here “bob@ohio.edu” is the email address, “bob” is the user name, and “ohio.edu” is the domain name.
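
As an illustration of the user/domain split described above, here is a minimal sketch that cuts an address at its first '@'. The name splitEmailAddress follows the hint given later in this assignment; the bool return value and the rejection of empty parts are assumptions made for this sketch, not requirements of the assignment.

```cpp
#include <string>

// Splits "user@domain" at the first '@' and returns the two parts through
// reference parameters. Returns false (and leaves the outputs untouched)
// when there is no '@' or when either part would be empty.
bool splitEmailAddress(const std::string& email,
                       std::string& user, std::string& domain)
{
    std::string::size_type at = email.find('@');
    if (at == std::string::npos || at == 0 || at + 1 >= email.size()) {
        return false;  // no '@', empty user, or empty domain
    }
    user = email.substr(0, at);      // everything before the '@'
    domain = email.substr(at + 1);   // everything after the '@'
    return true;
}
```

Called with "bob@ohio.edu", this sketch sets user to "bob" and domain to "ohio.edu".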

Write a program that processes a website file and extracts all the email addresses from the site and stores the emails in parallel arrays or vectors (emails, users, domains). If you’re using arrays, you may assume that the number of emails will not exceed 1000. You only need to extract email addresses that conform to the tag formats specified above.
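
One possible layout for the parallel arrays is sketched below: index i of each array describes the same address, and a shared counter tracks how many slots are in use. isFound follows the name used in the hints; MAX_EMAILS and addEmail are hypothetical names introduced only for this sketch, and the exact-match (case-sensitive) comparison is an assumption.

```cpp
#include <string>

const int MAX_EMAILS = 1000;  // upper bound stated in the assignment

// Linear search over the first `count` entries of an array of strings.
bool isFound(const std::string list[], int count, const std::string& target)
{
    for (int i = 0; i < count; ++i) {
        if (list[i] == target) {
            return true;
        }
    }
    return false;
}

// Stores one address in the three parallel arrays if it contains '@',
// is not stored yet, and there is still room. Returns true when added.
bool addEmail(const std::string& email,
              std::string emails[], std::string users[],
              std::string domains[], int& count)
{
    std::string::size_type at = email.find('@');
    if (at == std::string::npos || count >= MAX_EMAILS ||
        isFound(emails, count, email)) {
        return false;
    }
    emails[count] = email;
    users[count] = email.substr(0, at);      // same index in every array
    domains[count] = email.substr(at + 1);
    ++count;
    return true;
}
```

With vectors instead of arrays, the counter and the capacity check disappear and push_back replaces the indexed assignments.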

Output the number of lines processed and the number of unique emails extracted to the screen, as follows:

 

51 lines processed

20 emails found

 

Write a function that outputs the data to a file as follows:

 

Email                         user           domain
------------------------------------------------------------
bob@ohio.edu                  bob            ohio.edu
bob.smith@ace.cs.ohio.edu     bob.smith      ace.cs.ohio.edu
cs2400@gmail.com              cs2400         gmail.com
bob@bob-cats.ohio.edu         bob            bob-cats.ohio.edu
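
A column layout like the one above can be produced with the stream manipulators from <iomanip>. The sketch below is one way to do it, assuming column widths of 30 and 15 characters (inferred from the spacing of the sample, not stated in the assignment) and a hypothetical function name writeReport. It accepts any std::ostream, so an opened std::ofstream can be passed in directly.

```cpp
#include <iomanip>
#include <ostream>
#include <string>
#include <vector>

// Writes the header, a separator line, and one left-aligned row per address.
// The three vectors are parallel: index i refers to the same address in each.
void writeReport(std::ostream& out,
                 const std::vector<std::string>& emails,
                 const std::vector<std::string>& users,
                 const std::vector<std::string>& domains)
{
    const int emailWidth = 30;  // assumed width of the first column
    const int userWidth = 15;   // assumed width of the second column

    out << std::left
        << std::setw(emailWidth) << "Email"
        << std::setw(userWidth) << "user"
        << "domain" << '\n';
    out << std::string(60, '-') << '\n';

    for (std::size_t i = 0; i < emails.size(); ++i) {
        // setw applies only to the next item, so it is repeated per column.
        out << std::setw(emailWidth) << emails[i]
            << std::setw(userWidth) << users[i]
            << domains[i] << '\n';
    }
}
```

An address longer than the assumed width is not truncated; it simply pushes the remaining columns to the right.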

 

Read the input file one line at a time (hint: use getline) and process it. Note that a line may have more than one email address. Process lines until the end of the input file is reached. For each email address extracted, split it into a user and a domain. Use three arrays to store the email addresses, users, and domains.  Before storing the email into the arrays, make sure the email is not in the array already. The array of emails should only contain unique email addresses. Your program should only output unique email addresses to the output file.

 

The names of the input and output files must be provided on the command line. For example:

./a.out website.html output.txt

Report an error if the number of arguments is incorrect or if either file is not accessible.
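
The argument and file checks could look something like the sketch below. The helper name openFiles and the wording of the messages are assumptions made for illustration; the assignment only requires that errors be reported.

```cpp
#include <fstream>
#include <iostream>

// Validates the command line and opens both files.
// Returns true only when exactly two file names were given and both opened.
bool openFiles(int argc, char* argv[],
               std::ifstream& input, std::ofstream& output)
{
    if (argc != 3) {  // program name + input file + output file
        std::cerr << "Usage: " << argv[0] << " <input file> <output file>\n";
        return false;
    }
    input.open(argv[1]);
    if (input.fail()) {
        std::cerr << "Error: cannot open input file " << argv[1] << '\n';
        return false;
    }
    output.open(argv[2]);
    if (output.fail()) {
        std::cerr << "Error: cannot open output file " << argv[2] << '\n';
        return false;
    }
    return true;
}
```

In main, a failed call would typically end the program with a nonzero exit status before any processing starts. Checking the input file first means the output file is not created when the input is missing.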

 

You may use any function or library discussed in class or in the chapters we covered from your textbook. Do not use any other libraries or functions.

 

Hints:

  • splitEmailAddress: A function that splits an email address and returns the two parts of it as reference parameters.
  • isFound: A search function for an array of strings.
  • getLineEmails: A function that takes a string, extracts all the emails from that single line into the parallel arrays, and checks for uniqueness. You may want to call splitEmailAddress each time.

 

How to get a sample data file

  • Browse to the website https://www.ohio.edu/engineering/about/people/departmentallisting.cfm#ElectricalEngineeringandComputerScience
  • Save the source code of the page:
      o In Chrome: Right-click the page background and select save as “html” file. Choose a name for the file.
      o In Safari: Right-click the page background, select “Save Page as”, and make sure that “Page Source” is selected. Choose a name for your file.
  • A sample file is provided with the assignment.