IFB104 – Assignment 2, Part A: News Feed Aggregator Solved


Overview
This is the first part of a two-part assignment. This part is worth 21% of your final grade for IFB104. Part B will be worth a further 4%. Part B is intended as a last-minute extension to the assignment, thereby testing the maintainability of your solution to Part A and your ability to work under time pressure. The instructions for completing Part B will not be released until Week 12. Whether or not you complete Part B you will submit only one solution, and receive only one mark, for the whole 25% assignment.
This is a complex and challenging assignment. If you are unable to solve the whole problem, submit whichever parts you can get working. You will receive partial marks for incomplete solutions.
Motivation
The way we consume news has changed dramatically in recent years. The days of morning and afternoon home newspaper deliveries are long gone. Where readers were once restricted to a handful of local news sources, we now have a bewildering range of online options from around the world. Most newspapers, radio and television stations now make their news services available online, in addition to new purely online news services. Making sense of this cacophony is a challenge.
One response is news aggregation services. These allow readers to create their own news channels by mixing their preferred news sources together into a single stream. In this assignment you will create your own news aggregation application in Python. Your program will have a Graphical User Interface that allows its user to select how many stories they want to see from each source and then export an HTML document containing the selected stories. This document can be examined in a standard web browser or printed as a hardcopy.
This “capstone” assignment is designed to incorporate all of the concepts taught in IFB104. To complete it you will need to: (a) use Tkinter to create an interactive Graphical User Interface; (b) download web documents using a Python script and use pattern matching to extract specific elements from them; and (c) generate an HTML document that integrates the extracted elements, presenting them in an attractive, easy-to-read format.
Goal
Your aim in this assignment is to develop an interactive “app” which allows its users to select how many news stories they want to see from several different news sources. There must be at least four different sources, two of them “live” news feeds and two “archives” of previously-downloaded news items. Most importantly, the two online web documents from which you get your “live” news must be ones that are updated on a continuous basis (at least daily but preferably much more often) so your program needs to be resilient to changes in the source documents. These two news sources must also come from different web sites (i.e., different web servers), to allow for one of the sites being temporarily offline.
Using these news sources you are required to build an IT system with the following general architecture.

Your application will be a Python program with a Graphical User Interface. Under the user’s control, it allows news feeds to be previewed in the GUI, from both online and archived news sources. When the user is happy with their selections they can then export the selected stories as an HTML document. This document will contain full detail of each story and can be studied by the user in any standard web browser.
This is a large and complex project, so its design allows it to be completed in distinct stages. You should aim to build the system incrementally, rather than trying to solve the whole problem at once. A suggested development sequence is:
1. Develop code that allows the static, archived news stories to be previewed in the GUI.
2. Extend your solution so that it allows “live” news stories to be previewed in the GUI.
3. Extend your solution further so that the user’s selected stories can be exported as an HTML document.
If you can’t complete the whole assignment submit whatever parts you can get working. You will get partial marks for incomplete solutions (see the marking guide below).
Illustrative example
The screenshot below shows our example solution’s GUI when first started. We’ve called it The ‘Smooth Blend’ News Mixer and have included a suitably evocative image of someone reading news from an RSS feed, but you should choose your own name and GUI design.

The GUI has four ‘spin box’ widgets allowing the user to select how many stories they want to see from each source, a scrollable text area for displaying previews of the selected stories, and a push button for exporting the selections. You do not need to copy our example and are encouraged to design your own GUI with equivalent functionality. For instance, pull-down menus or text entry boxes could be used for making the selections rather than spin boxes.
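As a rough illustration of the widget layout just described, the following sketch builds four spin boxes, a scrollable preview area and an Export button with Tkinter. It is not the demonstration solution: the function names, the window title and the two callbacks (`get_stories` and `export`) are hypothetical placeholders for your own back-end code.

```python
def format_preview(stories):
    """Turn (headline, source, date) tuples into preview text, one story per paragraph."""
    return "\n\n".join(f"{headline}\n[{source} - {date}]"
                       for headline, source, date in stories)


def build_gui(feed_names, get_stories, export):
    """Build and return the main window.

    feed_names  -- labels for the news sources
    get_stories -- hypothetical callback: (feed name, count) -> list of story tuples
    export      -- hypothetical callback run when the Export button is pressed
    """
    # Imported here so format_preview can be used without a display attached
    import tkinter as tk
    from tkinter import scrolledtext

    window = tk.Tk()
    window.title("News Mixer")

    preview = scrolledtext.ScrolledText(window, width=60, height=15, wrap=tk.WORD)
    spin_boxes = {}

    def refresh():
        # Rebuild the preview from the current spin box values
        stories = []
        for name, box in spin_boxes.items():
            stories += get_stories(name, int(box.get()))
        preview.delete("1.0", tk.END)
        preview.insert(tk.END, format_preview(stories))

    for row, name in enumerate(feed_names):
        tk.Label(window, text=name).grid(row=row, column=0, sticky="w")
        # Read-only spin box: the user can only step between 0 and 10
        box = tk.Spinbox(window, from_=0, to=10, width=3,
                         state="readonly", command=refresh)
        box.grid(row=row, column=1)
        spin_boxes[name] = box

    preview.grid(row=len(feed_names), column=0, columnspan=2)
    tk.Button(window, text="Export", command=export).grid(
        row=len(feed_names) + 1, column=0, columnspan=2)
    return window
```

Calling `build_gui(names, my_lookup, my_export).mainloop()` would start the application. Keeping the text formatting in a separate function makes it possible to check the preview text without opening a window.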
Selecting archived stories
When the user chooses a number of stories from the two archived sources, the application extracts headlines and publication dates for each story from local files, previously downloaded from the web. For instance, in the screenshot below the user has chosen to see two stories from our archived copy of the Queensland Times and one from the Crikey archive file.
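A minimal sketch of this step is shown here, assuming the archived file is an RSS document in which every `<item>` contains a `<title>` followed by a `<pubDate>`. Both function names are illustrative, and real feeds may order or name their tags differently, so check your own archive files first.

```python
import re


def read_archive(filename):
    """Return the full text of a previously downloaded web document."""
    with open(filename, encoding="utf-8") as archive:
        return archive.read()


def headlines_and_dates(rss_text, how_many):
    """Return up to how_many (headline, date) pairs, in document order."""
    # Starting each match at <item> skips the feed's own <title> at the top.
    # Assumes the title comes before the publication date inside each item.
    pattern = r"<item\b[^>]*>.*?<title>(.*?)</title>.*?<pubDate>(.*?)</pubDate>"
    return re.findall(pattern, rss_text, flags=re.DOTALL)[:how_many]
```

Because the pattern has two groups, `findall` returns a list of two-element tuples, and slicing that list gives exactly the number of stories the user asked for.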

Exporting the selected stories
Happy with their selections, the user then presses the “Export” button. This causes our application to generate an HTML file called news.html (in the same folder as the Python program). This document contains copies of the same stories previewed in the GUI, plus additional detail including a photo and a short story summary.

The document is well presented and the various elements of each story all match one another. Importantly, all the images in the exported document, including the “splash” image up the top and the story photos, are online images, not ones stored on our local computer. To ensure that the exported web document is portable and can be viewed on any computer, the photos are all links to online images using appropriate HTML “img” tags and URLs.
You are not required to follow the details of our demonstration GUI or exported HTML document. You are strongly encouraged to use your own skills and initiative to design your own solution, provided it has all the functionality and features described herein.
Selecting current stories
Tiring of reading “old news”, our user next selects some “live” stories in the GUI as shown below, as well as changing the choice of archived stories in the mix.

Our user also selected four stories from FOX News Entertainment. Scrolling down in the preview pane reveals these stories as well.

Exporting the selected stories (again)

Extracting the HTML elements
To produce the news story details for displaying in the GUI and exporting as part of our HTML document, our application used regular expressions to extract elements from the relevant source web documents, whether they were stored in the static archive or are downloaded from the Internet whenever the program runs.

Sometimes it’s easier to use other Python features as well as, or instead of, regular expressions to help extract the data. For instance, we found that our regular expression for extracting headlines from the FOX Entertainment web site also matched the two “FOX News” titles at the top of the web page in addition to the headlines we wanted. Rather than complicating our regular expression, we therefore simply deleted the first two items returned by findall each time we extracted headlines from this page. (We also found that the URLs for the photos had inconsistent formats in this site, making them difficult to extract, so we don’t recommend using this site in your own solution.) Most importantly, you must extract elements in a general way that will still work when the contents of the source web page are updated.
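The "delete the first two items" trick can be sketched as follows. The page text here is made up for illustration and is much simpler than a real feed.

```python
import re

# Invented sample: two site titles followed by the real headlines
page = """<title>FOX News</title>
<title>FOX News</title>
<title>Star announces new film</title>
<title>Award nominations revealed</title>"""

titles = re.findall(r"<title>(.*?)</title>", page)

# Slice off the two leading site titles instead of complicating the pattern
headlines = titles[2:]
```

This only remains correct while the page keeps exactly two unwanted titles at the top, so it is worth rechecking whenever the source site changes its layout.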
Obviously working with such complex code is challenging. You should begin with your static, “archived” documents to get some practice at pattern matching before trying the dynamically changeable web documents downloaded from online.
Care was also taken to ensure that no HTML/XML tags or other HTML entities appeared in the extracted text when displayed in either the GUI or the exported HTML document. In some cases it was necessary to delete or replace such mark-ups in the text after it was extracted from the original web document. The information seen by the user must not contain any extraneous tags or unusual characters that would interfere with the appearance of the news stories either in the GUI or the exported document.
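One possible way to clean extracted text is sketched below, using `re.sub` and the standard library's `html.unescape`. The function name is hypothetical and the order of the steps is a design choice, not the method used in the demonstration solution.

```python
import re
from html import unescape


def clean_text(raw):
    """Strip mark-up and decode entities in a fragment of extracted text."""
    # Unwrap CDATA sections, which RSS feeds often put around descriptions
    text = re.sub(r"<!\[CDATA\[(.*?)\]\]>", r"\1", raw, flags=re.DOTALL)
    # Decode entities first so escaped mark-up such as &lt;p&gt; becomes a real tag
    text = unescape(text)
    # Remove anything that looks like a tag
    text = re.sub(r"<[^>]+>", "", text)
    # Collapse runs of whitespace left behind by the deletions
    return " ".join(text.split())
```

Decoding before stripping handles feeds that escape their embedded HTML, at the cost of occasionally removing text that sits between a literal "<" and ">" in a story. Text cleaned this way should be escaped again (for example with `html.escape`) before it is written into the exported HTML document.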
Exporting the HTML document
Our program creates the exported HTML document by writing code into a text file, integrating the various elements extracted from the news feeds. Two segments of the HTML code generated by our Python program are shown below. Although not intended for human consumption, the HTML code is nonetheless laid out neatly, and with comments indicating the purpose of each part. Your HTML code must also be well presented to facilitate future maintenance of your application.
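The general idea of writing neatly laid out, commented HTML from Python might look like the sketch below. The dictionary keys, function names and page structure are assumptions made for illustration; they are not taken from the demonstration program.

```python
from html import escape


def story_html(number, story):
    """Return the HTML for one story (a dict of already-cleaned strings)."""
    return f"""
    <!-- Story {number} -->
    <div class="story">
      <h2>{escape(story['headline'])}</h2>
      <img src="{escape(story['image_url'])}" alt="Photo for story {number}">
      <p>{escape(story['summary'])}</p>
      <p class="source">{escape(story['source'])}, {escape(story['date'])}</p>
    </div>"""


def build_document(title, stories):
    """Assemble the complete HTML document as one string."""
    body = "".join(story_html(n, s) for n, s in enumerate(stories, start=1))
    return f"""<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>{escape(title)}</title>
  </head>
  <body>
    <!-- Page heading -->
    <h1>{escape(title)}</h1>
    <!-- Selected stories -->{body}
  </body>
</html>
"""


def export_document(title, stories, filename="news.html"):
    """Write the document as a UTF-8 text file in the current folder."""
    with open(filename, "w", encoding="utf-8") as output:
        output.write(build_document(title, stories))
```

Building the whole document as a string before writing it keeps the file handling in one place and makes the generated code easy to inspect while debugging.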

Robustness
Another important aspect of your solution is that it must be resilient to error. The biggest risk with this kind of program is problems accessing the source web sites. We have attempted to make our download function as robust as possible. In particular, if it detects an error while downloading a web document it returns the special value None instead of a character string, so your program should allow for this. (We don’t claim that the download function is infallible, however, because the results it produces are dependent on the behaviour of your specific Internet connection. For instance, some systems will generate a default web document when an online site can’t be reached, in which case the download function will be unaware that a failure has occurred and won’t return None.)
For instance, in our demonstration solution the GUI alerts the user to a failure to download a web site as follows.
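In code, allowing for a failed download could look something like this sketch. The function name and the error messages are hypothetical, and `download` is passed in as a parameter so that any function returning either a string or None can be used.

```python
import re


def fetch_headlines(url, download, how_many):
    """Return (headlines, error_message); error_message is None on success."""
    page = download(url)
    if page is None:
        # The download function signalled a failure
        return [], f"Could not download {url}"
    headlines = re.findall(r"<title>(.*?)</title>", page, flags=re.DOTALL)
    if not headlines:
        # Covers the case where a default page was returned instead of the feed
        return [], f"No stories found at {url}"
    return headlines[:how_many], None
```

The caller can then show the error message in the GUI instead of crashing. Checking for an empty result as well as None gives some protection against the "default web document" situation described above.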

Therefore, as insurance against the risk of a web site failing completely, your program’s two “live” web sources must come from different web servers. One way of achieving this is to ensure that the part of the address at the beginning of each site’s URL is entirely distinct. For example, our sample solution used two totally different sources for the “live” news feeds, the Canberra Times and FOX Entertainment. These two sites have the following URLs and clearly come from different web servers.

(Since they never change, there is no need to use distinct servers for the two “archived” documents. Nonetheless, we did so in our sample solution to make the program more interesting.)
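A quick way to compare the beginning of two addresses is to look at their host names, as in this sketch. The helper name is made up, and the two URLs are taken from the feed list in this document, so they are not necessarily the exact addresses used by the sample solution.

```python
from urllib.parse import urlparse


def same_server(url_a, url_b):
    """Return True if both URLs have the same host name (case-insensitive)."""
    return urlparse(url_a).hostname == urlparse(url_b).hostname


canberra_times = "http://www.canberratimes.com.au/rss.xml"
fox_entertainment = "http://feeds.foxnews.com/foxnews/entertainment"
distinct = not same_server(canberra_times, fox_entertainment)
```

Comparing host names is only a heuristic: two different names can still be served by the same machine or company, so choose sources that are clearly unrelated.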
Specific requirements and marking guide
To complete this part of the assignment you are required to produce an application in Python 3 with features equivalent to those above, using the provided news_aggregator.py template file as your starting point. In addition you must provide the two (or more) previously-downloaded web documents that serve as your archive of “old news” and one or more image files needed to support your GUI. (However, all of the images in the exported HTML file must be online images and must not be included in your submission.) Your complete solution must support at least the following features.
• An intuitive Graphical User Interface (4%). Your application must provide an attractive, easy-to-use GUI which has all the features needed for the user to choose how many news stories they want from each of four news feeds (two “archived” and two “live”), preview the headlines for their selections, and export the complete stories as a web document. You have a free choice of which Tkinter widgets to use to do the job, as long as they are effective and clear for the user. This interface must have the following features:
o An image which acts as a “logo” to identify your application. The image file should be included in the same folder as your Python application.
o One or more widgets that allow the user to select how many stories they want to see from each of four news feeds (two “archived” and two “live”).
o One or more widgets that allow the user to see details of the stories selected (headlines, sources and publication dates).
o One or more widgets that allow the user to choose whether or not to export their story selections as an HTML document.
Note that this criterion concerns the front-end user interface only, not the back-end functionality. Functionality is assessed in the following criteria.
• Previewing archived news stories in the GUI (4%). Your GUI must be capable of displaying the top stories, in the quantities selected by the user, from each of two distinct sources of “archived” news, allowing selection of up to ten stories per source. For each story the GUI must display
o the headline,
o the news source (usually the name of a newspaper, magazine, TV or radio station), and
o the publication date.
• Previewing “live” news stories in the GUI (4%). Your GUI must be capable of displaying the top stories, in the quantities selected by the user, from each of two distinct sources of “live” news, allowing selection of up to ten stories per source. For each story the GUI must display
o the headline,
o the news source (usually the name of a newspaper, magazine, TV or radio station), and
o the publication date.
The necessary elements must be extracted from HTML/XML files directly downloaded from the web while your Python program is running. Pattern matching must be used to extract the relevant elements from the documents so that the code still works even after the online documents are updated. The chosen source web sites must be ones that are updated on a regular basis, at least daily and preferably hourly. The two source web sites must come from different web servers (as insurance against one of the web sites being offline when your assignment is assessed).
o A heading identifying your application.
o A “splash” image characterising your application, downloaded from online when the generated HTML document is viewed (i.e., not from a local file on the host computer).
o Details of each of the news stories selected by the user in the GUI. For each story at least the following information must be displayed:
▪ The headline.
▪ A photograph or image illustrating the story.
▪ A short story summary or description.
▪ The identity of the original news feed (typically a newspaper, magazine, TV or radio station).
All of this information must be extracted via pattern matching from HTML documents downloaded from the web. Most importantly, each of these sets of items must all belong together, e.g., you can’t have the headline of one story paired with a photo from another story. Each of the elements must be extracted from the original document(s) separately and used to construct your own HTML document.
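One way to keep the elements of each story together is to isolate each story's block of source code first and only then search inside it, as sketched below. This assumes an RSS feed that marks stories with `<item>` and photos with an `<enclosure url="...">` tag. Many feeds use other tags for images, so the patterns and names here are illustrative only.

```python
import re


def field(item, pattern):
    """Return the first match of pattern inside one story, or None."""
    match = re.search(pattern, item, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def extract_items(rss_text):
    """Return a list of story dicts whose elements all come from the same item."""
    stories = []
    for item in re.findall(r"<item\b[^>]*>(.*?)</item>", rss_text, flags=re.DOTALL):
        story = {
            "headline": field(item, r"<title>(.*?)</title>"),
            "summary": field(item, r"<description>(.*?)</description>"),
            "image_url": field(item, r'<enclosure\b[^>]*\burl="([^"]+)"'),
        }
        # Skip incomplete stories rather than borrowing an element from a neighbour
        if all(story.values()):
            stories.append(story)
    return stories
```

Running separate `findall` calls over the whole document and pairing the results by position is simpler, but a single story without a photo then shifts every later photo onto the wrong headline.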
When viewed in a web browser the exported document must be neatly laid out and appear well-presented regardless of the browser window’s dimensions. The textual parts extracted from the original documents must not contain any visible HTML tags or entities or any other spurious characters. The images must all be links to images found online, not in local files, must be of a size compatible with the rest of the document, and their original aspect ratio must be preserved (i.e., they should not be stretched in just one direction).
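The image requirements can be met with a small amount of inline CSS, as in this sketch. The function name and the 400-pixel limit are arbitrary choices for illustration.

```python
from html import escape


def image_tag(url, description, max_width_px=400):
    """Return an img tag that links to an online image and scales with the window."""
    # Setting only the width and leaving the height automatic keeps the aspect ratio
    style = f"width: 100%; max-width: {max_width_px}px; height: auto;"
    return f'<img src="{escape(url)}" alt="{escape(description)}" style="{style}">'
```

Because the `src` attribute holds the original online URL, no image file needs to be stored with the exported document.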
However, your solution is not required to follow precisely our example shown above. Instead you are strongly encouraged to be creative in your choices of web sites, the design of your Graphical User Interface, and the design of your generated HTML document.
Support tools
To get started on this task you need to download various web documents of your choice and work out how to extract the necessary elements for displaying data in the GUI and generating the HTML output file. You also need to allow for the fact that the contents of the web documents from which you get your data will change regularly, so you cannot hardwire the locations of the elements into your program. Instead you must use Python’s string find method and/or regular expression findall function to extract the necessary elements, no matter where they appear in the HTML/XML source code.
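For comparison with regular expressions, here is a small sketch of extraction using only the string `find` method. The helper name and its return convention are invented for this example.

```python
def text_between(text, start_marker, end_marker, start_at=0):
    """Return (found_text, next_position), or (None, -1) if a marker is missing."""
    start = text.find(start_marker, start_at)
    if start == -1:
        return None, -1
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None, -1
    # next_position lets the caller continue searching after this match
    return text[start:end], end + len(end_marker)
```

Passing the returned position back in as `start_at` walks through the document one element at a time, wherever the elements happen to appear.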
To help you develop your solution, we have included two small Python programs with these instructions.
1. downloader is a Python program containing a function called download that downloads and saves the source code of a web document as a text file, as well as returning the document’s contents to the caller as a character string. A copy of this function also appears in the provided program template. You can use it both to save copies of your chosen web documents for storage in your “archive”, as well as to download “live” web documents in your Python application at run time. Although recommended, you are not required to use this function in your solution, if you prefer to write your own “downloading” code to do the job.
2. regex_tester is an interactive program introduced in the lectures and workshops which makes it easy to experiment with different regular expressions on small text segments. You can use this together with downloaded text from the web to help perfect your regular expressions. (There are also many online tools that do the same job you can use instead.)
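If you decide to write your own downloading code, it might resemble the following sketch. This is not the provided download function, whose code is not reproduced here: the parameter names, the timeout and the choice of which errors to catch are all assumptions.

```python
from urllib.request import urlopen


def download(url, save_as=None, encoding="utf-8"):
    """Return the text of the document at url, or None if it can't be fetched.

    If save_as is given, a copy of the text is also written to that file.
    """
    try:
        with urlopen(url, timeout=10) as response:
            text = response.read().decode(encoding, errors="replace")
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts;
        # ValueError covers malformed URLs
        return None
    if save_as is not None:
        with open(save_as, "w", encoding="utf-8") as copy:
            copy.write(text)
    return text
```

Calling it once with `save_as` set is enough to create an archive file. At run time the same function can be called without `save_as` to fetch a "live" document.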
Portability
Internet ethics: Responsible scraping
In this situation it’s possible to trick the web server into delivering you the desired document by having your Python script impersonate a standard web browser. To do this you need to change the “user agent” identity enclosed in the request sent to the web server. The provided download function has an option that disguises its true identity. We leave it to your own conscience whether or not you wish to activate this feature, but note that this assignment can be completed successfully without resorting to such subterfuge.
Deliverables
You should develop your solution by completing and submitting the provided Python template file news_aggregator.py. Submit this in a “zip” archive containing all the files needed to support your application as follows:
1. Your news_aggregator.py solution. Make sure you have completed the statement at the beginning of the Python file to confirm that this is your own individual work by inserting your name and student number in the places indicated. Submissions without a completed statement will be assumed not to be your own work.
2. One or more small image files needed to support your GUI interface, but no other image files.
Once you have completed your solution, submit these items to Blackboard as a single “zip” archive. Do not use other compression formats such as “rar” or “7z”.
Apart from working correctly your Python and HTML code must be well-presented and easy to understand, thanks to (sparse) commenting that explains the purpose of significant elements and helpful choices of variable, parameter and function names. Professional presentation of your code will be taken into account when marking this assignment.
If you are unable to solve the whole problem, submit whatever parts you can get working. You will receive partial marks for incomplete solutions.
How to submit your solution
The following links point to Rich Site Summary, a.k.a. Really Simple Syndication, web feed documents. RSS documents are written in XML and are used for publishing information that is updated frequently in a format that can be displayed by RSS reader software. Such documents have a simple standardised format, so we can rely on them always formatting their contents in the same way, making it relatively easy to extract specific elements from the document’s source code via pattern matching.
However, a disadvantage of using RSS feeds is that they can be hard to find! Often you can discover them only by looking for the RSS symbol at the bottom of web pages or by trial-and-error to create the corresponding URL. Because RSS feeds are not intended for human consumption, they don’t usually feature prominently in the results of web searches using standard search engines such as Google, DuckDuckGo, Bing, etc. You will need to do some exploration online to find suitable news feeds for your solution.
The following web sites are RSS feeds which we found could be viewed in a standard web browser. We have not confirmed that these are all ideally suited to the assignment. You will need to work that out for yourself. In particular, we have not checked to see if their servers deny access to programs other than web browsers, nor have we checked to see what data they deliver to Python programs.
The following RSS news feeds appear to have headlines, publication dates, story summaries and links to photos all on the same page. (We used some of these for our demonstration solution.)
o http://www.sbs.com.au/news/rss/Section/Top+Stories — SBS News: Top stories
o https://www.9news.com.au/rss — Channel 9 (Australia) news
o https://www.qt.com.au/feeds/rss/homepage — Queensland Times
o http://www.canberratimes.com.au/rss.xml — Canberra Times: Local news
o https://www.whitsundaytimes.com.au/feeds/rss/homepage — Whitsunday Times
o https://www.crikey.com.au/feed/ — Crikey: Australian news and politics
o http://www.goulburnpost.com.au/rss.xml — Goulburn Post: Local news
o https://www.darkreading.com/rss_simple.asp — Dark Reading: Security news
o https://www.abc.net.au/radionational/feed/2890360/podcast.xml — ABC Radio National Breakfast news (includes audio links)
o http://www.perthnow.com.au/feed — Perth Now: Breaking news
o https://www.cnet.com/rss/news/ — CNet technology news
o https://www.news-mail.com.au/feeds/rss/homepage — Bundaberg News-Mail
o http://rss.upi.com/news/news.rss — UPI news
o https://www.coffscoastadvocate.com.au/feeds/rss/homepage — Coffs Coast Advocate
o https://www.dailyexaminer.com.au/feeds/rss/homepage — Grafton Daily Examiner
o https://www.sunshinecoastdaily.com.au/feeds/rss/homepage — Sunshine Coast Daily
o https://www.dailymail.co.uk/articles.rss — Daily Mail (UK): Latest news
o https://www.dailymail.co.uk/auhome/index.rss — Daily Mail: Australian news
o https://www.dailymail.co.uk/tvshowbiz/index.rss — Daily Mail: TV and Showbiz
o https://www.dailymail.co.uk/sciencetech/index.rss — Daily Mail: Science news
o https://www.blogger.com/feeds/2886832199291333748/posts/default — Vivian Violine’s blog: Entertainment and business news
o http://rss.cnn.com/rss/cnn_topstories.rss — CNN News: Top Stories
o https://www.themorningbulletin.com.au/feeds/rss/homepage — Rockhampton Morning Bulletin
o http://feeds.nbcnews.com/feeds/topstories — NBC News
o https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml — New York Times: Top Stories
o http://www.nytimes.com/services/xml/rss/nyt/Business.xml — New York Times: Business News
o https://rss.nytimes.com/services/xml/rss/nyt/Technology.xml — New York Times: Tech News
o http://rssfeeds.usatoday.com/usatoday-NewsTopStories — USA Today: Top stories
o http://rssfeeds.usatoday.com/usatoday-LifeTopStories — USA Today: Lifestyle and entertainment
o http://rssfeeds.usatoday.com/usatoday-TechTopStories — USA Today: Tech news (not updated as often as other USA Today pages)
o https://threatpost.com/feed/ — Threatpost: Security news
o https://www.infoworld.com/index.rss — InfoWorld
o https://www.thechronicle.com.au/feeds/rss/homepage — Toowoomba Chronicle
o https://www.indiatvnews.com/rssnews/topstory.xml — India TV News: Top stories
o https://www.indiatvnews.com/rssnews/topstory-world.xml — India TV News: World news
o https://www.indiatvnews.com/rssnews/topstory-entertainment.xml — India TV News: Entertainment
o http://feeds.foxnews.com/foxnews/world — Fox News: World news
o http://feeds.foxnews.com/foxnews/entertainment — Fox News: Entertainment news
o http://feeds.foxnews.com/foxnews/science — Fox News: Science news
o http://feeds.foxnews.com/foxnews/tech — Fox News: Technology news
o https://nypost.com/feed/ — New York Post: All stories
o https://nypost.com/news/feed/ — New York Post: Breaking news
o https://nypost.com/entertainment/feed/ — New York Post: Entertainment news
o https://www.engadget.com/rss.xml — Technology and gadget news
o https://www.wired.com/feed — Wired magazine: Technology news
o https://abcnews.go.com/abcnews/internationalheadlines — ABC (USA) international news
o http://www.lemonde.fr/rss/une.xml — Le Monde: News from France (in French)
o https://feeds.gizmodo.com.au/gizmodoaustralia — Gizmodo’s Australian feed: News for geeks
o http://syndication.eonline.com/syndication/feeds/rssfeeds/topstories.xml — E! Entertainment news: Top stories
o http://syndication.eonline.com/syndication/feeds/rssfeeds/tvnews.xml — E! Entertainment news: TV news
o http://feeds2.feedburner.com/TheNextWeb — The Next Web’s news for geeks
o https://www.thelocal.es/feeds/rss.php — Local Spain: International news
o http://feeds.searchengineland.com/searchengineland — News about Internet search engines
o https://www.buzzfeed.com/world.xml — Buzzfeed’s world news
o https://feeds.lifehacker.com.au/lifehackeraustralia — LifeHacker’s Australian feed: Lifestyle for the Internet generation
o http://www.cbc.ca/cmlink/rss-world — CBC (Canada) world news
o http://www.globalissues.org/news/feed — Global issues news
o http://feeds.washingtonpost.com/rss/world — The Washington Post newspaper
o http://timesofindia.indiatimes.com/rssfeeds/296589292.cms — The Times of India world news
o https://www.rt.com/rss/news/ — RT world news
o http://feeds.feedburner.com/time/world — Time magazine world news
o https://www.northernstar.com.au/feeds/rss/homepage — Northern Star newspaper: Headline items
o https://sputniknews.com/export/rss2/world/index.xml — Sputnik international multimedia news
o http://www.independent.co.uk/news/world/rss — The Independent newspaper (UK) world news
o http://feeds.feedburner.com/daily-express-world-news — Daily Express (UK) world news
o https://www.mirror.co.uk/news/world-news/?service=rss — Daily Mirror (UK) world news
o http://www.latimes.com/world/rss2.0.xml — The Los Angeles Times world news
o http://feeds.skynews.com/feeds/rss/world.xml — Sky News: World news
o https://feeds.skynews.com/feeds/rss/entertainment.xml — Sky News: Entertainment
o http://en.rfi.fr/general/rss — Radio France International news
o http://feeds.news24.com/articles/news24/World/rss — News24 (South Africa) world news
o http://www.rawstory.com/category/world/feed — Raw Story (USA) world news
o http://globalnews.ca/world/feed — Global News: World news
o http://www.ctvnews.ca/rss/world/ctvnews-ca-world-public-rss-1.822289 — CTV (Canada) world news
o http://www.france24.com/en/top-stories/rss — France 24: Top stories
o http://www.seattletimes.com/nation-world/world/feed — Seattle Times: World news
o http://www.channelnewsasia.com/rssfeeds/8395884 — Channel News Asia: World news
o https://www.pri.org/stories/feed/everything — Public Radio International news (includes audio)
o https://www.neweurope.eu/category/world/feed — New Europe Newspaper: World news
The Australian Broadcasting Corporation has a wide range of RSS Feeds, but no longer appears to promote their use. The ABC’s RSS Feed web site is now offline. Nonetheless, these sites have all the elements needed for the assignment (while they last!).
o ABC News: Just In — http://www.abc.net.au/news/feed/51120/rss.xml
o ABC News: Top Stories — http://www.abc.net.au/news/feed/45910/rss.xml
o ABC News: Australia — http://www.abc.net.au/news/feed/46182/rss.xml
o ABC News: Business — http://www.abc.net.au/news/feed/51892/rss.xml
o ABC News: Sport — http://www.abc.net.au/news/feed/45924/rss.xml
o ABC News: New South Wales — http://www.abc.net.au/news/feed/52498/rss.xml
o ABC News: Victoria — http://www.abc.net.au/news/feed/54242/rss.xml
o ABC News: Queensland — http://www.abc.net.au/news/feed/50990/rss.xml
o ABC News: Western Australia — http://www.abc.net.au/news/feed/52764/rss.xml
o ABC News: South Australia — http://www.abc.net.au/news/feed/54702/rss.xml
o ABC News: Tasmania — http://www.abc.net.au/news/feed/50042/rss.xml
o ABC News: ACT — http://www.abc.net.au/news/feed/48320/rss.xml
o ABC News: Northern Territory — http://www.abc.net.au/news/feed/53408/rss.xml
o https://www.autoexpress.co.uk/car-news/feed — Auto Express Car News (slightly complicated site)
o http://rss.tvguide.com/breakingnews — TV Guide breaking news (complex site)
o https://www.autocar.co.uk/rss — Autocar motoring news (complex site)
o https://www.theverge.com/rss/frontpage — The Verge: Tech news (complex site)
o https://nakedsecurity.sophos.com/feed/ — Cyber security news (inconsistent article style)
o https://www.theguardian.com/world/rss — The Guardian newspaper (Beware: Only a few stories were listed when we checked!)
o https://feeds.kotaku.com.au/kotakuaustralia — Gaming news (complex page)
o https://world.wng.org/taxonomy/term/72/feed — World News Group: International news (complex page)
o https://www.androidpolice.com/feed/ — Android operating system news (complex page)
o http://feeds.mashable.com/Mashable — News for the connected generation (complex page)
o https://www.theverge.com/rss/index.xml — Technology and gadget news (complex page)
o http://www.polygon.com/rss/index.xml — Entertainment news (story summaries are very brief)
o https://www.prnewswire.com/rss/all-news-releases-from-PR-newswire-news.rss — PR Newswire (not all stories have photos on first page)
o https://www.vox.com/rss/index.xml — News in detail (complex site, full stories instead of summaries)
o http://feeds.feedburner.com/WarNewsUpdates — Warfare news (complex site)
o http://feeds.macrumors.com/MacRumors-All — News about Apple (very complex site)
o https://www.yahoo.com/news/rss/world — Yahoo! news (not all stories have an image)
o http://feeds.mashable.com/Mashable — Mashable geek news (complex page)
o http://www.thedenverchannel.com/rss/ — Denver Channel 7 news
o https://www.theregister.co.uk/security/headlines.atom — The Register: Security news
o https://www.tripwire.com/state-of-security/feed/ — Tripwire: Security news
o http://feeds.feedburner.com/Securityweek?format=xml — Security Week
o http://feeds.arstechnica.com/arstechnica/index — Ars Technica: All news
o http://feeds.arstechnica.com/arstechnica/technology-lab — Ars Technica: IT news
o http://rss.slashdot.org/Slashdot/slashdot — Slashdot news for nerds
o http://www.couriermail.com.au/feed — Courier Mail newspaper: Breaking news
o http://feeds.bbci.co.uk/news/technology/rss.xml — BBC News: Technology
o http://www.themercury.com.au/feed — The Mercury: Latest news
o http://www.goldcoastbulletin.com.au/feed — Gold Coast Bulletin: Breaking news
o http://www.businessnews.com.au/rssfeed/latest.rss — Business News: Latest news
o http://feeds.watoday.com.au/rssheadlines/top.xml — Western Australia Today: Top news
o http://feeds.brisbanetimes.com.au/rssheadlines/top.xml — Brisbane Times: Latest news
o http://www.theaustralian.com.au/feed/ — The Australian newspaper: Latest news
o http://feeds.theage.com.au/rssheadlines/top.xml — The Age newspaper: Latest news
o https://feeds.a.dj.com/rss/RSSWorldNews.xml — Wall Street Journal: World news
o https://feeds.a.dj.com/rss/RSSWSJD.xml — Wall Street Journal: Technology news
o https://feeds.a.dj.com/rss/RSSLifestyle.xml — Wall Street Journal: Lifestyle
o http://feeds.sydneysun.com/rss/ae0def0d9b645403 — Sydney Sun newspaper: Headlines
o https://www.aceshowbiz.com/rss/asb_news.xml — Ace Showbiz News
o http://www.news.com.au/feed — News Corp (Australia) news
o https://feeds.feedburner.com/techcrunch — TechCrunch web technology news
o http://ifpnews.com/feed — Iran Front Page news
o https://www.usnews.com/rss/news — U.S. News: Breaking news
o https://www.espn.com/espn/rss/news — ESPN News: Latest news
o https://zeenews.india.com/rss/india-national-news.xml — ZeeNews (India): World news
o https://zeenews.india.com/rss/entertainment-news.xml — ZeeNews (India): Entertainment news
o http://www1.cbn.com/cbnnews/world/feed — CBN News: World news
o http://www.washingtontimes.com/rss/headlines/news/world — Washington Times: World news
o https://www.smh.com.au/rss/world.xml — Sydney Morning Herald: World news
o https://www.cbsnews.com/latest/rss/world — CBS (USA) world news
o https://www.thesun.co.uk/news/worldnews/feed — The Sun (UK) world news
o https://www.cnbc.com/id/100727362/device/rss/rss.html — CNBC (USA) world news
o http://feeds.nature.com/nature/rss/current — Nature, the world’s leading scientific journal
o https://www.aljazeera.com/xml/rss/all.xml — Al Jazeera’s English news service
o http://feeds.feedburner.com/ndtvnews-world-news — NDTV (India) world news
o http://feeds.bbci.co.uk/news/world/rss.xml — BBC News: World
o http://feeds.bbci.co.uk/news/rss.xml — BBC News: Home
o https://newslanes.com/feed — Newslanes: World news
o http://rss.slashdot.org/Slashdot/slashdot — News for nerds
o http://feeds.reuters.com/reuters/topNews — Reuters: Top News
o http://feeds.reuters.com/Reuters/worldNews — Reuters: World News
o http://feeds.reuters.com/reuters/technologyNews — Reuters: Technology News
o http://feeds.reuters.com/reuters/businessNews — Reuters: Business News
o http://feeds.arstechnica.com/arstechnica/index/ — IT news
o http://feeds.feedburner.com/Techcrunch — Web news
o https://www.npr.org/rss/rss.php — National Public Radio (USA) news
o https://www.reddit.com/r/technology/.rss — Reddit’s tech news
o https://defence-blog.com/feed — Defence news

o https://www.computerweekly.com/rss/All-Computer-Weekly-content.xml — Computer Weekly
o https://www.thestar.com/feeds.articles.news.world.rss — Toronto Star world news (some articles have stories and photos but not all)
o https://feeds.howtogeek.com/HowToGeek — How-To Geek’s news (links take you to different source web sites, so hard to use for the assignment)
o https://news.ycombinator.com/rss — News for code hackers (links take you to different source web sites, so very hard to use for the assignment)
o https://www.reddit.com/r/worldnews/top/.rss — Reddit’s world news
o http://indaily.com.au/feed — InDaily newspaper
o https://zeenews.india.com/rss/india-national-news.xml — ZeeNews: Indian national news
o http://www.dailytelegraph.com.au/news/world/rss — Daily Telegraph: World news
o http://www.heraldsun.com.au/rss — Herald Sun newspaper: Breaking news
o http://www.ntnews.com.au/news/rss — Northern Territory newspaper (not clear how often it is updated)
o http://www.townsvillebulletin.com.au/news/rss — Townsville Bulletin newspaper (not clear how often it is updated)
Appendix B: Web sites that block access to Python scripts
If you suspect that your Python program isn’t being allowed to access your chosen web page, use the downloader program to check whether or not Python programs are being sent an access denied message. When viewed in a web browser, such messages typically look something like the following example. In this case the blog www.wayofcats.com has used the anti-malware application Cloudflare to block access to the blog’s contents by our Python program.

In this situation you are encouraged to choose another source of data. Although it’s possible to trick some web sites into delivering blocked pages to a Python script by changing the “user agent” signature sent to the server in the request, we don’t recommend doing so, partly because this solution is not reliable and partly because it could be considered unethical to deliberately override the web site owner’s wishes.
