The attached file energy.csv contains the amount of energy generated by each country, in terawatt-hours (TWh), for several years (source: OECD Factbook 2011: Economic, Environmental and Social Statistics). The data contains missing values (denoted by "..") as well as aggregate rows for the EU27, the OECD, and the World.
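Because missing values are written as "..", they have to be declared explicitly when the file is read; otherwise pandas keeps the affected year columns as text. The sketch below is illustrative only. It assumes the first column is named Country and that every other column holds one year, and it reads a small in-memory excerpt with made-up numbers instead of the real energy.csv. In the actual program the first argument would simply be "energy.csv".

```python
import io

import pandas as pd

# Hypothetical excerpt of energy.csv with made-up numbers; the real
# column names (assumed here: Country plus one column per year) may differ.
SAMPLE_CSV = """Country,2007,2008,2009
Austria,10.0,..,14.0
Chile,..,20.0,22.0
World,1000.0,..,1020.0
"""

# na_values=".." turns every ".." cell into NaN, so the year columns
# are parsed as floats rather than as text.
energy = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col="Country", na_values="..")

print(energy.dtypes)
print("missing cells:", int(energy.isna().sum().sum()))
```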
You are to create a program in Python that performs the following:
- Loads the energy.csv file (assume it’s in the current directory) and creates a DataFrame object from it.
- Replaces the missing values with the mean energy production for that country.
- Removes the aggregate rows (EU27, OECD, World).
- Adds a new column called ‘Continent’ and fills it in with the continent that corresponds to each country.
- Creates a DataFrame that contains the continent name as the index and the following columns:
  - ‘num_countries’ = number of countries for this continent
  - ‘mean’ = mean energy production of the countries in this continent
  - ‘small_production’ = 1 if continent mean is less than the mean production of all countries minus one standard deviation; 0 otherwise
  - ‘avg_production’ = 1 if continent mean is greater than the mean production of all countries minus one standard deviation, but less than the mean production of all countries plus one standard deviation; 0 otherwise
  - ‘large_production’ = 1 if continent mean is greater than the mean production of all countries plus one standard deviation; 0 otherwise
- Displays the new DataFrame on the screen.
- The name of your source code file should be DataPrep.py. All your code should be in this single file.
- You need to use the pandas DataFrame object for storing data.
- Your code should follow good coding practices, including good use of whitespace and use of both inline and block comments.
- You need to use meaningful identifier names that conform to standard naming conventions.
- At the top of the file, you need to put a block comment with the following information: your name, date, course name, semester, and assignment name.
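The fill, remove, and continent steps can be sketched as follows. This is a minimal illustration, not a reference solution: the rows are made-up numbers standing in for an already-parsed energy.csv (".." already converted to NaN), and COUNTRY_TO_CONTINENT and AGGREGATE_ROWS are hypothetical names. The lookup table is deliberately incomplete; the real program needs an entry for every country in the file.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the parsed energy.csv (".." already NaN).
energy = pd.DataFrame(
    {
        "2007": [10.0, np.nan, 300.0, 900.0, 1000.0],
        "2008": [np.nan, 20.0, np.nan, 910.0, np.nan],
        "2009": [14.0, 22.0, 310.0, np.nan, 1020.0],
    },
    index=pd.Index(["Austria", "Chile", "EU27", "OECD", "World"], name="Country"),
)

# Hypothetical, deliberately incomplete lookup; the real program needs
# an entry for every country that remains after the aggregates are dropped.
COUNTRY_TO_CONTINENT = {"Austria": "Europe", "Chile": "South America"}
AGGREGATE_ROWS = ["EU27", "OECD", "World"]

# Fill each gap with the mean of that country's own row (NaNs are skipped).
energy = energy.apply(lambda row: row.fillna(row.mean()), axis=1)

# Drop the EU27/OECD/World totals so they are not counted as countries.
energy = energy.drop(index=AGGREGATE_ROWS)

# Index.map looks each country up in the dict; unknown names become NaN.
energy["Continent"] = energy.index.map(COUNTRY_TO_CONTINENT)

print(energy)
```

The row-wise apply is used because DataFrame.fillna with a Series matches the Series against column labels, not row labels; transposing, filling, and transposing back would be an equivalent alternative.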
What to Turn In
You will turn in the single DataPrep.py file using Blackboard.
Sample Program Output
70-511, [semester] [year]
NAME: [put your name here]
PROGRAMMING ASSIGNMENT #5
                       num_countries         mean  small  avg  large
    Europe                    25   161.572326      0    1      0
    Asia                       7   607.983830      0    1      0
    North America              3  1560.438095      0    0      1
    South America              2   199.148901      0    1      0
    Australia                  1   218.021429      0    1      0
    Africa                     1   213.846154      0    1      0
    Oceania                    1    39.378571      0    1      0
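A continent summary like the one above could be built roughly as follows. This sketch assumes each country has already been reduced to one average value (in the real program, for example, by taking the row mean over the year columns), and it reads "mean production of all countries" and "standard deviation" as the mean and the sample standard deviation (pandas' default, ddof=1) of those per-country averages. The countries, the column name mean_production, and the numbers are hypothetical placeholders, not OECD data.

```python
import pandas as pd

# Made-up per-country mean production (TWh), already cleaned and tagged
# with a continent. Country names are placeholders, not OECD data.
countries = pd.DataFrame(
    {
        "mean_production": [10.0, 500.0, 500.0, 500.0, 990.0],
        "Continent": ["Oceania", "Europe", "Europe", "Europe", "North America"],
    },
    index=["Country A", "Country B", "Country C", "Country D", "Country E"],
)

# Thresholds come from the spread of the per-country means.
# Series.std() uses the sample standard deviation (ddof=1) by default.
overall_mean = countries["mean_production"].mean()
overall_std = countries["mean_production"].std()
lower = overall_mean - overall_std
upper = overall_mean + overall_std

# One row per continent: how many countries, and their average production.
summary = countries.groupby("Continent")["mean_production"].agg(
    num_countries="count", mean="mean"
)

# Boolean comparisons cast to 0/1 integers, using the strict inequalities
# from the specification.
summary["small_production"] = (summary["mean"] < lower).astype(int)
summary["avg_production"] = (
    (summary["mean"] > lower) & (summary["mean"] < upper)
).astype(int)
summary["large_production"] = (summary["mean"] > upper).astype(int)

# Largest continents first, as in the sample output.
summary = summary.sort_values("num_countries", ascending=False)
print(summary)
```

Because the specification uses strict inequalities, a continent whose mean falls exactly on a threshold would get 0 in all three flag columns; making one boundary inclusive would avoid that gap.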