This homework is due 11/22 at midnight.
This homework is out of 100 points. Each question is labelled with its corresponding point value.
For this homework we will be using the cars_df dataset below.
#Do not modify this code
import pandas as pd
cars_df = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/MASS/Cars93.csv')
A. Look at the first 5 rows of data in cars_df. (5 pts)
cars_df.head()
B. The first column in the cars dataset is a redundant index (pandas automatically adds a numeric row index to the data). (5 pts)
Remove the first column of data in cars_df.
cars_df = cars_df.drop(["Unnamed: 0"], axis=1)
cars_df
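Side note on part B: dropping by name only works if the header of the first column is exactly "Unnamed: 0", and that depends on how the CSV file was written. As a hedged sketch, the two options below do not rely on the column name. The CSV text is a made-up three-row stand-in, not the real Cars93 file, and toy_df, by_position, and as_index are hypothetical names.

```python
import io
import pandas as pd

# Made-up stand-in for a CSV whose first column is just a row number
csv_text = """,Manufacturer,Model,MPG.city
1,Acura,Integra,25
2,Acura,Legend,18
3,Audi,90,20
"""

# Option 1: read normally, then drop the first column by position
toy_df = pd.read_csv(io.StringIO(csv_text))
by_position = toy_df.drop(columns=toy_df.columns[0])

# Option 2: tell read_csv to use the first column as the row index
as_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(by_position.columns))
print(list(as_index.columns))
```

Both options leave only the Manufacturer, Model, and MPG.city columns in this toy example.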
C. Create a variable named cars_mpg that contains the columns Manufacturer, Model, MPG.city, and MPG.highway. (5 pts)
cars_mpg = cars_df[["Manufacturer", "Model", "MPG.city", "MPG.highway"]]
cars_mpg
D. We want to see which cars have more than 20 mpg in the city. (10 pts)
Use a comparison operator to build a Boolean mask (True or False for each row) and store it in a new column called mpg_city_20.
mpg_city_20 = cars_df["MPG.city"] > 20
cars_df['mpg_city_20'] = mpg_city_20
cars_df
E. Using the Boolean mask, create a subset of the data that contains only the vehicles with more than 20 mpg in the city. (10 pts)
(Create a new variable for only cars where mpg_city_20 is True)
Name this new variable high_mpg_city_df.
high_mpg_city_df = cars_df[mpg_city_20]
high_mpg_city_df
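Side note on parts D and E: a mask built with > leaves out cars at exactly 20 mpg, while >= would keep them. The sketch below shows the difference on a made-up three-row table. toy_df and the mask names are hypothetical and the numbers are not from Cars93.

```python
import pandas as pd

# Made-up data: one car below 20, one at exactly 20, one above 20
toy_df = pd.DataFrame({"Model": ["A", "B", "C"], "MPG.city": [18, 20, 25]})

strict_mask = toy_df["MPG.city"] > 20      # True only above 20
inclusive_mask = toy_df["MPG.city"] >= 20  # True at 20 and above

# Indexing a DataFrame with a Boolean Series keeps the rows marked True
strict_subset = toy_df[strict_mask]
inclusive_subset = toy_df[inclusive_mask]

print(strict_subset["Model"].tolist())     # ['C']
print(inclusive_subset["Model"].tolist())  # ['B', 'C']
```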
-The second line stores the 'Price' column of cars_df in the variable cars_price.
-The function mean_price adds each element of the 'Price' column to total_price. We then divide by the number of prices, which we get with len().
-Setting total_price = 0 ensures that total_price is reset every time we use this function.
#Do not modify this code
cars_price = cars_df['Price']
def mean_price(price):
    total_price = 0
    for i in range(len(cars_price)):
        total_price = total_price + cars_price[i]
    print(total_price/len(cars_price))
mean_price(cars_price)
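Side note: mean_price above loops over the global cars_price rather than its price argument, so it prints the same answer whatever is passed in. The sketch below is one possible variant, not part of the assignment code, that uses its argument and returns the result instead of printing it. mean_of and toy_prices are hypothetical names, and the prices are made up.

```python
import pandas as pd

def mean_of(values):
    """Return the arithmetic mean of a sequence, summing with a loop."""
    total = 0
    # Iterating over the values directly avoids lookups like values[i],
    # which on a pandas Series assume the default 0..n-1 row labels
    for value in values:
        total = total + value
    return total / len(values)

toy_prices = pd.Series([15.9, 33.9, 29.1])  # made-up prices
result = mean_of(toy_prices)
print(result)
print(toy_prices.mean())  # built-in check, should agree up to rounding
```

Because the function returns a number, the result can be compared with .mean() in code rather than by eye.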
A. Use the function .mean() to check the function above is correct. (5 pts)
You need to set the 'axis =' parameter for the mean function so check the pandas documentation on how to use it.
Only find the mean for the Price column in the cars_df dataframe.
cars_df['Price'].mean(axis=0)
B. Using the function mean_price as an example, create a similar function that checks the average mpg for all vehicles in the cars_df dataframe. (15 pts)
Name your function mean_mpg.
carmpg = cars_df['MPG.city']
def mean_mpg(mpg):
    total_mpg = 0
    for i in range(len(mpg)):
        total_mpg = total_mpg + mpg[i]
    print(total_mpg/len(mpg))
mean_mpg(carmpg)
C. Use the mean function to check your function works correctly. (5 pts)
cars_df['MPG.city'].mean(axis=0)
D. To summarize column data you can use .value_counts(). Use this function to summarize the Type column of cars_df. (10 pts)
Check the pandas documentation if you need help using value_counts().
cars_df["Type"].value_counts()
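Side note on part D: value_counts() returns a Series indexed by the distinct values, so a single count can be looked up by label, and normalize=True gives proportions instead of counts. The sketch below uses a made-up list of types, not the real Type column. toy_types, counts, and shares are hypothetical names.

```python
import pandas as pd

# Made-up stand-in for a Type column
toy_types = pd.Series(["Small", "Midsize", "Small", "Van", "Small"])

counts = toy_types.value_counts()                # counts per type, largest first
shares = toy_types.value_counts(normalize=True)  # fractions that sum to 1

print(counts["Small"])  # 3
print(counts["Van"])    # 1
print(shares["Small"])
```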
midsize_small_type_count counts the Midsize and Small vehicles in the Type column.
-If the car type is equal to 'Midsize', we add 1 to midsize_type_count.
-If the car type is equal to 'Small', we add 1 to small_type_count.
-If the car type is anything else, nothing happens.
-We then print both counts using the print statements at the bottom.
#Do not modify this code
cars_type = cars_df['Type']
def midsize_small_type_count(Type):
    midsize_type_count = 0
    small_type_count = 0
    for i in range(len(cars_price)):
        if cars_type[i] == 'Midsize':
            midsize_type_count += 1
        elif cars_type[i] == 'Small':
            small_type_count += 1
    print("Midsize:",midsize_type_count)
    print("Small:",small_type_count)
midsize_small_type_count(cars_type)
E. Create individual functions that give the count of compact, sporty, large, and van vehicles. (20 pts)
Name each function [car_type]_type_count (e.g. compact_type_count).
# Compact count
def compact_type_count(Type):
    compactsize_type_count = 0
    # Loop over the Series passed in as Type instead of the globals
    for i in range(len(Type)):
        if Type[i] == 'Compact':
            compactsize_type_count += 1
    print("Compact:", compactsize_type_count)
compact_type_count(cars_type)
# Sporty count
def sporty_type_count(Type):
    sportysize_type_count = 0
    # Loop over the Series passed in as Type instead of the globals
    for i in range(len(Type)):
        if Type[i] == 'Sporty':
            sportysize_type_count += 1
    print("Sporty:", sportysize_type_count)
sporty_type_count(cars_type)
# Large count
def large_type_count(Type):
    largesize_type_count = 0
    # Loop over the Series passed in as Type instead of the globals
    for i in range(len(Type)):
        if Type[i] == 'Large':
            largesize_type_count += 1
    print("Large:", largesize_type_count)
large_type_count(cars_type)
# Van count
def van_type_count(Type):
    vansize_type_count = 0
    # Loop over the Series passed in as Type instead of the globals
    for i in range(len(Type)):
        if Type[i] == 'Van':
            vansize_type_count += 1
    print("Van:", vansize_type_count)
van_type_count(cars_type)
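Side note on part E: the four functions above differ only in the string they compare against. As a sketch of one way to avoid the repetition (not what the assignment asks for), the type name can be passed in as a second argument. type_count and toy_types are hypothetical names, and the data is made up.

```python
import pandas as pd

def type_count(types, target):
    """Count how many entries in types are equal to target."""
    count = 0
    for car_type in types:
        if car_type == target:
            count += 1
    return count

# Made-up stand-in for the Type column
toy_types = pd.Series(["Compact", "Van", "Sporty", "Compact", "Large"])

for name in ["Compact", "Sporty", "Large", "Van"]:
    print(name + ":", type_count(toy_types, name))
```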
F. Combine the compact, sporty, large, and van counts into one function. (10 pts)
(Remember the first statement should be "if". Since this if-elif chain does not need to cover every car type, the last statement can be "elif" instead of "else".)
def combined_type_count(Type):
    compactsize_type_count = 0
    sportysize_type_count = 0
    largesize_type_count = 0
    vansize_type_count = 0
    # Loop over the Series passed in as Type instead of the globals
    for i in range(len(Type)):
        if Type[i] == 'Compact':
            compactsize_type_count += 1
        elif Type[i] == 'Sporty':
            sportysize_type_count += 1
        elif Type[i] == 'Large':
            largesize_type_count += 1
        elif Type[i] == 'Van':
            vansize_type_count += 1
    print("Compact:", compactsize_type_count)
    print("Sporty:", sportysize_type_count)
    print("Large:", largesize_type_count)
    print("Van:", vansize_type_count)
combined_type_count(cars_type)
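Side note on part F: the if/elif chain above can also be written as a dictionary tally, which makes it easy to cross-check the result against value_counts(). This is a sketch under assumptions, not the required answer. tally_types and toy_types are hypothetical names, and the data is made up.

```python
import pandas as pd

def tally_types(types, wanted):
    """Count occurrences of each name in wanted; other types are ignored."""
    counts = {name: 0 for name in wanted}
    for car_type in types:
        if car_type in counts:
            counts[car_type] += 1
    return counts

# Made-up stand-in for the Type column (note: no Sporty cars, one Small)
toy_types = pd.Series(["Compact", "Van", "Small", "Compact", "Large"])
tally = tally_types(toy_types, ["Compact", "Sporty", "Large", "Van"])
print(tally)  # {'Compact': 2, 'Sporty': 0, 'Large': 1, 'Van': 1}

# Cross-check each count against value_counts(); missing types count as 0
builtin_counts = toy_types.value_counts()
for name, n in tally.items():
    assert n == builtin_counts.get(name, 0)
```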