Name: Henry J. Hu

CYB-674 Cyber Data Fusion

Professor: Nikolas Rebovich

CYB 674 HW 2

This Homework is due 11/22 at midnight

This Homework is out of 100 points. Each question is labelled with their corresponding point value.

Instructions

  1. Fill in your name above
  2. Fill in the code/text blocks to answer each question. These will be below the written question.
  3. Do not change any of the existing code provided.
  4. Run the entire notebook before submitting it on Engage to make sure that the code actually runs without errors.

Question 1: Dataframes (35 Pts)

For this homework we will be using the car_df dataset below

In [1]:
#Do not modify this code
import pandas as pd
cars_df = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/MASS/Cars93.csv')

A. Look at the first 5 rows of data in cars_df. (5 pts)

In [2]:
cars_df.head()
Out[2]:
Unnamed: 0 Manufacturer Model Type Min.Price Price Max.Price MPG.city MPG.highway AirBags ... Passengers Length Wheelbase Width Turn.circle Rear.seat.room Luggage.room Weight Origin Make
0 1 Acura Integra Small 12.9 15.9 18.8 25 31 None ... 5 177 102 68 37 26.5 11.0 2705 non-USA Acura Integra
1 2 Acura Legend Midsize 29.2 33.9 38.7 18 25 Driver & Passenger ... 5 195 115 71 38 30.0 15.0 3560 non-USA Acura Legend
2 3 Audi 90 Compact 25.9 29.1 32.3 20 26 Driver only ... 5 180 102 67 37 28.0 14.0 3375 non-USA Audi 90
3 4 Audi 100 Midsize 30.8 37.7 44.6 19 26 Driver & Passenger ... 6 193 106 70 37 31.0 17.0 3405 non-USA Audi 100
4 5 BMW 535i Midsize 23.7 30.0 36.2 22 30 Driver only ... 4 186 109 69 39 27.0 13.0 3640 non-USA BMW 535i

5 rows × 28 columns

B. The first column in the cars dataset is a redundant index (python automatically adds a numeric row index to the data). (5 pts)

Remove the first column of data in cars_df.

In [3]:
cars_df = cars_df.drop(["Unnamed: 0"], axis=1)
In [4]:
cars_df
Out[4]:
Manufacturer Model Type Min.Price Price Max.Price MPG.city MPG.highway AirBags DriveTrain ... Passengers Length Wheelbase Width Turn.circle Rear.seat.room Luggage.room Weight Origin Make
0 Acura Integra Small 12.9 15.9 18.8 25 31 None Front ... 5 177 102 68 37 26.5 11.0 2705 non-USA Acura Integra
1 Acura Legend Midsize 29.2 33.9 38.7 18 25 Driver & Passenger Front ... 5 195 115 71 38 30.0 15.0 3560 non-USA Acura Legend
2 Audi 90 Compact 25.9 29.1 32.3 20 26 Driver only Front ... 5 180 102 67 37 28.0 14.0 3375 non-USA Audi 90
3 Audi 100 Midsize 30.8 37.7 44.6 19 26 Driver & Passenger Front ... 6 193 106 70 37 31.0 17.0 3405 non-USA Audi 100
4 BMW 535i Midsize 23.7 30.0 36.2 22 30 Driver only Rear ... 4 186 109 69 39 27.0 13.0 3640 non-USA BMW 535i
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
88 Volkswagen Eurovan Van 16.6 19.7 22.7 17 21 None Front ... 7 187 115 72 38 34.0 NaN 3960 non-USA Volkswagen Eurovan
89 Volkswagen Passat Compact 17.6 20.0 22.4 21 30 None Front ... 5 180 103 67 35 31.5 14.0 2985 non-USA Volkswagen Passat
90 Volkswagen Corrado Sporty 22.9 23.3 23.7 18 25 None Front ... 4 159 97 66 36 26.0 15.0 2810 non-USA Volkswagen Corrado
91 Volvo 240 Compact 21.8 22.7 23.5 21 28 Driver only Rear ... 5 190 104 67 37 29.5 14.0 2985 non-USA Volvo 240
92 Volvo 850 Midsize 24.8 26.7 28.5 20 28 Driver & Passenger Front ... 5 184 105 69 38 30.0 15.0 3245 non-USA Volvo 850

93 rows × 27 columns

C. Create a variable named cars_mpg that contains the columns manufacturer, model, MPG.city, and MPG.highway. (5 pts)

In [5]:
cars_mpg = cars_df [["Manufacturer", "Model", "MPG.city", "MPG.highway"]]
In [6]:
cars_mpg
Out[6]:
Manufacturer Model MPG.city MPG.highway
0 Acura Integra 25 31
1 Acura Legend 18 25
2 Audi 90 20 26
3 Audi 100 19 26
4 BMW 535i 22 30
... ... ... ... ...
88 Volkswagen Eurovan 17 21
89 Volkswagen Passat 21 30
90 Volkswagen Corrado 18 25
91 Volvo 240 21 28
92 Volvo 850 20 28

93 rows × 4 columns

D. We want to see which cars have more than 20 mpg in the city. (10 pts)

Create a new column called mpg_city_20 and use a boolean operator to create a mask for this column with True or False values.

In [7]:
mpg_city_20 = cars_df["MPG.city"] > 20
cars_df['mpg_city_20'] = mpg_city_20
cars_df
Out[7]:
Manufacturer Model Type Min.Price Price Max.Price MPG.city MPG.highway AirBags DriveTrain ... Length Wheelbase Width Turn.circle Rear.seat.room Luggage.room Weight Origin Make mpg_city_20
0 Acura Integra Small 12.9 15.9 18.8 25 31 None Front ... 177 102 68 37 26.5 11.0 2705 non-USA Acura Integra True
1 Acura Legend Midsize 29.2 33.9 38.7 18 25 Driver & Passenger Front ... 195 115 71 38 30.0 15.0 3560 non-USA Acura Legend False
2 Audi 90 Compact 25.9 29.1 32.3 20 26 Driver only Front ... 180 102 67 37 28.0 14.0 3375 non-USA Audi 90 False
3 Audi 100 Midsize 30.8 37.7 44.6 19 26 Driver & Passenger Front ... 193 106 70 37 31.0 17.0 3405 non-USA Audi 100 False
4 BMW 535i Midsize 23.7 30.0 36.2 22 30 Driver only Rear ... 186 109 69 39 27.0 13.0 3640 non-USA BMW 535i True
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
88 Volkswagen Eurovan Van 16.6 19.7 22.7 17 21 None Front ... 187 115 72 38 34.0 NaN 3960 non-USA Volkswagen Eurovan False
89 Volkswagen Passat Compact 17.6 20.0 22.4 21 30 None Front ... 180 103 67 35 31.5 14.0 2985 non-USA Volkswagen Passat True
90 Volkswagen Corrado Sporty 22.9 23.3 23.7 18 25 None Front ... 159 97 66 36 26.0 15.0 2810 non-USA Volkswagen Corrado False
91 Volvo 240 Compact 21.8 22.7 23.5 21 28 Driver only Rear ... 190 104 67 37 29.5 14.0 2985 non-USA Volvo 240 True
92 Volvo 850 Midsize 24.8 26.7 28.5 20 28 Driver & Passenger Front ... 184 105 69 38 30.0 15.0 3245 non-USA Volvo 850 False

93 rows × 28 columns

E. Using a Boolean create a subset of the data when all vehicles have 20 mpg or more in the city. (10pts)

(Create a new variable for only cars where mpg_city_20 is True)

Name this new variable high_mpg_city_df.

In [8]:
high_mpg_city_df = cars_df[mpg_city_20]
In [9]:
high_mpg_city_df
Out[9]:
Manufacturer Model Type Min.Price Price Max.Price MPG.city MPG.highway AirBags DriveTrain ... Length Wheelbase Width Turn.circle Rear.seat.room Luggage.room Weight Origin Make mpg_city_20
0 Acura Integra Small 12.9 15.9 18.8 25 31 None Front ... 177 102 68 37 26.5 11.0 2705 non-USA Acura Integra True
4 BMW 535i Midsize 23.7 30.0 36.2 22 30 Driver only Rear ... 186 109 69 39 27.0 13.0 3640 non-USA BMW 535i True
5 Buick Century Midsize 14.2 15.7 17.3 22 31 Driver only Front ... 189 105 69 41 28.0 16.0 2880 USA Buick Century True
11 Chevrolet Cavalier Compact 8.5 13.4 18.3 25 36 None Front ... 182 101 66 38 25.0 13.0 2490 USA Chevrolet Cavalier True
12 Chevrolet Corsica Compact 11.4 11.4 11.4 25 34 Driver only Front ... 184 103 68 39 26.0 14.0 2785 USA Chevrolet Corsica True
14 Chevrolet Lumina Midsize 13.4 15.9 18.4 21 29 None Front ... 198 108 71 40 28.5 16.0 3195 USA Chevrolet Lumina True
20 Chrysler LeBaron Compact 14.5 15.8 17.1 23 28 Driver & Passenger Front ... 183 104 68 41 30.5 14.0 3085 USA Chrysler LeBaron True
22 Dodge Colt Small 7.9 9.2 10.6 29 33 None Front ... 174 98 66 32 26.5 11.0 2270 USA Dodge Colt True
23 Dodge Shadow Small 8.4 11.3 14.2 23 29 Driver only Front ... 172 97 67 38 26.5 13.0 2670 USA Dodge Shadow True
24 Dodge Spirit Compact 11.9 13.3 14.7 22 27 Driver only Front ... 181 104 68 39 30.5 14.0 2970 USA Dodge Spirit True
26 Dodge Dynasty Midsize 14.8 15.6 16.4 21 27 Driver only Front ... 192 105 69 42 30.5 16.0 3080 USA Dodge Dynasty True
28 Eagle Summit Small 7.9 12.2 16.5 29 33 None Front ... 174 98 66 36 26.5 11.0 2295 USA Eagle Summit True
30 Ford Festiva Small 6.9 7.4 7.9 31 33 None Front ... 141 90 63 33 26.0 12.0 1845 USA Ford Festiva True
31 Ford Escort Small 8.4 10.1 11.9 23 30 None Front ... 171 98 67 36 28.0 12.0 2530 USA Ford Escort True
32 Ford Tempo Compact 10.4 11.3 12.2 22 27 None Front ... 177 100 68 39 27.5 13.0 2690 USA Ford Tempo True
33 Ford Mustang Sporty 10.8 15.9 21.0 22 29 Driver only Rear ... 180 101 68 40 24.0 12.0 2850 USA Ford Mustang True
34 Ford Probe Sporty 12.8 14.0 15.2 24 30 Driver only Front ... 179 103 70 38 23.0 18.0 2710 USA Ford Probe True
36 Ford Taurus Midsize 15.6 20.2 24.8 21 30 Driver only Front ... 192 106 71 40 27.5 18.0 3325 USA Ford Taurus True
38 Geo Metro Small 6.7 8.4 10.0 46 50 None Front ... 151 93 63 34 27.5 10.0 1695 non-USA Geo Metro True
39 Geo Storm Sporty 11.5 12.5 13.5 30 36 Driver only Front ... 164 97 67 37 24.5 11.0 2475 non-USA Geo Storm True
40 Honda Prelude Sporty 17.0 19.8 22.7 24 31 Driver & Passenger Front ... 175 100 70 39 23.5 8.0 2865 non-USA Honda Prelude True
41 Honda Civic Small 8.4 12.1 15.8 42 46 Driver only Front ... 173 103 67 36 28.0 12.0 2350 non-USA Honda Civic True
42 Honda Accord Compact 13.8 17.5 21.2 24 31 Driver & Passenger Front ... 185 107 67 41 28.0 14.0 3040 non-USA Honda Accord True
43 Hyundai Excel Small 6.8 8.0 9.2 29 33 None Front ... 168 94 63 35 26.0 11.0 2345 non-USA Hyundai Excel True
44 Hyundai Elantra Small 9.0 10.0 11.0 22 29 None Front ... 172 98 66 36 28.0 12.0 2620 non-USA Hyundai Elantra True
45 Hyundai Scoupe Sporty 9.1 10.0 11.0 26 34 None Front ... 166 94 64 34 23.5 9.0 2285 non-USA Hyundai Scoupe True
52 Mazda 323 Small 7.4 8.3 9.1 29 37 None Front ... 164 97 66 34 27.0 16.0 2325 non-USA Mazda 323 True
53 Mazda Protege Small 10.9 11.6 12.3 28 36 None Front ... 172 98 66 36 26.5 13.0 2440 non-USA Mazda Protege True
54 Mazda 626 Compact 14.3 16.5 18.7 26 34 Driver only Front ... 184 103 69 40 29.5 14.0 2970 non-USA Mazda 626 True
59 Mercury Capri Sporty 13.3 14.1 15.0 23 26 Driver only Front ... 166 95 65 36 19.0 6.0 2450 USA Mercury Capri True
61 Mitsubishi Mirage Small 7.7 10.3 12.9 29 33 None Front ... 172 98 67 36 26.0 11.0 2295 non-USA Mitsubishi Mirage True
63 Nissan Sentra Small 8.7 11.8 14.9 29 33 Driver only Front ... 170 96 66 33 26.0 12.0 2545 non-USA Nissan Sentra True
64 Nissan Altima Compact 13.0 15.7 18.3 24 30 Driver only Front ... 181 103 67 40 28.5 14.0 3050 non-USA Nissan Altima True
66 Nissan Maxima Midsize 21.0 21.5 22.0 21 26 Driver only Front ... 188 104 69 41 28.5 14.0 3200 non-USA Nissan Maxima True
67 Oldsmobile Achieva Compact 13.0 13.5 14.0 24 31 None Front ... 188 103 67 39 28.0 14.0 2910 USA Oldsmobile Achieva True
68 Oldsmobile Cutlass_Ciera Midsize 14.2 16.3 18.4 23 31 Driver only Front ... 190 105 70 42 28.0 16.0 2890 USA Oldsmobile Cutlass_Ciera True
71 Plymouth Laser Sporty 11.4 14.4 17.4 23 30 None 4WD ... 173 97 67 39 24.5 8.0 2640 USA Plymouth Laser True
72 Pontiac LeMans Small 8.2 9.0 9.9 31 41 None Front ... 177 99 66 35 25.5 17.0 2350 USA Pontiac LeMans True
73 Pontiac Sunbird Compact 9.4 11.1 12.8 23 31 None Front ... 181 101 66 39 25.0 13.0 2575 USA Pontiac Sunbird True
78 Saturn SL Small 9.2 11.1 12.9 28 38 Driver only Front ... 176 102 68 40 26.5 12.0 2495 USA Saturn SL True
79 Subaru Justy Small 7.3 8.4 9.5 33 37 None 4WD ... 146 90 60 32 23.5 10.0 2045 non-USA Subaru Justy True
80 Subaru Loyale Small 10.5 10.9 11.3 25 30 None 4WD ... 175 97 65 35 27.5 15.0 2490 non-USA Subaru Loyale True
81 Subaru Legacy Compact 16.3 19.5 22.7 23 30 Driver only 4WD ... 179 102 67 37 27.0 14.0 3085 non-USA Subaru Legacy True
82 Suzuki Swift Small 7.3 8.6 10.0 39 43 None Front ... 161 93 63 34 27.5 10.0 1965 non-USA Suzuki Swift True
83 Toyota Tercel Small 7.8 9.8 11.8 32 37 Driver only Front ... 162 94 65 36 24.0 11.0 2055 non-USA Toyota Tercel True
84 Toyota Celica Sporty 14.2 18.4 22.6 25 32 Driver only Front ... 174 99 69 39 23.0 13.0 2950 non-USA Toyota Celica True
85 Toyota Camry Midsize 15.2 18.2 21.2 22 29 Driver only Front ... 188 103 70 38 28.5 15.0 3030 non-USA Toyota Camry True
87 Volkswagen Fox Small 8.7 9.1 9.5 25 33 None Front ... 163 93 63 34 26.0 10.0 2240 non-USA Volkswagen Fox True
89 Volkswagen Passat Compact 17.6 20.0 22.4 21 30 None Front ... 180 103 67 35 31.5 14.0 2985 non-USA Volkswagen Passat True
91 Volvo 240 Compact 21.8 22.7 23.5 21 28 Driver only Rear ... 190 104 67 37 29.5 14.0 2985 non-USA Volvo 240 True

50 rows × 28 columns

Question 2: Functions and If-Else Statements (65 pts)

Below is a function that returns the average price of all cars in cars_df.

-The second line defines the 'Price' column in cars_df as the variable cars_price.

-This function takes each data element in the 'Price' column and adds it to the total_price. We then divide by the number of prices using len().

-Setting total_price = 0 ensures the total_price is reset everytime we use this function.

In [10]:
#Do not modify this code
cars_price = cars_df['Price']

def mean_price(price):
    total_price = 0
    for i in range(len(cars_price)):
        total_price = total_price + cars_price[i]
    print(total_price/len(cars_price))

mean_price(cars_price)
19.509677419354837

A. Use the function .mean() to check the function above is correct. (5 pts)

You need to set the 'axis =' parameter for the mean function so check the pandas documentation on how to use it.

Only find the mean for the Price column in the cars_df dataframe.

In [11]:
cars_df['Price'].mean(axis=0)
Out[11]:
19.50967741935484

B. Using the function mean_price as an example, create a similar function that checks the average mpg for all vehicles in the cars_df dataframe. (15 pts)

Name your function mean_mpg.

In [12]:
carmpg = cars_df['MPG.city']

def mean_mpg(mpg):
    total_mpg = 0
    for i in range(len(mpg)):
        total_mpg = total_mpg + mpg[i]
    print(total_mpg/len(mpg))

mean_mpg(carmpg)
22.365591397849464

C. Use the mean function to check your function works correctly. (5 pts)

In [13]:
cars_df['MPG.city'].mean(axis=0)
Out[13]:
22.365591397849464

D. To summarize column data you can use .value_counts(). Use this function to summarize the Type column of cars_df. (10 pts)

Check pandas documentaton if you need help using value_counts().

In [14]:
cars_df["Type"].value_counts()
Out[14]:
Midsize    22
Small      21
Compact    16
Sporty     14
Large      11
Van         9
Name: Type, dtype: int64

Below is a function that summarizes the same information individually.

midsize_small_type_count summarizes the count of midsize and small vehicles in the type column.

-If car type is equal to 'Midsize' we add 1 to the midsize_type_count.

-If car type is equal to 'Small' we add 1 to the small_type_count.

-If car type is anything else nothing happens.

-We then print the total car type counts using the print form at the bottom.

In [15]:
#Do not modify this code
cars_type = cars_df['Type']
def midsize_small_type_count(Type):
    midsize_type_count = 0
    small_type_count = 0
    for i in range(len(cars_price)):
        if cars_type[i] == 'Midsize':
            midsize_type_count += 1
        elif cars_type[i] == 'Small':
            small_type_count += 1
    print("Midsize:",midsize_type_count)
    print("Small:",small_type_count)

midsize_small_type_count(cars_type)
Midsize: 22
Small: 21

E. Create individual functions that give the count of compact, sporty, large, and van vehicles. (20 pts)

Name each function [car_type]_type_count (i.e. compact_type_count)

In [16]:
#Compact count
In [17]:
def compact_type_count (Type):
    compactsize_type_count = 0
    for i in range(len(cars_price)):
        if cars_type[i] == 'Compact':
            compactsize_type_count += 1
    print("Compact:",compactsize_type_count)

compact_type_count(cars_type)
Compact: 16
In [18]:
#Sporty count
In [19]:
def sporty_type_count (Type):
    sportysize_type_count = 0
    for i in range(len(cars_price)):
        if cars_type[i] == 'Sporty':
            sportysize_type_count += 1
    print("Sporty:",sportysize_type_count)

sporty_type_count(cars_type)
Sporty: 14
In [20]:
#Large Count
In [21]:
def large_type_count (Type):
    largesize_type_count = 0
    for i in range(len(cars_price)):
        if cars_type[i] == 'Large':
            largesize_type_count += 1
    print("Large:",largesize_type_count)

large_type_count(cars_type)
Large: 11
In [22]:
#Van Count
In [23]:
def van_type_count (Type):
    vansize_type_count = 0
    for i in range(len(cars_price)):
        if cars_type[i] == 'Van':
            vansize_type_count += 1
    print("Van:",vansize_type_count)

van_type_count(cars_type)
Van: 9

F. Combine the compact, sport, large, and van counts into one function. (10 pts)

(Remember the first statement should be "if". Since this if-else statment is inclusive the last statement can be elif).

In [24]:
def combined_type_count(Type):
    compactsize_type_count = 0
    sportysize_type_count = 0
    largesize_type_count = 0
    vansize_type_count = 0
    for i in range(len(cars_price)):
        if cars_type[i] == 'Compact':
            compactsize_type_count += 1
        elif cars_type[i] == 'Sporty':
            sportysize_type_count += 1
        elif cars_type[i] == 'Large':
            largesize_type_count += 1            
        elif cars_type[i] == 'Van':
            vansize_type_count += 1            
            
    print("Compact:",compactsize_type_count)
    print("Sporty:",sportysize_type_count)
    print("Large:",largesize_type_count)
    print("Van:",vansize_type_count)    
    
combined_type_count(cars_type)
Compact: 16
Sporty: 14
Large: 11
Van: 9
In [ ]: