Name: Henry J. Hu

Master's in Data Science at Utica College

CYB-674 Cyber Data Fusion

Professor: Nikolas Rebovich

CYB 674 Homework 1

This homework is due at midnight on 11/15.

There is a file upload activity on Engage where you can submit your homework.

This homework is out of 100 points. Each question counts for 5 points.

Instructions

  1. Fill in your name above
  2. Fill in the code/text blocks to answer each question. These appear below each written question.
  3. Do not change any of the existing code provided.
  4. Run the entire notebook before submitting it on Engage to make sure the code runs without errors.

Data Types and Lists (35 points)

A. Create a variable named x and write a Boolean Expression that evaluates to True if the variable is 697.

In [48]:
x = 697  # Assign the integer 697 to x
In [49]:
x == 697  # Boolean comparison: True if x equals 697
Out[49]:
True

B. Create a variable named x and write a Boolean Expression that evaluates to True if the variable is "not" 99.

In [50]:
x = 100  # Assign the integer 100 to x
In [51]:
x != 99  # Boolean comparison: True if x is not 99
Out[51]:
True

C. Create a variable named x and write a Boolean Expression that evaluates to True if the variable is greater than 4 or less than -7.

In [52]:
x = 5  # Assign the integer 5 to x
In [53]:
x > 4 or x < -7  # True if x is greater than 4 or less than -7
Out[53]:
True
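Python's `|` is a bitwise operator that binds more tightly than `>` and `<`, so it only gives the right answer for question C when each comparison is wrapped in parentheses; the keyword `or` has no such pitfall. The short sketch below, using the same value x = 5, shows what happens when the parentheses are left out.

```python
x = 5

# 'or' is applied after both comparisons, so no parentheses are needed
with_or = x > 4 or x < -7

# With parentheses, '|' combines two bools and gives the same answer
with_pipe = (x > 4) | (x < -7)

# Without parentheses, '|' is applied first, so Python reads this as
# x > (4 | x) < -7, a chained comparison: (5 > 5) and (5 < -7) -> False
no_parens = x > 4 | x < -7

print(with_or, with_pipe, no_parens)  # True True False
```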

D. Create a variable named x and write a Boolean Expression that evaluates to True if the variable is less than 4 and greater than -7.

In [54]:
x = -5  # Assign the integer -5 to x
In [55]:
x < 4 and x > -7  # True if x is less than 4 and greater than -7
Out[55]:
True
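Python also supports chained comparisons, which express "between -7 and 4" in a single expression. A minimal sketch, checking a few hypothetical values on either side of the range to show that both ends are excluded:

```python
x = -5

# A chained comparison reads like the math: -7 < x < 4
chained = -7 < x < 4

# The same test spelled out with the 'and' keyword
with_and = x < 4 and x > -7

# Both ends are strict, so -7 and 4 themselves are excluded
candidates = [-8, -7, -5, 0, 4, 5]
inside = [v for v in candidates if -7 < v < 4]

print(chained, with_and)  # True True
print(inside)             # [-5, 0]
```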

Use the list below for questions E to H

In [56]:
#DO NOT CHANGE CODE IN THIS CELL
#Make sure to run this cell so the variable midterm_grades is saved
midterm_grades = [ 74, 67]

E. Add a grade of 94 to the end of the list

In [57]:
midterm_grades.append(94)  # Append the integer 94 to the end of the list
In [58]:
print(midterm_grades)
[74, 67, 94]

F. Add a grade of 80 to the front of the list

In [59]:
midterm_grades.insert(0, 80)  # Insert the integer 80 at the front of the list (index 0)
In [60]:
print(midterm_grades)
[80, 74, 67, 94]

G. Find the index location of 67 in midterm_grades

In [61]:
x = midterm_grades.index(67)  # Find the index of the first occurrence of 67
In [62]:
print(x)
2
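`list.index()` returns the position of the first matching element only, and raises `ValueError` when the value is absent. The sketch below uses a hypothetical list (not the homework's `midterm_grades`) with a repeated grade to show both behaviors, plus one way to collect every position.

```python
grades = [80, 74, 67, 94, 67]  # hypothetical list with a repeated grade

# .index() returns the position of the FIRST match only
first = grades.index(67)
print(first)  # 2

# A missing value would raise ValueError, so check membership first
missing = 100
position = grades.index(missing) if missing in grades else None
print(position)  # None

# All positions of a repeated value can be collected with enumerate()
all_positions = [i for i, g in enumerate(grades) if g == 67]
print(all_positions)  # [2, 4]
```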

H. Using a negative index, return 74 from midterm_grades

In [63]:
x = midterm_grades[-3]  # Third element counting back from the end of the list
In [64]:
print(x)
74
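A negative index `-k` is shorthand for the positive index `len(list) - k`. A small sketch, using a separate list holding the same values as `midterm_grades` after questions E and F, verifies this for every position:

```python
grades = [80, 74, 67, 94]  # same values as midterm_grades after questions E and F

n = len(grades)

# A negative index -k is shorthand for the positive index n - k
for k in range(1, n + 1):
    assert grades[-k] == grades[n - k]

print(grades[-3], grades[n - 3])  # 74 74
print(grades[-1])                 # 94, the last element
```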

Arrays and Series (35 points)

Use the list below for this section in addition to midterm_grades

In [65]:
#DO NOT CHANGE CODE IN THIS CELL
#Make sure to run this cell so the variable final_grades is saved
final_grades = [90, 84, 70, 65]

A. Import the pandas and numpy libraries using the coding format outlined in class.

In [66]:
import pandas as pd
import numpy as np

B. Convert midterm_grades and final_grades into NumPy arrays (you should have 2 separate arrays).

Name these arrays midterm_grades_array and final_grades_array.

In [67]:
midterm_grades_array = np.array(midterm_grades)  # Convert the list to a NumPy array
In [68]:
print(midterm_grades_array)
[80 74 67 94]
In [69]:
final_grades_array = np.array(final_grades)  # Convert the list to a NumPy array
In [70]:
print(final_grades_array)
[90 84 70 65]

C. Using the np.average() function, find the average midterm score for the class. (Put the array variable between the parentheses.)

In [71]:
x = np.average(midterm_grades_array) #Calculating the average of midterm grades
In [72]:
print(x)
78.75
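Without extra arguments, `np.average()` returns the plain arithmetic mean, the same value as `np.mean()`. It also accepts an optional `weights` argument; the weights in this sketch are hypothetical and chosen only to illustrate the call.

```python
import numpy as np

scores = np.array([80, 74, 67, 94])  # midterm grades before extra credit

# Unweighted: same as np.mean(), (80 + 74 + 67 + 94) / 4
print(np.average(scores), np.mean(scores))  # 78.75 78.75

# Hypothetical weights, e.g. to count some scores more heavily than others
weights = np.array([1, 1, 2, 0])
weighted = np.average(scores, weights=weights)
print(weighted)  # (80 + 74 + 2*67) / 4 = 72.0
```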

D. You are giving all your students 2 extra credit points on their midterm grade. Make the necessary changes to midterm_grades_array.

Note: Make sure to save the changes to the midterm_grades_array variable.

In [73]:
midterm_grades_array = midterm_grades_array + 2 #Add 2 to each midterm grade
In [74]:
print(midterm_grades_array)
[82 76 69 96]
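Adding 2 to the whole array works because NumPy broadcasts the scalar across every element. A plain Python list does not support this, as the following sketch shows (the variable names are illustrative):

```python
import numpy as np

grades_list = [80, 74, 67, 94]
grades_array = np.array(grades_list)

# NumPy broadcasts the scalar 2 across every element
curved = grades_array + 2
print(curved)  # [82 76 69 96]

# A plain Python list cannot be added to an int
try:
    grades_list + 2
except TypeError as err:
    print("TypeError:", err)

# The list equivalent needs an explicit comprehension
curved_list = [g + 2 for g in grades_list]
print(curved_list)  # [82, 76, 69, 96]

# '+=' also works on an array, updating it in place instead of creating a new one
in_place = np.array(grades_list)
in_place += 2
```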

E. Use the np.average() function again to get the class average. It should have increased by 2 points.

In [75]:
x = np.average(midterm_grades_array) #Calculating the average of the new midterm grades
In [76]:
print(x)
80.75

F. You want to add student names to midterm_grades. Convert the array into a series and add name labels to each grade.

The students' names, in order, are Robert, Mary, Jane, and Frank.

Name these variables midterm_grades_series and final_grades_series.

In [77]:
#Converting the array midterm_grades_array into a pandas series
In [78]:
midterm_grades_series = pd.Series(midterm_grades_array, index=["Robert", "Mary", "Jane", "Frank"])
In [79]:
print(midterm_grades_series)
Robert    82
Mary      76
Jane      69
Frank     96
dtype: int32
In [80]:
#Converting the array final_grades_array into a pandas series
In [81]:
final_grades_series = pd.Series(final_grades_array, index=["Robert", "Mary", "Jane", "Frank"])
In [82]:
print(final_grades_series)
Robert    90
Mary      84
Jane      70
Frank     65
dtype: int32
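A labelled series can also be built directly from a dictionary, where the keys become the index labels in insertion order. A sketch using the final grades printed above:

```python
import pandas as pd

# Keys become index labels; values become the data
final = pd.Series({"Robert": 90, "Mary": 84, "Jane": 70, "Frank": 65})

print(final.index.tolist())  # ['Robert', 'Mary', 'Jane', 'Frank']
print(final["Jane"])         # 70
```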

G. Use a Boolean operator to see which student had a higher grade on their midterm than on their final.

This should result in a boolean series.

In [83]:
boolean_series = midterm_grades_series > final_grades_series
In [84]:
print(boolean_series)
Robert    False
Mary      False
Jane      False
Frank      True
dtype: bool
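A boolean series like this one can be used as a mask to keep only the rows where it is True. The sketch below rebuilds the two series from the values printed above so it runs on its own:

```python
import pandas as pd

names = ["Robert", "Mary", "Jane", "Frank"]
midterm = pd.Series([82, 76, 69, 96], index=names)
final = pd.Series([90, 84, 70, 65], index=names)

# Element-wise comparison, aligned on the name labels
did_better_on_midterm = midterm > final

# Indexing with the boolean series keeps only the True rows
print(midterm[did_better_on_midterm])

# The matching names as a plain list
print(list(midterm[did_better_on_midterm].index))  # ['Frank']
```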

Dataframes (30 Points)

You can easily convert a pandas series into a pandas dataframe.

Look at the code below and notice that you can use the DataFrame function with your series to make a dataframe.

Since a series is one-dimensional, you must specify the name of its column; otherwise the column will be labelled 0.

Running this code should result in one column labelled midterm_grades, with rows labelled by student name.

In [85]:
#DO NOT CHANGE CODE IN THIS CELL
#Make sure to run this cell so the dataframe grades is saved
grades_df = pd.DataFrame(midterm_grades_series ,
                         columns = ['midterm_grades']   )
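Another way to build the same one-column frame is `Series.to_frame()`, whose `name` argument sets the column label. This is shown as an alternative, not the approach required in class. The sketch rebuilds an unnamed midterm series from the printed values so it runs on its own, and also confirms the label-0 behavior described above.

```python
import pandas as pd

names = ["Robert", "Mary", "Jane", "Frank"]
midterm = pd.Series([82, 76, 69, 96], index=names)  # unnamed series

# Without a column name, the single column is labelled 0
print(pd.DataFrame(midterm).columns.tolist())  # [0]

# to_frame() names the column in one step
df = midterm.to_frame(name="midterm_grades")
print(df.columns.tolist())  # ['midterm_grades']
print(df.index.tolist())    # ['Robert', 'Mary', 'Jane', 'Frank']
```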

A. Add a column to the grades_df variable named final_grades and set the data elements in that column equal to final_grades_series.

(You are adding the final grades data to the dataframe)

In [86]:
print(grades_df)
        midterm_grades
Robert              82
Mary                76
Jane                69
Frank               96
In [87]:
#Adding the series final_grades_series as a column to dataframe grades_df
In [88]:
grades_df["final_grades"] = final_grades_series
In [89]:
print(grades_df)
        midterm_grades  final_grades
Robert              82            90
Mary                76            84
Jane                69            70
Frank               96            65
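Assigning a series as a new column matches rows by index label, not by position. To show this, the sketch stores the final grades in a shuffled (hypothetical) order; each grade still lands on the right student, and a label missing from the assigned series becomes NaN.

```python
import pandas as pd

df = pd.DataFrame({"midterm_grades": [82, 76, 69, 96]},
                  index=["Robert", "Mary", "Jane", "Frank"])

# Same final grades as the homework, listed in a different order
final = pd.Series([65, 70, 84, 90], index=["Frank", "Jane", "Mary", "Robert"])

# Assignment aligns on labels, so the order of 'final' does not matter
df["final_grades"] = final
print(df)

# A series covering only some labels leaves NaN in the other rows
df["bonus"] = pd.Series([5], index=["Mary"])
print(df["bonus"])
```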

B. Using .head(), select the first 2 rows of data.

In [90]:
grades_df.head(n=2)
Out[90]:
        midterm_grades  final_grades
Robert              82            90
Mary                76            84

C. Using .describe() get a summary of the data.

In [91]:
grades_df.describe(include="all")
Out[91]:
       midterm_grades  final_grades
count        4.000000       4.00000
mean        80.750000      77.25000
std         11.470978      11.70114
min         69.000000      65.00000
25%         74.250000      68.75000
50%         79.000000      77.00000
75%         85.500000      85.50000
max         96.000000      90.00000
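The `std` row from `.describe()` is the sample standard deviation (divisor n - 1), while `np.std()` defaults to the population version (divisor n). The sketch rebuilds `grades_df` from the printed values and compares the two.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"midterm_grades": [82, 76, 69, 96],
                   "final_grades": [90, 84, 70, 65]},
                  index=["Robert", "Mary", "Jane", "Frank"])

summary = df.describe()

# describe() uses the sample standard deviation (ddof=1)
sample_std = np.std(df["midterm_grades"].to_numpy(), ddof=1)
print(summary.loc["std", "midterm_grades"], sample_std)

# NumPy's default ddof=0 gives the smaller population standard deviation
population_std = np.std(df["midterm_grades"].to_numpy())
print(population_std)
```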

D. Select the first column of grades_df.

In [92]:
grades_df.iloc[:, 0:1]
Out[92]:
        midterm_grades
Robert              82
Mary                76
Jane                69
Frank               96
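Slicing with `0:1` keeps the result two-dimensional (a one-column DataFrame), while a single integer position or a column label returns a Series. A sketch comparing the three forms on a rebuilt `grades_df`:

```python
import pandas as pd

df = pd.DataFrame({"midterm_grades": [82, 76, 69, 96],
                   "final_grades": [90, 84, 70, 65]},
                  index=["Robert", "Mary", "Jane", "Frank"])

as_frame = df.iloc[:, 0:1]        # slice keeps two dimensions -> DataFrame
as_series = df.iloc[:, 0]         # single position drops to one dimension -> Series
by_label = df["midterm_grades"]   # column label -> Series

print(type(as_frame).__name__, as_frame.shape)    # DataFrame (4, 1)
print(type(as_series).__name__, as_series.shape)  # Series (4,)
print(as_series.equals(by_label))                 # True
```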

E. Select the first row in grades_df using .loc[].

In [93]:
grades_df.loc["Robert"]
Out[93]:
midterm_grades    82
final_grades      90
Name: Robert, dtype: int32

F. Select Frank's row of data using .iloc[].

In [94]:
grades_df.iloc[3:4, :]
Out[94]:
       midterm_grades  final_grades
Frank              96            65
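Several indexer forms select Frank's row. Slices and lists (by position with `.iloc[]` or by label with `.loc[]`) return a one-row DataFrame, while a single position returns a Series. A sketch on a rebuilt `grades_df`:

```python
import pandas as pd

df = pd.DataFrame({"midterm_grades": [82, 76, 69, 96],
                   "final_grades": [90, 84, 70, 65]},
                  index=["Robert", "Mary", "Jane", "Frank"])

row_slice = df.iloc[3:4, :]     # position slice -> one-row DataFrame
row_list = df.iloc[[3]]         # list of positions -> one-row DataFrame
row_label = df.loc[["Frank"]]   # list of labels -> one-row DataFrame
row_series = df.iloc[-1]        # single position (last row) -> Series

print(row_slice.equals(row_list), row_slice.equals(row_label))  # True True
print(row_series.name, row_series.tolist())                     # Frank [96, 65]
```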