Name: Henry J. Hu

CYB-674 Cyber Data Fusion

Professor: Nikolas Rebovich

CYB 674 HW 3

This Homework is due December 7th at midnight

This Homework is out of 100 points. Each question is labelled with its corresponding point value.

Instructions

  1. Fill in your name above
  2. Fill in the code/text blocks to answer each question. Each block is located below its written question.
  3. Do not change any of the existing code provided.
  4. Run the entire notebook before submitting it on Engage to make sure that the code actually runs without errors.

Question 1: Cleaning Null Values (50 Pts)

Your client is interested in analyzing a dataset that contains Cyber Security Breaches.

Before you can do any analysis, you must first look at the data and clean up any null values you find.

Use your best judgement when assessing how to deal with null values (replacing the null versus removing the null). You will use different methods based upon which column you are looking at.

A. Import the pandas and numpy libraries and the data for HW 3 (5 pts)

In [1]:
import pandas as pd
import numpy as np
security_breach_df = pd.read_csv('C:/Users/henry/Henry_J_Hu/HW_3_Data.csv')
security_breach_df
Out[1]:
Unnamed: 0 Number Name_of_Covered_Entity State Business_Associate_Involved Individuals_Affected Date_of_Breach Type_of_Breach Location_of_Breached_Information Date_Posted_or_Updated Summary breach_start breach_end year
0 1 0 Brooke Army Medical Center TX NaN 1000.0 10/16/2009 Theft Paper 6/30/2014 A binder containing the protected health infor... 10/16/2009 NaN 2009
1 2 1 Mid America Kidney Stone Association, LLC MO NaN 1000.0 9/22/2009 Theft Network Server 5/30/2014 Five desktop computers containing unencrypted ... 9/22/2009 NaN 2009
2 3 2 NaN AK NaN NaN 10/12/2009 Theft Other Portable Electronic Device, Other 1/23/2014 NaN 10/12/2009 NaN 2009
3 4 3 Health Services for Children with Special Need... DC NaN 3800.0 10/9/2009 Loss Laptop 1/23/2014 A laptop was lost by an employee while in tran... 10/9/2009 NaN 2009
4 5 4 L. Douglas Carlson, M.D. CA NaN 5257.0 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 NaN 2009
5 6 5 David I. Cohen, MD CA NaN 857.0 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 NaN 2009
6 7 6 Michele Del Vicario, MD CA NaN 6145.0 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 NaN 2009
7 8 7 Joseph F. Lopez, MD CA NaN 952.0 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 NaN 2009
8 9 8 Mark D. Lurie, MD CA NaN 5166.0 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 NaN 2009
9 10 9 City of Hope National Medical Center CA NaN NaN 9/27/2009 Theft Laptop 1/23/2014 A laptop computer was stolen from a workforce ... 9/27/2009 NaN 2009
10 11 10 The Children's Hospital of Philadelphia PA NaN 943.0 10/20/2009 Theft NaN 1/23/2014 NaN 10/20/2009 NaN 2009
11 12 11 Cogent Healthcare of Wisconsin, S.C. TN NaN 6400.0 10/11/2009 Theft Laptop 4/23/2014 A laptop was stolen from a locked office at th... 10/11/2009 NaN 2009
12 13 12 Universal American NY Democracy Data & Communications, LLC ( 83000.0 11/12/2009 Other Paper 1/23/2014 In its breach report and during the course of ... 11/12/2009 NaN 2009
13 14 13 Kern Medical Center CA NaN 596.0 10/31/2009 Theft Other 1/23/2014 NaN 10/31/2009 NaN 2009
14 15 14 Keith W. Mann, DDS, PLLC NC Rick Lawson, Professional Computer Services 2000.0 12/8/2009 Hacking/IT Incident Desktop Computer, Network Server, Electronic M... 1/23/2014 NaN 12/8/2009 NaN 2009
15 16 15 Detroit Department of Health and Wellness Prom... MI NaN 10000.0 10/22/2009 Theft Other Portable Electronic Device 1/23/2014 NaN 10/22/2009 NaN 2009
16 17 16 Detroit Department of Health and Wellness Prom... MI NaN 646.0 11/26/2009 Theft Laptop, Desktop Computer 1/23/2014 A desktop and four laptop computers were stole... 11/26/2009 NaN 2009
17 18 17 NaN CA NaN 610.0 9/22/2009 Other E-mail 1/23/2014 NaN 9/22/2009 NaN 2009
18 19 18 Daniel J. Sigman MD PC MA NaN 1860.0 12/11/2009 Theft Other Portable Electronic Device, Other, Elect... 1/23/2014 Computer backup tapes containing EPHI for the ... 12/11/2009 NaN 2009
19 20 19 Massachusetts Eye and Ear Infirmary MA NaN 1076.0 11/10/2009 Theft Other 1/23/2014 NaN 11/10/2009 NaN 2009
20 21 20 BlueCross BlueShield Association DC Service Benefits Plan Administrative Services ... 3400.0 10/26/2009 Theft Paper 6/30/2014 The covered entity's (CE) business associate (... 10/26/2009 NaN 2009
21 22 21 BlueCross BlueShield Association DC Merkle Direct Marketing 15000.0 10/7/2009 Theft Paper 4/24/2014 The covered entity's (CE) business associate (... 10/7/2009 NaN 2009
22 23 22 Kaiser Permanente Medical Care Program CA NaN 15500.0 12/1/2009 Theft NaN 1/23/2014 NaN 12/1/2009 NaN 2009
23 24 23 Blue Island Radiology Consultants IL United Micro Data 2562.0 12/9/2009 Theft Other 6/30/2014 The covered entity's (CE's) business associate... 12/9/2009 NaN 2009
24 25 24 Goodwill Industries of Greater Grand Rapids, Inc. MI NaN 10000.0 12/15/2009 Theft Other 1/23/2014 On December 15, 2009, a safe was stolen from G... 12/15/2009 NaN 2009
25 26 25 Children's Medical Center of Dallas TX NaN 3800.0 11/19/2009 Loss Other Portable Electronic Device, Other 1/23/2014 NaN 11/19/2009 NaN 2009
26 27 26 NaN TX NaN NaN 11/19/2009 Theft Laptop 1/23/2014 NaN 11/19/2009 NaN 2009
27 28 27 Ashley and Gray DDS MO NaN 9309.0 1/10/2010 Theft Desktop Computer 1/23/2014 NaN 1/10/2010 NaN 2010
28 29 28 Advocate Health Care IL NaN 812.0 11/24/2009 Theft Laptop 1/23/2014 On November 24, 2009, an Advocate nurse's lapt... 11/24/2009 NaN 2009
29 30 29 The Methodist Hospital TX NaN 689.0 1/18/2010 Theft Other 1/23/2014 An unencrypted laptop computer was stolen from... 1/18/2010 NaN 2010
30 31 30 University of California, San Francisco CA NaN 7300.0 11/30/2009 Theft Laptop 1/23/2014 NaN 11/30/2009 NaN 2009
31 32 31 Carle Clinic Association IL NaN 1300.0 1/13/2010 Theft Other, Paper 1/23/2014 NaN 1/13/2010 NaN 2010
32 33 32 Educators Mutual Insurance Association of Utah UT Health Behavior Innovations (HBI) 5700.0 12/27/2009 Theft Other 1/23/2014 NaN 12/27/2009 NaN 2009
33 34 33 University Medical Center of Southern Nevada NV NaN 5103.0 10/31/2009 Theft Paper 1/23/2014 Between the dates of July 31, 2009 and Novembe... 10/31/2009 NaN 2009
34 35 34 Center for Neurosciences AZ NaN 1100.0 12/15/2009 Theft Laptop 1/23/2014 NaN 12/15/2009 NaN 2009
35 36 35 Brown University RI Blue Cross Blue Shield of RI 528.0 12/11/2009 Other Paper 1/23/2014 On January 5, 2010, BCBSRI was notified that a... 12/11/2009 NaN 2009
36 37 36 MMM Heath Care Inc. PR MSO of Puerto Rico, Inc. NaN 2/4/2010 Theft Paper 6/3/2014 The covered entity's (CE) business associate (... 2/4/2010 NaN 2010
37 38 37 PMC Medicare Choice PR MSO of Puerto Rico 605.0 2/4/2010 Theft Paper 6/3/2014 The covered entity's (CE) business associate (... 2/4/2010 NaN 2010
38 39 38 Cardiology Consultants/Baptist Health Care Cor... FL NaN 8000.0 12/19/2009 Theft Desktop Computer 6/30/2014 A desktop computer that contained the e-PHI of... 12/19/2009 NaN 2009
39 40 39 NaN TN NaN 3900.0 12/23/2009 Theft Paper 6/24/2014 The covered entity (CE) mailed the wrong infor... 12/23/2009 NaN 2009
40 41 40 Lucille Packard Children's Hospital CA NaN 532.0 1/11/2010 Other Desktop Computer 1/23/2014 NaN 1/11/2010 NaN 2010
41 42 41 University of New Mexico Health Sciences Center NM NaN 1900.0 2/8/2010 Other Desktop Computer 1/23/2014 NaN 2/8/2010 NaN 2010
42 43 42 Advanced NeuroSpinal Care CA NaN 3500.0 12/30/2009 Theft Network Server 4/22/2014 A computer containing the electronic protected... 12/30/2009 NaN 2009
43 44 43 Aspen Dental Care P.C. CO NaN NaN 10/4/2009 Theft NaN 6/30/2014 A computer hard drive containing encrypted pat... 10/4/2009 NaN 2009
44 45 44 Shands at UF FL NaN 12580.0 1/27/2010 Theft Laptop 1/23/2014 A laptop containing certain information collec... 1/27/2010 NaN 2010
45 46 45 Wyoming Department of Health WY NaN 9023.0 12/2/2009 Unauthorized Access/Disclosure Network Server 1/23/2014 NaN 12/2/2009 NaN 2009
46 47 46 Thrivent Financial for Lutherans WI NaN 9500.0 1/29/2010 Theft Laptop 1/23/2014 On January 29, 2010, there was a break-in at o... 1/29/2010 NaN 2010
47 48 47 North Carolina Baptist Hospital NC NaN 554.0 2/15/2010 Theft Paper 1/23/2014 NaN 2/15/2010 NaN 2010
48 49 48 Montefiore Medical Center NY NaN 625.0 2/20/2010 Theft NaN 6/3/2014 An unencrypted laptop computer containing the ... 2/20/2010 NaN 2010
49 50 49 Ernest T. Bice, Jr. DDS, P.A. TX NaN 21000.0 2/20/2010 Theft Other Portable Electronic Device, Other 1/23/2014 Three unencrypted external back-up drives were... 2/20/2010 NaN 2010
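The Unnamed: 0 column in the output above is what pandas produces when a CSV's first column has an empty header, which usually means the file was written together with a row index. A minimal sketch, using a small in-memory CSV as a hypothetical stand-in for HW_3_Data.csv, shows how index_col=0 turns that column into the row index at load time instead of dropping it afterwards:

```python
import io

import pandas as pd

# Hypothetical stand-in for HW_3_Data.csv: the first column has an empty header
csv_text = """,Number,State,Individuals_Affected
1,0,TX,1000.0
2,1,MO,
"""

# Default load: the unnamed first column becomes a regular column called 'Unnamed: 0'
raw_df = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses that first column as the row index instead
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(raw_df.columns.tolist())      # ['Unnamed: 0', 'Number', 'State', 'Individuals_Affected']
print(indexed_df.columns.tolist())  # ['Number', 'State', 'Individuals_Affected']
```

In this file the Unnamed: 0 values run from 1 to 50, so loading it this way would give row labels starting at 1 rather than 0.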

B. Find the shape of the data (5 pts)

In [2]:
security_breach_df.shape
Out[2]:
(50, 14)
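The shape attribute is a (rows, columns) tuple, so (50, 14) means 50 breach records and 14 columns. A short sketch on a small hypothetical frame shows that the tuple can be unpacked directly and matches len():

```python
import pandas as pd

# Small hypothetical frame standing in for security_breach_df
df = pd.DataFrame({"State": ["TX", "MO", "AK"], "year": [2009, 2009, 2009]})

# .shape is (number of rows, number of columns)
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")  # 3 rows x 2 columns

# The same numbers are available through len()
assert n_rows == len(df) and n_cols == len(df.columns)
```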

C. Look at the column names of the data and remove any empty or unnecessary columns. (5 pts)

In [3]:
security_breach_df.isnull().sum()
Out[3]:
Unnamed: 0                           0
Number                               0
Name_of_Covered_Entity               4
State                                0
Business_Associate_Involved         41
Individuals_Affected                 5
Date_of_Breach                       0
Type_of_Breach                       0
Location_of_Breached_Information     4
Date_Posted_or_Updated               0
Summary                             19
breach_start                         0
breach_end                          50
year                                 0
dtype: int64
In [4]:
# Drop the leftover index column, the Number column (which duplicates the row index),
# and breach_end, which is null in all 50 rows
security_breach_df = security_breach_df.drop(columns=['Unnamed: 0', 'Number', 'breach_end'])
security_breach_df
Out[4]:
Name_of_Covered_Entity State Business_Associate_Involved Individuals_Affected Date_of_Breach Type_of_Breach Location_of_Breached_Information Date_Posted_or_Updated Summary breach_start year
0 Brooke Army Medical Center TX NaN 1000.0 10/16/2009 Theft Paper 6/30/2014 A binder containing the protected health infor... 10/16/2009 2009
1 Mid America Kidney Stone Association, LLC MO NaN 1000.0 9/22/2009 Theft Network Server 5/30/2014 Five desktop computers containing unencrypted ... 9/22/2009 2009
2 NaN AK NaN NaN 10/12/2009 Theft Other Portable Electronic Device, Other 1/23/2014 NaN 10/12/2009 2009
3 Health Services for Children with Special Need... DC NaN 3800.0 10/9/2009 Loss Laptop 1/23/2014 A laptop was lost by an employee while in tran... 10/9/2009 2009
4 L. Douglas Carlson, M.D. CA NaN 5257.0 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 2009
5 David I. Cohen, MD CA NaN 857.0 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 2009
6 Michele Del Vicario, MD CA NaN 6145.0 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 2009
7 Joseph F. Lopez, MD CA NaN 952.0 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 2009
8 Mark D. Lurie, MD CA NaN 5166.0 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 2009
9 City of Hope National Medical Center CA NaN NaN 9/27/2009 Theft Laptop 1/23/2014 A laptop computer was stolen from a workforce ... 9/27/2009 2009
10 The Children's Hospital of Philadelphia PA NaN 943.0 10/20/2009 Theft NaN 1/23/2014 NaN 10/20/2009 2009
11 Cogent Healthcare of Wisconsin, S.C. TN NaN 6400.0 10/11/2009 Theft Laptop 4/23/2014 A laptop was stolen from a locked office at th... 10/11/2009 2009
12 Universal American NY Democracy Data & Communications, LLC ( 83000.0 11/12/2009 Other Paper 1/23/2014 In its breach report and during the course of ... 11/12/2009 2009
13 Kern Medical Center CA NaN 596.0 10/31/2009 Theft Other 1/23/2014 NaN 10/31/2009 2009
14 Keith W. Mann, DDS, PLLC NC Rick Lawson, Professional Computer Services 2000.0 12/8/2009 Hacking/IT Incident Desktop Computer, Network Server, Electronic M... 1/23/2014 NaN 12/8/2009 2009
15 Detroit Department of Health and Wellness Prom... MI NaN 10000.0 10/22/2009 Theft Other Portable Electronic Device 1/23/2014 NaN 10/22/2009 2009
16 Detroit Department of Health and Wellness Prom... MI NaN 646.0 11/26/2009 Theft Laptop, Desktop Computer 1/23/2014 A desktop and four laptop computers were stole... 11/26/2009 2009
17 NaN CA NaN 610.0 9/22/2009 Other E-mail 1/23/2014 NaN 9/22/2009 2009
18 Daniel J. Sigman MD PC MA NaN 1860.0 12/11/2009 Theft Other Portable Electronic Device, Other, Elect... 1/23/2014 Computer backup tapes containing EPHI for the ... 12/11/2009 2009
19 Massachusetts Eye and Ear Infirmary MA NaN 1076.0 11/10/2009 Theft Other 1/23/2014 NaN 11/10/2009 2009
20 BlueCross BlueShield Association DC Service Benefits Plan Administrative Services ... 3400.0 10/26/2009 Theft Paper 6/30/2014 The covered entity's (CE) business associate (... 10/26/2009 2009
21 BlueCross BlueShield Association DC Merkle Direct Marketing 15000.0 10/7/2009 Theft Paper 4/24/2014 The covered entity's (CE) business associate (... 10/7/2009 2009
22 Kaiser Permanente Medical Care Program CA NaN 15500.0 12/1/2009 Theft NaN 1/23/2014 NaN 12/1/2009 2009
23 Blue Island Radiology Consultants IL United Micro Data 2562.0 12/9/2009 Theft Other 6/30/2014 The covered entity's (CE's) business associate... 12/9/2009 2009
24 Goodwill Industries of Greater Grand Rapids, Inc. MI NaN 10000.0 12/15/2009 Theft Other 1/23/2014 On December 15, 2009, a safe was stolen from G... 12/15/2009 2009
25 Children's Medical Center of Dallas TX NaN 3800.0 11/19/2009 Loss Other Portable Electronic Device, Other 1/23/2014 NaN 11/19/2009 2009
26 NaN TX NaN NaN 11/19/2009 Theft Laptop 1/23/2014 NaN 11/19/2009 2009
27 Ashley and Gray DDS MO NaN 9309.0 1/10/2010 Theft Desktop Computer 1/23/2014 NaN 1/10/2010 2010
28 Advocate Health Care IL NaN 812.0 11/24/2009 Theft Laptop 1/23/2014 On November 24, 2009, an Advocate nurse's lapt... 11/24/2009 2009
29 The Methodist Hospital TX NaN 689.0 1/18/2010 Theft Other 1/23/2014 An unencrypted laptop computer was stolen from... 1/18/2010 2010
30 University of California, San Francisco CA NaN 7300.0 11/30/2009 Theft Laptop 1/23/2014 NaN 11/30/2009 2009
31 Carle Clinic Association IL NaN 1300.0 1/13/2010 Theft Other, Paper 1/23/2014 NaN 1/13/2010 2010
32 Educators Mutual Insurance Association of Utah UT Health Behavior Innovations (HBI) 5700.0 12/27/2009 Theft Other 1/23/2014 NaN 12/27/2009 2009
33 University Medical Center of Southern Nevada NV NaN 5103.0 10/31/2009 Theft Paper 1/23/2014 Between the dates of July 31, 2009 and Novembe... 10/31/2009 2009
34 Center for Neurosciences AZ NaN 1100.0 12/15/2009 Theft Laptop 1/23/2014 NaN 12/15/2009 2009
35 Brown University RI Blue Cross Blue Shield of RI 528.0 12/11/2009 Other Paper 1/23/2014 On January 5, 2010, BCBSRI was notified that a... 12/11/2009 2009
36 MMM Heath Care Inc. PR MSO of Puerto Rico, Inc. NaN 2/4/2010 Theft Paper 6/3/2014 The covered entity's (CE) business associate (... 2/4/2010 2010
37 PMC Medicare Choice PR MSO of Puerto Rico 605.0 2/4/2010 Theft Paper 6/3/2014 The covered entity's (CE) business associate (... 2/4/2010 2010
38 Cardiology Consultants/Baptist Health Care Cor... FL NaN 8000.0 12/19/2009 Theft Desktop Computer 6/30/2014 A desktop computer that contained the e-PHI of... 12/19/2009 2009
39 NaN TN NaN 3900.0 12/23/2009 Theft Paper 6/24/2014 The covered entity (CE) mailed the wrong infor... 12/23/2009 2009
40 Lucille Packard Children's Hospital CA NaN 532.0 1/11/2010 Other Desktop Computer 1/23/2014 NaN 1/11/2010 2010
41 University of New Mexico Health Sciences Center NM NaN 1900.0 2/8/2010 Other Desktop Computer 1/23/2014 NaN 2/8/2010 2010
42 Advanced NeuroSpinal Care CA NaN 3500.0 12/30/2009 Theft Network Server 4/22/2014 A computer containing the electronic protected... 12/30/2009 2009
43 Aspen Dental Care P.C. CO NaN NaN 10/4/2009 Theft NaN 6/30/2014 A computer hard drive containing encrypted pat... 10/4/2009 2009
44 Shands at UF FL NaN 12580.0 1/27/2010 Theft Laptop 1/23/2014 A laptop containing certain information collec... 1/27/2010 2010
45 Wyoming Department of Health WY NaN 9023.0 12/2/2009 Unauthorized Access/Disclosure Network Server 1/23/2014 NaN 12/2/2009 2009
46 Thrivent Financial for Lutherans WI NaN 9500.0 1/29/2010 Theft Laptop 1/23/2014 On January 29, 2010, there was a break-in at o... 1/29/2010 2010
47 North Carolina Baptist Hospital NC NaN 554.0 2/15/2010 Theft Paper 1/23/2014 NaN 2/15/2010 2010
48 Montefiore Medical Center NY NaN 625.0 2/20/2010 Theft NaN 6/3/2014 An unencrypted laptop computer containing the ... 2/20/2010 2010
49 Ernest T. Bice, Jr. DDS, P.A. TX NaN 21000.0 2/20/2010 Theft Other Portable Electronic Device, Other 1/23/2014 Three unencrypted external back-up drives were... 2/20/2010 2010
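The columns above were chosen by reading the null counts in Out[3]: breach_end has 50 nulls in 50 rows. A minimal sketch on a hypothetical toy frame shows how completely empty columns can be found with isnull().all() and removed with dropna(axis=1, how='all'). Identifier columns such as Unnamed: 0 and Number are not empty, so they still have to be dropped by name.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one partially null column, one fully null column
df = pd.DataFrame({
    "State": ["TX", "MO", "AK"],
    "Summary": ["binder stolen", np.nan, "laptop stolen"],
    "breach_end": [np.nan, np.nan, np.nan],  # entirely empty, like breach_end in the data
})

# Columns in which every value is null
empty_cols = df.columns[df.isnull().all()].tolist()
print(empty_cols)  # ['breach_end']

# how="all" removes only columns that are completely null;
# partially null columns such as Summary are kept for the imputation step
cleaned = df.dropna(axis=1, how="all")
print(cleaned.columns.tolist())  # ['State', 'Summary']
```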

D. Produce a statistical summary of the data (5 pts)

In [5]:
security_breach_df.describe(include="all")
Out[5]:
Name_of_Covered_Entity State Business_Associate_Involved Individuals_Affected Date_of_Breach Type_of_Breach Location_of_Breached_Information Date_Posted_or_Updated Summary breach_start year
count 46 50 9 45.000000 50 50 46 50 31 50 50.000000
unique 44 22 9 NaN 38 5 12 8 31 38 NaN
top Detroit Department of Health and Wellness Prom... CA Service Benefits Plan Administrative Services ... NaN 9/27/2009 Theft Paper 1/23/2014 A desktop computer that contained the e-PHI of... 9/27/2009 NaN
freq 2 12 1 NaN 6 41 10 37 1 6 NaN
mean NaN NaN NaN 6336.222222 NaN NaN NaN NaN NaN NaN 2009.240000
std NaN NaN NaN 12623.995819 NaN NaN NaN NaN NaN NaN 0.431419
min NaN NaN NaN 528.000000 NaN NaN NaN NaN NaN NaN 2009.000000
25% NaN NaN NaN 943.000000 NaN NaN NaN NaN NaN NaN 2009.000000
50% NaN NaN NaN 3400.000000 NaN NaN NaN NaN NaN NaN 2009.000000
75% NaN NaN NaN 7300.000000 NaN NaN NaN NaN NaN NaN 2009.000000
max NaN NaN NaN 83000.000000 NaN NaN NaN NaN NaN NaN 2010.000000
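In the summary above, the mean of Individuals_Affected (about 6,336) is nearly twice the median (the 50% row, 3,400), and the maximum is 83,000, so the column is right-skewed by a few large breaches. A sketch with hypothetical toy counts shows the same pattern and how describe() exposes it:

```python
import pandas as pd

# Hypothetical right-skewed counts: one very large breach among small ones
affected = pd.Series([500.0, 900.0, 1000.0, 1200.0, 80000.0])

summary = affected.describe()  # count, mean, std, min, 25%, 50%, 75%, max
print(summary["mean"])  # 16720.0
print(summary["50%"])   # 1000.0

# A positive sample skewness confirms the long right tail
print(affected.skew() > 0)  # True
```

When the mean sits far above the median like this, it is pulled toward the outlier, which is worth keeping in mind when choosing a fill value for this column.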

E. See which columns have null values (5pts)

In [6]:
security_breach_df.isnull().any()
Out[6]:
Name_of_Covered_Entity               True
State                               False
Business_Associate_Involved          True
Individuals_Affected                 True
Date_of_Breach                      False
Type_of_Breach                      False
Location_of_Breached_Information     True
Date_Posted_or_Updated              False
Summary                              True
breach_start                        False
year                                False
dtype: bool
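The Boolean result above can also be turned directly into a list of the column names that contain nulls. A short sketch on a hypothetical two-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with nulls in two of its three columns
df = pd.DataFrame({
    "Name_of_Covered_Entity": ["Brooke Army Medical Center", np.nan],
    "State": ["TX", "AK"],
    "Individuals_Affected": [1000.0, np.nan],
})

# isnull().any() gives one True/False per column; use it to select the column names
null_columns = df.columns[df.isnull().any()].tolist()
print(null_columns)  # ['Name_of_Covered_Entity', 'Individuals_Affected']
```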

D. See how many null values you have total in the data (5pts)

In [7]:
security_breach_df.isnull().sum().sum()
Out[7]:
73
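The total of 73 agrees with the per-column counts in Out[3] once breach_end and its 50 nulls are dropped: 4 + 41 + 5 + 4 + 19 = 73. A sketch on a hypothetical toy frame shows the per-column count, the grand total, and the percentage of each column that is missing, which helps when deciding whether to fill or remove:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with 3 nulls in total
df = pd.DataFrame({
    "Summary": ["text", np.nan, np.nan, "text"],
    "State": ["TX", "MO", "AK", "CA"],
    "Individuals_Affected": [1000.0, np.nan, 3800.0, 5257.0],
})

per_column = df.isnull().sum()      # number of nulls in each column
total = int(per_column.sum())       # same value as df.isnull().sum().sum()
percent = df.isnull().mean() * 100  # share of each column that is missing, in percent

print(total)               # 3
print(percent["Summary"])  # 50.0
```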

E. Using the functions we went over in class, either remove the null values or change them to a non-null value. (5pts)

In [8]:
# Text columns: fill nulls with a descriptive placeholder, then convert any literal 'NaN' strings to the same placeholder
security_breach_df['Name_of_Covered_Entity'] = security_breach_df['Name_of_Covered_Entity'].fillna('No Name')
security_breach_df['Name_of_Covered_Entity'] = security_breach_df['Name_of_Covered_Entity'].replace('NaN', 'No Name')

security_breach_df['Business_Associate_Involved'] = security_breach_df['Business_Associate_Involved'].fillna('No Name')
security_breach_df['Business_Associate_Involved'] = security_breach_df['Business_Associate_Involved'].replace('NaN', 'No Name')

# Numeric column: fill nulls with the column mean.
# Assign the result back; fillna(..., inplace=True) on a selected column is chained assignment,
# which newer pandas versions warn about and, under Copy-on-Write, no longer updates the DataFrame.
security_breach_df['Individuals_Affected'] = security_breach_df['Individuals_Affected'].fillna(security_breach_df['Individuals_Affected'].mean())

security_breach_df['Location_of_Breached_Information'] = security_breach_df['Location_of_Breached_Information'].fillna('No Location')
security_breach_df['Location_of_Breached_Information'] = security_breach_df['Location_of_Breached_Information'].replace('NaN', 'No Location')

security_breach_df['Summary'] = security_breach_df['Summary'].fillna('No Information')
security_breach_df['Summary'] = security_breach_df['Summary'].replace('NaN', 'No Information')

security_breach_df
Out[8]:
Name_of_Covered_Entity State Business_Associate_Involved Individuals_Affected Date_of_Breach Type_of_Breach Location_of_Breached_Information Date_Posted_or_Updated Summary breach_start year
0 Brooke Army Medical Center TX No Name 1000.000000 10/16/2009 Theft Paper 6/30/2014 A binder containing the protected health infor... 10/16/2009 2009
1 Mid America Kidney Stone Association, LLC MO No Name 1000.000000 9/22/2009 Theft Network Server 5/30/2014 Five desktop computers containing unencrypted ... 9/22/2009 2009
2 No Name AK No Name 6336.222222 10/12/2009 Theft Other Portable Electronic Device, Other 1/23/2014 No Information 10/12/2009 2009
3 Health Services for Children with Special Need... DC No Name 3800.000000 10/9/2009 Loss Laptop 1/23/2014 A laptop was lost by an employee while in tran... 10/9/2009 2009
4 L. Douglas Carlson, M.D. CA No Name 5257.000000 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 2009
5 David I. Cohen, MD CA No Name 857.000000 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 2009
6 Michele Del Vicario, MD CA No Name 6145.000000 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 2009
7 Joseph F. Lopez, MD CA No Name 952.000000 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 2009
8 Mark D. Lurie, MD CA No Name 5166.000000 9/27/2009 Theft Desktop Computer 1/23/2014 A shared Computer that was used for backup was... 9/27/2009 2009
9 City of Hope National Medical Center CA No Name 6336.222222 9/27/2009 Theft Laptop 1/23/2014 A laptop computer was stolen from a workforce ... 9/27/2009 2009
10 The Children's Hospital of Philadelphia PA No Name 943.000000 10/20/2009 Theft No Location 1/23/2014 No Information 10/20/2009 2009
11 Cogent Healthcare of Wisconsin, S.C. TN No Name 6400.000000 10/11/2009 Theft Laptop 4/23/2014 A laptop was stolen from a locked office at th... 10/11/2009 2009
12 Universal American NY Democracy Data & Communications, LLC ( 83000.000000 11/12/2009 Other Paper 1/23/2014 In its breach report and during the course of ... 11/12/2009 2009
13 Kern Medical Center CA No Name 596.000000 10/31/2009 Theft Other 1/23/2014 No Information 10/31/2009 2009
14 Keith W. Mann, DDS, PLLC NC Rick Lawson, Professional Computer Services 2000.000000 12/8/2009 Hacking/IT Incident Desktop Computer, Network Server, Electronic M... 1/23/2014 No Information 12/8/2009 2009
15 Detroit Department of Health and Wellness Prom... MI No Name 10000.000000 10/22/2009 Theft Other Portable Electronic Device 1/23/2014 No Information 10/22/2009 2009
16 Detroit Department of Health and Wellness Prom... MI No Name 646.000000 11/26/2009 Theft Laptop, Desktop Computer 1/23/2014 A desktop and four laptop computers were stole... 11/26/2009 2009
17 No Name CA No Name 610.000000 9/22/2009 Other E-mail 1/23/2014 No Information 9/22/2009 2009
18 Daniel J. Sigman MD PC MA No Name 1860.000000 12/11/2009 Theft Other Portable Electronic Device, Other, Elect... 1/23/2014 Computer backup tapes containing EPHI for the ... 12/11/2009 2009
19 Massachusetts Eye and Ear Infirmary MA No Name 1076.000000 11/10/2009 Theft Other 1/23/2014 No Information 11/10/2009 2009
20 BlueCross BlueShield Association DC Service Benefits Plan Administrative Services ... 3400.000000 10/26/2009 Theft Paper 6/30/2014 The covered entity's (CE) business associate (... 10/26/2009 2009
21 BlueCross BlueShield Association DC Merkle Direct Marketing 15000.000000 10/7/2009 Theft Paper 4/24/2014 The covered entity's (CE) business associate (... 10/7/2009 2009
22 Kaiser Permanente Medical Care Program CA No Name 15500.000000 12/1/2009 Theft No Location 1/23/2014 No Information 12/1/2009 2009
23 Blue Island Radiology Consultants IL United Micro Data 2562.000000 12/9/2009 Theft Other 6/30/2014 The covered entity's (CE's) business associate... 12/9/2009 2009
24 Goodwill Industries of Greater Grand Rapids, Inc. MI No Name 10000.000000 12/15/2009 Theft Other 1/23/2014 On December 15, 2009, a safe was stolen from G... 12/15/2009 2009
25 Children's Medical Center of Dallas TX No Name 3800.000000 11/19/2009 Loss Other Portable Electronic Device, Other 1/23/2014 No Information 11/19/2009 2009
26 No Name TX No Name 6336.222222 11/19/2009 Theft Laptop 1/23/2014 No Information 11/19/2009 2009
27 Ashley and Gray DDS MO No Name 9309.000000 1/10/2010 Theft Desktop Computer 1/23/2014 No Information 1/10/2010 2010
28 Advocate Health Care IL No Name 812.000000 11/24/2009 Theft Laptop 1/23/2014 On November 24, 2009, an Advocate nurse's lapt... 11/24/2009 2009
29 The Methodist Hospital TX No Name 689.000000 1/18/2010 Theft Other 1/23/2014 An unencrypted laptop computer was stolen from... 1/18/2010 2010
30 University of California, San Francisco CA No Name 7300.000000 11/30/2009 Theft Laptop 1/23/2014 No Information 11/30/2009 2009
31 Carle Clinic Association IL No Name 1300.000000 1/13/2010 Theft Other, Paper 1/23/2014 No Information 1/13/2010 2010
32 Educators Mutual Insurance Association of Utah UT Health Behavior Innovations (HBI) 5700.000000 12/27/2009 Theft Other 1/23/2014 No Information 12/27/2009 2009
33 University Medical Center of Southern Nevada NV No Name 5103.000000 10/31/2009 Theft Paper 1/23/2014 Between the dates of July 31, 2009 and Novembe... 10/31/2009 2009
34 Center for Neurosciences AZ No Name 1100.000000 12/15/2009 Theft Laptop 1/23/2014 No Information 12/15/2009 2009
35 Brown University RI Blue Cross Blue Shield of RI 528.000000 12/11/2009 Other Paper 1/23/2014 On January 5, 2010, BCBSRI was notified that a... 12/11/2009 2009
36 MMM Heath Care Inc. PR MSO of Puerto Rico, Inc. 6336.222222 2/4/2010 Theft Paper 6/3/2014 The covered entity's (CE) business associate (... 2/4/2010 2010
37 PMC Medicare Choice PR MSO of Puerto Rico 605.000000 2/4/2010 Theft Paper 6/3/2014 The covered entity's (CE) business associate (... 2/4/2010 2010
38 Cardiology Consultants/Baptist Health Care Cor... FL No Name 8000.000000 12/19/2009 Theft Desktop Computer 6/30/2014 A desktop computer that contained the e-PHI of... 12/19/2009 2009
39 No Name TN No Name 3900.000000 12/23/2009 Theft Paper 6/24/2014 The covered entity (CE) mailed the wrong infor... 12/23/2009 2009
40 Lucille Packard Children's Hospital CA No Name 532.000000 1/11/2010 Other Desktop Computer 1/23/2014 No Information 1/11/2010 2010
41 University of New Mexico Health Sciences Center NM No Name 1900.000000 2/8/2010 Other Desktop Computer 1/23/2014 No Information 2/8/2010 2010
42 Advanced NeuroSpinal Care CA No Name 3500.000000 12/30/2009 Theft Network Server 4/22/2014 A computer containing the electronic protected... 12/30/2009 2009
43 Aspen Dental Care P.C. CO No Name 6336.222222 10/4/2009 Theft No Location 6/30/2014 A computer hard drive containing encrypted pat... 10/4/2009 2009
44 Shands at UF FL No Name 12580.000000 1/27/2010 Theft Laptop 1/23/2014 A laptop containing certain information collec... 1/27/2010 2010
45 Wyoming Department of Health WY No Name 9023.000000 12/2/2009 Unauthorized Access/Disclosure Network Server 1/23/2014 No Information 12/2/2009 2009
46 Thrivent Financial for Lutherans WI No Name 9500.000000 1/29/2010 Theft Laptop 1/23/2014 On January 29, 2010, there was a break-in at o... 1/29/2010 2010
47 North Carolina Baptist Hospital NC No Name 554.000000 2/15/2010 Theft Paper 1/23/2014 No Information 2/15/2010 2010
48 Montefiore Medical Center NY No Name 625.000000 2/20/2010 Theft No Location 6/3/2014 An unencrypted laptop computer containing the ... 2/20/2010 2010
49 Ernest T. Bice, Jr. DDS, P.A. TX No Name 21000.000000 2/20/2010 Theft Other Portable Electronic Device, Other 1/23/2014 Three unencrypted external back-up drives were... 2/20/2010 2010
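The per-column fills above can also be expressed as a single fillna() call that takes a dictionary mapping each column to its fill value, followed by a check that no nulls remain. A minimal sketch on a hypothetical two-row frame with the same column names:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame using the same column names as the breach data
df = pd.DataFrame({
    "Name_of_Covered_Entity": ["Kern Medical Center", np.nan],
    "Business_Associate_Involved": [np.nan, "United Micro Data"],
    "Individuals_Affected": [596.0, np.nan],
    "Location_of_Breached_Information": [np.nan, "Laptop"],
    "Summary": [np.nan, "A laptop was stolen"],
})

# One mapping of column -> fill value; the numeric column gets its own mean
fill_values = {
    "Name_of_Covered_Entity": "No Name",
    "Business_Associate_Involved": "No Name",
    "Individuals_Affected": df["Individuals_Affected"].mean(),
    "Location_of_Breached_Information": "No Location",
    "Summary": "No Information",
}
filled = df.fillna(value=fill_values)

# Sanity check: nothing should be left missing
print(filled.isnull().sum().sum())  # 0
```

Running the same isnull().sum().sum() check on security_breach_df after In [8] is a quick way to confirm the cleaning step is complete.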

F. Explain in writing what you did for the nulls in each column and why you did it. Remember to specify which column you are talking about.

The cell below is in markdown format so all you need to do is enter text. (15pts)

For column Name_of_Covered_Entity, I used the function fillna() to replace every null value with the placeholder 'No Name'. I also used the replace() function to convert any literal text values of 'NaN' to 'No Name'. I replaced these values rather than removing the rows because the other columns in those rows (State, Date_of_Breach, Type_of_Breach, and so on) are still useful, and the placeholder makes it explicit that the entity's name was not reported.

For column Business_Associate_Involved, I used the function fillna() to replace every null value with 'No Name', and the replace() function to convert any literal text values of 'NaN' to 'No Name'. This column is null in 41 of the 50 rows, so removing those rows would discard most of the data. A null here likely means that no business associate was listed, and the placeholder records that explicitly.

For column Individuals_Affected, I replaced every null value with the mean of the existing values in the column (about 6,336). Because this column is numeric, a text placeholder would break calculations on it; filling with the mean keeps the column numeric and leaves its overall average unchanged.

For column Location_of_Breached_Information, I used the function fillna() to replace every null value with 'No Location', and the replace() function to convert any literal text values of 'NaN' to 'No Location'. Only 4 rows were affected, and the placeholder keeps those rows available for analysis while showing that the location was not reported.

For column Summary, I used the function fillna() to replace every null value with 'No Information', and the replace() function to convert any literal text values of 'NaN' to 'No Information'. The summary is missing in 19 of the 50 rows, so removing those rows would discard more than a third of the data; the placeholder keeps them and makes the missing description explicit.
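The replace('NaN', ...) calls described above only matter if the text NaN survives loading as a literal string. By default, pd.read_csv already parses the string NaN, and empty fields, as real missing values, so after a default load those replace() calls typically find nothing to change. A sketch with a hypothetical in-memory CSV shows both behaviours, using keep_default_na=False to switch the default parsing off:

```python
import io

import pandas as pd

# Hypothetical CSV: one literal 'NaN', one real summary, one empty field
csv_text = "State,Summary\nTX,NaN\nMO,A laptop was stolen\nAK,\n"

# Default parsing: both 'NaN' and the empty field become missing values
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df["Summary"].isnull().tolist())  # [True, False, True]

# With default NA parsing switched off, 'NaN' stays as text and the empty field stays as ''
text_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(text_df["Summary"].tolist())  # ['NaN', 'A laptop was stolen', '']

# Only in this case does replace() have something to convert
replaced = text_df["Summary"].replace("NaN", "No Information")
print(replaced.tolist())  # ['No Information', 'A laptop was stolen', '']
```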

Question 2: Functions and Formatted Output

Your client wants you to create a function that outputs the Individuals Affected, Date of Breach, and Type of Breach for all rows that contain the word 'laptop' within the Summary column.
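One way to approach this task is sketched below. The function name laptop_breaches and the output format are hypothetical choices for illustration, and the sample rows are made up, loosely based on the table above. It uses str.contains() with case=False so that 'Laptop' also matches, and na=False so that any missing summary counts as a non-match.

```python
import pandas as pd


def laptop_breaches(df: pd.DataFrame):
    """Return one formatted line per row whose Summary mentions 'laptop' (hypothetical helper)."""
    # Case-insensitive substring match; missing summaries are treated as non-matches
    mask = df["Summary"].str.contains("laptop", case=False, na=False)
    lines = []
    for _, row in df[mask].iterrows():
        lines.append(
            f"Individuals Affected: {row['Individuals_Affected']:,.0f} | "
            f"Date of Breach: {row['Date_of_Breach']} | "
            f"Type of Breach: {row['Type_of_Breach']}"
        )
    return lines


# Hypothetical sample rows
sample = pd.DataFrame({
    "Individuals_Affected": [3800.0, 1000.0, 12580.0],
    "Date_of_Breach": ["10/9/2009", "10/16/2009", "1/27/2010"],
    "Type_of_Breach": ["Loss", "Theft", "Theft"],
    "Summary": ["A laptop was lost by an employee", "A binder was stolen", "A Laptop containing records"],
})

for line in laptop_breaches(sample):
    print(line)
# Individuals Affected: 3,800 | Date of Breach: 10/9/2009 | Type of Breach: Loss
# Individuals Affected: 12,580 | Date of Breach: 1/27/2010 | Type of Breach: Theft
```

Returning the lines instead of printing inside the function keeps the formatting testable; the caller decides where the output goes.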

A. Convert the columns you will be using in this section into series data structures. (5 pts)

Name them 'laptop', 'individual_affected', 'type_of_breach', 'breach_date'

In [9]:
laptop = security_breach_df['Summary']
laptop
Out[9]:
0     A binder containing the protected health infor...
1     Five desktop computers containing unencrypted ...
2                                        No Information
3     A laptop was lost by an employee while in tran...
4     A shared Computer that was used for backup was...
5     A shared Computer that was used for backup was...
6     A shared Computer that was used for backup was...
7     A shared Computer that was used for backup was...
8     A shared Computer that was used for backup was...
9     A laptop computer was stolen from a workforce ...
10                                       No Information
11    A laptop was stolen from a locked office at th...
12    In its breach report and during the course of ...
13                                       No Information
14                                       No Information
15                                       No Information
16    A desktop and four laptop computers were stole...
17                                       No Information
18    Computer backup tapes containing EPHI for the ...
19                                       No Information
20    The covered entity's (CE) business associate (...
21    The covered entity's (CE) business associate (...
22                                       No Information
23    The covered entity's (CE's) business associate...
24    On December 15, 2009, a safe was stolen from G...
25                                       No Information
26                                       No Information
27                                       No Information
28    On November 24, 2009, an Advocate nurse's lapt...
29    An unencrypted laptop computer was stolen from...
30                                       No Information
31                                       No Information
32                                       No Information
33    Between the dates of July 31, 2009 and Novembe...
34                                       No Information
35    On January 5, 2010, BCBSRI was notified that a...
36    The covered entity's (CE) business associate (...
37    The covered entity's (CE) business associate (...
38    A desktop computer that contained the e-PHI of...
39    The covered entity (CE) mailed the wrong infor...
40                                       No Information
41                                       No Information
42    A computer containing the electronic protected...
43    A computer hard drive containing encrypted pat...
44    A laptop containing certain information collec...
45                                       No Information
46    On January 29, 2010, there was a break-in at o...
47                                       No Information
48    An unencrypted laptop computer containing the ...
49    Three unencrypted external back-up drives were...
Name: Summary, dtype: object
In [10]:
individual_affected = security_breach_df['Individuals_Affected']
individual_affected
Out[10]:
0      1000.000000
1      1000.000000
2      6336.222222
3      3800.000000
4      5257.000000
5       857.000000
6      6145.000000
7       952.000000
8      5166.000000
9      6336.222222
10      943.000000
11     6400.000000
12    83000.000000
13      596.000000
14     2000.000000
15    10000.000000
16      646.000000
17      610.000000
18     1860.000000
19     1076.000000
20     3400.000000
21    15000.000000
22    15500.000000
23     2562.000000
24    10000.000000
25     3800.000000
26     6336.222222
27     9309.000000
28      812.000000
29      689.000000
30     7300.000000
31     1300.000000
32     5700.000000
33     5103.000000
34     1100.000000
35      528.000000
36     6336.222222
37      605.000000
38     8000.000000
39     3900.000000
40      532.000000
41     1900.000000
42     3500.000000
43     6336.222222
44    12580.000000
45     9023.000000
46     9500.000000
47      554.000000
48      625.000000
49    21000.000000
Name: Individuals_Affected, dtype: float64
In [11]:
type_of_breach = security_breach_df['Type_of_Breach']
type_of_breach
Out[11]:
0                               Theft
1                               Theft
2                               Theft
3                                Loss
4                               Theft
5                               Theft
6                               Theft
7                               Theft
8                               Theft
9                               Theft
10                              Theft
11                              Theft
12                              Other
13                              Theft
14                Hacking/IT Incident
15                              Theft
16                              Theft
17                              Other
18                              Theft
19                              Theft
20                              Theft
21                              Theft
22                              Theft
23                              Theft
24                              Theft
25                               Loss
26                              Theft
27                              Theft
28                              Theft
29                              Theft
30                              Theft
31                              Theft
32                              Theft
33                              Theft
34                              Theft
35                              Other
36                              Theft
37                              Theft
38                              Theft
39                              Theft
40                              Other
41                              Other
42                              Theft
43                              Theft
44                              Theft
45    Unauthorized Access/Disclosure 
46                              Theft
47                              Theft
48                              Theft
49                              Theft
Name: Type_of_Breach, dtype: object
In [12]:
breach_date = security_breach_df['Date_of_Breach']
breach_date
Out[12]:
0     10/16/2009
1      9/22/2009
2     10/12/2009
3      10/9/2009
4      9/27/2009
5      9/27/2009
6      9/27/2009
7      9/27/2009
8      9/27/2009
9      9/27/2009
10    10/20/2009
11    10/11/2009
12    11/12/2009
13    10/31/2009
14     12/8/2009
15    10/22/2009
16    11/26/2009
17     9/22/2009
18    12/11/2009
19    11/10/2009
20    10/26/2009
21     10/7/2009
22     12/1/2009
23     12/9/2009
24    12/15/2009
25    11/19/2009
26    11/19/2009
27     1/10/2010
28    11/24/2009
29     1/18/2010
30    11/30/2009
31     1/13/2010
32    12/27/2009
33    10/31/2009
34    12/15/2009
35    12/11/2009
36      2/4/2010
37      2/4/2010
38    12/19/2009
39    12/23/2009
40     1/11/2010
41      2/8/2010
42    12/30/2009
43     10/4/2009
44     1/27/2010
45     12/2/2009
46     1/29/2010
47     2/15/2010
48     2/20/2010
49     2/20/2010
Name: Date_of_Breach, dtype: object
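Selecting a single column with bracket notation already returns a pandas Series, so the Part A cells need no separate conversion step. Here is a minimal sketch on a small hypothetical DataFrame. `toy_df` and its values are invented for illustration and are not read from HW_3_Data.csv.

```python
import pandas as pd

# Hypothetical stand-in for security_breach_df with two of its columns
toy_df = pd.DataFrame({
    "Summary": ["A laptop was stolen", "No Information"],
    "Individuals_Affected": [3800.0, 1000.0],
})

# Single brackets return a Series named after the column
summary = toy_df["Summary"]
print(type(summary).__name__)  # Series
print(summary.name)            # Summary

# Double brackets would instead return a one-column DataFrame
print(type(toy_df[["Summary"]]).__name__)  # DataFrame
```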

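str.contains is a vectorized pandas string method. Called as Series.str.contains(pattern), it checks every string in the Series for the pattern and returns a Series of booleans: True where the pattern is found and False where it is not. By default, the pattern is treated as a regular expression and the match is case-sensitive.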
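The minimal sketch below uses made-up summaries, not rows from the homework data. It shows how the `case` and `na` keyword arguments change the result, and how `na=False` makes a missing summary come back as False.

```python
import pandas as pd

# Hypothetical summaries, not rows from the homework dataset
summaries = pd.Series([
    "A laptop was lost in transit",
    "Laptop stolen from a car",
    "Five desktop computers stolen",
    None,  # a missing summary
])

# case=True is the default, so "Laptop" does not match "laptop";
# na=False reports the missing summary as False
exact = summaries.str.contains("laptop", na=False)
print(exact.tolist())     # [True, False, False, False]

# case=False makes the match case-insensitive
any_case = summaries.str.contains("laptop", case=False, na=False)
print(any_case.tolist())  # [True, True, False, False]
```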

B. Using str.contains, locate the rows that contain 'laptop' in the Summary column. Save this output to a column called Laptop_Binary. (10 pts)

You can find the pandas documentation for this function below.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html

In [13]:
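Laptop_Binary = security_breach_df['Summary'].str.contains('laptop')
# Part B asks for a column, so also store the flags in the DataFrame
security_breach_df['Laptop_Binary'] = Laptop_Binary
Laptop_Binary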
Out[13]:
0     False
1     False
2     False
3      True
4     False
5     False
6     False
7     False
8     False
9      True
10    False
11     True
12    False
13    False
14    False
15    False
16     True
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28     True
29     True
30    False
31    False
32    False
33    False
34    False
35    False
36    False
37    False
38    False
39    False
40    False
41    False
42    False
43    False
44     True
45    False
46     True
47    False
48     True
49     True
Name: Summary, dtype: bool

B. Create a variable named laptop that is equal to the Laptop_Binary column. Then, create a for loop using an if statement that prints all of the True values from the Laptop_Binary column. (15 pts)

Note: Remember that True and False are Boolean values, so do not put quotes around them.

Your loop should print 10 True values (10 rows of True).

In [14]:
laptop = Laptop_Binary
for i in range(len(laptop)):
    # laptop[i] is already a boolean, so it can be tested directly
    if laptop[i]:
        print(laptop[i])
True
True
True
True
True
True
True
True
True
True
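The loop above uses `laptop[i]`, which looks values up by index label. It therefore works only because the DataFrame still has its default 0..n-1 index. With any other integer index, `laptop[0]` would raise a KeyError. The sketch below shows two label-safe alternatives, iterating with `Series.items()` and boolean indexing. It runs on a hypothetical boolean Series (`flags`) with a non-default index.

```python
import pandas as pd

# Hypothetical flags with a non-default index, e.g. after rows were dropped
flags = pd.Series([False, True, True, False], index=[10, 11, 12, 13])

# items() yields (label, value) pairs, so no positional lookup is needed
true_labels = []
for label, flag in flags.items():
    if flag:  # booleans are tested directly, no == True and no quotes
        print(label, flag)
        true_labels.append(label)

# Boolean indexing selects the True rows without an explicit loop
only_true = flags[flags]
print(len(only_true))  # 2
```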

C. Use .format() to build the output of your for loop from the columns the client specified above. Make sure the output is easy to read and that each output column lines up. (20 pts)

Note: Remember to index each variable passed to .format() with the loop variable (e.g. laptop[i]).

In [15]:
for i in range(len(laptop)):
    if laptop[i]:
        # Fixed-width, left-aligned fields keep the output columns lined up
        print("Individuals Affected: {Individuals_Affected:<8.0f} Date of Breach: {Date_of_Breach:<13s} Type of Breach: {Type_of_Breach:<6s}".format(
            Individuals_Affected=individual_affected[i], Date_of_Breach=breach_date[i], Type_of_Breach=type_of_breach[i]))
Individuals Affected: 3800     Date of Breach: 10/9/2009     Type of Breach: Loss  
Individuals Affected: 6336     Date of Breach: 9/27/2009     Type of Breach: Theft 
Individuals Affected: 6400     Date of Breach: 10/11/2009    Type of Breach: Theft 
Individuals Affected: 646      Date of Breach: 11/26/2009    Type of Breach: Theft 
Individuals Affected: 812      Date of Breach: 11/24/2009    Type of Breach: Theft 
Individuals Affected: 689      Date of Breach: 1/18/2010     Type of Breach: Theft 
Individuals Affected: 12580    Date of Breach: 1/27/2010     Type of Breach: Theft 
Individuals Affected: 9500     Date of Breach: 1/29/2010     Type of Breach: Theft 
Individuals Affected: 625      Date of Breach: 2/20/2010     Type of Breach: Theft 
Individuals Affected: 21000    Date of Breach: 2/20/2010     Type of Breach: Theft 
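The alignment comes from the format specification mini-language. In `<8.0f`, `<` left-aligns the value, `8` is the minimum field width, and `.0f` rounds to zero decimal places. `<13s` does the same for a string in a 13-character field. The sketch below applies the same specs to a few hypothetical rows (`rows` and `template` are illustrative names, not from the notebook):

```python
# Hypothetical rows: (individuals affected, breach date, breach type)
rows = [
    (3800.0, "10/9/2009", "Loss"),
    (21000.0, "2/20/2010", "Theft"),
    (6336.222222, "9/27/2009", "Theft"),
]

template = ("Individuals Affected: {affected:<8.0f} "
            "Date of Breach: {date:<13s} "
            "Type of Breach: {kind:<6s}")

lines = []
for affected, date, kind in rows:
    # :<8.0f  -> left-align in a field 8 characters wide, 0 decimal places
    # :<13s   -> left-align the string in a field 13 characters wide
    lines.append(template.format(affected=affected, date=date, kind=kind))

for line in lines:
    print(line)
```

Widths are minimums, not limits. A longer value is printed in full and pushes everything after it to the right. That is harmless for Type of Breach because it is the last field, but a value such as 'Unauthorized Access/Disclosure ' (row 45) would exceed the `<6s` width.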

D. Using the for loop you created in question C, create a function called laptop_breaches.

This function takes one argument, 'column', which in this case is the Laptop_Binary column.

laptop_breaches will output the three columns outlined by the client above.

Call the function to make sure it works.

In [16]:
def laptop_breaches(column):
    """Print Individuals Affected, Date of Breach and Type of Breach for each row where column is True."""
    for i in range(len(column)):
        if column[i]:
            print("Individuals Affected: {Individuals_Affected:<8.0f} Date of Breach: {Date_of_Breach:<13s} Type of Breach: {Type_of_Breach:<6s}".format(
                Individuals_Affected=individual_affected[i], Date_of_Breach=breach_date[i], Type_of_Breach=type_of_breach[i]))
laptop_breaches(Laptop_Binary)
Individuals Affected: 3800     Date of Breach: 10/9/2009     Type of Breach: Loss  
Individuals Affected: 6336     Date of Breach: 9/27/2009     Type of Breach: Theft 
Individuals Affected: 6400     Date of Breach: 10/11/2009    Type of Breach: Theft 
Individuals Affected: 646      Date of Breach: 11/26/2009    Type of Breach: Theft 
Individuals Affected: 812      Date of Breach: 11/24/2009    Type of Breach: Theft 
Individuals Affected: 689      Date of Breach: 1/18/2010     Type of Breach: Theft 
Individuals Affected: 12580    Date of Breach: 1/27/2010     Type of Breach: Theft 
Individuals Affected: 9500     Date of Breach: 1/29/2010     Type of Breach: Theft 
Individuals Affected: 625      Date of Breach: 2/20/2010     Type of Breach: Theft 
Individuals Affected: 21000    Date of Breach: 2/20/2010     Type of Breach: Theft 
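As written, laptop_breaches reads individual_affected, breach_date and type_of_breach from global variables, so it only works after the Part A cells have run. Below is a hedged sketch of a variant that receives the DataFrame and the boolean mask as arguments. It selects rows with boolean indexing through `DataFrame.loc` and iterates with `itertuples()` instead of a positional loop. The name `laptop_breaches_df` is hypothetical and not part of the assignment.

```python
import pandas as pd

def laptop_breaches_df(df, mask):
    """Hypothetical variant of laptop_breaches: prints the three client
    columns for every row where `mask` is True, using only its arguments."""
    selected = df.loc[mask, ["Individuals_Affected", "Date_of_Breach", "Type_of_Breach"]]
    lines = []
    for row in selected.itertuples(index=False):
        lines.append("Individuals Affected: {:<8.0f} Date of Breach: {:<13s} Type of Breach: {:<6s}".format(
            row.Individuals_Affected, row.Date_of_Breach, row.Type_of_Breach))
    for line in lines:
        print(line)
    return lines

# Hypothetical two-row frame that reuses the homework's column names
toy_df = pd.DataFrame({
    "Summary": ["A laptop was lost", "A binder was stolen"],
    "Individuals_Affected": [3800.0, 1000.0],
    "Date_of_Breach": ["10/9/2009", "10/16/2009"],
    "Type_of_Breach": ["Loss", "Theft"],
})
printed = laptop_breaches_df(toy_df, toy_df["Summary"].str.contains("laptop", na=False))
```

Passing the data in explicitly makes the function independent of cell execution order and reusable on other DataFrames with the same columns.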