DataQuest 1: Step 1 - Python Introduction
Posted on 09/09/2018, in Data Science, Python. These are my notes on the Data Scientist path on Dataquest. I started them after learning some Python basics from other notes, so I only write down the things that are new to me.
- Mission 1 - Python basics
- Mission 2 - Files and loops
- Mission 80 - Booleans and If statements
- Mission 3 - List operators
- Mission 70 - Dictionaries
- Mission 4 - Introduction to functions
- Mission 28 - Debugging errors
- Mission 9 - Guided project: Explore U.S. Births
- Mission 5 - Modules
- Mission 66 - Classes and Objects
- Mission 7 - Error Handling
- Mission 16 - List Comprehensions
- Challenge 17 - Variable Scopes
- Mission 82 - Regular expressions
- Mission 84 - Dates in Python
- Mission 218 - Guided Project: Exploring Gun Deaths in the US
Mission 1 - Python basics
Mission 2 - Files and loops
- Add an element to a list:
a.append(<element>)
- Open a file:
f = open("test.txt", "r")
where `"r"` opens the file in read mode.
- Read a file's content:
a = f.read()
- Split a string:
a.split("<symbol>")
- Procedure (read a file into a list of lists):

    f = open("<file-name>", "r")
    data = f.read()
    data_split = data.split("\n")
    list_of_list = []
    for row in data_split:
        list_of_list.append(row.split(","))
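The procedure above can be tried end to end without an existing data file. This is a minimal sketch, assuming a small made-up CSV (the file name `sample.csv` and its rows are hypothetical) that is first written to a temporary directory:

```python
import os
import tempfile

# Hypothetical sample data; the real mission uses its own file.
sample_text = "year,month,births\n1994,1,8096\n1994,2,7772"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv")
    with open(path, "w") as f:
        f.write(sample_text)

    # Same steps as the procedure: open the file and read its content.
    with open(path, "r") as f:
        data = f.read()

# Split the content into rows, then split each row on commas.
data_split = data.split("\n")
list_of_list = []
for row in data_split:
    list_of_list.append(row.split(","))

print(list_of_list)
```

Every value comes back as a string, so numbers still need `int()` or `float()` before doing arithmetic.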
Mission 80 - Booleans and If statements
Mission 3 - List operators
Mission 70 - Dictionaries
- A key can be a number.
- Check whether a key is in a dict:
<key> in <dict>
- Create an empty dictionary:
a = {}
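A short sketch of the three points above, using made-up keys and values:

```python
# Hypothetical example data.
counts = {}            # create an empty dictionary
counts[1994] = 3       # a number used as a key
counts["jan"] = 5      # a string key works as well

has_year = 1994 in counts   # `in` looks at the keys
has_value = 3 in counts     # values are not searched by `in`
print(has_year, has_value)
```

Here `has_year` is `True` and `has_value` is `False`, because `in` only checks keys.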
Mission 4 - Introduction to functions
- Structure:

    def <function-name>(<input>):
        <statements>
        return <output>
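As a concrete instance of that structure, here is a small sketch; the function name `count_rows` and its arguments are hypothetical and not taken from the mission:

```python
def count_rows(rows, target):
    """Count how many items in rows are equal to target."""
    total = 0
    for item in rows:
        if item == target:
            total += 1
    return total


n = count_rows(["a", "b", "a"], "a")
print(n)
```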
Mission 28 - Debugging errors
- `SyntaxError`: for example a missing `"`, or writing `de` instead of `def`.
- `IndentationError`: lines in the same block have different indentation.
- See the list of built-in exceptions for more error types.
- Runtime errors
  - `TypeError`: an operation is applied to a variable of the wrong type.
  - Traceback: the block of output showing which lines of code may have caused the error.
  - `ValueError`: for example, converting a non-numeric string to a float.
  - `IndexError`: trying to access an index that is not in a list.
  - `AttributeError`: trying to call a method or attribute on an object that doesn't have it.
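Each runtime error listed above can be triggered on purpose. The sketch below uses a hypothetical helper, `error_name`, that runs a small piece of code and reports which exception it raised:

```python
def error_name(action):
    """Run a zero-argument callable and return the name of the error it raises."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None


names = [
    error_name(lambda: "1" + 1),         # adding a str and an int
    error_name(lambda: float("abc")),    # string that is not a number
    error_name(lambda: [1, 2][5]),       # index outside the list
    error_name(lambda: [1, 2].missing),  # attribute that does not exist
]
print(names)
```

Syntax and indentation errors are not included because Python reports them before any of the code runs.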
Mission 9 - Guided project: Explore U.S. Births
Sample code

    def month_births(input_lstlst):
        births_per_month = {}
        for lst in input_lstlst:
            if lst[1] in births_per_month:
                births_per_month[lst[1]] += lst[4]
            else:
                births_per_month[lst[1]] = lst[4]
        return births_per_month

    cdc_month_births = month_births(cdc_list)
    cdc_month_births
Mission 5 - Modules
A module is a collection of functions and variables that have been bundled together in a single file.
- Read a file into a list with `csv`:

    import csv
    f = open("<data>.csv")
    csvreader = csv.reader(f)
    data = list(csvreader)
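One reason to prefer `csv.reader` over a manual `split(",")` is that it handles quoted fields that contain commas. A minimal sketch, assuming made-up data held in an `io.StringIO` object in place of a real file:

```python
import csv
import io

# Hypothetical content; io.StringIO stands in for open("<data>.csv").
raw = 'name,team\n"Smith, J.",NYG\nLee,DAL\n'
f = io.StringIO(raw)

csvreader = csv.reader(f)
data = list(csvreader)
print(data)
```

The quoted name stays in one field, while `'"Smith, J.",NYG'.split(",")` would break it into two pieces.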
Mission 66 - Classes and Objects
- A class name should be in PascalCase (capitalize the first letter of each word).
- Functions defined inside a class are called methods.
- We have to pass an argument, `self`, to the `__init__()` method.
  - Without `self`, the class wouldn't know where to store the internal data you want to keep.
- An attribute set on `self`, such as `.type`, is stored on the instance. It shares its name with the built-in `type()` function but is separate from it:

    class Dataset:
        def __init__(self):
            self.type = "csv"
- `enumerate`: loops over the indexes of a list together with the value at each index:

    for idx, value in enumerate(['foo', 'bar']):
        print(idx, value)

- `set(<list>)` gives the unique values of the list (it converts the list to a set).
- Python special methods, like `__init__`:
  - When we implement `__init__()`, it tells the Python interpreter that anything within that method is what we want to initialize when we create our object.
  - `__str__`: returns the string that is displayed when the dataset is printed.
- Sample code of the `Dataset` class in this mission:

    class Dataset:
        def __init__(self, data):
            self.header = data[0]
            self.data = data[1:]

        # Add the special method here
        def __str__(self):
            return str(self.data[0:10])

        def column(self, label):
            if label not in self.header:
                return None
            index = 0
            for idx, element in enumerate(self.header):
                if label == element:
                    index = idx
            column = []
            for row in self.data:
                column.append(row[index])
            return column

        def count_unique(self, label):
            unique_results = set(self.column(label))
            count = len(unique_results)
            return count

    nfl_dataset = Dataset(nfl_data)
    print(nfl_dataset)
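To see `self`, `__str__` and `set()` working together on something small, here is a sketch with a hypothetical `Roster` class (not part of the mission) that can be run without the NFL data:

```python
class Roster:
    """Hypothetical class used only for illustration."""

    def __init__(self, names):
        # self is the instance being created; attributes are stored on it.
        self.names = names
        self.type = "list"

    def __str__(self):
        # Called by print() and str().
        return "Roster of " + str(len(self.names)) + " names"

    def count_unique(self):
        # set() drops duplicates, so its length is the unique count.
        return len(set(self.names))


roster = Roster(["foo", "bar", "foo"])
print(roster)                  # uses __str__
print(roster.count_unique())
print(roster.type)             # the attribute set in __init__
print(type(roster).__name__)   # the built-in type() reports the class
```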
Mission 7 - Error Handling
- If you surround the code that causes an error with a try/except block, the error will be handled, and the code will continue to run:

    try:
        int('')
    except Exception:
        print("There was an error")

- When the Python interpreter generates an exception, it actually creates an instance of the `Exception` class (or of one of its subclasses).
- `except Exception as exc:` assigns that instance to the variable `exc`.
- We can use the `pass` keyword in the `except` block to ignore the error (if we don't want to do anything):

    try:
        int('')
    except Exception:
        pass
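The `as exc` form is handy for inspecting what went wrong. A minimal sketch, with a hypothetical helper named `to_int`:

```python
def to_int(text):
    """Convert text to an int, or return None if the conversion fails."""
    try:
        return int(text)
    except Exception as exc:
        # exc is the exception instance created by the interpreter.
        print(type(exc).__name__, "-", exc)
        return None


result_ok = to_int("42")
result_bad = to_int("")   # int('') raises ValueError, which is handled above
print(result_ok, result_bad)
```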
Mission 16 - List Comprehensions
- `enumerate()` allows us to have 2 variables in the body of a `for` loop.
- `enumerate` on a list of lists returns `idx` as the index of each row.
- Using a `for` loop inside a list (a list comprehension):

    animal_lengths = [len(animal) for animal in animals]
    teams = [row[1] for row in nfl_suspensions]

- The `None` object (type `NoneType`): use `var is None` to check for it.
- `is` checks whether two names refer to the same object, whereas `==` compares values, which can lead to mistakes when comparing a value such as `True` with `None`.
- The `<dict>.items()` method allows us to iterate through keys and values at the same time:

    for key, val in <dict>.items():
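The points above can be combined in one short sketch. The `animals` list is made up and includes a `None` entry to show the `is not None` check:

```python
# Hypothetical data with a missing value.
animals = ["dog", "tiger", None, "ox"]

# List comprehension with an `is not None` filter.
animal_lengths = [len(animal) for animal in animals if animal is not None]

# Build a dictionary, then walk its keys and values together.
lengths_by_name = {}
for animal in animals:
    if animal is not None:
        lengths_by_name[animal] = len(animal)

pairs = []
for key, val in lengths_by_name.items():
    pairs.append((key, val))

print(animal_lengths)
print(pairs)
```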
Challenge 17 - Variable Scopes
- Once we overwrite the `sum` variable with a value, we can't access the built-in function anymore.
- A local variable with the same name as a global one doesn't lead to an error!
- If a local variable doesn't exist, Python looks up the global variable with the same name and uses it, but cannot reassign it from inside the function.
- Lookup order: local > global > built-in functions/variables.
- Declare `global <var>` on a separate line, then use the variable afterwards.
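A small sketch of these rules, assuming the code runs at module level (the variable and function names are hypothetical):

```python
total = 10  # global variable


def read_global():
    # No local `total` exists, so the global one is read.
    return total + 1


def shadow_global():
    # Assigning creates a new local variable; the global is untouched.
    total = 0
    return total


def change_global():
    global total  # declared on its own line, then used
    total = 99


a = read_global()
b = shadow_global()
before = total
change_global()
after = total
print(a, b, before, after)
```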
Mission 82 - Regular expressions
- What is it? A sequence of characters that describes a search pattern for strings.
- It works like a wildcard, using `.` as a replacement character.
- Some special cases:
  - `.` stands for any single character, `^a` matches strings that start with `a`, `a$` matches strings that end with `a` (`^` and `$` mark the beginning and the end).
  - `[bcr]at`: any one of the characters within `[]` can fill that position.
  - Use `\` to escape special characters.
  - `"cat|dog"` would match `"catfish"` and `"hotdog"`.
  - `"[0-9]"` will match any character that falls between `0` and `9`.
  - `"[a-z]"`: any lowercase letter.
  - `"[0-9]{4}"`: repeats the pattern `"[0-9]"` four times.
- Use the `re` module (package) for regular expressions.
- `re.search(regex, string)`: checks whether `string` contains a match for `regex` (it returns `None` when there is no match):

    if re.search(r"^[\[\(][Ss]erious[\]\)]", post[0]) is not None:
        serious_start_count += 1

- `re.sub("yo", "hello", "yo world")` gives `"hello world"`.
- `re.findall("[a-z]", "abc123")` would return `["a", "b", "c"]`, because those are the substrings that match the regex.
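The patterns above can be checked quickly in one place. This is a sketch on made-up post titles; the pattern is written as a raw string (`r"..."`) so that the backslashes reach the `re` module unchanged:

```python
import re

# Hypothetical post titles.
posts = [
    "[Serious] How do I learn Python?",
    "(serious) Any tips?",
    "Not serious at all",
]

pattern = r"^[\[\(][Ss]erious[\]\)]"
serious_start_count = 0
for title in posts:
    if re.search(pattern, title) is not None:
        serious_start_count += 1

replaced = re.sub("yo", "hello", "yo world")
letters = re.findall("[a-z]", "abc123")
years = re.findall("[0-9]{4}", "From 1994 to 2003")
either = [w for w in ["catfish", "hotdog", "bird"] if re.search("cat|dog", w)]

print(serious_start_count, replaced, letters, years, either)
```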
Mission 84 - Dates in Python
time module
- `import time`
- The `time` module works with Unix timestamps (counted from the epoch, 1970).
- `time.time()` gives the current timestamp (the number of seconds since the epoch).
- `time.gmtime(<time-stamp>)` gives a human-readable `struct_time` object:
  - `<struct_time>.tm_year` gives the year
  - `<struct_time>.tm_mon` gives the month (1-12)
  - `<struct_time>.tm_mday` gives the day (1-31)
  - `<struct_time>.tm_hour` gives the hour (0-23)
  - `<struct_time>.tm_min` gives the minute (0-59)
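A quick way to see these attributes is to convert timestamp `0`, which is the epoch itself, so the result is known in advance:

```python
import time

current = time.time()   # seconds since the epoch, as a float
st = time.gmtime(0)     # timestamp 0 is 1970-01-01 00:00 UTC

print(current)
print(st.tm_year, st.tm_mon, st.tm_mday, st.tm_hour, st.tm_min)
```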
datetime module
- `import datetime`: we can perform arithmetic on times.
- `datetime` instances appear similar to `struct_time` instances. Attributes: `year`, `month`, `day`, `hour`, `minute`, `second`, `microsecond`.
- `datetime.datetime.utcnow()` gives the current UTC time.
- `datetime.datetime.now()` gives, for example, `datetime.datetime(2018, 9, 9, 7, 45, 36, 986100)`.
- `datetime.timedelta` is used when we want to perform arithmetic, for example `datetime.timedelta(weeks = 1, days = 23)`.
  - Constructor arguments: `weeks`, `days`, `hours`, `minutes`, `seconds`, `milliseconds`, `microseconds`.

    import datetime
    kirks_birthday = datetime.datetime(year = 2233, month = 3, day = 22)
    diff = datetime.timedelta(weeks = 15)
    before_kirk = kirks_birthday - diff

- `datetime.datetime.strftime()` converts a datetime to a human-readable string:

    march3 = datetime.datetime(year = 2010, month = 3, day = 3)
    pretty_march3 = march3.strftime("%b %d, %Y")
    print(pretty_march3)

- `datetime.datetime.strptime()` does the opposite of `strftime`, parsing a string into a datetime:

    march3 = datetime.datetime.strptime("Mar 03, 2010", "%b %d, %Y")

- `datetime.datetime.fromtimestamp()` converts a Unix timestamp to a datetime.
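The pieces above fit together as in the sketch below. It assumes the default English month names for `%b`, and passes `tz=datetime.timezone.utc` to `fromtimestamp()` so the result does not depend on the local time zone:

```python
import datetime

# Arithmetic with timedelta: 15 weeks before the birthday.
kirks_birthday = datetime.datetime(year=2233, month=3, day=22)
before_kirk = kirks_birthday - datetime.timedelta(weeks=15)

# strftime and strptime are inverses for a matching format string.
march3 = datetime.datetime(year=2010, month=3, day=3)
pretty_march3 = march3.strftime("%b %d, %Y")
parsed_march3 = datetime.datetime.strptime(pretty_march3, "%b %d, %Y")

# 86400 seconds after the epoch is exactly one day later, in UTC.
one_day_in = datetime.datetime.fromtimestamp(86400, tz=datetime.timezone.utc)

print(before_kirk, pretty_march3, one_day_in)
```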
Mission 218 - Guided Project: Exploring Gun Deaths in the US
- Initial setup:

    import csv
    with open("guns.csv", "r") as f:
        reader = csv.reader(f)
        data = list(reader)
    print(data[0:5])

- Count how many rows there are for each sex (sample code):

    sexes = [row[5] for row in data]
    sex_counts = {}
    for item in sexes:
        if item in sex_counts:
            sex_counts[item] += 1
        else:
            sex_counts[item] = 1
    print(sex_counts)
- Remember: use `<dict>.items()` for a dictionary and `enumerate(<list>)` for a list in `for key, val in ...`.
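Both loop forms appear in the sketch below. The rows are made up and much smaller than `guns.csv`; `enumerate` is used here to skip the header row, which is an assumption about the data layout:

```python
# Hypothetical rows: a header followed by three records.
data = [["id", "sex"], ["1", "M"], ["2", "F"], ["3", "M"]]

sex_counts = {}
for idx, row in enumerate(data):
    if idx == 0:
        continue  # skip the header row
    item = row[1]
    if item in sex_counts:
        sex_counts[item] += 1
    else:
        sex_counts[item] = 1

summary = []
for key, val in sex_counts.items():
    summary.append(key + ": " + str(val))

print(summary)
```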