IBM Data Course 4: Python for DS
Posted on 03/04/2019, in Data Science. This note was first taken while I was learning the IBM Data Professional Certificate course on Coursera.
Week 1: Python Basics
- Types
  - Types: `int`, `float`, `str`, `bool`
  - Check the type: `type(11)`
  - Boolean: `True`, `False`
  - `int(12.8)` or `int(12.3)` will return `12`
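
A minimal sketch to try the points above in an interpreter. The variable names are hypothetical; the example also shows that `int()` truncates toward zero (it does not round), which is why both `12.8` and `12.3` give `12`.

```python
# Built-in scalar types and how to inspect them
n = 11
x = 12.8
s = "thi"
flag = True
print(type(n), type(x), type(s), type(flag))
# <class 'int'> <class 'float'> <class 'str'> <class 'bool'>

# int() drops the fractional part (truncation toward zero), it does not round
trunc_pos = int(12.8)    # 12
trunc_pos2 = int(12.3)   # 12
trunc_neg = int(-12.8)   # -12, not -13
print(trunc_pos, trunc_pos2, trunc_neg)

# bool is a subclass of int: True behaves like 1 and False like 0
print(True + True)       # 2
```
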
- Expressions and Variables
  - Operators: `+`, `-`, `*`, `/`
  - `/` always returns a `float`; `//` (floor division) returns an `int` when both operands are ints, e.g. `11//5` returns `2`
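
A short check of the two division operators in Python 3. It also uses the remainder operator `%`, which is not in the notes above but pairs naturally with `//` (`11 % 5` is `1`).

```python
# Division operators in Python 3
true_div = 11 / 5         # 2.2, always a float
exact = 10 / 5            # 2.0, still a float even when it divides evenly
floor_div = 11 // 5       # 2, the floor of 2.2
remainder = 11 % 5        # 1, what is left over
float_floor = 11.0 // 5   # 2.0, // gives a float if an operand is a float
neg_floor = -11 // 5      # -3, // rounds toward negative infinity

print(true_div, exact, floor_div, remainder, float_floor, neg_floor)
```
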
- String
  - Think of a string as an ordered sequence of characters: `a = 'dinhanhthi'`
  - Use indexes (starting from 0) to access a character: `a[0]` returns `'d'`
  - Indexes can be negative: `a[-1]` returns `'i'`
  - An index can be a range (a slice): `a[1:3]` returns `'in'` (the end index is excluded)
  - Stride: `a[::2]` returns `'dnahh'`; if `b = '123456'`, then `b[1::2]` returns `'246'`
  - Length of a string: `len(a)` returns `10`
  - Combine two strings with `+`
  - Strings can also be multiplied: `a*3` returns `'dinhanhthidinhanhthidinhanhthi'`
  - Strings are immutable, so `a[0] = 's'` raises a `TypeError`
  - Escape sequences: `\n` is a new line, `\t` is a tab, `\\` is a single backslash `\`
  - Raw strings print everything literally: `print(r'dinh \n anh')` prints `dinh \n anh`
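
The indexing, slicing and immutability rules above, collected into one runnable sketch. The extra names (`joined`, `changed`, `raw`, ...) are hypothetical.

```python
a = 'dinhanhthi'

first = a[0]            # 'd'
last = a[-1]            # 'i'
part = a[1:3]           # 'in' (index 3 is excluded)
every_other = a[::2]    # 'dnahh'
b = '123456'
evens = b[1::2]         # '246'
length = len(a)         # 10
joined = a + '!'        # 'dinhanhthi!'
tripled = a * 3         # 'dinhanhthidinhanhthidinhanhthi'

# Strings are immutable: item assignment raises TypeError
try:
    a[0] = 's'
    immutable = False
except TypeError:
    immutable = True

# To "change" a string, build a new one instead
changed = 's' + a[1:]   # 'sinhanhthi'

backslash = '\\'        # a single backslash character
raw = r'dinh \n anh'    # raw string: backslash and n stay as two characters
print('line 1\nline 2') # \n starts a new line
print('col1\tcol2')     # \t inserts a tab
print(raw)              # dinh \n anh
```
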
- String methods
  - `a.upper()` returns `'DINHANHTHI'`
  - `a.replace('dinh','tran')` returns `'trananhthi'`
  - Find a substring: `a.find('anh')` returns `4` (the index of the first match, or `-1` if there is none)
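
A quick sketch of the string methods above; it also shows that they return new strings and leave `a` unchanged.

```python
a = 'dinhanhthi'

upper = a.upper()                     # 'DINHANHTHI'
replaced = a.replace('dinh', 'tran')  # 'trananhthi'
pos = a.find('anh')                   # 4, index of the first match
missing = a.find('xyz')               # -1 when the substring is absent

# Methods return new strings; the original string is untouched
print(a, upper, replaced, pos, missing)
```
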
Week 2: Python Data Structures
- Tuples
  - Ordered sequence: `a = (1,'thi',3.2,4,5)`
  - `type(a)` returns `tuple`
  - `a[1]` returns `'thi'`, `a[-1]` returns `5`
  - `b = a + (6,'dinh')` gives `(1,'thi',3.2,4,5,6,'dinh')`
  - Slicing works: `a[1:3]` returns `('thi',3.2)`
  - `len(a)` returns `5`
  - Tuples are immutable (we can't change them)
  - If `a = (1,3,2)`, `b = sorted(a)` returns a list `[1,2,3]`
  - If `a = (1,'thi')`, we cannot apply `sorted` (comparing an `int` with a `str` raises a `TypeError`)
  - Tuples can be nested: with `c = (1,2,(3,4),'thi')`, `c[2][1]` returns `4`
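
A runnable sketch of the tuple behaviour above, including the two cases that raise `TypeError`. The flag names (`mutable`, `mixed_ok`) are hypothetical.

```python
a = (1, 'thi', 3.2, 4, 5)

b = a + (6, 'dinh')      # concatenation builds a new tuple
sub = a[1:3]             # ('thi', 3.2)

try:
    a[0] = 100
    mutable = True
except TypeError:
    mutable = False      # tuples cannot be changed in place

ordered = sorted((1, 3, 2))   # [1, 2, 3]; sorted() always returns a list

try:
    sorted((1, 'thi'))
    mixed_ok = True
except TypeError:
    mixed_ok = False     # int and str cannot be compared in Python 3

c = (1, 2, (3, 4), 'thi')
nested = c[2][1]         # 4
```
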
- Lists (each example below starts from `a = ['thi',2,3,4]`)
  - Ordered sequence: `a = ['thi',2,3,4]`
  - Lists are mutable: after `a[1] = 5`, `a` is `['thi',5,3,4]`
  - Nesting: `b = [[1,2],'thi',3.2]`
  - `a + b` returns `['thi',2,3,4,[1,2],'thi',3.2]`
  - After `a.extend([5,6])`, `a` is `['thi',2,3,4,5,6]`
  - After `a.append([5,6])`, `a` is `['thi',2,3,4,[5,6]]` (`append` adds exactly one element)
  - Methods such as `extend` and `append` change the list in place and return `None`
  - Remove an element: after `del(a[0])`, `a` is `[2,3,4]`
  - `'a,b,c,d'.split(",")` returns `['a','b','c','d']`
  - With `d = [1,2]` and `e = d`, both names refer to the same list, so changing `d` in place (e.g. `d[0] = 5`) also changes `e`
  - Clone: `e = d[:]` makes `e` a separate copy of `d`
  - Get help: `help(a)`
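
The list operations above in one sketch, with a separate list for each in-place method so the results do not interfere. The names `extended`, `appended`, `removed` and `f` are hypothetical.

```python
a = ['thi', 2, 3, 4]
b = [[1, 2], 'thi', 3.2]
combined = a + b                  # new list; a and b are unchanged

extended = ['thi', 2, 3, 4]
ret = extended.extend([5, 6])     # adds each element; returns None
appended = ['thi', 2, 3, 4]
appended.append([5, 6])           # adds ONE element, the list [5, 6]

removed = ['thi', 2, 3, 4]
del removed[0]                    # [2, 3, 4]

parts = 'a,b,c,d'.split(',')      # ['a', 'b', 'c', 'd']

# Aliasing vs cloning
d = [1, 2]
e = d          # e is another name for the same list object
f = d[:]       # f is a (shallow) copy
d[0] = 100     # visible through e, not through f
```

Note that `d[:]` is a shallow copy: nested lists inside it are still shared.
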
- Dictionaries
  - A collection of key-value pairs: `a = {'key1':1, 'key2':[1,2]}`
  - `a['key1']` returns `1`
  - Add a new entry: `a['key3'] = (1,2)`
  - Delete an entry: `del(a['key2'])`
  - Check for a key: `'key3' in a` returns `True`
  - `a.keys()` returns the keys, here `dict_keys(['key1','key3'])` (use `list(a.keys())` for a plain list)
  - Similarly, `a.values()` returns the values
  - Keys are unique: with `a = {'key':1, 'key':2}`, `a` is `{'key':2}`
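
A sketch of the dictionary operations above in order. It adds `dict.get`, which is not in the notes, as one way to read a key that may be missing without a `KeyError`; key order follows insertion order in Python 3.7+.

```python
a = {'key1': 1, 'key2': [1, 2]}
value = a['key1']           # 1
a['key3'] = (1, 2)          # add a new entry
del a['key2']               # remove an entry
has_key3 = 'key3' in a      # True; `in` checks keys, not values
keys = list(a.keys())       # ['key1', 'key3'] (insertion order)
values = list(a.values())   # [1, (1, 2)]

dup = {'key': 1, 'key': 2}  # the last duplicate key wins: {'key': 2}
safe = a.get('key2', 'missing')  # default returned because 'key2' was deleted
```
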
- Sets
  - A type of collection: `a = {'thi',2,3}`
  - Sets are unordered
  - Elements are unique: `b = {1,2,1}` gives `{1,2}`
  - `set(<list>)` converts a list to a set; duplicated elements are kept only once
  - Add / remove: `a.add(5)`, `a.remove('thi')`
  - Check membership: `2 in a` returns `True`
  - Intersection of two sets: `c = a & b`
  - Union: `c = a.union(b)`
  - Subset: `a.issubset(b)` checks whether `a` is a subset of `b`
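
The set operations above, run one after another on the same `a` and `b`. The names `from_list`, `inter`, `union` and `is_sub` are hypothetical.

```python
a = {'thi', 2, 3}
b = {1, 2, 1}                  # duplicates collapse: {1, 2}
from_list = set([1, 1, 2, 3])  # {1, 2, 3}

a.add(5)                       # a is now {'thi', 2, 3, 5}
a.remove('thi')                # a is now {2, 3, 5}; KeyError if absent
has_2 = 2 in a                 # True

inter = a & b                  # {2}, same as a.intersection(b)
union = a.union(b)             # {1, 2, 3, 5}, same as a | b
is_sub = {2}.issubset(a)       # True
```
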
Week 3: Python Programming Fundamentals
- If
  - Comparison operators: `==`, `>`, `<`, `>=`, `<=`, `!=`
  - Logic operators: `not(a)`, `a or b`, `a and b`

  ```python
  if condition:
      statements

  if condition:
      statements
  else:
      statements

  if condition:
      statements
  elif other_condition:
      statements
  else:
      statements
  ```
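
A concrete version of the `if` / `elif` / `else` template, combining comparison and logic operators. The helpers `describe`, `is_weekend` and `is_weekday` are hypothetical examples, not from the course.

```python
def describe(x):
    """Classify an integer with if / elif / else."""
    if x == 0:
        return 'zero'
    elif x > 0 and x % 2 == 0:
        return 'positive even'
    elif x > 0:
        return 'positive odd'
    else:
        return 'negative'

def is_weekend(day):
    return day == 'Sat' or day == 'Sun'

def is_weekday(day):
    return not is_weekend(day)

for value in [4, 7, 0, -3]:
    print(value, describe(value))
```

The order of the branches matters: only the first condition that is true runs, so `x == 0` is checked before the sign tests.
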
- For
  - `range(5)` produces `0,1,2,3,4` (in Python 3 it is a `range` object; `list(range(5))` gives `[0,1,2,3,4]`)
  - `range(10,15)` produces `10,11,12,13,14`

  ```python
  for i in range(5):
      statements

  a = ['1','2','3']
  for i in a:
      statements

  for idx, val in enumerate(a):
      statements  # idx: index of an element in a, val: value of that element
  ```
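
A runnable sketch of `range` and of looping with indexes via `enumerate`, which yields `(index, value)` pairs. The list `pairs` is a hypothetical accumulator used to show the result.

```python
# range() produces numbers lazily; wrap it in list() to see them
first_five = list(range(5))       # [0, 1, 2, 3, 4]
ten_to_14 = list(range(10, 15))   # [10, 11, 12, 13, 14]

a = ['1', '2', '3']
pairs = []
for idx, val in enumerate(a):     # enumerate yields (index, value) pairs
    pairs.append((idx, val))
print(pairs)                      # [(0, '1'), (1, '2'), (2, '3')]
```

Writing `for idx, val in a:` directly fails here, because each element of `a` is a one-character string that cannot be unpacked into two names.
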
- While

  ```python
  while condition:
      statements
  ```

- Functions
  - `sorted(a)` returns a new sorted list (even when `a` is a tuple) and leaves `a` unchanged; `a.sort()` sorts the list `a` in place
  - A global variable can be used inside a function, but a local variable defined inside a function cannot be used outside it

  ```python
  def name(variables):
      statements
      return value

  def MJ():
      print("Michael Jackson")

  def NoWork():
      pass  # returns None
  ```
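
A runnable sketch covering a `while` loop, `sorted` vs `sort`, variable scope and a function that returns `None`. All names (`count`, `greet`, `no_work`, ...) are hypothetical.

```python
# while: repeat as long as the condition holds
count = 0
squares = []
while count < 4:
    squares.append(count ** 2)
    count += 1

# sorted() returns a new list; list.sort() sorts in place and returns None
nums = (3, 1, 2)
new_list = sorted(nums)        # [1, 2, 3]; nums is still the tuple (3, 1, 2)
items = [3, 1, 2]
sort_result = items.sort()     # items becomes [1, 2, 3]; sort_result is None

# Scope: a global can be read inside a function,
# but a local variable does not exist outside it
greeting = 'hello'

def greet(name):
    message = greeting + ', ' + name   # reads the global `greeting`
    return message

result = greet('thi')
try:
    message
    local_visible = True
except NameError:
    local_visible = False      # `message` only lived inside greet()

def no_work():
    pass

nothing = no_work()            # a function without return gives None
```
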
- Objects and Classes
  - An object has
    - a type
    - an internal data representation (a blueprint)
    - methods
  - An object is an instance of a particular type
  - `type(<object>)` finds the type of an object
  - A class defines a type; its methods are all the functions that the class/type provides
  - Use `dir(<nameOfObject>)` to list all methods inside a class/object

  ```python
  class Circle(object):
      def __init__(self, radius, color):
          self.radius = radius
          self.color = color

      def add_radius(self, r):
          self.radius = self.radius + r

  RedCircle = Circle(10, 'red')
  RedCircle.radius         # 10
  RedCircle.add_radius(8)  # returns None; RedCircle.radius is now 18
  ```
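
A small sketch of `type()` and `dir()` in action, and of the fact that each instance keeps its own data. `Point` is a hypothetical minimal class used only for illustration.

```python
class Point(object):
    """Hypothetical minimal class: two attributes and one method."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def shift(self, dx):
        self.x = self.x + dx

p = Point(1, 2)
q = Point(10, 20)
p.shift(5)                              # changes p only; q keeps its own data

list_type = type([1, 2])                # <class 'list'>
point_type = type(p)                    # the Point class
has_append = 'append' in dir([1, 2])    # True: dir() lists a list's methods
has_shift = 'shift' in dir(p)           # True
print(p.x, q.x, list_type, has_append, has_shift)
```
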
Week 4: Working with Data in Python
- Reading a file with open
  - `file1 = open(<path>,'r')`, where the mode is `'w'` (writing), `'r'` (reading) or `'a'` (appending); `file1` is a file object
  - `file1.name` gives the name of the file; `file1.mode` gives the mode (`'r'`)
  - Close the file: `file1.close()`; `file1.closed` checks whether the file is closed
  - Use `with` to close the file automatically after using it:

    ```python
    with open("filename.txt", "r") as file1:
        file_stuff = file1.read()
        print(file_stuff)
    ```

  - `file1.readlines()` returns a list with one element per line of the file
  - `file1.readline()` returns the next line each time it is called
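
A self-contained sketch of the reading methods above. It first writes a hypothetical three-line sample file into a temporary folder (created with `tempfile`) so there is something to read.

```python
import os
import tempfile

# Create a small sample file to read (hypothetical content)
path = os.path.join(tempfile.mkdtemp(), 'filename.txt')
with open(path, 'w') as f:
    f.write('line 1\nline 2\nline 3\n')

with open(path, 'r') as file1:
    file_stuff = file1.read()      # the whole file as one string
    mode = file1.mode              # 'r'
closed_after = file1.closed        # True: `with` closed the file

with open(path, 'r') as file1:
    lines = file1.readlines()      # ['line 1\n', 'line 2\n', 'line 3\n']

with open(path, 'r') as file1:
    first = file1.readline()       # 'line 1\n'
    second = file1.readline()      # 'line 2\n'
```
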
- Writing files with open
  - `file1 = open(<path>,'w')`

    ```python
    with open("filename.txt", "w") as file1:
        file1.write("line 1\n")  # write() does not add a newline by itself
    ```

  - Use mode `"a"` to append new lines to the end of the file:

    ```python
    with open("filename.txt", "a") as file1:
        file1.write("line 2\n")
    ```

  - Copy a file line by line:

    ```python
    with open("filename1.txt", "r") as file1:
        with open("filename2.txt", "w") as file2:
            for line in file1:
                file2.write(line)
    ```
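
An end-to-end sketch of write, append and copy, using a temporary folder so it does not touch real files. The paths are hypothetical.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
src = os.path.join(folder, 'filename1.txt')
dst = os.path.join(folder, 'filename2.txt')

# 'w' creates the file (or empties an existing one)
with open(src, 'w') as file1:
    file1.write('line 1\n')

# 'a' adds to the end of the existing content
with open(src, 'a') as file1:
    file1.write('line 2\n')

# Copy line by line
with open(src, 'r') as file1:
    with open(dst, 'w') as file2:
        for line in file1:
            file2.write(line)

with open(dst, 'r') as f:
    copied = f.read()      # 'line 1\nline 2\n'
```

Opening the source with `'w'` a second time would have erased `line 1`; that is why the second write uses `'a'`.
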
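- Loading data with Pandas
  - `import pandas as pd`, then `df = pd.read_csv(<csv_path>)`
  - `df.head()` returns the first 5 rows of the dataframe
  - `pd.read_excel(<excel_path>)` reads an Excel file
  - Create a df by hand from a dictionary with `pd.DataFrame(<dict>)`: each key is a column name and each value is a list holding that column's values
  - `x = df[['length']]` creates a new df consisting of 1 column
  - Or with multiple columns: `x = df[['col1','col2','col3']]`
  - The older `df.ix` indexer was removed in pandas 1.0; use `iloc` (by position) or `loc` (by label) instead:
    - `df.iloc[0,0]`: the element at row 0 and column 0
    - `df.loc[1,'artist']`: the row with label 1 (the second row with the default index), column `'artist'`
    - `df.iloc[<row_start>:<row_end>, <col_start>:<col_end>]` and `df.loc[1:3,'artist':'released']` select blocks (`loc` label slices include both ends)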
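
A sketch of building a DataFrame from a dictionary and selecting from it with `[[...]]`, `iloc` and `loc`. The three albums are sample data chosen for illustration, not the course dataset.

```python
import pandas as pd

# Sample data: each key is a column name, each list holds that column's values
df = pd.DataFrame({
    'artist': ['Michael Jackson', 'AC/DC', 'Pink Floyd'],
    'album': ['Thriller', 'Back in Black', 'The Dark Side of the Moon'],
    'released': [1982, 1980, 1973],
})

one_col = df[['released']]            # DataFrame with a single column
two_cols = df[['artist', 'released']]
one_series = df['released']           # single brackets give a Series

by_position = df.iloc[0, 0]           # 'Michael Jackson' (row 0, column 0)
by_label = df.loc[1, 'artist']        # 'AC/DC' (row label 1, column 'artist')
block = df.loc[0:1, 'artist':'album'] # label slices include both ends: 2 x 2
```
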
- Working with and saving data in Pandas
  - `df['released'].unique()` returns an array of the unique values in the `released` column
  - `df[df['released']>=1980]` returns a df with only the rows whose `released` value is >= 1980
  - Save a df: `df1.to_csv(<filename.csv>)`
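
A sketch of `unique`, boolean filtering and saving to CSV on small hypothetical data, written to a temporary folder and read back to check the result. `index=False` is an optional `to_csv` argument that skips writing the row index.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({
    'album': ['A', 'B', 'C', 'D'],       # hypothetical sample data
    'released': [1973, 1980, 1982, 1980],
})

years = df['released'].unique()          # NumPy array, in order of appearance
df1 = df[df['released'] >= 1980]         # the boolean mask keeps matching rows

out_path = os.path.join(tempfile.mkdtemp(), 'new_songs.csv')
df1.to_csv(out_path, index=False)        # index=False: no row-index column
reloaded = pd.read_csv(out_path)
```
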
- One-dimensional Numpy
  - `import numpy as np`, then `a = np.array([0,1,2,3,4])`
  - `a[2]` returns `2`
  - `type(a)` returns `numpy.ndarray`
  - `a.dtype` returns `dtype('int64')` (platform-dependent; this is typical on 64-bit Linux/macOS)
  - `a.size` returns the number of elements in the array
  - `a.ndim` returns the number of dimensions of the array (`1`)
  - `a.shape` returns the shape of the array, `(5,)`
  - Change the value of an element: `a[0] = 100`
  - Slicing: `d = a[1:3]` gives `array([1,2])`
  - `a + b` adds elementwise
  - `a*2` multiplies each element by 2
  - `a*b` multiplies elementwise
  - `np.dot(a,b)`: dot product
  - `a+1` adds 1 to each element
  - `a.mean()`: the mean of all elements in the array
  - `a.max()`: the largest value in the array
  - `np.pi`, `np.sin()`
  - `np.linspace(-2,2,num=5)` creates an array of 5 evenly spaced elements from -2 to 2
  - Plot: `import matplotlib.pyplot as plt`, then `plt.plot(x,y)`, where `x` and `y` are 2 arrays
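
A runnable sketch of the 1D NumPy operations above. The second array `b` (all ones) is a hypothetical choice that keeps the arithmetic easy to follow.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([1, 1, 1, 1, 1])

a[0] = 100                # arrays are mutable: a is now [100, 1, 2, 3, 4]
d = a[1:3]                # array([1, 2]); a basic slice is a view on a's data

elementwise_sum = a + b   # [101, 2, 3, 4, 5]
scaled = a * 2            # [200, 2, 4, 6, 8]
product = a * b           # elementwise: [100, 1, 2, 3, 4]
dot = np.dot(a, b)        # 100 + 1 + 2 + 3 + 4 = 110
shifted = a + 1           # the scalar is broadcast: [101, 2, 3, 4, 5]
mean = a.mean()           # 110 / 5 = 22.0
largest = a.max()         # 100

x = np.linspace(-2, 2, num=5)   # [-2., -1., 0., 1., 2.]
y = np.sin(x)                   # ready to pass to plt.plot(x, y)
```

Because `d` is a view, assigning to `d[0]` would also change `a[1]`; use `a[1:3].copy()` when an independent array is needed.
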
- Two-dimensional Numpy
  - `a = [[1,2],[3,4]]`, then cast the list: `A = np.array(a)`
  - `A.ndim` returns `2`, the number of axes (levels of nested lists)
  - `A.shape` returns `(2,2)`
  - `A.size` returns `4`
  - `A[0][1]` returns `2`; `A[0:2,1]` returns `array([2,4])`
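
A sketch of 2D indexing, plus the difference between elementwise `*` and `np.dot` on matrices. The identity matrix `B` is a hypothetical second operand.

```python
import numpy as np

a = [[1, 2], [3, 4]]
A = np.array(a)

ndim = A.ndim        # 2
shape = A.shape      # (2, 2)
size = A.size        # 4
chained = A[0][1]    # 2: row 0, then element 1
single = A[0, 1]     # 2: the same element with one index tuple
column = A[0:2, 1]   # array([2, 4]): rows 0-1 of column 1

B = np.array([[1, 0], [0, 1]])   # identity matrix
elementwise = A * B              # [[1, 0], [0, 4]]
matrix_product = np.dot(A, B)    # equals A, since B is the identity
```
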
Week 5: Fake Album Cover Game
- Web scraping or harvesting data involves extracting data from websites.
- Display an image in a Python notebook: `from IPython.display import Image as IPythonImage`, then `IPythonImage(filename='sample-out.png')`
- Random Wikipedia article: https://en.wikipedia.org/wiki/Special:Random
```python
import requests

wikipedia_link = 'https://en.wikipedia.org/wiki/Special:Random'
raw_random_wikipedia_page = requests.get(wikipedia_link)
page = raw_random_wikipedia_page.text  # page source as a string "page"
print(page)
```
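
The request above needs network access. As an offline sketch of the next scraping step, extracting data from `page`, the code below pulls the text of the `<title>` tag out of a hard-coded HTML string using the standard library's `html.parser`. The sample HTML and the `TitleParser` class are hypothetical, and the course may use a different tool for this step.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text inside <title> ... </title> (hypothetical helper)."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

# Stand-in for `page`, the page source returned by requests
page = ('<html><head><title>Example article - Wikipedia</title></head>'
        '<body><p>Text</p></body></html>')

parser = TitleParser()
parser.feed(page)
print(parser.title)   # Example article - Wikipedia
```
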