02 Filtering & Sorting

hyerimir 2022. 8. 7. 17:09

https://www.datamanim.com/dataset/99_pandas/pandasMain.html

Pandas practice tutorial (DataManim)

Question 20

Load the data.

import pandas as pd

DataUrl = 'https://raw.githubusercontent.com/Datamanim/pandas/main/chipo.csv'
df = pd.read_csv(DataUrl)
Ans = type(df)
Ans

Question 21

Extract the rows where the quantity column equals 3 and print the first 5 rows.

Ans = df.loc[df['quantity']==3].head()
Ans

Question 22

Extract the rows where the quantity column equals 3, reset the index so it starts from 0, and print the first 5 rows.

Ans = df.loc[df['quantity']==3].head().reset_index(drop=True)
Ans

df.reset_index(drop=True) - renumber the index from 0 and discard the old index
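
To see what drop=True changes, here is a minimal sketch on a small hand-made frame. The sample rows are invented for illustration and are not taken from chipo.csv. Without drop=True, the old index labels are kept as a new column named index.

```python
import pandas as pd

# Hypothetical sample rows, not the real chipo.csv data
sample = pd.DataFrame({
    "quantity": [1, 3, 2, 3],
    "item_name": ["Izze", "Chips", "Bowl", "Izze"],
})

filtered = sample.loc[sample["quantity"] == 3]   # keeps the original labels 1 and 3
kept = filtered.reset_index()                    # old labels move into an 'index' column
dropped = filtered.reset_index(drop=True)        # old labels are discarded

print(list(filtered.index))
print(list(kept.columns))
print(list(dropped.index))
```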

Question 23

Define a new DataFrame made up of the two columns quantity and item_price.

Ans = df[['quantity','item_price']]
Ans

Question 24

Remove the dollar sign from the item_price column, convert the values to float, and store them in a new_price column.

df['new_price'] = df['item_price'].str[1:].astype('float')
Ans = df['new_price'].head()
Ans

df['column'].str[1:].astype('float') - drop the first character, then convert to float
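
Note that .str[1:] drops the first character whatever it happens to be. As a rough sketch of an alternative, str.replace with regex=False removes only a literal dollar sign. The prices below are made up for illustration and are assumed to follow the same '$x.xx' text format as item_price.

```python
import pandas as pd

# Hypothetical prices in a '$x.xx' text format (assumption, not real rows)
prices = pd.Series(["$2.39", "$16.98", "$1.09"])

# Approach used in this post: slice off the first character
by_slice = prices.str[1:].astype("float")

# Alternative: remove the literal '$' (regex=False so '$' is not a regex anchor)
by_replace = prices.str.replace("$", "", regex=False).astype("float")

print(by_slice.tolist())
print(by_replace.equals(by_slice))
```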

Question 25

Extract the rows where new_price is 5 or less and count them.

Ans = len(df.loc[df.new_price <=5])
Ans

Question 26

Extract the rows where item_name is Chicken Salad Bowl and reset the index.

Ans = df.loc[df.item_name =='Chicken Salad Bowl'].reset_index(drop=True)
Ans.head(3)

Question 27

Extract the rows where new_price is 9 or less and item_name is Chicken Salad Bowl.

Ans = df.loc[(df.item_name =='Chicken Salad Bowl') & (df.new_price <= 9)]
Ans.head(5)

df.loc[(condition1) & (condition2)]
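
The parentheses around each condition matter because & binds more tightly than comparison operators such as == and <=. A small sketch on invented rows (not the real chipo.csv data) that names each mask first and then combines them with &, | and ~:

```python
import pandas as pd

# Hypothetical sample rows for illustration only
sample = pd.DataFrame({
    "item_name": ["Chicken Salad Bowl", "Chicken Salad Bowl", "Izze"],
    "new_price": [8.75, 11.25, 3.39],
})

is_bowl = sample["item_name"] == "Chicken Salad Bowl"
is_cheap = sample["new_price"] <= 9

both = sample.loc[is_bowl & is_cheap]     # AND: both conditions hold
either = sample.loc[is_bowl | is_cheap]   # OR: at least one condition holds
not_bowl = sample.loc[~is_bowl]           # NOT: condition does not hold

print(len(both), len(either), len(not_bowl))
```

Storing the masks in variables avoids the parentheses problem altogether and keeps longer filters readable.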

Question 28

Sort df in ascending order by the new_price column and reset the index.

Ans = df.sort_values('new_price').reset_index(drop=True)
Ans.head(4)

df.sort_values('column') - sort in ascending order by the given column

Question 29

Print the rows of df whose item_name contains Chips.

Ans = df.loc[df.item_name.str.contains('Chips')]
Ans.head(5)

Question 30

Print a DataFrame containing only the even-numbered columns of df (with 0-based counting: positions 0, 2, 4, ...).

Ans = df.iloc[:,::2]
Ans.head(5)
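
The slice ::2 starts at position 0, so it picks the 1st, 3rd, 5th, ... columns. If "even-numbered" is read with 1-based counting instead (2nd, 4th, ...), the slice would be 1::2. A minimal sketch with placeholder column names (c0 to c4 are hypothetical, not the chipo.csv columns):

```python
import pandas as pd

# One-row frame with placeholder column names
sample = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["c0", "c1", "c2", "c3", "c4"])

even_positions = sample.iloc[:, ::2]    # positions 0, 2, 4
odd_positions = sample.iloc[:, 1::2]    # positions 1, 3

print(list(even_positions.columns))
print(list(odd_positions.columns))
```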

Question 31

Sort df in descending order by the new_price column and reset the index.

Ans = df.sort_values('new_price',ascending=False).reset_index(drop=True)
Ans.head(4)

df.sort_values('column', ascending=False) - sort in descending order by the given column
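
A quick side-by-side of the two sort directions, sketched on made-up rows. sort_values returns a new DataFrame, so the original frame keeps its order unless the result is assigned back.

```python
import pandas as pd

# Hypothetical sample rows for illustration only
sample = pd.DataFrame({
    "item_name": ["A", "B", "C"],
    "new_price": [8.75, 1.09, 11.25],
})

asc = sample.sort_values("new_price").reset_index(drop=True)
desc = sample.sort_values("new_price", ascending=False).reset_index(drop=True)

print(asc["new_price"].tolist())
print(desc["item_name"].tolist())
print(sample["new_price"].tolist())   # original order is untouched
```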

Question 32

Select the rows of df whose item_name is Steak Salad or Bowl.

Ans = df.loc[(df.item_name =='Steak Salad') | (df.item_name =='Bowl')]
Ans

Question 33

Build a DataFrame from the rows of df whose item_name is Steak Salad or Bowl, then remove rows that are duplicated on item_name, keeping only the first occurrence.

Ans = df.loc[(df.item_name =='Steak Salad') | (df.item_name =='Bowl')]
Ans = Ans.drop_duplicates('item_name')
Ans

df.drop_duplicates('column') - drop rows duplicated on the given column, keeping the first occurrence

Question 34

Build a DataFrame from the rows of df whose item_name is Steak Salad or Bowl, then remove rows that are duplicated on item_name, keeping only the last occurrence.

Ans = df.loc[(df.item_name =='Steak Salad') | (df.item_name =='Bowl')]
Ans = Ans.drop_duplicates('item_name',keep='last')
Ans

df.drop_duplicates('column', keep='last')

Drops rows duplicated on the given column, keeping the last occurrence.
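
The difference between the default (keep='first') and keep='last' is easiest to see from which index labels survive. A minimal sketch on invented rows:

```python
import pandas as pd

# Hypothetical sample rows for illustration only
sample = pd.DataFrame({
    "item_name": ["Steak Salad", "Bowl", "Steak Salad", "Bowl"],
    "new_price": [8.99, 7.40, 8.69, 7.40],
})

first = sample.drop_duplicates("item_name")               # default keep='first'
last = sample.drop_duplicates("item_name", keep="last")   # keep the last occurrence

print(first.index.tolist())
print(last.index.tolist())
```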

Question 35

Select the rows of df whose new_price is greater than or equal to the mean of new_price.

Ans = df.loc[df.new_price >= df.new_price.mean()]
Ans.head(5)

Question 36

Change the item_name value Izze to Fizzy Lizzy in df.

df.loc[df.item_name =='Izze','item_name'] = 'Fizzy Lizzy'
Ans = df
Ans.head(3)

Question 37

Count the rows of df whose choice_description is NaN.

Ans = df.choice_description.isnull().sum()
Ans

df.column.isnull().sum() - number of missing values in the given column

Question 38

Replace the NaN values of choice_description in df with NoData (using loc).

df.loc[df.choice_description.isnull(),'choice_description'] ='NoData'
Ans = df
Ans.head()
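
The question asks for loc, but fillna is a common alternative that gives the same result here. A sketch comparing the two on a made-up column (the values are invented, not real chipo.csv rows):

```python
import numpy as np
import pandas as pd

# Hypothetical sample column with one missing value
sample = pd.DataFrame({"choice_description": ["[Clementine]", np.nan, "[Black Beans]"]})

# Approach used in this post: boolean mask plus loc assignment
via_loc = sample.copy()
via_loc.loc[via_loc["choice_description"].isnull(), "choice_description"] = "NoData"

# Alternative: fillna on the column
via_fillna = sample.copy()
via_fillna["choice_description"] = via_fillna["choice_description"].fillna("NoData")

print(via_loc["choice_description"].tolist())
print(via_loc.equals(via_fillna))
```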

Question 39

Select the rows of df whose choice_description contains Black.

Ans = df[df.choice_description.str.contains('Black')]
Ans.head(5)

df.loc[df.column.str.contains('text')] : rows that contain 'text'

Question 40

Print the number of rows of df whose choice_description does not contain Vegetables.

Ans = len(df.loc[~df.choice_description.str.contains('Vegetables')])
Ans

df.loc[~df.column.str.contains('text')] : rows that do not contain 'text'
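
These str.contains filters work here because Question 38 already replaced the NaN values with NoData. If a column still holds missing values, the mask can contain NaN, and depending on the pandas version boolean indexing with it may fail. One option, sketched below on invented rows, is the na parameter of str.contains, which sets the result for missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column that still contains a missing value
sample = pd.DataFrame({
    "choice_description": ["[Black Beans, Rice]", np.nan, "[Fajita Vegetables]"],
})

# na=False: treat missing descriptions as "does not contain"
has_black = sample["choice_description"].str.contains("Black", na=False)
no_veg = ~sample["choice_description"].str.contains("Vegetables", na=False)

print(has_black.tolist())
print(len(sample.loc[no_veg]))
```

With na=False, rows with a missing description are counted among those that do not contain Vegetables, which matches what happens after the NoData replacement.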

Question 41

Extract all rows of df whose item_name starts with N.

Ans = df[df.item_name.str.startswith('N')]
Ans.head(3)

df.loc[df.column.str.startswith('text')]

Question 42

Select the rows of df whose item_name is 15 or more characters long. (The original question says "word count", but str.len() counts characters.)

Ans = df[df.item_name.str.len() >= 15]
Ans.head(3)

df[df.column.str.len() >= 15]
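
Since str.len() returns the number of characters, an actual word count needs a split first. A short sketch of both on a few example names (chosen for illustration; whether they all appear in chipo.csv is not checked here):

```python
import pandas as pd

# Example item names for illustration
names = pd.Series(["Izze", "Chicken Salad Bowl", "Chips and Fresh Tomato Salsa"])

char_counts = names.str.len()                # characters, spaces included
word_counts = names.str.split().str.len()    # whitespace-separated words

print(char_counts.tolist())
print(word_counts.tolist())
```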

Question 43

Build the DataFrame of rows of df whose new_price appears in lst, and print how many rows it has.

lst =[1.69, 2.39, 3.39, 4.45, 9.25, 10.98, 11.75, 16.98]

lst =[1.69, 2.39, 3.39, 4.45, 9.25, 10.98, 11.75, 16.98]
Ans = df.loc[df.new_price.isin(lst)]

display(Ans.head(3))
print(len(Ans))

df.loc[df.column.isin(lst)]
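
isin behaves like a chain of == comparisons joined with |, so a filter such as the Steak Salad or Bowl one in Question 32 could also be written with it. A minimal sketch on invented prices, including the negated form with ~:

```python
import pandas as pd

# Hypothetical sample prices and a shortened lookup list
sample = pd.DataFrame({"new_price": [2.39, 8.75, 16.98, 2.39]})
lst = [1.69, 2.39, 16.98]

matched = sample.loc[sample["new_price"].isin(lst)]        # price is in lst
not_matched = sample.loc[~sample["new_price"].isin(lst)]   # price is not in lst

print(len(matched))
print(not_matched["new_price"].tolist())
```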
