
[Practical] Big Data Analysis Engineer: Task Type 3 Summary (Python)

by 💾고구마맛탕먹고싶다, December 4, 2023

๋น…๋ฐ์ดํ„ฐ ๋ถ„์„ ๊ธฐ์‚ฌ ์‹ค๊ธฐ -  ์ž‘์—…ํ˜• 3 ์ •๋ฆฌ๋ณธ

๐Ÿšจ ๋ชจ๋“  ์ฝ”๋“œ๋Š” ํŒŒ์ด์ฌ ๊ธฐ์ค€์ž…๋‹ˆ๋‹ค.


There is very little information out there about Task Type 3. I am writing this summary after taking the 7th practical exam, but I still do not feel confident about it.

As a baseline, if you learn every problem type below and can adapt them a little, you can probably aim for partial credit.

 

1. T-test

1) Paired samples

from scipy import stats

# H1: the mean of (bp_post - bp_pre) is less than 0, i.e. blood pressure went down
s, p = stats.ttest_rel(data['bp_post'], data['bp_pre'], alternative="less")
if p > 0.05:
    result4 = 't'   # p > 0.05: fail to reject H0
else:
    result4 = 'f'   # p <= 0.05: reject H0, accept H1
# Here result4 came out as 'f', which means the p-value is below the significance level (0.05).
# In other words, the experiment succeeded: the alternative hypothesis is accepted and the null hypothesis is rejected.
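A paired t-test is the same thing as a one-sample t-test on the per-subject differences tested against 0. The sketch below checks that equivalence on a small made-up set of before/after readings; the numbers are invented for illustration and only the names `bp_pre`/`bp_post` mirror the columns above.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after blood pressure readings for 8 subjects (illustrative only)
bp_pre = np.array([150, 142, 160, 155, 148, 152, 158, 145])
bp_post = np.array([140, 138, 150, 149, 145, 141, 150, 139])

# Paired test, H1: mean(bp_post - bp_pre) < 0
s_rel, p_rel = stats.ttest_rel(bp_post, bp_pre, alternative="less")

# The same test written as a one-sample t-test on the differences against 0
diff = bp_post - bp_pre
s_one, p_one = stats.ttest_1samp(diff, 0, alternative="less")

# Same t/f labelling as above
result4 = 't' if p_rel > 0.05 else 'f'
print(s_rel, p_rel, result4)
```

Note the argument order: `ttest_rel(a, b, alternative="less")` tests whether the mean of `a - b` is below 0, so swapping the columns would flip the direction of the hypothesis.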


2) Independent samples

from scipy import stats
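s, p = stats.ttest_ind(group1, group2)   # two-sided; assumes equal variances by default (equal_var=True)

# Using the same t/f labels as above (t: p > 0.05, f: p <= 0.05):
# If the result is t, the p-value is above the significance level: the experiment failed and the null hypothesis is retained.
# If the result is f, the p-value is below the significance level: the experiment succeeded and the alternative hypothesis is accepted.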
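To see what `ttest_ind` computes with its default `equal_var=True`, the sketch below recomputes the pooled-variance t statistic and its two-sided p-value by hand on two made-up groups (the data are hypothetical). If the two groups clearly have different variances, passing `equal_var=False` switches SciPy to Welch's t-test instead.

```python
import numpy as np
from scipy import stats

# Two hypothetical groups (illustrative numbers only)
group1 = np.array([23.1, 25.4, 22.8, 24.9, 26.0, 23.7])
group2 = np.array([27.2, 26.8, 28.1, 25.9, 27.5, 28.4])

s, p = stats.ttest_ind(group1, group2)  # pooled variance, two-sided

# Same statistic by hand: pooled-variance (Student) t-test
n1, n2 = len(group1), len(group2)
sp2 = ((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (group1.mean() - group2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)  # two-sided p-value

result = 't' if p > 0.05 else 'f'
print(s, p, result)
```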


3) One sample

from scipy import stats
mu = 75
s, p = stats.ttest_1samp(scores, mu, alternative='greater')

# alternative='greater': tests whether the mean of scores is greater than 75 (H1: mean > 75)
# alternative='less' would test whether it is less than 75
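The one-sample statistic is t = (sample mean - mu) / (s / sqrt(n)) with n - 1 degrees of freedom, and `alternative='greater'` uses only the upper tail. A minimal check on hypothetical scores (the data are made up):

```python
import numpy as np
from scipy import stats

# Hypothetical exam scores (illustrative only)
scores = np.array([78, 82, 74, 90, 85, 77, 80, 88, 73, 84])
mu = 75

s, p = stats.ttest_1samp(scores, mu, alternative='greater')

# By hand: t = (sample mean - mu) / (sample std / sqrt(n)), df = n - 1
n = len(scores)
t_manual = (scores.mean() - mu) / (scores.std(ddof=1) / np.sqrt(n))
p_manual = stats.t.sf(t_manual, df=n - 1)  # upper tail only, matching 'greater'

print(s, p)
```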


2. One-way ANOVA (F-test)

s, p = stats.f_oneway(groupA, groupB, groupC)   # H0: all group means are equal
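The F statistic from `f_oneway` is the between-group mean square divided by the within-group mean square, with (k - 1, N - k) degrees of freedom. The sketch below reproduces it by hand for three hypothetical groups (made-up data).

```python
import numpy as np
from scipy import stats

# Three hypothetical groups (illustrative only)
groupA = np.array([85, 88, 90, 86, 87])
groupB = np.array([78, 80, 82, 79, 81])
groupC = np.array([90, 92, 94, 91, 93])

s, p = stats.f_oneway(groupA, groupB, groupC)

# By hand: F = (SS_between / (k - 1)) / (SS_within / (N - k))
groups = [groupA, groupB, groupC]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n_total = len(groups), len(all_values)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

print(s, p)
```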


3. Shapiro-Wilk test

# Use the Shapiro-Wilk test to check whether the data follow a normal distribution.
# Null hypothesis (H0): the data follow a normal distribution.
# Alternative hypothesis (H1): the data do not follow a normal distribution.
s, p = stats.shapiro(data)
# Example output (statistic, p-value, result): 0.9768090844154358 0.9676500558853149 T
# The result is T (p > 0.05), so the experiment failed: we fail to reject H0, i.e. the null hypothesis is retained and the data can be treated as normally distributed.
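The T/F decision above can be wrapped in a small helper. `shapiro_decision` is a hypothetical name used only for this sketch, and the test data are deliberately skewed (exponential of evenly spaced points) so that normality should be rejected.

```python
import numpy as np
from scipy import stats

def shapiro_decision(data, alpha=0.05):
    """Hypothetical helper: return (statistic, p-value, 'T'/'F').
    'T' means H0 (normality) is not rejected at level alpha."""
    s, p = stats.shapiro(data)
    return s, p, 'T' if p > alpha else 'F'

# Strongly right-skewed, made-up data
skewed = np.exp(np.linspace(0, 5, 200))
s, p, result = shapiro_decision(skewed)
print(s, p, result)
```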

 

 

4. Logistic regression ⭐️ (asked in Task Type 3 of the 7th exam)

# C(Pclass): C() marks Pclass as a categorical variable, so each class (1, 2, 3) gets its own dummy term in the model (class 1 is the reference level).
# The goal of the logistic regression model is to test statistically how these variables affect the dependent variable Survived.
formula = "Survived ~ C(Pclass) + Gender + SibSp + Parch"

from statsmodels.formula.api import logit
model = logit(formula, data=df).fit()
print(model.params)
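Since the Titanic data are not included here, the sketch below builds a small random Titanic-like DataFrame (entirely synthetic, with made-up coefficients) just to show what the fitted model exposes. `model.params` are on the log-odds scale, `np.exp(model.params)` turns them into odds ratios, and `model.pvalues` gives each coefficient's p-value. With a string `Gender` column, the term name comes out as `Gender[T.male]` because `female` is the alphabetical reference level.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import logit

# Synthetic Titanic-like data (randomly generated, NOT the real dataset)
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Gender": rng.choice(["male", "female"], n),
    "SibSp": rng.integers(0, 4, n),
    "Parch": rng.integers(0, 3, n),
})
# Survival drawn from an assumed linear predictor (made-up coefficients)
eta = 1.0 - 0.8 * (df["Pclass"] - 1) - 1.5 * (df["Gender"] == "male") - 0.2 * df["SibSp"]
df["Survived"] = (rng.random(n) < 1 / (1 + np.exp(-eta))).astype(int)

formula = "Survived ~ C(Pclass) + Gender + SibSp + Parch"
model = logit(formula, data=df).fit(disp=0)  # disp=0 silences the optimizer log

print(model.params)          # coefficients (log-odds scale)
print(np.exp(model.params))  # odds ratios
print(model.pvalues)         # p-value of each coefficient
```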


5. Chi-square test

# 1) Basic example: goodness-of-fit test (observed vs. expected frequencies)
from scipy.stats import chisquare
s, p = chisquare(f_obs=observed_frequencies, f_exp=expected_frequencies)

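# 2) Test of independence on a contingency table (df: observed counts, rows x columns)
from scipy import stats
result = stats.chi2_contingency(df)
s = result.statistic                    # chi-square statistic
p = result.pvalue                       # p-value
dof = result.dof                        # degrees of freedom
expected = result.expected_freq[0][0]   # expected frequency of the cell at row 0, column 0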
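Both statistics follow the same formula, the sum of (observed - expected)^2 / expected, and for a contingency table each expected count is row total × column total / grand total. The sketch below checks both on made-up counts. It unpacks `chi2_contingency` as a tuple, which also works on older SciPy versions that lack the `.statistic` attribute. Note that for a 2×2 table (dof = 1) SciPy applies Yates' continuity correction by default (`correction=True`), so a hand-computed statistic would not match there.

```python
import numpy as np
from scipy import stats

# 1) Goodness of fit: hypothetical counts from 60 die rolls vs. a fair die
observed_frequencies = np.array([8, 12, 9, 11, 6, 14])
expected_frequencies = np.full(6, 10.0)  # 60 rolls / 6 faces; the totals must match
s, p = stats.chisquare(f_obs=observed_frequencies, f_exp=expected_frequencies)
s_manual = ((observed_frequencies - expected_frequencies) ** 2 / expected_frequencies).sum()

# 2) Independence: hypothetical 2 x 3 table (rows: group, columns: answer)
table = np.array([[20, 30, 50],
                  [30, 30, 40]])
chi2, p2, dof, expected = stats.chi2_contingency(table)
# Expected count of each cell = row total * column total / grand total
expected_manual = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()

print(s, p)
print(chi2, p2, dof, expected[0][0])
```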


6. Poisson distribution

# cust: the mean number of events (lambda), e.g. the average number of customers buying a magazine
result = stats.poisson.pmf(5, cust) * 100            # P(X = 5), as a percentage
result2 = (1 - stats.poisson.cdf(1, cust)) * 100     # P(X >= 2), as a percentage
# The cumulative distribution function (CDF) of the Poisson distribution gives the probability of a value less than or equal to a given value.
# So 1 - stats.poisson.cdf(1, cust) is the probability that 2 or more customers buy a magazine.


7. Bernoulli and binomial distributions

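# [Binomial] Using the success probability computed in question 1,
# compute the probability of exactly 60 successes in 100 trials.
total = len(df)                   # number of rows (observed trials) in the data
n = 100                           # number of trials
t = 60                            # number of successes
success = df['Success'].sum()     # number of successes (Success is a 0/1 column)
result = success / total          # success probability p from question 1
result2 = stats.binom.pmf(t, n, result)   # P(X = 60) for X ~ Binomial(100, p)
# stats.binom.pmf(t, n, result) computes the binomial probability mass function (PMF):
# the probability of exactly t successes in n trials with success probability result.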
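The binomial PMF is C(n, t) · p^t · (1 - p)^(n - t), and a Bernoulli trial is the n = 1 special case. The sketch below checks `binom.pmf` against the formula using a hypothetical success probability of 0.55 in place of `result` from the data.

```python
from math import comb, isclose
from scipy import stats

n, t = 100, 60
p_success = 0.55  # hypothetical success probability (stands in for `result` above)

# PMF by the formula: C(n, t) * p^t * (1 - p)^(n - t)
pmf_manual = comb(n, t) * p_success ** t * (1 - p_success) ** (n - t)
pmf_scipy = stats.binom.pmf(t, n, p_success)

# Bernoulli is the n = 1 case: P(X = 1) = p
bern = stats.bernoulli.pmf(1, p_success)

print(pmf_scipy, pmf_manual, bern)
```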

 


 
