Dataset.
https://www.kaggle.com/datasets/tarkkaanko/amazon
1. Visualization
- Visualize the review ratings
- Color the pie-chart slices with the `constraints` color list
- 5.0-star ratings have the highest share, at 79.8%
```python
import matplotlib.pyplot as plt

# Check review ratings
# Pie-chart slice colors, one per rating value
constraints = ['#4682B4', '#FF6347', '#32CD32', '#FFD700', '#8A2BE2']

def categorical_variable_summary(df, column_name):
    """Show counts (bar chart) and shares (pie chart) of one categorical column."""
    counts = df[column_name].value_counts()
    plt.figure(figsize=(10, 5))

    # Countplot: number of reviews per category
    plt.subplot(1, 2, 1)
    counts.plot(kind='bar', color='skyblue')
    plt.title('Countplot')

    # Percentages: each category's share of all reviews
    plt.subplot(1, 2, 2)
    counts.plot(kind='pie', autopct='%1.1f%%', startangle=90, colors=constraints)
    plt.title('Percentages')

    plt.tight_layout()
    plt.show()

# Visualize review ratings
categorical_variable_summary(df, 'overall')
```
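The pie chart's `autopct` labels are each rating's share of all reviews, the same numbers that `df['overall'].value_counts(normalize=True) * 100` returns. A minimal stdlib sketch of that calculation; `rating_shares` is a hypothetical helper and the `ratings` list is made-up toy data, not the Kaggle dataset.

```python
from collections import Counter

def rating_shares(ratings):
    """Return {rating: percentage of all reviews}, most common first."""
    counts = Counter(ratings)
    total = sum(counts.values())
    # most_common() sorts by frequency, like value_counts()
    return {rating: 100 * n / total for rating, n in counts.most_common()}

# Hypothetical toy ratings (not the real data)
ratings = [5.0, 5.0, 5.0, 4.0, 1.0]
for rating, share in rating_shares(ratings).items():
    print(f"{rating}: {share:.1f}%")
```

The shares always add up to 100%, which is why a single dominant rating (5.0 in this dataset) takes up most of the pie.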
2. Sentiment Analysis
- Clean the review text: replace every non-letter character with a space
- Convert to lowercase
```python
import re

# Extract the review text, keep only letters, and convert it to lowercase
def rt(x):
    return re.sub("[^a-zA-Z]", ' ', str(x))

df["reviewText"] = df["reviewText"].map(rt)
df["reviewText"] = df["reviewText"].str.lower()
```
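The cleaning step can be checked on single strings. A minimal sketch wrapping the same regex and `lower()` call in a hypothetical `clean_review` helper. Two side effects are worth knowing: `str(x)` turns a missing review (`NaN`) into the literal word `nan`, which survives cleaning and can end up in later word counts, and accented letters such as "é" are not in `[a-zA-Z]`, so they become spaces too.

```python
import re

def clean_review(text):
    """Replace every character outside A-Z/a-z with a space, then lowercase."""
    return re.sub("[^a-zA-Z]", " ", str(text)).lower()

print(repr(clean_review("Works great! 5 stars")))
print(repr(clean_review(float("nan"))))  # a missing value becomes the word "nan"
print(repr(clean_review("Caf\u00e9")))   # the accented letter is replaced by a space
```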
- Sentiment analysis
- Positive reviews have the highest share, at 81.3%
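```python
import pandas as pd
from textblob import TextBlob
# Assumes NLTK's VADER; run nltk.download('vader_lexicon') once beforehand
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Sentiment analysis
# polarity: how positive/negative the text is (-1.0 to 1.0)
# subjectivity: how subjective the text is (0.0 to 1.0)
df[['polarity', 'subjectivity']] = df['reviewText'].apply(
    lambda text: pd.Series(TextBlob(text).sentiment))

# VADER label: compare the negative and positive proportions of each review.
# Note: VADER also uses capitalization and '!' as intensity cues,
# and both were removed in the cleaning step.
analyzer = SentimentIntensityAnalyzer()

def vader_label(text):
    score = analyzer.polarity_scores(text)
    if score['neg'] > score['pos']:
        return "Negative"
    elif score['pos'] > score['neg']:
        return "Positive"
    return "neutral"

df['sentiment'] = df['reviewText'].apply(vader_label)

# Visualize the sentiment analysis results
categorical_variable_summary(df, 'sentiment')
```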
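The labeling rule in the loop depends only on VADER's `neg` and `pos` proportions. A stdlib sketch of that rule as a pure function, plus, for comparison, the `compound`-score convention suggested in VADER's own documentation (compound >= 0.05 positive, <= -0.05 negative, otherwise neutral). The score dictionaries are hand-written stand-ins for `polarity_scores` output, and `label_by_compound` is a hypothetical alternative, not the method used in this post.

```python
def label_by_neg_pos(scores):
    """Rule used in this post: compare the negative and positive proportions."""
    if scores["neg"] > scores["pos"]:
        return "Negative"
    if scores["pos"] > scores["neg"]:
        return "Positive"
    return "neutral"  # same lowercase label the post uses

def label_by_compound(scores, threshold=0.05):
    """Alternative (not used here): threshold VADER's compound score."""
    if scores["compound"] >= threshold:
        return "Positive"
    if scores["compound"] <= -threshold:
        return "Negative"
    return "neutral"

# Hand-written stand-ins for analyzer.polarity_scores(...) output
examples = [
    {"neg": 0.0, "neu": 0.6, "pos": 0.4, "compound": 0.62},
    {"neg": 0.1, "neu": 0.8, "pos": 0.1, "compound": 0.0},
    {"neg": 0.05, "neu": 0.85, "pos": 0.1, "compound": 0.03},
]
for s in examples:
    print(label_by_neg_pos(s), label_by_compound(s))
```

The third example shows where the rules disagree: any small excess of `pos` over `neg` counts as positive under the post's rule, while the compound rule treats a weak overall score as neutral.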
3. Word Cloud
- Generate a word cloud for each sentiment label from the sentiment analysis results
```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Word cloud per sentiment
def plot_wordcloud(sentiment, df):
    # Join all reviews with the given sentiment label into one string
    reviews = df[df['sentiment'] == sentiment]['reviewText'].str.cat(sep=' ')
    # stopwords=None -> the wordcloud package's built-in STOPWORDS list is used
    wordcloud = WordCloud(width=800, height=800,
                          background_color='white',
                          stopwords=None,
                          min_font_size=10).generate(reviews)
    plt.figure(figsize=(8, 8), facecolor=None)
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.title(f'{sentiment.capitalize()} Reviews WordCloud')
    plt.tight_layout(pad=0)
    plt.show()

# Positive word cloud
plot_wordcloud('Positive', df)
# Negative word cloud
plot_wordcloud('Negative', df)
# Neutral word cloud (the label is lowercase 'neutral', as assigned above)
plot_wordcloud('neutral', df)
```
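A word cloud sizes each word by how often it appears in the joined text after stopwords are removed. A rough stdlib sketch of that counting step for one sentiment group; `top_words`, the reviews, the labels, and the tiny `STOP` set are all made-up examples (the `wordcloud` package ships its own, larger `STOPWORDS` list and its own tokenizer, so its counts will differ). A `Counter` like this can also be passed to `WordCloud.generate_from_frequencies` instead of calling `generate` on raw text.

```python
import re
from collections import Counter

# Hypothetical stopword set; the wordcloud package uses its own STOPWORDS list
STOP = {"the", "it", "is", "and", "a", "this"}

def top_words(reviews, labels, sentiment, n=3):
    """Count non-stopword words in the reviews whose label equals `sentiment`."""
    # Same idea as df[df['sentiment'] == sentiment]['reviewText'].str.cat(sep=' ')
    text = " ".join(r for r, lab in zip(reviews, labels) if lab == sentiment)
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]
    return Counter(words).most_common(n)

reviews = ["great card and fast", "the card is great", "card died fast"]
labels = ["Positive", "Positive", "Negative"]
print(top_words(reviews, labels, "Positive"))
print(top_words(reviews, labels, "Negative"))
```

Because product nouns such as "card" appear in every group, they tend to dominate all three clouds; adding domain words to the stopword set is one way to make the groups easier to tell apart.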
Reference.
https://www.kaggle.com/code/tarkkaanko/amazon-review-sentiment-analysis
Amazon - Review Sentiment Analysis (Kaggle notebook)