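[Kaggle] Amazon Review Analysis #02 (EDA, Sentiment Analysis)

01. Visualization

Visualizing review ratings
The constraints list sets the pie chart slice colors
- 5.0 ratings make up the largest share, at 79.8%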

# Check the review ratings
import matplotlib.pyplot as plt

# Pie chart slice colors, one per rating value
constraints = ['#4682B4', '#FF6347', '#32CD32', '#FFD700', '#8A2BE2']

def categorical_variable_summary(df, column_name):
    """Plot counts (bar chart) and percentages (pie chart) for one column."""
    counts = df[column_name].value_counts()
    plt.figure(figsize=(10, 5))

    # Countplot
    plt.subplot(1, 2, 1)
    counts.plot(kind='bar', color='skyblue')
    plt.title('Countplot')

    # Percentages
    plt.subplot(1, 2, 2)
    counts.plot(kind='pie', autopct='%1.1f%%', startangle=90, colors=constraints)
    plt.title('Percentages')

    plt.tight_layout()
    plt.show()


# Visualize the review ratings
categorical_variable_summary(df, 'overall')
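
The percentages shown on the pie chart can also be checked numerically. The sketch below is a minimal example on toy data (the `toy` DataFrame and the `rating_share` helper are hypothetical and not part of the original notebook); it assumes only that the rating column is called `overall`, as above.

```python
import pandas as pd

# Toy ratings (hypothetical data, not the Kaggle dataset)
toy = pd.DataFrame({"overall": [5.0, 5.0, 5.0, 4.0, 1.0]})

def rating_share(df, column_name):
    """Return the percentage of rows per category, rounded to one decimal."""
    return df[column_name].value_counts(normalize=True).mul(100).round(1)

share = rating_share(toy, "overall")
print(share)
```

Calling `rating_share(df, 'overall')` on the real data should give the same numbers that `autopct` prints on the pie slices.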

02. Sentiment Analysis

Cleaning the review text
- Replace non-alphabetic characters with spaces and convert to lowercase

# Keep only letters in the review text, then convert to lowercase
import re

rt = lambda x: re.sub("[^a-zA-Z]", ' ', str(x))
df["reviewText"] = df["reviewText"].map(rt)
df["reviewText"] = df["reviewText"].str.lower()
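
To see what this cleaning step does to a single review, here is a small standalone sketch using only the standard library. The `clean_review` helper and the sample sentence are made up for illustration; the regular expression is the same one used above.

```python
import re

def clean_review(text):
    """Replace every non-letter character with a space, then lowercase."""
    return re.sub(r"[^a-zA-Z]", " ", str(text)).lower()

sample = "Works GREAT!! 5/5, would buy again."
cleaned = clean_review(sample)
print(cleaned.split())

# Because of str(), a missing value (NaN) becomes the literal word "nan"
print(clean_review(float("nan")))
```

Digits and punctuation are replaced rather than deleted, so the string keeps its length and may contain runs of spaces. Missing reviews turn into the text "nan", so it may be worth dropping or filling them before this step.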

Sentiment analysis
- Positive reviews make up the largest share, at 81.3%

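# Sentiment analysis
import pandas as pd
from textblob import TextBlob
# Assumed import path; nltk.sentiment.vader provides a class with the same name
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# polarity: how positive or negative the text is (-1 to 1)
# subjectivity: how subjective the text is (0 to 1)
df[['polarity', 'subjectivity']] = df['reviewText'].apply(lambda Text: pd.Series(TextBlob(Text).sentiment))

analyzer = SentimentIntensityAnalyzer()

# Label each review by comparing VADER's negative and positive scores
for index, row in df['reviewText'].items():
    score = analyzer.polarity_scores(row)

    neg = score['neg']
    neu = score['neu']
    pos = score['pos']
    if neg > pos:
        df.loc[index, 'sentiment'] = "Negative"
    elif pos > neg:
        df.loc[index, 'sentiment'] = "Positive"
    else:
        df.loc[index, 'sentiment'] = "neutral"


# Visualize the sentiment analysis results
categorical_variable_summary(df, 'sentiment')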
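
The labeling rule in the loop above can be isolated into a small function, which makes it easy to check on its own. This is only a sketch: `label_sentiment` is a hypothetical helper, and the score dictionaries are hand-written stand-ins with the same keys that `polarity_scores` returns, not real analyzer output.

```python
def label_sentiment(scores):
    """Map a VADER-style score dict to a label by comparing 'neg' and 'pos'."""
    if scores["neg"] > scores["pos"]:
        return "Negative"
    if scores["pos"] > scores["neg"]:
        return "Positive"
    return "neutral"

# Hand-written score dicts (hypothetical values, not real analyzer output)
examples = [
    {"neg": 0.0, "neu": 0.4, "pos": 0.6},
    {"neg": 0.5, "neu": 0.4, "pos": 0.1},
    {"neg": 0.0, "neu": 1.0, "pos": 0.0},
]
labels = [label_sentiment(s) for s in examples]
print(labels)
```

With a real analyzer, the row-by-row loop could then be written as `df['sentiment'] = df['reviewText'].apply(lambda t: label_sentiment(analyzer.polarity_scores(t)))`. Note that a review is labeled neutral only when its negative and positive scores are exactly equal.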

03. Word Cloud

Generate a word cloud for each label from the sentiment analysis results

# Word cloud per sentiment label
import matplotlib.pyplot as plt
from wordcloud import WordCloud

def plot_wordcloud(sentiment, df):
    # Join all reviews with the given label into one string
    reviews = df[df['sentiment'] == sentiment]['reviewText'].str.cat(sep=' ')

    # stopwords=None makes WordCloud fall back to its built-in stopword list
    wordcloud = WordCloud(width = 800, height = 800,
                background_color ='white',
                stopwords = None,
                min_font_size = 10).generate(reviews)

    plt.figure(figsize = (8, 8), facecolor = None)
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.tight_layout(pad = 0)
    plt.title(f'{sentiment.capitalize()} Reviews WordCloud')
    plt.show()

# Positive word cloud
plot_wordcloud('Positive', df)

# Negative word cloud
plot_wordcloud('Negative', df)

# Neutral word cloud
plot_wordcloud('neutral', df)
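
A word cloud sizes words by how often they occur, so a plain frequency count gives the same information in numeric form. The sketch below uses only the standard library; the `top_words` helper, the toy reviews, and the stopword set are all hypothetical and assume text that has already been cleaned and lowercased as in section 02.

```python
from collections import Counter

def top_words(texts, n=3, stopwords=frozenset()):
    """Count whitespace-separated words across texts, skipping stopwords."""
    counts = Counter(
        word
        for text in texts
        for word in text.split()
        if word not in stopwords
    )
    return counts.most_common(n)

# Toy reviews (hypothetical, already lowercased)
reviews = ["great card works great", "card failed", "great price"]
top = top_words(reviews, n=2, stopwords={"works"})
print(top)
```

On the real data, something like `top_words(df[df['sentiment'] == 'Positive']['reviewText'])` would list the words that should appear largest in the positive word cloud, apart from any that WordCloud's own stopword list filters out.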

📌 Reference.

https://www.kaggle.com/code/tarkkaanko/amazon-review-sentiment-analysis

🗂️ Dataset.
https://www.kaggle.com/datasets/tarkkaanko/amazon

License: Attribution, NonCommercial, ShareAlike (CC BY-NC-SA)
