Socially Drained: The Exhausting Impact of Social Interaction on Introverts

Analysis of a behavioural dataset from Kaggle using a Random Forest: Extroverts vs. Introverts

In [41]:
import pandas as pd

data = pd.read_csv("personality_dataset.csv")
data.head()
Out[41]:
   Time_spent_Alone  Stage_fear  Social_event_attendance  Going_outside  Drained_after_socializing  Friends_circle_size  Post_frequency  Personality
0               4.0          No                      4.0            6.0                         No                 13.0             5.0    Extrovert
1               9.0         Yes                      0.0            0.0                        Yes                  0.0             3.0    Introvert
2               9.0         Yes                      1.0            2.0                        Yes                  5.0             2.0    Introvert
3               0.0          No                      6.0            7.0                         No                 14.0             8.0    Extrovert
4               3.0          No                      9.0            4.0                         No                  8.0             5.0    Extrovert
In [42]:
data.describe()
Out[42]:
       Time_spent_Alone  Social_event_attendance  Going_outside  Friends_circle_size  Post_frequency
count       2900.000000              2900.000000    2900.000000          2900.000000     2900.000000
mean           4.505816                 3.963354       3.000000             6.268863        3.564727
std            3.441180                 2.872608       2.221597             4.232340        2.893587
min            0.000000                 0.000000       0.000000             0.000000        0.000000
25%            2.000000                 2.000000       1.000000             3.000000        1.000000
50%            4.000000                 3.963354       3.000000             5.000000        3.000000
75%            7.000000                 6.000000       5.000000            10.000000        6.000000
max           11.000000                10.000000       7.000000            15.000000       10.000000
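In the summary above, the 50% value of Social_event_attendance (3.963354) is identical to its mean, although the other quartiles of that column are whole numbers. One possible explanation, which this notebook does not confirm, is that missing values were filled with the column mean before the CSV was published. The sketch below uses made-up toy values (not the Kaggle data) to show how mean imputation produces exactly this pattern and how to count the rows that sit on the mean.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with missing values (NOT the Kaggle data).
raw = pd.Series([0.0, 1.0, 2.0, 8.0, 9.0, np.nan, np.nan, np.nan])

# Mean imputation: every NaN is replaced by the mean of the observed values (4.0).
imputed = raw.fillna(raw.mean())

# The column mean is unchanged, and because the imputed block sits in the
# middle of the sorted values, the median lands on it as well.
print(imputed.mean(), imputed.median())

# Count rows that sit on the column mean; a large count hints at imputation.
n_at_mean = int(np.isclose(imputed, imputed.mean()).sum())
print(n_at_mean)  # 3
```

On the real data, the same check would be `np.isclose(data["Social_event_attendance"], data["Social_event_attendance"].mean()).sum()`; a large count would support the imputation hypothesis but would not prove it.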
In [43]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2900 entries, 0 to 2899
Data columns (total 8 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Time_spent_Alone           2900 non-null   float64
 1   Stage_fear                 2900 non-null   object 
 2   Social_event_attendance    2900 non-null   float64
 3   Going_outside              2900 non-null   float64
 4   Drained_after_socializing  2900 non-null   object 
 5   Friends_circle_size        2900 non-null   float64
 6   Post_frequency             2900 non-null   float64
 7   Personality                2900 non-null   object 
dtypes: float64(5), object(3)
memory usage: 181.4+ KB
In [60]:
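import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
# Note: this binds a second name to the same DataFrame (no copy), so the
# *_encoded columns added below also appear in `data` (see Out[63]).
data_encoded = data

# Encode the text columns as integers. LabelEncoder sorts the labels:
# No -> 0, Yes -> 1, Extrovert -> 0, Introvert -> 1.
categorical_columns = ['Stage_fear', 'Drained_after_socializing', 'Personality']
for column in categorical_columns:
    if column in data_encoded.columns:
        data_encoded[f'{column}_encoded'] = label_encoder.fit_transform(data_encoded[column])

# Drop the original text columns; drop() returns a new DataFrame, so `data` keeps them.
data_encoded = data_encoded.drop(columns=categorical_columns)

# Pairwise Pearson correlations between all numeric and encoded columns.
corr_matrix = data_encoded.corr()

plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap="crest", fmt=".2f", linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()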
[Figure: Correlation Heatmap]
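The signs in the heatmap depend on how the text labels were turned into numbers. LabelEncoder assigns codes in sorted label order, so No → 0, Yes → 1, Extrovert → 0 and Introvert → 1, which matches the encoded columns in Out[63]; a positive correlation with Personality_encoded therefore means a feature is higher among introverts. The toy sketch below (four hypothetical rows, not the Kaggle data) prints these mappings and shows an explicit `map` that makes the same encoding visible in the code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows using the notebook's category labels (not the Kaggle data).
toy = pd.DataFrame({
    "Drained_after_socializing": ["No", "Yes", "Yes", "No"],
    "Personality": ["Extrovert", "Introvert", "Introvert", "Extrovert"],
})

label_encoder = LabelEncoder()
mappings = {}
for column in ["Drained_after_socializing", "Personality"]:
    toy[f"{column}_encoded"] = label_encoder.fit_transform(toy[column])
    # classes_ holds the labels in sorted order; a label's code is its position.
    mappings[column] = {str(label): code for code, label in enumerate(label_encoder.classes_)}

print(mappings)
# {'Drained_after_socializing': {'No': 0, 'Yes': 1}, 'Personality': {'Extrovert': 0, 'Introvert': 1}}

# An explicit mapping produces the same codes and documents them in the code itself.
explicit = toy["Personality"].map({"Extrovert": 0, "Introvert": 1})
print((explicit == toy["Personality_encoded"]).all())  # True
```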
In [61]:
from scipy.stats import pearsonr

corr_coefficient, p_value = pearsonr(data_encoded["Personality_encoded"], data_encoded["Drained_after_socializing_encoded"])

print(f"Pearson correlation coefficient: {corr_coefficient}")
# A p-value shown as 0.0 has underflowed: it is smaller than the smallest positive float64, not exactly zero.
print(f"p-value: {p_value}")
Pearson correlation coefficient: 0.8453884004502024
p-value: 0.0
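The Pearson coefficient above is computed on two 0/1 columns, which makes it the phi coefficient of the 2×2 table of personality type by Drained_after_socializing. The sketch below checks this identity on ten made-up rows (not the Kaggle data): phi is computed directly from the four cell counts and compared with `scipy.stats.pearsonr`.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical toy vectors (not the Kaggle data): 1 = Introvert, 1 = drained ("Yes").
personality = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
drained = np.array([1, 1, 1, 0, 0, 0, 0, 1, 1, 0])

# Cell counts of the 2x2 contingency table.
n11 = np.sum((personality == 1) & (drained == 1))
n10 = np.sum((personality == 1) & (drained == 0))
n01 = np.sum((personality == 0) & (drained == 1))
n00 = np.sum((personality == 0) & (drained == 0))

# Phi coefficient from the table.
phi = (n11 * n00 - n10 * n01) / np.sqrt(
    (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))

r, p = pearsonr(personality, drained)
print(round(phi, 4), round(r, 4))  # both are 0.6 for this toy data
```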
In [62]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

X = data_encoded.drop(columns=['Personality_encoded'])  # Features: the five numeric columns plus the two encoded Yes/No columns
y = data_encoded['Personality_encoded']  # Target: 0 = Extrovert, 1 = Introvert

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)

y_pred = rf_classifier.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

print(classification_report(y_test, y_pred))

# Impurity-based importances (mean decrease in impurity), sorted from most to least important.
feature_importances = pd.DataFrame(
    rf_classifier.feature_importances_, index=X.columns, columns=['Importance']
).sort_values('Importance', ascending=False)
print(feature_importances)
Accuracy: 0.9103448275862069
              precision    recall  f1-score   support

           0       0.92      0.92      0.92       463
           1       0.90      0.90      0.90       407

    accuracy                           0.91       870
   macro avg       0.91      0.91      0.91       870
weighted avg       0.91      0.91      0.91       870

                                   Importance
Drained_after_socializing_encoded    0.205836
Social_event_attendance              0.174983
Stage_fear_encoded                   0.156663
Time_spent_Alone                     0.156417
Going_outside                        0.131922
Post_frequency                       0.112619
Friends_circle_size                  0.061560
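The ranking above comes from `feature_importances_`, which for a Random Forest is impurity-based and computed on the training data; the scikit-learn documentation notes that this measure can favour features with many distinct values, such as the numeric columns here compared with the two binary encoded ones. Permutation importance on the held-out set is a common cross-check. The sketch below runs it on synthetic stand-in data from `make_classification` (the feature names are hypothetical placeholders); with the notebook's objects you would pass `rf_classifier`, `X_test` and `y_test` instead.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Kaggle data); feature names are placeholders.
X_arr, y = make_classification(n_samples=600, n_features=7, n_informative=3,
                               n_redundant=0, random_state=42)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(7)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Shuffle one column at a time in the test set and record the drop in accuracy.
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)

comparison = pd.DataFrame({
    "impurity": rf.feature_importances_,
    "permutation_mean": result.importances_mean,
    "permutation_std": result.importances_std,
}, index=X.columns).sort_values("permutation_mean", ascending=False)
print(comparison.round(3))
```

If both columns rank the same features near the top, the impurity-based ranking is less likely to be an artefact of the measure itself.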
In [63]:
# `data` now also carries the *_encoded columns, because `data_encoded = data`
# in In [60] did not copy the DataFrame.
data.head()
Out[63]:
   Time_spent_Alone  Stage_fear  Social_event_attendance  Going_outside  Drained_after_socializing  Friends_circle_size  Post_frequency  Personality  Stage_fear_encoded  Drained_after_socializing_encoded  Personality_encoded
0               4.0          No                      4.0            6.0                         No                 13.0             5.0    Extrovert                   0                                  0                    0
1               9.0         Yes                      0.0            0.0                        Yes                  0.0             3.0    Introvert                   1                                  1                    1
2               9.0         Yes                      1.0            2.0                        Yes                  5.0             2.0    Introvert                   1                                  1                    1
3               0.0          No                      6.0            7.0                         No                 14.0             8.0    Extrovert                   0                                  0                    0
4               3.0          No                      9.0            4.0                         No                  8.0             5.0    Extrovert                   0                                  0                    0
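The output above shows the three *_encoded columns on `data`, although In [60] created them on `data_encoded`. The assignment `data_encoded = data` gives the same DataFrame a second name instead of copying it, so columns added through either name appear in both; the later `drop(...)` returns a new object, so the original text columns survive in `data`. A minimal toy demonstration (hypothetical values, not the Kaggle data):

```python
import pandas as pd

# Toy frame with hypothetical values (not the Kaggle data).
original = pd.DataFrame({"Stage_fear": ["No", "Yes"]})

alias = original  # no copy: both names refer to one object
alias["Stage_fear_encoded"] = [0, 1]
modified_via_alias = "Stage_fear_encoded" in original.columns
print(modified_via_alias)  # True

original = pd.DataFrame({"Stage_fear": ["No", "Yes"]})
independent = original.copy()  # deep copy by default
independent["Stage_fear_encoded"] = [0, 1]
modified_via_copy = "Stage_fear_encoded" in original.columns
print(modified_via_copy)  # False
```

Writing `data_encoded = data.copy()` in In [60] would leave `data` with only its original eight columns.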
In [65]:
import matplotlib.pyplot as plt

# Keep only introverts and count their "Yes"/"No" answers for Drained_after_socializing.
introverts = data.query('Personality == "Introvert"')

drained_counts = introverts["Drained_after_socializing"].value_counts()

colors = ['#254c80', '#E9F6FA']

plt.pie(drained_counts, colors=colors, labels=drained_counts.index,
        autopct='%1.1f%%', pctdistance=0.85)

# A white circle over the centre turns the pie into a donut chart.
centre_circle = plt.Circle((0, 0), 0.50, fc='white')
fig = plt.gcf()

fig.gca().add_artist(centre_circle)

plt.title('Drained after Socializing for Introverts')

plt.show()
[Figure: Drained after Socializing for Introverts (donut chart)]
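Because the donut chart is an image, the same shares can also be produced as a table. A row-normalised `pd.crosstab` gives, for each personality type, the percentage answering No or Yes, which adds the extrovert row as a comparison. The sketch below runs on ten hypothetical rows (not the Kaggle data); on the notebook's data, pass `data["Personality"]` and `data["Drained_after_socializing"]`.

```python
import pandas as pd

# Hypothetical toy rows (not the Kaggle data).
toy = pd.DataFrame({
    "Personality": ["Introvert"] * 5 + ["Extrovert"] * 5,
    "Drained_after_socializing": ["Yes", "Yes", "Yes", "Yes", "No",
                                  "No", "No", "No", "No", "Yes"],
})

# normalize="index" divides each row by its total, so every row sums to 100%.
shares = pd.crosstab(toy["Personality"], toy["Drained_after_socializing"],
                     normalize="index") * 100
print(shares.round(1))
```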

Conclusion

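A large majority of introverts in this dataset, approximately 92%, report feeling drained after socializing. Drained_after_socializing also ranks first among the Random Forest's features: it has the highest impurity-based importance (about 0.21, ahead of Social_event_attendance at about 0.17) and a Pearson correlation of about 0.85 with the encoded personality type. With an accuracy of about 91% on the held-out 30% of the data, and precision and recall between 0.90 and 0.92 for both classes, the model separates extroverts from introverts well, which supports reading these importances as meaningful for this dataset. These results align with common perceptions and psychological insights about introversion, highlighting the substantial impact of social interaction on introverts' energy levels.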
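The 91% accuracy comes from a single 70/30 split. A common way to check how stable that figure is, sketched below under the assumption that a stratified 5-fold split suits this data, is cross-validation. The sketch uses synthetic stand-in data from `make_classification` (not the Kaggle data); with the notebook's variables, pass `X` and `y` from In [62] instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (NOT the Kaggle data): 7 features, as in the notebook's X.
X, y = make_classification(n_samples=1000, n_features=7, n_informative=4,
                           n_redundant=0, random_state=42)

# Stratified folds keep the class ratio the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)

scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"accuracy per fold: {scores.round(3)}")
print(f"mean {scores.mean():.3f}, std {scores.std():.3f}")
```

A small standard deviation across folds would indicate that the single-split accuracy is not an accident of how the rows were divided.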