Socially Drained: The Exhausting Impact of Social Interaction on Introverts
Analysis of a behavioural data set from Kaggle with a Random Forest: Extroverts vs. Introverts
In [41]:
import pandas as pd
data = pd.read_csv("personality_dataset.csv")
data.head()
Out[41]:
| | Time_spent_Alone | Stage_fear | Social_event_attendance | Going_outside | Drained_after_socializing | Friends_circle_size | Post_frequency | Personality |
---|---|---|---|---|---|---|---|---|
0 | 4.0 | No | 4.0 | 6.0 | No | 13.0 | 5.0 | Extrovert |
1 | 9.0 | Yes | 0.0 | 0.0 | Yes | 0.0 | 3.0 | Introvert |
2 | 9.0 | Yes | 1.0 | 2.0 | Yes | 5.0 | 2.0 | Introvert |
3 | 0.0 | No | 6.0 | 7.0 | No | 14.0 | 8.0 | Extrovert |
4 | 3.0 | No | 9.0 | 4.0 | No | 8.0 | 5.0 | Extrovert |
In [42]:
data.describe()
Out[42]:
| | Time_spent_Alone | Social_event_attendance | Going_outside | Friends_circle_size | Post_frequency |
---|---|---|---|---|---|
count | 2900.000000 | 2900.000000 | 2900.000000 | 2900.000000 | 2900.000000 |
mean | 4.505816 | 3.963354 | 3.000000 | 6.268863 | 3.564727 |
std | 3.441180 | 2.872608 | 2.221597 | 4.232340 | 2.893587 |
min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 2.000000 | 2.000000 | 1.000000 | 3.000000 | 1.000000 |
50% | 4.000000 | 3.963354 | 3.000000 | 5.000000 | 3.000000 |
75% | 7.000000 | 6.000000 | 5.000000 | 10.000000 | 6.000000 |
max | 11.000000 | 10.000000 | 7.000000 | 15.000000 | 10.000000 |
In [43]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2900 entries, 0 to 2899
Data columns (total 8 columns):
 #   Column                     Non-Null Count  Dtype
---  ------                     --------------  -----
 0   Time_spent_Alone           2900 non-null   float64
 1   Stage_fear                 2900 non-null   object
 2   Social_event_attendance    2900 non-null   float64
 3   Going_outside              2900 non-null   float64
 4   Drained_after_socializing  2900 non-null   object
 5   Friends_circle_size        2900 non-null   float64
 6   Post_frequency             2900 non-null   float64
 7   Personality                2900 non-null   object
dtypes: float64(5), object(3)
memory usage: 181.4+ KB
In [60]:
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
# Note: this is an alias, not a copy, so the *_encoded columns are also added to `data`.
data_encoded = data
categorical_columns = ['Stage_fear', 'Drained_after_socializing', 'Personality']
# Encode each categorical column as integers; the encoder is refit for every column.
for column in categorical_columns:
    if column in data_encoded.columns:
        data_encoded[f'{column}_encoded'] = label_encoder.fit_transform(data_encoded[column])
# drop() returns a new frame, so `data` keeps the original text columns.
data_encoded = data_encoded.drop(columns=categorical_columns)

# Pairwise Pearson correlations between the numeric and encoded columns.
corr_matrix = data_encoded.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap="crest", fmt=".2f", linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()
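To read the encoded columns and the later classification report, it helps to know which integer each label received. `LabelEncoder` sorts the labels it sees, so the code is the position in `classes_`. The following is a minimal sketch on a toy frame (the values are hypothetical stand-ins, not the Kaggle data), using one encoder per column so that each mapping stays available for inspection:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for two of the categorical columns (hypothetical values).
toy = pd.DataFrame({
    "Stage_fear": ["No", "Yes", "Yes", "No"],
    "Personality": ["Extrovert", "Introvert", "Introvert", "Extrovert"],
})

categorical_columns = ["Stage_fear", "Personality"]
mappings = {}
for column in categorical_columns:
    encoder = LabelEncoder()  # a fresh encoder per column keeps its mapping
    toy[f"{column}_encoded"] = encoder.fit_transform(toy[column])
    # classes_ is sorted, so each label's index is its integer code.
    mappings[column] = {str(label): code for code, label in enumerate(encoder.classes_)}

print(mappings)
```

Under this sorting rule, `Extrovert` maps to 0 and `Introvert` to 1, which matches the encoded values shown in the notebook's later `data.head()` output.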
In [61]:
from scipy.stats import pearsonr
corr_coefficient, p_value = pearsonr(data_encoded["Personality_encoded"], data_encoded["Drained_after_socializing_encoded"])
print(f"Pearson correlation coefficient: {corr_coefficient}")
print(f"p-value: {p_value}")
Pearson correlation coefficient: 0.8453884004502024
p-value: 0.0
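Both variables in this test are binary, and for two 0/1 variables the Pearson coefficient coincides with the phi coefficient of their 2x2 contingency table. The printed p-value of 0.0 is most likely a value too small to represent as a float rather than an exact zero. The sketch below illustrates the equivalence on a small hypothetical sample (the arrays are made up for illustration, not taken from the Kaggle data); the chi-square test without continuity correction is used here as an assumed cross-check, not as the notebook's method:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, pearsonr

# Hypothetical 0/1 codes for illustration only.
personality = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
drained = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

# Pearson correlation between the two binary variables.
r, p_pearson = pearsonr(personality, drained)

# 2x2 contingency table and chi-square test without Yates' correction.
table = pd.crosstab(personality, drained)
chi2, p_chi2, dof, expected = chi2_contingency(table, correction=False)

# Phi coefficient: sqrt(chi2 / n); it equals |r| for two binary variables.
n = table.to_numpy().sum()
phi = np.sqrt(chi2 / n)

print(f"Pearson r: {r:.3f}, phi: {phi:.3f}")
```

The contingency table also shows the raw counts behind the coefficient, which is often easier to interpret than a correlation between two yes/no variables.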
In [62]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
X = data_encoded.drop(columns=['Personality_encoded']) # Use all other encoded features
y = data_encoded['Personality_encoded']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)
y_pred = rf_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print(classification_report(y_test, y_pred))
feature_importances = pd.DataFrame(rf_classifier.feature_importances_, index=X.columns, columns=['Importance']).sort_values('Importance', ascending=False)
print(feature_importances)
Accuracy: 0.9103448275862069
              precision    recall  f1-score   support

           0       0.92      0.92      0.92       463
           1       0.90      0.90      0.90       407

    accuracy                           0.91       870
   macro avg       0.91      0.91      0.91       870
weighted avg       0.91      0.91      0.91       870

                                   Importance
Drained_after_socializing_encoded    0.205836
Social_event_attendance              0.174983
Stage_fear_encoded                   0.156663
Time_spent_Alone                     0.156417
Going_outside                        0.131922
Post_frequency                       0.112619
Friends_circle_size                  0.061560
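The accuracy above comes from a single 70/30 split. One way to check how much it depends on that particular split is stratified k-fold cross-validation, which is a technique added here for illustration and not part of the original analysis. The sketch below uses synthetic data from `make_classification` as a stand-in for the feature matrix, so its scores say nothing about the personality data set; the fold count and forest size are arbitrary assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for X and y (7 features, like the notebook's feature matrix).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=7, n_informative=4, random_state=42
)

# Stratified folds keep the class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=50, random_state=42)

# One accuracy score per fold.
scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")

print(f"Fold accuracies: {scores.round(3)}")
print(f"Mean: {scores.mean():.3f}, std: {scores.std():.3f}")
```

With the notebook's own data, `X_demo` and `y_demo` would be replaced by `X` and `y`; a small spread across folds would suggest the reported accuracy is not an artifact of one split.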
In [63]:
import matplotlib.pyplot as plt
# `data` now also carries the *_encoded columns, because data_encoded was an alias of it.
data = pd.DataFrame.from_dict(data)
data.head()
Out[63]:
| | Time_spent_Alone | Stage_fear | Social_event_attendance | Going_outside | Drained_after_socializing | Friends_circle_size | Post_frequency | Personality | Stage_fear_encoded | Drained_after_socializing_encoded | Personality_encoded |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 4.0 | No | 4.0 | 6.0 | No | 13.0 | 5.0 | Extrovert | 0 | 0 | 0 |
1 | 9.0 | Yes | 0.0 | 0.0 | Yes | 0.0 | 3.0 | Introvert | 1 | 1 | 1 |
2 | 9.0 | Yes | 1.0 | 2.0 | Yes | 5.0 | 2.0 | Introvert | 1 | 1 | 1 |
3 | 0.0 | No | 6.0 | 7.0 | No | 14.0 | 8.0 | Extrovert | 0 | 0 | 0 |
4 | 3.0 | No | 9.0 | 4.0 | No | 8.0 | 5.0 | Extrovert | 0 | 0 | 0 |
In [65]:
import matplotlib.pyplot as plt
introverts = data.query('Personality == "Introvert"')
drained_counts = introverts["Drained_after_socializing"].value_counts()
colors = ['#254c80', '#E9F6FA']
plt.pie(drained_counts, colors=colors, labels=drained_counts.index,
        autopct='%1.1f%%', pctdistance=0.85)
centre_circle = plt.Circle((0, 0), 0.50, fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
plt.title('Drained after Socializing for Introverts')
plt.show()
Conclusion
Approximately 92% of the introverts in this data set report feeling drained after socializing. `Drained_after_socializing` is also the most important feature in the Random Forest (importance 0.21, ahead of `Social_event_attendance` at 0.17), and it correlates strongly with the personality label (Pearson r = 0.85). The model reaches 91% accuracy on the held-out test set, with precision and recall of at least 0.90 for both classes, which supports this reading for this data set. These results align with common perceptions and psychological insights about introversion and point to a substantial effect of social interaction on introverts' energy levels.
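The share of drained introverts can be computed directly instead of being read off the pie chart. A minimal sketch, assuming the same column names as the data set but using a small hypothetical frame, so the printed percentage is illustrative only:

```python
import pandas as pd

# Hypothetical rows for illustration; column names follow the data set.
toy = pd.DataFrame({
    "Personality": ["Introvert"] * 5 + ["Extrovert"] * 3,
    "Drained_after_socializing": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "Yes"],
})

# Keep only introverts, then take relative frequencies of the Yes/No answers.
introverts = toy.query('Personality == "Introvert"')
share = introverts["Drained_after_socializing"].value_counts(normalize=True)

# Percentage of introverts who answered "Yes" (0 if nobody did).
drained_pct = 100 * share.get("Yes", 0.0)
print(f"Introverts drained after socializing: {drained_pct:.1f}%")
```

Running the same three lines on the full `data` frame would give the exact figure behind the rounded value quoted in the conclusion.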