Applications

Tweet classification

We have three files:

  1. The first contains tweets
  2. The second contains positive words
  3. The third contains negative words

We want to classify each tweet by the number of positive and negative words it contains.

from pathlib import Path

data_dir = Path('../../datasets/tweets')

After inspecting the files, we find that each line holds exactly one “item” (a tweet or a word), so we split the data by line.

We also notice that the files contain empty lines, which must be discarded.

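def read_lines(file_path: Path) -> list[str]:
    """Read a file line by line, skipping empty lines and lowercasing the rest."""
    result = []
    # An explicit encoding keeps the result independent of the platform default.
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            x = line.strip()  # drop the trailing newline and surrounding spaces
            if len(x) > 0:
                x = x.lower()
                result.append(x)
    return result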
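
As a quick sanity check, the sketch below writes a small temporary file and reads it back. It repeats read_lines in a compact form so that the block runs on its own; the file name sample.txt and its contents are made up for illustration and are not part of the dataset.

```python
import tempfile
from pathlib import Path


def read_lines(file_path: Path) -> list[str]:
    """Compact equivalent: skip blank lines, strip whitespace, lowercase."""
    with open(file_path, 'r', encoding='utf-8') as file:
        return [line.strip().lower() for line in file if line.strip()]


# Hypothetical sample file: two entries separated by blank lines.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample.txt'
    sample.write_text('Happy\n\n  SAD  \n\n', encoding='utf-8')
    sample_lines = read_lines(sample)

print(sample_lines)  # ['happy', 'sad']
```

Both blank lines are dropped, and the surviving entries come back stripped and lowercased.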

We call the function to read the three files:

tweets = read_lines(data_dir / 'tweets.txt')
positive_words = read_lines(data_dir / 'words_positive.txt')
negative_words = read_lines(data_dir / 'words_negative.txt')
for i, tweet in enumerate(tweets):
    print(i, tweet)
0 grateful for the amazing people in my life who make it so wonderful
1 the beautiful flowers were in full bloom and the sweet scent filled the air. the birds were singing merrily and the sun was shining brightly. it was a perfect day
2 the kind and generous old man was always willing to help others. he was a role model for the entire community and he was loved by everyone
3 the intelligent and talented young woman had a bright future ahead of her. she was passionate about her work and she was determined to make a difference in the world
4 sending out good vibes to everyone today! have a beautiful day!
5 i've been feeling so anxious and stressed lately with everything going on. really need a break from it all.
6 traffic was the worst it's ever been today. the long commute just made me feel grumpy and drained.
7 had an awful morning - overslept and missed my first meeting. feeling uneasy about how my boss will react.
8 all the noise from the construction outside is driving me nuts. it's making working from home dreadful.
9 went out with some friends last night but didn't have as much fun as i thought. felt a bit lonely and left early feeling uneasy.
for p in positive_words:
    print(p)
grateful
amazing
wonderful
beautiful
sweet
perfect
kind
generous
help
loved
intelligent
talented
young
bright
passionate
determined
good
beautiful
friends
fun
happy
for n in negative_words:
    print(n)
anxious
stressed
worst
grumpy
drained
awful
overslept
missed
uneasy
boss
noise
dreadful
lonely
sad

Now we classify the tweets:

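def classify(text: str) -> tuple[int, int]:
    """Return how many positive and negative list words occur in the text.

    Each list entry is tested with `in`, i.e. as a substring of the text.
    """
    # Count the entries of positive_words found in the text.
    positive_count = 0
    for word in positive_words:
        if word in text:
            positive_count += 1

    # Count the entries of negative_words found in the text.
    negative_count = 0
    for word in negative_words:
        if word in text:
            negative_count += 1

    return positive_count, negative_count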

assert classify('i am happy') == (1, 0)
assert classify('i am sad') == (0, 1)
assert classify('i am happy and sad') == (1, 1)
assert classify('i shall walk to the store') == (0, 0)
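
Note that `word in text` is a substring test and that a word list may contain duplicates. In the data above, 'bright' also matches inside 'brightly', and 'beautiful' is listed twice in the positive words, so tweet 1 is counted as 5 positive hits although only three distinct list words (beautiful, sweet, perfect) appear in it as whole words. The following is a minimal sketch of a whole-word variant, not the notebook's method: the name classify_whole_words, its extra parameters, and the sample word lists are hypothetical, and it assumes every list entry is a single word.

```python
import re


def classify_whole_words(text: str, positive: list[str], negative: list[str]) -> tuple[int, int]:
    """Count distinct list words that occur in text as whole words."""
    # Tokenize into lowercase runs of letters and apostrophes.
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    # Converting the lists to sets drops duplicate entries.
    positive_count = len(tokens & set(positive))
    negative_count = len(tokens & set(negative))
    return positive_count, negative_count


# Small hypothetical word lists; 'beautiful' is repeated on purpose.
pos_list = ['beautiful', 'bright', 'beautiful', 'happy']
neg_list = ['sad']

print(classify_whole_words('a beautiful day, shining brightly', pos_list, neg_list))  # (1, 0)
print(classify_whole_words('i am happy and sad', pos_list, neg_list))  # (1, 1)
```

With this variant 'brightly' no longer counts as 'bright', and the repeated 'beautiful' is counted once.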

We call the function on each tweet and sort the results into three lists:

positive_tweets = []
negative_tweets = []
neutral_tweets = []
for tweet in tweets:
    pos, neg = classify(tweet)
    score = pos - neg
    print(f'-{neg} +{pos} = {score:+}')
    if score > 0:
        positive_tweets.append((tweet, score))
    elif score < 0:
        negative_tweets.append((tweet, score))
    else:
        neutral_tweets.append((tweet, score))
-0 +3 = +3
-0 +5 = +5
-0 +4 = +4
-0 +6 = +6
-0 +3 = +3
-2 +0 = -2
-3 +0 = -3
-5 +0 = -5
-2 +0 = -2
-2 +2 = +0
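
The same sign-based bucketing can also be written with a single dictionary keyed by category instead of three separate lists. This is only a sketch under assumptions: label_for, counted, and groups are hypothetical names, and the triples are made-up stand-ins for a tweet with its positive and negative counts.

```python
from collections import defaultdict


def label_for(score: int) -> str:
    """Map a score to a category name by its sign."""
    if score > 0:
        return 'positive'
    if score < 0:
        return 'negative'
    return 'neutral'


# Hypothetical (text, positive_count, negative_count) triples for illustration.
counted = [('tweet a', 3, 0), ('tweet b', 0, 2), ('tweet c', 2, 2)]

groups: dict[str, list[tuple[str, int]]] = defaultdict(list)
for text, pos, neg in counted:
    score = pos - neg
    groups[label_for(score)].append((text, score))

print(dict(groups))
```

A tweet whose positive and negative counts cancel out, like 'tweet c', lands in the neutral group with a score of 0.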

We display the tweets in each category, ordered by classification strength from strongest to weakest:

print('positive_tweets:')
for tweet, score in sorted(positive_tweets, key=lambda x: x[1], reverse=True):
    print(f'{score:+} {tweet}')
positive_tweets:
+6 the intelligent and talented young woman had a bright future ahead of her. she was passionate about her work and she was determined to make a difference in the world
+5 the beautiful flowers were in full bloom and the sweet scent filled the air. the birds were singing merrily and the sun was shining brightly. it was a perfect day
+4 the kind and generous old man was always willing to help others. he was a role model for the entire community and he was loved by everyone
+3 grateful for the amazing people in my life who make it so wonderful
+3 sending out good vibes to everyone today! have a beautiful day!
print('negative_tweets:')
for tweet, score in sorted(negative_tweets, key=lambda x: x[1]):
    print(f'{score:+} {tweet}')
negative_tweets:
-5 had an awful morning - overslept and missed my first meeting. feeling uneasy about how my boss will react.
-3 traffic was the worst it's ever been today. the long commute just made me feel grumpy and drained.
-2 i've been feeling so anxious and stressed lately with everything going on. really need a break from it all.
-2 all the noise from the construction outside is driving me nuts. it's making working from home dreadful.
print('neutral_tweets:')
for tweet, score in neutral_tweets:
    print(f'{score} {tweet}')
neutral_tweets:
0 went out with some friends last night but didn't have as much fun as i thought. felt a bit lonely and left early feeling uneasy.