import pandas as pd
import nltk
from nltk.corpus import stopwords as nltk_stopwords
from sklearn.feature_extraction.text import TfidfVectorizer

# Load the lemmatized tweets and cast the text column to Unicode strings
data = pd.read_csv("/datasets/tweets_lemm.csv")
corpus = data['lemm_text'].values.astype('U')

# Download the NLTK stop word corpus and take the Russian subset
nltk.download('stopwords')
stopwords = set(nltk_stopwords.words('russian'))

# TfidfVectorizer documents stop_words as a list, so convert the set
count_tf_idf = TfidfVectorizer(stop_words=list(stopwords))
tf_idf = count_tf_idf.fit_transform(corpus)

print("Matrix size:", tf_idf.shape)

Result

[nltk_data] Downloading package stopwords to
[nltk_data]     /home/student/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
Matrix size: (5000, 9248)

We obtained a matrix of TF-IDF values. It has as many features as the bag of words did. That makes sense: the words are the same!