python - Take certain words and print the frequency of each phrase/word?
I have a file that has a list of bands, albums, and the year each album was produced. I need to write a function that goes through this file, finds the different band names, and counts how many times each band appears in the file.
The file looks like this:
beatles - revolver (1966)
nirvana - nevermind (1991)
beatles - sgt pepper's lonely hearts club band (1967)
u2 - joshua tree (1987)
beatles - beatles (1968)
beatles - abbey road (1969)
guns n' roses - appetite destruction (1987)
radiohead - ok computer (1997)
led zeppelin - led zeppelin 4 (1971)
u2 - achtung baby (1991)
pink floyd - dark side of moon (1973)
michael jackson -thriller (1982)
rolling stones - exile on main street (1972)
clash - london calling (1979)
u2 - can't leave behind (2000)
weezer - pinkerton (1996)
radiohead - bends (1995)
smashing pumpkins - mellon collie , infinite sadness (1995)
. . .
The output has to be in descending order of frequency, like this:
band1: number1
band2: number2
band3: number3
Here is the code I have so far:
def read_albums(filename) :
    file = open("albums.txt", "r")
    bands = {}
    for line in file :
        words = line.split()
        for word in words:
            if word in '-' :
                del(words[words.index(word):])
        string1 = ""
        for i in words :
            list1 = []
            string1 = string1 + i + " "
        list1.append(string1)
        for k in list1 :
            if (k in bands) :
                bands[k] = bands[k] +1
            else :
                bands[k] = 1
    for word in bands :
        frequency = bands[word]
        print(word + ":", len(bands))
I think there's an easier way to do this, but I'm not sure. Also, I'm not sure how to sort a dictionary by frequency. Do I need to convert it to a list?
You are right, there is an easier way, with Counter:
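from collections import Counter

with open('bandfile.txt') as f:
    # the band name is everything before the first '-'
    counts = Counter(line.split('-')[0].strip() for line in f if line)

# most_common() yields (band, count) pairs in descending order of count
for band, count in counts.most_common():
    print("{0}:{1}".format(band, count))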
What is this doing?

line.split('-')[0].strip() for line in f if line
This line is a compact form of the following loop:
temp_list = []
for line in f:
    if line:  # intended to skip blank lines
        bits = line.split('-')
        temp_list.append(bits[0].strip())
counts = Counter(temp_list)
Unlike the loop above, however, it doesn't create an intermediary list. Instead, it creates a generator expression, a more memory-efficient way to step through things, which is used as the argument to Counter.
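For what it's worth, here is a minimal sketch that wraps the same idea in a function and also covers the "how do I sort a dictionary by frequency" part of the question. The names count_bands, sort_by_frequency and sample are made up for illustration. It assumes the band name never contains a '-' itself, and it tests line.strip() rather than line, because a blank line read from a file still holds "\n" and so counts as true.

```python
from collections import Counter


def count_bands(lines):
    """Count how often each band name (the text before the first '-') appears."""
    return Counter(
        line.split('-')[0].strip()
        for line in lines
        if line.strip()  # a blank line is "\n", which is truthy, so strip it first
    )


def sort_by_frequency(counts):
    """Return (band, count) pairs, most frequent first, from any dict of counts."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample standing in for the lines of the real file.
sample = [
    "beatles - revolver (1966)\n",
    "nirvana - nevermind (1991)\n",
    "\n",
    "u2 - joshua tree (1987)\n",
    "beatles - abbey road (1969)\n",
    "michael jackson -thriller (1982)\n",
]

counts = count_bands(sample)
for band, count in sort_by_frequency(counts):
    print("{0}: {1}".format(band, count))
```

With a real file you would pass the open file object instead of sample, since iterating over it yields lines in the same way. sort_by_frequency works on a plain dict too, so no conversion is needed beyond calling items(); with a Counter, most_common() gives the same descending order.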