Talkin’ ‘Bout Trucks, Beer, and Love in Country Songs ― Analyzing Genius Lyri ...

Trucks, beer, and love: all things that make country music go round. I’ve said before that country music is just pop music with a slide, and then lyrics about slightly different topics than what you’ll hear in hip hop or “normal” pop music on the radio.

In my continuing quest to validate my theory that all country songs fit into one of four different topics, in this post I go through lyrics to see which artists talk about trucks, beer, and love the most. In my first post on this topic, I talked about how to get song lyrics from Genius and print them out on the command line.

The goal here, and what I’m going to walk you through, is how I stored info and lyrics for all the songs by the country artists, how I made sure that all the lyrics were unique, and how I then ran some stats on the songs. Another note before we go is that a lot of data work is just janitorial. The actual code for getting “interesting” results is fairly simple. The key is to enjoy doing the janitor-style coding, and then you’ll be good.

If you’re interested in which country music people talk most about trucks, beer, alcohol, or small towns, skip to the end where I list out some stats. For the rest, here’s some code.

I wonder how they feel about beer trucks. I’m guessing they’d all be fans of them.

Step 1 ― Save the Lyrics!

When doing anything with web scraping, the one thing to always, always keep in mind is that you want to hit the server as little as possible. With that in mind, what we’re going to do here is assume the inputs are names of artists. For each of those artists, we find all of their songs, and then for each of those songs, we grab the lyrics the way I did in the first post and save them locally along with some meta information the API provides.

Now when I post the following code, don’t imagine that I knew what I wanted from the start. Everything in here was created iteratively. Here’s a list of the features of this piece of code that came out of that process.

Directory structure ― Within the folder that contains the main .py file, there’s a folder named artists. Within that folder, when the code runs, a folder with the artist’s name is created (if it doesn’t exist already). And within that folder, there are two more folders, info and lyrics. When the code runs, the lyrics go in /artists/artist_name/lyrics/Song Title.txt. The info from the API (annotations, title, and the song’s API id, so we can grab it again if need be) goes in /artists/artist_name/info/Song Title.txt. The key, again, is saving all the info we’re given to avoid unnecessary requests.
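Here is a minimal sketch of that layout. The helper names (artist_dirs, save_song), the root argument, and the choice to store the info as JSON are my assumptions for illustration; they are not taken from the actual script.

```python
import json
import os


def artist_dirs(root, artist_name):
    """Return (lyrics_dir, info_dir) for an artist, creating them if needed."""
    folder = artist_name.replace(" ", "_").lower()
    lyrics_dir = os.path.join(root, folder, "lyrics")
    info_dir = os.path.join(root, folder, "info")
    for path in (lyrics_dir, info_dir):
        # makedirs also creates the artist folder; exist_ok avoids errors on reruns
        os.makedirs(path, exist_ok=True)
    return lyrics_dir, info_dir


def save_song(root, artist_name, title, lyrics, info):
    """Write the lyrics and the API info to parallel files named after the song."""
    lyrics_dir, info_dir = artist_dirs(root, artist_name)
    # Assumption: swap "/" for "-" so the title is usable as a file name
    file_name = title.replace("/", "-") + ".txt"
    lyrics_path = os.path.join(lyrics_dir, file_name)
    info_path = os.path.join(info_dir, file_name)
    with open(lyrics_path, "w", encoding="utf-8") as f:
        f.write(lyrics)
    with open(info_path, "w", encoding="utf-8") as f:
        json.dump(info, f)
    return lyrics_path, info_path
```

Calling save_song("artists", "Example Artist", "Some Song", lyrics, info) would write artists/example_artist/lyrics/Some Song.txt and the matching file under info.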

Redundancy Checking ― Along with making sure to save all the info given, if we run an artist a second time, we don’t want to fetch lyrics that we already have. So once we have all the songs for that artist, I check whether we already have a file with the name of the song, and that the file isn’t empty. If the file is there and has content, we continue to the next song.
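The check itself is small. Here is a sketch, assuming one .txt file per song title; the function names already_saved and songs_to_fetch are hypothetical and only illustrate the idea.

```python
import os


def already_saved(path):
    """True if the file exists and is not empty."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def songs_to_fetch(titles, lyrics_dir):
    """Keep only the titles whose lyrics file is missing or empty."""
    return [
        title
        for title in titles
        if not already_saved(os.path.join(lyrics_dir, title + ".txt"))
    ]
```

An empty file counts as missing, so a song whose download failed halfway gets retried on the next run.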

Lyric Error Checking ― Ahh, Unicode. While it’s great for allowing multitudes of different characters beyond the standard English alphabet and a few specialty characters, it’s not ideal when I’m trying to deal with simple song lyrics. When saving the lyrics, I encountered more than a few random, unnecessary characters that made Python throw encoding errors. In a semi-janky rule-based solution (which isn’t great to use, see below), when I saw these errors being thrown, I would specifically replace the offending character with the correct “normal” character. I assume there’s some library out there that would take care of all the encoding issues, but this worked for me. Also, on Genius’s end, it would be sweet if they, you know, checked for abnormal characters when lyrics were uploaded so they weren’t there in the first place. It would also be cool if they included the lyrics in the API.

def clean_lyrics(lyrics):
    lyrics = lyrics.replace(u"\u2019", "'")  # right single quotation mark
    lyrics = lyrics.replace(u"\u2018", "'")  # left single quotation mark
    lyrics = lyrics.replace(u"\u02bc", "'")  # modifier letter apostrophe
    lyrics = lyrics.replace(u"\xe9", "e")  # e with an acute accent
    lyrics = lyrics.replace(u"\xe8", "e")  # e with a grave (backwards) accent
    lyrics = lyrics.replace(u"\xe0", "a")  # a with a grave accent
    lyrics = lyrics.replace(u"\u2026", "...")  # ellipsis, apparently
    lyrics = lyrics.replace(u"\u2012", "-")  # figure dash
    lyrics = lyrics.replace(u"\u2013", "-")  # en dash
    lyrics = lyrics.replace(u"\u2014", "-")  # em dash
    lyrics = lyrics.replace(u"\u201c", '"')  # left double quote
    lyrics = lyrics.replace(u"\u201d", '"')  # right double quote
    lyrics = lyrics.replace(u"\u200b", ' ')  # zero width space
    lyrics = lyrics.replace(u"\x92", "'")  # different quote
    lyrics = lyrics.replace(u"\x91", "'")  # still different quote
    lyrics = lyrics.replace(u"\xf1", "n")  # n with tilde!
    lyrics = lyrics.replace(u"\xed", "i")  # i with an acute accent
    lyrics = lyrics.replace(u"\xe1", "a")  # a with an acute accent
    lyrics = lyrics.replace(u"\xea", "e")  # e with circumflex
    lyrics = lyrics.replace(u"\xf3", "o")  # o with an acute accent
    lyrics = lyrics.replace(u"\xb4", "")  # just an accent, so remove
    lyrics = lyrics.replace(u"\xeb", "e")  # e with dots on top
    lyrics = lyrics.replace(u"\xe4", "a")  # a with dots on top
    lyrics = lyrics.replace(u"\xe7", "c")  # c with a cedilla
    return lyrics
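For comparison, here is a sketch of a less rule-heavy approach using Python’s standard unicodedata module. This is not what I ran for this post; the to_ascii name and the PUNCTUATION table are made up for illustration. The idea is to map the punctuation by hand, then decompose accented letters and drop the accent marks.

```python
import unicodedata

# Punctuation with no useful decomposition gets mapped by hand.
PUNCTUATION = {
    "\u2018": "'", "\u2019": "'", "\u02bc": "'",  # single quotes and apostrophes
    "\u201c": '"', "\u201d": '"',                 # double quotes
    "\u2012": "-", "\u2013": "-", "\u2014": "-",  # assorted dashes
    "\u2026": "...",                              # ellipsis
    "\u200b": " ",                                # zero width space
}


def to_ascii(text):
    """Return an ASCII-only approximation of text."""
    for char, replacement in PUNCTUATION.items():
        text = text.replace(char, replacement)
    # NFKD splits an accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII is dropped rather than raising an error
    return stripped.encode("ascii", "ignore").decode("ascii")
```

The trade-off is that characters not covered by the table or by decomposition are dropped silently, so it’s worth spot-checking a few songs before trusting it.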

Check out most of the main function below. If you’re looking for the actual full file, check out this gist. It’s easier to post that on GitHub than to format the entire thing here.

import os

import requests

# base_url, headers, and artist_names are defined in the full file (see the gist)


def song_ids_already_scraped(artist_folder_path, force=False):
    # check for ids already scraped so we don't redo them
    if force:
        return []
    song_ids = []
    for file_name in os.listdir(artist_folder_path):
        # splitext copes with file names that have no dot or more than one
        name, extension = os.path.splitext(file_name)
        if extension != '.txt':
            continue
        # sometimes the file is empty, we don't want to include it if that's the case
        try:
            if os.path.getsize(os.path.join(artist_folder_path, file_name)) != 0:
                song_ids.append(name.split("_")[-1])
        except OSError:
            pass
    return song_ids


def info_from_song_api_path(song_api_path):
    song_url = base_url + song_api_path
    response = requests.get(song_url, headers=headers)
    return response.json()


def songs_from_artist_api_path(artist_api_path):
    api_paths = []
    artist_url = base_url + artist_api_path + "/songs"
    # start on page 1 explicitly so the first page isn't requested twice
    params = {"per_page": 50, "page": 1}
    while True:
        # paging options go in the query string, so pass them as params
        response = requests.get(artist_url, params=params, headers=headers)
        songs = response.json()["response"]["songs"]
        for song in songs:
            api_paths.append(song["api_path"])
        if len(songs) < 50:
            break  # no more songs for artist
        params["page"] += 1
    return list(set(api_paths))


if __name__ == "__main__":
    for artist_name in artist_names:
        # setting up path to artist's directories
        artist_folder_path = "artists/%s" % artist_name.replace(' ', '_').lower()
        artist_lyrics_path = "%s/lyrics" % artist_folder_path
        artist_info_path = "%s/info" % artist_folder_path
        for path in (artist_folder_path, artist_lyrics_path, artist_info_path):
            if not os.path.exists(path):
                os.makedirs(path)
        # only using lyrics since they're saved second
        prev_song_ids = song_ids_already_scraped(artist_lyrics_path)
        # find the artist!
        search_url = base_url + "/search"
        data = {'q': artist_name}
        response = requests.get(search
