What is a Hashtable/Hashmap?

A hashtable is a data structure that with a collection of key-value pairs, where each key maps to a value, and the keys must be unique and hashable.

  • In Python there is a built in hashtable known as a dictionary___.

The primary purpose of a hashtable is to provide efficient lookup, insertion, and deletion operations. When an element is to be inserted into the hashtable, a hash function is used to map the key to a specific index in the underlying array that is used to store the key-value pairs. The value is then stored at that index. When searching for a value, the hash function is used again to find the index where the value is stored.

The key advantage of a hashtable over other data structures like arrays and linked lists is its average-case time complexity for lookup, insertion, and deletion operations.

  • The typical time complexity of a hashtable is _O(1)__.

What is Hashing and Collision?

Hashing is the process of mapping a given key to a value in a hash table or hashmap, using a hash function. The hash function takes the key as input and produces a hash value or hash code, which is then used to determine the index in the underlying array where the value is stored. The purpose of hashing is to provide a quick and efficient way to access data, by eliminating the need to search through an entire data structure to find a value.

However, it is possible for two different keys to map to the same hash value, resulting in a collision. When a collision occurs, there are different ways to resolve it, depending on the collision resolution strategy used.

Python's dictionary implementation is optimized to handle collisions efficiently, and the performance of the dictionary is generally very good, even in the presence of collisions. However, if the number of collisions is very high, the performance of the dictionary can degrade, so it is important to choose a good hash function that minimizes collisions when designing a Python dictionary.

What is a Set?

my_set = set([1, 2, 3, 2, 1])
print(my_set)  

# What do you notice in the output?
# the output prints out 1,2,3, and doesn't print out the other 2 and 1 in the dictionary
# the numbers are printed in a dictionary 

# Why do you think Sets are in the same tech talk as Hashmaps/Hashtables?
# keys must be unique and hashable, no duplicates 
# sets and hashmaps have unique values 
{1, 2, 3}
lover_album = {
    "title": "Lover",
    "artist": "Taylor Swift",
    "year": 2019,
    "genre": ["Pop", "Synth-pop"],
    "tracks": {
        1: "I Forgot That You Existed",
        2: "Cruel Summer",
        3: "Lover",
        4: "The Man",
        5: "The Archer",
        6: "I Think He Knows",
        7: "Miss Americana & The Heartbreak Prince",
        8: "Paper Rings",
        9: "Cornelia Street",
        10: "Death By A Thousand Cuts",
        11: "London Boy",
        12: "Soon You'll Get Better (feat. Dixie Chicks)",
        13: "False God",
        14: "You Need To Calm Down",
        15: "Afterglow",
        16: "Me! (feat. Brendon Urie of Panic! At The Disco)",
        17: "It's Nice To Have A Friend",
        18: "Daylight"
    }
}

# What data structures do you see?
# lists
# dictionaries 

# Printing the dictionary
print(lover_album)
{'title': 'Lover', 'artist': 'Taylor Swift', 'year': 2019, 'genre': ['Pop', 'Synth-pop'], 'tracks': {1: 'I Forgot That You Existed', 2: 'Cruel Summer', 3: 'Lover', 4: 'The Man', 5: 'The Archer', 6: 'I Think He Knows', 7: 'Miss Americana & The Heartbreak Prince', 8: 'Paper Rings', 9: 'Cornelia Street', 10: 'Death By A Thousand Cuts', 11: 'London Boy', 12: "Soon You'll Get Better (feat. Dixie Chicks)", 13: 'False God', 14: 'You Need To Calm Down', 15: 'Afterglow', 16: 'Me! (feat. Brendon Urie of Panic! At The Disco)', 17: "It's Nice To Have A Friend", 18: 'Daylight'}}
print(lover_album.get('tracks'))
# or
print(lover_album['tracks'])
{1: 'I Forgot That You Existed', 2: 'Cruel Summer', 3: 'Lover', 4: 'The Man', 5: 'The Archer', 6: 'I Think He Knows', 7: 'Miss Americana & The Heartbreak Prince', 8: 'Paper Rings', 9: 'Cornelia Street', 10: 'Death By A Thousand Cuts', 11: 'London Boy', 12: "Soon You'll Get Better (feat. Dixie Chicks)", 13: 'False God', 14: 'You Need To Calm Down', 15: 'Afterglow', 16: 'Me! (feat. Brendon Urie of Panic! At The Disco)', 17: "It's Nice To Have A Friend", 18: 'Daylight'}
{1: 'I Forgot That You Existed', 2: 'Cruel Summer', 3: 'Lover', 4: 'The Man', 5: 'The Archer', 6: 'I Think He Knows', 7: 'Miss Americana & The Heartbreak Prince', 8: 'Paper Rings', 9: 'Cornelia Street', 10: 'Death By A Thousand Cuts', 11: 'London Boy', 12: "Soon You'll Get Better (feat. Dixie Chicks)", 13: 'False God', 14: 'You Need To Calm Down', 15: 'Afterglow', 16: 'Me! (feat. Brendon Urie of Panic! At The Disco)', 17: "It's Nice To Have A Friend", 18: 'Daylight'}
print(lover_album.get('tracks')[4]) # two types of calls 
# or
print(lover_album['tracks'][4]) # index, name of key to retrieve the value 
The Man
The Man
lover_album["producer"] = set(['Taylor Swift', 'Jack Antonoff', 'Joel Little', 'Taylor Swift', 'Louis Bell', 'Frank Dukes'])
# Printing the dictionary
print(lover_album)

# What can you change to make sure there are no duplicate producers?
# use set 
#
{'title': 'Lover', 'artist': 'Taylor Swift', 'year': 2019, 'genre': ['Pop', 'Synth-pop'], 'tracks': {1: 'I Forgot That You Existed', 2: 'Cruel Summer', 3: 'Lover', 4: 'The Man', 5: 'The Archer', 6: 'I Think He Knows', 7: 'Miss Americana & The Heartbreak Prince', 8: 'Paper Rings', 9: 'Cornelia Street', 10: 'Death By A Thousand Cuts', 11: 'London Boy', 12: "Soon You'll Get Better (feat. Dixie Chicks)", 13: 'False God', 14: 'You Need To Calm Down', 15: 'Afterglow', 16: 'Me! (feat. Brendon Urie of Panic! At The Disco)', 17: "It's Nice To Have A Friend", 18: 'Daylight'}, 'producer': {'Taylor Swift', 'Louis Bell', 'Joel Little', 'Jack Antonoff', 'Frank Dukes'}}
lover_album["tracks"].update({19: "All Of The Girls You Loved Before"})
# Printing the dictionary

# How would add an additional genre to the dictionary, like electropop? 
# add electropop to "genre": ["Pop", "Synth-pop"],
# or i would append: 
# Access the genre list in the lover_album dictionary
genres = lover_album["genre"]

# Append "Electropop" to the end of the list
genres.append("Electropop")

print(lover_album)
{'title': 'Lover', 'artist': 'Taylor Swift', 'year': 2019, 'genre': ['Pop', 'Synth-pop', 'Electropop'], 'tracks': {1: 'I Forgot That You Existed', 2: 'Cruel Summer', 3: 'Lover', 4: 'The Man', 5: 'The Archer', 6: 'I Think He Knows', 7: 'Miss Americana & The Heartbreak Prince', 8: 'Paper Rings', 9: 'Cornelia Street', 10: 'Death By A Thousand Cuts', 11: 'London Boy', 12: "Soon You'll Get Better (feat. Dixie Chicks)", 13: 'False God', 14: 'You Need To Calm Down', 15: 'Afterglow', 16: 'Me! (feat. Brendon Urie of Panic! At The Disco)', 17: "It's Nice To Have A Friend", 18: 'Daylight', 19: 'All Of The Girls You Loved Before'}, 'producer': {'Taylor Swift', 'Louis Bell', 'Joel Little', 'Jack Antonoff', 'Frank Dukes'}}
for k,v in lover_album.items(): # iterate using a for loop for key and value
    print(str(k) + ": " + str(v))

# Write your own code to print tracks in readable format

tracks = lover_album['tracks']
print("----------------")
print("Tracks in Lover:")
# Iterate over the tracks dictionary and print each track
for track_number, track_title in tracks.items():
    print(f"Track {track_number}: {track_title}")
title: Lover
artist: Taylor Swift
year: 2019
genre: ['Pop', 'Synth-pop']
tracks: {1: 'I Forgot That You Existed', 2: 'Cruel Summer', 3: 'Lover', 4: 'The Man', 5: 'The Archer', 6: 'I Think He Knows', 7: 'Miss Americana & The Heartbreak Prince', 8: 'Paper Rings', 9: 'Cornelia Street', 10: 'Death By A Thousand Cuts', 11: 'London Boy', 12: "Soon You'll Get Better (feat. Dixie Chicks)", 13: 'False God', 14: 'You Need To Calm Down', 15: 'Afterglow', 16: 'Me! (feat. Brendon Urie of Panic! At The Disco)', 17: "It's Nice To Have A Friend", 18: 'Daylight'}
----------------
Tracks in Lover:
Track 1: I Forgot That You Existed
Track 2: Cruel Summer
Track 3: Lover
Track 4: The Man
Track 5: The Archer
Track 6: I Think He Knows
Track 7: Miss Americana & The Heartbreak Prince
Track 8: Paper Rings
Track 9: Cornelia Street
Track 10: Death By A Thousand Cuts
Track 11: London Boy
Track 12: Soon You'll Get Better (feat. Dixie Chicks)
Track 13: False God
Track 14: You Need To Calm Down
Track 15: Afterglow
Track 16: Me! (feat. Brendon Urie of Panic! At The Disco)
Track 17: It's Nice To Have A Friend
Track 18: Daylight
def search():
    search = input("What would you like to know about the album?")
    if lover_album.get(search.lower()) == None:
        print("Invalid Search")
    else:
        print(lover_album.get(search.lower()))

search()

# This is a very basic code segment, how can you improve upon this code?
# you can add a feature where the user can add another song to the dictionary
# you can also add a feature where if the user types a typo, they can try again 
# the print messages are also not very specific or helpful, they could tell the user more clearly of a problem
['Pop', 'Synth-pop']

Hacks

  • Answer ALL questions in the code segments
  • Create a diagram or comparison illustration (Canva).
    • What are the pro and cons of using this data structure?
    • Dictionary vs List
  • Expand upon the code given to you, possible improvements in comments
  • Build your own album showing features of a python dictionary

For Mr. Yeung's class: Justify your favorite Taylor Swift song, answer may effect seed

Using My Own Album

melodramaAlbum = {
    "title": "Melodrama",
    "artist": "Lorde",
    "year": 2017,
    "genre": ["Pop", "Alternative/Indie"],
    "awards": ["2018 Grammy Award for Album of the Year", "2018 NME Award for Best Album", "2017 ARIA Music Award for Best International Artist"],
    "tracks": {
        1: "Green Light",
        2: "Sober",
        3: "Homemade Dynamite",
        4: "The Louvre",
        5: "Liability",
        6: "Hard Feelings/Loveless",
        7: "Sober II (Melodrama)",
        8: "Writer in The Dark",
        9: "Supercut",
        10: "Liability Reprise",
        11: "Perfect Places"
    }
}

print(melodramaAlbum)
{'title': 'Melodrama', 'artist': 'Lorde', 'year': 2017, 'genre': ['Pop', 'Alternative/Indie'], 'awards': ['2018 Grammy Award for Album of the Year', '2018 NME Award for Best Album', '2017 ARIA Music Award for Best International Artist'], 'tracks': {1: 'Green Light', 2: 'Sober', 3: 'Homemade Dynamite', 4: 'The Louvre', 5: 'Liability', 6: 'Hard Feelings/Loveless', 7: 'Sober II (Melodrama)', 8: 'Writer in The Dark', 9: 'Supercut', 10: 'Liability Reprise', 11: 'Perfect Places'}}
print("Genre: ")
print(melodramaAlbum['genre'])

print("Track 8: ")
print(melodramaAlbum['tracks'][8])
Genre: 
['Pop', 'Alternative/Indie']
Track 8
Writer in The Dark

Search Function that allows for typos in album info

  • for example: if the user inputted genree or trackss, they would be able to try again
  • I installed fuzzywuzzy which is a python library for fuzzy string matching. It provides a set of functions and algorithms for comparing and measuring the similarity between two strings.
  • I think this would be a great addition to our final project because we also have a search function and if the user inputs a song that is spelled wrong, we could still locate what song they intended.
from fuzzywuzzy import fuzz, process

def album_search():
    max_attempts = 3
    attempts = 0
    while attempts < max_attempts:
        search_key = input("What would you like to know about the album? ")
        search_key = search_key.lower()

        # Check for exact match
        if search_key in melodramaAlbum:
            print(f"{search_key.capitalize()}: {melodramaAlbum[search_key]}")
            break
        else:
            # Perform fuzzy search
            matches = process.extract(search_key, melodramaAlbum.keys(), limit=5)
            match_names = [match[0] for match in matches]
            print(f"Did you mean one of these {', '.join(match_names)}?")

            # Prompt user to input what they actually mean
            user_input = input("Please enter what you actually mean: ")
            if user_input.lower() in melodramaAlbum:
                print(f"{user_input.capitalize()}: {melodramaAlbum[user_input.lower()]}")
                break
            else:
                attempts += 1
                print("Unfortunately, we don't have that info.")

        if attempts == max_attempts:
            print("You have exceeded the maximum number of attempts.")
            break

album_search()
/home/kaylee/anaconda3/lib/python3.9/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/home/kaylee/vscode/myproject/_notebooks/2023-03-29-DS-hashmaps.ipynb Cell 18 in <cell line: 33>()
     <a href='vscode-notebook-cell://wsl%2Bubuntu/home/kaylee/vscode/myproject/_notebooks/2023-03-29-DS-hashmaps.ipynb#X23sdnNjb2RlLXJlbW90ZQ%3D%3D?line=29'>30</a>             print("You have exceeded the maximum number of attempts.")
     <a href='vscode-notebook-cell://wsl%2Bubuntu/home/kaylee/vscode/myproject/_notebooks/2023-03-29-DS-hashmaps.ipynb#X23sdnNjb2RlLXJlbW90ZQ%3D%3D?line=30'>31</a>             break
---> <a href='vscode-notebook-cell://wsl%2Bubuntu/home/kaylee/vscode/myproject/_notebooks/2023-03-29-DS-hashmaps.ipynb#X23sdnNjb2RlLXJlbW90ZQ%3D%3D?line=32'>33</a> album_search()

/home/kaylee/vscode/myproject/_notebooks/2023-03-29-DS-hashmaps.ipynb Cell 18 in album_search()
      <a href='vscode-notebook-cell://wsl%2Bubuntu/home/kaylee/vscode/myproject/_notebooks/2023-03-29-DS-hashmaps.ipynb#X23sdnNjb2RlLXJlbW90ZQ%3D%3D?line=7'>8</a> search_key = search_key.lower()
     <a href='vscode-notebook-cell://wsl%2Bubuntu/home/kaylee/vscode/myproject/_notebooks/2023-03-29-DS-hashmaps.ipynb#X23sdnNjb2RlLXJlbW90ZQ%3D%3D?line=9'>10</a> # Check for exact match
---> <a href='vscode-notebook-cell://wsl%2Bubuntu/home/kaylee/vscode/myproject/_notebooks/2023-03-29-DS-hashmaps.ipynb#X23sdnNjb2RlLXJlbW90ZQ%3D%3D?line=10'>11</a> if search_key in melodramaAlbum:
     <a href='vscode-notebook-cell://wsl%2Bubuntu/home/kaylee/vscode/myproject/_notebooks/2023-03-29-DS-hashmaps.ipynb#X23sdnNjb2RlLXJlbW90ZQ%3D%3D?line=11'>12</a>     print(f"{search_key.capitalize()}: {melodramaAlbum[search_key]}")
     <a href='vscode-notebook-cell://wsl%2Bubuntu/home/kaylee/vscode/myproject/_notebooks/2023-03-29-DS-hashmaps.ipynb#X23sdnNjb2RlLXJlbW90ZQ%3D%3D?line=12'>13</a>     break

NameError: name 'melodramaAlbum' is not defined