For data scientists, data visualization is an important step for communicating insights. Bar charts, line graphs, and scatter plots are useful, and maps can also help us understand our data better. In this blog, I will share some of my experience and tips on how to plot a map of the world, a country, and a city.
How to generate countries' abbreviations?
Let's choose a timely topic: COVID-19. The data is from Kaggle and contains columns such as country or region name, confirmed cases, and fatalities. How do we plot a world map of confirmed cases and fatalities when all we have is the countries' names? There is a package called pycountry. If we need countries' abbreviations (2-letter or 3-letter country codes) such as USA, CHN, and KOR, we can use the pycountry library in Python.
First we import the pycountry package, then we use the following function, which I wrote to look up each country's 3-letter code:
# generate country code based on country name
import pycountry

def alpha3code(column):
    CODE = []
    for country in column:
        try:
            # .alpha_3 means 3-letter country code
            # .alpha_2 means 2-letter country code
            code = pycountry.countries.get(name=country).alpha_3
            CODE.append(code)
        except (KeyError, AttributeError):
            # no country with exactly this name was found
            CODE.append('None')
    return CODE

# create a column for code
df['CODE'] = alpha3code(df.Country_Region)
df.head()
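An exact-name lookup like the one above returns 'None' whenever the dataset spells a country differently from the ISO name, for example "US" or "Korea, South". One possible way to handle this is a small alias table that is applied before the lookup. The sketch below is standalone so it runs without pycountry: `ISO_NAMES` is a tiny hand-written stand-in for the pycountry lookup, and `ALIASES` and `to_alpha3` are hypothetical names I made up for illustration.

```python
# Tiny stand-in for pycountry's name -> alpha_3 lookup (illustration only).
ISO_NAMES = {
    "United States": "USA",
    "China": "CHN",
    "Korea, Republic of": "KOR",
}

# Hypothetical alias table: spelling in the dataset -> ISO spelling.
ALIASES = {
    "US": "United States",
    "Korea, South": "Korea, Republic of",
}

def to_alpha3(name, default="None"):
    """Return the 3-letter code for a country name, trying aliases first."""
    cleaned = name.strip()
    iso_name = ALIASES.get(cleaned, cleaned)
    return ISO_NAMES.get(iso_name, default)

codes = [to_alpha3(n) for n in ["US", "China", "Korea, South", "Atlantis"]]
print(codes)
```

In the real function, the same idea would mean translating each name through the alias table before passing it to pycountry. Which aliases you need depends on your dataset.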
What is Geopandas?
So what is the CODE for? It is the key for merging with one of the geopandas datasets. GeoPandas is an open-source project that makes working with geospatial data in Python easier. If we also want to label each country with its name and its number of confirmed cases and fatalities, we need another dataset, 'location', which contains each country's latitude and longitude.
import geopandas
import pandas as pd

# first let us merge geopandas data with our data
# 'naturalearth_lowres' is a geopandas dataset, so we can use it directly
# (geopandas.datasets was removed in GeoPandas 1.0, so this call needs an older version)
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

# rename the columns so that we can merge with our data
world.columns = ['pop_est', 'continent', 'name', 'CODE', 'gdp_md_est', 'geometry']

# then merge with our data
merge = pd.merge(world, df, on='CODE')

# last, merge again with our location data, which contains each country's latitude and longitude
location = pd.read_csv('https://raw.githubusercontent.com/melanieshi0120/COVID-19_global_time_series_panel_data/master/data/countries_latitude_longitude.csv')
merge = merge.merge(location, on='name').sort_values(by='Fatalities', ascending=False).reset_index()
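Because the merge keeps only rows whose CODE appears in both tables, any country whose code was not found simply disappears from the map. A quick check of which codes did not match can be sketched with plain sets. The function name `unmatched_codes` and the example values are made up for illustration and are not taken from the real datasets.

```python
def unmatched_codes(our_codes, world_codes, placeholder="None"):
    """Return codes present in our data but absent from the map data."""
    ours = {c for c in our_codes if c != placeholder}
    return sorted(ours - set(world_codes))

# Made-up example values, for illustration only.
our_codes = ["USA", "CHN", "SGP", "None", "KOR"]
world_codes = ["USA", "CHN", "KOR", "FRA"]
missing = unmatched_codes(our_codes, world_codes)
print(missing)
```

With the dataframes from this post, the call would look like `unmatched_codes(df.CODE, world.CODE)`, assuming both columns exist under those names.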
World Map
The "merge" data now gives us each country's name, confirmed cases, fatalities, latitude, and longitude. Now it's time to plot a world map with Python!
import matplotlib.pyplot as plt

# plot confirmed cases world map
# (scheme="quantiles" requires the mapclassify package)
merge.plot(column='Confirmed_Cases', scheme="quantiles",
           figsize=(25, 20),
           legend=True, cmap='coolwarm')
plt.title('2020 Jan-May Confirmed Case Amount in Different Countries', fontsize=25)

# add names and numbers for the 10 countries with the most fatalities
for i in range(0, 10):
    plt.text(float(merge.longitude[i]), float(merge.latitude[i]),
             "{}\n{}".format(merge.name[i], merge.Confirmed_Cases[i]), size=10)
plt.show()
You can adjust the size of the map, the text (countries' names and numbers), and the colors. Each country is colored according to the quantile that its total number of confirmed cases falls into.
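To make the idea of quantile coloring more concrete, here is a minimal sketch of how values can be split into classes that each hold roughly the same number of items. It uses only the standard library and is not the code geopandas runs internally. The exact break points chosen by the plotting library may differ slightly, and `k=5` is my assumption for the number of classes.

```python
from bisect import bisect_left
from statistics import quantiles

def quantile_classes(values, k=5):
    """Assign each value to one of k classes holding roughly equal counts."""
    cuts = quantiles(values, n=k)  # k - 1 cut points
    # the class index is the number of cut points strictly below the value
    return [bisect_left(cuts, v) for v in values]

cases = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
classes = quantile_classes(cases)
print(classes)
```

Each class index then maps to one color of the colormap, so countries with very different counts can still share a color if they fall into the same quantile.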
USA Map by State
In this section, I will introduce the Cartographic Boundary Files, which come as shapefiles. The cartographic boundary files are simplified representations of selected geographic areas from the Census Bureau's MAF/TIGER geographic database. These boundary files are specifically designed for small-scale thematic mapping. You can download the shapefiles from the United States Census Bureau.
So how does a shapefile work, and what does it look like? A 'shapefile' is normally a folder that contains several files, and we need all of them to plot a map because those files work together. Let us take a look at USA COVID-19 data by state.
Next we need the shapefile's path, and we use GeoPandas to load the data from the shapefile as geo_usa. Then we can merge it with the usa_state data, which contains the COVID-19 information.
# path to our shapefile
path = "copy and paste the folder's path"

# load the shapefile using geopandas
geo_usa = geopandas.read_file(path + 'cb_2018_us_state_20m')

# merge usa_state data and geo_usa shapefile
geo_merge = geo_usa.merge(usa_state, on='NAME')

# plot USA map
geo_merge.plot(column='Confirmed_Cases', scheme="quantiles", figsize=(25, 15), legend=True, cmap='coolwarm')
plt.xlim(-130, -60)
plt.ylim(20, 55)

# add state names and numbers
for i in range(len(geo_merge)):
    plt.text(geo_merge.longitude[i], geo_merge.latitude[i],
             "{}\n{}".format(geo_merge.NAME[i], geo_merge.Confirmed_Cases[i]), size=10)
plt.title('COVID-19 Confirmed Cases by States', fontsize=25)
plt.show()
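Since a shapefile is really several files that work together, loading can fail if one of them went missing when the folder was copied. The sketch below checks for the three mandatory parts (.shp, .shx, and .dbf). The helper name `missing_parts` is hypothetical, and I am assuming that all parts share the same file stem inside one folder. The demo uses a temporary folder with empty files instead of a real download.

```python
import tempfile
from pathlib import Path

REQUIRED_PARTS = (".shp", ".shx", ".dbf")  # mandatory shapefile components

def missing_parts(folder, stem):
    """Return the required extensions that are absent for a shapefile stem."""
    folder = Path(folder)
    return [ext for ext in REQUIRED_PARTS
            if not (folder / f"{stem}{ext}").exists()]

# Demo: create a folder where the .shx file is deliberately left out.
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".dbf"):
        (Path(tmp) / f"demo{ext}").touch()
    result = missing_parts(tmp, "demo")
print(result)
```

For the state map, a call such as `missing_parts(folder, 'cb_2018_us_state_20m')` would return an empty list when the folder is complete, under the same assumption about file names.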
USA Map by County
The steps for plotting a USA map by county are the same. First, we need to download the USA county shapefile from the United States Census Bureau and get our county data ready!
# load the shapefile
path = "copy paste your county shape file path"
geo_county = geopandas.read_file(path + 'cb_2018_us_county_20m')

# rename columns
geo_county.columns = ['STATEFP', 'COUNTYFP', 'COUNTYNS', 'AFFGEOID', 'GEOID', 'county', 'LSAD',
                      'ALAND', 'AWATER', 'geometry']

# merge cb_2018_us_county_20m file with usa data
geo_county = geo_county.merge(usa_county_df, on='county').dropna(axis=0).sort_values(by='Confirmed_Cases', ascending=False).reset_index()
geo_county.head()
There are many columns, but all we need are the county name, Confirmed_Cases, and Fatalities. Let us plot the USA map again, but this time it looks a little different!
geo_county.plot(column='Confirmed_Cases',scheme="quantiles",figsize=(25,10),cmap='coolwarm')
plt.xlim(-130,-60)
plt.ylim(20,55)
plt.title('COVID-19 Confirmed Cases by County',fontsize=20)
plt.show()
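One caveat when merging on the county name alone: several states have a county with the same name, so rows from different states can be matched to each other. A safer key is the 5-digit FIPS code, which is the state code followed by the county code and corresponds to the GEOID column in the county shapefile. The sketch below only shows how to build that key. It assumes your COVID-19 table also carries state and county FIPS codes, which may not be the case for every dataset, and `fips_key` is a hypothetical helper name.

```python
def fips_key(state_fips, county_fips):
    """Build the 5-digit county identifier from state and county FIPS codes."""
    return f"{int(state_fips):02d}{int(county_fips):03d}"

# Two different counties that share the name "Washington".
rows = [
    {"county": "Washington", "STATEFP": "41", "COUNTYFP": "067"},  # Oregon
    {"county": "Washington", "STATEFP": "42", "COUNTYFP": "125"},  # Pennsylvania
]
keys = [fips_key(r["STATEFP"], r["COUNTYFP"]) for r in rows]
print(keys)
```

The names are identical but the keys are not, so a merge on the key keeps the two counties apart.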
Map of New York City
The data is from the New York City Health Department's official GitHub repository, and it covers the five New York City boroughs: The Bronx, Queens, Brooklyn, Manhattan, and Staten Island.
# data for NYC boroughs
url = 'https://raw.githubusercontent.com/nychealth/coronavirus-data/master/boro/boroughs-case-hosp-death.csv'
nyc = pd.read_csv(url)

# convert the date column to datetime
nyc['Date'] = pd.to_datetime(nyc.DATE_OF_INTEREST)

# create features list for each borough's confirmed cases and fatalities
boroughs = [['BK_CASE_COUNT', 'BK_DEATH_COUNT'],
            ['BX_CASE_COUNT', 'BX_DEATH_COUNT'],
            ['MN_CASE_COUNT', 'MN_DEATH_COUNT'],
            ['QN_CASE_COUNT', 'QN_DEATH_COUNT'],
            ['SI_CASE_COUNT', 'SI_DEATH_COUNT']]

# obtain the sum of confirmed cases and fatalities in different boroughs
nyc_con = []
nyc_fa = []
for i in boroughs:
    nyc_con.append(sum(nyc[i[0]]))
    nyc_fa.append(sum(nyc[i[1]]))
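The two lists above only line up with the borough names if every list is kept in the same order. An alternative that avoids this is to key the totals by borough name. The sketch below uses only the standard library and two made-up rows with a subset of the columns, so the numbers are for illustration and are not real data. `BOROUGHS` and `borough_totals` are hypothetical names.

```python
import csv
import io

# Column prefix -> borough name as spelled in the map data.
BOROUGHS = {
    "BK": "Brooklyn",
    "BX": "Bronx",
    "MN": "Manhattan",
    "QN": "Queens",
    "SI": "Staten Island",
}

# Two made-up rows with a subset of columns, for illustration only.
SAMPLE = """DATE_OF_INTEREST,BK_CASE_COUNT,BK_DEATH_COUNT,SI_CASE_COUNT,SI_DEATH_COUNT
03/01/2020,3,0,1,0
03/02/2020,5,1,2,0
"""

def borough_totals(csv_text):
    """Sum case and death counts per borough, keyed by borough name."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for prefix, name in BOROUGHS.items():
            case_col = f"{prefix}_CASE_COUNT"
            death_col = f"{prefix}_DEATH_COUNT"
            if case_col not in row:
                continue  # this borough is not in the sample
            cases, deaths = totals.get(name, (0, 0))
            totals[name] = (cases + int(row[case_col]),
                            deaths + int(row[death_col]))
    return totals

totals = borough_totals(SAMPLE)
print(totals)
```

Because each total is stored under its borough name, it can later be matched to the map by name rather than by list position.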
Good news! There is NYC data in the geopandas datasets, so we can use it directly and merge it with a new dataframe called 'con_fa_nyc'. 'con_fa_nyc' has five columns: borough name, number of confirmed cases, number of fatalities, and each borough's latitude and longitude. Again, the latitude and longitude are only needed if you want to add the boroughs' names and numbers to the map. Note that the values used here are positions in the shapefile's own projected coordinates, not degrees.
# load the data from geopandas.datasets
nyc_shp = geopandas.read_file(geopandas.datasets.get_path('nybb'))

# create a dataframe
con_fa_nyc = pd.DataFrame()
con_fa_nyc['BoroName'] = ['Brooklyn', 'Bronx', 'Manhattan', 'Queens', 'Staten Island']
con_fa_nyc['Confirmed_Cases'] = nyc_con
con_fa_nyc['Fatalities'] = nyc_fa
con_fa_nyc['longitude'] = [985000, 1000000, 970000, 1040000, 925000]
con_fa_nyc['latitude'] = [180000, 250000, 220000, 200000, 150000]

# merge con_fa_nyc and nyc_shp
nyc_shp = nyc_shp.merge(con_fa_nyc, on='BoroName')
nyc_shp.head()
If you want a fancier map, you can add a background with the contextily package. It draws a basemap behind your data, in a style similar to Google Maps.
# import background package
import contextily as ctx

# convert the data to Web Mercator; keep nyc_shp itself unchanged,
# because the label coordinates stored in it are in the original projection
nyc_web = nyc_shp.to_crs(epsg=3857)
ax = nyc_web.plot(column='Confirmed_Cases', figsize=(10, 10), alpha=0.5, edgecolor='k', cmap='Reds', legend=True, scheme="quantiles")

# add background tiles to plot
ctx.add_basemap(ax)
If you do not need the background, you can use the following code:
# plot new york city
ax = nyc_shp.plot(column='Confirmed_Cases', figsize=(10, 10), alpha=0.5, edgecolor='k', cmap='Reds', legend=True, scheme="quantiles")

# add boroughs' names with numbers of confirmed cases and fatalities
for i in range(len(nyc_shp)):
    plt.text(nyc_shp.longitude[i], nyc_shp.latitude[i],
             "{}\nConfirmed cases: {}\nFatalities: {}".format(nyc_shp.BoroName[i], nyc_shp.Confirmed_Cases[i], nyc_shp.Fatalities[i]), size=13)
plt.title('COVID-19 Confirmed Cases by boroughs', fontsize=25)
leg = ax.get_legend()
leg.set_bbox_to_anchor((1.3, 1))
plt.show()
For more code detail please visit my GitHub!