
Working with Communities

The Community object is geosnap's central data structure. A Community is a dataset that stores information about a collection of neighborhoods over several time periods, including each neighborhood's physical, socioeconomic, and demographic attributes and its demarcated boundaries. Under the hood, each Community is simply a long-form geopandas GeoDataFrame with some associated metadata.

If you're working with built-in data, you instantiate a Community by choosing the constructor for your dataset and passing a selection filter that defines the study area. The selection filter can be either a GeoDataFrame boundary or a set of FIPS codes. Boundary queries are often more convenient, but they are more expensive to compute and will take longer to construct.

When constructing Communities from FIPS codes, the constructor accepts arguments for a state, county, MSA, or a list of arbitrary FIPS codes. If more than one of these arguments is passed, geosnap will use their union. This means each level of the hierarchy is available for convenience, but you are free to mix and match MSAs, states, counties, and even single tracts to create the study region of your choice, as in the sketch below.
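For example, here is a minimal sketch that unions Washington DC with a neighboring county. It assumes the constructor accepts state_fips and county_fips keywords, matching the FIPS arguments used later in this notebook; the particular codes are just illustrative.

from geosnap import Community

# union of an entire state (DC, '11') with a single additional county (Montgomery County, MD, '24031')
region = Community.from_census(state_fips='11', county_fips='24031')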

If you're working with your own data, you instantiate a Community by passing a list of GeoDataFrames (or a single long-form GeoDataFrame), as shown in the final example below.

from geosnap import Community

Create a Community from built-in census data

The quickest and easiest method for getting started is to instantiate a Community using the built-in census data. To do so, you use the Community.from_census constructor:

# this will create a new community using data from Washington DC (which is fips code 11)
dc = Community.from_census(state_fips='11')

Note that when using Community.from_census, the resulting Community has unharmonized tract boundaries, meaning that the tracts are different for each decade.
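If you need consistent boundaries over time, a Community can be interpolated onto a single year's tract geometries with its harmonize method. The sketch below assumes the method and keyword names (target_year, extensive_variables) and reuses a count variable from the census data; check the geosnap API docs for the exact signature.

# interpolate each decade's counts onto 2010 tract boundaries (names assumed; see the API docs)
dc_harmonized = dc.harmonize(target_year=2010, extensive_variables=['n_total_housing_units'])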

To access the underlying data from a Community, simply use its gdf attribute, which returns a GeoDataFrame.

dc.gdf.head()
geoid        n_mexican_pop  n_cuban_pop  n_puerto_rican_pop  n_total_housing_units  n_vacant_housing_units  n_occupied_housing_units  n_owner_occupied_housing_units  n_renter_occupied_housing_units  n_white_persons  ...
11001001600  4409.0         2.0          7.0                 1667.0                 43.0                    1624.0                    1458.0                          166.0                            114423.0         ...
11001001500  5430.0         11.0         15.0                2309.0                 78.0                    2231.0                    1898.0                          333.0                            4694102.0        ...
11001001701  2868.0         5.0          22.0                1287.0                 50.0                    1237.0                    619.0                           618.0                            31727.0          ...
11001001801  694.0          5.0          28.0                10.0                   1.0                     9.0                       0.0                             9.0                              35813.0          ...
11001001702  2516.0         6.0          12.0                1086.0                 43.0                    1043.0                    712.0                           331.0                            48917.0          ...

(the remaining columns, mostly p_* percentage variables such as p_poverty_rate_children and p_black_persons, are NaN for these rows)

5 rows × 195 columns
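Because the gdf is in long form, pulling out a single time period is just a pandas filter on the year column; the column names below simply reuse variables shown in the output above.

# select a single decade and a few columns from the long-form GeoDataFrame
dc_2010 = dc.gdf[dc.gdf.year == 2010]
dc_2010[['geoid', 'year', 'n_total_housing_units', 'n_vacant_housing_units']].head()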

# create a little helper function for plotting a time-series

import matplotlib.pyplot as plt

def plot(community, column):
    # map the chosen column for each decade, classified into quantiles
    fig, axs = plt.subplots(1, 3, figsize=(15, 5))
    for ax, year in zip(axs.flatten(), [1990, 2000, 2010]):
        community.gdf[community.gdf.year == year].dropna(subset=[column]).plot(
            column=column, scheme='quantiles', cmap='Blues', k=6, ax=ax)
        ax.axis('off')
        ax.set_title(str(year))

plot(dc, 'p_nonhisp_white_persons')

Create a Community from a longitudinal database

To instantiate a Community from a longitudinal database, you must first register the database with geosnap using either store_ltdb or store_ncdb. Once the data are available in datasets, you can call Community.from_ltdb and Community.from_ncdb.
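A minimal registration sketch follows; the import path and keyword names are assumptions based on geosnap's I/O helpers, and the file names are placeholders for your local copies of the raw LTDB and NCDB downloads, so adjust them as needed.

from geosnap.io import store_ltdb, store_ncdb

# one-time registration of the raw Longitudinal Tract Database downloads (placeholder paths)
store_ltdb(sample='LTDB_Std_All_Sample.zip', fullcount='LTDB_Std_All_fullcount.zip')

# one-time registration of the Neighborhood Change Database export (placeholder path)
store_ncdb('ncdb.csv')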

LTDB using FIPS codes

I don't know the Riverside MSA FIPS code by heart, so I'll filter the msas dataframe in the data store to find it:

from geosnap import datasets
datasets.msas()[datasets.msas().name.str.startswith('Riverside')]
     geoid  name                                   type        geometry
725  40140  Riverside-San Bernardino-Ontario, CA  Metro Area  POLYGON ((-117.673749 33.870831, -117.673941 3...
riverside = Community.from_ltdb(msa_fips='40140')
plot(riverside, 'p_poverty_rate')

Instead of passing a FIPS code, I could use the boundary argument to pass the Riverside MSA as a GeoDataFrame. This is more computationally expensive because it requires geometric operations, but it is more flexible because it allows you to create Communities that don't nest into FIPS hierarchies (like zip codes, census designated places, or non-US data).

NCDB using a boundary

# grab the boundary for Sacramento from libpysal's built-in examples

import geopandas as gpd
import libpysal
sac = gpd.read_file(libpysal.examples.get_path('sacramentot2.shp'))
sacramento = Community.from_ncdb(boundary=sac)
plot(sacramento, 'median_household_income')

Create a Community from a list of geodataframes

If you are working outside the US, or if you have data that aren't included in geosnap (like census blocks or zip codes), you can still create a Community using the Community.from_geodataframes constructor, which allows you to pass a list of GeoDataFrames that will be concatenated into the single long-form gdf structure that geosnap's analytics expect.

This constructor is typically used when a researcher has several shapefiles for a study area, each pertaining to a different time period. In that case, read each shapefile into a GeoDataFrame and make sure each one has a temporal column that differentiates the time periods in the long-form structure (e.g. if each shapefile covers a different decade, the 1990 shapefile should have a column called "year" in which every observation has a value of 1990). Then simply pass these GeoDataFrames as a list to the from_geodataframes constructor, as in the sketch below.
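A minimal sketch of that workflow, with hypothetical shapefile names standing in for your own files:

import geopandas as gpd
from geosnap import Community

# hypothetical shapefiles describing the same study area in two different decades
tracts90 = gpd.read_file('tracts_1990.shp')
tracts90['year'] = 1990
tracts00 = gpd.read_file('tracts_2000.shp')
tracts00['year'] = 2000

# concatenate both periods into a single long-form Community
study_area = Community.from_geodataframes([tracts90, tracts00])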

Here, I'll use cenpy to grab population data from two different ACS vintages and combine them into a single Community:

from cenpy.products import ACS
chi13 = ACS(2013).from_place('chicago', variables='B00001_001E')
chi13['year'] = 2013
chi17 = ACS(2017).from_place('chicago', variables='B00001_001E')
chi17['year'] = 2017
chicago = Community.from_geodataframes([chi13, chi17])
fig, axs = plt.subplots(1,2, figsize=(12,5))

chicago.gdf[chicago.gdf.year==2013].dropna(subset=['B00001_001E']).plot(column='B00001_001E', cmap='Greens', scheme='quantiles', k=6, ax=axs[0])
axs[0].axis('off')
axs[0].set_title('2013')
chicago.gdf[chicago.gdf.year==2017].dropna(subset=['B00001_001E']).plot(column='B00001_001E', cmap='Greens', scheme='quantiles', k=6, ax=axs[1])
axs[1].axis('off')
axs[1].set_title('2017')