drop down to sqlite3 for gpkg read to avoid gdal dependencies #604

jameshalgren · 2023-02-22T14:22:14Z

t-route/src/troute-network/troute/HYFeaturesNetwork.py

Lines 16 to 20 in 01c713a

    
           def read_geopkg(file_path): 
        
               flowpaths = gpd.read_file(file_path, layer="flowpaths") 
        
               attributes = gpd.read_file(file_path, layer="flowpath_attributes").drop('geometry', axis=1) 
        
               #merge all relevant data into a single dataframe 
        
               flowpaths = pd.merge(flowpaths, attributes, on='id')

In most cases (in every case?) for t-route, we are discarding the geometry data and only use the geo-enabled packages for convenience in reading the files where the network connectivity information is stored.

To avoid unnecessary (see note) dependencies, we could update this function to look something like the following:

## TESTED; See note below
import sqlite3
import pandas as pd

def read_geopkg(file_path):
    flowpaths = read_geopkg_layer_sqlite3(file_path, layer="flowpaths")
    attributes = read_geopkg_layer_sqlite3(file_path, layer="flowpath_attributes")
    #merge all relevant data into a single dataframe
    return pd.merge(flowpaths, attributes, on='id')

def read_geopkg_layer_sqlite3(file_path, layer):
    con = sqlite3.connect(file_path)
    return pd.read_sql_query("SELECT * from " + layer, con)

Note: More accurately, "troublesome" dependencies. Gdal has been causing some trouble on the arm64 builds and if we can circumvent those issues, we gain (potentially) quite a bit of performance.
Note2: The geopandas read is (I believe) lazy, so this direct sql read with the connection closing will carry a performance hit, potentially, especially for very large flowpath tables.

jameshalgren · 2023-02-22T14:49:25Z

Tested this by using the old function and the new function to read a geopackage for the 03w VPU:

wget -P 03w -c https://nextgen-hydrofabric.s3.amazonaws.com/v1.2/nextgen_03W.gpkg

Then compare the outputs:

d = read_geopkg("03w/nextgen_03W.gpkg")
d2 = read_geopkg_sqlite3("03w/nextgen_03W.gpkg")
d.drop(["geometry"],axis=1) == d2.drop(["geom", "fid_x", "fid_y"],axis=1)
## Returns all True

The only difference is the geometry.

Useful comments on datacarpentry.org here.

jameshalgren · 2023-02-22T15:19:32Z

ping @ZacharyWills

groutr · 2023-03-30T16:46:46Z

Your sql statement is a special case that has its own convenience function, pd.read_sql_table.

pd.read_sql_table('flowpaths', con, index_col='id')

jameshalgren · 2024-08-22T19:19:59Z

Your sql statement is a special case that has its own convenience function, pd.read_sql_table.
pd.read_sql_table('flowpaths', con, index_col='id')

Python 3.10+

JoshCu mentioned this issue Aug 22, 2024

remove geopandas and fiona + bugfixes #833

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

drop down to sqlite3 for gpkg read to avoid gdal dependencies #604

drop down to sqlite3 for gpkg read to avoid gdal dependencies #604

jameshalgren commented Feb 22, 2023 •

edited

Loading

jameshalgren commented Feb 22, 2023 •

edited

Loading

jameshalgren commented Feb 22, 2023

groutr commented Mar 30, 2023

jameshalgren commented Aug 22, 2024

drop down to sqlite3 for gpkg read to avoid gdal dependencies #604

drop down to sqlite3 for gpkg read to avoid gdal dependencies #604

Comments

jameshalgren commented Feb 22, 2023 • edited Loading

jameshalgren commented Feb 22, 2023 • edited Loading

jameshalgren commented Feb 22, 2023

groutr commented Mar 30, 2023

jameshalgren commented Aug 22, 2024

jameshalgren commented Feb 22, 2023 •

edited

Loading

jameshalgren commented Feb 22, 2023 •

edited

Loading