Define standard necessary column names for input data. #24

anujsinha3 · 2024-06-20T23:11:55Z

Currently, each column in the CSV file is accessed by an integer index. This has the following limitations:

The workflow fails easily when columns are out of order, because the input CSV columns are STRICTLY positional with no flexibility.
The source code becomes convoluted and difficult to comprehend.
Code vectorization is difficult, and the increased use of nested loops hurts performance.

We plan to use pandas data frames going forward, for which we need to standardize the column names that will be part of the input CSV file.

Existing column names (please confirm whether these are the standard names, or whether any change is required):
"unix_start_t",
"user_ID",
"orig_lat",
"orig_long",
"orig_unc",
"stay_lat",
"stay_long",
"stay_unc",
"stay_dur",
"stay_ind",
"human_start_t"
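
To illustrate the motivation above, here is a minimal sketch (not the project's actual code) comparing positional and label-based access in pandas. The two CSV snippets, their values, and the uncertainty threshold of 50 are made up for illustration; only the column names come from the list above.

```python
import io

import pandas as pd

# Two hypothetical CSV snippets holding the same record with different column order.
csv_a = (
    "unix_start_t,user_ID,orig_lat,orig_long,orig_unc\n"
    "1718925115,u1,47.65,-122.30,20\n"
)
csv_b = (
    "user_ID,orig_long,orig_unc,orig_lat,unix_start_t\n"
    "u1,-122.30,20,47.65,1718925115\n"
)

df_a = pd.read_csv(io.StringIO(csv_a))
df_b = pd.read_csv(io.StringIO(csv_b))

# Label-based access returns the same values regardless of column position.
lat_a = df_a["orig_lat"].tolist()
lat_b = df_b["orig_lat"].tolist()

# Positional access (the current approach) picks a different column in each file.
third_a = df_a.iloc[:, 2].tolist()  # orig_lat in csv_a
third_b = df_b.iloc[:, 2].tolist()  # orig_unc in csv_b

# Named columns also allow vectorized filtering instead of nested loops.
# The threshold of 50 is an arbitrary illustrative value.
accurate = df_a[df_a["orig_unc"] <= 50]
```

With named columns, the same script works on both files without any reordering step.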

gracejia513 · 2024-06-27T19:07:30Z

Hi Anuj, I believe these columns are not part of the input file:
"stay_lat",
"stay_long",
"stay_unc",
"stay_dur",
"stay_ind",
"human_start_t"

However, we can use them as standardized column names for the output file.

Anurag19101996 · 2024-06-27T19:56:08Z

Hi @anujsinha3, the input column names are as follows:

User_ID
Orig_lat
Orig_long
Datetime
Orig_Unc (Not mandatory)
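
A required/optional split like the one above could be checked when the file is loaded. The following is only a sketch under assumptions: `check_input_columns` is a hypothetical helper, the names are compared in lower case, and `datetime` is used as the timestamp column name even though that name is still being discussed in this thread.

```python
import pandas as pd

# Assumed from the list above; lower-cased purely for comparison.
# "orig_unc" is optional, so it is deliberately left out of this set.
REQUIRED_COLUMNS = {"user_id", "orig_lat", "orig_long", "datetime"}


def check_input_columns(df: pd.DataFrame) -> list[str]:
    """Return the sorted required column names missing from df (hypothetical helper)."""
    present = {str(col).strip().lower() for col in df.columns}
    return sorted(REQUIRED_COLUMNS - present)


# Example frame that lacks the timestamp column.
incomplete = pd.DataFrame(
    {"User_ID": ["u1"], "Orig_lat": [47.65], "Orig_long": [-122.30]}
)
missing = check_input_columns(incomplete)

# Example frame with every required column and no optional one.
complete = incomplete.assign(Datetime=["2024-06-20 23:11:55"])
missing_after = check_input_columns(complete)
```

Reporting all missing names at once, rather than failing on the first, tends to make input errors quicker to fix.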

For output columns, please take a look at the output column names below:
https://uwnetid.sharepoint.com/sites/og_ssec_escience/_layouts/15/Doc.aspx?sourcedoc={6b0ea251-f0a8-4ce3-8ea9-d1d796dcf28f}&action=edit&wd=target%28Meeting%20Notes.one%7Cb0d0aa65-2fb3-4a83-9b78-ca0f174e22b5%2F2024.06.20%20UW%20internal%20meeting%7C4f2f25d4-7499-4f2c-aa8a-252b79a0cfdf%2F%29&wdorigin=NavigationUrl

Anurag19101996 · 2024-06-27T19:56:24Z

@gracejia513 Please confirm.

gracejia513 · 2024-07-12T22:00:43Z

@anujsinha3 do you have a standardized column name for datetime? Is it UNIX_START_T?

anujsinha3 · 2024-07-16T03:26:14Z

Column names have been standardized to snake_case. The order of columns does NOT matter in the CSV file.

The column names are case-insensitive but do require '_' where shown.

A few examples:

'orig_lat', 'orig_long', 'unix_start_t', 'user_id'
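
One way such case-insensitive, order-independent matching could be done is to normalize the headers right after reading the file. This is a sketch only, not necessarily how the merged change does it; `normalize_columns` is a hypothetical helper and the CSV content is made up.

```python
import io

import pandas as pd


def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lower-cased, whitespace-stripped column names (hypothetical helper)."""
    return df.rename(columns=lambda name: str(name).strip().lower())


# Mixed-case headers in an arbitrary order; underscores are still required.
raw = "User_ID,UNIX_START_T,Orig_Long,ORIG_LAT\nu1,1718925115,-122.30,47.65\n"
df = normalize_columns(pd.read_csv(io.StringIO(raw)))

cols = sorted(df.columns)
first_lat = df["orig_lat"].iloc[0]
```

A header such as `origlat` would not be matched by this normalization, which is consistent with the '_' requirement stated above.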

anujsinha3 assigned anujsinha3 and gracejia513 and unassigned anujsinha3 Jun 25, 2024

anujsinha3 mentioned this issue Jun 26, 2024 in "perf: enhance incremental clustering to use pandas data frames #27" (merged, 5 tasks)
