User Story:
Different Tattle contributors periodically upload their chat backups to a designated folder on a Google Drive owned by Tattle. Tattle admins should then be able to run a script that downloads the contents of this drive and transforms them into a desired structure (described under Objective below).
Background:
A WhatsApp group chat's content can be backed up to your Google Drive. This backup is stored in a folder that has the same name as the WhatsApp group (enforced by a Tattle team member). This folder contains:
- a .txt file containing a timestamped stream of WhatsApp messages, AND/OR
- image and video files that were part of this group chat
Objective:
Obtain data for every WhatsApp group in a structured form (JSON preferred) so that it can be stored in MongoDB. This structured file should contain:
- the timestamp of the message
- the content of the message
  - If the message is a text message, this should be a string containing that text
  - If the message is an image or video, it should contain the path to the file on your local machine
- an anonymized sender id (to obfuscate the sender's phone number)
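To make the target structure concrete, here is a minimal sketch of turning one line of the exported .txt file into such a record. It assumes the export uses the line shape `DD/MM/YYYY, HH:MM - sender: text` and marks media with a trailing `(file attached)`; WhatsApp's export format varies by platform and locale, so both patterns will likely need adjusting. The names `parseLine`, `mediaDir` and `anonymize` are hypothetical and not taken from the existing code.

```javascript
const path = require("path");

// Assumed line shape: "31/12/2019, 21:15 - +91 98765 43210: Hello"
const LINE_PATTERN = /^(\d{2})\/(\d{2})\/(\d{4}), (\d{2}):(\d{2}) - ([^:]+): (.*)$/;
// Assumed marker appended to media messages in the export.
const ATTACHMENT_PATTERN = /^(.+) \(file attached\)$/;

// Converts one export line into a record, or returns null when the line
// does not start a new message (system notice, continuation line).
// `anonymize` is a caller-supplied function mapping a sender to an opaque id.
function parseLine(line, mediaDir, anonymize) {
  const match = LINE_PATTERN.exec(line);
  if (!match) return null;
  const [, day, month, year, hour, minute, sender, body] = match;
  const attachment = ATTACHMENT_PATTERN.exec(body);
  return {
    // Local time as written in the export; no timezone is assumed.
    timestamp: `${year}-${month}-${day}T${hour}:${minute}:00`,
    // Text messages keep their text; media messages point at the local file.
    content: attachment ? path.join(mediaDir, attachment[1]) : body,
    sender: anonymize(sender),
  };
}
```

Lines that return null would need separate handling, for example appending continuation lines to the previous message's content.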
Current Progress:
I encourage you to read about the various authentication methods that Google offers for programmatic access to its services (Drive, in our case). In my research, I tried out a few and moved ahead with what Google calls Service Accounts.
Check out the functions getFilesInThisFolder(), getFoldersInThisFolder() and getFolderFromDriveByName() here
They contain some examples of how to GET directory and file information from Google Drive. Hopefully the parameters sent to the drive.files.list() function in my code will serve as documentation of the Google Drive API and save you some time.
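For readers without access to that file, the following is a rough sketch of what a folder listing with drive.files.list() can look like. It is not the code from the functions named above: `listChildren` and its `foldersOnly` option are hypothetical, and `drive` is assumed to be a Drive v3 client from the googleapis Node.js package, passed in by the caller.

```javascript
const FOLDER_MIME_TYPE = "application/vnd.google-apps.folder";

// Lists the direct children of a Drive folder, following pagination.
// With foldersOnly = true only sub-folders are returned, otherwise only files.
async function listChildren(drive, folderId, { foldersOnly = false } = {}) {
  const clauses = [
    `'${folderId}' in parents`,
    "trashed = false",
    `mimeType ${foldersOnly ? "=" : "!="} '${FOLDER_MIME_TYPE}'`,
  ];
  const children = [];
  let pageToken;
  do {
    const res = await drive.files.list({
      q: clauses.join(" and "),
      // Ask only for the fields we use; nextPageToken drives the loop.
      fields: "nextPageToken, files(id, name, mimeType)",
      pageSize: 100,
      pageToken,
    });
    children.push(...(res.data.files || []));
    pageToken = res.data.nextPageToken;
  } while (pageToken);
  return children;
}
```

Taking the client as a parameter keeps the function independent of how authentication was set up, and makes it easy to try out against a stub.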
You will also find authentication-related code in that file that might be helpful. In my understanding, the challenge with Google Drive is figuring out the right authentication mechanism for your task. Once that's done, the process of actually fetching data from Google Drive is always the same.
You'll also see a reference to a file named '/whatsapp-scraper-668a815fc26f.json'. This was generated for the service account for Tattle's Gmail account. We can send this to you in case you just want to try it out.
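As a rough illustration of the service account route, a Drive client can be built from such a key file roughly as follows. This is a sketch, not the code in the repository: it assumes the googleapis Node.js package is installed, that the upload folder has been shared with the service account's email address, and that a read-only scope is sufficient. `buildAuthOptions` and `createDriveClient` are hypothetical names.

```javascript
// Read-only access is assumed to be enough for listing and downloading.
const DRIVE_SCOPES = ["https://www.googleapis.com/auth/drive.readonly"];

// Options for GoogleAuth: the path to the service account JSON key and the scopes.
function buildAuthOptions(keyFilePath) {
  return { keyFile: keyFilePath, scopes: DRIVE_SCOPES };
}

// Creates a Drive v3 client authenticated as the service account.
function createDriveClient(keyFilePath) {
  // Required lazily so this module can be loaded without googleapis installed.
  const { google } = require("googleapis");
  const auth = new google.auth.GoogleAuth(buildAuthOptions(keyFilePath));
  return google.drive({ version: "v3", auth });
}
```

The key file path is passed in rather than hard-coded, since the JSON key is a credential and is best kept out of the repository.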
Two additions to the structured file: the anonymized sender id is unique to a file and not persistent across files, and each record should also contain the group name, based on the file name.
The primary database doesn't have any automated linkages between messages, but in a second production database we can connect messages based on timestamp and phone number.
The code for obfuscating phone numbers is here
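Independently of that code, here is a minimal sketch of one way to get sender ids that are stable within a file but not persistent across files: hash each sender with a random salt that is generated once per file and then thrown away. This is an assumption about a possible approach, not a description of the existing implementation, and `createSenderAnonymizer` is a hypothetical name.

```javascript
const crypto = require("crypto");

// Returns a function that maps a sender (phone number or contact name)
// to an opaque id. Create one anonymizer per chat file: the salt is random
// and never stored, so ids cannot be matched across files.
function createSenderAnonymizer() {
  const salt = crypto.randomBytes(16);
  const cache = new Map();
  return function anonymize(sender) {
    // Normalize spacing and punctuation so "+91 98765 43210" and
    // "+919876543210" map to the same id.
    const key = sender.replace(/[\s()-]/g, "");
    if (!cache.has(key)) {
      const digest = crypto.createHmac("sha256", salt).update(key).digest("hex");
      cache.set(key, digest.slice(0, 12));
    }
    return cache.get(key);
  };
}
```

If messages need to be connected by phone number in a second database, as mentioned above, the salt would have to be kept somewhere safe instead of discarded.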