.nfo Scraper #1143

Open: wants to merge 7 commits into base: master

TgSeed (Contributor) commented Sep 26, 2022

Hello
We imagined a program that would let us categorize and watch our "videos" on our local devices: Stash came... Then we imagined Stash finding the metadata of those "videos" automatically: scrapers came... Now we imagine Stash simply getting metadata without going through scenes, search results and scrapers one by one. Here is the .nfo scraper!

The .nfo scraper lets you set scene, performer and studio data with ease, if other people get on board with it.
.nfo files contain data about the scene/movie, like the title, the release date and much more. They're widely used by Kodi. You may have seen .nfo files in torrents, and if you download a lot, you know that some torrents always include one.
Unfortunately, I have never seen .nfo files shipped with porn! I guess this is because nothing would use them.

Imagine getting the metadata and everything else set for your hundred most recently downloaded porn videos in a single click! That's how .nfo files can help if they get implemented correctly in Stash (look how well Kodi has spread .nfo files for "Movies").

At the time of writing, the scraper can read scene data from .nfo files and send Stash the scene data (cover, date, tags and more), the performers' names and the studio name; it stops there because of current Stash limitations.
Extending Stash in a few specific places would unleash the .nfo scraper's capabilities without modifying the scraper, so that it could create performers with their data (image, country, hair color, tattoos, tags and more) and studios with their name, aliases, URL and image.

The scraper tries to read an .nfo file in the same directory as the video, with the same filename but the .nfo extension,
OR an .nfo file with the same filename in the .nfo/ directory within the video's directory.
So the .nfo file for C:\video1.mp4 can be C:\video1.nfo or C:\.nfo\video1.nfo.
Scraping is easy peasy because it uses Scrape by Fragment: you don't need to enter a search query or anything, and Stash can do it in bulk.
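The lookup described above could be sketched in Python roughly as follows. This assumes the scraper already knows the video's full path and that the file next to the video is tried before the one in `.nfo/`; `candidate_nfo_paths` and `find_nfo` are hypothetical helper names, not functions from this PR.

```python
import os
from typing import List, Optional


def candidate_nfo_paths(video_path: str) -> List[str]:
    """Hypothetical helper: .nfo locations to try for a video, in assumed lookup order."""
    directory, filename = os.path.split(video_path)
    stem, _ext = os.path.splitext(filename)
    nfo_name = stem + ".nfo"
    return [
        os.path.join(directory, nfo_name),          # e.g. C:\video1.nfo
        os.path.join(directory, ".nfo", nfo_name),  # e.g. C:\.nfo\video1.nfo
    ]


def find_nfo(video_path: str) -> Optional[str]:
    """Return the first candidate .nfo file that exists, or None if there is none."""
    for path in candidate_nfo_paths(video_path):
        if os.path.isfile(path):
            return path
    return None
```

Because it uses `os.path`, the same code yields Windows-style paths on Windows and POSIX paths elsewhere.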

There was a previous attempt (#689) to build this, and there are several related feature requests (stashapp/stash#428, stashapp/stash#1199, #429).

Side note: there is currently a kodi-helper script by @WithoutPants, which I found has some minor issues, probably with the latest Stash or Python 3.10. Anyway, you may be able to use it to create .nfo files; still, let's hope for a better solution.

TgSeed (Contributor Author) commented Sep 26, 2022

Example .nfo (made by kodi-helper, with minor modifications, e.g. haircolor in the actor element):

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<movie>
    <title>Abigail Part 2</title>
    <userrating></userrating>
    <plot>After her experience in the kitchen, they have been caught, and the job is done.   It's time for the next job.  A mature couple on their anniversary and the lady of the relationship has sent something a little special to her husband's room - in the shape of a new set of golf clubs.  But Abigail and Lena are here to do a job - to end a relationship that needs help to come to it's conclusion - whatever the reason.</plot>
    <uniqueid type="stash">1</uniqueid>

    <tag>69</tag>
    <tag>atm</tag>
    <tag>missionary</tag>
    <tag>riding</tag>
    <premiered>2018-08-14</premiered>
    <studio>Tushy</studio>

    <actor>
        <name>Abigail Mac</name>
        <role></role>
        <haircolor>blonde</haircolor>
        <order>0</order>
        <thumb>http://localhost/performer/88/image</thumb>
    </actor>
    <actor>
        <name>Lena Paul</name>
        <role></role>
        <order>1</order>
        <thumb>http://localhost/performer/93/image</thumb>
    </actor>
    <actor>
        <name>Mick Blue</name>
        <role></role>
        <tattoos>No</tattoos>
        <order>2</order>
        <thumb>http://localhost/performer/440/image</thumb>
    </actor>
    <thumb aspect="poster">http://localhost/scene/1/screenshot</thumb>
    <thumb aspect="clearlogo">http://localhost/studio/9/image</thumb>
    <fanart>
        <thumb>http://localhost/scene/1/screenshot</thumb>
        <thumb>http://localhost/studio/9/image</thumb>
    </fanart>
</movie>
```

TgSeed (Contributor Author) commented Sep 27, 2022

Commit 32d41a2 brings the following features:

folder.nfo: useful when you have a folder that belongs to a specific studio or movie, so you can set the studio, date and more for all the scenes in the movie via that single .nfo file.

Variables in .nfo files: useful when you want to, for example, combine the studio and the filename in the title, so the XML file may have a title element like [%studio_name%] %filename%
Supported variables at this time are:

  • title - The title found in the .nfo file, otherwise the title in the scene fragment
  • filename - The filename of the scene, without the extension
  • fileextension - The scene's file extension, e.g. .mp4
  • studio_name - The studio's name found in the .nfo file, otherwise the studio in the scene fragment
  • date - The date found in the .nfo file, otherwise the date in the scene fragment
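A minimal Python sketch of how `%name%` placeholders like these could be expanded, assuming the variable values have already been resolved (from the .nfo file or the scene fragment) into a dictionary. `expand_variables` is a hypothetical name, and leaving unknown placeholders untouched is an assumed design choice, not necessarily what the PR does.

```python
import re
from typing import Dict

# Matches %variable% placeholders such as %studio_name% or %filename%.
_VARIABLE = re.compile(r"%(\w+)%")


def expand_variables(template: str, values: Dict[str, str]) -> str:
    """Replace %name% placeholders with values; unknown placeholders stay as written."""
    def substitute(match: "re.Match[str]") -> str:
        return values.get(match.group(1), match.group(0))
    return _VARIABLE.sub(substitute, template)


values = {"studio_name": "Example", "filename": "video1", "fileextension": ".mp4"}
print(expand_variables("[%studio_name%] %filename%", values))  # [Example] video1
```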

folder.nfo files are just ordinary .nfo files with nothing special about them; they are simply made to assign specific metadata to all the scenes in the folder.
Example folder.nfo file:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<movie>
    <title>[%studio_name%] %filename%</title>
    <studio>Example</studio>
</movie>
```

The .nfo file above sets the title of every scene in the folder to [Example] %filename% (with each scene's own filename) and the studio to Example.

The data from folder.nfo is not passed to Stash immediately: the scraper then looks for the scene's own .nfo file. If one exists, the fields present in the scene's .nfo file are used, and the fields missing from it keep the values set by folder.nfo.
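The override rule just described can be sketched with the standard library's ElementTree: read the folder-level fields first, then let the scene's own .nfo replace them field by field. The field list is a small hypothetical subset, and `read_fields` / `merge_nfo` are illustrative names rather than the scraper's code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical subset of top-level <movie> fields; the real scraper reads more.
FIELDS = ("title", "plot", "premiered", "studio")


def read_fields(xml_text: str) -> Dict[str, str]:
    """Return the non-empty top-level fields of a <movie> .nfo document."""
    root = ET.fromstring(xml_text)
    fields: Dict[str, str] = {}
    for name in FIELDS:
        value = (root.findtext(name) or "").strip()
        if value:  # skip missing and empty elements so they don't override anything
            fields[name] = value
    return fields


def merge_nfo(folder_xml: Optional[str], scene_xml: Optional[str]) -> Dict[str, str]:
    """Start from folder.nfo, then let the scene's own .nfo override it field by field."""
    merged: Dict[str, str] = {}
    if folder_xml:
        merged.update(read_fields(folder_xml))
    if scene_xml:
        merged.update(read_fields(scene_xml))
    return merged
```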

ArcCdr commented Sep 27, 2022

Hi TgSeed. It seems we are working on the same idea with slightly different approaches. Are you on Discord so we can chat? My Discord handle is also ArcCdr. If you are interested, my version (still early alpha, started this week) is here: https://github.com/ArcCdr/CommunityScripts/tree/nfoSceneParser/plugins/nfoSceneParser

bnkai (Collaborator) commented Sep 27, 2022

@TgSeed @ArcCdr instead of starting from scratch, have you checked #689?
From what I remember, the only issue left with that PR was supporting moved local links for the images.
Scene/movie covers can be either normal URLs or absolute file paths (at the moment, absolute paths in this PR, e.g. c:\my_media\scene\cover.jpg, will not work). For file paths, the images need to be base64-encoded and sent to Stash. In some cases the absolute paths may not be correct, either because they were generated on another system or because Stash is running in a Docker container. In those cases we need to transform the paths to get the images.
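A rough Python sketch of the cover handling bnkai describes: URLs pass through, while file paths are optionally remapped (for another machine or a Docker mount) and then read and base64-encoded. It assumes Stash accepts a base64 `data:` URI in a scraped image field; `PATH_MAPPINGS`, `remap_path` and `cover_to_stash_image` are hypothetical names, and the mapping shown is only an example.

```python
import base64
import mimetypes
import os
from typing import Dict, Optional

# Hypothetical prefix remapping: a path written on another system (left)
# and where the same folder is reachable from Stash, e.g. in Docker (right).
PATH_MAPPINGS: Dict[str, str] = {
    "c:\\my_media\\": "/data/my_media/",
}


def remap_path(path: str, mappings: Dict[str, str]) -> str:
    """Rewrite the first matching path prefix; return the path unchanged otherwise."""
    for source, target in mappings.items():
        if path.startswith(source):
            return target + path[len(source):].replace("\\", "/")
    return path


def cover_to_stash_image(value: str, mappings: Dict[str, str] = PATH_MAPPINGS) -> Optional[str]:
    """Pass URLs through; turn a readable local image file into a base64 data URI."""
    if value.startswith(("http://", "https://", "data:")):
        return value
    path = remap_path(value, mappings)
    if not os.path.isfile(path):
        return None  # the path is not reachable from where the scraper runs
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```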

ArcCdr commented Sep 28, 2022

@TgSeed Indeed, finishing #689 is probably best. I started with a plugin rather than a scraper, so your concept is closer to the unfinished PR. It makes more sense for you to continue it...
One other thing missing from #689 is support for movies. I like your idea of a folder.nfo for that. The ideal .nfo scraper should definitely support movie creation...
Some useful .nfo data, like rating, is not parsed yet.
Also, for some fields the .nfo spec supports more than one element, and that should be supported as well. From a quick look at the code:

  • details should scrape either "plot" or "outline". I would add "tagline" if those are missing
  • title should scrape either "title", "originaltitle" or "sorttitle"
  • rating should scrape either "userrating" or "ratings/rating"
  • date should scrape either "premiered", "year" or "releasedate"
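A small Python sketch of these fallback chains, using ElementTree's `findtext`, which also accepts nested paths such as `ratings/rating`. The order of each chain follows the list above; `first_text` is a hypothetical helper, and unlike a bare `or` chain it also skips whitespace-only elements.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def first_text(movie: ET.Element, *paths: str) -> Optional[str]:
    """Return the first non-blank text found among several .nfo element paths."""
    for path in paths:
        text = movie.findtext(path)
        if text and text.strip():
            return text.strip()
    return None


movie = ET.fromstring("<movie><plot> </plot><outline>Short</outline><title>T</title></movie>")
details = first_text(movie, "plot", "outline", "tagline")        # "Short"
title = first_text(movie, "title", "originaltitle", "sorttitle")  # "T"
date = first_text(movie, "premiered", "year", "releasedate")      # None
```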

TgSeed (Contributor Author) commented Sep 28, 2022

> @TgSeed @ArcCdr instead of starting from scratch, have you checked #689?

Well, that one was simple and, honestly, I didn't like it (I'm a .NET person, so I prefer classes over raw objects: they give type hinting and more, which makes coding and editing easier).
So I decided to work on my own solution; you're of course free to merge whatever you want.
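To illustrate the "classes over raw objects" preference, here is a minimal Python dataclass sketch of a scraped scene that is converted to a plain dictionary only at the end. The class name is hypothetical, and the output keys (`title`, `details`, `date`, `image`, `studio`, `performers`, `tags`) are assumed to match the JSON shape Stash script scrapers return for scenes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ScrapedScene:
    """Hypothetical typed container for scene data read from an .nfo file."""
    title: Optional[str] = None
    details: Optional[str] = None
    date: Optional[str] = None
    image: Optional[str] = None
    studio: Optional[str] = None
    performers: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    def to_stash_json(self) -> Dict[str, Any]:
        """Build the output dictionary, omitting empty fields."""
        data: Dict[str, Any] = {}
        for key in ("title", "details", "date", "image"):
            value = getattr(self, key)
            if value:
                data[key] = value
        if self.studio:
            data["studio"] = {"name": self.studio}
        if self.performers:
            data["performers"] = [{"name": name} for name in self.performers]
        if self.tags:
            data["tags"] = [{"name": name} for name in self.tags]
        return data
```

Attribute access like `scene.title` is then checked by type checkers and editors, which a raw dictionary cannot offer.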

> From what I remember, the only issue left with that PR was supporting moved local links for the images.

Excuse me, I'm confused now. Those things should be handled when the .nfo file is created, not when it is read.
An .nfo file is a standalone file; it shouldn't depend on other files. .nfo files are used as "portable" metadata.
That is, if I understood you correctly.

> • details should scrape either "plot" or "outline". I would add "tagline" if those are missing
> • title should scrape either "title", "originaltitle" or "sorttitle"
> • rating should scrape either "userrating" or "ratings/rating"
> • date should scrape either "premiered", "year" or "releasedate"

Yep, thanks.
They can be done in my code with Python's `or` operator in a single line, for simplicity, like this:

```python
SceneObject.details = movie.findtext('plot') or movie.findtext('outline') or movie.findtext('tagline') or SceneObject.details
```

I will do it later.

ArcCdr commented Sep 28, 2022

I'll let @bnkai decide on "new" vs "update" given the previous PR. I'm not part of the Stash team; I've just done NFO import/export before, so I'll focus my advice on that.

Indeed, you can do plot/outline/tagline the way you mention. The others need a bit more logic, as their content differs and needs to be interpreted:

  • ratings can be in either "userrating" or "ratings"; in the latter case they have a "max" attribute giving the scale, which is used to bring the value back to Stash's scale of 5
  • "year": you have to complete it with a month and day...
  • "genre": you can treat it as a synonym of tag and parse it as well
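The three interpretations above could look roughly like this in Python. Several details are assumptions: `<userrating>` is taken to be on a 0-10 scale (0 meaning unrated), a `<ratings><rating>` entry is read from its `max` attribute and `<value>` child as in Kodi's templates, a bare year is padded with `-01-01`, and all function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional

USERRATING_MAX = 10.0  # assumption: <userrating> uses a 0-10 scale


def _number(text: Optional[str]) -> Optional[float]:
    """Parse a number; None for missing, empty or malformed text."""
    try:
        return float(text) if text and text.strip() else None
    except ValueError:
        return None


def to_stash_rating(value: float, scale_max: float) -> Optional[int]:
    """Rescale a rating onto Stash's 1-5 scale."""
    if scale_max <= 0:
        return None
    return max(1, min(5, round(value / scale_max * 5)))


def read_rating(movie: ET.Element) -> Optional[int]:
    """Prefer <userrating>, then the default (or first) <ratings><rating>."""
    user = _number(movie.findtext("userrating"))
    if user:  # None or 0 is treated as "not rated"
        return to_stash_rating(user, USERRATING_MAX)
    rating = movie.find("ratings/rating[@default='true']")
    if rating is None:
        rating = movie.find("ratings/rating")
    if rating is None:
        return None
    value = _number(rating.findtext("value"))
    scale = _number(rating.get("max")) or USERRATING_MAX  # fall back to 10 if max is missing
    return None if value is None else to_stash_rating(value, scale)


def complete_date(text: str) -> Optional[str]:
    """Keep YYYY-MM-DD; pad a bare year to YYYY-01-01 (assumed default)."""
    text = text.strip()
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return text
    if re.fullmatch(r"\d{4}", text):
        return f"{text}-01-01"
    return None


def read_tags(movie: ET.Element) -> List[str]:
    """Treat <genre> as a synonym of <tag>; drop blanks and duplicates, keep order."""
    names = [(el.text or "").strip() for el in movie.findall("tag") + movie.findall("genre")]
    return list(dict.fromkeys(name for name in names if name))
```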

A good refresher on the spec is https://kodi.wiki/view/NFO_files/Templates, if you don't use it already.

Another thing you might want to support is movie sets (not just expecting that there will be a folder.nfo file). The .nfo spec supports a "set" tag that I have seen used for porn movies (each file is a scene and the "set" is the movie), directly in the scene's .nfo as a "set" tag then...
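A small Python sketch for reading the set name from a scene's .nfo. It assumes the set may appear either as `<set><name>...</name></set>` or as a bare `<set>...</set>` (both forms appear in Kodi-style files); `read_set_name` is a hypothetical helper. A scraper could then report the name as the scene's movie, assuming Stash's scraped-scene output accepts a `movies` list of `{"name": ...}` objects.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_set_name(movie: ET.Element) -> Optional[str]:
    """Return the movie-set name from <set><name>..</name></set> or a bare <set>..</set>."""
    set_el = movie.find("set")
    if set_el is None:
        return None
    name = set_el.findtext("name")
    if name and name.strip():
        return name.strip()
    text = (set_el.text or "").strip()  # older, text-only form
    return text or None
```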

I saw you decided not to support ratings. Maybe add constants at the beginning of your scraper to let the user decide by (un)commenting a flag?

bnkai (Collaborator) commented Sep 29, 2022

> Excuse me, I'm confused now. Those things should be handled when the .nfo file is created, not when it is read. An .nfo file is a standalone file; it shouldn't depend on other files. .nfo files are used as "portable" metadata. That is, if I understood you correctly.
I was explaining the status of that PR. The only functionality missing from this PR is support for image files given as file paths instead of URLs. That was handled by reading the file and sending it to Stash as a base64-encoded string. IMHO this was good enough, but the user who made the PR mentioned a case where we need to process the path, adjusting it if needed. NFO files are created by external apps, e.g. Kodi, Jellyfin, ..., so if a user runs them via Docker or some other VM, some custom images will have paths that are not correct when read by Stash, and that is something the user can't choose during NFO creation.
I am fine with the above corner case not being included in the PR if it's too complex for now (I was planning to revisit the first PR when I get some time anyway); we can take care of it in a new PR.

@ArcCdr any PR that works and tests OK is fine to merge. Do you by any chance have any NFO samples not generated by Kodi? Do we target Kodi-compliant NFOs or general XML-based NFOs? I think there were some samples in the first PR and the relevant issues; I will update when I get some time at my PC.

ArcCdr commented Sep 29, 2022

@bnkai Thanks for the clarification on the "merge conditions" ;-) The NFO spec is pretty much defined by Kodi. If you target that, you are green. Other software added proprietary extensions, but they are of low or no value to Stash. I'll send you a few .nfo files of different "origins" via Discord if you want to do some tests.
@TgSeed I have the code for more complete/complex support of the NFO tags/spec, so don't worry about them. I'll propose it as a PR on top of your version once it is merged and becomes an official scraper!

bnkai added the script (Scraper executes a script) label on Sep 29, 2022
Assuming the `.nfo` creator outputs the date and rating as they are in Stash; for example, the date must be in the format `2006-01-30`

Thanks to @ArcCdr for the suggestions
TgSeed (Contributor Author) commented Oct 7, 2022

Added the aliases suggested by @ArcCdr in commit c6b4b97.

I don't plan any other changes, because it works assuming the exported .nfo file is dedicated to Stash and preserves its formats (e.g. the date format, or images encoded as base64; base64-encoding images avoids Stash host URLs that are inaccessible to users elsewhere).

@ArcCdr LMK if I'm correct, so we know and it finally gets merged. Then you're of course free to enhance it all.

ArcCdr commented Oct 8, 2022

@TgSeed I think you have built a very sound scraper! I understood from bnkai's past comments that it was good to merge the first time. @bnkai, I think you were going to test with a few .nfo files before the merge?

Thanks for the most recent changes. For the "year" tag, I'm not sure how Stash reacts when a plugin returns just a 4-digit year as the date. Does it add "-01-01" to it, or do you need to do that in the plugin? Depending on that, you might want to concatenate the parsed year with a month and day to avoid runtime errors.

If you are on Discord, let me know your ID and I can send you a set of .nfo files generated by various sources (not only Kodi or Stash plugins) for testing.

bnkai (Collaborator) commented Oct 12, 2022

@TgSeed can you move the scraper into a separate folder and add a README, mostly to explain the folder.nfo usage and the variables?

> it works assuming the exported .nfo file is dedicated to Stash and preserves its formats, e.g. the date format, or images encoded as base64

I was not aware of a program exporting images in the nfo as base64 strings; is that available somewhere? Please add the assumptions you made to the README as well.

@TgSeed if it's OK with you, I will merge after you are done with the scraper, and then we will use a new PR to add extra functionality.

TgSeed (Contributor Author) commented Oct 15, 2022

> @TgSeed can you move the scraper into a separate folder and add a README, mostly to explain the folder.nfo usage and the variables?

Done.

> I was not aware of a program exporting images in the nfo as base64 strings; is that available somewhere?

Unfortunately, there isn't one, or at least none that I know of.
But if anyone builds a tool to export .nfo files, or builds it into Stash, it must do it that way. It's simple: someone else's local URL has no meaning for other users.

> @TgSeed if it's OK with you, I will merge after you are done with the scraper, and then we will use a new PR to add extra functionality.

Sure, it's all done from my point of view. Thanks

ArcCdr commented Oct 19, 2022

> But if anyone builds a tool to export .nfo files, or builds it into Stash, it must do it that way. It's simple: someone else's local URL has no meaning for other users.

@TgSeed The .nfo spec for images is to have them as separate files, for instance video.mp4 & video-landscape.jpg. I am working with Scruffy to refine his SNEK plugin that generates .nfo files. We will do it that way (images exported as files, not as B64 text inside the .nfo).

Let's try to stick to the spec as much as possible. So I guess you will be the only user of B64 in the .nfo ;-) You might want to support loading images from the NFO-compliant file structure as well...
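Loading artwork from the file structure ArcCdr mentions (e.g. `video.mp4` next to `video-landscape.jpg`) could be sketched as below. The suffix list, its preference order and the extensions are assumptions loosely based on Kodi's `<filename>-<kind>` artwork naming, and `find_sidecar_image` is a hypothetical helper; the file it finds would still need to be encoded or served for Stash.

```python
import os
from typing import Optional, Sequence

# Assumed artwork suffixes and preference order; check Kodi's artwork docs for the full list.
ARTWORK_KINDS: Sequence[str] = ("poster", "landscape", "fanart", "thumb")
IMAGE_EXTENSIONS: Sequence[str] = (".jpg", ".png")


def find_sidecar_image(video_path: str,
                       kinds: Sequence[str] = ARTWORK_KINDS,
                       extensions: Sequence[str] = IMAGE_EXTENSIONS) -> Optional[str]:
    """Return the first existing <stem>-<kind><ext> image next to the video, if any."""
    stem, _ext = os.path.splitext(video_path)
    for kind in kinds:
        for ext in extensions:
            candidate = f"{stem}-{kind}{ext}"
            if os.path.isfile(candidate):
                return candidate
    return None
```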
