Provide a way to import/export/synchronize content #135

Closed
2 tasks
DenoBeno opened this issue Jan 24, 2020 · 21 comments
Labels: BB: Infrastructure Container Engine / Cloud Infrastructure Building Block; SHOWSTOPPER Feature or bug, that, if not addressed, renders the CSIS essentially useless

Comments

@DenoBeno

As we are moving from a single site to a devel/test/public scheme, we need to establish some way to synchronize part of the content across sites.

  • One possibility is the "migration" module. This is sure to work but requires quite a lot of configuration.
  • Another possibility may be the content-synchronization module. It provides drush commands for import/export of various content types. The contents are all saved in YAML files, similar to the way the configuration synchronization module works.

The Drush commands documentation looks promising, but this needs to be tested: https://www.drupal.org/docs/8/modules/content-synchronization/drush-commands

@DenoBeno
Author

@fgeyer16 , WDYT?

@p-a-s-c-a-l p-a-s-c-a-l added the BB: Infrastructure Container Engine / Cloud Infrastructure Building Block label Jan 27, 2020
@p-a-s-c-a-l p-a-s-c-a-l added this to the D1.4 CLARITY CSIS v2 milestone Jan 27, 2020
@fgeyer16
Contributor

I have only read about the content-synchronization module so far, so I do not have any practical "feeling" for how it works and what the pitfalls could be. And I am a great fan of Drupal 8's migrate module, so my opinion is likely biased ;-).

I think the need for configuration is both the advantage and the disadvantage of the migrate module, just as the ease of use of content-synchronization is both its advantage and its disadvantage.
If you really want to move all the content between dev/testing/prod, then content-synchronization may be the better solution because nothing needs to be configured. And you can move config and content in one step by copying the sync folder(s) from one instance to the other.

Do we have dev-only content?

But I think we will have dev content, just as we have dev config, which should not be moved from dev to testing or prod. For config there is the config-split module to define sets of configuration, so you can move them separately without the need to sort out the "1 000 not needed config yml files from the 1 000 000 existing config yml files". But with content-synchronization you have to sort them out manually, or you have to export content by uuid. And you have to do this every time you want to synchronize, which may be very time consuming. With migrate, by contrast, you do the time-consuming configuration, such as a view (e.g. all nodes without titles beginning with "DEV") and the migration itself, only once.
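
A minimal sketch of the filter such a view would apply, assuming dev-only content is marked with a "DEV" title prefix as in the example above. The node records and the function name `exclude_dev_content` are hypothetical; a real setup would do this in the source view rather than in Python.

```python
# Hypothetical node records; on a real site these would come from a view or export.
nodes = [
    {"title": "DEV dummy data package", "status": 1},
    {"title": "Heat wave study", "status": 1},
    {"title": "Developing flood maps", "status": 1},
]

def exclude_dev_content(items, prefix="DEV"):
    """Keep only items whose title does not start with the dev prefix."""
    # The comparison is case-sensitive, so "Developing ..." is not treated as dev content.
    return [item for item in items if not item["title"].startswith(prefix)]

titles = [node["title"] for node in exclude_dev_content(nodes)]
print(titles)
```

The match is deliberately case-sensitive here; whether the real view should be stricter (for example "DEV " with a trailing space) is a design choice the thread leaves open.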

Get real world content into dev

Another pitfall I see with content-synchronization is that the data models in dev/testing/prod are not necessarily the same. In the direction dev->prod they should be. But what if we need real-world content in dev, e.g. to test automatic tagging of content, or for some cool new feature which needs a change in the data model (change of field type, new fields or something similar)? Then content-synchronization will fail, because it assumes that the data model is the same.
With migrate configured to map fields into other fields or to omit fields, this will work.
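
To illustrate the idea of mapping and omitting fields, here is a small sketch in Python. It is not a migrate configuration; all field names and the helper `map_fields` are hypothetical and only show the principle of a destination-to-source mapping.

```python
# Hypothetical mapping: destination field -> source field.
# Source fields that are not listed here are simply omitted.
FIELD_MAP = {
    "title": "title",
    "summary": "field_abstract",
    "hazards": "field_hazard_tags",
}

def map_fields(source_row, field_map=FIELD_MAP):
    """Build a destination row from a source row using the mapping."""
    return {
        dest: source_row[src]
        for dest, src in field_map.items()
        if src in source_row  # skip mapped fields the source does not provide
    }

source_row = {
    "title": "Flood risk in Naples",
    "field_abstract": "Short summary",
    "field_hazard_tags": ["flood"],
    "field_dev_only_experiment": "not mapped, therefore omitted",
}
dest_row = map_fields(source_row)
print(dest_row)
```

Because the mapping is explicit, source and destination can have different data models, which is the case content-synchronization cannot handle according to the argument above.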

Is it sensible to have two modules to do the same?

I think for csis we will need the migrate module anyway to mirror offers (solutions) and show cases from the marketplace to csis. In csis they will have only the fields which are needed to show them (title, summary, taxonomies and location) and the link to the full content on the marketplace. So this has to be done with migrate.
Is it sensible to use different tools for syncing between dev/testing/prod and for syncing inside the myclimateservices universe?

Syncing automatically

Automatic syncing is maybe not a scenario for syncing dev/testing/prod, but just for completeness:
The marketplace checks for new users and organisations on profile every 5 minutes, triggered by cron. The view at profile only shows users and organisations changed in the last 6 minutes, so data transfer is reduced to what is necessary. Using content-synchronization, all content would be transferred and the decision whether content has changed would be made on the target system. And syncing automatically by cron is more complex because you will have to run one cron job for the export on the source system and one for the import on the target system. The data transfer has to be done by one of these cron jobs. And how do we decide how much time is needed between the export and the import to be sure all content has been exported?
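
The "changed in the last 6 minutes, polled every 5 minutes" pattern can be sketched as follows. The record layout, the names and the timestamps are made up for illustration; only the two intervals come from the description above.

```python
from datetime import datetime, timedelta

POLL_INTERVAL = timedelta(minutes=5)  # cron interval on the importing side
LOOKBACK = timedelta(minutes=6)       # window of the source view (one minute of overlap)

def changed_since(items, now, lookback=LOOKBACK):
    """Return the items whose 'changed' timestamp lies within the lookback window."""
    cutoff = now - lookback
    return [item for item in items if item["changed"] >= cutoff]

now = datetime(2020, 2, 1, 12, 0)
items = [
    {"name": "org-a", "changed": datetime(2020, 2, 1, 11, 58)},
    {"name": "user-b", "changed": datetime(2020, 2, 1, 11, 54, 30)},
    {"name": "org-c", "changed": datetime(2020, 2, 1, 11, 40)},
]
recent = changed_since(items, now)
print([item["name"] for item in recent])
```

Since the window is longer than the polling interval, an item can show up in two consecutive runs, so the import presumably has to tolerate seeing the same item twice (update instead of create).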

These are my thoughts. As I said at the beginning, this is likely biased in favor of migrate because I have already worked a lot with it, and like it.

If you do not manage to write a migration for some content dependent on taxonomies and other stuff, you should maybe rethink the data model for less complexity ;-).

@p-a-s-c-a-l
Member

p-a-s-c-a-l commented Feb 4, 2020

Since we don't want users to enter data into the prod system while it is not yet synchronised with the dev system, we temporarily point both csis.myclimateservices.eu and csis-dev.myclimateservices.eu to the dev system. See also #23 (comment).

Once the migration strategy has been implemented, we'll use csis.myclimateservices.eu for the prod system.

@p-a-s-c-a-l p-a-s-c-a-l added the SHOWSTOPPER Feature or bug, that, if not addressed, renders the CSIS essentially useless label Feb 7, 2020
@p-a-s-c-a-l
Member

@fgeyer16 you can test the synchronisation now between https://csis-dev.ait.ac.at/ and https://csis.ait.ac.at/

@patrickkaleta

I tested the Config Split module mentioned by @fgeyer16. I think this is what we should use for handling import/export of Drupal configuration, since it allows us to define e.g. which modules should be active only in the Dev environment, or that CSS and JS files should be aggregated on the Prod server, etc.

I will run it on our two new servers with @fgeyer16 and as a start define that the Devel module should only be enabled on Dev.

Note: With this module it's possible to handle differences in the configuration which are more or less permanent (like the examples above). Scenarios like "we added a new field to data packages in Dev but don't want it just yet on Prod in the next synchronization" are practically impossible to handle with this module.

@patrickkaleta

For the import/export of actual content I'll now look into option 2 proposed by @DenoBeno (content synchronization with drush).

I think that should be our preferred way of migrating content, since it's flexible and doesn't require any configuration. Should it sometimes fail (due to changed data models in Dev, etc) we can go the extra mile and configure the migrate module to handle that specific content type.

Regarding dev-only content:
We could use a common prefix for all dev-content titles and on the Prod-site a view could list all those dev entities, which we then unpublish with a bulk edit.

@p-a-s-c-a-l
Member

> Another possibility may be the content-synchronization module. It provides drush commands for import/export of various content types. The contents are all saved in YAML files, similar to the way the configuration synchronization module works.

Just an idea: If we commit the configuration into our private GitLab repo we can probably set-up some git-lab pipelines that trigger integration tests and automated deployment.

@patrickkaleta

> • Another possibility may be the content-synchronization module. It provides drush commands for import/export of various content types. The contents are all saved in YAML files, similar to the way the configuration synchronization module works.
>
> The Drush commands documentation looks promising, but this needs to be tested: https://www.drupal.org/docs/8/modules/content-synchronization/drush-commands

Sounds good, but doesn't work...
Seriously, I don't know how this Content Sync module can be marked as a stable version. To me this is an alpha/beta version of an abandoned module (the maintainers haven't committed anything or responded to any issues in almost a year). The Drush commands are not available, and the patches that fix this bug only work for a copy hosted in an external repository (so managing our modules with composer will be problematic).

Unfortunately, not even the basic functionality via the BE (import/export of full site content or of an individual node) works correctly. This module seems to have issues with hierarchical taxonomy terms, which it wrongly tries to duplicate.

@fgeyer16 let's discuss this today. IMO trying to fix this module will take just as much time as going with the migration module and configuring it for every one of our content types.

@patrickkaleta

Config Split module tested on our 2 new servers and it works as hoped.

I'll write a more detailed "best-practice issue" about how to import/export configuration later. Here's just a short summary:

  • configuration will be stored as yaml files in two directories (/app/config/sync and app/config/dev_sync), so we can push/pull it via our private Git repo
  • sync folder has all the configuration that will be shared among both instances
  • dev_sync folder will contain configuration only relevant for the Dev instance
  • import/export of configuration via Drush commands drush config-split:import and drush config-split:export (short: drush csim and drush csex)
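
The two-directory layout can be pictured with a short sketch. This is a simplified illustration and not how Config Split is implemented; it only assumes that Drupal configuration names start with the name of the module that owns them, and the config names in the example are made up.

```python
# "devel" stands for the Devel module, which should only be enabled on Dev.
DEV_ONLY_MODULES = {"devel"}

def split_config(config_names, dev_only_modules=DEV_ONLY_MODULES):
    """Partition config names into the shared folder and the dev-only folder."""
    result = {"sync": [], "dev_sync": []}
    for name in config_names:
        module = name.split(".", 1)[0]  # e.g. "devel.settings" -> "devel"
        folder = "dev_sync" if module in dev_only_modules else "sync"
        result[folder].append(name)
    return result

layout = split_config(["system.site", "devel.settings", "node.type.data_package"])
print(layout)
```

With such a layout, the Prod instance only ever imports what is in `sync`, while Dev imports both folders.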

@patrickkaleta

As for the import/export of content, after discussion with @DenoBeno and @fgeyer16 I will now look into these two modules: Structure sync and Features

@patrickkaleta

> As for the import/export of content, after discussion with @DenoBeno and @fgeyer16 I will now look into these two modules: Structure sync and Features

The Features module is not meant to be used in Drupal 8 for this purpose (as mentioned by its developers).

We can use the Structure sync module to import/export our taxonomy terms (countries, cities, study types, hazards, ...). It works similarly to the Config Split module: taxonomy terms are exported as a single yml file into the same sync folder as the configuration. I'll describe the process step-by-step in another "best practice" issue. It features a couple of options for the import and export, but for the reasons mentioned below we should always select all taxonomies for export and run the import in full mode.

Downsides:

  1. it's not very efficient, since it always needs to export the complete taxonomy list (ATM around 2-3 minutes, which is not a deal-breaker but could get annoying if we sync the taxonomies frequently)
  2. the safe import mode is not a viable option, since it only adds completely new terms, but would omit updates in already existing terms -> so we have to use the full import mode
  3. and the full import mode deletes in a first step all taxonomy terms from the system that are not in the exported config file, which is why we have to always do a complete export, otherwise we would unintentionally delete most of our taxonomies
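
The difference between the two import modes, as described in the list above, can be modelled in a few lines. This is a simplified model of the described behaviour, not the module's code; terms are keyed by a hypothetical identifier.

```python
def safe_import(site_terms, exported_terms):
    """Add terms that do not exist yet; existing terms are left untouched."""
    result = dict(site_terms)
    for key, name in exported_terms.items():
        result.setdefault(key, name)  # never overwrites an existing term
    return result

def full_import(site_terms, exported_terms):
    """Make the site match the export: add, update and delete terms."""
    return dict(exported_terms)

site = {"t1": "Flood", "t2": "Heat", "t3": "Drought"}
complete_export = {"t1": "Flooding", "t2": "Heat", "t3": "Drought", "t4": "Storm"}
partial_export = {"t1": "Flooding"}

after_safe = safe_import(site, complete_export)
after_full = full_import(site, complete_export)
after_full_partial = full_import(site, partial_export)
print(after_safe, after_full, after_full_partial, sep="\n")
```

The safe import misses the rename of `t1`, and the full import of a partial export drops `t2` and `t3`, which is why a complete export combined with a full import is the only safe combination.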

@fgeyer16 since you praised the Migrate module so much, I'd like to give it a try for the content synchronization that is still unresolved. Could you please provide a working configuration for just one of our content types, so that I can then more or less copy&paste that for the remaining content types? You seem to have the most experience with that module, so this would probably be the most efficient approach.

@fgeyer16
Contributor

fgeyer16 commented Mar 4, 2020

@patrickkaleta a working migration for content types has to be done first, but I already have a working migration for the hazard taxonomy, which was used to migrate hazards from csis to the marketplace: import hazards.txt

For content (nodes) we will have to create migrations for the taxonomies used by the nodes first. So we will need taxonomy migrations anyway. Please note that this YAML as it is will only update terms imported by this migration, and will duplicate existing taxonomy terms which were not created by this migration.

Assuming that in future dev and prod of csis originate from the same system, the uuids should be the same, so we can update by looking up the uuids of existing terms: import hazards lookup.txt
This is not tested yet but should work.
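
The uuid lookup idea can be sketched like this. It is a plain Python illustration of "update if the uuid already exists, otherwise create", not the content of the attached migration files; the uuids and term names are made up.

```python
def import_terms(existing, incoming):
    """Upsert incoming terms into the existing ones, matching on uuid."""
    by_uuid = {term["uuid"]: dict(term) for term in existing}  # copy, do not mutate input
    created = updated = 0
    for term in incoming:
        if term["uuid"] in by_uuid:
            by_uuid[term["uuid"]].update(term)  # existing term: update in place
            updated += 1
        else:
            by_uuid[term["uuid"]] = dict(term)  # unknown uuid: create a new term
            created += 1
    return list(by_uuid.values()), created, updated

prod_terms = [{"uuid": "aaaa-1111", "name": "Flood"}]
dev_terms = [
    {"uuid": "aaaa-1111", "name": "River flood"},
    {"uuid": "bbbb-2222", "name": "Heat wave"},
]
terms, created, updated = import_terms(prod_terms, dev_terms)
print(terms, created, updated)
```

Without the lookup, the first dev term would be created a second time instead of updating the existing one, which is the duplication problem mentioned above.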

I will provide a full content type import in the next few days.

@patrickkaleta

> @patrickkaleta a working migration for content types has to be done first, but I already have a working migration for the hazard taxonomy, which was used to migrate hazards from csis to the marketplace: import hazards.txt
>
> For content (nodes) we will have to create migrations for the taxonomies used by the nodes first. So we will need taxonomy migrations anyway. Please note that this YAML as it is will only update terms imported by this migration, and will duplicate existing taxonomy terms which were not created by this migration.
>
> Assuming that in future dev and prod of csis originate from the same system, the uuids should be the same, so we can update by looking up the uuids of existing terms: import hazards lookup.txt
> This is not tested yet but should work.
>
> I will provide a full content type import in the next few days.

I see, so if we want to use Migrate for our content types, we will have to use it for our taxonomies as well... In that case, I guess we will keep the Structure Sync module as a backup solution.

@fgeyer16
Contributor

Here is a working example for updating existing and creating new hazards. This needs to be a chain of migrations since we cannot properly update the icons and the terms in one migration.
So first you have to update the icons: import-icons.txt

Then we can update the hazards: import.hazards.txt

@patrickkaleta

> Here is a working example for updating existing and creating new hazards. This needs to be a chain of migrations since we cannot properly update the icons and the terms in one migration.
> So first you have to update the icons: import-icons.txt
>
> Then we can update the hazards: import.hazards.txt

Ok, thanks for the updated files. I will try them out, write a detailed "Best practice" issue for that and start working on the migration files for the necessary content types and taxonomies which we want to synchronize between our two systems.

@patrickkaleta

So thanks to the latest example files provided by @fgeyer16 I was able to create the migration files for all of our taxonomies (in total 27, two taxonomies were omitted since they are either completely empty or marked as obsolete and therefore not relevant for us right now).

They all seem to work, but I can only test them properly after I have created a couple of migrations for some content nodes as well, since these are referenced by some of the taxonomies (GL-steps for the Studytype taxonomy, and DP_source and DP_license for the land-usage taxonomy).

In addition to that, we then most urgently need the migration file for data packages (and migrations for all its referenced items). I'm hoping to get all of that done (writing the missing migrations, running proper tests, writing a "Best practice" issue) by the end of this week or early next week.

@patrickkaleta

I finished the migration files for the GL-step, DP_source and DP_license content types. So, all of the taxonomies can now be synchronized.

Now working on the migrations for the Data packages (I completely forgot about all the references in the DP_resources, which of course also will require their own migration scripts), so I'm hoping to have that done by the end of today.

Proper testing of the content synchronization should be done by Wednesday.

@patrickkaleta

patrickkaleta commented Apr 10, 2020

I believe it's done. "Best practice" issues for the synchronization are:

  1. Config sync HowTo
  2. Content sync HowTo

In short:

  • generally synchronize configuration before content
  • only published content is synchronized (so dummy content could be unpublished in Dev before synchronization to keep it out of Prod)
  • new content gets added, existing content gets updated (unless --update flag is omitted in Drush command) and content removed in Dev will be removed on Prod as well

Not short:

In total, I've created 51 migration files (22 for Nodes, 27 for Taxonomies, 1 for Files and 1 for Paragraphs). Running each of them individually in the correct order during a synchronization would be slow and error-prone, so I suggest creating a simple bash script with all the needed migration commands, which would be executed inside the Docker container.
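
As a sketch of how the "correct order" could be derived instead of maintained by hand, the snippet below sorts migrations by their dependencies and prints one command per migration. The migration IDs and the dependency table are hypothetical, and it assumes the migrations are run with `drush migrate:import` from the Migrate Tools module; the printed lines could then be pasted into the suggested bash script.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical migration IDs; each one maps to the migrations it depends on.
DEPENDENCIES = {
    "hazard_icons": set(),
    "hazards": {"hazard_icons"},
    "dp_source": set(),
    "dp_license": set(),
    "data_package": {"hazards", "dp_source", "dp_license"},
}

def ordered_commands(dependencies, update=True):
    """Return one drush command per migration, dependencies first."""
    order = TopologicalSorter(dependencies).static_order()
    flag = " --update" if update else ""
    return [f"drush migrate:import {migration_id}{flag}" for migration_id in order]

commands = ordered_commands(DEPENDENCIES)
for command in commands:
    print(command)
```

Deriving the order from declared dependencies means a newly added migration only needs one new entry in the table, rather than a carefully chosen position in a list of 51 commands.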

For testing, I completely removed all content in my local instance and ran the content synchronization. It looked good - data packages and resources seemed to be complete and I was able to create new Studies.

The thorough testing I had in mind at first (viewing the JSON exports of the REST views from the live CSIS and my local CSIS side-by-side) turned out to be too complex. Certain fields (date created & updated, IDs and reference IDs, ...) won't be the same, so these JSON exports won't ever be 100% identical, and reading through 50 different exports with well over 1000 items would require too much time. From what I went through, I couldn't find anything that seemed wrong, so I'm confident it works. Maybe the CSIS Testing Team could look into that in more detail? Or probably better, we let them have a look at the system after such a synchronization, see what bugs they are able to find, and keep a faulty migration in mind as a possible reason.
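
A comparison of this kind could possibly be automated by dropping the fields that are expected to differ before diffing. The sketch below assumes each export is a JSON list of objects with a stable `title`; the key names in `VOLATILE_KEYS` and the sample data are hypothetical and would have to be adapted to the real REST views.

```python
import json

# Fields expected to differ between instances (hypothetical key names).
VOLATILE_KEYS = {"nid", "created", "changed"}

def normalise(items, key="title", volatile=VOLATILE_KEYS):
    """Index items by a stable key and drop the fields that always differ."""
    return {
        item[key]: {k: v for k, v in item.items() if k not in volatile}
        for item in items
    }

def diff_exports(source_json, target_json, key="title"):
    """Report items that are missing, extra or different on the target."""
    source = normalise(json.loads(source_json), key)
    target = normalise(json.loads(target_json), key)
    return {
        "missing": sorted(source.keys() - target.keys()),
        "extra": sorted(target.keys() - source.keys()),
        "changed": sorted(k for k in source.keys() & target.keys() if source[k] != target[k]),
    }

live = '[{"nid": 12, "title": "Flood map", "created": "2019-05-01", "license": "CC-BY"},' \
       ' {"nid": 13, "title": "Heat index", "created": "2019-06-01", "license": "CC0"}]'
local = '[{"nid": 3, "title": "Flood map", "created": "2020-04-09", "license": "CC-BY"},' \
        ' {"nid": 4, "title": "Heat index", "created": "2020-04-09", "license": "CC-BY"}]'
report = diff_exports(live, local)
print(report)
```

Such a report would not replace the acceptance tests, but it could narrow 50 exports down to the handful of items that actually differ.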

@p-a-s-c-a-l
Member

> Maybe the CSIS Testing Team could look into that in more detail? Or probably better, we let them have a look at the system after such a synchronization, see what bugs they are able to find, and keep a faulty migration in mind as a possible reason.

I agree. We'll validate by Acceptance Tests. So what are the next steps now?

Try to resolve remaining showstopper issues and then

@patrickkaleta

> So what are the next steps now?
>
> Try to resolve remaining showstopper issues and then

Yes, I believe those are the remaining necessary steps.

@p-a-s-c-a-l
Member

done

5 participants