Source includes triggers MapperParsingException/IllegalStateException if field is in an array #92480

Closed
TheRiffRafi opened this issue Dec 20, 2022 · 13 comments · Fixed by #92984
Assignees
Labels
>bug :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team

Comments

@TheRiffRafi
Contributor

Elasticsearch Version

8.5.3

Installed Plugins

No response

Java Version

bundled

OS Version

n/a

Problem Description

In version 7.x of ES you can set an "includes" setting in the "_source" mapping definition for a field within a nested field.
If you index a document that contains multiple nested objects, where one of the objects does not contain the included field, the document indexes without problems.

However, doing the same in version 8.5.3 fails with "reason": "Unclosed object or array found".

Steps to Reproduce

Create a mapping like this:

PUT testnested
{
  "mappings": {
    "_source": {
      "includes": [
        "doc.field2","doc.field1.field3"
      ]
    },
    "properties": {
      "doc": {
        "properties": {
          "field1": {
            "type": "nested",
            "properties": {
              "field1n1": {
                "type": "nested",
                "properties": {
                  "name": {
                    "type": "text"
                  }
                }
              },
              "field3": {
                "type": "text"
              }
            }
          },
          "field2": {
            "type": "object"
          }
        }
      }
    }
  }
}

Then index a document like this:

POST testnested/_doc
{
  "doc": {
    "field1": [
      {
        "field1n1": [
          {
            "name": "peter"
          },
          {
            "name": "parker"
          }
        ],
        "field3": "test"
      },
      {
        
        "field4": "testfield3"
      }
    ]
  }
}

Receive error:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "illegal_state_exception",
      "reason": "Failed to close the XContentBuilder",
      "caused_by": {
        "type": "i_o_exception",
        "reason": "Unclosed object or array found"
      }
    }
  },
  "status": 400
}

Logs (if relevant)

No response

@TheRiffRafi TheRiffRafi added >bug needs:triage Requires assignment of a team area label labels Dec 20, 2022
@TheRiffRafi
Contributor Author

For now, the only workaround I have found is to include a field that you expect to always be written in the nested documents;
then the indexing won’t fail, as there is no empty object or array to put in the document.

@DaveCTurner
Contributor

I don't think this relates to nested fields, it's to do with the _source.includes bit:

PUT /testindex
{
  "mappings": {
    "_source": {
      "includes": [
        "array.field"
      ]
    }
  }
}

# 200 OK
# {
#   "shards_acknowledged": true,
#   "acknowledged": true,
#   "index": "testindex"
# }

POST /testindex/_doc
{
  "array": [
    {
      "field": "value"
    },
    {}
  ]
}

# 400 Bad Request
# {
#   "status": 400,
#   "error": {
#     "caused_by": {
#       "caused_by": {
#         "reason": "Unclosed object or array found",
#         "type": "i_o_exception"
#       },
#       "reason": "Failed to close the XContentBuilder",
#       "type": "illegal_state_exception"
#     },
#     "reason": "failed to parse",
#     "root_cause": [
#       {
#         "reason": "failed to parse",
#         "type": "mapper_parsing_exception"
#       }
#     ],
#     "type": "mapper_parsing_exception"
#   }
# }

Definitely a bug tho. The stack trace is as follows:

org.elasticsearch.index.mapper.MapperParsingException: failed to parse
	at org.elasticsearch.index.mapper.DocumentParser.wrapInMapperParsingException(DocumentParser.java:233)
	at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:84)
	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:78)
	at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:999)
	at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:948)
	at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:892)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:321)
	at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:187)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:253)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:133)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:72)
	at org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:211)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:892)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.lang.Thread.run(Thread.java:1589)
Caused by: java.lang.IllegalStateException: Failed to close the XContentBuilder
	at org.elasticsearch.xcontent.XContentBuilder.close(XContentBuilder.java:1237)
	at org.elasticsearch.common.bytes.BytesReference.bytes(BytesReference.java:36)
	at org.elasticsearch.common.xcontent.XContentFieldFilter.lambda$newFieldFilter$3(XContentFieldFilter.java:97)
	at org.elasticsearch.index.mapper.SourceFieldMapper.applyFilters(SourceFieldMapper.java:236)
	at org.elasticsearch.index.mapper.SourceFieldMapper.preParse(SourceFieldMapper.java:217)
	at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:122)
	at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:81)
	... 16 more
Caused by: java.io.IOException: Unclosed object or array found
	at org.elasticsearch.xcontent.provider.json.JsonXContentGenerator.close(JsonXContentGenerator.java:560)
	at org.elasticsearch.xcontent.XContentBuilder.close(XContentBuilder.java:1235)
	... 22 more

@DaveCTurner DaveCTurner added :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. and removed needs:triage Requires assignment of a team area label labels Dec 21, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label Dec 21, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner changed the title includes for source mapping causes failures when indexing document with missing field within nested field type Source includes triggers MapperParsingException/IllegalStateException if field is in an array Dec 21, 2022
@rachitakumar rachitakumar reopened this Dec 21, 2022
@TheRiffRafi
Contributor Author

Hello @DaveCTurner! Thank you for looking into this.
We have a user who can't guarantee that a particular field will always be in the nested object.
I am thinking that perhaps we can use a pipeline to check for one of the "include" fields and add it with an empty value if missing. That will add extra overhead to ingest; can you think of any other, more elegant workaround?
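
The idea of adding the included field where it is missing can be sketched client-side in Python. This is not an ingest pipeline and nothing in this thread confirms that an empty value avoids the error; `fill_missing_field` and its parameters are hypothetical names used only for illustration.

```python
from copy import deepcopy


def fill_missing_field(doc, array_path, field, default=""):
    """Return a copy of doc in which every object of the array at
    array_path (dotted, e.g. "doc.field1") contains `field`."""
    result = deepcopy(doc)
    node = result
    for key in array_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return result  # path absent: nothing to patch
        node = node[key]
    if isinstance(node, list):
        for item in node:
            if isinstance(item, dict):
                # Only adds the field when it is missing.
                item.setdefault(field, default)
    return result


doc = {"doc": {"field1": [{"field3": "test"}, {"field4": "testfield3"}]}}
patched = fill_missing_field(doc, "doc.field1", "field3")
```

Note that the placeholder value would end up in the stored `_source`, so this changes what is retrieved later.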

@DaveCTurner
Contributor

I think this is a bug in the XContent framework, possibly even Jackson, so I'm handing this over to core/infra. The following test shows the low-level problem:

diff --git a/libs/x-content/src/test/java/org/elasticsearch/xcontent/support/filtering/AbstractXContentFilteringTestCase.java b/libs/x-content/src/test/java/org/elasticsearch/xcontent/support/filtering/AbstractXContentFilteringTestCase.java
index 0cc95f54693..445890c25bf 100644
--- a/libs/x-content/src/test/java/org/elasticsearch/xcontent/support/filtering/AbstractXContentFilteringTestCase.java
+++ b/libs/x-content/src/test/java/org/elasticsearch/xcontent/support/filtering/AbstractXContentFilteringTestCase.java
@@ -305,6 +305,24 @@ public abstract class AbstractXContentFilteringTestCase extends AbstractFilterin
         );
     }

+    public void testArrayWithEmptyObjectInInclude() throws IOException {
+        testFilter(
+            builder -> builder.startObject().startArray("foo").startObject().field("bar", "baz").endObject().endArray().endObject(),
+            builder -> builder.startObject()
+                .startArray("foo")
+                .startObject()
+                .field("bar", "baz")
+                .endObject()
+                .startObject()
+                .endObject()
+                .endArray()
+                .endObject(),
+            singleton("foo.bar"),
+            emptySet(),
+            true
+        );
+    }
+
     @AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/pull/80160")
     public void testDotsAndDoubleWildcardInExcludedFieldName() throws IOException {
         testFilter(
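
The behaviour the new test expects can also be sketched in plain Python. This is not Elasticsearch's or Jackson's filtering code: `apply_includes` and `_filter` are hypothetical names, wildcards are not handled, and dropping empty results at every level is an assumption of the sketch (the test above only shows it for an empty object inside an array).

```python
_DROP = object()  # sentinel: nothing under this value matched an include


def _filter(value, includes, prefix):
    if isinstance(value, dict):
        kept = {}
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            if any(path == inc or path.startswith(inc + ".") for inc in includes):
                kept[key] = child  # the whole subtree is included
            elif any(inc.startswith(path + ".") for inc in includes):
                sub = _filter(child, includes, path)
                if sub is not _DROP:
                    kept[key] = sub
        return kept if kept else _DROP
    if isinstance(value, list):
        # Arrays add no path segment; elements left empty are dropped.
        items = [_filter(item, includes, prefix) for item in value]
        items = [item for item in items if item is not _DROP]
        return items if items else _DROP
    return _DROP  # scalar reached before an include path fully matched


def apply_includes(source, includes):
    """Return the parts of `source` selected by dotted include paths."""
    result = _filter(source, includes, "")
    return {} if result is _DROP else result


print(apply_includes({"foo": [{"bar": "baz"}, {}]}, ["foo.bar"]))
# prints {'foo': [{'bar': 'baz'}]}
```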

One possible workaround is to include the whole array rather than just a specific field. Or even just stop using source includes and preserve the whole source. {"mappings":{"_source":{"includes":["array.field"]}}} doesn't work, but {"mappings":{"_source":{"includes":["array"]}}} does.
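
A rough Python sketch of how include paths could be widened to the enclosing array, as in the second mapping above. `widen_includes` is a hypothetical helper, and it assumes a representative sample document is available to tell which fields hold arrays.

```python
def widen_includes(includes, sample_doc):
    """Truncate each dotted include path at the first field whose
    value in sample_doc is a list, so the whole array is included."""
    widened = []
    for path in includes:
        node, kept = sample_doc, []
        for key in path.split("."):
            kept.append(key)
            node = node.get(key) if isinstance(node, dict) else None
            if isinstance(node, list):
                break  # stop at the array field
        candidate = ".".join(kept)
        if candidate not in widened:  # de-duplicate, keep order
            widened.append(candidate)
    return widened


sample = {"array": [{"field": "value"}, {}]}
print(widen_includes(["array.field"], sample))
# prints ['array']
```

The trade-off is that every field inside the array is then kept in `_source`, not just the one originally listed.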

@DaveCTurner DaveCTurner added :Core/Infra/Core Core issues without another label and removed :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. Team:Distributed Meta label for distributed team labels Dec 24, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Dec 24, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@tvernum
Contributor

tvernum commented Dec 29, 2022

This looks like a bug in Jackson, @pgomulka & I are working on a fix.

@pgomulka
Contributor

Together with @tvernum we raised an issue and a PR in the Jackson repo: FasterXML/jackson-core#882 and FasterXML/jackson-core#883.
I doubt this can be released in time for 8.6.
I reckon we could repackage Jackson with a fix and use it inside x-content.

@gui-elastic
Member

Thank you so much for all your efforts.

Unfortunately, not using _source includes would increase disk usage, and the other workarounds provided would require significant effort from users, which is not ideal for most use cases. Is there any other possible workaround, besides the ones already provided, until the new release is available?

@pgomulka
Contributor

To avoid the problem, the document being parsed must have an included field/object present in the last element of the array.
That means that for this filter

 "includes": [
        "doc.field2","doc.field1.field3"
      ]

it would work if the elements of the array were in a different order:

  POST testnested/_doc
{
  "doc": {
    "field1": [
      {
        
        "field4": "testfield3"
      },
      {
        "field1n1": [
          {
            "name": "peter"
          },
          {
            "name": "parker"
          }
        ],
        "field3": "test"
      }
    ]
  }
}

So some kind of preprocessing with sorting involved might solve this?
Or just add a "fake" field from the filter into the document's array. I am stressing the array part, as this is where the problem occurs.

So something like:

 "includes": [
        "doc.field2","doc.field1.field3","doc.fake"
      ]

and

POST testnested2/_doc
{
  "doc": {
    "field1": [
      
      {
        "field1n1": [
          {
            "name": "peter"
          },
          {
            "name": "parker"
          }
        ],
        "field3": "test"
      },
      {
        
        "field4": "testfield3"
      },
      {"fake": "fake"}
    ]
  }
}

I am not sure whether that would help, but I can't think of any other workaround.
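
The reordering idea could be sketched as a client-side preprocessing step in Python. `move_matching_last` is a hypothetical helper, not something provided by Elasticsearch, and it assumes the included field sits directly on the array's elements.

```python
def move_matching_last(items, field):
    """Stable reorder of an array: elements lacking `field` first,
    elements containing it last. Returns a new list."""
    def has_field(item):
        return isinstance(item, dict) and field in item

    without = [item for item in items if not has_field(item)]
    with_field = [item for item in items if has_field(item)]
    return without + with_field


field1 = [
    {"field1n1": [{"name": "peter"}, {"name": "parker"}], "field3": "test"},
    {"field4": "testfield3"},
]
reordered = move_matching_last(field1, "field3")
# The element with "field3" is now the last one in the array.
```

This changes the order of the array in the stored source, and it cannot help when no element of the array contains the included field.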

@ronen-bar

ronen-bar commented Jan 4, 2023

We are using CBES (Couchbase to Elastic Connector) to stream documents from a Couchbase bucket to Elasticsearch.
We need the bug fix in Elasticsearch for this purpose, since we cannot modify the production data in the Couchbase bucket.
We have already found a simpler workaround that works for our use case, but we are still waiting for the bug fix in Elasticsearch so that the production data in Couchbase does not need to be modified.

@pgomulka
Contributor

pgomulka commented Jan 9, 2023

Just leaving a note to make sure we update the known-issue sections in our docs.
In ES 6.8-7.6 we used Jackson 2.8.11, which does not have the bug.
Versions affected by the bug:
7.7-7.16 with Jackson 2.10.4
7.17 with Jackson 2.13.4
8.x with Jackson 2.14.0

pgomulka added a commit that referenced this issue Jan 18, 2023
while jackson 2.14.2 with FasterXML/jackson-core#882 is still not released
we want to patch the jackson-core used by x-content with the modified class that fixes the bug #92480

closes #92480
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Jan 18, 2023
while jackson 2.14.2 with FasterXML/jackson-core#882 is still not released
we want to patch the jackson-core used by x-content with the modified class that fixes the bug elastic#92480

closes elastic#92480
@pgomulka
Contributor

pgomulka commented Jan 27, 2023

FYI, the fix was included in the 8.6.1 release, which is available for download now.

pgomulka added a commit that referenced this issue Feb 22, 2023
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Feb 23, 2023
pgomulka added a commit that referenced this issue Feb 23, 2023
zuiyu-main pushed a commit to zuiyu-main/elasticsearch that referenced this issue Sep 13, 2023

8 participants