All notable changes to this project will be documented in this file. Changes are grouped as follows:

- `Added` for new features.
- `Changed` for changes in existing functionality.
- `Deprecated` for soon-to-be-removed functionality.
- `Removed` for removed functionality.
- `Fixed` for any bugfixes.
- `Security` in case of vulnerabilities.
- CDF as a SQL source, with Beam SQL CLI support.
- Publish job metrics to CDF time series.
- OOTB incremental read support for time series.
- `MAX_WRITE_BATCH_SIZE` set to 900 instead of 4000 to avoid overloading the Cognite API (429 errors).
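The effect of capping the write batch size can be sketched as follows. This is a minimal, hypothetical illustration (the class and method names are not part of the connector): a list of items is split into chunks of at most the configured batch size before each chunk is sent to the API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of capping write batches at a maximum size,
// analogous to the connector's MAX_WRITE_BATCH_SIZE = 900.
public class BatchSplitter {
    public static <T> List<List<T>> split(List<T> items, int maxBatchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += maxBatchSize) {
            // Copy each sub-range so batches are independent of the source list.
            batches.add(new ArrayList<>(
                    items.subList(i, Math.min(i + maxBatchSize, items.size()))));
        }
        return batches;
    }
}
```

With a batch size of 900, 2000 items would be written as three requests (900 + 900 + 200) instead of one oversized request.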
- Beam SDK 2.44.0
- Assets delete was not recursive. Fixed.
- Support for `RawStateStore` for reading and setting state. This will be the new pattern for handling state-dependent actions such as delta loading.
- Java SDK 1.18.0
- Beam SDK 2.42.0
- `ProjectConfig` via files. We recommend using a startup "driver" / container for the initial configuration of the job, including setting the `ProjectConfig`, as opposed to having the Beam job read project credentials directly from file. An example: https://github.com/cognitedata/cdf-sdk-java-examples/tree/main/10-beam-pipeline
- Check for empty strings in auth / `ProjectConfig` when instantiating the `CogniteClient`.
- Factory methods `ProjectConfig.ofClientCredentials()` to align with style conventions in the Java SDK.
- Add checks for blank/empty strings in `ProjectConfig`. A blank string will be treated as if it is not present / null.
- Java SDK 1.17.0
- Beam SDK 2.41.0
- Configurable timeout of the async API endpoints (contextualization endpoints like `entity matching` and `engineering diagrams`). You configure the timeout via `Hints.withAsyncApiJobTimeout(Duration timeout)`. The default timeout is 20 minutes.
- Beam SDK 2.40.0
- Writers' automatic reporting of `extraction pipeline run` will also generate a `run` when zero elements are written to CDF.
- Java SDK v1.16.0
- Geo-location attribute on the `files` resource type is supported.
- `Sequences` upsert support, including modified column schema. The `upsert` functionality includes both modified `sequences headers` / `SequenceMetadata` and `sequences rows` / `SequenceBody`. For more information, please refer to the documentation: https://github.com/cognitedata/cdf-sdk-java/blob/main/docs/sequence.md#update-sequences and https://github.com/cognitedata/cdf-sdk-java/blob/main/docs/sequence.md#insert-rows
- Beam v2.39.0
- Support for `extraction pipelines`.
- `Read direct` mode for `FilesMetadata` and `Files`. This read mode will output batches of objects (`List<FileMetadata/FileContainer>`) instead of single rows. This can be useful in very high data volume scenarios.
- `Write direct` mode for `FilesMetadata` and `Files`. This allows for bypassing the shuffle and batch stage of the writer when you have pre-batched objects. This can offer improved performance when writing very large volumes of data.
- Support custom auth scopes when using OpenID Connect authentication.
- Java SDK v1.13.0
- Fixed expired URL on file binary download.
- Added support for S3 as temp storage for file binaries.
- Beam 2.37.0
- API metrics are disabled by default. They can be enabled via `ReaderConfig.enableMetrics(true)` / `WriterConfig.enableMetrics(true)`.
- Added convenience methods to the `RequestParameters` object for easier handling of items (by `externalId` or `id`). You can use `RequestParameters.withItemExternalIds(String... externalId)` and `RequestParameters.withItemInternalIds(Long... internalId)` to add multiple items to the request.
- Java SDK v1.11.0
- Added `3D Models Revisions`
- Added `3D File Download`
- Added `3D Asset Mapping`
- `EngineeringDiagrams` promoted from experimental to stable. It has the same signature and behavior as before and is located under the `contextualization` family: `CogniteClient.contextualization().engineeringDiagrams()`.
- Add utility class `com.cognite.client.util.RawRows` for working with `RawRow` objects. Please refer to the documentation for more information.
- Beam SDK 2.36.0
- The single-item methods `RequestParameters.withItemExternalId(String externalId)` and `RequestParameters.withItemInternalId(Long internalId)` have been deprecated in favour of the new multi-item versions.
- Improve logging of failed API requests.
- Java SDK v1.9.0
- Added `3D Models`
- Increased read and write timeouts to match server-side values.
- Upsert of `sequenceMetadata` not identifying duplicate entries correctly.
- Write direct can accept all collections implementing `Iterable`.
- Beam SDK 2.35.0
- Java SDK v1.8.0:
- File binary upload uses PUT instead of POST.
- Further improvements in the file binary upload robustness.
- `SequenceMetadata` upsert now respects the max cell count batch limit.
- `Read direct` could emit empty batches in certain circumstances.
- Timeouts when using Google Cloud Storage as temp storage for file binaries.
- `Read direct` mode for `RawRow` and `Event`. This read mode will output batches of rows (`List<RawRow/Event>`) instead of single rows. This can be useful in very high data volume scenarios.
- `Write direct` mode for `RawRow` and `Event`. This allows for bypassing the shuffle and batch stage of the writer when you have pre-batched objects. This can offer improved performance when writing very large volumes of data.
- Support for `dataSetId` in the `Labels` resource type (Java SDK v1.5.0).
- File binary upload robustness (Java SDK v1.5.0).
- `Labels` using the API v1 endpoint (Java SDK v1.5.0).
- Refactored the asset synchronization to use the Java SDK implementation.
- Java SDK v1.4.0. Brings lots of useful features, like:
  - Improved performance reading `relationships` and `sequences`.
  - Improved stability when working with large-volume file binaries.
  - Added support for including source and target objects when reading `sequences`.
- Added support for grayscale image in engineering diagram SVGs.
- `CreateInteractiveDiagram.withTargetView()` has been renamed to `CreateInteractiveDiagram.withEntitiesView()` to align with API naming.
- `Files.readAll()` would fail if a file object only consists of a header (and not a binary).
- Support for reading the `first N` results for the readers working towards the `list` endpoints in the API (`assets`, `events`, `TS headers`, `sequences headers`, `file headers`, `raw rows` and more). This allows you to quickly sample a limited subset from a large result set.
- Support for Beam v2.33.0
- Bumped the Java SDK to v1.3.0
- Support for Beam 2.32.0
- The experimental feature `interactive P&IDs` has been refactored into `interactive engineering diagrams` in order to be aligned with the new CDF API endpoint. The new transform is available at `CogniteIO.experimental().createInteractiveDiagram()`.
We skip a few versions due to updating our build pipeline.
- Support for Beam 2.31.0
- Configurable batch size for file binary download and upload
- Breaking change: Remove the use of wrapper objects from the data transfer objects (`Asset`, `Event`, etc.). Please refer to the documentation for more information.
- Repeated annotations when generating interactive P&IDs.
- TLS / SSL handshake errors when running on Dataflow and connecting to Google APIs. This was due to Dataflow disabling a set of crypto algorithms on its workers (because of poor Java 8 implementations). Now that we are on Java 11, we override the Dataflow config and allow all Java 11 algorithms.
- Support for Java 11.
- Support for Beam 2.29.0.
- CDF Java SDK 0.9.4.
- Support for authenticating towards Cognite Data Fusion using OpenID Connect.
- Null pointer when writing large file binaries. Under certain circumstances writing large file binaries to CDF could result in an error (null pointer). This could happen if the network connection to CDF was interrupted while transferring the file binary.
- Duplicates when reading `file header`.
- Refactored the core I/O engine into a separate Java SDK. This should also give a general performance improvement of about 2x.
- Support for Beam 2.28.0
- Refactored `RequestParameters` from `com.cognite.beam.servicesV1.RequestParameters` to `com.cognite.beam.RequestParameters`. All other signatures are the same as before, so you may run a search & replace to update your client.
- Refactored data transfer objects from `com.cognite.beam.io.dto` to `com.cognite.client.dto`. All other signatures are the same as before, so you may run a search & replace to update your client.
- Refactored `com.cognite.beam.io.servicesV1` to `com.cognite.client.servicesV1`. All other signatures are the same as before, so you may run a search & replace to update your client.
- Refactored `com.cognite.beam.io.config.UpsertMode` to `com.cognite.client.config.UpsertMode`. All other signatures are the same as before, so you may run a search & replace to update your client.
- `EntityMatch.matchTo` renamed to `EntityMatch.target` to align with the entity matching API v1.
- Experimental: The interactive P&ID service has been updated to use the new entities specification.
- Fixed a bug causing the number of write shards to be double of the configured value.
- Fixed missing duplicate detection when upserting sequences rows.
- Fixed a bug when reading `Relationship` where `TargetExternalId` would always be set to `null`.
- Fixed a null pointer exception when using GCS temp storage for files. In some cases GCS is unable to report a correct content size.
- Update replace sequences with empty metadata.
- Support for `read by id` for `assets`, `events`, `time series`, `sequences` and `files`.
- Support for Beam 2.26.0
- Renamed `CogniteIO.readFilesBinariesById` to `CogniteIO.readAllFilesBinariesById` in order to be consistent with the `read` vs. `readAll` pattern.
- Null values in sequences. Sequences now support null values.
- Support for reading and writing `sequences`.
- Support reading `aggregates` for `assets`, `events`, `time series`, `files` and `sequences`.
- Support Beam 2.25.0
- Support for (very) large file binaries via temp blob storage.
- Google Cloud Storage and local file storage are supported as temp storage.
- New endpoints for entity matcher train and predict.
- New naming for configuration of entity matcher and P&ID transforms.
- Relationships in beta.
- Entity matcher runs towards the new entity matching endpoints in playground.
- Improved logging when building interactive P&IDs and when writing files.
- Labels support for files.
- `File.directory` as a new attribute on `File`.
- Create interactive P&ID runs towards the new P&ID api endpoints.
- The create interactive P&ID transform has been refactored with new input type and config options.
- Entity matcher api error messages propagation.
- File writer handles files with >1k asset links.
- File writer handles files with empty file binary.
- New `readDirect` and `writeDirect` modes for time series points. These modes bypass some of the built-in data validation and optimization steps. This allows for higher performance under some circumstances (for example if the input data is pre-sorted).
- The entity matcher now runs towards the new `fit` and `predict` endpoints.
- Time series writer metrics.
- Support for entity matching.
- Read and write metrics.
- Generate interactive P&IDs (experimental)
- Experimental support for streaming reads from Raw, Assets, Events and Files.
- Support for security categories on files.
- Support for Labels
- Reading from Raw in combination with GCP Secret Manager.
- Not fetching secret from GCP Secret Manager.
- Add support for `Relationships`.
- `GcpSecretManager` instantiation via `of(projectId, secretId)` instead of `create()`, to highlight the needed input properties.
- Experimental: Streaming support for reading time series datapoints.
- Experimental: Optimized code path for reading time series datapoints.
- Support for GCP secret manager.
- Add datasets as a resource type.
- `Asset` and `AssetLookup` have added support for aggregated properties (`childCount`, `depth` and `path`).
- Add support for the aggregates endpoint for `Asset`, `Event`, `Timeseries` and `File`.
- `legacyName` is no longer included when creating time series.
- The `BuildAssetLookup` transform will include aggregated properties in the output `AssetLookup` objects.
- Support for Beam 2.20.0
- Data set id included as a comparison attribute for the synchronize assets function.
- Upsert items is more robust against race conditions (on large scale upserts) and legacy name collisions.
- Support for data sets for assets, events and time series.
- Concurrency error when writing files with more than 2k assetIds.
- Support upsert of files with more than 1k assetIds.
- Utility transform for building read raw table requests based on a config file.
- New request execution framework towards the Cognite API. This enables "fan-out" per worker (multiple parallel async IOs).
- Support for file binaries.
- Support for multi TS read requests. This should speed up wide and shallow queries by 5 - 10x.
- Support for data set for files.
- The default limit for TS points aggregates requests is now set to 10k.
- Updated the timeseries point reader to the new proto payload from the Cognite API.
- Read rows from Raw now works in combination with a column specification.
- Add delta read support to TS headers.
- RAW: `minLastUpdatedTime` and `maxLastUpdatedTime` did not work properly in combination with range cursors.
- Files: Support updating `mimeType`.
- CSV reader: Strip out empty lines (lines with no alphabetic characters).
- TS points: Can now resolve duplicates based on `legacyName`.
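The empty-line rule described for the CSV reader (a line counts as empty when it contains no alphabetic characters) can be sketched as follows. This is a hypothetical illustration; the class and method names are not part of the connector:

```java
// Hypothetical sketch: treat a CSV line as "empty" when it contains no
// alphabetic characters, e.g. "", ",,,", or ";  ;". Such lines would be
// stripped before parsing.
public class CsvLineFilter {
    public static boolean hasAlphabetic(String line) {
        for (int i = 0; i < line.length(); i++) {
            // Character.isLetter covers Unicode letters, not just a-z.
            if (Character.isLetter(line.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```

Note that under this rule a line of only delimiters or digits (e.g. `1,2,3`) would also be treated as empty, since it has no alphabetic characters.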
- CSV reader: Added support for BOM (byte order mark), quoting and custom header.
- Support for reading csv / delimited text files.
- Support for the new TS header advanced filter and partition endpoint.
- Optimized writes for low frequency TS datapoints.
- API metrics: Fixed batch size metric when writing time series data points.
- Changed upper limit on write batch latency.
- Metric for read time series datapoints batch size.
- Utility transforms for parsing TOML config files to map and array.
- `isStep` is added to `TimeseriesPoint`.
- Update of time series headers includes `name`.
- Delta configuration via parameters and templates.
- `ReadRawRow` configured with `dbName` and/or `tableName` via parameters and templates.
- Metrics can be enabled/disabled via `readerConfig` and `writerConfig`. The default is enabled.
- Delta read support for raw, assets, events and file metadata/header.
- Metrics. Latency and batch size metrics for Cognite API calls.
- API v1 range partitions. Support for API v1 range partitions for assets and events.
- Move to Beam 2.16.0
- Read single row from raw by row key.
- Asset synchronizer: Performance optimization, better delta detection between CDF and source.
- File metadata/header writer. Fixed issue where creating new file metadata could fail.
- Asset synchronizer: Fix data integrity check of empty parentExternalId.
- Asset synchronizer: Fixed issue with parsing parentExternalId.
- Asset synchronizer: Fixed issue with data validation not handling multiple hierarchies.
- Fixed issue with asset input validation where a root node was not identified when the parentExternalId was null (as opposed to an empty string).
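The fixed root-node check can be illustrated roughly as follows. This is a hypothetical sketch (the actual validation lives inside the connector): after the fix, both a null and an empty-string `parentExternalId` identify a root node.

```java
// Hypothetical sketch of the corrected root-node check in asset input
// validation: a node with no parent reference is a root, whether the
// parentExternalId is null or an empty string.
public class AssetRootCheck {
    public static boolean isRootNode(String parentExternalId) {
        return parentExternalId == null || parentExternalId.isEmpty();
    }
}
```

The original bug was that only the empty-string case was recognized, so nodes with a null `parentExternalId` were never identified as roots.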
- Additional input validation for asset hierarchy synchronization.
- Max write batch size hint for Raw.
- Utility methods for parsing Raw column values into types.
- Upsert and delete file headers/metadata.
- Read TOML files.
- Write and update assets.
- Asset hierarchy synchronization. Continuously mirror the source, including inserts, updates and deletes (for both node metadata and hierarchy structure).
- Support both "partial update" and "replace" as upsert modes.
- From `xxx.builder().build()` to `xxx.create()` for user-facing config objects and transforms (like `RequestParameters`, `ProjectConfig`, etc.).
- Fixed issue when writing TS data points and generating a default TS header. In the case of multiple parallel writes there could be collisions.
- Fixed issue when writing TS headers where duplicate `legacyName` exists.
- Fixed file list / reader error.
- Support auto-create time series headers/metadata for time series where `isString=true`.
- Support for reading file metadata.
- Support for deleting raw rows.
- `ReadRows`: the member `lastUpdatedTime` made optional.
- Fixed (another) null pointer exception when using project configuration parameters on the Dataflow runner.
- Fixed null pointer exception when using project configuration files.
- Convenience methods added to `RequestParameters` for setting a single item `externalId` / `id`.
- Using protobuf for writing and reading time series data points.
- Support for providing `ProjectConfig` via file.
- New group and artifact ids.
- Beam version: from 2.13.0 to 2.14.0
- `isString` is removed from `TimeseriesPoint`. This field is redundant as the same information can be obtained from `getValueOneofCase()`.
- `oneOf` cases in `TimeseriesPoint`, `TimeseriesPointPost` and `Item` have been renamed for increased clarity.
- Fix protobuf version conflict.
- Add support for writing and updating time series data points.
- Add support for writing time series meta data.
- Add support for delete time series.
- Add support for delete assets.
- Fix write insert error for objects with existing "id".
- Increase connection and read timeouts towards the Cognite API to 90 seconds. This is to accommodate reading potentially large response payloads.
- Optimize handling of cursors.
- Fix TS header reader (request provider error).
- Fix support for templates.
- `ValueProvider<T>` is now lazily evaluated.
- Add rootId to assets.
- Add writeShards and maxWriteBatchLatency to Hints.
- Add support for raw cursors on read.
- Add support for listing raw databases and tables.
- Add shortcut methods for raw.dbName and raw.tableName to `RequestParameters`.
- Add support for specifying raw.dbName and raw.tableName via `ValueProvider<String>`.
- Renamed "splits" to "shards" (Hints object)
- Remove depth and path from assets.
- Add batch identifier to writer logs.
- Improve the Raw writer and reader.
- Refactored the connector configuration.
- Supports API v1.
- Added support for new readers and writers:
  - Readers: Assets, events, TS header and TS datapoints, Raw.
  - Writers: Events and Raw.
  - Writers perform automatic upsert.
  - Readers support push-down of all filters offered by the API.
- Added support for reading time series headers and datapoints.
- New configuration pattern.
- Added support for reading events.
- Configuration of the reader is captured in a configuration object.