Avoid memcpy in GRPC write path #810

Open
wants to merge 1,016 commits into
base: branch-2.2.x
1016 commits
7db83cb
Bump junit from 4.13 to 4.13.1 (#469)
dependabot[bot] Oct 12, 2020
b7a96b2
Fetch parent info in delete method asynchronously (#465)
medb Oct 12, 2020
ef91e6e
Always infer implicit directories (#467)
medb Oct 13, 2020
e0ae512
Revert "Add interceptor to inject remote address to non-ok status dis…
hongyegong Oct 13, 2020
0f9f5e4
Migrate glob algorithm configuration to enum (#470)
medb Oct 14, 2020
99225ab
Cleanup and update dependencies (#472)
medb Oct 14, 2020
cd031b0
Improve test coverage of GoogleCloudStorage.list* methods (#466)
medb Oct 14, 2020
90e0b50
Do not create parent directories for new objects (#468)
medb Oct 14, 2020
0d5d333
Remove Slf4j Flogger backend. (#473)
medb Oct 15, 2020
bd6bec7
Use shared executor instance for patallel file info requests (#474)
medb Oct 15, 2020
c6e7bd8
Do not create parent directory objects and bucket for new directories…
medb Oct 20, 2020
9f87977
Refactor to use utility method getFromfuture (#476)
medb Oct 20, 2020
e605420
Revert fs.gs.overwrite.generation.mismatch.ignore feature (#477)
medb Oct 21, 2020
18f4ee6
Update CONFIGURATION.md
medb Oct 26, 2020
84f30b4
refactor methods handing out gRPC channels syncly (#479)
hongyegong Oct 27, 2020
cc6fafc
Made input stream chunk size fixed to reduce copies (#481)
hongyegong Oct 28, 2020
3bafeaa
Bumped the readFrom chunk size to be maximum size of gRPC message (#484)
hongyegong Nov 4, 2020
0538297
Fixed constructor of GCSImpl to respect gRPC flag (#486)
hongyegong Nov 5, 2020
89b6a38
Cancel the request context when no more data available (#487)
hongyegong Nov 5, 2020
c2f8ad6
Add HCFS benchmark tool (#489)
medb Nov 6, 2020
e44af2c
Update README.md
hongyegong Nov 10, 2020
7870126
Added basic integration tests for gRPC (#490)
hongyegong Nov 19, 2020
1f57bb5
retry getObjectRequest for grpc read (#491)
hongyegong Nov 19, 2020
159cf3f
Remove redundant constructors (#492)
medb Nov 20, 2020
098d40d
Remove unused listFileNames methods (#493)
medb Nov 23, 2020
2572af7
Use factory methods to create GoogleCloudStorageItemInfo instances (#…
medb Nov 23, 2020
2528f5e
Use projection to request only object fields that are used in get and…
medb Nov 23, 2020
0694533
Cleanup and delete unused code (#496)
medb Nov 23, 2020
37a275e
List +1 object only when prefix object is not included in the list re…
medb Nov 24, 2020
15ef184
Fix flaky tests (#499)
medb Nov 25, 2020
440d176
Delete obsolete LocalFileSystemIntegrationTest (#498)
medb Nov 25, 2020
982d2e8
Fix and reenable disabled test cases (#500)
medb Nov 26, 2020
acf13f8
List only object fields that are used in Hadoop FileStatus (#502)
medb Dec 1, 2020
0e0037e
Minor cleanup (#503)
medb Dec 2, 2020
ff0923d
Remove obsolete property (#504)
medb Dec 2, 2020
83b27b9
Remove obsolete counters - we should use new StorageStatistics API in…
medb Dec 2, 2020
f6a79d5
Enable reading from public buckets (without credentials) (#501)
tooptoop4 Dec 2, 2020
41e3ed4
Improve performance cache hit ratio and add integration tests (#508)
medb Dec 4, 2020
07d2521
Remove dependency on 3rd party retry library (#510)
medb Dec 7, 2020
c946b0b
Remove client authentication support (#511)
medb Dec 8, 2020
4c3e1a7
Clarify fs.gs.inputstream.fast.fail.on.not.found.enable property HCFS…
medb Dec 8, 2020
3cb96a6
Add fs.gs.create.items.conflict.check.enable property (#509)
medb Dec 9, 2020
3616c5c
Switch default JSON library to Gson (#512)
medb Dec 16, 2020
dd9a2ad
Upgrade protobuf to 3.14.0 (#514)
itay Dec 16, 2020
0458fd2
Remove redundant properties (#516)
medb Dec 18, 2020
345276c
Cleanup imports (#517)
medb Dec 18, 2020
f5aae04
Add Hadoop Service lifecycle support for delegation token binding int…
pzampino Dec 18, 2020
0a44788
Do not initialize perf cache with list request during get call (#522)
medb Dec 21, 2020
9b05432
Update all dependencies to latest versions (#523)
medb Dec 21, 2020
3842b9e
Migrate Apache HTTP transport configuration to v2 API (#520)
medb Dec 21, 2020
18f462b
Fix pref cache initialization with list request (#519)
medb Dec 21, 2020
ea2cad5
Parallelize GCS requests in listStatus method (#521)
medb Dec 21, 2020
9e65066
Use default instance of JacksonFactory (#527)
medb Dec 21, 2020
573db28
Do not attempt to initialize delegation token support if it was not c…
medb Dec 21, 2020
55a4713
Early exit from impersonation auth configuration (#525)
medb Dec 21, 2020
b9e0f06
Remove redundant fs.gs.config.override.file property (#524)
medb Dec 21, 2020
1128d6e
Change default value of 'fs.gs.inputstream.min.range.request.size' pr…
medb Dec 22, 2020
9c42255
Add more directory delete and rename integration tests (#529)
medb Dec 22, 2020
545d377
Remove redundant GSC requests in rename and delete operations (#530)
medb Dec 22, 2020
7050038
Fix logging level for trace logs (#531)
medb Dec 23, 2020
e580ae1
Update CHANGES.md
medb Dec 26, 2020
73a6e5e
Add gcsio version as usre agent for gRPC (#515)
hongyegong Dec 29, 2020
775aea1
Release GCS connector 2.2.0 and BQ connector 1.2.0.
hongyegong Jan 6, 2021
db26eba
Update versions for next connectors release development
medb Jan 17, 2021
b274946
Support credential configuration for gRPC API (#534)
medb Jan 20, 2021
232ff0e
Update README.md
medb Feb 10, 2021
9448b2c
Update README.md
medb Feb 10, 2021
a4e8aa2
Update README.md
medb Feb 10, 2021
bd1f76e
Switch channel when encountering certain errors (#537)
hongyegong Mar 4, 2021
b97ff44
Remove redundant delete calls for inferred directories (#539)
mprashanthsagar Mar 4, 2021
0256444
Add precondition ifGenerationMatch=0 for create directory calls (#541)
mprashanthsagar Mar 11, 2021
83724ce
Make one more GoogleCloudStorageImpl constructor public (#542)
Fokko Mar 11, 2021
fface44
Bump api-client from 1.31.1 to 1.31.3 (#543)
Fokko Mar 12, 2021
b3b2e64
Fix ComputeCredential usage in gRPC (#545)
medb Mar 16, 2021
aa2be06
Improve exception message for Hadoop CLI (#548)
medb Mar 16, 2021
8cd761f
Update Hadoop 3 version to 3.2.2 (#549)
mprashanthsagar Mar 17, 2021
d700d3b
Re-factor gRPC stub initialization (#550)
medb Mar 18, 2021
474424a
Fix proxy configuration for Apache HTTP transport (#546)
medb Mar 18, 2021
b154b83
Add unit tests for retries of exceptions thrown during GCS requests e…
medb Mar 18, 2021
5529fba
Fix compute credentials identification for gRPC API (#553)
medb Mar 24, 2021
bc80e7a
Move gRPC call timeout values to configurations (#554)
mprashanthsagar Apr 15, 2021
4be3ab2
Add credentials support to StorageStubProvider (#552)
veblush Apr 22, 2021
5da8047
Add Integration tests for GCS gRPC APIs (#556)
mprashanthsagar Apr 29, 2021
e9bc107
Update gRPC version (#559)
mprashanthsagar May 3, 2021
38f9262
Release GCS connector 2.2.1 (#560)
mprashanthsagar May 17, 2021
72bca2f
Fix storage stub auth for cloudpath (#563)
mprashanthsagar May 18, 2021
2b1a4ec
Update versions for next release (#566)
mprashanthsagar May 28, 2021
80a7148
Add support for footer prefetch in gRPC read channel (#567)
mprashanthsagar Jun 7, 2021
f361f9c
Switch to json client for metadata operation (#570)
mprashanthsagar Jun 9, 2021
41a018c
Added credentials to the ctor of GoogleCloudStorageImpl (#562)
veblush Jun 10, 2021
48e9f80
Fix integer overflow when computing bytesToRead (#572)
mprashanthsagar Jun 12, 2021
160449d
Address log level for ignored exception (#573)
mprashanthsagar Jun 14, 2021
b449918
Fix readLimit range requests in gRPC channel (#574)
mprashanthsagar Jun 15, 2021
118a8c7
Re-factor read api in gRPC channel (#575)
mprashanthsagar Jun 16, 2021
8d6e02a
Invalidate current request to gcs if read request beyond range offset…
mprashanthsagar Jun 23, 2021
12eeae0
Fix resumable upload from offset of last chunk (#577)
mprashanthsagar Jun 28, 2021
cb4500d
Release GCS connector 2.2.2 (#579)
mprashanthsagar Jun 29, 2021
2caedba
Update versions for next release (#580)
mprashanthsagar Jun 29, 2021
adbfe56
Update all dependencies to latest versions (#551)
medb Jun 29, 2021
4ef9f11
Clean up change notes (#581)
medb Jun 29, 2021
96158f9
Update INSTALL.md (#583)
aman-ebay Jul 26, 2021
778c621
Update CONFIGURATION.md
medb Jul 26, 2021
aff9e4a
Authentication service integration (#587)
maen-allaga Jul 27, 2021
06e84b7
Make AccessTokenProvider.AccessTokenType public
cyxxy Jul 27, 2021
dbb4114
Remove dots from the bucket names when running integration tests (#591)
majdyz Jul 27, 2021
64143a9
Unshade the "AccessTokenType" class (#592)
cyxxy Jul 27, 2021
8d7c32a
Add a zero-copy deserializer to gRPC Read (#564)
veblush Jul 28, 2021
e484f95
Restore compatibility with pre-2.8 Hadoop versions (#372)
medb May 7, 2020
aecda65
Evict gRPC channel from pool, for transient errors (#589)
mprashanthsagar Aug 2, 2021
d3da29c
Add spotless plugin to check google-java-format compliance (#594)
mprashanthsagar Aug 4, 2021
8bfb2e7
Format java files in gcsio against google-java-format spec (#595)
mprashanthsagar Aug 4, 2021
06b6222
Migrate gRPC channels to GCS v2 APIs (#590)
mprashanthsagar Aug 17, 2021
50d7c48
Short-circuiting in readObjectContentFromGCS(#606)
veblush Aug 24, 2021
bfb97c1
More robust zero-copy deserializer with an option. (#604)
veblush Aug 26, 2021
3de7c38
Change oath token to header & call order when intializing list reques…
aalexx-S Aug 16, 2021
d1f67e2
Format all java files, remove incremental format check (#620)
mprashanthsagar Sep 21, 2021
e09c570
Update javadoc for WikipediaRequestBytes class (#621)
mprashanthsagar Sep 21, 2021
19a8f00
Add precondition to ensure that rename operation does not override de…
mprashanthsagar Sep 30, 2021
a6f5ee3
Release GCS connector 2.2.3 (#627)
mprashanthsagar Oct 4, 2021
de9d2a8
Decrease log-level for hflush rate-limit warning log message. (#632)
ranu010101 Oct 11, 2021
c562540
Upgrade Google Auth library to support ExternalAccount (#633)
davidrabinowitz Oct 13, 2021
a19ac26
Support GCS fine grained action in 2.2.x (#634)
maen-allaga Oct 19, 2021
8c6ad42
Unshade AccessBoundary AutoValue class (#637) (#638)
maen-allaga Oct 19, 2021
b1e9027
Update versions for next release (#639)
mprashanthsagar Oct 21, 2021
da159ca
Prepare 2.2.4 release (#646)
maen-allaga Nov 5, 2021
08c6321
Update CHANGES.md
maen-allaga Nov 5, 2021
d263e88
Switch gRPC lb policy to round_robin (#643)
veblush Nov 15, 2021
a57ac98
Update dependencies to LTS versions (#644)
mprashanthsagar Nov 15, 2021
991f7e0
gson and gRPC version upgrade with bug fixes (#649)
suztomo Nov 15, 2021
ed4b71d
Prepare for next release (#650)
mprashanthsagar Nov 17, 2021
200cead
Add option to enable directpath for gRPC (#652)
mprashanthsagar Nov 18, 2021
7508fac
Update read time based on object size (#653)
mprashanthsagar Nov 29, 2021
1232154
Move gRPC read retries to application level (#651)
mprashanthsagar Nov 29, 2021
dca26df
Remove grpclb retry config (#654)
mprashanthsagar Nov 30, 2021
a603ec3
Fix 2.2.x build (#656)
mprashanthsagar Dec 6, 2021
872f929
Add trace logs for time spent over network calls for gRPC read channe…
mprashanthsagar Dec 8, 2021
b7da475
Update bucket_name_prefix for dataproc allowlist (#659)
mprashanthsagar Dec 9, 2021
721dbd9
Move gRPC endpoint override to GoogleCloudStorageOptions (#660)
mprashanthsagar Dec 14, 2021
c97be44
Added Traffic Director support (#657)
veblush Dec 15, 2021
273c570
Add message level timeouts for gRPC requests (#665)
mprashanthsagar Jan 5, 2022
e1049ad
Shutdown background threadpool on GCSImpl close (#676)
mprashanthsagar Jan 7, 2022
c366dea
Update protobuf version (#679)
mprashanthsagar Jan 11, 2022
d384b58
Mark watchdog executor service as daemon (#677)
mprashanthsagar Jan 18, 2022
da9c469
Configure Error Prone analyzer and fix build failures (#683)
medb Jan 19, 2022
5ce31fd
Revert "Update read time based on object size (#653)" (#691)
mprashanthsagar Jan 24, 2022
1737619
Remove initial delay for watchdog scheduler (#690)
mprashanthsagar Jan 24, 2022
8f5a875
Fix Formatting after re-base
Jan 24, 2022
f9124e5
Change the TD scheme to google-c2p-experimental (#684)
veblush Jan 24, 2022
7a5063e
Drain iterator for server-streaming grpc call cancellation (#692)
mprashanthsagar Jan 25, 2022
1ae8775
Upgrade gRPC to 1.43 (#685)
veblush Jan 27, 2022
c094a92
Enable TCP keep alive while using HttpTransport (#696)
Deependra-Patel Jan 28, 2022
24a1461
Log at finer for exception while draining iterator (#702)
mprashanthsagar Jan 31, 2022
c69499b
Release GCS connector 2.2.5 (#701)
Deependra-Patel Jan 31, 2022
588ff5b
upgrading gRPC to version 1.43.2 (#705)
davidrabinowitz Feb 3, 2022
b7ac3a1
Fix integration test which had really long bucket name (#707)
Deependra-Patel Feb 3, 2022
b06db97
Decrease bucket prefix to not exceed max bucket size allowed (#708) (…
Deependra-Patel Feb 3, 2022
98eff54
adding grpc-xds dependnecy (#712)
davidrabinowitz Feb 4, 2022
790017a
Upgrade grpc-google-cloud-storage-v2 to 2.2.2-alpha (#703)
veblush Feb 4, 2022
39af41a
[grpc] Refactor footer caching logic (#713)
mprashanthsagar Feb 8, 2022
3fbc6aa
Fix test from rebase (#723)
mprashanthsagar Feb 9, 2022
b7f8a0b
Update GCS version for next release development (#720)
Deependra-Patel Feb 9, 2022
d7a5377
Exclude application time from watchdog for ServerStreamingRPC (#731)
mprashanthsagar Feb 17, 2022
2ceb646
Removed the limit of default service account to use TD & Directpath (…
veblush Feb 18, 2022
8b46786
Do not rely on both requestContext and responseIterator to determine …
mprashanthsagar Feb 24, 2022
a1e93ca
Improve trace logging for gRPC channels (#739)
mprashanthsagar Feb 24, 2022
751d7e6
Fix short-circuit response on retries for StartResumableWrites (#740)
mprashanthsagar Feb 26, 2022
a812f59
Migrate tests from using Mockito mocks except for mockBatchHelper and…
aalexx-S Mar 3, 2022
d264f63
Log metadata and read data latencies
mayanks Mar 3, 2022
4255f7e
Fix short-circuit response on retries for StartResumableWrites (#740)…
mprashanthsagar Mar 7, 2022
b510e53
Merge branch 'branch-2.2.x' into branch-2.2.x
mprashanthsagar Mar 7, 2022
2c58bb3
converted atFine to atFinest level logging
mayanks Mar 8, 2022
1aa5cc5
removed conditional check to log requestId
mayanks Mar 8, 2022
6e4dea9
using stopwatch to measure request latencies
mayanks Mar 8, 2022
84758fd
added logs in metadata call exception
mayanks Mar 8, 2022
4edaab2
Update default gRPC message level timeout (#748)
mprashanthsagar Mar 9, 2022
edf9fee
Decouple get metadata with get data (#724) (#747)
aalexx-S Mar 10, 2022
97a718c
Add test with increased batch size (#753)
mprashanthsagar Mar 24, 2022
8567eeb
Handle startResumableWrite Failure to avoid blocked writes (#756)
mprashanthsagar Apr 11, 2022
ccfe71c
Fix short-circuit response on retries for QueryWriteStatus (#757)
mprashanthsagar Apr 11, 2022
53c2240
Do not retry when there is no data buffered (#758)
mprashanthsagar Apr 11, 2022
1664433
Add integ test with TD enabled (#754) (#759)
Deependra-Patel Apr 11, 2022
0c86046
Shade distruptor library (#761)
mayanks Apr 12, 2022
fe7ce6b
Enable TD by default (#764)
mprashanthsagar Apr 14, 2022
1d755fc
Optimise metadata info to ReadChannel (#755)
mprashanthsagar Apr 14, 2022
56ddc5f
Run all gRPC integration tests also with TD enabled (#762)
Deependra-Patel Apr 14, 2022
0e0c0d7
Preparing release 2.2.6 (#763)
davidrabinowitz Apr 14, 2022
ce93a3b
Fix Requester Pays AUTO mode (#742)
danking Mar 1, 2022
9a59dcd
shading new dependencies (#765)
davidrabinowitz Apr 14, 2022
cc1b485
Preparing the next 2.2.7 release version (#766)
davidrabinowitz Apr 18, 2022
18e1610
Add generated preprod client for GCS (#774)
mprashanthsagar May 6, 2022
9e87f63
Add env config for integ test to only run with TD enabled (#767)
Deependra-Patel Apr 19, 2022
e8cc18a
Make writes via GCS gRPC API more resilient (#778)
mayanks May 9, 2022
b5e5472
Fix: Prevent clobbering of SSL trustCertificates (#786)
Deependra-Patel May 17, 2022
f4d8ee5
Lazy footer prefetch (#788)
mayanks May 24, 2022
b8c97a9
Preparing for 2.2.7 release (#794)
Deependra-Patel Jun 2, 2022
bdf90cb
Prepare for next release (#798)
Deependra-Patel Jun 6, 2022
68ecc4c
FsBenchmark test to support write benchmark (#795)
mayanks Jun 3, 2022
250689e
Upgrade Google auth dependency to 1.7.0 (#803)
JerryLeiDing Jun 13, 2022
ceeb3b4
Upgrade google-oauth-client version (#806)
Deependra-Patel Jun 16, 2022
ac25ce3
Add support for tracing GCS API calls (#796)
arunkumarchacko Jun 17, 2022
00d24b1
Fixing seek back to same position with grpc channel (#808) (#812)
abmodi Jun 24, 2022
8002a92
Grpc read optimization to not prematurely close existing requests (#8…
abmodi Jun 24, 2022
513f5cd
fix write stall on ready by timing out
mayanks Jul 1, 2022
c55287b
Add bucket information as header (#815) (#818)
arunkumarchacko Jul 6, 2022
6b7fbb4
Always set the audience to the Google Cloud OAuth2 Token Server (#822)
KoopaKing Jul 16, 2022
ebe5477
Add tracing for GRPC API calls (#821) (#824)
arunkumarchacko Jul 20, 2022
8672fd5
Fix a flaky test (#837)
arunkumarchacko Jul 25, 2022
5252c6f
Update default gRPC message level timeout (#748)
mprashanthsagar Mar 9, 2022
59a35eb
Decouple get metadata with get data (#724) (#747)
aalexx-S Mar 10, 2022
576446b
Add test with increased batch size (#753)
mprashanthsagar Mar 24, 2022
b5dc1cb
Handle startResumableWrite Failure to avoid blocked writes (#756)
mprashanthsagar Apr 11, 2022
ace9ec6
Fix short-circuit response on retries for QueryWriteStatus (#757)
mprashanthsagar Apr 11, 2022
a9e7b64
Do not retry when there is no data buffered (#758)
mprashanthsagar Apr 11, 2022
8f4cc6c
Add integ test with TD enabled (#754) (#759)
Deependra-Patel Apr 11, 2022
ded1f14
Shade distruptor library (#761)
mayanks Apr 12, 2022
ca2968d
Enable TD by default (#764)
mprashanthsagar Apr 14, 2022
3f9fbeb
Optimise metadata info to ReadChannel (#755)
mprashanthsagar Apr 14, 2022
d449198
Run all gRPC integration tests also with TD enabled (#762)
Deependra-Patel Apr 14, 2022
0a6b553
Preparing release 2.2.6 (#763)
davidrabinowitz Apr 14, 2022
ab5c65a
Fix Requester Pays AUTO mode (#742)
danking Mar 1, 2022
8b6436a
shading new dependencies (#765)
davidrabinowitz Apr 14, 2022
5403100
Preparing the next 2.2.7 release version (#766)
davidrabinowitz Apr 18, 2022
06f8cf7
Add generated preprod client for GCS (#774)
mprashanthsagar May 6, 2022
0e383ce
Add env config for integ test to only run with TD enabled (#767)
Deependra-Patel Apr 19, 2022
5538991
Make writes via GCS gRPC API more resilient (#778)
mayanks May 9, 2022
ebbf6f7
Fix: Prevent clobbering of SSL trustCertificates (#786)
Deependra-Patel May 17, 2022
695ae00
Lazy footer prefetch (#788)
mayanks May 24, 2022
8717ffd
Preparing for 2.2.7 release (#794)
Deependra-Patel Jun 2, 2022
93dca13
Prepare for next release (#798)
Deependra-Patel Jun 6, 2022
77633fa
FsBenchmark test to support write benchmark (#795)
mayanks Jun 3, 2022
fb96e13
Upgrade Google auth dependency to 1.7.0 (#803)
JerryLeiDing Jun 13, 2022
e796e57
Upgrade google-oauth-client version (#806)
Deependra-Patel Jun 16, 2022
7383a9b
Add support for tracing GCS API calls (#796)
arunkumarchacko Jun 17, 2022
2c03849
Optimize write performance by avoiding a memcopy
mayanks May 20, 2022
d73cc08
fetch the committed write offset in an async manner before the buffer…
mayanks Jun 1, 2022
934b463
added comments and logs
mayanks Jun 1, 2022
6117a33
fixed spotless:clean issue
mayanks Jun 3, 2022
3f8ee59
fixed review comments
mayanks Jun 23, 2022
5971675
Fixing seek back to same position with grpc channel (#808) (#812)
abmodi Jun 24, 2022
08040ac
Grpc read optimization to not prematurely close existing requests (#8…
abmodi Jun 24, 2022
aa8fa93
handling interrupted exceptions and added comments
mayanks Jun 29, 2022
923d8dd
added explanation on how lastChunk is used
mayanks Jun 29, 2022
5433d35
remove dependency of GoogleCloudStorageGrpcWriteChannel on BaseAbstra…
mayanks Jun 30, 2022
376aac6
removed optimization to fetch committed offset asynchronously
mayanks Jun 30, 2022
b36962f
added a test case when exception is raised on second write after gcs …
mayanks Jun 30, 2022
25ee9f2
fix write stall on ready by timing out
mayanks Jul 1, 2022
971b845
fixed the comment and write channel initialization logic
mayanks Jul 4, 2022
126ba74
Add bucket information as header (#815) (#818)
arunkumarchacko Jul 6, 2022
579d743
Merge branch 'branch-2.2.x' of github.com:GoogleCloudDataproc/hadoop-…
mayanks Aug 3, 2022
4aaab2b
resolved merge issue with the head of the branch
mayanks Aug 3, 2022
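The write-path diff itself is not shown in this part of the page. Purely as a generic illustration of what "avoiding a memcpy" can mean in Java, the sketch below contrasts copying a slice of a buffer with wrapping it in a `ByteBuffer` view that shares the backing array. The names `ZeroCopySketch`, `copyChunk`, and `wrapChunk` are hypothetical and are not taken from the connector code; this is an assumption about the general technique, not the PR's implementation.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration; none of these names come from the connector.
class ZeroCopySketch {

    // Copying path: allocates a new array and copies `length` bytes into it.
    static byte[] copyChunk(byte[] buffer, int offset, int length) {
        return Arrays.copyOfRange(buffer, offset, offset + length);
    }

    // Zero-copy path: a read-only view over the same backing array; no bytes are copied.
    static ByteBuffer wrapChunk(byte[] buffer, int offset, int length) {
        return ByteBuffer.wrap(buffer, offset, length).slice().asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        byte[] buffer = {1, 2, 3, 4, 5, 6};
        byte[] copy = copyChunk(buffer, 2, 3);
        ByteBuffer view = wrapChunk(buffer, 2, 3);

        // Mutating the source shows the difference: the view sees it, the copy does not.
        buffer[2] = 42;
        System.out.println("copy[0] = " + copy[0]);
        System.out.println("view[0] = " + view.get(0));
    }
}
```

The trade-off is that a shared view is only safe while the caller does not reuse or mutate the underlying buffer before the data has been sent.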
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -9,3 +9,6 @@ bin/

# MacOS folder files
.DS_Store

# Ignore Maven wrapper jar
.mvn/wrapper/maven-wrapper.jar
117 changes: 117 additions & 0 deletions .mvn/wrapper/MavenWrapperDownloader.java
@@ -0,0 +1,117 @@
/*
 * Copyright 2007-present the original author or authors.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.net.*;
import java.io.*;
import java.nio.channels.*;
import java.util.Properties;

public class MavenWrapperDownloader {

    private static final String WRAPPER_VERSION = "0.5.6";
    /**
     * Default URL to download the maven-wrapper.jar from, if no 'downloadUrl' is provided.
     */
    private static final String DEFAULT_DOWNLOAD_URL = "https://repo.maven.apache.org/maven2/io/takari/maven-wrapper/"
        + WRAPPER_VERSION + "/maven-wrapper-" + WRAPPER_VERSION + ".jar";

    /**
     * Path to the maven-wrapper.properties file, which might contain a downloadUrl property to
     * use instead of the default one.
     */
    private static final String MAVEN_WRAPPER_PROPERTIES_PATH =
            ".mvn/wrapper/maven-wrapper.properties";

    /**
     * Path where the maven-wrapper.jar will be saved to.
     */
    private static final String MAVEN_WRAPPER_JAR_PATH =
            ".mvn/wrapper/maven-wrapper.jar";

    /**
     * Name of the property which should be used to override the default download url for the wrapper.
     */
    private static final String PROPERTY_NAME_WRAPPER_URL = "wrapperUrl";

    public static void main(String args[]) {
        System.out.println("- Downloader started");
        File baseDirectory = new File(args[0]);
        System.out.println("- Using base directory: " + baseDirectory.getAbsolutePath());

        // If the maven-wrapper.properties exists, read it and check if it contains a custom
        // wrapperUrl parameter.
        File mavenWrapperPropertyFile = new File(baseDirectory, MAVEN_WRAPPER_PROPERTIES_PATH);
        String url = DEFAULT_DOWNLOAD_URL;
        if(mavenWrapperPropertyFile.exists()) {
            FileInputStream mavenWrapperPropertyFileInputStream = null;
            try {
                mavenWrapperPropertyFileInputStream = new FileInputStream(mavenWrapperPropertyFile);
                Properties mavenWrapperProperties = new Properties();
                mavenWrapperProperties.load(mavenWrapperPropertyFileInputStream);
                url = mavenWrapperProperties.getProperty(PROPERTY_NAME_WRAPPER_URL, url);
            } catch (IOException e) {
                System.out.println("- ERROR loading '" + MAVEN_WRAPPER_PROPERTIES_PATH + "'");
            } finally {
                try {
                    if(mavenWrapperPropertyFileInputStream != null) {
                        mavenWrapperPropertyFileInputStream.close();
                    }
                } catch (IOException e) {
                    // Ignore ...
                }
            }
        }
        System.out.println("- Downloading from: " + url);

        File outputFile = new File(baseDirectory.getAbsolutePath(), MAVEN_WRAPPER_JAR_PATH);
        if(!outputFile.getParentFile().exists()) {
            if(!outputFile.getParentFile().mkdirs()) {
                System.out.println(
                        "- ERROR creating output directory '" + outputFile.getParentFile().getAbsolutePath() + "'");
            }
        }
        System.out.println("- Downloading to: " + outputFile.getAbsolutePath());
        try {
            downloadFileFromURL(url, outputFile);
            System.out.println("Done");
            System.exit(0);
        } catch (Throwable e) {
            System.out.println("- Error downloading");
            e.printStackTrace();
            System.exit(1);
        }
    }

    private static void downloadFileFromURL(String urlString, File destination) throws Exception {
        if (System.getenv("MVNW_USERNAME") != null && System.getenv("MVNW_PASSWORD") != null) {
            String username = System.getenv("MVNW_USERNAME");
            char[] password = System.getenv("MVNW_PASSWORD").toCharArray();
            Authenticator.setDefault(new Authenticator() {
                @Override
                protected PasswordAuthentication getPasswordAuthentication() {
                    return new PasswordAuthentication(username, password);
                }
            });
        }
        URL website = new URL(urlString);
        ReadableByteChannel rbc;
        rbc = Channels.newChannel(website.openStream());
        FileOutputStream fos = new FileOutputStream(destination);
        fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
        fos.close();
        rbc.close();
    }

}
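Review note: `downloadFileFromURL` above closes `fos` and `rbc` only on the success path, so both would leak if `transferFrom` throws. A minimal sketch of the same transfer using try-with-resources follows. The class name `SafeDownloadSketch` and its `download` method are hypothetical and are not part of the wrapper file in this diff; the `main` method uses a local `file://` URL only so that the sketch runs without network access.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical variant; not part of the wrapper file in this diff.
class SafeDownloadSketch {

    // Copies the content at urlString into destination and returns the byte count.
    // All three resources are closed even if the transfer throws.
    static long download(String urlString, Path destination) throws IOException {
        try (InputStream in = URI.create(urlString).toURL().openStream();
             ReadableByteChannel rbc = Channels.newChannel(in);
             FileOutputStream fos = new FileOutputStream(destination.toFile())) {
            return fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
        }
    }

    public static void main(String[] args) throws IOException {
        // A local file stands in for the remote jar.
        Path source = Files.createTempFile("wrapper-src", ".jar");
        Files.write(source, new byte[] {10, 20, 30});
        Path destination = Files.createTempFile("wrapper-dst", ".jar");

        long copied = download(source.toUri().toString(), destination);
        System.out.println(copied + " bytes copied to " + destination);
    }
}
```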
2 changes: 2 additions & 0 deletions .mvn/wrapper/maven-wrapper.properties
@@ -0,0 +1,2 @@
distributionUrl=https://repo.maven.apache.org/maven2/org/apache/maven/apache-maven/3.6.3/apache-maven-3.6.3-bin.zip
wrapperUrl=https://repo.maven.apache.org/maven2/io/takari/maven-wrapper/0.5.6/maven-wrapper-0.5.6.jar
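For reference, the `wrapperUrl` key above is the one `MavenWrapperDownloader` reads via `Properties.getProperty` with a default. A small sketch of that lookup, assuming only `java.util.Properties`: the class `WrapperUrlSketch`, the method `resolveWrapperUrl`, and the `mirror.example` and `example.invalid` hosts are hypothetical names used for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper; mirrors the getProperty-with-default lookup in the downloader.
class WrapperUrlSketch {

    static final String DEFAULT_URL =
        "https://repo.maven.apache.org/maven2/io/takari/maven-wrapper/0.5.6/maven-wrapper-0.5.6.jar";

    // Returns the wrapperUrl value from the properties text, or DEFAULT_URL if the key is absent.
    static String resolveWrapperUrl(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        return props.getProperty("wrapperUrl", DEFAULT_URL);
    }

    public static void main(String[] args) throws IOException {
        // Key present: the override wins.
        System.out.println(resolveWrapperUrl("wrapperUrl=https://mirror.example/maven-wrapper.jar"));
        // Key absent: the default is used.
        System.out.println(resolveWrapperUrl("distributionUrl=https://example.invalid/maven.zip"));
    }
}
```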
23 changes: 23 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,23 @@
# How to Contribute

We'd love to accept your patches and contributions to this project. There are
just a few small guidelines you need to follow.

## Contributor License Agreement

Contributions to this project must be accompanied by a Contributor License
Agreement. You (or your employer) retain the copyright to your contribution;
this simply gives us permission to use and redistribute your contributions as
part of the project. Head over to <https://cla.developers.google.com/> to see
your current agreements on file or to sign a new one.

You generally only need to submit a CLA once, so if you've already submitted one
(even if it was for a different project), you probably don't need to do it
again.

## Code reviews

All submissions, including submissions by project members, require review. We
use GitHub pull requests for this purpose. Consult
[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
information on using pull requests.
3 changes: 1 addition & 2 deletions LICENSE
@@ -1,4 +1,3 @@

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
@@ -187,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright 2014 Google, Inc.
Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
95 changes: 84 additions & 11 deletions README.md
@@ -1,22 +1,95 @@
# bigdata-interop
# Apache Hadoop Connectors

Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
[![GitHub release](https://img.shields.io/github/release/GoogleCloudDataproc/hadoop-connectors.svg)](https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/latest)
[![GitHub release date](https://img.shields.io/github/release-date/GoogleCloudDataproc/hadoop-connectors.svg)](https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/latest)
[![Code Quality: Java](https://img.shields.io/lgtm/grade/java/g/GoogleCloudDataproc/hadoop-connectors.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/GoogleCloudDataproc/hadoop-connectors/context:java)
[![codecov](https://codecov.io/gh/GoogleCloudDataproc/hadoop-connectors/branch/master/graph/badge.svg)](https://codecov.io/gh/GoogleCloudDataproc/hadoop-connectors)

## Google Cloud Storage connector for Hadoop
Libraries and tools for interoperability between Apache Hadoop related
open-source software and Google Cloud Platform.

## Google Cloud Storage connector for Apache Hadoop (HCFS)

[![Maven Central](https://img.shields.io/maven-central/v/com.google.cloud.bigdataoss/gcs-connector/hadoop1.svg?label=Maven%20Central)](https://search.maven.org/search?q=g:com.google.cloud.bigdataoss%20AND%20a:gcs-connector%20AND%20v:hadoop1-*)
[![Maven Central](https://img.shields.io/maven-central/v/com.google.cloud.bigdataoss/gcs-connector/hadoop2.svg?label=Maven%20Central)](https://search.maven.org/search?q=g:com.google.cloud.bigdataoss%20AND%20a:gcs-connector%20AND%20v:hadoop2-*)
[![Maven Central](https://img.shields.io/maven-central/v/com.google.cloud.bigdataoss/gcs-connector/hadoop3.svg?label=Maven%20Central)](https://search.maven.org/search?q=g:com.google.cloud.bigdataoss%20AND%20a:gcs-connector%20AND%20v:hadoop3-*)

The Google Cloud Storage connector for Hadoop enables running MapReduce jobs
directly on data in Google Cloud Storage by implementing the Hadoop FileSystem
interface. For details, see the README in the `/gcs/` folder.
interface. For details, see [the README](gcs/README.md).

## Google BigQuery connector for Apache Hadoop MapReduce

[![Maven Central](https://img.shields.io/maven-central/v/com.google.cloud.bigdataoss/bigquery-connector/hadoop1.svg?label=Maven%20Central)](https://search.maven.org/search?q=g:com.google.cloud.bigdataoss%20AND%20a:bigquery-connector%20AND%20v:hadoop1-*)
[![Maven Central](https://img.shields.io/maven-central/v/com.google.cloud.bigdataoss/bigquery-connector/hadoop2.svg?label=Maven%20Central)](https://search.maven.org/search?q=g:com.google.cloud.bigdataoss%20AND%20a:bigquery-connector%20AND%20v:hadoop2-*)
[![Maven Central](https://img.shields.io/maven-central/v/com.google.cloud.bigdataoss/bigquery-connector/hadoop3.svg?label=Maven%20Central)](https://search.maven.org/search?q=g:com.google.cloud.bigdataoss%20AND%20a:bigquery-connector%20AND%20v:hadoop3-*)

The Google BigQuery connector for Hadoop MapReduce enables running MapReduce
jobs on data in BigQuery by implementing the `InputFormat` & `OutputFormat`
interfaces. For more details see
[the documentation](https://cloud.google.com/dataproc/docs/concepts/connectors/bigquery)

## Building the Cloud Storage and BigQuery connectors

> Note that build requires Java 8 and fails with newer Java versions.

To build the connector for specific Hadoop version, run the following commands
from the main directory:

```bash
# with Hadoop 2 and YARN support:
./mvnw -P hadoop2 clean package

# with Hadoop 3 and YARN support:
./mvnw -P hadoop3 clean package
```

In order to verify test coverage for specific Hadoop version, run the following
commands from the main directory:

```bash
# with Hadoop 2 and YARN support:
./mvnw -P hadoop2 -P coverage clean verify

# with Hadoop 3 and YARN support:
./mvnw -P hadoop3 -P coverage clean verify
```

The Cloud Storage connector JAR can be found in `gcs/target/`. The BigQuery
connector JAR can be found in `bigquery/target/`.

## Adding the Cloud Storage and BigQuery connectors to your build

Maven group ID is `com.google.cloud.bigdataoss` and artifact ID for Cloud
Storage connector is `gcs-connector` and for BigQuery connectors is
`bigquery-connector`.

To add a dependency on one of the connectors using Maven, use the following:

### Building
* Cloud Storage connector:

The Google Cloud Storage (GCS) connector is built with Maven 3 (as of 2017-10-25, version 3.5.0 has been tested).
To build the connector for Hadoop 1, run the following commands from the main directory:
```xml
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>gcs-connector</artifactId>
<version>hadoop2-2.2.7</version>
</dependency>
```

mvn -P hadoop1 package
* BigQuery connector:

To build the connector with support for Hadoop 2 & YARN, run the following commands from the main directory:
```xml
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>bigquery-connector</artifactId>
<version>hadoop2-1.2.0</version>
</dependency>
```

mvn -P hadoop2 package
## Resources

In both cases the GCS connector JAR can be found in gcs/target/.
On **Stack Overflow**, use the tag
[`google-cloud-dataproc`](https://stackoverflow.com/tags/google-cloud-dataproc)
for questions about the connectors in this repository. This tag receives
responses from the Stack Overflow community and Google engineers, who monitor
the tag and offer unofficial support.