Aerospike Connect for Spark Release Notes
-
Release Date: July 31, 2024
- Supported for 15 months from the release date.
- Supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x.
- Supports Aerospike Database 5.2 and later.
Bug Fixes
- [CONNECTOR-1116] - Password should be masked in logs.
-
Release Date: June 13, 2024
- Supported for 15 months from the release date.
- Supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x.
- Supports Aerospike Database 5.2 and later.
New Features
- [CONNECTOR-1044] - Support Aerospike 7.1.0 in Spark connector.
Bug Fixes
- [CONNECTOR-910] - Batch size is not computed correctly.
- [CONNECTOR-1092] - Update documentation: only update and update_only can be used to delete records using the aerospike.update.partial flag.
-
Release Date: April 9, 2024
- Supported for 15 months from the release date.
- Supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x.
- Supports Aerospike Database 5.2 and later.
New Features
- [CONNECTOR-956] - Support Spark 3.5.0 in Spark connector.
Bug Fixes
- [CONNECTOR-889] - Spark connector returns a subset of data rather than an error on a query failure.
- [CONNECTOR-908] - Secondary Index query creation prior to 6.0 Database throws an error.
Known Issues
- In Apache Spark 3.1.x, a join between multiple tables may throw a ClassCastException. See the Apache Spark bug for the detailed discussion.
-
Release Date: November 2, 2023
- Supported for 15 months from the release date.
- Supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x.
- Supports Aerospike Database 5.2 and later.
Bug Fixes
- [CONNECTOR-820] - Fix Denial of Service (DoS).
- [CONNECTOR-818] - Spark log4j logs break when adding aerospike spark connector jar to the classpath.
Known Issues
- In Apache Spark 3.1.x, a join between multiple tables may throw a ClassCastException. See the Apache Spark bug for the detailed discussion.
-
Release Date: September 20, 2023
- Supported for 15 months from the release date.
- Supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x.
- Supports Aerospike Database 5.2 and later.
New Features
- [CONNECTOR-816] - Allow setting client durableDelete by setting the aerospike.client.durabledelete flag.
Bug Fixes
- [CONNECTOR-810] - Connector should not invoke scan query if at query time it can be deduced that query will fetch no record.
- [CONNECTOR-807] - Nested and mixed CDTs cause a TypeException instead of nullifying the contents when the schema cannot be inferred.
Known Issues
- In Apache Spark 3.1.x, a join between multiple tables may throw a ClassCastException. See the Apache Spark bug for the detailed discussion.
-
Release Date: August 30, 2023
- Supported for 15 months from the release date.
- Supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x.
- Supports Aerospike Database 5.2 and later.
New Features
- [CONNECTOR-783] - In Spark connector only validate if the feature key is present and true.
- [CONNECTOR-792] - Upgrade Spark connector java client to 7.1.0.
- [CONNECTOR-791] - Introduce option aerospike.client.maxconnspernode to set clientPolicy.maxConnsPerNode.
- [CONNECTOR-794] - Introduce option aerospike.client.asyncmaxconnspernode to set clientPolicy.asyncMaxConnsPerNode.
- [CONNECTOR-795] - Introduce option aerospike.client.asyncminconnspernode to set clientPolicy.asyncMinConnsPerNode.
- [CONNECTOR-796] - Introduce option aerospike.client.minconnspernode to set clientPolicy.minConnsPerNode.
Bug Fixes
- [CONNECTOR-778] - Unable to throttle Aerospike Write TPS after Database upgrade to 6.2.
Known Issues
- In Apache Spark 3.1.x, a join between multiple tables may throw a ClassCastException. See the Apache Spark bug for the detailed discussion.
-
Release Date: August 10, 2023
- Supported for 15 months from the release date.
- Supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x.
- Supports Aerospike Database 5.2 and later.
Bug Fixes
- [CONNECTOR-776] - Upgrade to aerospike-java-client version 7 (addresses CVE-2023-36480).
Known Issues
- Attempting to read records that contain serialized (unknown) data types with Java client 7.0.0/Spark connector 4.1.1 throws an exception. The upcoming Spark connector release with Java client 7.1.0 will read such records gracefully.
- In Apache Spark 3.1.x, a join between multiple tables may throw a ClassCastException. See the Apache Spark bug for the detailed discussion.
-
Release Date: August 2, 2023
- Supported for 15 months from the release date.
- Supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x.
- Supports Aerospike Database 5.2 and later.
New Features
- [CONNECTOR-724] - Support PKI authentication with Spark connector.
- [CONNECTOR-398] - Support Secondary Index cardinality in the Spark connector.
Improvements
- [CONNECTOR-708] - Update Aerospike Java client to 6.1.11.
- [CONNECTOR-704] - Lowercase properties key received as input config in aerolookup.
Bug Fixes
- [CONNECTOR-721] - Duplicate section in Spark tutorial document.
- [CONNECTOR-748] - Invocation of Pushdown Aerospike expressions from PySpark errors "package not callable".
Known Issues
- In Apache Spark 3.1.x, a join between multiple tables may throw a ClassCastException. See the Apache Spark bug for the detailed discussion.
-
Release Date: April 21, 2023
- Supported until July 20, 2025.
- Supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x.
- Supports Aerospike Database 5.2 and later.
- aeroJoin API has been discontinued. Please use aerolookup API.
New Features
- [CONNECTOR-585] - Support Apache Spark 3.3.x for Scala 2.12 and Scala 2.13.
- [CONNECTOR-657] - Support Apache Spark 3.4.x for Scala 2.12 and Scala 2.13.
- [CONNECTOR-622] - Port Spark connector for Spark 3.2.x with Scala 2.13.x.
Improvements
- [CONNECTOR-635] - Use one threadpool for all processing in spark connector.
- [CONNECTOR-631] - Drop Aerojoin API.
- [CONNECTOR-598] - Change binary name convention from connector 4.0.0 onwards.
- [CONNECTOR-258] - Test aerolookup with spark streaming.
- [CONNECTOR-601] - Spark connector should resolve featureKey on remote Aerospike Database.
- [CONNECTOR-597] - aerolookup API should merge user provided configuration map with SparkSession aerospike properties.
- [CONNECTOR-602] - aerolookup rows resulting from nonexistent primary keys should set the corresponding columns to null.
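The null-filling behavior described in CONNECTOR-602 can be pictured with a small, connector-independent sketch. The function name lookup_rows, the in-memory store, and the column names are hypothetical stand-ins for illustration only; this is not the aerolookup implementation.

```python
from typing import Any, Dict, List, Optional


def lookup_rows(
    keys: List[str],
    store: Dict[str, Dict[str, Any]],
    columns: List[str],
) -> List[Dict[str, Optional[Any]]]:
    """Return one row per requested key; keys missing from the store get null columns."""
    rows = []
    for key in keys:
        record = store.get(key)  # None when the primary key does not exist
        row: Dict[str, Optional[Any]] = {"key": key}
        for column in columns:
            row[column] = record.get(column) if record is not None else None
        rows.append(row)
    return rows


# Hypothetical data: only key "a" exists in the store.
store = {"a": {"name": "Ann", "age": 30}}
rows = lookup_rows(["a", "zz"], store, ["name", "age"])
for row in rows:
    print(row)
```

The row for the missing key is still returned, with every looked-up column set to None, so the output keeps one row per input key.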
Known Issues
- In Apache Spark 3.1.x, a join between multiple tables may throw a ClassCastException. See the Apache Spark bug for the detailed discussion.
-
Release Date: January 17, 2023
- Supported until April 17, 2025.
- Supports Apache Spark 3.0.x, 3.1.x and 3.2.x.
- Supports Aerospike Database 5.2 and later.
- aeroJoin API is deprecated. Please consider using aerolookup API.
Bug Fixes
- [CONNECTOR-567] - Rate limiting writes with spark with Database 6.0+ does not account for subtransactions in aerospike.transaction.rate.
Known Issues
- In Apache Spark 3.1.x, a join between multiple tables may throw a ClassCastException. See the Apache Spark bug for the detailed discussion.
-
Release Date: December 6, 2022
- Supported until March 6, 2024.
- Supports Apache Spark 3.0.x, 3.1.x and 3.2.x.
- Supports Aerospike Database 5.2 and later.
- aeroJoin API is deprecated. Please consider using aerolookup API.
Improvements
- [CONNECTOR-548] - Fix snyk security vulnerabilities.
Known Issues
- In Apache Spark 3.1.x, a join between multiple tables may throw a ClassCastException. See the Apache Spark bug for the detailed discussion.
-
Release Date: November 23, 2022
- Supported until February 23, 2024.
- Supports Apache Spark 3.0.x, 3.1.x and 3.2.x.
- Supports Aerospike Database 5.2 and later.
- aeroJoin API is deprecated. Please consider using aerolookup API.
Bug Fixes
- [CONNECTOR-444] - Unable to end spark-aerospike connection.
Known Issues
- In Apache Spark 3.1.x, a join between multiple tables may throw a ClassCastException. See the Apache Spark bug for the detailed discussion.
-
Release Date: October 21, 2022
- Supported until January 21, 2024.
- Supports Apache Spark 3.0.x, 3.1.x and 3.2.x.
- Supports Aerospike Database 5.2 and later.
- aeroJoin API is deprecated. Please consider using aerolookup API.
Improvements
- [CONNECTOR-463] - Remove Scala runtime libs from assembly jar.
Bug Fixes
- [CONNECTOR-444] - Unable to end spark-aerospike connection.
- [CONNECTOR-458] - Connector 3.5.0+ BatchWrite implementation does not pass write policy.
Known Issues
- In Apache Spark 3.1.x, a join between multiple tables may throw a ClassCastException. See the Apache Spark bug for the detailed discussion.
-
Release Date: August 2, 2022
- Supported until November 2, 2023.
- Supports Apache Spark 3.0.x, 3.1.x and 3.2.x.
- Supports Aerospike Database 5.2 and later.
- aeroJoin API is deprecated. Please consider using aerolookup API.
Bug Fixes
- [CONNECTOR-400] - Task not serializable when spark.sql.codegen.wholeStage is set to false.
Known Issues
- In Apache Spark 3.1.x, a join between multiple tables may throw a ClassCastException. See the Apache Spark bug for the detailed discussion.
-
Release Date: July 14, 2022
- Supported until October 14, 2023.
- Supports Apache Spark 3.0.x, 3.1.x and 3.2.x.
- Supports Aerospike Database 5.2 and later.
- aeroJoin API is deprecated. Please consider using aerolookup API.
New Features
- [CONNECTOR-317] - Support Apache Spark 3.2.x.
- [CONNECTOR-360] - Add batchwrite support.
Improvements
- [CONNECTOR-372] - Add ability to use alternate service addresses to Spark.
- [CONNECTOR-359] - Add enum support for secondary index enum type.
Bug Fixes
- [CONNECTOR-375] - spark.conf.set("key","value") is not propagated to the spark connector.
-
Release Date: July 1, 2022
- Supported until October 1, 2023.
- Supports Apache Spark 3.0.x and 3.1.x.
- Supports Aerospike Database 5.2 and later.
- aeroJoin API is deprecated. Please consider using aerolookup API.
Bug Fixes
- [CONNECTOR-374] - Handle NullPointerException with flexible schema and schema inference.
Known Issues
- Aerospike Connect for Spark supports selected secondary index query for CDTs. Please refer to the documentation for a complete set of examples.
-
Release Date: June 16, 2022
- Supported until September 16, 2023.
- Supports Apache Spark 3.0.x and 3.1.x.
- Supports Aerospike Database 5.2 and later.
- aeroJoin API is deprecated. Please consider using aerolookup API.
Bug Fixes
- [CONNECTOR-363] - Runtime error while using rate limit due to incorrect library shading.
Known Issues
- Aerospike Connect for Spark supports selected secondary index query for CDTs. Please refer to the documentation for a complete set of examples.
-
Release Date: May 9, 2022
- Supported until August 9, 2023.
- Supports Apache Spark 3.0.x and 3.1.x.
- Supports Aerospike Database 5.2 and later.
- aeroJoin API is deprecated. Please consider using aerolookup API.
New Features
- [CONNECTOR-137] - Support Secondary indexes for the Spark Connector.
- [CONNECTOR-336] - Support SINDEX for CDT in the Spark connector.
Improvements
- [CONNECTOR-335] - Add spark shutdown hook.
Known Issues
- Aerospike Connect for Spark supports selected secondary index query for CDTs. Please refer to the documentation for a complete set of examples.
-
Release Date: June 16, 2022
- Supported until September 16, 2023.
- Tested with Apache Spark 3.1.2, Scala 2.12.11 and Aerospike 6.0.0.1 EE.
- Supports Aerospike Database 5.2 and later.
- The aeroJoin API is deprecated. Please consider using the performant and simpler aerolookup API.
Bug Fixes
- [CONNECTOR-363] - Runtime error while using rate limit due to incorrect library shading.
-
Release Date: February 18, 2022
- Supported until May 18, 2023.
- Tested with Apache Spark 3.1.2, Scala 2.12.11 & Python 3.7, Aerospike 5.7.0.7 EE.
- Supports Aerospike Database 5.2 and later.
- The aeroJoin API is deprecated. Please consider using the performant and simpler aerolookup API.
New Features
- [CONNECTOR-132] - Support for Spark 3.1.x. Use this connector only with a Spark 3.1.x cluster.
- [CONNECTOR-306] - Add and test write retry logic to the Spark connector to handle quota breaches.
Improvements
- [CONNECTOR-303] - Remove the obsolete Predicate Filtering support from the Spark connector.
- [CONNECTOR-329] - Add an additional default parameter in aerolookup and aerojoin to specify aerospike configuration parameters.
-
Release Date: June 16, 2022
- Supported until September 16, 2023.
- Tested with Apache Spark 3.0.3, Scala 2.12.11
- Supports Aerospike Database 5.0 and later.
Bug Fixes
- [CONNECTOR-363] - Runtime error while using rate limit due to incorrect library shading.
Known Issues
- This connector release shades all internal libraries. Please update application build files accordingly.
- Spark connector stores Spark DateType and TimestampType as long. In aeroJoin API calls, convert the aforementioned types to LongType.
- DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
-
Release Date: December 21, 2021
- Supported until March 21, 2023.
- Tested with Apache Spark 3.0.3, Scala 2.12.11 & Python 3.7.
- Supports Aerospike Database 5.0 and later.
Improvements
- This library is an uber shaded jar.
- Update Client version to 5.1.11.
Bug Fixes
- [CONNECTOR-312] - Update spark connectors with latest java clients to address CLIENT-1637 ("partition unavailable" errors occur).
Known Issues
- This connector release shades all internal libraries. Please update application build files accordingly.
- Spark connector stores Spark DateType and TimestampType as long. In aeroJoin API calls, convert the aforementioned types to LongType.
- DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
-
Release Date: November 15, 2021
- Supported until February 15, 2023.
- Tested with Apache Spark 3.0.3, Scala 2.12.11 & Python 3.7.
- Supports Aerospike Database 5.0 and later.
New Features
- [CONNECTOR-131] - Expression pushdown support in the Spark Connector.
- [CONNECTOR-210] - Limit the write rate from Spark to Aerospike.
- [CONNECTOR-260] - Create DataFrame API for AeroJoin functionality (aerolookup).
Improvements
- This library is an uber shaded jar.
- Update Client version to 5.1.8.
Bug Fixes
- [CONNECTOR-305] - Create one client instance per spark partition.
Known Issues
- This connector release shades all internal libraries. Please update application build files accordingly.
- Spark connector stores Spark DateType and TimestampType as long. In aeroJoin API calls, convert the aforementioned types to LongType.
- DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
- [CONNECTOR-312] - Update spark connectors with latest java clients to address CLIENT-1637 ("partition unavailable" errors occur). Fixed in version 3.2.1.
-
Release Date: December 21, 2021
- Supported until March 21, 2023.
- Tested with Apache Spark 3.0.3, Scala 2.12.11 & Python 3.7.
- Supports Aerospike Database 5.0 and later.
Improvements
- This library is an uber shaded jar.
- Update Client version to 5.1.11.
- Migrated to Expressions for scans.
- Pushdown support for Float & Double datatypes.
Bug Fixes
- [CONNECTOR-312] - Update spark connectors with latest java clients to address CLIENT-1637 ("partition unavailable" errors occur).
Known Issues
- This connector release shades all internal libraries. Please update application build files accordingly.
- Spark connector stores Spark DateType and TimestampType as long. In aeroJoin API calls, convert the aforementioned types to LongType.
- DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
-
Release Date: July 21, 2021
- Supported until October 21, 2022.
- Tested with Apache Spark 3.0.3, Scala 2.12.11 & Python 3.7.
- Supports Aerospike Database 5.0 and later.
New Features
- [CONNECTOR-247] - Spark connector should persist Map bins as K-Ordered.
- [CONNECTOR-166] - Support batchget queries with digests in Spark Connector.
- [CONNECTOR-142] - Data sampling in the Spark Connector using the aerospike.sample.size flag.
- [CONNECTOR-142] - Support boolean bins in the Spark Connector (refer to aerospike.booleanbin in the documentation).
Improvements
- This library is an uber shaded jar.
- Migrated from queryPartition() call to ScanPartitions().
- Update Client version to 5.1.5.
- Migrated to Expressions for scans.
- Pushdown support for Float & Double datatypes.
Bug Fixes
- [CONNECTOR-215] - Writes are slower in the Spark Connector v2 version. Introduced a new flag aerospike.write.batchsize to control write throughput.
Known Issues
- This connector release shades all internal libraries. Please update application build files accordingly.
- Spark connector stores Spark DateType and TimestampType as long. In aeroJoin API calls, convert the aforementioned types to LongType.
- DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
- [CONNECTOR-312] - Update spark connectors with latest java clients to address CLIENT-1637 ("partition unavailable" errors occur). Fixed in version 3.1.1.
-
Release Date: January 5, 2022
- Supported until April 5, 2023.
- Tested with Apache Spark 3.0.0, Scala 2.12.11, & Python 3.7.
- Supports Aerospike Database 5.0 and later.
Improvements
- This library is an uber shaded jar.
Bug Fixes
- [CONNECTOR-312] - Update spark connectors with latest java clients to address CLIENT-1637 ("partition unavailable" errors occur).
Known Issues
- This release does not support Aerospike 5.6 boolean bin and quota features.
- DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
- Streaming write does not work with Apache Spark 3.1.0.
- Streaming update trait SupportsStreamingUpdate from Spark 3.0.0 has been renamed to SupportsStreamingUpdateAsAppend in Spark 3.1.0.
Updates
- The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.
-
Release Date: June 2, 2021
- Supported until September 2, 2022.
- Tested with Apache Spark 3.0.0, Scala 2.12.11, & Python 3.7.
- Supports Aerospike Database 5.0 and later.
Improvements
- This library is an uber shaded jar.
Bug Fixes
- [CONNECTOR-208] - Spark connector with default timeout settings is timing out after 1 second.
- [CONNECTOR-205] - Filter out records that breach write block size in Aerospike via Spark Connector.
- [CONNECTOR-212] - Handle nulls in full record writes (REPLACE, REPLACE_ONLY, and CREATE_ONLY).
Known Issues
- This release does not support Aerospike 5.6 boolean bin and quota features.
- DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
- Streaming write does not work with Apache Spark 3.1.0.
- Streaming update trait SupportsStreamingUpdate from Spark 3.0.0 has been renamed to SupportsStreamingUpdateAsAppend in Spark 3.1.0.
- [CONNECTOR-312] - Update spark connectors with latest java clients to address CLIENT-1637 ("partition unavailable" errors occur). Fixed in version 3.0.3.
Updates
- The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.
-
Release Date: February 24, 2021
- Supported until May 24, 2022.
- [CONNECTOR-110] - Spark 3.x branch - Aerospike configuration passed from spark configuration should be accessible downstream.
Known Issues
- DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
- Streaming write does not work with Apache Spark 3.1.0.
- Streaming update trait SupportsStreamingUpdate from Spark 3.0.0 has been renamed to SupportsStreamingUpdateAsAppend in Spark 3.1.0.
- [CONNECTOR-312] - Update spark connectors with latest java clients to address CLIENT-1637 ("partition unavailable" errors occur). Fixed in version 3.0.3.
Updates
- The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.
-
Release Date: February 18, 2021
- Supported until May 18, 2022.
- [CONNECTOR-103] - Extend support for Apache Spark 3.0.0 Data Source V2.
- Supports Aerospike Database 5.0 and later.
New Features
- Data Source V2 implementation for Apache Spark 3.0.0.
Known Issues
- DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
- Streaming write does not work with Apache Spark 3.1.0.
- Streaming update trait SupportsStreamingUpdate from Spark 3.0.0 has been renamed to SupportsStreamingUpdateAsAppend in Spark 3.1.0.
- We have observed that a configuration set using "spark.conf.set()" is not passed along to the connector, hence defaults are used by the connector, which may produce unintended results. Consider using .option() or .options() along with the read and write statements for the configuration to take effect. Fixed in version 3.0.1.
- [CONNECTOR-312] - Update spark connectors with latest java clients to address CLIENT-1637 ("partition unavailable" errors occur). Fixed in version 3.0.3.
Updates
- The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.
-
Release Date: July 12, 2022
- 2.9.0 is very likely the last release that will be compatible with Apache Spark 2.4.7 binary. Aerospike has ceased developing new features to support Spark 2.x.y. However, bug fixes will be available until October 14, 2022. Please plan to move to Apache Spark 3.0.x and use Aerospike Connect for Spark version 3.x.y.
- Supported until October 12, 2023.
- Tested with Apache Spark 2.4.7, Scala 2.12.11 & Python 3.7.
- Supports Aerospike Database 5.0 and later.
Improvements
- [CONNECTOR-372] - Add ability to use alternate service addresses to Spark.
Bug Fixes
- [CONNECTOR-375] - spark.conf.set("key","value") is not propagated to the spark connector.
-
Release Date: January 3, 2022
- Apache Spark 2.4.8 is the last release in Spark’s 2.x.y branch. No more 2.x.y releases of Spark are expected, even for bug fixes. Therefore, 2.8.0 is the last version of Aerospike Connect for Spark 2.8 that will be compatible with that Spark branch. Aerospike has ceased developing new features to support Spark 2.x.y. However, bug fixes will be available until October 14, 2022. If you are using Apache Spark 2.4.x and Aerospike Connect for Spark 2.8.0 or earlier, please plan to move to Apache Spark 3.0.x and use Aerospike Connect for Spark version 3.x.y.
- Supported until April 3, 2023.
- Tested with Apache Spark 3.0.3, Scala 2.12.11 & Python 3.7.
- Supports Aerospike Database 5.0 and later.
Bug Fixes
- [CONNECTOR-312] - Update spark connectors with latest java clients to address CLIENT-1637 ("partition unavailable" errors occur).
Known Issues
- DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
- The aerospike.write.mode flag overrides Apache Spark write mode.
- Spark connector stores Spark DateType and TimestampType as long. In aeroJoin API calls, convert these types to long.
-
Release Date: July 14, 2021
- Apache Spark 2.4.8 is the last release in Spark’s 2.x.y branch. No more 2.x.y releases of Spark are expected, even for bug fixes. Therefore, 2.8.0 is the last version of Aerospike Connect for Spark 2.8 that will be compatible with that Spark branch. Aerospike has ceased developing new features to support Spark 2.x.y. However, bug fixes will be available until October 14, 2022. If you are using Apache Spark 2.4.x and Aerospike Connect for Spark 2.8.0 or earlier, please plan to move to Apache Spark 3.0.x and use Aerospike Connect for Spark version 3.x.y.
- Supported until October 14, 2022.
- Tested with Apache Spark 2.4.7, Scala 2.11.12, & Python 3.7.
- Supports Aerospike Database 5.0 and later.
New Features
- [CONNECTOR-166] - Support batchget queries with digests in Spark Connector.
- [CONNECTOR-142] - Data sampling in the Spark Connector using the aerospike.sample.size flag.
- [CONNECTOR-142] - Support boolean bins in the Spark Connector (refer to aerospike.booleanbin in the documentation).
- [CONNECTOR-211] - Support partial updates of records using the aerospike.update.partial flag.
Improvements
- Migrated from queryPartition() call to ScanPartitions().
- Updated Spark version to 2.4.7.
- Update Client version to 5.1.5.
- Migrated to Expressions for scans.
- Pushdown support for Float & Double datatypes.
Bug Fixes
- [CONNECTOR-205] - Filter out records that breach write block size in Aerospike via Spark Connector.
- [CONNECTOR-212] - Handle nulls in full record writes (REPLACE, REPLACE_ONLY, and CREATE_ONLY).
- [CONNECTOR-215] - Writes are slower in the Spark Connector v2 version. Introduced a new flag aerospike.write.batchsize to control write throughput.
Known Issues
- DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
- The aerospike.write.mode flag overrides Apache Spark write mode.
- Spark connector stores Spark DateType and TimestampType as long. In aeroJoin API calls, convert these types to long.
- [CONNECTOR-312] - Update spark connectors with latest java clients to address CLIENT-1637 ("partition unavailable" errors occur). Fixed in version 2.8.1.
-
Release Date: February 24, 2021
- Supported until May 24, 2022.
- [CONNECTOR-111] - Spark 2.x branch - Aerospike configuration passed from spark configuration should be accessible downstream.
Known Issues
- DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
Updates
- The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.
-
Release Date: January 25, 2021
- Supported until April 25, 2022.
- [CONNECTOR-105] - Fixed a TLS issue in the Aerospike Spark 2.7.0 release.
Known Issues
- DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
- We have observed that a configuration set using "spark.conf.set()" is not passed along to the connector, hence defaults are used by the connector, which may produce unintended results. Consider using .option() or .options() along with the read and write statements for the configuration to take effect. Fixed in version 2.7.2.
Updates
- The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.
-
Release Date: January 19, 2021
- Supported until April 19, 2022.
- Datasource V2 implementation.
- Tested with Aerospike Enterprise Edition Database version 5.2.0 & Apache Spark version 2.4.0.
New Features
- [CONNECTOR-96] - Upgrade DataSource APIs used in the Spark Connector to v2.
- [CONNECTOR-101] - Spark Feature file verification expires one day early.
Improvements
- Aerospike datasource format can be specified with brevity.
Known Issues
- DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
- We have observed that configuration set using "spark.conf.set()" is not passed along to the connector, hence defaults are used by the connector, which may produce unintended results. Consider using .option() or .options() along with the read and write statements for the configuration to take effect.
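The workaround for the spark.conf.set() issue, passing each setting through .option() on the read or write statement, might look like the sketch below. The helper name apply_connector_options and the RecordingReader stand-in are hypothetical and exist only so the example runs without a Spark cluster; any object with a chainable option(key, value) method, such as a Spark DataFrameReader, should work the same way.

```python
from typing import Any, Dict


def apply_connector_options(reader: Any, conf: Dict[str, Any]) -> Any:
    """Pass every setting explicitly via .option() instead of relying on spark.conf.set()."""
    for key, value in conf.items():
        reader = reader.option(key, str(value))
    return reader


class RecordingReader:
    """Hypothetical stand-in for a DataFrameReader; it only records the options it receives."""

    def __init__(self) -> None:
        self.options: Dict[str, str] = {}

    def option(self, key: str, value: str) -> "RecordingReader":
        self.options[key] = value
        return self


# aerospike.partition.factor is a flag named in these notes; the value 8 is only an example.
reader = apply_connector_options(RecordingReader(), {"aerospike.partition.factor": 8})
print(reader.options)
```

With a real SparkSession, the same helper could presumably be applied to spark.read.format("aerospike") before calling load(), assuming "aerospike" is the short format name your connector version accepts.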
Updates
- The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.
-
Release Date: October 29, 2020
- Supported until January 29, 2022.
- Support Writes in Spark SQL Format.
- Tested with Aerospike Enterprise Edition Database version 5.2.0 & Apache Spark version 2.4.0.
New Features
- [CONNECTOR-94] - Support Writes in Spark SQL Format.
Improvements
- Aerospike datasource format can be specified with brevity.
-
Release Date: October 14, 2020
- Supported until January 14, 2022.
- Flexible schema support in spark, to read mixed data types from aerospike bin.
- Tested with Aerospike Enterprise Edition Database version 5.2.0 & Apache Spark version 2.4.0.
New Features
- [CONNECTOR-85] - Support records with a different number of bins and types in a set.
- [CONNECTOR-82] - Support pushdown of spark datetype and timestamptype.
Improvements
- Additional error handling to address underflow and overflow in Short, Int, and Float types.
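As a rough illustration of the kind of range checking this improvement refers to, the sketch below rejects values that would not fit in a 16-bit Short, a 32-bit Int, or a 32-bit Float. The function names and the choice to raise OverflowError are assumptions for illustration; the notes do not describe how the connector itself reports these cases.

```python
# Inclusive value ranges for the signed integer types.
INTEGER_RANGES = {
    "short": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
}
# Largest finite magnitude representable by a 32-bit IEEE 754 float.
FLOAT_MAX = 3.4028234663852886e38


def check_integer(value: int, type_name: str) -> int:
    """Return value unchanged if it fits the named type, else raise OverflowError."""
    low, high = INTEGER_RANGES[type_name]
    if value < low:
        raise OverflowError(f"{value} underflows {type_name} (min {low})")
    if value > high:
        raise OverflowError(f"{value} overflows {type_name} (max {high})")
    return value


def check_float(value: float) -> float:
    """Return value unchanged unless a finite value exceeds the 32-bit float range."""
    is_infinite = value in (float("inf"), float("-inf"))
    if not is_infinite and abs(value) > FLOAT_MAX:
        raise OverflowError(f"{value} overflows a 32-bit float")
    return value


print(check_integer(32767, "short"))
print(check_float(1.5))
```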
-
Release Date: September 3, 2020
- Supported until December 3, 2021.
- Extended primary key types support.
New Features
- Introduced the aerospike.keyType flag to hint at the primary key type during schema inference.
-
Release Date: July 16, 2020
- Supported until October 16, 2021.
- Fixed a broken API to create AerospikeConfig instance.
-
Release Date: June 19, 2020
- Supported until September 19, 2021.
- Nested updateByKey support and prioritizing __digest, __ttl, __generation filters.
New Features
- Record insertion can be done by nested updateByKey.
- Spark Filters are rearranged such that __digest, __ttl, __generation are always in the beginning, if present.
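The filter rearrangement described above can be sketched as a stable partition of the filter list. This is only an illustration under stated assumptions: filters are represented here by their column names, the function name reorder_filters is hypothetical, and the relative order among the three metadata columns is simply preserved, since the notes do not say how the connector orders them among themselves.

```python
from typing import List

# Metadata columns that the notes say are always moved to the front when present.
META_COLUMNS = ("__digest", "__ttl", "__generation")


def reorder_filters(columns: List[str]) -> List[str]:
    """Move metadata-column filters to the front, keeping relative order within each group."""
    meta = [c for c in columns if c in META_COLUMNS]
    rest = [c for c in columns if c not in META_COLUMNS]
    return meta + rest


ordered = reorder_filters(["age", "__ttl", "name", "__digest"])
print(ordered)
```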
Known Issues
- Aerospike Connect for Spark version 2.0 and above is only compatible with Aerospike Database version 4.9 and above.
- updateByKey only supports keys which are accepted by the Java client.
-
Release Date: May 12, 2020
- Supported until August 12, 2021.
New Features
- Ability to extend aerospike partitions up to 32768 (2^15).
- Ability to specify the target set for spark write operations through the aerospike.writeset flag.
Known Issues
- Aerospike Connect for Spark version 2.0 and above is only compatible with Aerospike Database version 4.9 and above.
- The default value of aerospike.partition.factor has changed to 12 from 0.
- Before version 2.2, the number of Aerospike partitions was computed as 4096 >> f, where f is the aerospike.partition.factor.
- From version 2.2 onwards, the number of Aerospike partitions is computed as 2^f, where f is the aerospike.partition.factor.
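The two partition formulas above can be compared with a short calculation. The function names are hypothetical, and the upper bound of 15 is inferred from the 32768 (2^15) limit mentioned in this release; treat the validation as an assumption rather than documented connector behavior.

```python
def partitions_before_2_2(factor: int) -> int:
    """Pre-2.2 formula: 4096 >> f."""
    if factor < 0:
        raise ValueError("factor must be non-negative")
    return 4096 >> factor


def partitions_from_2_2(factor: int) -> int:
    """Formula from 2.2 onwards: 2^f, capped here at f = 15 (32768 partitions)."""
    if not 0 <= factor <= 15:
        raise ValueError("factor must be between 0 and 15")
    return 1 << factor


# The old default (f = 0) and the new default (f = 12) both yield 4096 partitions.
print(partitions_before_2_2(0))
print(partitions_from_2_2(12))
print(partitions_from_2_2(15))
```

Because the two formulas run in opposite directions, an application that set aerospike.partition.factor explicitly before version 2.2 would need to recompute the value after upgrading to keep the same partition count.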
-
Release Date: April 28, 2020
- Supported until July 28, 2021.
New Features
- Added capability of streaming writes to Aerospike.
Known Issues
- Aerospike Connect for Spark version 2.0 and above is only compatible with Aerospike Database version 4.9 and above.
-
Release Date: April 15, 2020
- Supported until July 15, 2021.
New Features
- Ability to fine tune up to 4096 scan partitions concurrently.
- This can be further tuned by setting the aerospike.partition.factor value appropriately.
- TLS and LDAP support.
- Ability to query multiple primary keys through connector.
Improvements
- Query engine improvements.
- Ability to specify seed nodes through Aerospike configuration.
- Ability to specify feature file from configuration or HDFS.
- Improved error handling in case of write/save failure.
- Ability to enable client-server compression in spark connector.
- Ability to set records per second for scans.
- Fixed issue of duplicate data accumulation in primary key call.
Known Issues
- Aerospike Connect for Spark version 2.0 and above is only compatible with Aerospike Database version 4.9 and above.
-
Release Date: October 21, 2019
New Features
- Added explicit schema for saves.
Known Issues
- A primary key call fetches multiple copies of a record, accumulating duplicate data.
-
Release Date: March 26, 2019
- Initial Standalone Connector General Availability release.
- Embedded Spark update.
New Features
- Spark 2.4.0 support.
- Added the Dataset aeroIncrease function, which lets a Dataset send add/increment operations to Aerospike Database.
-
Release Date: March 12, 2019
- Initial Embedded Spark General Availability release.
New Features
- Reading from Aerospike to a DataFrame/Dataset.
- Saving a DataFrame/Dataset to Aerospike.
- Spark SQL multiple filters pushed down to the Aerospike cluster.
- Support for Geo points-within-region query using Aerospike.
- Join a Spark Dataset that contains record keys to record data stored in Aerospike.