
[SPARK-9478] Add sample weights to Random Forest - ASF JIRA
I didn't find any discussion about the semantic change in sampling strategy. No matter whether we implement class weights or instance weights, simple random sampling should become …
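The snippet points out that once weights are involved, the bagging step can no longer be plain uniform sampling with replacement. The sketch below contrasts the classic uniform bootstrap with one possible weighted variant, where each row is drawn with probability proportional to its weight. This is an illustrative assumption only. It is not necessarily the semantics SPARK-9478 settled on, and the function names are hypothetical.

```python
import random
from collections import Counter

def uniform_bootstrap(n, rng):
    """Classic bagging: draw n row indices uniformly with replacement.
    Returns a Counter mapping row index -> number of times it was drawn."""
    return Counter(rng.randrange(n) for _ in range(n))

def weighted_bootstrap(weights, rng):
    """Hypothetical weighted variant: draw len(weights) row indices with
    replacement, choosing each index with probability proportional to its weight."""
    n = len(weights)
    return Counter(rng.choices(range(n), weights=weights, k=n))

rng = random.Random(42)
print(uniform_bootstrap(5, rng))
print(weighted_bootstrap([1.0, 1.0, 8.0], rng))  # row 2 tends to dominate
```

With the weighted variant, a row with weight 0 can never appear in a tree's sample, which is exactly the kind of semantic change the comment says deserved discussion.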
[SPARK-22947] SPIP: as-of join in Spark SQL - ASF Jira
This approach suffers in performance if sampling data is expensive. For instance, when the data to be sampled is the output of an expensive computation, sampling the data would cause the …
[SPARK-46094] Support Executor JVM Profiling - ASF Jira
Nov 24, 2023 · This feature is to add a low overhead sampling profiler like async-profiler as a built in capability to the Spark job that can be turned on using only user configurable parameters …
[SPARK-23173] from_json can produce nulls for fields which are …
The from_json function uses a schema to convert a string into a Spark SQL struct. This schema can contain non-nullable fields. The underlying JsonToStructs expression does not check if a …
Allow tracking of detailed metrics such as CPU Usage by processors
So we should provide the ability to turn this feature on/off and ideally also allow for sampling of metrics and extrapolating out those numbers so that we can monitor these things only for a …
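The idea of sampling a metric and extrapolating from the sample can be sketched as follows. The assumptions here are that only every `interval`-th event is measured, so the measurement cost is paid only a fraction of the time, and that the total is estimated as the sampled mean times the number of events. `SampledCounter` is a hypothetical name, not part of any existing API.

```python
class SampledCounter:
    """Hypothetical metric tracker: measures only every `interval`-th event
    and extrapolates the total from the sampled values."""

    def __init__(self, interval):
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self.interval = interval      # interval == 1 means "measure everything"
        self.events = 0               # all events seen, sampled or not
        self.sampled_sum = 0.0
        self.sampled_count = 0

    def record(self, measure):
        """`measure` is a zero-argument callable (e.g. reading CPU time).
        It is invoked only for sampled events, so its cost is paid 1/interval of the time."""
        self.events += 1
        if self.events % self.interval == 0:
            self.sampled_sum += measure()
            self.sampled_count += 1

    def estimated_total(self):
        """Extrapolate: average sampled value multiplied by the total event count."""
        if self.sampled_count == 0:
            return 0.0
        return (self.sampled_sum / self.sampled_count) * self.events
```

Using the sampled mean times the event count, rather than `sampled_sum * interval`, keeps the estimate sensible when the number of events is not a multiple of the interval. Turning the feature "off" could simply mean never calling `record`.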
[SPARK-15689] Data source API v2 - ASF Jira
Nice-to-have: support additional common operators, including limit and sampling. Note that both 1 and 2 are problems that the current data source API (v1) suffers from.
Support large partitions on the 3.0 sstable format
The index summary is a sampling of the index, so most of the time we aren't going to get a hit into the data file, right? We have to scan the index to find the RIE and that entire process is what …
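The two-step lookup described in the snippet (a sampled summary, then a scan of the full index) can be modelled in memory. In the sketch below, a sorted list stands in for the on-disk index, and the summary keeps every `interval`-th key together with its position. A lookup binary-searches the summary and then scans at most one interval of the index. This is a simplified illustration of the general technique under those assumptions, not Cassandra's actual file formats or classes.

```python
import bisect

def build_summary(index_keys, interval):
    """Keep every `interval`-th entry of a sorted index as (key, position)."""
    return [(index_keys[i], i) for i in range(0, len(index_keys), interval)]

def lookup(index_keys, summary, target):
    """Return the position of `target` in the full index, or None if absent."""
    sampled_keys = [k for k, _ in summary]
    # Step 1: find the last sampled key <= target via binary search.
    i = bisect.bisect_right(sampled_keys, target) - 1
    if i < 0:
        return None  # target sorts before the first indexed key
    start = summary[i][1]
    end = summary[i + 1][1] if i + 1 < len(summary) else len(index_keys)
    # Step 2: scan the full index, bounded by one sampling interval.
    for pos in range(start, end):
        if index_keys[pos] == target:
            return pos
    return None

keys = list(range(0, 200, 2))       # 100 sorted keys: 0, 2, ..., 198
summary = build_summary(keys, 16)   # 7 sampled entries
print(lookup(keys, summary, 50))    # position 25
```

A larger interval shrinks the in-memory summary but lengthens the scan in step 2, which is the cost the comment is concerned about for large partitions.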
Should not coerce decimal type to double type when it's join column
PLAN_TABLE_OUTPUT. Predicate Information (identified by operation id): 1 - …
Release Notes - ASF Jira
Jan 7, 2010 · [HIVE-12165] - wrong result when hive.optimize.sampling.orderby=true with some aggregate functions [HIVE-12367] - Lock/unlock database should add current database to …
[NIFI-13633] AllowScientificNotation default false causing existing ...
2024-08-06 06:42:20,852 ERROR [Timer-Driven Process Thread-17] o.a.n.processors.standard.ConvertRecord ConvertRecord [id=2f9e8b7a-045c-301d-f79c …