hive - Creating deciles in SQL - Stack Overflow
I'm trying to bucket my data into deciles, but not in the traditional sense where the dimension is the basis of the decile.
I have 463 unique it_scores ranging from 316 to 900 (my dimension) with 1,296,070 trade_counts (my measure) in total. The following code breaks my data into 10 buckets of about 47 unique it_scores each:
ntile(10) over (order by it_score) as tileno
While this is definitely doing what it's supposed to, I need my buckets to be built on the basis of total trade_counts, with each bucket containing about 129.6k observations. The it_score is still the dimension, but the ranges wouldn't necessarily be equal: for example, decile 10 might have a range of 316-688 with 129.6k observations while decile 9 might be 689-712, also with 129.6k observations.
How would I achieve that?
asked Nov 15, 2024 at 21:05 by A. Oli

1 Answer
Use SUM(trade_count) OVER (ORDER BY it_score) to assign deciles based on cumulative trade_counts.
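SELECT
    decile,
    SUM(trade_count) AS decile_trade_count
FROM
(
    SELECT
        it_score,
        trade_count,
        -- running total / grand total gives the cumulative share (0..1];
        -- scale it to 10 buckets and shift to a 1-based decile number
        FLOOR(
            (SUM(trade_count) OVER (ORDER BY it_score) - 1) / (SUM(trade_count) OVER ()) * 10
        ) + 1 AS decile
    FROM your_table  -- placeholder name: one row per it_score with its trade_count
) sub
GROUP BY decile
ORDER BY decile;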
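To check the bucketing arithmetic outside Hive, here is a minimal Python sketch of the same logic. It is only an illustration under assumptions: the function names assign_buckets and summarise are hypothetical, the sample rows are made up (they are not the asker's data), and 4 buckets are used instead of 10 to keep the example small. It uses integer division in place of FLOOR on a floating-point ratio, which gives the same result for whole-number counts.

```python
from itertools import accumulate


def assign_buckets(rows, buckets=10):
    """Assign each (it_score, trade_count) row to a bucket by cumulative count.

    Mirrors FLOOR((running_total - 1) / grand_total * buckets) + 1,
    assuming one row per distinct it_score.
    """
    rows = sorted(rows)                                # ORDER BY it_score
    total = sum(count for _, count in rows)            # SUM(trade_count) OVER ()
    running = accumulate(count for _, count in rows)   # SUM(...) OVER (ORDER BY it_score)
    return [
        (score, count, (cum - 1) * buckets // total + 1)
        for (score, count), cum in zip(rows, running)
    ]


def summarise(assigned):
    """Return {bucket: (min_score, max_score, total_trade_count)}."""
    summary = {}
    for score, count, bucket in assigned:
        lo, hi, n = summary.get(bucket, (score, score, 0))
        summary[bucket] = (min(lo, score), max(hi, score), n + count)
    return summary


# Made-up sample: six scores, 200 trades in total, split into 4 buckets.
sample = [(316, 10), (400, 40), (500, 50), (600, 25), (700, 25), (800, 50)]
result = summarise(assign_buckets(sample, buckets=4))
for bucket, (lo, hi, n) in sorted(result.items()):
    print(bucket, f"{lo}-{hi}", n)
```

Because a single it_score is never split, its whole trade_count lands in the bucket where the running total ends. With real data the buckets are therefore only approximately equal, and a score with a very large count can leave a neighbouring bucket empty.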
Comment: [...] sql and hive and instead use the tags algorithm and distribution and write a more precise description. This may or may not include that you want to get the standard deviation of the buckets' trade_counts as low as possible. – Thorsten Kettner, commented Nov 16, 2024 at 16:08