天天微动态丨SQL concat_ws, collect_set, 和explode合并使用
有一个这样的数据集:字段和字段的值是两列
目的是将这个数据转换成规整的一个特征是一列的数据:
(相关资料图)
第一步:先造出列
select ucid,CASE WHEN type ="性别" THEN label end `性别` ,CASE WHEN type ="产品" THEN label end `产品` ,CASE WHEN type ="还款表现" THEN label end `还款表现` ,CASE WHEN type ="营业点" THEN label end `营业点` ,CASE WHEN type ="base" THEN label end `base`From input
第二步:多行合并为一行
select ucid,concat_ws(",",collect_set(CASE WHEN type ="性别" THEN label end)) `性别` ,concat_ws(",",collect_set(CASE WHEN type ="产品" THEN label end)) `产品` ,concat_ws(",",collect_set(CASE WHEN type ="还款表现" THEN label end)) `还款表现` ,concat_ws(",",collect_set(CASE WHEN type ="营业点" THEN label end)) `营业点` ,concat_ws(",",collect_set(CASE WHEN type ="base" THEN label end)) `base`From inputgroup by `ucid`
第三步:特征枚举值展开
select ucid, explode(split(`产品`,",")) as `产品` from (select ucid,concat_ws(",",collect_set(CASE WHEN type ="性别" THEN label end)) `性别` ,concat_ws(",",collect_set(CASE WHEN type ="产品" THEN label end)) `产品` ,concat_ws(",",collect_set(CASE WHEN type ="还款表现" THEN label end)) `还款表现` ,concat_ws(",",collect_set(CASE WHEN type ="营业点" THEN label end)) `营业点` ,concat_ws(",",collect_set(CASE WHEN type ="base" THEN label end)) `base`From inputgroup by `ucid` )
上面是示范展开一列。由于一个select里只能有一个explode,因此,为了得到最终想要的数据,需要explode多层(哭哭)
select ucid, `性别`, `产品explode`, `还款表现explode`, `营业点explode`, `base` ,explode(split(`营业点`,",")) as `营业点explode` from (select *, explode(split(`还款表现`,",")) as `还款表现explode` from (select *, explode(split(`产品`,",")) as `产品explode` from (select ucid,concat_ws(",",collect_set(CASE WHEN type ="性别" THEN label end)) `性别` ,concat_ws(",",collect_set(CASE WHEN type ="产品" THEN label end)) `产品` ,concat_ws(",",collect_set(CASE WHEN type ="还款表现" THEN label end)) `还款表现` ,concat_ws(",",collect_set(CASE WHEN type ="营业点" THEN label end)) `营业点` ,concat_ws(",",collect_set(CASE WHEN type ="base" THEN label end)) `base`From inputgroup by `ucid` )))
其实 spark SQL 3.3.2可以用lateral view 实现一次explode多个字段:https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-lateral-view.html
CREATE TABLE person (id INT, name STRING, age INT, class INT, address STRING);INSERT INTO person VALUES (100, "John", 30, 1, "Street 1"), (200, "Mary", NULL, 1, "Street 2"), (300, "Mike", 80, 3, "Street 3"), (400, "Dan", 50, 4, "Street 4");SELECT * FROM person LATERAL VIEW EXPLODE(ARRAY(30, 60)) tableName AS c_age LATERAL VIEW EXPLODE(ARRAY(40, 80)) AS d_age;
3. 函数1. concat_ws 和concat的联系与区别concat(col1, col2, ..., colN) - Returns the concatenation of col1, col2, ..., colN.,可以拼接多个字符串concat_ws(sep[, str | array(str)]+) - Returns the concatenation of the strings separated by sep.返回用指定分隔符进行拼接的字符串,指定的分隔符放在第一个参数位置,后面的参数默认为需要进行拼接的字符串。二者的区别在于:concat中若有一个参数为null ,则返回null。而concat_ws,不会因为存在null 值就返回null 。
select concat("-","DS","A", "B",null) from input
select concat_ws("-","DS","A", "B",null) from input
2. collect_set 函数collect_set(expr) - Collects and returns a set of unique elements.
3. explode 函数explode里面的参数,可以是array ,也可以是map
explode(expr) - Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map
select explode(array(10, 20)) from input
array如果需要分割,需要和split嵌套使用
SELECT explode(split("1,2,3",",")) from input
SELECT explode(map("A","1","B","2","C","3")) from input
时间的展开:
select `date`,`min_date`,`max_date`,explode(sequence( `min_date`,`max_date`,interval 1 day )) as `展开日期`# day是day颗粒度,也可以换成month
关键词:
为您推荐
-
1 背景有一个这样的数据集:字段和字段的值是两列目的是将这个数据转换成规整的一个特征是一列的数据:2 做法第一步:先造出列selectucid,CASE
23-04-12
-
(潘伟峰王昊)福州海警局12日披露,近日,该局在辖区海域成功查扣1艘“三无”走私船舶,查获无合法齐全手续牛肚约15000公斤,案值100余万元。4
23-04-12
-
以下是科信技术在北京时间4月12日13:42分盘口异动快照:4月12日,科信技术盘中涨幅达5%,截至13点42分,报20 49元,成交1 89亿元,换手率5 14%
23-04-12
-
4月11日,恩威医药(301331)融资买入141 26万元,融资偿还110 46万元,融资净买入30 8万元,融资余额3531 45万元,近3个交易日已连续净买入累
23-04-12
-
一、泰国登革热致11人死亡4月10日,泰国公共卫生部就该国登革热情况发布报告称,2023年1月1日至4月5日期间,泰国登革热确诊病例达到1 439万例
23-04-12
-
央视网消息(新闻联播):中国汽车工业协会今天(4月11日)发布的数据显示,今年一季度,我国新能源汽车产销量继续保持快速增长,新能源汽车正
23-04-12
-
4月10日,联合国安理会举行公开会,讨论“违反武器和军事装备出口协定造成的风险”。会上,俄罗斯等多国代表批评美西方向乌克兰提供武器,并且
23-04-12
-
4月11日VR&AR板块较上一交易日上涨0 67%,百纳千成领涨。当日上证指数报收于3313 57,下跌0 05%。深证成指报收于11877 15,上涨0 04%。VR&AR板
23-04-12
-
新海南客户端、南海网、南国都市报4月11日消息(记者周静泊)豪车、美酒、咖啡、时尚、艺术……当消博会遇上“意大利制造”,会迸发怎样的火花
23-04-12