
ComplexUpset: An Advanced Tool for Visualizing Set Intersections

2021/06/09

作图丫

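Introduction

ComplexUpset is an R package for visualizing complex overlaps among multiple sets.

Background

When it comes to visualizing sets, the Venn diagram is usually the first thing that comes to mind. It works well when there are only a few sets, but once the number of sets grows and the data has several dimensions to show, a Venn diagram alone cannot display everything. This post introduces ComplexUpset, an R package for set visualization that handles many sets and lets you attach additional data to each intersection.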


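Installing the R package

    # ComplexUpset is distributed on CRAN
    install.packages("ComplexUpset")
    library(ggplot2)
    library(ComplexUpset)

    # Example data: the movies dataset from the ggplot2movies package
    library(ggplot2movies)
    movies = as.data.frame(ggplot2movies::movies)
    head(movies, 3)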


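    # Columns 18 to 24 hold the genre indicators
    genres = colnames(movies)[18:24]
    genres

    # Convert the 0/1 indicators to logical values
    movies[genres] = movies[genres] == 1
    t(head(movies[genres], 3))

    # Treat empty MPAA ratings as missing, then drop incomplete rows
    movies[movies$mpaa == '', 'mpaa'] = NA
    movies = na.omit(movies)

    # Helper that sets the size of the plot area (these options apply to Jupyter notebooks)
    set_size = function(w, h, factor=1.5) {
        s = 1 * factor
        options(
            repr.plot.width=w * s,
            repr.plot.height=h * s,
            repr.plot.res=100 / factor,
            jupyter.plot_mimetypes='image/png',
            jupyter.plot_scale=1
        )
    }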


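Visualization

01 Basic usage

The upset() function has two required arguments:

The first argument is a data frame containing the set-membership indicator columns along with any covariates.

The second argument is a vector of column names; each named column indicates membership in one set.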

upset(movies, genres, name='genre', width_ratio=0.1)
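To make the expected input format concrete, here is a minimal base-R sketch. The data frame `toy`, its sets A, B and C, and the `score` covariate are all hypothetical and not part of the original article. It only illustrates how counting rows per exact combination of sets yields the sizes that an UpSet plot draws as bars; it is not how ComplexUpset computes them internally.

```r
# Hypothetical toy data: one row per item, one logical column per set.
toy <- data.frame(
  A = c(TRUE,  TRUE,  FALSE, TRUE,  FALSE),
  B = c(TRUE,  FALSE, TRUE,  TRUE,  FALSE),
  C = c(FALSE, FALSE, TRUE,  FALSE, TRUE),
  score = c(1.2, 3.4, 2.2, 5.0, 0.7)  # a covariate; not used for counting
)
sets <- c("A", "B", "C")

# Label each row with the exact combination of sets it belongs to.
membership <- apply(toy[sets], 1, function(row) paste(sets[row], collapse = "-"))

# Count how many rows fall into each combination (exclusive intersections).
sizes <- table(membership)
print(sizes)
```

A data frame shaped like `toy` could then be passed to the plotting function in the same way as the movies data, for example upset(toy, sets).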


The min_size argument filters out small intersections. For example, min_size=10 keeps only intersections with at least 10 observations.

    upset(movies, genres, name='genre', width_ratio=0.1, min_size=10, wrap=TRUE, set_sizes=FALSE) +
        ggtitle('Without empty groups (Short dropped)') +
        # the two plots are combined with patchwork
        upset(movies, genres, name='genre', width_ratio=0.1, min_size=10, keep_empty_groups=TRUE, wrap=TRUE, set_sizes=FALSE) +
        ggtitle('With empty groups')


Selecting regions of a Venn diagram:

Example: given three sets A, B and C, display them as a Venn diagram.

    abc_data = create_upset_abc_example()
    abc_venn = (
        ggplot(arrange_venn(abc_data))
        + coord_fixed()
        + theme_void()
        + scale_color_venn_mix(abc_data)
    )

    # Add the individual plot layers
    (
        abc_venn
        + geom_venn_region(data=abc_data, alpha=0.05)
        + geom_point(aes(x=x, y=y, color=region), size=1)
        + geom_venn_circle(abc_data)
        + geom_venn_label_set(abc_data, aes(label=region))
        + geom_venn_label_region(
            abc_data, aes(label=size),
            outwards_adjust=1.75,
            position=position_nudge(y=0.2)
        )
        + scale_fill_venn_mix(abc_data, guide=FALSE)
    )


Define a highlight function to select specific regions.

    set_size(6, 6.5)
    simple_venn = (
        abc_venn
        + geom_venn_region(data=abc_data, alpha=0.3)
        + geom_point(aes(x=x, y=y), size=0.75, alpha=0.3)
        + geom_venn_circle(abc_data)
        + geom_venn_label_set(abc_data, aes(label=region), outwards_adjust=2.55)
    )

    # Define the highlight function
    highlight = function(regions) scale_fill_venn_mix(
        abc_data, guide=FALSE, highlight=regions, inactive_color='NA'
    )

    (
        (
            # Top row: the A-B region alone, then A-B together with A-B-C
            simple_venn + highlight(c('A-B')) + labs(title='Exclusive intersection of A and B')
            | simple_venn + highlight(c('A-B', 'A-B-C')) + labs(title='Inclusive intersection of A and B')
        ) / (
            # Bottom row: the regions that make up the unions
            simple_venn + highlight(c('A-B', 'A', 'B')) + labs(title='Exclusive union of A and B')
            | simple_venn + highlight(c('A-B', 'A-B-C', 'A', 'B', 'A-C', 'B-C')) + labs(title='Inclusive union of A and B')
        )
    )
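The four regions highlighted above can also be expressed as plain logical conditions. The base-R sketch below uses a hypothetical data frame `toy` with one item in each of the seven Venn regions, and counts the items that fall under each mode for the pair A and B. The variable names are illustrative only, and this is not ComplexUpset's internal implementation.

```r
# Hypothetical toy data: one item in each of the seven Venn regions.
# Rows, in order: A, A-B, A-B-C, B, C, A-C, B-C
toy <- data.frame(
  A = c(TRUE,  TRUE,  TRUE,  FALSE, FALSE, TRUE,  FALSE),
  B = c(FALSE, TRUE,  TRUE,  TRUE,  FALSE, FALSE, TRUE),
  C = c(FALSE, FALSE, TRUE,  FALSE, TRUE,  TRUE,  TRUE)
)

query  <- c("A", "B")                   # the sets being compared
others <- setdiff(names(toy), query)    # every remaining set

in_all_query <- Reduce(`&`, toy[query])          # member of every queried set
in_any_query <- Reduce(`|`, toy[query])          # member of at least one queried set
in_any_other <- Reduce(`|`, toy[others], FALSE)  # member of any set outside the query

modes <- c(
  exclusive_intersection = sum(in_all_query & !in_any_other),  # region A-B
  inclusive_intersection = sum(in_all_query),                  # A-B, A-B-C
  exclusive_union        = sum(in_any_query & !in_any_other),  # A, B, A-B
  inclusive_union        = sum(in_any_query)                   # all regions touching A or B
)
print(modes)
```

Because each region holds exactly one item here, each count equals the number of regions highlighted in the corresponding panel.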


Showing all intersections:

    set_size(8, 3)
    upset(
        movies, genres,
        width_ratio=0.1,
        min_size=10,
        mode='inclusive_union',
        base_annotations=list('Size'=(intersection_size(counts=FALSE, mode='inclusive_union'))),
        intersections='all',
        max_degree=3
    )


02 Adding richer annotations to the intersection plot

Annotations can be added in any of the following three ways:

    upset(
        movies,
        genres,
        annotations = list(
            # Method 1: a list with aes and geom
            'Length'=list(
                aes=aes(x=intersection, y=length),
                # to add several geoms, provide a list of them
                geom=geom_boxplot(na.rm=TRUE)
            ),
            # Method 2: a ggplot2 object
            'Rating'=(
                ggplot(mapping=aes(y=rating))
                + geom_jitter(aes(color=log10(votes)), na.rm=TRUE)
                + geom_violin(alpha=0.5, na.rm=TRUE)
            ),
            # Method 3: `upset_annotate`
            'Budget'=upset_annotate('budget', geom_boxplot(na.rm=TRUE))
        ),
        min_size=10,
        width_ratio=0.1
    )


A bar chart can show how the proportions of a categorical variable differ across intersections:

    set_size(8, 5)
    upset(
        movies,
        genres,
        annotations = list(
            'MPAA Rating'=(
                ggplot(mapping=aes(fill=mpaa))
                + geom_bar(stat='count', position='fill')
                + scale_y_continuous(labels=scales::percent_format())
                + scale_fill_manual(values=c(
                    'R'='#E41A1C', 'PG'='#377EB8',
                    'PG-13'='#4DAF4A', 'NC-17'='#FF7F00'
                ))
                + ylab('MPAA Rating')
            )
        ),
        width_ratio=0.1
    )


Fill colors:

    set_size(8, 3)
    upset(
        movies,
        genres,
        base_annotations=list(
            'Intersection size'=intersection_size(
                counts=FALSE,
                mapping=aes(fill=mpaa)
            )
        ),
        width_ratio=0.1
    )


Showing percentages:

    set_size(8, 6)
    upset(
        movies, genres, name='genre', width_ratio=0.1, min_size=10,
        base_annotations=list(
            # with manual aes specification:
            'Intersection size'=intersection_size(text_mapping=aes(label=paste0(round(
                !!get_size_mode('exclusive_intersection')/!!get_size_mode('inclusive_union') * 100
            ), '%'))),
            # using shorthand:
            'Intersection ratio'=intersection_ratio(text_mapping=aes(label=!!upset_text_percentage()))
        )
    )
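The label in the manual specification above is simply the exclusive intersection size divided by the inclusive union size, rounded and suffixed with a percent sign. A small base-R sketch of that arithmetic follows; the counts are made up for illustration and are not taken from the movies data.

```r
# Hypothetical counts for three intersections (illustrative values only).
exclusive_intersection <- c(12, 9, 3)
inclusive_union        <- c(48, 30, 30)

# Same formula as the label above: ratio, scaled to percent, rounded, "%" appended.
labels <- paste0(round(exclusive_intersection / inclusive_union * 100), "%")
print(labels)
```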


03 Other settings

Setting the stripe colors:

    set_size(6, 4)
    upset(
        movies,
        genres,
        min_size=10,
        width_ratio=0.2,
        stripes=c('cornsilk1', 'deepskyblue1')
    )


Set the color to white to disable the stripes:

    set_size(6, 4)
    upset(
        movies,
        genres,
        min_size=10,
        width_ratio=0.2,
        stripes='white'
    )


Highlighting:

Pass a list of upset_query() calls to the queries argument. Each query uses either set or intersect (not both) to specify what to highlight: set highlights the bar of the corresponding set, while intersect highlights the intersection on every component unless only_components restricts it.

    set_size(8, 6)
    upset(
        movies, genres, name='genre', width_ratio=0.1, min_size=10,
        annotations = list(
            'Length'=list(
                aes=aes(x=intersection, y=length),
                geom=geom_boxplot(na.rm=TRUE)
            )
        ),
        queries=list(
            upset_query(
                intersect=c('Drama', 'Comedy'),
                color='red',
                fill='red',
                only_components=c('intersections_matrix', 'Intersection size')
            ),
            upset_query(
                set='Drama',
                fill='blue'
            ),
            upset_query(
                intersect=c('Romance', 'Comedy'),
                fill='yellow',
                only_components=c('Length')
            )
        )
    )


Sorting the intersections:

    set_size(8, 3)
    # Sort by degree
    upset(movies, genres, width_ratio=0.1, sort_intersections_by='degree')


Several sort keys can also be given:

    set_size(8, 3)
    # Sort by degree first, then by cardinality
    upset(movies, genres, width_ratio=0.1, sort_intersections_by=c('degree', 'cardinality'))
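As a rough illustration of what sorting by two keys means, the base-R sketch below orders a hypothetical table of intersections by degree (the number of sets involved) and then by cardinality (the intersection size). The names, sizes and the descending direction are assumptions chosen for this example; this is not the package's own sorting code.

```r
# Hypothetical intersections with made-up sizes (cardinality).
intersections <- data.frame(
  name = c("Drama", "Comedy", "Comedy-Drama", "Action",
           "Action-Comedy-Drama", "Action-Drama"),
  size = c(120, 80, 45, 60, 5, 20),
  stringsAsFactors = FALSE
)

# Degree = number of sets that take part in the intersection.
intersections$degree <- lengths(strsplit(intersections$name, "-", fixed = TRUE))

# Primary key: degree; secondary key: size. Both descending in this sketch.
ord <- order(-intersections$degree, -intersections$size)
sorted <- intersections[ord, ]
print(sorted$name)
```

Within each degree the larger intersections come first, so ties on the first key are resolved by the second.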


Grouping intersections with group_by:

With group_by='sets', the intersections are grouped according to the sets they belong to.

    set_size(8, 3)
    upset(
        movies, c("Action", "Comedy", "Drama"),
        width_ratio=0.2,
        group_by='sets',
        queries=list(
            upset_query(
                intersect=c('Drama', 'Comedy'),
                color='red',
                fill='red',
                only_components=c('intersections_matrix', 'Intersection size')
            ),
            upset_query(group='Drama', color='blue'),
            upset_query(group='Comedy', color='orange'),
            upset_query(group='Action', color='purple'),
            upset_query(set='Drama', fill='blue'),
            upset_query(set='Comedy', fill='orange'),
            upset_query(set='Action', fill='purple')
        )
    )
