GitHub - genecell/single-cell-papers-with-code: Papers with code for single cell related papers. https://github.com/genecell/single-cell-papers-with-code
Learning Python
GitHub - CodementorIO/Python-Learning-Resources
R in Action; R basics
https://github.com/biotrainee/RiA2/blob/master/Ch01%20Introduction%20to%20R.R
PCA, UMAP, and t-SNE analysis
Introduction to R for single-cell analysis
Advanced R, matching and reordering shortened | Introduction to R - ARCHIVED
Introduction to R - ARCHIVED https://hbctraining.github.io/Intro-to-R/
Learning Objectives
- R syntax: Understand the different 'parts of speech'.
- Data types and structures in R: Describe the various data types and data structures.
- Data inspection and wrangling: Demonstrate the utilization of functions and indices to inspect and subset data from various data structures.
- Visualizing data: Demonstrate the use of the ggplot2 package to create plots for easy data visualization.
The best single-cell tutorial
List of functions for data inspection
We already saw how the functions head() and str() can be useful to check the content and the structure of a data.frame. Here is a non-exhaustive list of functions to get a sense of the content/structure of data.
- All data structures - content display:
  - str(): compact display of data contents (env.)
  - class(): data type (e.g. character, numeric, etc.) of vectors and data structure of dataframes, matrices, and lists.
  - summary(): detailed display, including descriptive statistics, frequencies
  - head(): will print the beginning entries for the variable
  - tail(): will print the end entries for the variable
- Vector and factor variables:
  - length(): returns the number of elements in the vector or factor
- Dataframe and matrix variables:
  - dim(): returns dimensions of the dataset
  - nrow(): returns the number of rows in the dataset
  - ncol(): returns the number of columns in the dataset
  - rownames(): returns the row names in the dataset
  - colnames(): returns the column names in the dataset
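The inspection functions listed above can be tried on a small data frame. This is only a sketch: the `metadata` data frame, its column names, and its values are made up for illustration and do not come from the tutorial.

```r
# Toy data frame; column names and values are invented for illustration.
metadata <- data.frame(
  genotype  = factor(c("Wt", "Wt", "KO", "KO")),
  celltype  = c("typeA", "typeB", "typeA", "typeB"),
  replicate = c(1, 2, 1, 2),
  row.names = c("sample1", "sample2", "sample3", "sample4")
)

# All data structures: content display
str(metadata)       # compact display of the structure
class(metadata)     # data structure of the object
summary(metadata)   # per-column descriptive statistics and frequencies
head(metadata, 2)   # first two rows
tail(metadata, 2)   # last two rows

# Vector and factor variables
length(metadata$genotype)   # number of elements in the factor

# Dataframe and matrix variables
dim(metadata)        # number of rows, then number of columns
nrow(metadata)
ncol(metadata)
rownames(metadata)
colnames(metadata)
```

Note that length() on a whole data frame returns the number of columns, not rows, which is why nrow() and ncol() are the safer choice for two-dimensional data.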
Data subsetting with base R: vectors and factors | Introduction to R - ARCHIVED
qc
Chapter 3 The Seurat object | scRNAseq Analysis in R with Seurat
Sample sex
When working with human or animal samples, you should ideally constrain your experiments to a single sex to avoid including sex bias in the conclusions. However, this may not always be possible. By looking at reads from chromosome Y (males) and XIST (X-inactive specific transcript) expression (mainly female), it is quite easy to determine the sex of each sample. It can also be a good way to detect sample mix-ups, if the sex recorded in the sample metadata does not agree with the computational predictions.
To get chromosome information for all genes, you should ideally parse the information from the gtf file that you used in the mapping pipeline, as it has the exact same annotation version/gene naming. However, it may not always be available, as in this case where we have downloaded public data. Hence, we will use biomart to fetch chromosome information. As the biomart instances are quite often unresponsive, you can try the code below, but if it fails, we have the file with gene annotations on github here. Make sure you put it at the correct location for the path genes.file to work.
genes.file = "data/results/genes.table.csv"
if (!file.exists(genes.file)) {
    suppressMessages(require(biomaRt))
    # initialize connection to mart, may take some time if the sites are
    # unresponsive.
    mart <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
    # fetch chromosome info plus some other annotations
    genes.table <- try(biomaRt::getBM(attributes = c("ensembl_gene_id", "external_gene_name",
        "description", "gene_biotype", "chromosome_name", "start_position"), mart = mart,
        useCache = FALSE))
    if (!dir.exists("data/results")) {
        dir.create("data/results")
    }
    if (is.data.frame(genes.table)) {
        write.csv(genes.table, file = genes.file)
    }
    if (!file.exists(genes.file)) {
        download.file("https://raw.githubusercontent.com/NBISweden/workshop-scRNAseq/master/labs/misc/genes.table.csv",
            destfile = "data/results/genes.table.csv")
        genes.table = read.csv(genes.file)
    }
} else {
    genes.table = read.csv(genes.file)
}
genes.table <- genes.table[genes.table$external_gene_name %in% rownames(data.filt),
    ]
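The text above recommends parsing the gtf file when it is available. A minimal sketch of that alternative in base R follows, assuming a standard nine-column GTF whose attribute column carries a `gene_name` field. The four GTF lines, the gene names, and the object names `gtf.table` and `chrY.gene.gtf` are all invented for illustration; with a real file you would pass its path to read.delim() instead of `text =`.

```r
# Made-up GTF lines (tab-separated, nine columns). Not real annotation.
gtf.lines <- c(
  "1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id \"G1\"; gene_name \"GENE_A\";",
  "X\ttoy\tgene\t200\t800\t.\t-\t.\tgene_id \"G2\"; gene_name \"GENE_X\";",
  "Y\ttoy\tgene\t300\t700\t.\t+\t.\tgene_id \"G3\"; gene_name \"GENE_Y\";",
  "Y\ttoy\texon\t300\t400\t.\t+\t.\tgene_id \"G3\"; gene_name \"GENE_Y\";"
)

# quote = "" is needed because the attribute column contains double quotes.
gtf <- read.delim(
  text = gtf.lines, header = FALSE, comment.char = "#", quote = "",
  col.names = c("chromosome_name", "source", "feature", "start_position",
                "end_position", "score", "strand", "frame", "attribute"),
  colClasses = c("character", "character", "character", "integer", "integer",
                 "character", "character", "character", "character")
)

# Keep one row per gene and pull gene_name out of the attribute string.
genes <- gtf[gtf$feature == "gene", ]
genes$external_gene_name <- sub('.*gene_name "([^"]+)".*', "\\1", genes$attribute)

gtf.table <- genes[, c("external_gene_name", "chromosome_name", "start_position")]
chrY.gene.gtf <- gtf.table$external_gene_name[gtf.table$chromosome_name == "Y"]
chrY.gene.gtf
```

The column names mirror the biomart attributes used above so that downstream code can stay the same; chromosome naming (for example "Y" versus "chrY") depends on the annotation source and should be checked.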
Now that we have the chromosome information, we can calculate, per cell, the proportion of reads that come from chromosome Y.
chrY.gene = genes.table$external_gene_name[genes.table$chromosome_name == "Y"]
data.filt$pct_chrY = colSums(data.filt@assays$RNA@counts[chrY.gene, ])/colSums(data.filt@assays$RNA@counts)
Then plot XIST expression vs chrY proportion. As you can see, the samples are clearly on either side, even if some cells do not have detection of either.
FeatureScatter(data.filt, feature1 = "XIST", feature2 = "pct_chrY")
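To make the per-cell calculation concrete without a Seurat object, here is a small sketch on a plain matrix. The count values, gene names, and the `chrY.toy` vector are made up; only the arithmetic mirrors the step above.

```r
# Toy count matrix: genes in rows, cells in columns. All values are invented.
counts <- matrix(
  c(5, 0, 3,    # GENE_A
    0, 4, 0,    # XIST
    2, 0, 0),   # GENE_Y, treated as a chromosome Y gene in this toy example
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "XIST", "GENE_Y"), c("cell1", "cell2", "cell3"))
)
chrY.toy <- "GENE_Y"

# drop = FALSE keeps a matrix even when only one chrY gene is selected,
# so colSums() still works on the subset.
pct_chrY <- colSums(counts[chrY.toy, , drop = FALSE]) / colSums(counts)
pct_chrY
```

Despite the `pct_` prefix, the value is a proportion between 0 and 1 rather than a percentage. The `drop = FALSE` guard is worth considering in real data too: if only a single chromosome Y gene survives filtering, indexing without it returns a plain vector and colSums() fails.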
Standard workflow, without Harmony
Single Cell RNA-Seq Analysis and Visualization Workshop
Single-cell workflow starting from raw FASTQ data: 4 Data Preprocessing | ANALYSIS OF SINGLE CELL RNA-SEQ DATA
Plotting
Ch 3: Data visualization | Yet another 'R for Data Science' study guide. Notes and solutions to Garrett Grolemund and Hadley Wickham's 'R for Data Science'. https://brshallo.github.io/r4ds_solutions/03-data-visualization.html#aesthetic-mappings
ggplot2: the most complete gallery for borrowing plot ideas, including animated plots
https://exts.ggplot2.tidyverse.org/gallery/
Origins of the ggplot2 grammar
ggplot2: Elegant Graphics for Data Analysis (3e) - 1 Introduction
ggplot2 is designed to work iteratively. You start with a layer that shows the raw data. Then you add layers of annotations and statistical summaries. This allows you to produce graphics using the same structured thinking that you would use to design an analysis. This reduces the distance between the plot in your head and the one on the page.
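Grammar of graphics: The grammar of graphics is an answer to the question of what is a statistical graphic? ggplot2 (Wickham 2009) builds on Wilkinson's grammar by focussing on the primacy of layers and adapting it for use in R. In brief, the grammar tells us that a graphic maps the data to the aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars).
Every plot is composed of the data and a mapping, which describes how the data's variables are mapped to aesthetic attributes. There are five mapping components:
- A layer is a collection of geometric elements and statistical transformations. Geometric elements, geoms for short, represent what you actually see in the plot: points, lines, polygons, etc. Statistical transformations, stats for short, summarise the data: for example, binning and counting observations to create a histogram, or fitting a linear model.
- Scales map values in the data space to values in the aesthetic space. This includes the use of colour, shape or size. Scales also draw the legend and axes, which make it possible to read the original data values from the plot (an inverse mapping).
- A coord, or coordinate system, describes how data coordinates are mapped to the plane of the graphic. It also provides axes and gridlines to help read the graph. We normally use the Cartesian coordinate system, but a number of others are available, including polar coordinates and map projections.
- A facet specifies how to break up and display subsets of data as small multiples. This is also known as conditioning or latticing/trellising.
- A theme controls the finer points of display, like the font size and background colour. While the defaults in ggplot2 have been chosen with care, you may need to consult other references to create an attractive plot. A good starting place is Tufte's early works (Tufte 1990, 1997, 2001).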
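The five components can be seen side by side in one plot specification. This is a sketch rather than an example from the book: it uses the `mpg` dataset that ships with ggplot2, and the particular palette, facetting variable, and theme are arbitrary choices made here.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +  # data and aesthetic mapping
  geom_point() +                                   # layer: a geom (points)
  geom_smooth(method = "lm", se = FALSE) +         # layer: a stat (linear model fit)
  scale_colour_brewer(palette = "Set1") +          # scale: drv values to colours
  coord_cartesian() +                              # coord: Cartesian, the default
  facet_wrap(~ year) +                             # facet: one panel per model year
  theme_minimal()                                  # theme: fonts and background

print(p)
```

Removing any one line after the first still leaves a valid plot, because ggplot2 supplies defaults for scales, coord, facet, and theme; only the data, mapping, and at least one layer are needed to draw something.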
2.3 Key components
Every ggplot2 plot has three key components:
- Data,
- A set of aesthetic mappings between variables in the data and visual properties, and
- At least one layer which describes how to render each observation. Layers are usually created with a geom function.
Here's a simple example:
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
This produces a scatterplot defined by:
Data: mpg.
Aesthetic mapping: engine size mapped to x position, fuel economy to y position.
Layer: points.
Pay attention to the structure of this function call: data and aesthetic mappings are supplied in ggplot(), then layers are added on with +. This is an important pattern, and as you learn more about ggplot2 you'll construct increasingly sophisticated plots by adding on more types of components.
Almost every plot maps a variable to x and y, and naming these aesthetics every time is tedious, so the first two unnamed arguments to aes() will be mapped to x and y. This means that the following code is identical to the example above:
ggplot(mpg, aes(displ, hwy)) +
  geom_point()
We'll stick to that style throughout the book, so don't forget that the first two arguments to aes() are x and y. Note that we've put each command on a new line. We recommend doing this in your own code, so it's easy to scan a plot specification and see exactly what's there. In this chapter, we'll sometimes use just one line per plot, because it makes it easier to see the differences between plot variations.
2.4 Colour, size, shape and other aesthetic attributes
To add additional variables to a plot, we can use other aesthetics like colour, shape, and size (NB: while we use British spelling throughout this book, ggplot2 also accepts American spellings). These work in the same way as the x and y aesthetics, and are added into the call to aes():
aes(displ, hwy, colour = class)
aes(displ, hwy, shape = drv)
aes(displ, hwy, size = cyl)
ggplot2 takes care of the details of converting data (e.g., 'f', 'r', '4') into aesthetics (e.g., 'red', 'yellow', 'green') with a scale. There is one scale for each aesthetic mapping in a plot. The scale is also responsible for creating a guide, an axis or legend, that allows you to read the plot, converting aesthetic values back into data values. For now, we'll stick with the default scales provided by ggplot2. You'll learn how to override them in Chapter 11.
To learn more about those outlying variables in the previous scatterplot, we could map the class variable to colour:
ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point()
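The role of the scale described above becomes visible when it is written out explicitly. The sketch below maps `drv` to colour twice: once with the default scale and once with `scale_colour_manual()`, using the 'red', 'yellow', 'green' colours from the text. The object names `p.default` and `p.manual` and the assignment of each colour to each value are choices made here for illustration.

```r
library(ggplot2)

# Default: ggplot2 chooses a discrete colour scale for drv automatically.
p.default <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point()

# Explicit: the same mapping, but the scale now states which colour each
# data value receives. The scale also builds the legend (the guide).
p.manual <- p.default +
  scale_colour_manual(values = c("f" = "red", "r" = "yellow", "4" = "green"))

print(p.manual)
```

Both plots show the same data and the same mapping; only the conversion from data values to colours differs, which is the part of the work the scale is responsible for.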
R for Data Science
R for Data Science: Exercise Solutions
Learning Python
Intro to Python https://ourcodingclub.github.io/tutorials/python-intro/