This example shows how to prepare the input file(s) and run metilene3. You need a table describing your samples and a folder containing the BED files listed in that table. The example uses Python 3 with pandas, and bedtools.
import os
import pandas as pd
Load the table that contains the sample IDs, the group labels (only needed for supervised mode) and the BED file names, then write the ID and Group columns to group.tsv:
df = pd.read_excel('samples.xlsx')
df[['ID','Group']].to_csv("group.tsv", sep='\t', index=False)
df[['ID','Group','bedfile']].head(5)
| | ID | Group | bedfile |
---|---|---|---|
0 | S0 | Blood-T | S0.bed |
1 | S100 | Blood-B | S100.bed |
2 | S101 | Blood-NK | S101.bed |
3 | S102 | Blood-T | S102.bed |
4 | S103 | Blood-T | S103.bed |
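Before merging, it can help to check the sample table for obvious problems, since a duplicated ID would produce duplicated column names in the merged matrix. The following is a minimal sketch under the assumption that the table has the three columns shown above; `check_sample_table` is a hypothetical helper, not part of metilene3, and the toy rows are made up for illustration.

```python
import pandas as pd


def check_sample_table(df):
    """Return a list of human-readable problems found in the sample table.

    Hypothetical helper for illustration; not part of metilene3.
    """
    problems = []
    # All three columns used in this example must be present.
    for col in ("ID", "Group", "bedfile"):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if problems:
        return problems
    # Sample IDs become column names of the merged matrix, so they must be unique.
    duplicated = df.loc[df["ID"].duplicated(), "ID"].tolist()
    if duplicated:
        problems.append(f"duplicated sample IDs: {duplicated}")
    # Every sample needs both an ID and a BED file name.
    if df[["ID", "bedfile"]].isna().any().any():
        problems.append("empty ID or bedfile entries")
    return problems


# Toy table with one deliberately duplicated ID (made-up data).
toy = pd.DataFrame({
    "ID": ["S0", "S100", "S100"],
    "Group": ["Blood-T", "Blood-B", "Blood-B"],
    "bedfile": ["S0.bed", "S100.bed", "S100_copy.bed"],
})
print(check_sample_table(toy))
```

An empty list means none of these particular checks failed; it does not guarantee that the BED files themselves are valid.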
Use bedtools unionbedg to merge all BED files into one matrix, filling positions that a sample does not cover with NA:
os.system('cd /path_to_BEDfiles/;\
bedtools unionbedg -i '+(df['bedfile']+' ').sum()+\
'-header -names '+(df['ID']+' ').sum()+\
'-filler NA > ./input.raw')
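The same command can also be assembled as an argument list and run with `subprocess`, which avoids shell quoting problems when file names contain spaces. This is a sketch of an alternative, not the way the original example runs bedtools; `unionbedg_command` and `run_unionbedg` are hypothetical helper names, and only the bedtools options already used above appear in it.

```python
import subprocess


def unionbedg_command(bedfiles, names, filler="NA"):
    """Build the bedtools unionbedg call as a list of arguments."""
    bedfiles, names = list(bedfiles), list(names)
    if len(bedfiles) != len(names):
        raise ValueError("need exactly one name per BED file")
    return ["bedtools", "unionbedg",
            "-i", *bedfiles,
            "-header", "-names", *names,
            "-filler", filler]


def run_unionbedg(bedfiles, names, bed_dir, out_path):
    """Run bedtools inside bed_dir and write its stdout to out_path."""
    cmd = unionbedg_command(bedfiles, names)
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if bedtools exits with an error.
        subprocess.run(cmd, cwd=bed_dir, stdout=out, check=True)


# Only build and show the command here; nothing is executed.
cmd = unionbedg_command(["S0.bed", "S100.bed"], ["S0", "S100"])
print(" ".join(cmd))
```

With the sample table from above, the call would look like `run_unionbedg(df['bedfile'], df['ID'], '/path_to_BEDfiles/', './input.raw')`. Because `out_path` is opened by Python, it is resolved relative to the directory Python runs in, not relative to `bed_dir`.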
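Load the merged matrix and convert it to the metilene3 format by dropping the start column and writing missing values as '.'. Note that the bedtools command above writes input.raw into /path_to_BEDfiles/, because the cd only applies to that shell command; run Python from that folder or adjust the path below: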
met = pd.read_table('./input.raw')
met = met.drop(columns='start')
met.to_csv("./input.tsv", sep='\t', index=False, float_format='%.3f', na_rep='.')
met.head(5)
| | chrom | end | S0 | S100 | S101 | S102 | S103 | S104 | S105 | S106 | ... | S91 | S92 | S93 | S94 | S95 | S96 | S97 | S98 | S99 | S9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | chr1 | 10470 | 0.666667 | NaN | 0.675000 | 0.625000 | 0.575000 | 0.652174 | 0.760000 | 0.552632 | ... | 0.690476 | 0.700000 | 0.562500 | 0.708333 | 0.875000 | 0.700000 | 0.968750 | 0.550000 | 0.727273 | 0.774194 |
1 | chr1 | 10472 | 0.600000 | 0.900000 | 0.725000 | 0.800000 | 0.846154 | 0.869565 | 0.750000 | 0.625000 | ... | 0.880952 | 0.750000 | 0.740000 | 0.863636 | 0.833333 | 0.633333 | 0.968750 | 0.625000 | 0.909091 | 0.882353 |
2 | chr1 | 10485 | 0.875000 | 0.909091 | 0.682927 | 0.904762 | 0.790698 | 0.846154 | 0.814815 | 0.727273 | ... | 0.886364 | 0.818182 | 0.758621 | 0.800000 | 0.958333 | 0.812500 | 0.976744 | 0.795918 | 0.962963 | 0.897436 |
3 | chr1 | 10490 | 0.882353 | 1.000000 | 0.860465 | 1.000000 | 0.795455 | 0.827586 | 0.964286 | 0.767442 | ... | 0.866667 | 0.878788 | 0.896552 | 0.920000 | 0.875000 | 0.787879 | 1.000000 | 0.843137 | 0.896552 | 0.925000 |
4 | chr1 | 10494 | 0.705882 | 0.818182 | 0.813953 | 1.000000 | 0.727273 | 0.678571 | 0.586207 | 0.642857 | ... | 0.911111 | 0.764706 | 0.766667 | 0.760000 | 0.791667 | 0.848485 | 0.979167 | 0.673077 | 0.933333 | 0.875000 |
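To see what the conversion writes to disk, the same steps can be applied to a tiny in-memory matrix. This sketch uses two made-up rows in the layout that `bedtools unionbedg -header` produces (the start coordinates are invented for illustration) and writes to a string buffer instead of a file.

```python
import io

import pandas as pd

# Two toy rows in the unionbedg layout; the values are illustrative only.
raw = (
    "chrom\tstart\tend\tS0\tS100\n"
    "chr1\t10469\t10470\t0.666667\tNA\n"
    "chr1\t10471\t10472\t0.6\t0.9\n"
)

# read_table parses the literal string "NA" as a missing value (NaN).
met = pd.read_table(io.StringIO(raw))
met = met.drop(columns="start")

# Same to_csv options as above, but written to a buffer for inspection.
buf = io.StringIO()
met.to_csv(buf, sep="\t", index=False, float_format="%.3f", na_rep=".")
lines = buf.getvalue().splitlines()
for line in lines:
    print(line)
```

The missing value for S100 is written as `.`, the methylation ratios are rounded to three decimals, and the integer `end` column is left unchanged because `float_format` only applies to floating-point columns.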
Run metilene3 in unsupervised mode:
os.system('python /path_to_metilene3/metilene3.py \
-i ./input.tsv \
-o ./results_unsupervised')
Or run metilene3 in supervised mode by also passing the group table:
os.system('python /path_to_metilene3/metilene3.py \
-i ./input.tsv \
-g ./group.tsv \
-o ./results_supervised')
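`os.system` returns the exit status of the command, but the calls above ignore it, so a failed run can go unnoticed. The sketch below shows one way to make failures visible. `metilene3_command` and `run_checked` are hypothetical helper names, only the `-i`, `-g` and `-o` options from the calls above are used, and the small demonstration at the end runs a trivial Python command rather than metilene3 itself.

```python
import subprocess
import sys


def metilene3_command(script, input_tsv, out_dir, group_tsv=None):
    """Build the metilene3 call; passing group_tsv selects supervised mode."""
    cmd = ["python", script, "-i", input_tsv]
    if group_tsv is not None:
        cmd += ["-g", group_tsv]
    cmd += ["-o", out_dir]
    return cmd


def run_checked(cmd):
    """Run a command and raise if it exits with a non-zero status."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout


supervised = metilene3_command(
    "/path_to_metilene3/metilene3.py",
    "./input.tsv",
    "./results_supervised",
    group_tsv="./group.tsv",
)
print(" ".join(supervised))

# Demonstrate run_checked with a harmless stand-in command.
print(run_checked([sys.executable, "-c", "print('ok')"]).strip())
```

To run metilene3 this way, pass the list to the helper, for example `run_checked(supervised)`.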
Finally, you can find the results under ./results_unsupervised or ./results_supervised.