Batch processing: automatically loading directory files into a Python script
I have 125 files in a directory on a Linux machine. I have a script called annotate.py that takes in one single file and adds a features column. I am able to put the filename of one of the 125 files into the script and run annotate.py, but that is not effective programming.
All 125 files have the same format in terms of column names and column count. Can you please tell me how I can run annotate.py on all 125 files?
annotate.py merges two files on the chromosome and position columns. input_file1 is one of the 125 files, read in one at a time and merged with input_file2. The output should be a different file for each run, named after the original input_file1.
#!/usr/bin/python
# Usage: python snp_search.py input_file1 input_file2
import numpy as np
import pandas as pd

snp_f = pd.read_table('input_file1.txt', sep="\t", header=None)  # input_file1
snp_f.columns = ['chr', 'pos']
lsnp_f = pd.read_table('input2_snpsearch.txt', sep="\t", header=0)  # input_file2
lsnp_f.columns = ['snpid', 'chr', 'pos']
final_snp = pd.merge(snp_f, lsnp_f, on=['chr', 'pos'])
final_snp.to_csv('input_file1_annotated.txt', index=False, sep='\t')
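(Side note: the usage comment at the top implies the two filenames come from the command line, but the script hardcodes them. A minimal sketch of a command-line version, assuming simple sys.argv handling that is not in the original, would be:

#!/usr/bin/python
# Usage: python snp_search.py input_file1 input_file2
import sys

import pandas as pd

# Hypothetical argument handling: take both filenames from the command line.
input_file1, input_file2 = sys.argv[1], sys.argv[2]

snp_f = pd.read_table(input_file1, sep="\t", header=None)
snp_f.columns = ['chr', 'pos']
lsnp_f = pd.read_table(input_file2, sep="\t", header=0)
lsnp_f.columns = ['snpid', 'chr', 'pos']
final_snp = pd.merge(snp_f, lsnp_f, on=['chr', 'pos'])
# Name the output after the first input file.
final_snp.to_csv(input_file1.rsplit('.', 1)[0] + '_annotated.txt',
                 index=False, sep='\t')

This keeps the merge logic identical; only the filenames become parameters.)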
Please help! Thanks!
The os module is your friend: http://docs.python.org/2/library/os.html. The basic idea is to import os and use os.listdir() to get a list of the files in the directory you're interested in. Something like the following should work.
import os

import numpy as np
import pandas as pd

input_file2 = 'input2_snpsearch.txt'
input_dir = './'  # or any other path

files = os.listdir(input_dir)  # listdir gives just the file names

# You don't want to merge input_file2 with itself, and in case it sits in the
# same directory as the other files, filter it out.
files_of_interest = [f for f in files if f != input_file2]

# Read the annotation file once, outside the loop; it never changes.
lsnp_f = pd.read_table(input_file2, sep="\t", header=0)  # input_file2
lsnp_f.columns = ['snpid', 'chr', 'pos']

for f in files_of_interest:
    full_name = os.path.join(input_dir, f)  # necessary if input_dir is not './'
    snp_f = pd.read_table(full_name, sep="\t", header=None)  # input_file1
    snp_f.columns = ['chr', 'pos']
    final_snp = pd.merge(snp_f, lsnp_f, on=['chr', 'pos'])
    new_fname = f.split('.')[0] + '_annotated.txt'
    final_snp.to_csv(os.path.join(input_dir, new_fname), index=False, sep='\t')
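One refinement, in case the directory ever holds files other than the 125 inputs: match only the expected names instead of taking everything os.listdir() returns. A minimal sketch using the standard glob module (the '*.txt' pattern is an assumption about how your input files are named):

import glob
import os

input_dir = './'
input_file2 = 'input2_snpsearch.txt'

# Assumes the 125 inputs all end in .txt; adjust the pattern to your naming.
files_of_interest = [os.path.basename(p)
                     for p in glob.glob(os.path.join(input_dir, '*.txt'))
                     if os.path.basename(p) != input_file2]

You can drop this list straight into the loop above in place of the os.listdir() result.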