How can I fetch millions of records from a SQL table faster using Python?
Question:

I need to process a large database within a time frame of three weeks. First I tried retrieving all the data collected over the three weeks with a single MySQL statement (millions of rows), but the process got killed when %MEM reached 100%. I then modified the Python code to retrieve the data in a sequence of smaller time frames and to process one row at a time instead of dealing with the entire result set. That improved memory utilization somewhat, but it made the script very slow. How can I improve the performance of the script? Are there any tips for that? I tried PyPy, but it conflicted with the MySQLdb module.

Here is my Python code:

import MySQLdb as mdb
import MySQLdb.cursors

condb = mdb.connect(host="db01.foo.com.jp",
                    user="coolguy",
                    passwd="xxxx",
                    db="CISBP_DB",
                    cursorclass=mdb.cursors.SSCursor)
crsr = condb.cursor()
sql = """ SELECT uniprot_id, gene_symbol FROM id_mapping_uniprotkb """
crsr.execute(sql)
rows = crsr.fetchall()
#gene_symbol2uniprot = dict(crsr.fetchall())
gene_symbol2uniprot = {}
for uniprotid, gene_symbol in rows:
    # The original snippet is truncated at this point; presumably the
    # loop builds the mapping, e.g.:
    gene_symbol2uniprot[gene_symbol] = uniprotid
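For comparison, here is a minimal sketch of the streaming approach the question is reaching for: keep the SSCursor, but iterate in fixed-size batches with fetchmany() instead of calling fetchall(), so the full result set is never held in Python memory at once. The connection settings and table are taken from the snippet above; the batch size of 10000 is an arbitrary assumption to tune.

import MySQLdb as mdb
import MySQLdb.cursors

# Same connection settings as in the question; SSCursor keeps the
# result set on the server instead of buffering it client-side.
condb = mdb.connect(host="db01.foo.com.jp",
                    user="coolguy",
                    passwd="xxxx",
                    db="CISBP_DB",
                    cursorclass=mdb.cursors.SSCursor)
crsr = condb.cursor()
crsr.execute("SELECT uniprot_id, gene_symbol FROM id_mapping_uniprotkb")

gene_symbol2uniprot = {}
while True:
    # fetchmany() pulls a bounded batch from the server-side cursor;
    # the batch size here (10000) is an assumption, not a tuned value.
    batch = crsr.fetchmany(10000)
    if not batch:
        break
    for uniprotid, gene_symbol in batch:
        gene_symbol2uniprot[gene_symbol] = uniprotid

crsr.close()
condb.close()

The key difference from the original snippet is that fetchall() materializes every row in one Python list before the loop even starts, which is exactly what drives %MEM to 100%; fetchmany() bounds the working set to a single batch at a time.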