EllisLab text mark
Advanced Search
     
Suggestions on Importing data from XLS (Excel) of 100,000 records
Posted: 16 January 2012 01:54 AM   [ Ignore ]
Joined: 2011-07-19
12 posts

Hi Folks,
I have a XLS file with 100,000 records. Basically I have to extract data from XLS and distributed to few tables in my database. Before I did this with about 9k records, works well but performance issue, It would take about 9-15 mins depending on my network traffic, to mention its locally. If I were to run in my production i’m not sure how long it would take it.
I’m not ready to make php script to run from command prompt I guess that would be bit faster than running it from browser, as customer do not want to do that way.

My Idea was to run from the browser with by
- Uploading the XLS file to Server
- Parsing the XLS file
- Extracting each column and store in the tables
Considering 100k records, I guess it would take an half hour or an hour for migration.
any other suggestions would really appreciate.

 
Posted: 16 January 2012 02:50 AM   [ Ignore ]   [ # 1 ]   [ Rating: 0 ]
Avatar
Joined: 2008-09-11
758 posts

my ideas is to save XLS as CSV header will be low and parsing will be easy and it will work really fast for such a huge amount of data.

 Signature 

CI,JQuery,Google Maps | widget with CI loader | Thumbnail, Image Resize, Image Crop Helper | CI shortcode

 
Posted: 16 January 2012 04:16 AM   [ Ignore ]   [ # 2 ]   [ Rating: 0 ]
Joined: 2011-07-19
12 posts

Thanks for reply Mr. Farooq. I’m going to try it.
Which one would be faster enough?
Is CSV with tab separated OR textfile with tab separated.?
I’m going to try both and will post the result later.

 
Posted: 16 January 2012 08:14 AM   [ Ignore ]   [ # 3 ]   [ Rating: 0 ]
Joined: 2008-08-02
107 posts

just something which might help you decide. i have a situation where i need to regularly import > 250k records from some ado tables into mysql tables. I use a small program to convert this into a tab delimited text file and then execute `load data infile “abc.txt” into table mytable`. 100mbytes of data into 30 tables in circa 3mins. the slow part is extracting it from the ado table. the sql load is done in a flash 8=)

 
Posted: 16 January 2012 08:23 AM   [ Ignore ]   [ # 4 ]   [ Rating: 0 ]
Joined: 2008-08-02
107 posts

forgot to mention. i trigger the importing by clicking a button on a webpage, this sets a bit in a table. the table is checked every 5mins by my import program and if the bit is set it imports the latest data. so i can control it remotely from any browser, no need to go near the server (its a full 20metres away, I’d have to walk)

 
Posted: 16 January 2012 10:18 PM   [ Ignore ]   [ # 5 ]   [ Rating: 0 ]
Joined: 2011-07-19
12 posts

Hi BG,
Thank you, great statistics grin. I’m still trying out both the solutions. I’m facing different problem now. I have comma separated CSV and TXT, I have fields that contains comma (between the names). Ex: Micheal,Schumacher. When I parse the CSV or TXT, Micheal is considered as one field, Schumacher is consider as another field as it supposedly one filed ‘Micheal,Schumacher’. :-( how do I over come this problem.

 
Posted: 17 January 2012 04:23 AM   [ Ignore ]   [ # 6 ]   [ Rating: 0 ]
Joined: 2008-08-02
107 posts

tab delimited is best imo. field data very seldom has tabs in it whereas apos and commas often occur such as in an address, name etc. with csv, you have to escape these chars whereas no need to with tab. whatever approach you use to prepare the data for importing, you need to ensure date,datetime,boolean,float and integer data is in the format which mysql expects

 
Posted: 19 January 2012 04:46 AM   [ Ignore ]   [ # 7 ]   [ Rating: 0 ]
Joined: 2011-07-19
12 posts

Hi BG,
Agree with you, yes I follow the same approach, working fine, but I still need to tune the performance. will update later on this issue.
Thank you very much for your time. appreciate that.