Blaise F Egan commented on Ahmed's blog post Help Needed! Getting a user-friendly tool (or system) to prepare large data sets for analysis...

"I would recommend MariaDB, the replacement for MySQL. There are lots of good books on SQL and MySQL that you can learn from. The software is free."

Oct 25, 2013

Blaise F Egan replied to Vincent Granville's discussion Job interview question: what is wrong with this picture?

"The two pictures would only be comparable if they were on the same scale. They aren't.
The Moon's diameter is 3470 km. At a distance of 1.6 billion km it subtends an angle of 3470/(1.6 x 10^9) radians. That's about 2 microradians or…"

Aug 5, 2013

Blaise F Egan replied to Vincent Granville's discussion Job interview question: what is wrong with this picture?

">Trying to understand why Earth is the brightest object in this image.
Because it is accurately pointed at Earth and other objects are out of the field of view.
>Also how can you see our moon from Saturn, but none of Saturn…"

Aug 5, 2013

Blaise F Egan replied to Vincent Granville's discussion Job interview question: what is wrong with this picture?

"The Earth-Moon distance is about 384 400 km. 1 billion miles is about 1.6 billion km. Viewed at its maximum the Earth-Moon distance subtends an angle of 384 400 / 1 600 000 000 = 2.40 x 10^-4 radians or about 50 seconds of arc. This is easily…"

Aug 4, 2013
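
The small-angle estimates in the replies above can be checked with a few lines of Python. This is only a sketch using the figures quoted there (a 3470 km lunar diameter, a 384 400 km Earth-Moon distance, and a viewing distance of about 1.6 billion km); the function and variable names are illustrative and not from the original discussion.

```python
import math

# Conversion factor from radians to seconds of arc (about 206 265).
ARCSEC_PER_RADIAN = 180 * 3600 / math.pi

def small_angle(size_km, distance_km):
    """Angle in radians subtended by size_km at distance_km.

    Uses the small-angle approximation (angle = size / distance),
    which is accurate when size is tiny compared with distance.
    """
    return size_km / distance_km

distance_km = 1.6e9        # "1 billion miles is about 1.6 billion km"
moon_diameter_km = 3470    # lunar diameter quoted in the reply
earth_moon_km = 384_400    # Earth-Moon distance quoted in the reply

moon_disc = small_angle(moon_diameter_km, distance_km)
separation = small_angle(earth_moon_km, distance_km)

print(f"Moon's disc: {moon_disc * 1e6:.2f} microradians")
print(f"Earth-Moon separation: {separation * ARCSEC_PER_RADIAN:.1f} seconds of arc")
```

The Moon's disc comes out at roughly 2 microradians, matching the Aug 5 reply, while the Earth-Moon separation is some two orders of magnitude larger.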

Blaise F Egan replied to ratheen chaturvedi's discussion Decision tree vs Logistic Regression

"If undecided, go with LR, as it has better diagnostics (analysis of deviance).
If your predictors are mostly numeric functions that your Exploratory Data Analysis shows are smooth functions, then you would want to go with LR. LR is good with smooth…"

Jul 26, 2013
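
To illustrate what "analysis of deviance" means for logistic regression, here is a minimal sketch in plain Python. It assumes a single numeric predictor and a small made-up data set (not from the discussion), fits the model by Newton-Raphson, and compares the null deviance with the residual deviance. The function names are hypothetical, and in practice a statistics package would do this for you.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the (negated) Hessian.
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def deviance(y, p):
    """Binomial deviance: -2 * log-likelihood for 0/1 outcomes."""
    return -2 * sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                    for yi, pi in zip(y, p))

# Illustrative data only: the outcome becomes more likely as x grows.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(x, y)
mean_y = sum(y) / len(y)
null_dev = deviance(y, [mean_y] * len(y))                     # intercept-only model
resid_dev = deviance(y, [sigmoid(b0 + b1 * xi) for xi in x])  # model with x

print(f"null deviance:     {null_dev:.3f}")
print(f"residual deviance: {resid_dev:.3f}")
print(f"drop in deviance:  {null_dev - resid_dev:.3f} on 1 degree of freedom")
```

The drop in deviance is compared against a chi-squared distribution with one degree of freedom to judge whether the predictor earns its place in the model.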

Blaise F Egan replied to Vincent Granville's discussion Why is Vlookup (in Excel) 1,000 times slower than hash tables in Python?

"I've not yet encountered stray tabs in character fields, but I agree it's possible. As you say, there's no standard. I did encounter a frustrating 18 GB file recently that was tab-separated as I requested but two adjacent fields had…"

Jul 8, 2013

Blaise F Egan replied to Vincent Granville's discussion Why is Vlookup (in Excel) 1,000 times slower than hash tables in Python?

"I've had a comma in a data field many times. I now prefer to use tab-separated instead of CSV for exactly that reason."

Jun 25, 2013
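
The comma-in-a-field problem described above can be reproduced with Python's standard `csv` module. This is a sketch with made-up sample rows: naive splitting on commas miscounts the fields, whereas a quoted CSV or a tab-separated file round-trips cleanly (as long as no field contains a tab).

```python
import csv
import io

# Illustrative rows; the name field contains a comma.
rows = [["id", "name", "city"],
        ["1", "Smith, John", "Leeds"]]

# Naive approach: join and split on commas, ignoring quoting.
naive_line = ",".join(rows[1])
naive_fields = naive_line.split(",")   # the name is split in two

# Proper CSV: the writer quotes the field that contains a comma.
csv_buf = io.StringIO()
csv.writer(csv_buf).writerows(rows)
csv_fields = list(csv.reader(io.StringIO(csv_buf.getvalue())))[1]

# Tab-separated: no quoting is needed because no field contains a tab.
tsv_buf = io.StringIO()
csv.writer(tsv_buf, delimiter="\t").writerows(rows)
tsv_fields = list(csv.reader(io.StringIO(tsv_buf.getvalue()), delimiter="\t"))[1]

print(len(naive_fields), "fields from naive split")
print(len(csv_fields), "fields from quoted CSV")
print(len(tsv_fields), "fields from tab-separated")
```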

Blaise F Egan replied to Vincent Granville's discussion Why is Vlookup (in Excel) 1,000 times slower than hash tables in Python?

Jun 24, 2013

Blaise F Egan replied to Vincent Granville's discussion Why is Vlookup (in Excel) 1,000 times slower than hash tables in Python?

"Excel performance tips.
http://www.techrepublic.com/blog/10things/10-ways-to-improve-excel-performance/2842"

Jun 24, 2013

Blaise F Egan replied to Vincent Granville's discussion Why is Vlookup (in Excel) 1,000 times slower than hash tables in Python?

"R is really good at this. You can use the merge() function
http://127.0.0.1:20335/library/base/html/merge.html
or for real speed you can use the data.table…"

Jun 24, 2013
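
For readers following the thread's comparison of lookups with hash tables, here is a minimal Python sketch of a hash join, the general technique behind a keyed merge. It is not how R's merge() or data.table is implemented; the function name and sample data are hypothetical.

```python
def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key` using a dict as a hash index."""
    # Build the index once: key value -> list of matching right-hand rows.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    # Probe the index for each left-hand row (average O(1) per lookup).
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

# Illustrative data only.
orders = [
    {"customer_id": 1, "amount": 30},
    {"customer_id": 2, "amount": 45},
    {"customer_id": 3, "amount": 12},
]
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
]

result = hash_join(orders, customers, "customer_id")
for row in result:
    print(row)
```

Building the index takes one pass over the lookup table, so the whole join scales with the combined size of the two tables rather than with their product.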

