sqldump

(coffee) => code

Homebrew and Python-MySQL

To pip install the Python MySQL library (MySQL-python) on OS X, you need the MySQL client installed locally. If you don't need the full MySQL package, here's how to get it working:

brew install mysql --client-only --universal
pip install MySQL-python
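To confirm the build worked, you can check that Python can actually load MySQLdb, the module that MySQL-python installs. Importing it also loads its compiled extension, so a missing or mismatched MySQL client library shows up as an ImportError. This is a minimal sketch, and check_mysqldb is a hypothetical helper name. Note that MySQL-python only supports Python 2; on Python 3 the maintained fork mysqlclient provides the same MySQLdb module.

```python
import sys


def check_mysqldb() -> bool:
    """Try to import MySQLdb (the module MySQL-python installs).

    Importing, rather than just locating, the module also loads its compiled
    extension, so a broken link to the MySQL client library is caught here.
    """
    try:
        import MySQLdb  # noqa: F401  (imported only to test that it loads)
    except ImportError as err:
        print("MySQLdb failed to import: {}".format(err), file=sys.stderr)
        return False
    return True


if __name__ == "__main__":
    print("MySQLdb OK" if check_mysqldb() else "MySQLdb missing or broken")
```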

Those @!#$ Weird Characters in Hadoop / Faunus Output

Text-file output from Faunus often contains garbage (non-printable) characters. To scrub them out, I use this little Python script:

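import re
from string import printable

# Matches any run of characters that are not in string.printable.
# re.escape keeps regex metacharacters in printable (e.g. ] \ ^ -) literal.
non_printable = re.compile("[^{}]+".format(re.escape(printable)))

# errors="replace" turns undecodable bytes into U+FFFD, which is then scrubbed.
with open("output.csv", "r", encoding="utf-8", errors="replace") as f:
    for line in f:
        # Iterating over the file (instead of `while line:`) means a line that
        # scrubs down to nothing no longer stops processing early.
        print(non_printable.sub("", line).replace("\n", ""))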

Then run it, redirecting the output to a file:

python process.py > scrubbed.output.txt
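If you would rather not hardcode output.csv, the same scrubbing can be packaged as small reusable functions that take the input file as an argument. This is a sketch assuming Python 3; scrub_line, scrub_stream, and the script name scrub.py are hypothetical. Iterating over the file line by line also means a blank line doesn't end processing early.

```python
import re
import sys
from string import printable
from typing import Iterable, TextIO

# One or more characters NOT in string.printable (ASCII letters, digits,
# punctuation and whitespace). re.escape keeps ] \ ^ - literal in the class.
_NON_PRINTABLE = re.compile("[^" + re.escape(printable) + "]+")


def scrub_line(line: str) -> str:
    """Drop non-printable characters and the trailing newline from one line."""
    return _NON_PRINTABLE.sub("", line).rstrip("\n")


def scrub_stream(src: Iterable[str], dst: TextIO) -> None:
    """Write every line of src to dst with garbage characters removed.

    Lines that scrub down to nothing are kept as blank lines.
    """
    for line in src:
        dst.write(scrub_line(line) + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scrub.py output.csv > scrubbed.output.txt
    # errors="replace" turns undecodable bytes into U+FFFD, which is then scrubbed.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        scrub_stream(f, sys.stdout)
```

Saved as scrub.py, it runs as python scrub.py output.csv > scrubbed.output.txt.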

IAM Policy for Access to a Single S3 Bucket

Assuming the bucket is named my-bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:ListAllMyBuckets"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::*"
    },
    {
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}

Source: http://andrewhitchcock.org/?post=325
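If you need the same policy for several buckets, a small Python helper can generate it instead of editing the ARNs by hand. This is a sketch: the function name single_bucket_policy is hypothetical, and it includes the standard "Version": "2012-10-17" policy language element.

```python
import json


def single_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy granting full access to a single S3 bucket.

    Mirrors the hand-written policy: permission to list all buckets, plus
    every S3 action on the named bucket and on the objects inside it.
    """
    bucket_arn = "arn:aws:s3:::{}".format(bucket)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["s3:ListAllMyBuckets"],
                "Effect": "Allow",
                "Resource": "arn:aws:s3:::*",
            },
            {
                "Action": "s3:*",
                "Effect": "Allow",
                # The bucket itself (e.g. listing) and its objects (e.g. reads/writes).
                "Resource": [bucket_arn, bucket_arn + "/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(single_bucket_policy("my-bucket"), indent=2))
```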

MongoDB Cursor Timeout

When using the MongoDB Java Driver, if you have long-running operations that require you to keep a cursor open, you'll end up with a MongoException that says "oops, the cursor timed out" after about 10 minutes of activity.

The Mongo docs say that cursors time out due to inactivity, but as of driver version 2.11.1, I've had cursors time out after 10 minutes even though documents were being fetched from the cursor continuously. It turns out the fix to keep the cursor alive is surprisingly easy:

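// Ask the server never to time out this cursor; it won't be cleaned up automatically, so call cursor.close() when done.
cursor.addOption(com.mongodb.Bytes.QUERYOPTION_NOTIMEOUT);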
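For comparison, PyMongo (the Python driver, not the Java driver used here) exposes the same option as the no_cursor_timeout argument to Collection.find. A minimal sketch, assuming collection is a PyMongo Collection; process_all and the handle callback are hypothetical names. Because the server never reaps a no-timeout cursor on its own, the sketch closes it in a finally block.

```python
from typing import Any, Callable, Mapping


def process_all(collection: Any, query: Mapping, handle: Callable[[dict], None]) -> int:
    """Run handle() on every document matching query, using a no-timeout cursor.

    Returns the number of documents processed.
    """
    cursor = collection.find(query, no_cursor_timeout=True)
    count = 0
    try:
        for doc in cursor:
            handle(doc)
            count += 1
    finally:
        # No-timeout cursors stay open on the server until closed explicitly.
        cursor.close()
    return count
```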

Vandalism

I'm generally against vandalism, but this is funny.

Twitter API: 401 Unauthorized

Symptom: The Twitter API returns 401 Unauthorized when you start the OAuth process by obtaining a bearer token. This can happen suddenly, breaking existing processes that were previously working.

Problem: The most likely culprit is the clock on your server. If it drifts out of sync (even by as little as 20 seconds), Twitter's amazing API will barf and return a helpful "401 Unauthorized".

Solution:

sudo ntpdate ntp.ubuntu.com
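Before (or after) resyncing, you can check the skew by comparing your local clock with the Date header an HTTPS server returns. This is a minimal Python sketch; the helper names are hypothetical, and defaulting to https://api.twitter.com is an assumption (any server that sends a Date header works). HTTP dates have one-second resolution, so only skews of a few seconds or more are meaningful.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def clock_skew_seconds(date_header: str, now: Optional[datetime] = None) -> float:
    """Return local time minus server time, in seconds.

    Positive means the local clock is ahead of the server.
    """
    server_time = parsedate_to_datetime(date_header)  # "GMT" yields an aware UTC datetime
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - server_time).total_seconds()


def fetch_date_header(url: str = "https://api.twitter.com") -> str:
    """Return the Date header from a HEAD request to url (default URL is an assumption)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.headers["Date"]
    except urllib.error.HTTPError as err:
        # Error responses (e.g. 404) still carry a Date header.
        return err.headers["Date"]
```

For example, clock_skew_seconds(fetch_date_header()) returns the current skew; a result in the tens of seconds points to the clock problem.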

MongoDB 2.4.0 EXT4 Readahead Warnings

After I deployed MongoDB 2.4.0 for the first time and connected to it from a remote shell, it printed these startup warnings:

MongoDB shell version: 2.4.0
connecting to: abc.com/test
Server has startup warnings:
Wed Mar 20 22:40:49.850 [initandlisten]
Wed Mar 20 22:40:49.850 [initandlisten] ** WARNING: Readahead for /data/db is set to 2048KB
Wed Mar 20 22:40:49.850 [initandlisten] ** We suggest setting it to 256KB (512 sectors) or less
Wed Mar 20 22:40:49.850 [initandlisten] ** http://dochub.mongodb.org/core/readahead
>

The fix for this turned out to be:

sudo blockdev --setra 256 /dev/md2

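where /dev/md2 is the device holding the database files (/data/db). Note that blockdev --setra counts in 512-byte sectors, so 256 sets readahead to 128KB, within the suggested 256KB limit.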
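Because blockdev counts readahead in 512-byte sectors while the MongoDB warning talks in KB, a tiny conversion helper makes it harder to mix up units. This is a Python sketch; the function names are hypothetical, and current_readahead_sectors shells out to blockdev --getra, which usually needs root.

```python
import subprocess

SECTOR_BYTES = 512  # blockdev --setra / --getra count in 512-byte sectors


def sectors_to_kb(sectors: int) -> int:
    """Convert a blockdev readahead value (sectors) to KB."""
    return sectors * SECTOR_BYTES // 1024


def kb_to_sectors(kb: int) -> int:
    """Convert a readahead size in KB to the sector count blockdev expects."""
    return kb * 1024 // SECTOR_BYTES


def current_readahead_sectors(device: str) -> int:
    """Return the readahead for device as reported by `blockdev --getra`."""
    result = subprocess.run(
        ["blockdev", "--getra", device],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())
```

For example, kb_to_sectors(256) is 512, matching the warning's "256KB (512 sectors)", while sectors_to_kb(256) is 128, the readahead that blockdev --setra 256 sets.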

Install Oracle JDK7 on Ubuntu

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer