Tuesday, October 1, 2013

MongoDB Export to JSON/CSV

In this tutorial, I'll show you how to back up your MongoDB data as JSON or CSV so that it is ready to be imported into R. I've recently been using the rmongodb package for this, but now that I'm working with larger data files I'm switching to CSV/JSON exports, which R reads faster.

Backup DB with mongoexport

Here are some of the most commonly used options.
$ mongoexport
Export MongoDB data to JSON or CSV files.

-h [ --host ] arg         mongo host to connect to ( <set name>/s1,s2 for sets)
-u [ --username ] arg     username
-p [ --password ] arg     password
-d [ --db ] arg           database to use
-c [ --collection ] arg   collection to use (some commands)
-f [ --fields ] arg       comma separated list of field names e.g. -f name,age
-q [ --query ] arg        query filter, as a JSON string
-o [ --out ] arg          output file; if not specified, stdout is used
--csv                     export to csv instead of json
Export all documents (all fields) to the file “refugees.json”:
$ mongoexport -d refugees -c stats -o refugees.json
connected to:
exported 1234242 records

For “refugees.csv”, add --csv together with a field list, since CSV mode requires one:
$ mongoexport -d refugees -c stats --csv -f "createdAt,actionTypes,sources" > refugees.csv
Export all documents with specific fields “createdAt”, “actionTypes” and “sources” only.
$ mongoexport -d refugees -c stats -f "createdAt,actionTypes,sources" -o refugees.json
connected to:
exported 123243251 records
Export only the documents that match a query. In this case, only documents with “createdAt” on or after 2013-09-01 are exported.
$ mongoexport -d refugees -c stats -f "createdAt,actionTypes,sources" -q '{"createdAt":{"$gte":new Date(1377982800000)}}' --csv > refugees.csv
connected to:
exported 2320903 records
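The epoch value in the query above has to be worked out by hand. As a rough sketch, the same number can be computed in Python. The helper names epoch_millis and gte_query are made up for illustration, and the UTC+3 offset is an assumption inferred from the value 1377982800000, not something stated by mongoexport.

```python
from datetime import datetime, timedelta, timezone


def epoch_millis(year, month, day, utc_offset_hours=0):
    """Milliseconds since the Unix epoch for midnight on the given date.

    utc_offset_hours is the assumed local offset from UTC (3 means UTC+3).
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    dt = datetime(year, month, day, tzinfo=tz)
    return int(dt.timestamp() * 1000)


def gte_query(field, millis):
    """Build a query string in the same shape as the -q argument above."""
    return '{"%s":{"$gte":new Date(%d)}}' % (field, millis)


# Midnight on 2013-09-01 in an assumed UTC+3 time zone.
ms = epoch_millis(2013, 9, 1, utc_offset_hours=3)
print(ms)  # 1377982800000
print(gte_query("createdAt", ms))
```

The printed string can be pasted after -q, wrapped in single quotes so the shell leaves the $ alone.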
Connect to a remote server such as 192.168.xxx.xxx, using a username and password.
$ mongoexport -h 192.168.xx.xx -d refugees -c stats -u johndoe -p pass123 -o refugees.json
connected to: 192.168.xx.xx
exported 10951 records
Review the exported file.
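One quick way to review a JSON export is to parse the first few lines, since mongoexport writes one document per line by default. The sketch below assumes that layout. The function name peek_json_export and the two sample documents are invented for illustration, so a real export will look different.

```python
import json
import os
import tempfile
from itertools import islice


def peek_json_export(path, n=3):
    """Parse up to the first n lines of a line-delimited JSON export."""
    docs = []
    with open(path, encoding="utf-8") as fh:
        for line in islice(fh, n):
            line = line.strip()
            if line:  # skip blank lines
                docs.append(json.loads(line))
    return docs


# Hypothetical sample standing in for refugees.json.
sample = (
    '{"createdAt": {"$date": 1377982800000}, "sources": "a"}\n'
    '{"createdAt": {"$date": 1377982900000}, "sources": "b"}\n'
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "refugees.json")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(sample)
    docs = peek_json_export(path, n=1)

print(docs)
```

Reading only a handful of lines keeps the check fast even when the export holds millions of records.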
One issue I came across was that I wasn't representing my epoch date value correctly. I checked it in the mongo shell with the steps below.
> new Date(1377982800000)

If you want to convert a date to epoch milliseconds, multiply a Date by 1 in the MongoDB shell. Months are zero-based, so 8 means September, and the result depends on the shell's local time zone:
> new Date(2013,8,1)*1
1377982800000
Then to verify:
> new Date(1377982800000)
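The same check can be done outside the mongo shell. The sketch below converts the epoch value back into a date in Python. The name millis_to_utc is made up for illustration, and the UTC+3 offset is only an assumption: it is the offset at which 1377982800000 falls exactly on midnight of 2013-09-01.

```python
from datetime import datetime, timedelta, timezone


def millis_to_utc(millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


utc = millis_to_utc(1377982800000)
print(utc.isoformat())  # 2013-08-31T21:00:00+00:00

# Assumed local zone (UTC+3): the same instant is local midnight on 1 September.
local = utc.astimezone(timezone(timedelta(hours=3)))
print(local.isoformat())  # 2013-09-01T00:00:00+03:00
```

Seeing both forms side by side shows why a value that looks like 31 August in UTC can still be the intended 1 September cut-off in local time.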

A query over a date range looks like this (CSV mode again needs a field list):
$ mongoexport --db refugees --collection stats -f "createdAt,actionTypes,sources" --query '{"createdAt":{$gt:new Date(1360040400000),$lt:new Date(1360990800000)}, "source" : "31u314lpapd"}' --csv > refugees.csv
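Long query strings like this one are easy to mistype, so it can help to assemble them in a script. Below is a minimal sketch that builds the range query and prints a shell-quoted command without running it. The helper name range_query is hypothetical, and the field list passed to --fields is an assumption borrowed from the earlier examples.

```python
import json
import shlex


def range_query(field, start_ms, end_ms, equals=None):
    """Build a query string for start_ms < field < end_ms plus equality filters."""
    parts = [
        '"%s":{"$gt":new Date(%d),"$lt":new Date(%d)}' % (field, start_ms, end_ms)
    ]
    for key, value in (equals or {}).items():
        parts.append("%s:%s" % (json.dumps(key), json.dumps(value)))
    return "{" + ",".join(parts) + "}"


query = range_query(
    "createdAt", 1360040400000, 1360990800000, equals={"source": "31u314lpapd"}
)

# Argument list for mongoexport; the field list is assumed, not taken from the command above.
argv = [
    "mongoexport", "--db", "refugees", "--collection", "stats",
    "--query", query, "--csv",
    "--fields", "createdAt,actionTypes,sources",
]

# Print a copy-and-paste command; shlex.quote protects the $ and quote characters.
print(" ".join(shlex.quote(a) for a in argv))
```

Keeping the arguments in a list also means the same values could be handed to subprocess.run directly, with stdout redirected to refugees.csv.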
