Continuously updated; last updated 2017-04-05.

Getting the database names from a pymongo client

`client.database_names()`

Getting the collection names in a db

`db.collection_names()`

Getting the field (item) names in a collection with pymongo

`col.find_one().keys()`  # could this miss some keys?
'''
Yes, it can: a single document does not necessarily contain every field,
so we need a function that walks over all the documents (a minimal sketch follows below).
Stack Overflow has a MapReduce-based approach; skipping it for now, noted to revisit later.
'''
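A minimal sketch of such a function, assuming col is a pymongo collection handle (the function name all_keys is made up for this example):

def all_keys(col):
    # collect the union of keys seen across every document in the collection
    keys = set()
    for doc in col.find():
        keys.update(doc.keys())
    return keys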

A module for connecting to the database


Create a package (a folder used as a module) named util in the current directory.
The directory tree looks like this:
|util
|----+__init__.py  # empty file; marks the folder as a Python package
|----+mongo_util.py  # the database-connection helper; code below:

import pymongo
# import the pymongo driver
import sys  # used for writing to stderr and exiting with a status code
from pymongo import MongoClient  # the key import
from pymongo.errors import ConnectionFailure  # connection error handling

def get_mongo_client(uri="mongodb://192.168.x.x:port", db='DBName'):
    # uri and db are placeholders: fill in the real host, port and database name
    try:
        client = MongoClient(uri)  # client is the object connected to the server

    except ConnectionFailure as e:
        # ConnectionFailure is a class derived from Exception
        # (the names in the parentheses of a class statement are its base classes)
        sys.stderr.write('Could not connect to MongoDB: %s\n' % e)
        # '%' does formatted output; e is the caught ConnectionFailure instance

        sys.exit(1)  # exit with status code 1

    db_handle = client[db]
    # client.db would not work here: we want the database named by the string
    # that the variable db refers to, not a database literally called 'db'

    print 'Connected successfully.'
    print 'URI: ' + uri + ' Database: ' + db

    return db_handle  # return the database handle
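
A quick usage sketch; the URI and database name are the placeholder defaults from above and must be replaced with real values:

from util.mongo_util import get_mongo_client

db = get_mongo_client(uri='mongodb://192.168.x.x:port', db='DBName')
print(db.collection_names())  # sanity check: list the collections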

The $sample stage in MongoDB aggregation


users = auth_db.users.aggregate([{'$sample': {'size': 100}}])

$sample: Randomly selects the specified number of documents from its input.
stage: a stage of the aggregation pipeline; $group, $out and so on are all stages.
What does that mean? Randomly pick some specified number of documents from the input?

After checking the official documentation:

Example
Given a collection named users with the following documents:

{ "_id" : 1, "name" : "dave123", "q1" : true, "q2" : true }
{ "_id" : 2, "name" : "dave2", "q1" : false, "q2" : false }
{ "_id" : 3, "name" : "ahn", "q1" : true, "q2" : true }
{ "_id" : 4, "name" : "li", "q1" : true, "q2" : false }
{ "_id" : 5, "name" : "annT", "q1" : false, "q2" : true }
{ "_id" : 6, "name" : "li", "q1" : true, "q2" : true }
{ "_id" : 7, "name" : "ty", "q1" : false, "q2" : true }

The following aggregation operation randomly selects 3 documents from the collection:

db.users.aggregate(
    [ { $sample: { size: 3 } } ]
)

The operation returns three random documents.

In pymongo the result is a cursor (an iterable), not a plain list; to see the documents, iterate over it:

for doc in users:
    print doc

The call above is mongo shell syntax; in Python (pymongo) the keys must be quoted strings, i.e.:

db.users.aggregate(
    [{'$sample':{'size':3}}]
)

MongoDB basic commands: aggregate

Unlike relational SQL, which uses SELECT, this kind of grouped query in MongoDB is done with aggregate (aggregation), as in the following code:

cursor = db.restaurants.aggregate(
    [
        {"$group": {"_id": "$borough", "count": {"$sum": 1}}}
        # this _id is not the document primary key; it is the group key (here the borough string)
    ]
)

Here db is the database handle (i.e. db = client.db_name), and cursor is the cursor returned by aggregate, analogous to what handle.collection.find() returns:

for document in cursor:
    print document

The output:

{u'count': 51, u'_id': u'Missing'}
{u'count': 969, u'_id': u'Staten Island'}
{u'count': 10259, u'_id': u'Manhattan'}
{u'count': 2338, u'_id': u'Bronx'}
{u'count': 5656, u'_id': u'Queens'}
{u'count': 6086, u'_id': u'Brooklyn'}

In a relational database this is roughly: select borough, count(*) as count from restaurants group by borough.
That is exactly what aggregate is for: grouping the documents of a NoSQL collection by some attribute.
Note: in the $group document, writing _id: '$borough' before or after count: {'$sum': 1} gives the same result; in the output above the count key simply happens to print before _id.

bulk


Reference: the official pymongo API documentation.

bulk_write(requests, ordered=True, bypass_document_validation=False)
Send a batch of write operations to the server.

>>> for doc in db.test.find({}):
...     print(doc)
...
{u'x': 1, u'_id': ObjectId('54f62e60fba5226811f634ef')}
{u'x': 1, u'_id': ObjectId('54f62e60fba5226811f634f0')}
>>> # DeleteMany, UpdateOne, and UpdateMany are also available.
...
>>> from pymongo import InsertOne, DeleteOne, ReplaceOne
>>> requests = [InsertOne({'y': 1}), DeleteOne({'x': 1}),
...             ReplaceOne({'w': 1}, {'z': 1}, upsert=True)]
>>> result = db.test.bulk_write(requests)
>>> result.inserted_count
1
>>> result.deleted_count
1
>>> result.modified_count
0
>>> result.upserted_ids
{2: ObjectId('54f62ee28891e756a6e1abd5')}
>>> for doc in db.test.find({}):
...     print(doc)
...
{u'x': 1, u'_id': ObjectId('54f62e60fba5226811f634f0')}
{u'y': 1, u'_id': ObjectId('54f62ee2fba5226811f634f1')}
{u'z': 1, u'_id': ObjectId('54f62ee28891e756a6e1abd5')}

As the example shows, the requests argument to bulk_write is a list of write operations that are sent to the server in one batch, which is far more efficient than issuing each write on its own. I have not read the source yet, but the saving presumably comes from doing the work in far fewer network round trips to the server instead of one request per write.
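
A minimal sketch of using it for a larger batch, reusing the db.test collection from the docs example above; ordered=False is an option from the signature shown earlier that lets the server apply the writes in any order:

from pymongo import InsertOne

# build 1000 insert requests and send them to the server in one batch
requests = [InsertOne({'n': i}) for i in range(1000)]
result = db.test.bulk_write(requests, ordered=False)
print(result.inserted_count)  # 1000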

hashlib


I skimmed the hashlib source; its module docstring contains the following:

'''
$Id$
Copyright (C) 2005 Gregory P. Smith (greg@krypto.org)
Licensed to PSF under a Contributor Agreement.

__doc__ = """hashlib module - A common interface to many hash functions.

new(name, string='') - returns a new hash object implementing the
given hash function; initializing the hash
using the given string data.

Named constructor functions are also available, these are much faster
than using new():

md5(), sha1(), sha224(), sha256(), sha384(), and sha512()

More algorithms may be available on your platform but the above are guaranteed
to exist. See the algorithms_guaranteed and algorithms_available attributes
to find out what algorithm names can be passed to new().

NOTE: If you want the adler32 or crc32 hash functions they are available in
the zlib module.

Choose your hash function wisely. Some have known collision weaknesses.
sha384 and sha512 will be slow on 32 bit platforms.

Hash objects have these methods:
- update(arg): Update the hash object with the string arg. Repeated calls
are equivalent to a single call with the concatenation of all
the arguments.
- digest(): Return the digest of the strings passed to the update() method
so far. This may contain non-ASCII characters, including
NUL bytes.
- hexdigest(): Like digest() except the digest is returned as a string of
double length, containing only hexadecimal digits.
- copy(): Return a copy (clone) of the hash object. This can be used to
............
'''

The docstring also includes a usage example:

import hashlib
m = hashlib.md5()
m.update("Nobody inspects")
m.update(" the spammish repetition")
m.digest()
'\xbbd\x9c\x83\xdd\x1e\xa5\xc9\xd9\xde\xc9\xa1\x8d\xf0\xff\xe9'

More condensed:

hashlib.sha224("Nobody inspects the spammish repetition").hexdigest()
'a4337bc45a8fc544c03f52dc550cd6e1e87021bc896588bd79e901e2'
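
A small check of the docstring's claim that repeated update() calls are equivalent to a single call with the concatenated data (byte-string literals so it also runs on Python 3):

import hashlib

a = hashlib.md5()
a.update(b"Nobody inspects")
a.update(b" the spammish repetition")

b = hashlib.md5(b"Nobody inspects the spammish repetition")

print(a.hexdigest() == b.hexdigest())  # True: both ways give the same digest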

range() in Python 2 vs Python 3

To be updated.

Multiprocessing in Python: multiprocessing.Process

See the official documentation on multiprocessing.

Note that Process.join() blocks the calling process until the child process has finished, i.e. it is a synchronization point rather than a lock (a small sketch follows below).

See this post on 博客园 (cnblogs).
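
A minimal sketch of Process plus join(), just to show the blocking behaviour; the worker function and the number of processes are made up for the example:

from multiprocessing import Process

def worker(n):
    # a trivial task so the example stays self-contained
    print('worker %d running' % n)

if __name__ == '__main__':
    procs = [Process(target=worker, args=(i,)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # wait here until each child process has exited
    print('all workers finished')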

Setting up MongoDB users for login and backup

db.createUser({user:"bkuser2",pwd:"Bkuser2",roles:[{role:"backup",db:"admin"}]})     ---create a user for backups; for restore privileges, replace backup with restore
Note: when creating the backup user, the db inside roles must be admin, otherwise it errors out, as below:
rs1:PRIMARY> db.createUser({user:"bkuser2",pwd:"Bkuser2",roles:[{role:"backup",db:"test"}]})
2016-11-11T12:25:05.103+0800 E QUERY    [thread1] Error: couldn't add user: No role named backup@test :
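
For reference, the same user can also be created from pymongo through the generic command helper. This is only a sketch, assuming the connection is made as a user that is allowed to create users; the host is the placeholder URI and the credentials are the example ones from above:

from pymongo import MongoClient

client = MongoClient('mongodb://192.168.x.x:port')  # placeholder URI
client.admin.command(
    'createUser', 'bkuser2',
    pwd='Bkuser2',
    roles=[{'role': 'backup', 'db': 'admin'}],  # the role's db must be admin
)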