1. Environment: Ubuntu 20.04 (VirtualBox)
2. File path: /usr/spark/pyspark-mongodb-connector.py
3. Code
### Spark connection test code ###
from pyspark.sql import SparkSession

# username:password@host:port/database.collection?authSource=admin
input_uri = "mongodb://hyeongju:dbgudwn1!@127.0.0.1:27017/mydatabase.testcol?authSource=admin"
output_uri = "mongodb://hyeongju:dbgudwn1!@127.0.0.1:27017/mydatabase.testcol?authSource=admin"

myspark = SparkSession \
    .builder \
    .appName("twitter") \
    .config("spark.mongodb.input.uri", input_uri) \
    .config("spark.mongodb.output.uri", output_uri) \
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:2.4.2") \
    .getOrCreate()

df = myspark.read.format("com.mongodb.spark.sql.DefaultSource").load()
print(df.first())
### Note 1 ###
If authentication (security) is enabled in mongod.conf, you must append authSource=admin to the URI,
because the account credentials are looked up in the system.users collection of the admin database.
Set the input URI and output URI to the same value.
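One thing to watch with these URIs: if the username or password contains characters that are special in a URI (@, :, /, %, and so on), the connection string breaks unless they are percent-encoded. A minimal sketch using Python's `urllib.parse.quote_plus`; the credentials here are placeholders, not the ones from the code above:

```python
from urllib.parse import quote_plus

# Placeholder credentials for illustration. Percent-encode both parts so
# reserved characters cannot be misread as URI delimiters:
# '@' -> %40, ':' -> %3A, '!' -> %21.
user = quote_plus("hyeongju")
password = quote_plus("p@ss:word!")

uri = f"mongodb://{user}:{password}@127.0.0.1:27017/mydatabase.testcol?authSource=admin"
print(uri)
# -> mongodb://hyeongju:p%40ss%3Aword%21@127.0.0.1:27017/mydatabase.testcol?authSource=admin
```

The encoded string can then be passed straight to spark.mongodb.input.uri / spark.mongodb.output.uri.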
### Note 2 ###
A session between Spark and MongoDB must be opened through SparkSession
(from pyspark.sql import SparkSession).
[Output]
21/10/07 23:50:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Row(_id=Row(oid='615edc5c341bae6b308c259e'), name='hyeongju') << load result
Reference link: https://docs.mongodb.com/spark-connector/current/python-api/
Be sure to work through the Write and Read tutorials there.
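The write side of that tutorial mirrors the read code above. A minimal sketch, assuming the same `myspark` session and URIs configured earlier (needs a running MongoDB, so this is illustrative rather than standalone):

```python
# Build a small DataFrame and append it to the collection named in
# spark.mongodb.output.uri, then read the collection back. The `myspark`
# session is assumed to be the one created above with the
# mongo-spark-connector package on the classpath.
people = myspark.createDataFrame(
    [("hyeongju", 27), ("spark", 10)], ["name", "age"]
)

# mode("append") adds documents; mode("overwrite") would replace the collection.
people.write \
    .format("com.mongodb.spark.sql.DefaultSource") \
    .mode("append") \
    .save()

# Read back from spark.mongodb.input.uri and inspect the result.
df = myspark.read.format("com.mongodb.spark.sql.DefaultSource").load()
df.printSchema()
df.show()
```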