KDT5_Team_6 Assignment Submission #2

Open
wants to merge 1 commit into
base: main
1,693 changes: 380 additions & 1,313 deletions README.md

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions Summak-Crawling/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
node_modules
dist
package-lock.json
tsconfig.tsbuildinfo
.env
8 changes: 8 additions & 0 deletions Summak-Crawling/.prettierrc
@@ -0,0 +1,8 @@
{
  "singleQuote": true,
  "semi": false,
  "useTabs": false,
  "tabWidth": 2,
  "trailingComma": "all",
  "printWidth": 200
}
70 changes: 70 additions & 0 deletions Summak-Crawling/README.md
@@ -0,0 +1,70 @@
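# ToyProject Nintendo Scripts

This repository collects the scripts needed during development of the Summtendo (working title) ToyProject.

Check the environment variables before running anything.

Scripts marked **(Caution)** affect the server when run.

> But the data can always be added again, so feel free to use them!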

## Product Scraper

Scrapes products from the Nintendo site and adds them.

### Add products (Caution)

Product information is **saved to the server** at 30-second intervals.

Duplicate data may be inserted.

```bash
npm run scrape

> Enter a Scrap URL :
https://store.nintendo.co.kr/games/best-sellers
```

### Add products test (Recommended)

Product information is **printed to the console** at 30-second intervals.

This is a test script, so **use it as often as you like**.

```bash
npm run scrape:test

> Enter a Scrap URL :
https://store.nintendo.co.kr/games/best-sellers
```

## Product-related Scripts

### findAll

Returns the products.

```bash
npm run findAll
```

### getProduct

Returns the details of a product.

```bash
npm run product

> Enter a product id :
asda12k3lasf
```

### <p style="color : red">deleteAll (Caution)</p>

Deletes all products 10 seconds after it is run.

Take care before using it.

```bash
npm run deleteAll
```
23 changes: 23 additions & 0 deletions Summak-Crawling/package.json
@@ -0,0 +1,23 @@
{
  "name": "nintendo_scripts",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "scrape": "tsc && cross-env NODE_ENV=production node ./dist/main.js",
    "scrape:test": "tsc && cross-env NODE_ENV=development node ./dist/main.js",
    "findAll": "tsc && cross-env NODE_ENV=test node ./dist/findAll.js",
    "deleteAll": "tsc && node ./dist/deleteAll.js",
    "product": "tsc && node ./dist/getProduct.js"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "axios": "^1.4.0",
    "cross-env": "^7.0.3",
    "dotenv": "^16.3.1",
    "puppeteer": "^20.7.2"
  },
  "type": "module"
}
4 changes: 4 additions & 0 deletions Summak-Crawling/src/assets/referers.json
@@ -0,0 +1,4 @@
["https://www.google.com","https://liquipedia.net/dota2/Dota_Pro_Circuit/2021-22/3/Western_Europe/Division_I","https://www.qq.com/","http://www.baidu.com/","http://www.google.cn/","https://www.sina.com.cn/","https://www.writtenchinese.com/top-10-popular-chinese-websites/"
,"https://weibo.com/","https://youku.com/","http://163.com/","https://www.sohu.com/", "http://soso.com/","https://stackoverflow.com/","https://www.partseurope.eu/en/","https://dictionary.cambridge.org/","https://www.carparts.com/","https://www.gov.cn/english/","https://www.chinadaily.com.cn/",
"https://www.cvce.eu/en/education/unit-content/-/unit/803b2430-7d1c-4e7b-9101-47415702fc8e/6d9db05c-1e8c-487a-a6bc-ff25cf1681e0", "http://www.china.org.cn/","https://world.taobao.com/",
"https://blog.btrax.com/", "https://www.facebook.com", "https://www.instagram.com", "https://octopart.com/", "https://www.naver.com/", "https://world.taobao.com/", "https://www.wordsense.eu/wenag/", "https://www.wenag.ch/", "https://soundcloud.com/wongwo", "https://forebears.io/surnames/wongwo"]
31 changes: 31 additions & 0 deletions Summak-Crawling/src/assets/userAgents.json
@@ -0,0 +1,31 @@
["Mozilla/5.0 (Linux; Android 8.1.0; jhs561 Build/GIADA.eng.zc.20200922.153858; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/81.0.4044.138 Safari/537.36",
"Mozilla/5.0 (Linux; Android 9; BDL8051C Build/BDL3552T; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/66.0.3359.158 Safari/537.36",
"Mozilla/5.0 (Linux; Android 12; SM-S908U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Mobile Safari/537.36",
"Mozilla/5.0 (Linux; Android 12; SM-N986U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Mobile Safari/537.36",
"Mozilla/5.0 (Linux; Android 10; SM-G960U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Mobile Safari/537.36",
"Mozilla/5.0 (Linux; Android 12; SM-G998U Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/103.0.5060.71 Mobile Safari/537.36 [FB_IAB/FB4A;FBAV/375.1.0.28.111;]",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36",
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
"Mozilla/5.0 (iPhone; CPU iPhone OS 15_5 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 Instagram 243.1.0.14.111 (iPhone14,3; iOS 15_5; en_US; en-US; scale=3.00; 1284x2778; 382468104)",
"Mozilla/5.0 (iPhone; CPU iPhone OS 15_5 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 Instagram 243.1.0.14.111 (iPhone14,5; iOS 15_5; en_US; en-US; scale=3.00; 1170x2532; 382468104)",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.44",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.44",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36 Edg/102.0.1245.33",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; WebView/3.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18363",
"Mozilla/5.0 (iPhone; CPU iPhone OS 15_5 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 musical_ly_25.1.1 JsSdk/2.0 NetType/WIFI Channel/App Store ByteLocale/en Region/US ByteFullLocale/en isDarkMode/0 WKWebView/1 BytedanceWebview/d8a21c6 FalconTag/",
"Mozilla/5.0 (iPhone; CPU iPhone OS 15_5 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 musical_ly_25.3.0 JsSdk/2.0 NetType/WIFI Channel/App Store ByteLocale/en Region/US RevealType/Dialog isDarkMode/0 WKWebView/1 BytedanceWebview/d8a21c6 FalconTag/",
"AppleCoreMedia/1.0.0.19F77 (iPhone; U; CPU OS 15_5 like Mac OS X; nl_nl)",
"Autoplius.lt/6.6.0 Mozilla/5.0 (iPhone; CPU iPhone OS 15_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 EmbeddedBrowser DeviceUID:",
"Podcasts/1660.5 CFNetwork/1335.0.3 Darwin/21.6.0",
"Mozilla/5.0 (iPhone; CPU iPhone OS 15_5 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 musical_ly_25.1.1 JsSdk/2.0 NetType/WIFI Channel/App Store ByteLocale/en Region/GB ByteFullLocale/en isDarkMode/1 WKWebView/1 BytedanceWebview/d8a21c6 FalconTag/",
"Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Mobile/15E148 Safari/604.1",
"Mozilla/5.0 (Linux; Android 12; SM-G781W Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/103.0.5060.71 Mobile Safari/537.36",
"Mozilla/5.0 (Linux; Android 12; SM-S908U Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/103.0.5060.71 Mobile Safari/537.36 [FB_IAB/FB4A;FBAV/375.1.0.28.111;]",
"Mozilla/5.0 (Linux; Android 12; SM-G998B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Mobile Safari/537.36",
"Mozilla/5.0 (Linux; Android 12; SM-S908B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Mobile Safari/537.36",
"Mozilla/5.0 (Linux; Android 11; REVVL V+ 5G) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Mobile Safari/537.36",
"Mozilla/5.0 (Linux; U; Android 8.0.0; zh-cn; Mi Note 2 Build/OPR1.170623.032) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/61.0.3163.128 Mobile Safari/537.36 XiaoMi/MiuiBrowser/10.1.1"
]
35 changes: 35 additions & 0 deletions Summak-Crawling/src/deleteAll.ts
@@ -0,0 +1,35 @@
import dotenv from 'dotenv'
import axios from 'axios'
dotenv.config()

const api = axios.create({
  baseURL: process.env.BASE_URL,
  headers: {
    apiKey: process.env.API_KEY,
    username: process.env.USERNAME,
    masterKey: process.env.MASTER_KEY,
  },
})

async function init() {
  await new Promise((res) => {
    console.log('Warning: this will run in 10 seconds. This operation cannot be undone.')
    setTimeout(res, 10000)
  })

  // Fetch all products
  const { data: products } = await api({
    method: 'POST',
    url: '/api/products/search',
  })

  // Delete every product
  for (const product of products) {
    await api({
      method: 'DELETE',
      url: `/api/products/${product.id}`,
    })
  }
Comment on lines +27 to +32
Member

It is better to avoid running many asynchronous calls one by one inside a loop.
This could be improved by, for example, processing a fixed number of items at a time with Promise.all.

}

init()
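
The review comment on the delete loop above suggests handling a fixed number of requests at a time with `Promise.all`. A minimal sketch of that idea follows. The helper name `runInBatches` and its parameters are hypothetical and do not appear in the PR; the sketch only assumes that each item can be processed by an independent async function.

```typescript
// Hypothetical helper (not part of the PR): runs `worker` over `items`
// in consecutive batches of at most `batchSize` items.
async function runInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer')
  }
  const results: R[] = []
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize)
    // Calls inside one batch run concurrently; the next batch starts
    // only after every call in the current batch has resolved.
    const batchResults = await Promise.all(batch.map((item) => worker(item)))
    results.push(...batchResults)
  }
  return results
}
```

In `deleteAll.ts` the worker would be the function that sends the `DELETE` request for one `product.id`, and the batch size (for example 5) is an arbitrary choice to tune against the server. Note that `Promise.all` rejects as soon as one request fails, so a single failed deletion stops the remaining batches.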
31 changes: 31 additions & 0 deletions Summak-Crawling/src/findAll.ts
@@ -0,0 +1,31 @@
import dotenv from 'dotenv'
import axios from 'axios'
dotenv.config()
// Debug output: prints the resolved configuration (this includes the API key and master key)
console.log({
  baseURL: process.env.BASE_URL,
  headers: {
    apiKey: process.env.API_KEY,
    username: process.env.USER_NAME,
    masterKey: process.env.MASTER_KEY,
  },
})

const api = axios.create({
  baseURL: process.env.BASE_URL,
  headers: {
    apiKey: process.env.API_KEY,
    username: process.env.USER_NAME,
    masterKey: process.env.MASTER_KEY,
  },
})

async function init() {
  // Fetch all products
  const res = await api({
    method: 'POST',
    url: '/api/products/search',
  })
  console.log(res.data)
}

init()
36 changes: 36 additions & 0 deletions Summak-Crawling/src/getProduct.ts
@@ -0,0 +1,36 @@
import dotenv from 'dotenv'
import readline from 'readline'
import axios from 'axios'
dotenv.config()

const api = axios.create({
  baseURL: process.env.BASE_URL,
  headers: {
    apiKey: process.env.API_KEY,
    username: process.env.USERNAME,
    masterKey: process.env.MASTER_KEY,
  },
})
Comment on lines +6 to +13
Member

Since this is used repeatedly, it would also be good to manage it separately and reuse it.


const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
})

async function init() {
  console.log('Enter a product id : ')
  rl.on('line', async (id: string) => {
    const res = await api({
      method: 'GET',
      url: `/api/products/${id}`,
    })
    const products = res.data
    console.log(products)
    rl.close()
  })
  rl.on('close', () => {
    process.exit(0)
  })
}

init()
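
The review comment in this file suggests keeping the repeated `axios.create` settings in one place. One possible shape is sketched below: a small function that builds the options object from the environment, so each script can call `axios.create(buildApiConfig(process.env))`. The names `requireEnv` and `buildApiConfig` are hypothetical, and the sketch assumes the variable is called `USER_NAME` as in `findAll.ts`; `deleteAll.ts` and `getProduct.ts` read `USERNAME` instead, and the diff does not show which one the `.env` file defines.

```typescript
// Hypothetical shared module (names and location are assumptions, not part of the PR).
type Env = Record<string, string | undefined>

interface ApiConfig {
  baseURL: string
  headers: { apiKey: string; username: string; masterKey: string }
}

// Reads one required variable and fails fast with a clear message if it is missing.
function requireEnv(env: Env, name: string): string {
  const value = env[name]
  if (value === undefined || value === '') {
    throw new Error(`Missing environment variable: ${name}`)
  }
  return value
}

// Builds the options passed to axios.create() so every script shares the same settings.
function buildApiConfig(env: Env): ApiConfig {
  return {
    baseURL: requireEnv(env, 'BASE_URL'),
    headers: {
      apiKey: requireEnv(env, 'API_KEY'),
      username: requireEnv(env, 'USER_NAME'), // assumption: same name as in findAll.ts
      masterKey: requireEnv(env, 'MASTER_KEY'),
    },
  }
}
```

Failing fast on a missing variable is a design choice of this sketch; the scripts in the PR pass `undefined` through to axios instead.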
33 changes: 33 additions & 0 deletions Summak-Crawling/src/main.ts
@@ -0,0 +1,33 @@
import readline from 'readline'
import productsScrapper from './scripts/puppeteer/getProducts/index.js'
import setProductDetail from './scripts/puppeteer/setProductsDetail/index.js'

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
})

async function main() {
  console.info('Enter a Scrap URL : ')
  rl.on('line', async (line: string) => {
    // Handle errors inside the callback: a try/catch around rl.on() would not see async rejections.
    try {
      if (process.env.NODE_ENV === 'development') {
        console.log('Starting scraping in development mode')
      }
      if (process.env.NODE_ENV === 'production') {
        console.log('Starting scraping in production mode')
      }
      const list = await productsScrapper(line)
      await setProductDetail(list)
      rl.close()
    } catch (err) {
      console.error(err)
      process.exit(1)
    }
  })
  rl.on('close', () => {
    console.log('Scraping finished')
    process.exit(0)
  })
}
main()
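
The README states that products are handled at 30-second intervals, but the `setProductsDetail` module that `main.ts` imports is not shown in this diff. The following is only a sketch of how such pacing could look, not the author's implementation. `processWithInterval` is a hypothetical name, and `delay` is a stand-in for the helper imported from `utils/index.js`, whose source is also not shown.

```typescript
// Stand-in for the `delay` helper from utils/index.js (its real implementation is not in the diff).
const delay = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms))

// Hypothetical helper: handles items one at a time, waiting `intervalMs` between them.
async function processWithInterval<T>(
  items: readonly T[],
  intervalMs: number,
  handle: (item: T) => Promise<void>,
): Promise<void> {
  for (let i = 0; i < items.length; i++) {
    await handle(items[i])
    // Wait between items, but not after the last one.
    if (i < items.length - 1) await delay(intervalMs)
  }
}
```

Under this sketch, the production script would pass a handler that posts each product to the server and the test script one that prints it to the console, with `intervalMs` set to 30000.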
50 changes: 50 additions & 0 deletions Summak-Crawling/src/scripts/puppeteer/getProductDetail/index.ts
@@ -0,0 +1,50 @@
import { Page } from 'puppeteer'
import { ProductDetailSource, ProductSource } from '../../../types/index.js'
import { delay } from '../../../utils/index.js'

// * Fetches the detailed information for a product.
export default async function setProductDetail(page: Page, product: ProductSource): Promise<ProductSource & ProductDetailSource> {
  await page.goto(product.link, {
    waitUntil: 'networkidle0',
  })
  await page.waitForSelector('#maincontent')
  await delay(1500)
  const price = await page.evaluate(() => {
    const container = document.querySelector('.price-container') as HTMLSpanElement | null
    const meta = container?.querySelector("meta[itemprop='price']") as HTMLMetaElement | null
    // Fall back to 0 when the price element is missing
    const priceStr = meta?.content ? meta.content : 0
    return Number(priceStr)
  })
  const imgSrc = await page.evaluate(() => {
    return (document.querySelector('.fotorama-item img') as HTMLImageElement)?.src
  })
  const description = await page.evaluate(() => {
    const descriptionCover = document.querySelector('#maincontent > div.columns > div > div.product-page-container > div.product-page-media > div.product.attribute.mfr_description') as HTMLDivElement | null
    if (!descriptionCover) return ''
    // Keep the highlighted <strong> lines, skipping the site's notice and maker-description labels
    const highlightText = Array.from(descriptionCover.querySelectorAll('strong'))
      .filter((item) => {
        if (item?.textContent?.includes('알림')) return false
        if (item?.textContent?.includes('메이커로부터의 설명입니다')) return false
        return true
      })
      .map((item) => `<p>${item.textContent?.trim()}</p>`)
      .join('\n')
      .trim()
    return highlightText
  })
  const genres = await page.evaluate(() => {
    const genreText = document.querySelector(
      '#maincontent > div.columns > div > div.product-page-container > div.product-page-media > div.product-attributes > div.product-attributes-all > div:nth-child(1) > div.product-attribute.game_category > div.product-attribute-val',
    ) as HTMLDivElement
    if (!genreText) return []
    const content = genreText.textContent?.split(',').map((item) => item.trim())
    return content ? content : []
  })

  return {
    ...product,
    price,
    imgSrc,
    genres,
    description,
  }
}
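
The price and genre parsing above runs inside `page.evaluate`, which makes it hard to check without a browser. A possible alternative, sketched here under the assumption that `page.evaluate` returns only the raw strings, is to do the parsing in plain functions on the Node side. `parsePrice` and `parseGenres` are hypothetical names that do not appear in the PR.

```typescript
// Hypothetical helper: converts the raw `content` attribute of the price <meta> tag to a number.
// Missing or non-numeric input yields 0 instead of NaN.
function parsePrice(raw: string | null | undefined): number {
  const value = Number(raw ?? '')
  return Number.isFinite(value) ? value : 0
}

// Hypothetical helper: splits a comma-separated genre string into trimmed, non-empty names.
function parseGenres(raw: string | null | undefined): string[] {
  return (raw ?? '')
    .split(',')
    .map((item) => item.trim())
    .filter((item) => item.length > 0)
}
```

Unlike the code in the diff, `parseGenres` drops empty entries and `parsePrice` never returns `NaN`; both are choices of this sketch rather than behavior taken from the PR.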
32 changes: 32 additions & 0 deletions Summak-Crawling/src/scripts/puppeteer/getProducts/index.ts
@@ -0,0 +1,32 @@
import { ProductSource } from '../../../types/index.js'
import { delay } from '../../../utils/index.js'
import initPuppeteer from '../init/index.js'

// * Scrapes the product list.
export default async function productsScrapper(url: string): Promise<ProductSource[]> {
  const { browser, page } = await initPuppeteer()
  await page.goto(url, {
    waitUntil: 'networkidle0',
  })
  await page.waitForSelector('#maincontent > div.columns > div.column.main')
  await delay(2000)

  const productList = await page.evaluate(() => {
    const main = document.querySelector('#maincontent > div.columns > div.column.main')
    const _productList = main?.querySelectorAll('.category-product-item')
    if (typeof _productList == 'undefined') return []
    return Array.from(_productList).map((item) => {
      const title = item?.querySelector('a.category-product-item-title-link')?.textContent?.trim()
      const link = (item?.querySelector('a.category-product-item-title-link') as HTMLAnchorElement)?.href
      const thumbSrc = (item.querySelector('img.product-image-photo') as HTMLImageElement)?.src
      return {
        title: title ? title : '',
        link: link ? link : '',
        thumbSrc: thumbSrc ? thumbSrc : '',
      }
    })
  })
  await page.close()
  await browser.close()
  return productList
}
20 changes: 20 additions & 0 deletions Summak-Crawling/src/scripts/puppeteer/init/index.ts
@@ -0,0 +1,20 @@
import puppeteer, { Browser, Page } from 'puppeteer'
import referers from '../../../assets/referers.json' assert { type: 'json' }
import userAgents from '../../../assets/userAgents.json' assert { type: 'json' }

// * Initializes puppeteer.
export default async function initPuppeteer(): Promise<{ browser: Browser; page: Page }> {
  // * Pick a random referer and user agent.
  const randomReferers = referers[Math.floor(Math.random() * referers.length)]
  const randomUserAgent = userAgents[Math.floor(Math.random() * userAgents.length)]
  const browser = await puppeteer.launch({
    headless: false,
    args: ['--disable-notifications'],
    devtools: false,
  })
  const page = await browser.newPage()
  // Set the user agent once (awaited) and send the referer under the standard `Referer` header name.
  await page.setUserAgent(randomUserAgent)
  await page.setExtraHTTPHeaders({ referer: randomReferers })
  return { browser, page }
}
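
The random selection of a referer and a user agent above repeats the same index arithmetic twice. A small helper, sketched below, could hold it in one place; `pickRandom` is a hypothetical name, and the optional `random` parameter is an assumption added so the choice can be made deterministic in a test.

```typescript
// Hypothetical helper: returns one element of `items`, chosen with `random`
// (defaults to Math.random, which returns a value in [0, 1)).
function pickRandom<T>(items: readonly T[], random: () => number = Math.random): T {
  if (items.length === 0) {
    throw new Error('cannot pick from an empty list')
  }
  return items[Math.floor(random() * items.length)]
}
```

With this helper, `initPuppeteer` could call `pickRandom(referers)` and `pickRandom(userAgents)`, and an empty JSON asset would raise an error instead of silently producing `undefined`.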