Kowal's Igloo

Recommendation Systems: A Keyword-Based Recommendation Algorithm Using Cosine Similarity

코왈이 2023. 11. 24. 14:33
🧙 Recruitment matching platform: when a corporate user enters a project overview such as "a server developer to build a multiplayer strategy game", the service compares it with the resumes uploaded by its users and recommends them in descending order of similarity.
1. What is a recommendation system?
     1.1 Types of data
     1.2 Similarity
     1.3 Similarity measures
     1.4 Types of recommendation systems
     1.5 Limitations of recommendation systems
2. Applying it to our service
     2.1 Step 1. Keyword extraction
     2.2 Step 2. Word vectorization
     2.3 Step 3. Cosine similarity between vectors
     2.4 Full code
     2.5 Execution results
     2.6 Improvements
3. References

What is a recommendation system?

A recommendation system is an algorithm that analyzes data about users in order to recommend items that match each individual's tastes. For example, a user may like a product, or buy one and leave a review. This feedback and behavioral data is analyzed to infer the user's preferences, which the recommendation system then uses.

 

Types of data

  • Explicit data
    • Data in which the user expresses a preference directly.
    • e.g., ratings, likes, subscriptions
    • Hard to obtain in practice.
    • In our service, however, users type in the project overview themselves, so explicit data is easy to obtain!
  • Implicit data
    • Data in which the user expresses a preference indirectly.
    • e.g., search history, click history, purchase history

 

Similarity

Similarity is a central concept in recommendation systems. Whatever form the data takes (text, images, video, and so on), each item is represented as a vector, and similarity is measured between item vectors. In a clothing dataset, for example, category, size, and color can each be one of the vector's attributes.

Similarity measures

  • Jaccard similarity
    • Measures similarity from the size of the intersection of two sets.
    • Ranges from 0 to 1.
    • The closer the intersection is in size to the union, the closer the value is to 1.
    • J(A, B) = |A ∩ B| / |A ∪ B|, that is, (intersection of the two items) / (union of the two items)
  • Cosine similarity
    • Measures how similar two item vectors are in a vector space model, based on the angle between them.
    • cos(θ) = (A⋅B) / (∥A∥ ∥B∥)
    • A⋅B is the dot product of vectors A and B.
    • ∥A∥ and ∥B∥ are the norms (magnitudes) of vectors A and B.
    • Ranges from -1 to 1.
    • The more similar the two vectors, the closer the value is to 1. The closer it is to -1, the more the vectors point in opposite directions; 0 means the vectors are unrelated.
  • Euclidean distance
    • Measures similarity from the distance between two item vectors (a smaller distance means more similar).
    • Ignores the direction of the vectors.
  • Pearson correlation coefficient
    • Measures whether two items are correlated with each other.
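The four measures listed above can be sketched in a few lines of plain Python. This is a minimal illustration rather than code from the original post: the function names and the toy vectors are made up for the example, and Pearson correlation is computed as the cosine similarity of mean-centered vectors.

```python
import math

def jaccard(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(u, v):
    """Cosine similarity: (u . v) / (|u| |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    """Euclidean distance: a smaller value means the vectors are closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def pearson(u, v):
    """Pearson correlation, computed as the cosine similarity of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([x - mean_u for x in u], [y - mean_v for y in v])

# Toy data, made up for illustration
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]  # same direction as u, twice the length

print(jaccard({"game", "server", "web"}, {"game", "server", "design"}))  # 2 shared of 4 total: 0.5
print(cosine(u, v))     # approximately 1.0: the direction is identical
print(euclidean(u, v))  # sqrt(14): the lengths differ, so the distance is not 0
print(pearson(u, v))    # approximately 1.0: perfectly linearly related
```

Note how `u` and `v` are a perfect match under cosine similarity but not under Euclidean distance: cosine looks only at direction, which is why it suits keyword vectors of different lengths.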

 

Types of recommendation systems

Recommendation systems fall broadly into content-based filtering, collaborative filtering, and hybrid recommendation systems. More advanced approaches use deep learning.

Content-based filtering

Recommends items similar to those the user has selected or purchased. It measures similarity between items.

Collaborative filtering

Recommends to a user the items liked by other users with similar tastes. It measures similarity between users.
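To make the difference concrete, here is a minimal sketch of user-based collaborative filtering. It is an illustration under stated assumptions, not part of the service described in this post: the `ratings` data, the user and item names, and the similarity-weighted scoring rule are all hypothetical.

```python
import math

# Hypothetical toy ratings: user -> {item: rating}
ratings = {
    "alice": {"item_a": 5, "item_b": 3, "item_c": 4},
    "bob":   {"item_a": 5, "item_b": 3, "item_d": 5},
    "carol": {"item_b": 5, "item_c": 1, "item_e": 4},
}

def user_similarity(r1, r2):
    """Cosine similarity between two users, computed over the items both have rated."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    norm1 = math.sqrt(sum(r1[i] ** 2 for i in common))
    norm2 = math.sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (norm1 * norm2)

def recommend(target, ratings, top_n=3):
    """Score unseen items by the similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = user_similarity(ratings[target], other_ratings)
        if sim <= 0:
            continue  # no overlap with this user, so nothing to learn from them
        for item, rating in other_ratings.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("alice", ratings))  # ['item_d', 'item_e']
```

A user with no ratings at all has zero similarity to everyone and gets an empty list back, which is the cold-start limitation of collaborative filtering.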

 

Limitations of recommendation systems

The cold-start problem

Collaborative filtering requires existing data, so it cannot recommend any items to a new user.

Reduced computational efficiency

Collaborative filtering is computationally heavy, so the computation takes a long time when there are many users.

The long-tail problem

Users tend to show interest only in a small number of popular items, so those few popular items end up making up a large share of all recommendations.

→ These problems arise because the algorithm is simple.

Hybrid recommendation systems

Combine two or more different kinds of recommendation algorithms.

 

Applying it to our service

Because recommendations are based on a one-line job description, the approach is closer to content-based filtering, which measures similarity between items. Cosine similarity will be used to measure the similarity between keyword vectors.

With that, let's start developing the recommendation algorithm.

🧙 When a corporate user enters a one-line job description such as "a server developer to build a multiplayer strategy game", we want to compare it with the resumes uploaded by the service's users and recommend them in descending order of similarity. Keywords are extracted from both the one-line job description and each job seeker's resume, and the similarity (cosine similarity) between the two keyword vectors is measured.

 

 

Step 1. Keyword extraction

KoNLPy (코엔엘파이) is a Python package for Korean natural language processing.

To use it, install the KoNLPy library and then create an instance of Hannanum (한나눔), one of the morphological analyzers KoNLPy provides.

Pass the resume text to hannanum.nouns() to extract only the nouns.

!pip install konlpy  # install the KoNLPy library

from konlpy.tag import Hannanum

hannanum = Hannanum()  # create the Hannanum analyzer
resumes = [
    "프로젝트 관리 및 개발 경험이 풍부한 개발자",
    "빠른 학습 능력과 프로젝트 관리 스킬을 갖춘 경영학 전공자",
    "웹 개발과 프로젝트 일정 관리에 능숙한 소프트웨어 엔지니어"
]

for resume in resumes:
    # extract only the nouns from the resume text
    resume_keywords = hannanum.nouns(resume)
    print(resume_keywords)

The results are as follows. Particles and conjunctions such as ‘및’ ("and"), ‘이’ (subject marker), and ‘와/과’ ("and/with") are dropped, and the nouns are extracted cleanly.

['프로젝트', '관리', '개발', '경험', '풍부', '개발자']
['학습', '능력', '프로젝트', '관리', '스킬', '경영학', '전공자']
['웹', '개발', '프로젝트', '일정', '관리', '능숙', '소프트웨어', '엔지니어']

 

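Step 2. Word vectorization

!pip install scikit-learn

from sklearn.feature_extraction.text import TfidfVectorizer

# TfidfVectorizer expects one string per document, so join each noun list with spaces.
# `hannanum` and `resumes` come from Step 1; the query is the one used in the full code.
project_keywords = " ".join(hannanum.nouns("기업 프로젝트 관리 및 개발"))
resume_keywords = " ".join(hannanum.nouns(resumes[0]))

vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform([project_keywords, resume_keywords])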

This code uses the scikit-learn library to build TF-IDF (Term Frequency-Inverse Document Frequency) vectors. TF-IDF indicates how important a given word is within a document, taking into account both the word's frequency in that document and its inverse document frequency across documents.

TfidfVectorizer converts text data into TF-IDF vectors.

fit_transform() computes TF-IDF weights from the words in each text and then converts each text into a TF-IDF vector. The resulting vectors object is a sparse matrix in which each row is the TF-IDF vector of one document. A sparse matrix is a memory-saving representation of a matrix whose values are mostly 0.
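To show what those weights look like, here is a pure-Python sketch of the calculation, assuming scikit-learn's default settings (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2 normalization). The helper name `tfidf_vectors` is made up for this illustration, and the two documents are the noun lists printed in Step 1. One known difference: scikit-learn's default tokenizer also drops single-character tokens such as '웹', which this sketch does not do.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns (vocabulary, one L2-normalized weight list per document)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # document frequency: in how many documents each token appears
    df = Counter(tok for doc in docs for tok in set(doc))
    # smoothed inverse document frequency
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}

    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts (0 for absent tokens)
        raw = [tf[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(w * w for w in raw))
        vectors.append([w / norm if norm else 0.0 for w in raw])
    return vocab, vectors

docs = [
    ["프로젝트", "관리", "개발", "경험", "풍부", "개발자"],
    ["학습", "능력", "프로젝트", "관리", "스킬", "경영학", "전공자"],
]
vocab, vectors = tfidf_vectors(docs)

for tok, w0, w1 in zip(vocab, vectors[0], vectors[1]):
    print(f"{tok}\t{w0:.3f}\t{w1:.3f}")
```

Words shared by both documents ('프로젝트', '관리') get a lower weight than words unique to one document, and most entries are 0, which is why a sparse matrix is a good fit.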

 

Step 3. Cosine similarity between vectors

from sklearn.metrics.pairwise import cosine_similarity

# compute the cosine similarity
similarity_matrix = cosine_similarity(vectors)
similarity = similarity_matrix[0, 1]

cosine_similarity() computes the cosine similarity between the texts represented as TF-IDF vectors.

similarity_matrix[0, 1] retrieves the cosine similarity between the first and second vectors. Because TF-IDF weights are never negative, this value is a real number between 0 and 1 (rather than between -1 and 1): the closer it is to 1, the more similar the texts; the closer to 0, the less related they are.

 

Full code

!pip install scikit-learn
!pip install konlpy

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from konlpy.tag import Hannanum

hannanum = Hannanum()

def extract_keywords(text):
    # extract keywords (nouns) from the text
    keywords = hannanum.nouns(text)
    print(keywords)  # debug output: show the extracted nouns
    return " ".join(keywords)  # join the word list back into a single string

def calculate_similarity(project_keywords, resume_keywords):
    # word vectorization
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform([project_keywords, resume_keywords])

    # cosine similarity
    similarity_matrix = cosine_similarity(vectors)
    similarity = similarity_matrix[0, 1]

    return similarity

def recommend_resumes(project_overview, resumes):  # resume recommendation
    # extract keywords from the one-line job description
    project_keywords = extract_keywords(project_overview)

    # compute the similarity for each resume
    similarity_scores = []
    for resume in resumes:
        # extract keywords from the resume
        resume_keywords = extract_keywords(resume)
        # similarity between the one-line job description and the resume
        similarity = calculate_similarity(project_keywords, resume_keywords)
        similarity_scores.append((resume, similarity))

    # sort resumes by similarity, highest first
    sorted_resumes = sorted(similarity_scores, key=lambda x: x[1], reverse=True)

    return sorted_resumes

# one-line job description searched by the corporate user
project_overview = "기업 프로젝트 관리 및 개발"
print("사용자 검색어: " + project_overview)

# resume texts of senior users
resumes = [
    "기업 프로젝트 관리 및 개발 경험이 풍부한 개발자",
    "빠른 학습 능력과 프로젝트 관리 스킬을 갖춘 경영학 전공자",
    "웹 개발과 프로젝트 일정 관리에 능숙한 소프트웨어 엔지니어"
]

# recommend resumes based on the one-line job description
recommended_resumes = recommend_resumes(project_overview, resumes)

# print the results
for resume, similarity in recommended_resumes:
    print(f"유사도: {similarity:.2f}, 이력서: {resume}")

 

Execution results

In the output, 사용자 검색어 means "user query", 유사도 means "similarity", and 이력서 means "resume".

사용자 검색어: 기업 프로젝트 관리 및 개발
유사도: 0.63, 이력서: 기업 프로젝트 관리 및 개발 경험이 풍부한 개발자
유사도: 0.41, 이력서: 웹 개발과 프로젝트 일정 관리에 능숙한 소프트웨어 엔지니어
유사도: 0.24, 이력서: 빠른 학습 능력과 프로젝트 관리 스킬을 갖춘 경영학 전공자
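As a sanity check, the top score of 0.63 can be reproduced by hand. The worked calculation below is a sketch under two assumptions that the post does not state explicitly: that Hannanum extracts the nouns 기업, 프로젝트, 관리, 개발 from the query and those four plus 경험, 풍부, 개발자 from the first resume, and that TfidfVectorizer runs with its default smoothed IDF and L2 normalization on just these n = 2 documents. Here q is the query vector, r is the resume vector, and every term occurs once.

```latex
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1, \qquad n = 2

\mathrm{idf}_{\text{shared}} = \ln\frac{3}{3} + 1 = 1, \qquad
\mathrm{idf}_{\text{unique}} = \ln\frac{3}{2} + 1 \approx 1.405

\lVert q \rVert = \sqrt{4 \cdot 1^2} = 2, \qquad
\lVert r \rVert = \sqrt{4 \cdot 1^2 + 3 \cdot 1.405^2} \approx 3.151

\cos(q, r) = \frac{q \cdot r}{\lVert q \rVert \, \lVert r \rVert}
           = \frac{4 \cdot 1 \cdot 1}{2 \times 3.151} \approx 0.63
```

The four shared nouns contribute the whole dot product, while the three nouns that appear only in the resume lengthen its vector and pull the score below 1.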

 

Improvements

사용자 검색어: 전략 멀티플레이어 게임을 개발하는 서버 개발자

유사도: 0.41, 이력서: 전략 멀티플레이어 게임 서버를 설계하고 구현한 경험이 있는 엔지니어
유사도: 0.38, 이력서: 전략 게임 프로젝트에서의 풍부한 서버 측 개발 경험을 가진 소프트웨어 엔지니어

Assuming the corporate user's query is ‘전략 멀티플레이어 게임을 개발하는 서버 개발자’ ("a server developer who builds strategy multiplayer games"), I tested several example resumes and found that the algorithm assigns low similarity to texts that a human would consider similar.

For example, ‘전략 멀티플레이어 게임을 개발하는 서버 개발자’ and ‘전략 멀티플레이어 게임 서버를 설계하고 구현한 경험이 있는 엔지니어’ ("an engineer with experience designing and implementing strategy multiplayer game servers") look similar to a human. As the results show, however, the similarity is only about 0.41.

The extracted keywords reveal the problems:

['전략', '멀티플레', '게임', '개발', '서버', '개발자']
['전략', '멀티플레', '게임', '서버', '설계', '구현', '경험', '엔지니어']
  1. General words not directly related to the job, such as ‘구현’ (implementation) and ‘경험’ (experience), are included.
  2. Synonyms such as ‘개발자’ (developer) and ‘엔지니어’ (engineer) are not handled.

Let's tackle problem 1 first. General words not directly related to the job were collected into a stop-word list and excluded from keyword vectorization.

# stop words: general words not directly related to the job
korean_stopwords = ['경험', '능력', '경력', '기술', '능숙', '풍부', '향상', '다양', '다양한', '완료', '관련', '특화', '보유', '담당', '성공', '성공적', '프로젝트', '분야', '활용', '스킬']

# word vectorization
vectorizer = TfidfVectorizer(stop_words=korean_stopwords)
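Why removing generic words raises the score can be seen with a small pure-Python sketch. It is an illustration, not the post's code: `pair_similarity` and `remove_stopwords` are hypothetical helpers, the stop-word set is a subset of the list above, and the closed-form weights assume scikit-learn's default smoothed IDF fitted on just the two documents, with each keyword occurring at most once.

```python
import math

# Weight of a term that appears in only one of the two documents, under the
# smoothed IDF ln((1 + 2) / (1 + 1)) + 1. Terms in both documents get ln(3/3) + 1 = 1.
IDF_UNIQUE = math.log(3 / 2) + 1

def pair_similarity(a, b):
    """TF-IDF cosine similarity of two keyword lists when the vectorizer is fitted on just this pair."""
    a, b = set(a), set(b)
    shared = len(a & b)  # each shared term contributes 1 * 1 to the dot product
    norm_a = math.sqrt(shared + IDF_UNIQUE ** 2 * len(a - b))
    norm_b = math.sqrt(shared + IDF_UNIQUE ** 2 * len(b - a))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return shared / (norm_a * norm_b)

def remove_stopwords(keywords, stopwords):
    """Drop generic words before the keywords are vectorized."""
    return [kw for kw in keywords if kw not in stopwords]

stopwords = {"경험", "능력", "풍부", "능숙", "프로젝트", "스킬"}  # subset, for brevity

query = ["전략", "멀티플레", "게임", "개발", "서버", "개발자"]
resume = ["전략", "멀티플레", "게임", "서버", "설계", "구현", "경험", "엔지니어"]

before = pair_similarity(query, resume)
after = pair_similarity(remove_stopwords(query, stopwords),
                        remove_stopwords(resume, stopwords))
print(f"{before:.2f} -> {after:.2f}")  # 0.41 -> 0.45
```

Dropping '경험' removes one non-matching dimension from the resume vector, so the four shared keywords make up a larger part of it and the cosine rises.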

 

Results

The similarity scores have gone up.

사용자 검색어: 전략 멀티플레이어 게임을 개발하는 서버 개발자
유사도: 0.50, 이력서: 전략 게임 프로젝트에서의 풍부한 서버 측 개발 경험을 가진 소프트웨어 엔지니어
유사도: 0.45, 이력서: 전략 멀티플레이어 게임 서버를 설계하고 구현한 경험이 있는 엔지니어
유사도: 0.38, 이력서: 다양한 게임 프로젝트에서 서버 개발과 프로젝트 관리를 담당한 엔지니어
유사도: 0.32, 이력서: 게임 서버 관련 프로젝트에서 팀을 리드하고 프로젝트를 성공적으로 완료한 경험이 있음
유사도: 0.23, 이력서: 전략적인 마인드로 프로젝트를 주도한 경험이 풍부한 서버 개발자
유사도: 0.20, 이력서: 서버 아키텍처에 대한 깊은 이해를 기반으로 한 전략적 개발 경험이 풍부
유사도: 0.20, 이력서: 전략적인 게임의 다양한 기능을 개발하고 유지보수한 경력을 보유한 엔지니어
유사도: 0.18, 이력서: 서버 성능 최적화와 안정성 강화를 통해 게임 플레이 경험을 향상시킨 경험 있음
유사도: 0.18, 이력서: 멀티플레이어 시스템에 특화된 서버 아키텍처를 설계하고 구현한 경험이 있는 엔지니어
유사도: 0.14, 이력서: 뛰어난 팀워크와 소통 능력으로 게임 프로젝트를 성공적으로 이끈 경험 있음

 

Advanced test

I diversified the examples further and also added texts unrelated to the user's query.

The results track the query well.

사용자 검색어: 전략 멀티플레이어 게임을 개발하는 서버 개발자
유사도: 0.54, 이력서: 전략적인 멀티플레이어 게임을 개발하는 서버 개발 전문가
유사도: 0.45, 이력서: 서버 개발 분야에서 다양한 게임 프로젝트를 성공적으로 이끈 엔지니어
유사도: 0.30, 이력서: 게임 서버 보안 강화를 위한 다양한 프로토콜 개발에 참여한 엔지니어
유사도: 0.18, 이력서: 모바일 게임 서버 아키텍처를 구축하고 최적화한 경험이 있는 엔지니어
유사도: 0.14, 이력서: 글로벌 비즈니스 전략 수립과 실행에 경험이 있는 비즈니스 전략 기획자
유사도: 0.12, 이력서: 기업 프로젝트 관리 및 개발 경험이 풍부한 프로젝트 매니저
유사도: 0.10, 이력서: 웹 개발과 프로젝트 일정 관리에 능숙한 소프트웨어 엔지니어
유사도: 0.08, 이력서: 최신 웹 기술을 활용한 프론트엔드 개발에 특화된 프론트엔드 엔지니어
유사도: 0.00, 이력서: 빠른 학습 능력과 프로젝트 관리 스킬을 갖춘 경영학 전공자
유사도: 0.00, 이력서: AI 및 머신러닝 기술을 활용한 데이터 과학자
유사도: 0.00, 이력서: 사용자 경험(UX) 디자인에 능숙한 UI/UX 디자이너
유사도: 0.00, 이력서: IT 보안 및 취약점 분석에 전문화된 보안 엔지니어
유사도: 0.00, 이력서: 대규모 분산 시스템의 아키텍처 설계와 관리에 능숙한 시스템 아키텍트

There is one problem, however.

유사도: 0.14, 이력서: 글로벌 비즈니스 전략 수립과 실행에 경험이 있는 비즈니스 전략 기획자
유사도: 0.12, 이력서: 기업 프로젝트 관리 및 개발 경험이 풍부한 프로젝트 매니저
유사도: 0.10, 이력서: 웹 개발과 프로젝트 일정 관리에 능숙한 소프트웨어 엔지니어
유사도: 0.08, 이력서: 최신 웹 기술을 활용한 프론트엔드 개발에 특화된 프론트엔드 엔지니어

In the results above, the business strategy planner (기획자, 0.14) and the project manager (프로젝트 매니저, 0.12) score higher than two of the engineers (엔지니어, 0.10 and 0.08). Even when the job is different, the algorithm rates the similarity as high if the keywords overlap.

Therefore, I plan to have users first choose one of a fixed set of job categories and then search within it.
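One way this pre-filter could look is sketched below. It is only an illustration of the idea: the post does not define a resume schema, so the `job_category` field, its values, and the `filter_by_category` helper are all hypothetical.

```python
# Hypothetical resume records; the "job_category" field is an assumption for this sketch.
resume_records = [
    {"text": "모바일 게임 서버 아키텍처를 구축하고 최적화한 경험이 있는 엔지니어",
     "job_category": "server"},
    {"text": "글로벌 비즈니스 전략 수립과 실행에 경험이 있는 비즈니스 전략 기획자",
     "job_category": "planning"},
    {"text": "기업 프로젝트 관리 및 개발 경험이 풍부한 프로젝트 매니저",
     "job_category": "project_management"},
    {"text": "게임 서버 보안 강화를 위한 다양한 프로토콜 개발에 참여한 엔지니어",
     "job_category": "server"},
]

def filter_by_category(records, selected_category):
    """Keep only the resumes whose job category matches the one the user selected."""
    return [r for r in records if r["job_category"] == selected_category]

# The user first picks a category, then the keyword search runs on the remaining resumes only.
candidates = filter_by_category(resume_records, "server")
candidate_texts = [r["text"] for r in candidates]
print(len(candidate_texts))  # 2
```

Only `candidate_texts` would then be ranked by similarity, so a planner or project manager can no longer outrank engineers through incidental keyword overlap.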

 

Full code after improvements

!pip install scikit-learn
!pip install konlpy

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from konlpy.tag import Hannanum

hannanum = Hannanum()

def extract_keywords(text):
    # extract keywords (nouns) from the text
    keywords = hannanum.nouns(text)
    return " ".join(keywords)  # join the word list back into a single string

def calculate_similarity(project_keywords, resume_keywords):
    # stop words: general words not directly related to the job
    general_stopwords = ['경험', '능력', '경력', '기술', '업무', '작업', '능숙', '풍부',
                         '향상', '다양', '다양한', '완료', '관련', '특화', '역량', '보유',
                         '담당', '성공', '성공적', '프로젝트', '분야', '활용', '스킬',
                         '목표', '도전', '기록', '노력', '수행', '참여', '참가', '달성',
                         '적용', '배움', '기여', '협력', '활동', '향상', '성장', '발전']

    # word vectorization
    vectorizer = TfidfVectorizer(stop_words=general_stopwords)
    vectors = vectorizer.fit_transform([project_keywords, resume_keywords])

    # cosine similarity
    similarity_matrix = cosine_similarity(vectors)
    similarity = similarity_matrix[0, 1]

    return similarity

def recommend_resumes(project_overview, resumes):  # resume recommendation
    # extract keywords from the one-line job description
    project_keywords = extract_keywords(project_overview)

    # compute the similarity for each resume
    similarity_scores = []
    for resume in resumes:
        # extract keywords from the resume
        resume_keywords = extract_keywords(resume)
        # similarity between the one-line job description and the resume
        similarity = calculate_similarity(project_keywords, resume_keywords)
        similarity_scores.append((resume, similarity))

    # sort resumes by similarity, highest first
    sorted_resumes = sorted(similarity_scores, key=lambda x: x[1], reverse=True)

    return sorted_resumes

# one-line job description searched by the corporate user
project_overview = "전략 멀티플레이어 게임을 개발하는 서버 개발자"
print("사용자 검색어: " + project_overview)

# resume texts of senior users
resumes = [
    "전략적인 멀티플레이어 게임을 개발하는 서버 개발 전문가",
    "서버 개발 분야에서 다양한 게임 프로젝트를 성공적으로 이끈 엔지니어",
    "모바일 게임 서버 아키텍처를 구축하고 최적화한 경험이 있는 엔지니어",
    "게임 서버 보안 강화를 위한 다양한 프로토콜 개발에 참여한 엔지니어",
    "기업 프로젝트 관리 및 개발 경험이 풍부한 프로젝트 매니저",
    "빠른 학습 능력과 프로젝트 관리 스킬을 갖춘 경영학 전공자",
    "웹 개발과 프로젝트 일정 관리에 능숙한 소프트웨어 엔지니어",
    "AI 및 머신러닝 기술을 활용한 데이터 과학자",
    "사용자 경험(UX) 디자인에 능숙한 UI/UX 디자이너",
    "IT 보안 및 취약점 분석에 전문화된 보안 엔지니어",
    "글로벌 비즈니스 전략 수립과 실행에 경험이 있는 비즈니스 전략 기획자",
    "대규모 분산 시스템의 아키텍처 설계와 관리에 능숙한 시스템 아키텍트",
    "최신 웹 기술을 활용한 프론트엔드 개발에 특화된 프론트엔드 엔지니어"
]

# recommend resumes based on the one-line job description
recommended_resumes = recommend_resumes(project_overview, resumes)

# print the results
for resume, similarity in recommended_resumes:
    print(f"유사도: {similarity:.2f}, 이력서: {resume}")

 

References

추천시스템, 그것이 알고싶다 | Product Analytics Playground (playinpap.github.io)

내적과 코사인유사도 (dot product & cosine similarity) (tistory.com)

KoNLPy: 파이썬 한국어 NLP - KoNLPy 0.6.0 documentation

KoNLPy 간단 사용법 : 네이버 블로그 (naver.com)