👨‍💻 web.dev/fe

Dissecting URLs with Regular Expressions

by HandHand 2024. 3. 1.

 

 

⛔️ λ‹€μŒμ—μ„œ μ •μ˜ν•˜λŠ” μ •κ·œμ‹μ΄ λͺ¨λ“  URL ν˜•μ‹μ— λŒ€μ‘λ˜μ§€λŠ” μ•ŠλŠ” 것에 μœ μ˜ν•΄μ£Όμ„Έμš”.
⛔️ ipv4, ipv6 ν˜•μ‹μ˜ ip address ν˜•μ‹μ€ μ§€μ›ν•˜μ§€ μ•ŠμŠ΅λ‹ˆλ‹€.

 

A URL (Uniform Resource Locator) is the unique address of a resource on the web.

It can point to anything: an HTML page, a CSS file, an image, and so on.

I once needed to parse URL-shaped strings out of free text, so this post summarizes what I learned about the URL specifications along the way.

 

📌 RFC references

This post draws on the following RFC documents: RFC 952, RFC 1034, RFC 1123, RFC 1738, and RFC 3986.

 

 

📌 Dissecting the URL format

1️⃣ URL components

Section 3.3 of RFC 1738 defines an HTTP URL in the following form.

http://<host>:<port>/<path>?<searchpart>
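
To make the four components concrete, the sketch below splits a sample address with the built-in URL class, which is covered later in this post. The address is a made-up example, and the property names (hostname, port, pathname, search) are those of the browser URL API rather than the RFC 1738 terms.

```javascript
// Hypothetical sample address, chosen only for illustration.
const sample = new URL("http://example.com:8080/docs/index.html?q=regex");

const parts = {
  host: sample.hostname,      // <host>
  port: sample.port,          // <port>
  path: sample.pathname,      // /<path>
  searchpart: sample.search,  // ?<searchpart>
};

console.log(parts);
```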

2️⃣ URL length (hostname)

RFC 1123 requires host software to handle hostnames of up to 63 characters and recommends handling hostnames of up to 255 characters.

Unlike RFC 952, it also allows a digit as the first character of a hostname.

Host software MUST handle host names of up to 63 characters and SHOULD handle host names of up to 255 characters.

 

이번 ν¬μŠ€νŠΈμ—μ„œλŠ” 63자둜 μ œν•œν•˜κ² μŠ΅λ‹ˆλ‹€.

3️⃣ Host ν˜•μ‹

host ν˜•μ‹μ€ RFC 1034 λ₯Ό μ‚΄νŽ΄λ³΄λ©΄ μ•Œ 수 μžˆμŠ΅λ‹ˆλ‹€.

<domain> ::= <subdomain> | " "
<subdomain> ::= <label> | <subdomain> "." <label>
<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig-hyp> ::= <let-dig> | "-"
<let-dig> ::= <letter> | <digit>
<letter> ::= any one character from A-Z or a-z
<digit> ::= any one digit from 0-9

 

Let's unpack the grammar above.
A domain is either a subdomain or the blank string " ".
A subdomain is one or more labels joined by . characters.
A label starts with a single letter, and the rest of it consists of letters (upper or lower case), digits, and the - separator.
A label must not end with the - separator.

Based on this, a regular expression for a label can be defined as follows. Following RFC 1123, it also accepts a digit as the first character.

Note that it allows at most 63 characters: one leading character, up to 61 middle characters, and one trailing character. The pattern lists only a-z, so use it with the case-insensitive flag (i) if upper-case letters should match too.

label: [a-z\d]([a-z\d-]{0,61}[a-z\d])?
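
A quick way to sanity-check the label pattern is to anchor it with ^ and $ and try a few inputs. This is only a sketch: the LABEL constant and the isLabel helper are names made up for this example, and the i flag is an added assumption so that upper-case letters are accepted.

```javascript
// The label pattern from above, anchored so the whole string must match.
const LABEL = /^[a-z\d]([a-z\d-]{0,61}[a-z\d])?$/i;

// Hypothetical helper: true when the input is a single valid label.
function isLabel(text) {
  return LABEL.test(text);
}

console.log(isLabel("example"));       // true
console.log(isLabel("my-host"));       // true
console.log(isLabel("1password"));     // true, a leading digit is accepted
console.log(isLabel("-bad"));          // false, starts with "-"
console.log(isLabel("bad-"));          // false, ends with "-"
console.log(isLabel("a".repeat(63)));  // true, exactly 63 characters
console.log(isLabel("a".repeat(64)));  // false, one character too long
```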

 

Using the label pattern, the host regular expression becomes the following.

host: ([a-z\d]([a-z\d-]{0,61}[a-z\d])?)(\.[a-z\d]([a-z\d-]{0,61}[a-z\d])?)*
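
The host pattern is just the label pattern repeated with . in between, so it can be assembled from a string instead of being written out twice. A minimal sketch, assuming the names LABEL_SOURCE, HOST, and isHost (none of them come from the original post) and again adding the i flag:

```javascript
// Source text of one label; non-capturing groups keep the result tidy.
const LABEL_SOURCE = String.raw`[a-z\d](?:[a-z\d-]{0,61}[a-z\d])?`;

// host = label ( "." label )*, anchored to the whole input.
const HOST = new RegExp(`^${LABEL_SOURCE}(?:\\.${LABEL_SOURCE})*$`, "i");

// Hypothetical helper: true when the input is a dot-separated list of labels.
function isHost(text) {
  return HOST.test(text);
}

console.log(isHost("www.example.com"));  // true
console.log(isHost("localhost"));        // true
console.log(isHost("example..com"));     // false, empty label
console.log(isHost(".example.com"));     // false, leading dot
console.log(isHost("example.com."));     // false, trailing dot
```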

4️⃣ Port format

The port number is optional; when it is omitted, the default HTTP port 80 is used (443 for HTTPS).

port: (:\d*)?
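
Note that (:\d*)? accepts any run of digits, including none at all, so it does not enforce the valid port range of 0 to 65535. If that matters, one option is a small check after matching. The parsePort helper below is hypothetical and only illustrates the idea.

```javascript
// Hypothetical helper: takes the text captured by (:\d*)? and returns
// a port number, or null when the port is missing or out of range.
function parsePort(captured) {
  if (!captured || captured === ":") {
    return null; // port omitted, caller falls back to the scheme default
  }
  const port = Number(captured.slice(1)); // drop the leading ":"
  return Number.isInteger(port) && port <= 65535 ? port : null;
}

console.log(parsePort(":8080"));    // 8080
console.log(parsePort(":"));        // null
console.log(parsePort(undefined));  // null
console.log(parsePort(":70000"));   // null
```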

5️⃣ Format of path and searchpart

The path is specified in section 3.3 of RFC 3986; the query and fragment are covered in sections 3.4 and 3.5.

🧩 The path format

path    = path-abempty    ; begins with "/" or is empty 
        / path-absolute   ; begins with "/" but not "//"
        / path-noscheme   ; begins with a non-colon segment
        / path-rootless   ; begins with a segment
        / path-empty      ; zero characters
path-abempty  = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty    = 0<pchar>
segment       = *pchar
segment-nz    = 1*pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )

 

From the grammar above, a path is a sequence of segments separated by /.
The details differ slightly depending on whether the path is absolute or relative and whether a scheme is present.

Now let's look at the grammar for pchar, the building block of a segment.

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

// Components of pchar. See Appendix A of RFC 3986.

unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    =  "$" / "&" / "+" / "," / ";" / "=" 
                 / "!" / "'" / "(" / ")" / "*"

 

Within sub-delims, the five characters !, (, ), *, and ' have no formally specified use, but a strict implementation still has to accept them as valid characters.
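
The grammar translates fairly directly into a regular expression. The sketch below spells out every pchar alternative, including all of the sub-delims, and builds segment and path-abempty on top of it. The constant names are assumptions made for this example.

```javascript
// pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
const PCHAR = String.raw`(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})`;

// segment = *pchar
const SEGMENT = new RegExp(`^${PCHAR}*$`);
// path-abempty = *( "/" segment )
const PATH_ABEMPTY = new RegExp(`^(?:/${PCHAR}*)*$`);

console.log(SEGMENT.test("hello-world"));       // true
console.log(SEGMENT.test("a%20b"));             // true, valid percent-encoding
console.log(SEGMENT.test("a%2"));               // false, incomplete escape
console.log(SEGMENT.test("a b"));               // false, raw space
console.log(PATH_ABEMPTY.test("/docs/a(1)/"));  // true
console.log(PATH_ABEMPTY.test("docs/a"));       // false, no leading "/"
```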

 

🧩 The search part (query & fragment) format

In this post, the search part means the query parameters plus the anchor.
The query parameters start with ? and are conventionally a list of key=value pairs joined by &.

// search parameter example

?key1=value&key2=value
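
In JavaScript, a query string in this shape can be read with the built-in URLSearchParams class rather than split by hand. A small sketch using the example string above:

```javascript
// URLSearchParams ignores a leading "?" when parsing.
const params = new URLSearchParams("?key1=value&key2=value");

console.log(params.get("key1"));  // "value"
console.log([...params.keys()]);  // ["key1", "key2"]
```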

 

The anchor is a kind of bookmark that points to a specific part of the resource.
It is commonly known as the fragment identifier.

According to sections 3.4 and 3.5 of RFC 3986, these two URI components have the following format.
They share the same pchar definition as the path!

query = *(pchar / "/" / "?")
fragment = *(pchar / "/" / "?")

 

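🧩 Writing the path and query regular expressions

Based on the above, here is a regular expression for the path and query. Because the character class already includes ? and #, one pattern covers the path, the query, and the fragment. Note that it leaves out the sub-delims $, !, ', (, ), and *, so it is narrower than the RFC grammar.

path: ([a-zA-Z0-9@;:%.,_+~#=/?&-]*)
query: same as path.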

 

Using the \w character class shorthand, this can be simplified as follows.

path: ([\w@;:%.,+~#=/?&-]*)
query: same as path.
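
In a JavaScript regular expression without flags, \w is shorthand for [A-Za-z0-9_], which is why the two character classes are interchangeable. The sketch below checks that claim over every ASCII character; the variable names are made up for this example.

```javascript
// The two character classes from above, anchored to test a single character.
// "/" is escaped only because these are written as regex literals.
const verbose = /^[a-zA-Z0-9@;:%.,_+~#=\/?&-]$/;
const compact = /^[\w@;:%.,+~#=\/?&-]$/;

// Collect every ASCII character on which the two classes disagree.
const mismatches = [];
for (let code = 0; code < 128; code += 1) {
  const ch = String.fromCharCode(code);
  if (verbose.test(ch) !== compact.test(ch)) {
    mismatches.push(ch);
  }
}

console.log(mismatches.length); // 0
```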

 

📌 Putting it all together

(http|https):\/\/([a-z\d]([a-z\d-]{0,61}[a-z\d])?)(\.[a-z\d]([a-z\d-]{0,61}[a-z\d])?)*(:\d*)?\b([\w@;:%.,+~#=/?&-]*)
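
As a usage sketch, the combined pattern can be applied with matchAll to pull URLs out of running text, even when they are not separated by whitespace. The g and i flags, the extractUrls helper, and the sample text are assumptions added for this example; the pattern itself is the one above.

```javascript
// The combined pattern, with "g" to find every match and "i" to accept
// upper-case letters in the scheme and host.
const URL_PATTERN = new RegExp(
  String.raw`(http|https):\/\/([a-z\d]([a-z\d-]{0,61}[a-z\d])?)(\.[a-z\d]([a-z\d-]{0,61}[a-z\d])?)*(:\d*)?\b([\w@;:%.,+~#=/?&-]*)`,
  "gi"
);

// Hypothetical helper: returns every substring that matches the pattern.
function extractUrls(text) {
  return [...text.matchAll(URL_PATTERN)].map((match) => match[0]);
}

const text = "Docs:https://example.com/docs?page=1(mirror http://foo.bar:8080/a/b)";
console.log(extractUrls(text));
// ["https://example.com/docs?page=1", "http://foo.bar:8080/a/b"]
```

Because . and , are part of the path character class, sentence punctuation that directly follows a URL is captured as well, so callers may want to trim it afterwards.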

 

📌 Using the URL class

There is a very simple way to check whether a string is a valid URL: use the URL class.

However, the URL constructor is supported in Safari from 14.1, so if you need to support earlier versions you will need a polyfill, for example via core-js.

function checkUrl(url) {
  try {
    // The URL constructor throws a TypeError when the string cannot be parsed.
    new URL(url);
  } catch (error) {
    return false;
  }

  return true;
}
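
One thing to keep in mind is that the URL constructor accepts any scheme, not only http and https, and rejects strings without a scheme. If only web addresses should pass, the parsed protocol can be checked as well. The isHttpUrl name below is made up for this sketch.

```javascript
// Hypothetical variant of checkUrl that only accepts http(s) addresses.
function isHttpUrl(text) {
  try {
    const { protocol } = new URL(text);
    return protocol === "http:" || protocol === "https:";
  } catch (error) {
    return false; // not parseable as an absolute URL
  }
}

console.log(isHttpUrl("https://example.com/docs")); // true
console.log(isHttpUrl("mailto:user@example.com"));  // false, parses but wrong scheme
console.log(isHttpUrl("example.com"));              // false, no scheme
```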

 

In my case, however, I had to pick out URLs from text where they were not delimited by a specific character (e.g. whitespace),
so I could not use this approach and went with a regular expression instead.

📌 References

URL: URL() constructor - Web APIs | MDN

encodeURIComponent() - JavaScript | MDN

What is a URL? - Learn web development | MDN

What's valid and what's not in a URI query?

Query string

Regular expression to match DNS hostname or IP Address?

Regular Expression for Checking if a String Is a Valid URL | Baeldung on Computer Science
