What characters are valid in a URL?

Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/7109143/

Date: 2020-08-29 10:09:03  Source: igfitidea

Tags: html, url

Asked by blez

Possible Duplicate:
Which characters make a URL invalid?

I'm trying to remove the non-URL part of a big string. Most of the regexes I found are like [A-Za-z0-9-_.!~*'()], but a URL can contain more than that, for example http://127.0.0.1:8080/test?v=123#this

So what is the current set of characters that are valid in a URL?

Answered by ckittel

All the gory details can be found in the current RFC on the topic: RFC 3986 (Uniform Resource Identifier (URI): Generic Syntax)

Based on this related answer, the list looks like: A-Z, a-z, 0-9, -, ., _, ~, :, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, %, and =. Everything else must be URL-encoded. Also, some of these characters may appear only in specific parts of a URI and must be URL-encoded anywhere else (e.g. % can only be used as part of an encoding such as %20). The RFC has all of these specifics.
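
As a rough illustration of the list above, here is a minimal Python sketch that turns it into a regex character class and pulls URL-like runs out of a larger string. The names URL_CHARS, URL_RUN, find_url_candidates and has_only_url_chars are made up for this example, and the https?:// prefix is an assumption about what the asker wants to match. It only checks which characters appear, not where they appear, so it is not a full RFC 3986 validator.

```python
import re

# Character class built from the list in the answer: letters, digits,
# - . _ ~ : / ? # [ ] @ ! $ & ' ( ) * + , ; % and =
URL_CHARS = r"A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;%="

# Assumption for this sketch: candidates start with http:// or https://
URL_RUN = re.compile(r"https?://[" + URL_CHARS + r"]+")


def find_url_candidates(text):
    """Return every http(s) run in text made only of the listed characters."""
    return URL_RUN.findall(text)


def has_only_url_chars(candidate):
    """True if every character is in the list and every % starts a %XX escape."""
    if re.fullmatch("[" + URL_CHARS + "]*", candidate) is None:
        return False
    # A "%" that is not followed by two hex digits is not a valid escape.
    return re.search(r"%(?![0-9A-Fa-f]{2})", candidate) is None


if __name__ == "__main__":
    sample = "Like http://127.0.0.1:8080/test?v=123#this for example"
    print(find_url_candidates(sample))
    # → ['http://127.0.0.1:8080/test?v=123#this']
```

Note that sentence punctuation such as a trailing "." or ")" is also in the allowed set, so a URL at the end of a sentence will pick up the extra character. Stripping that usually needs a separate heuristic.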
