C#: Get just the domain name from a URL?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/2154167/
Get just the domain name from a URL?
Asked by mark smith
I am trying to extract just the domain name from a URL string. I almost have it; I am using the Uri class.
I have a string. My first thought was to use Regex, but then I decided to use the Uri class.
I need to convert the above to google.com and google, without the www.
I did the following:
Uri test = new Uri(referrer);
log.Info("Domain part : " + test.Host);
Basically this returns www.google.com. I would like to return two forms if possible, as mentioned:
google.com and google
Is this possible with Uri?
Answered by Dewfy
Yes, it is possible. Use:
Uri.GetLeftPart( UriPartial.Authority )
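For comparison, here is a minimal sketch of what GetLeftPart(UriPartial.Authority) gives you next to Host. The class and method names (LeftPartDemo, GetAuthority, GetHost) are made up for illustration. Note that the authority form still carries the scheme, so on its own it does not yield "google.com" or "google".

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class LeftPartDemo
{
    // Scheme plus authority, e.g. "http://www.google.com" for
    // "http://www.google.com/search?q=test".
    public static string GetAuthority(string url)
    {
        return new Uri(url).GetLeftPart(UriPartial.Authority);
    }

    // Host name only, e.g. "www.google.com" for the same input.
    public static string GetHost(string url)
    {
        return new Uri(url).Host;
    }
}
```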
Answered by naivists
google.com is not guaranteed to be the same as www.google.com (well, for this example it technically is, but it may be otherwise).
Maybe what you actually need is to remove the "top level" domain and the "www" subdomain? Then just split('.') and take the part before the last part!
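A rough sketch of that split idea, assuming you already have the host name (for example from Uri.Host). The names SplitSketch and SecondToLastLabel are hypothetical. It works for "www.google.com", but for a host such as "www.test.co.uk" it picks "co", so it is only safe when the top-level domain is a single label.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class SplitSketch
{
    // Returns the label just before the last one,
    // e.g. "google" for "www.google.com".
    public static string SecondToLastLabel(string host)
    {
        string[] parts = host.Split('.');
        // Hosts with a single label (e.g. "localhost") are returned unchanged.
        return parts.Length >= 2 ? parts[parts.Length - 2] : host;
    }
}
```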
Answered by David_001
I think you are displaying a misunderstanding of what constitutes a "domain name" - there is no such thing as a "pure domain name" in common usage - this is something you will need to define if you want consistent results.
Do you just want to strip off the "www" part?
And then have another version which strips off the top level domain (eg. strip off the ".com" or the ".co.uk" etc parts?)
Another answer mentions split(".") - you will need to use something like this if you want to exclude specific parts of the hostname manually, there's nothing within the .NET framework to meet your requirements exactly - you'll need to implement these things yourself.
Answered by Chris
Because of the numerous variations in domain names and the non-existence of any real authoritative list of what constitutes a "pure domain name" as you describe, I've just resorted to using Uri.Host in the past. To avoid cases where www.google.com and google.com show up as two different domains, I've often resorted to stripping the www. from all domains that contain it, since it's almost guaranteed (ALMOST) to point to the same site. It's really the only simple way to do it without risking losing some data.
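A minimal sketch of that approach: take Uri.Host and drop a leading "www." if there is one. The names WwwStripper and HostWithoutWww are made up for this example.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class WwwStripper
{
    // Returns the host with a leading "www." removed,
    // e.g. "google.com" for "http://www.google.com/path".
    public static string HostWithoutWww(string url)
    {
        string host = new Uri(url).Host;
        const string prefix = "www.";
        return host.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
            ? host.Substring(prefix.Length)
            : host;
    }
}
```

Only the prefix is removed, so a host such as "beta.google.com" is left untouched and no other subdomain information is lost.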
Answered by Mark Schultheiss
See Rick Strahl's recent blog post as a reference for a C# and .NET centric take:
Answered by servermanfail
@Dewfy: the flaw is that your method returns "uk" for "www.test.co.uk", but the domain here is clearly "test.co.uk".
@naivists: the flaw is that your method returns "beta.microsoft.com" for "www.beta.microsoft.com", but the domain here is clearly "microsoft.com".
I needed the same, so I wrote a class that you can copy and paste into your solution. It uses a hard-coded string array of TLDs. http://pastebin.com/raw.php?i=VY3DCNhp
Console.WriteLine(GetDomain.GetDomainFromUrl("http://www.beta.microsoft.com/path/page.htm"));
outputs microsoft.com
and
Console.WriteLine(GetDomain.GetDomainFromUrl("http://www.beta.microsoft.co.uk/path/page.htm"));
outputs microsoft.co.uk
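The pastebin class is not reproduced here, but the general idea of a hard-coded suffix list can be sketched roughly as follows. This is not the author's code: the class name, the method name and the deliberately tiny KnownSuffixes array are all assumptions made for illustration, and a real list would need many more entries.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of a suffix-list lookup, not the pastebin class.
public static class SuffixListSketch
{
    // Deliberately tiny, made-up sample; a real list is far longer.
    private static readonly string[] KnownSuffixes =
        { "co.uk", "org.uk", "com", "net", "org", "uk" };

    public static string GetRegistrableDomain(string url)
    {
        string host = new Uri(url).Host.ToLowerInvariant();

        // Try the longest suffix first so "co.uk" wins over "uk".
        string suffix = KnownSuffixes
            .OrderByDescending(s => s.Length)
            .FirstOrDefault(s => host.EndsWith("." + s, StringComparison.Ordinal));

        // Unknown suffix (or a bare host like "localhost"): return the host as is.
        if (suffix == null)
            return host;

        // Keep only the label directly in front of the suffix.
        string rest = host.Substring(0, host.Length - suffix.Length - 1);
        int lastDot = rest.LastIndexOf('.');
        string label = lastDot >= 0 ? rest.Substring(lastDot + 1) : rest;
        return label + "." + suffix;
    }
}
```

With this sample list, both URLs from the examples above reduce to microsoft.com and microsoft.co.uk respectively; any host whose suffix is not in the list comes back unchanged.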
Answered by maxp
Yes, I've posted the solution here: http://pastebin.com/raw.php?i=raxNQkCF
If you want to remove the extension, just add:
if (url.IndexOf(".") > -1) { url = url.Substring(0, url.IndexOf(".")); }
Answered by anoordende
Below is some code that will give just the SLD plus gTLD or ccTLD extension (note the exception below). I do not care about DNS.
The theory is as follows:
- Anything under 3 tokens stays as is, e.g. "localhost", "domain.com". Otherwise:
- The last token must be a gTLD or ccTLD extension.
- The penultimate token is considered part of the extension if its length is < 3 OR if it is included in a list of exceptions.
- Finally, the token before that one is considered the SLD. Anything before that is considered a subdomain or a host qualifier, e.g. www.
As for the code, short & sweet:
// Requires "using System;" and "using System.Linq;" (for Contains on the array).
private static string GetDomainName(string url)
{
    string domain = new Uri(url).DnsSafeHost.ToLower();
    var tokens = domain.Split('.');
    if (tokens.Length > 2)
    {
        // Add only second-level exceptions to the < 3 rule here
        string[] exceptions = { "info", "firm", "name", "com", "biz", "gen", "ltd", "web", "net", "pro", "org" };
        // Keep 3 tokens if the penultimate one looks like part of the extension, otherwise 2
        var validTokens = 2 + ((tokens[tokens.Length - 2].Length < 3 || exceptions.Contains(tokens[tokens.Length - 2])) ? 1 : 0);
        domain = string.Join(".", tokens, tokens.Length - validTokens, validTokens);
    }
    return domain;
}
The obvious exception is that this will not deal with 2-letter domain names. So if you're lucky enough to own ab.com you'll need to adapt the code slightly. For us mere mortals this code will cover just about every gTLD and ccTLD, minus a few very exotic ones.
Answered by Andy
Uri's Host always returns the full domain (www.google.com), including a label (www) and a top-level domain (com). But often you would want to extract the middle bit. I simply do:
Uri uri;
bool result = Uri.TryCreate(returnUri, UriKind.Absolute, out uri);
if (result == false)
    return false;
// if you are sure it's not "localhost"
string[] domainParts = uri.Host.Split('.');
string topLevel = domainParts[domainParts.Length - 1];
string hostBody = domainParts[domainParts.Length - 2];
string label = domainParts[domainParts.Length - 3];
But you do need to check domainParts.Length, as often the given URI is like "google.com", which has only two parts.
Answered by craig
string domain = new Uri(HttpContext.Current.Request.Url.AbsoluteUri).GetLeftPart(UriPartial.Authority);