Linux: How can I get unique values from an array in Bash?
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/13648410/
Question by Jetse
I've got almost the same question as here.
I have an array which contains aa ab aa ac aa ad, etc.
Now I want to select all unique elements from this array.
I thought this would be simple with sort | uniq or with sort -u, as they mentioned in that other question, but nothing changed in the array...
The code is:
echo `echo "${ids[@]}" | sort | uniq`
What am I doing wrong?
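Here is a minimal sketch of why the snippet above leaves the array unchanged. It assumes the same ids array. echo prints all elements on one line, but sort and uniq work line by line. On top of that, the result is never assigned back to ids.

```bash
#!/usr/bin/env bash
ids=(aa ab aa ac aa ad)

# echo joins all elements with spaces into ONE line; sort/uniq work
# line by line, so there is nothing for them to sort or collapse.
echo "${ids[@]}" | wc -l            # 1 line

# printf reuses its format for every argument: one element per line.
printf '%s\n' "${ids[@]}" | wc -l   # 6 lines

# Even with one element per line, the pipeline only prints its result;
# the ids array itself is unchanged until the output is assigned back.
printf '%s\n' "${ids[@]}" | sort -u
echo "${#ids[@]}"                   # still 6
```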
Accepted answer by sampson-chen
A bit hacky, but this should do it:
echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '
To save the sorted unique results back into an array, do an array assignment:
sorted_unique_ids=($(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))
If your shell supports herestrings (bash should), you can spare an echo process by altering it to:
tr ' ' '\n' <<< "${ids[@]}" | sort -u | tr '\n' ' '
Input:
ids=(aa ab aa ac aa ad)
Output:
aa ab ac ad
Explanation:
- "${ids[@]}" - Syntax for working with shell arrays, whether used as part of echo or a herestring. The @ part means "all elements in the array".
- tr ' ' '\n' - Convert all spaces to newlines. The shell sees your array as elements on a single line, separated by spaces, and sort expects input to be on separate lines.
- sort -u - Sort and retain only unique elements.
- tr '\n' ' ' - Convert the newlines we added earlier back to spaces.
- $(...) - Command substitution.
- Aside: tr ' ' '\n' <<< "${ids[@]}" is a more efficient way of doing: echo "${ids[@]}" | tr ' ' '\n'
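The pipeline can be wrapped in a small function so you can check it end to end. This is only a sketch. uniq_words is a hypothetical helper name, and it assumes no element contains whitespace, because the tr step splits on every space. The second half shows what happens otherwise.

```bash
#!/usr/bin/env bash
# Sketch of the herestring pipeline; uniq_words is a hypothetical name.
# "$*" joins the arguments with the first character of IFS (a space by default).
uniq_words() {
  tr ' ' '\n' <<< "$*" | sort -u | tr '\n' ' '
}

ids=(aa ab aa ac aa ad)
sorted_unique_ids=($(uniq_words "${ids[@]}"))
echo "${sorted_unique_ids[@]}"    # aa ab ac ad

# Limitation: an element containing a space is split into two words.
odd=("ac ad" aa)
split=($(uniq_words "${odd[@]}"))
echo "${#split[@]}"               # 3 (aa, ac, ad), not 2
```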
Answer by ghoti
If you're running Bash version 4 or above (which should be the case in any modern version of Linux), you can get unique array values in bash by creating a new associative array that uses each value of the original array as a key. Something like this:
$ a=(aa ac aa ad "ac ad")
$ declare -A b
$ for i in "${a[@]}"; do b["$i"]=1; done
$ printf '%s\n' "${!b[@]}"
ac ad
ac
aa
ad
This works because in an associative array, each key can only appear once. When the for loop arrives at the second value of aa in a[2], it overwrites b[aa], which was originally set for a[0].
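The same idea can be packaged as a reusable function. This is a sketch that assumes bash 4+ (for declare -A and mapfile). uniq_assoc is a hypothetical name. Keys come back in bash's internal order, which is unspecified, so pipe through sort if order matters. It also assumes no element is the empty string, since bash may reject an empty associative-array key.

```bash
#!/usr/bin/env bash
# Sketch (bash 4+): collect values as keys of an associative array.
# uniq_assoc is a hypothetical helper name.
uniq_assoc() {
  local -A seen=()
  local v
  for v in "$@"; do
    seen["$v"]=1            # a repeated value overwrites the same key
  done
  # Print each distinct value once; key order is unspecified.
  if (( ${#seen[@]} > 0 )); then
    printf '%s\n' "${!seen[@]}"
  fi
}

a=(aa ac aa ad "ac ad")
mapfile -t b < <(uniq_assoc "${a[@]}")
printf '<%s>\n' "${b[@]}"   # four lines, in no particular order
```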
Doing things in native bash can be faster than using pipes and external tools like sort and uniq.
If you're feeling confident, you can avoid the for loop by using printf's ability to recycle its format for multiple arguments, though this seems to require eval. (Stop reading now if you're fine with that.)
$ eval b=( $(printf ' ["%s"]=1' "${a[@]}") )
$ declare -p b
declare -A b=(["ac ad"]="1" [ac]="1" [aa]="1" [ad]="1" )
The reason this solution requires eval is that array values are determined before word splitting. That means that the output of the command substitution is considered a single word rather than a set of key=value pairs.
While this uses a subshell, it uses only bash builtins to process the array values. Be sure to evaluate your use of eval with a critical eye. If you're not 100% confident that chepner or glenn jackman or greycat would find no fault with your code, use the for loop instead.
Answer by das.cyklone
I realize this was already answered, but it showed up pretty high in search results, and it might help someone.
printf "%s\n" "${IDS[@]}" | sort -u
Example:
~> IDS=( "aa" "ab" "aa" "ac" "aa" "ad" )
~> echo "${IDS[@]}"
aa ab aa ac aa ad
~>
~> printf "%s\n" "${IDS[@]}" | sort -u
aa
ab
ac
ad
~> UNIQ_IDS=($(printf "%s\n" "${IDS[@]}" | sort -u))
~> echo "${UNIQ_IDS[@]}"
aa ab ac ad
~>
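If mapfile/readarray is unavailable (for example on bash 3.2), a while-read loop can capture the same pipeline's output line by line. This also keeps elements that contain spaces intact, unlike the UNIQ_IDS=($(...)) form above. Here is a sketch that assumes IDS is non-empty and that no element contains a newline; the extra "ac ad" element is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: read each line of the printf | sort -u output into the array.
# IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
IDS=("aa" "ab" "aa" "ac" "aa" "ad" "ac ad")

UNIQ_IDS=()
while IFS= read -r line; do
  UNIQ_IDS+=("$line")
done < <(printf '%s\n' "${IDS[@]}" | sort -u)

printf '<%s>\n' "${UNIQ_IDS[@]}"   # five elements; "ac ad" stays whole
```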
Answer by vontrapp
If your array elements have white space or any other shell special character (and can you be sure they don't?), then first of all, to capture those, express your array in double quotes (and you should just always do this), e.g. "${a[@]}". Bash will literally interpret this as "each array element in a separate argument". Within bash this simply always works, always.
Then, to get a sorted (and unique) array, we have to convert it to a format sort understands and be able to convert it back into bash array elements. This is the best I've come up with:
eval a=($(printf "%q\n" "${a[@]}" | sort -u))
Unfortunately, this fails in the special case of the empty array, turning the empty array into an array of 1 empty element (because printf had 0 arguments but still prints as though it had one empty argument - see explanation). So you have to catch that in an if or something.
Explanation: The %q format for printf "shell escapes" the printed argument, in just such a way that bash can recover it in something like eval! Because each element is printed shell-escaped on its own line, the only separator between elements is the newline, and the array assignment takes each line as an element, parsing the escaped values into literal text.
e.g.
> a=("foo bar" baz)
> printf "%q\n" "${a[@]}"
'foo bar'
baz
> printf "%q\n"
''
The eval is necessary to strip the escaping off each value going back into the array.
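Here is the approach above with the empty-array check the answer calls for, as a sketch. uniq_in_place is a hypothetical name. It takes the array's name and uses a nameref (local -n), which assumes bash 4.3 or later. The caller's array must not itself be named _arr.

```bash
#!/usr/bin/env bash
# Sketch (bash 4.3+ for local -n): de-duplicate the array whose NAME is $1,
# in place, via printf %q | sort -u | eval. uniq_in_place is hypothetical.
uniq_in_place() {
  local -n _arr=$1
  # Guard: with zero arguments printf still prints one (empty) item,
  # which would turn an empty array into a one-element array.
  (( ${#_arr[@]} == 0 )) && return 0
  # %q escapes each element so eval can rebuild it exactly; the newlines
  # between escaped elements act as separators in the compound assignment.
  eval "_arr=( $(printf '%q\n' "${_arr[@]}" | sort -u) )"
}

a=("foo bar" baz "foo bar" baz)
uniq_in_place a
declare -p a    # two elements: baz and "foo bar" (order depends on sort)
```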
Answer by estani
Without losing the original ordering:
uniques=($(tr ' ' '\n' <<<"${original[@]}" | awk '!u[$0]++' | tr '\n' ' '))
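The awk '!u[$0]++' idiom prints a line only the first time it is seen. You can mirror it in pure bash with an associative array. Here is a sketch that assumes bash 4+. uniq_ordered is a hypothetical helper name, and it assumes no element is the empty string.

```bash
#!/usr/bin/env bash
# Sketch (bash 4+): keep the first occurrence of each value, in order.
# uniq_ordered is a hypothetical helper name.
uniq_ordered() {
  local -A seen=()
  local v
  for v in "$@"; do
    if [[ -z ${seen["$v"]+set} ]]; then   # not seen yet
      seen["$v"]=1
      printf '%s\n' "$v"
    fi
  done
}

original=(aa ab aa "ac ad" aa ad)
mapfile -t uniques < <(uniq_ordered "${original[@]}")
printf '<%s>\n' "${uniques[@]}"   # <aa>, <ab>, <ac ad>, <ad>, one per line
```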
Answer by faustus
This one will also preserve order:
echo "${ARRAY[@]}" | tr '[:space:]' '\n' | awk '!a[$0]++'
and to modify the original array with the unique values:
ARRAY=($(echo "${ARRAY[@]}" | tr '[:space:]' '\n' | awk '!a[$0]++'))
Answer by Six
To create a new array consisting of unique values, ensure your array is not empty, then do one of the following:
Remove duplicate entries (with sorting):
readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | sort -u)
Remove duplicate entries (without sorting):
readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | awk '!x[$0]++')
Warning: Do not try to do something like NewArray=( $(printf '%s\n' "${OriginalArray[@]}" | sort -u) ). It will break on spaces.
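Here is a quick check of the warning, as a sketch. It assumes bash 4+ (for readarray) and uses a made-up array with one element that contains a space.

```bash
#!/usr/bin/env bash
# Sketch (bash 4+ for readarray), with a made-up array.
OriginalArray=(aa "ac ad" aa ad)

# readarray keeps each output line as one element.
readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | awk '!x[$0]++')
echo "${#NewArray[@]}"   # 3: aa, "ac ad", ad in first-seen order

# The form the warning describes: $(...) output is word-split on spaces.
Broken=( $(printf '%s\n' "${OriginalArray[@]}" | sort -u) )
echo "${#Broken[@]}"     # 4: "ac ad" became two elements
```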
Answer by corbyn42
'sort' can be used to order the output of a for-loop:
for i in "${ids[@]}"; do echo "$i"; done | sort
and eliminate duplicates with "-u":
for i in "${ids[@]}"; do echo "$i"; done | sort -u
Finally you can just overwrite your array with the unique elements:
ids=( `for i in "${ids[@]}"; do echo "$i"; done | sort -u` )
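Quoting "${ids[@]}" and "$i" in such a loop matters when an element may contain glob characters. Here is a sketch comparing both forms in a throwaway directory created with mktemp, using a made-up '*' element.

```bash
#!/usr/bin/env bash
# Sketch: unquoted vs quoted loop over an array with a glob character.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch file1 file2        # files for the glob to match

ids=('*' aa aa)

# Unquoted: the '*' element is glob-expanded into file names.
unquoted=$(for i in ${ids[@]}; do echo $i; done | sort -u)

# Quoted: each element is passed through literally.
quoted=$(for i in "${ids[@]}"; do echo "$i"; done | sort -u)

printf 'unquoted:\n%s\nquoted:\n%s\n' "$unquoted" "$quoted"
cd - >/dev/null && rm -r "$tmp"
```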
Answer by VIPIN KUMAR
cat number.txt
1 2 3 4 4 3 2 5 6
Print the line as a column:
cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}'
1
2
3
4
4
3
2
5
6
Find the duplicate records:
cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}' | awk 'x[$0]++'
4
3
2
Remove duplicate records:
cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}' | awk '!x[$0]++'
1
2
3
4
5
6
Find only unique records:
cat number.txt | awk '{for(i=1;i<=NF;i++) print $i|"sort|uniq -u"}'
1
5
6
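The same three checks can run on a bash array instead of a file. Here is a sketch with the numbers 1 2 3 4 4 3 2 5 6 held in an array. sort | uniq -d and sort | uniq -u stand in for the awk forms of "duplicates" and "unique only", while awk '!x[$0]++' de-duplicates in first-seen order.

```bash
#!/usr/bin/env bash
# Sketch: duplicate / de-duplicated / unique-only views of an array.
ids=(1 2 3 4 4 3 2 5 6)

printf '%s\n' "${ids[@]}" | awk '!x[$0]++'   # de-duplicated, first-seen order: 1 2 3 4 5 6
printf '%s\n' "${ids[@]}" | sort | uniq -d   # values that repeat: 2 3 4
printf '%s\n' "${ids[@]}" | sort | uniq -u   # values that occur exactly once: 1 5 6
```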
Answer by Suresh Aitha
Try this to get unique values for the first column in a file:
awk -F, '{a[$1];}END{for (i in a)print i;}'
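Here is a sketch of the awk -F, '{a[$1];}END{...}' one-liner on a small made-up comma-separated sample. awk's for (i in a) visits keys in an unspecified order, so the output is piped through sort.

```bash
#!/usr/bin/env bash
# Sketch: unique first-column values of made-up CSV data.
sample=$'aa,1\nab,2\naa,3\nac,4'

# a[$1] creates one key per distinct first field; END prints each key once.
first_cols=$(printf '%s\n' "$sample" \
  | awk -F, '{a[$1];}END{for (i in a)print i;}' \
  | sort)

printf '%s\n' "$first_cols"   # aa, ab, ac (one per line)
```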