Reading a file on Linux using the POSIX API

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/13322299/

Date: 2020-08-06 17:45:09  Source: igfitidea

File read using POSIX APIs

Tags: c, linux, string, file-io, posix

Asked by Vivek Maran

Consider the following piece of code for reading the contents of a file into a buffer:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#define BLOCK_SIZE 4096

int main()
{
   int fd=-1;
   ssize_t bytes_read=-1;
   int i=0;
   char buff[50];
   //Arbitrary size for the buffer?? How to optimise.
   //Dynamic allocation is a choice but what is the
   //right way to relate the file size to buffer size.

   fd=open("./file-to-buff.txt",O_RDONLY);
   if(-1 == fd)
   {
      perror("Open Failed");
      return 1;
   }

   while((bytes_read=read(fd,buff,BLOCK_SIZE))>0)
   {
      printf("bytes_read=%d\n",bytes_read);
   }

   //Test the characters read from the file into the buffer. The file contains "Hello"
   while(buff[i]!='\0')
   {
      printf("buff[%d]=%d\n",i,buff[i]);
      i++;
      //buff[5]=\n-How?
   }
   //buff[6]=\0-How?
   close(fd);
   return 0;
}

Code Description:

  • The input file contains a string "Hello"
  • This content needs to be copied into the buffer.
  • The objective is achieved using the open and read POSIX APIs.
  • The read API takes a pointer to a buffer of an *arbitrary size* to copy the data into.

Questions:

  • Dynamic allocation is the method that must be used to optimize the size of the buffer. What is the right procedure to relate/derive the buffer size from the input file size?
  • I see that at the end of the read operation, read has copied a newline character and a NULL character in addition to the characters "Hello". Please elaborate more on this behavior of read.

Sample Output

bytes_read=6

buff[0]=H

buff[1]=e

buff[2]=l

buff[3]=l

buff[4]=o

buff[5]=

PS: The input file was created by a user, not by a program (using the write API). I mention this here in case it makes any difference.

Accepted answer by Nikos C.

Since you want to read the whole file, the best way is to make the buffer as big as the file size. There's no point in resizing the buffer as you go. That just hurts performance without good reason.

You can get the file size in several ways. The quick-and-dirty way is to lseek() to the end of the file:

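// Get size.
off_t size = lseek(fd, 0, SEEK_END); // You should check for an error return in real code
// Seek back to the beginning.
lseek(fd, 0, SEEK_SET);
// Allocate enough to hold the whole contents plus a '\0' char.
char *buff = malloc(size + 1);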

The other way is to get the information using fstat():

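struct stat fileStat;
fstat(fd, &fileStat); // Don't forget to check for an error return in real code
// Allocate enough to hold the whole contents plus a '\0' char.
char *buff = malloc(fileStat.st_size + 1);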

To get all the needed types and function prototypes, make sure you include the needed headers:

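#include <stdlib.h>   // For malloc() and free()
#include <sys/stat.h> // For fstat()
#include <unistd.h>   // For lseek()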

Note that read() does not automatically terminate the data with \0. You need to do that manually, which is why we allocate an extra character (size+1) for the buffer. The reason why there's already a \0 character there in your case is pure random chance.
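
Putting the pieces above together, here is a minimal sketch (not the answer's own code) of reading a whole file into a buffer sized with fstat() and terminating it by hand. The helper name read_whole_file and its out_len parameter are hypothetical, chosen only for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads the whole file at `path` into a freshly
 * allocated, '\0'-terminated buffer. Returns NULL on failure; the caller
 * must free() the result. If out_len is not NULL it receives the number
 * of bytes read (not counting the terminator). */
char *read_whole_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size < 0) {
        close(fd);
        return NULL;
    }

    size_t size = (size_t)st.st_size;
    char *buff = malloc(size + 1);      /* +1 for the terminator */
    if (buff == NULL) {
        close(fd);
        return NULL;
    }

    size_t total = 0;
    while (total < size) {
        /* read() may return fewer bytes than requested, so keep going. */
        ssize_t n = read(fd, buff + total, size - total);
        if (n == -1) {
            free(buff);
            close(fd);
            return NULL;
        }
        if (n == 0)                     /* file was shorter than st_size */
            break;
        total += (size_t)n;
    }
    close(fd);

    buff[total] = '\0';                 /* read() never adds this itself */
    if (out_len != NULL)
        *out_len = total;
    return buff;
}
```

For the question's file, a call such as read_whole_file("./file-to-buff.txt", &len) would return the file's bytes, including any trailing newline the editor saved, followed by the terminator written on the last line of the helper.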

Of course, since buff is now a dynamically allocated array, don't forget to free it again when you don't need it anymore:

free(buff);

Be aware though, that allocating a buffer that's as large as the file you want to read into it can be dangerous. Imagine if (by mistake or on purpose, doesn't matter) the file is several GB big. For cases like this, it's good to have a maximum allowable size in place. If you don't want any such limitations, however, then you should switch to another method of reading from files: mmap(). With mmap(), you can map parts of a file to memory. That way, it doesn't matter how big the file is, since you can work only on parts of it at a time, keeping memory usage under control.
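
As a rough illustration of the mmap() approach, the sketch below maps a file read-only and scans it without copying it into a malloc'd buffer. The helper name count_newlines_mmap is hypothetical, and for brevity it maps the whole file in one call; to work on one part at a time you would pass a smaller length and a page-aligned offset to mmap() instead. Note that mapped data is not '\0'-terminated either, so the length must be tracked separately.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: counts '\n' bytes in the file at `path` by mapping
 * it read-only. Returns -1 on error. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {              /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping stays valid */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n')
            count++;

    munmap(data, len);
    return count;
}
```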

Answer by Will

You could consider allocating the buffer dynamically by first creating a buffer of a fixed size using malloc and doubling the size (with realloc) when you fill it up. This gives a good trade-off between time complexity and space.
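
A minimal sketch of that doubling strategy might look like the following. The helper name read_all_growing and the starting capacity of 64 bytes are assumptions made for illustration; the sketch also keeps one spare byte so the result can be terminated with '\0'.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads everything from an open descriptor into a
 * buffer that doubles in size whenever it fills up. Returns a
 * '\0'-terminated buffer the caller must free(), or NULL on error. */
char *read_all_growing(int fd, size_t *out_len)
{
    size_t cap = 64;                    /* arbitrary starting capacity */
    size_t len = 0;
    char *buff = malloc(cap);
    if (buff == NULL)
        return NULL;

    for (;;) {
        if (len + 1 >= cap) {           /* keep one byte for the terminator */
            size_t new_cap = cap * 2;
            char *tmp = realloc(buff, new_cap);
            if (tmp == NULL) {
                free(buff);
                return NULL;
            }
            buff = tmp;
            cap = new_cap;
        }
        /* Append after the data already stored instead of overwriting it. */
        ssize_t n = read(fd, buff + len, cap - len - 1);
        if (n == -1) {
            free(buff);
            return NULL;
        }
        if (n == 0)                     /* end of file */
            break;
        len += (size_t)n;
    }

    buff[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buff;
}
```

Unlike sizing the buffer from the file size, this also works for descriptors whose size is not known up front, such as pipes.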

At the moment you repeatedly read into the same buffer. You should advance the position in the buffer after each read; otherwise you will overwrite the buffer contents with the next section of the file.

The code you supply allocates 50 bytes for the buffer, yet you pass 4096 as the size to read. This could result in a buffer overflow for any file larger than 50 bytes.
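
One possible way to keep the fixed-size buffer from the question and still avoid both problems is sketched below: the size passed to read() is derived from the space actually left in the buffer, and each read appends after the previous one. The helper name read_bounded is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fills buff (capacity cap) from fd without ever
 * writing past its end, and terminates it with '\0'. Returns the number
 * of bytes stored, or -1 on error. Anything beyond cap - 1 bytes is
 * simply left unread. */
ssize_t read_bounded(int fd, char *buff, size_t cap)
{
    if (cap == 0)
        return -1;                      /* no room even for the terminator */

    size_t used = 0;
    while (used < cap - 1) {
        /* Ask only for the space that is left, after what is stored. */
        ssize_t n = read(fd, buff + used, cap - 1 - used);
        if (n == -1)
            return -1;
        if (n == 0)                     /* end of file */
            break;
        used += (size_t)n;
    }
    buff[used] = '\0';
    return (ssize_t)used;
}
```

With the declaration from the question, the call would be read_bounded(fd, buff, sizeof buff), so the 50 in char buff[50] is stated in one place only.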

As for the '\n' and '\0': the newline is probably in the file, and the '\0' just happened to be in the buffer already. The buffer is allocated on the stack in your code, and if that section of the stack had not been used yet, it would probably contain zeros, placed there by the operating system when your program was loaded.

The operating system makes no attempt to terminate the data read from the file; it might be binary data or in a character set that it doesn't understand. Terminating the string, if needed, is up to you.

A few other points that are more a matter of style:

  • You could consider using a for (i = 0; buff[i]; ++i) loop instead of a while loop for printing out at the end. This way, if anyone messes with the index variable i, you will be unaffected.
  • You could close the file earlier, after you finish reading from it, to avoid having the file open for an extended period of time (and maybe forgetting to close it if some kind of error happens).

Answer by BAK

For your second question: read does not automatically add a '\0' character. If you treat your file as a text file, you must add a '\0' after calling read to indicate the end of the string.

In C, the end of a string is represented by this character. If read sets 4 characters, printf will read these 4 characters and then test the 5th: if it is not '\0', it will continue printing until the next '\0'. This is also a source of buffer overflows.
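
If you would rather not rely on a terminator at all, one alternative (an addition here, not part of this answer) is to give printf an explicit length through the precision of %s, using the count returned by read. The helper name print_bytes is hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: prints at most len bytes from buff followed by a
 * newline. The "%.*s" precision stops printf after len bytes, so buff
 * does not need to be '\0'-terminated. Assumes len fits in an int. */
int print_bytes(FILE *out, const char *buff, size_t len)
{
    return fprintf(out, "%.*s\n", (int)len, buff);
}
```

A call such as print_bytes(stdout, buff, (size_t)bytes_read) prints only what read stored. Output still stops early at an embedded '\0', so for binary data fwrite(buff, 1, len, out) is the safer choice.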

For the '\n', it is probably in the input file.

Answer by jason.foo

1. You can get the file size with stat(filename, &stat), but simply making the buffer one page in size is just fine.

2. First, there is no NULL character after "Hello"; it must be by accident that the stack area you allocated was 0 before your code executed. Please refer to APUE chapter 7.6. In fact, you must initialize a local variable before using it.

I tried generating the text file with vim, emacs, and echo -n Hello > file-to-buff.txt; only vim adds a line break automatically.
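
The first point of this answer, reading through a page-sized buffer rather than one sized to the file, could be sketched roughly as follows. The helper name count_bytes_paged is hypothetical, and the 4096-byte fallback is an assumption for the case where sysconf() does not report a page size.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads the file at `path` one page at a time and
 * returns the total number of bytes seen, or -1 on error. */
long long count_bytes_paged(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096;                    /* fallback; an assumption */

    char *buff = malloc((size_t)page);
    if (buff == NULL)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        free(buff);
        return -1;
    }

    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buff, (size_t)page)) > 0) {
        /* Process buff[0..n-1] here before the next read overwrites it. */
        total += n;
    }

    close(fd);
    free(buff);
    return n == -1 ? -1 : total;
}
```

The total returned for a regular file should match the st_size field filled in by stat() for the same path, which is a quick way to check that nothing was skipped.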
