C# OleDb 连接到 Excel;如何选择固定宽度,无限高度?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1138197/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
OleDb connection to Excel; how do I select fixed width, unbounded height?
提问by Steve Cooper
I'm using OleDb to select data from excel spreadsheets. Each spreadsheet can contain many small tables, and possibly furniture like titles and labels. So it might look like this, where we have two tables and some titles;
我正在使用 OleDb 从 Excel 电子表格中选择数据。每个电子表格可以包含许多小表格,并且可能包含标题和标签等家具。所以它可能看起来像这样,我们有两个表格和一些标题;
A B C D 1 . . . . 2 . . . . 3 Table1 . . . 4 Header1 HEADER2 . . 5 h huey . . 6 d dewey . . 7 l loius . . 8 s scrooge . . 9 . . . . 10 . . . . 11 . . . . 12 . . . . 13 . Table 2 . . 14 . HEADER1 HEADER2 HEADER3 15 . 1 foo x 16 . 2 bar y 17 . 3 baz z 18 . . . . 19 . . . .
In a previous step, the user has selected the headers of the table they are interested in; in this case, looking at table 2 they will have selected the range B14:D14
.
在上一步中,用户选择了他们感兴趣的表的标题;在这种情况下,查看表 2,他们将选择范围B14:D14
。
These settings are saved, and then I need to query that table. It may happen over and over, as the spreadsheet data is updated; more rows may be added at any time, but the headers are always fixed. There is a sentinel (blank row) marking the end of data
保存这些设置,然后我需要查询该表。随着电子表格数据的更新,它可能会一遍又一遍地发生;可以随时添加更多行,但标题始终是固定的。有一个哨兵(空行)标记数据结束
To select the data in the table, I'm writing a query like this;
为了选择表中的数据,我正在编写这样的查询;
SELECT * FROM [Sheet1$B14:D65535]
to select the data in table 2, and then manually checking for the sentinel row, but this seems unsatisfying. Excel 2003 can only read 65,535 rows (uint16), but excel 2007 can read many more (uint32), so I have to write code which gives a different query for Excel 2003 and 2007 based on the extension of the file (.xls vs .xls?).
选择表2中的数据,然后手动检查哨兵行,但这似乎并不令人满意。Excel 2003 只能读取 65,535 行 (uint16),但 excel 2007 可以读取更多行 (uint32),因此我必须编写代码,根据文件的扩展名(.xls 与 .xls)为 Excel 2003 和 2007 提供不同的查询。 xls?)。
Does anyone know of a way to write a query that says either;
有没有人知道写一个查询的方法,要么说;
- 'select everything down and right of B14'?
- 'select everything in columns B->D'
- 'select B12:D*' where * means 'everything you can'
- “选择 B14 下方和右侧的所有内容”?
- '选择 B->D 列中的所有内容'
- '选择 B12:D*' 其中 * 表示'你能做的一切'
采纳答案by John Machin
Pre-requisite: you can easily determine in your code what the maximum number number of rows is.
先决条件:您可以在代码中轻松确定最大行数是多少。
Assuming (1) there's a big overhead per SELECT, so SELECTing a row at a time is slow (2) SELECTing 64K or 8M rows (even if blank) is slow ... so you want to see if somewhere in the middle can be faster. Try this:
假设 (1) 每个 SELECT 有很大的开销,所以一次选择一行很慢 (2) 选择 64K 或 8M 行(即使是空白)很慢......所以你想看看中间的某个地方是否可以快点。尝试这个:
Select CHUNKSIZE (e.g. 100 or 1000) rows at a time (less when you would otherwise over-run MAX_ROWS). Scan each chunk for the blank row that marks end-of-data.
一次选择 CHUNKSIZE(例如 100 或 1000)行(否则会超出 MAX_ROWS 时会减少)。扫描每个块以查找标记数据结束的空白行。
UPDATE: Actually answering the explicit questions:
更新:实际回答明确的问题:
Q: Does anyone know of a way to write a query that says either;
问:有没有人知道一种编写查询的方法,其中包括:
Q1: 'select everything down and right of B14'?
Q1:“选择 B14 下方和右侧的所有内容”?
A1: select * from [Sheet1$B12:]
doesn't work. You would have to do ...B12:IV
in Excel 2003 and whatever it is in Excel 2007. However you don't need that because you know what your rightmost column is; see below.
A1:select * from [Sheet1$B12:]
不起作用。您必须...B12:IV
在 Excel 2003 以及 Excel 2007 中执行任何操作。但是您不需要这样做,因为您知道最右边的列是什么;见下文。
Q2: 'select everything in columns B->D'
Q2:“选择 B->D 列中的所有内容”
A2: select
* from [Sheet1$B:D]
A2: select
* from [Sheet1$B:D]
Q3: 'select B12:D*
' where *
means 'everything you can'
Q3:“选择 B12:D *
”,其中的*
意思是“尽你所能”
A3: select * from [Sheet1$B12:D]
A3:从 [Sheet1$B12:D] 中选择 *
Tested with Python 2.5 using the following code:
使用以下代码使用 Python 2.5 进行测试:
import win32com.client
import sys
filename, sheetname, range = sys.argv[1:4]
DSN= """
PROVIDER=Microsoft.Jet.OLEDB.4.0;
DATA SOURCE=%s;
Extended Properties='Excel 8.0;READONLY=true;IMEX=1';
""" % filename
conn = win32com.client.Dispatch("ADODB.Connection")
conn.Open(DSN)
rs = win32com.client.Dispatch("ADODB.Recordset")
sql = (
"SELECT * FROM [Excel 8.0;HDR=NO;IMEX=1;Database=%s;].[%s$%s]"
% (filename, sheetname, range)
)
rs.Open(sql, conn)
nrows = 0
while not rs.EOF:
nrows += 1
nf = rs.Fields.Count
values = [rs.Fields.Item(i).Value for i in xrange(nf)]
print nrows, values
if not any(value is not None for value in values):
print "sentinel found"
break
rs.MoveNext()
rs.Close()
conn.Close()
回答by devuxer
Couple possible solutions:
几个可能的解决方案:
- Put your tables on separate worksheets, then simply query the whole worksheet.
- Give each table in Excel a name (in Excel 2007, select the table, right-click, and choose Name a range...), then in your query, use this name instead of "Sheet1$B14:D65535".
- 将您的表格放在单独的工作表上,然后只需查询整个工作表。
- 为 Excel 中的每个表命名(在 Excel 2007 中,选择该表,单击鼠标右键,然后选择“命名范围...”),然后在您的查询中使用此名称而不是“Sheet1$B14:D65535”。
Hope that helps.
希望有帮助。
EDIT
编辑
Here's a third idea:
这是第三个想法:
I'm not sure what you're using to query your database, but if your query engine supports variables (like Sql Server, for example) you could store the result of...
我不确定您使用什么来查询数据库,但是如果您的查询引擎支持变量(例如 Sql Server),您可以存储...
SELECT COUNT(*) FROM NameOfServer...Sheet1$
SELECT COUNT(*) FROM NameOfServer...Sheet1$
...in a variable called @UsedRowCount, that will give you the number of rows actually used in the worksheet. So, @UsedRowCount = LastRowUsed - InitialBlankRows.
...在名为@UsedRowCount 的变量中,它将为您提供工作表中实际使用的行数。所以,@UsedRowCount = LastRowUsed - InitialBlankRows。
You might then be able to use string concatenation to replace "65535" with @UsedRowCount + @InitialBlankRows. You would have to set @InitialBlankRows to a constant (in your example, it would be 3, since the heading row of the first table is located at Row 4).
然后,您可以使用字符串连接将“65535”替换为 @UsedRowCount + @InitialBlankRows。您必须将 @InitialBlankRows 设置为常量(在您的示例中,它将是 3,因为第一个表的标题行位于第 4 行)。
回答by John Machin
You say that in a previous step, the users have selected the headers. Who's to say that below the region of current interest there aren't a few blank rows followed by another unrelated table? I suggest that you get them to select the whole range that they are interested in-- that should fix both problems.
您说在上一步中,用户选择了标题。谁能说在当前感兴趣的区域下面没有几个空白行,然后是另一个不相关的表?我建议你让他们选择他们感兴趣的整个范围——这应该可以解决这两个问题。
回答by ErvinS
I would go with the solution from John ( reading 1000 rows at a time ).
我会采用 John 的解决方案(一次读取 1000 行)。
If you have Excel installed you could also use OLE automation.
如果您安装了 Excel,您还可以使用 OLE 自动化。
I have recorded a simple macro in Excel which select the last cell in the current table.
我在 Excel 中录制了一个简单的宏,它选择了当前表格中的最后一个单元格。
Sub Macro2()
Range("B14").Select
Selection.End(xlDown).Select
//MsgBox ActiveCell.Address, vbOKOnly
End Sub
Now you just need to translate this in C# and read the address of the active cell.
现在您只需要在 C# 中转换它并读取活动单元格的地址。
回答by Phil
We read the entire spreadsheet (ie: SELECT * FROM [Sheet1$]) and handle everything else in our application code. It's easy enough to race through the resultant OleDbDataReader to get to the starting point of your data and start processing.
我们读取整个电子表格(即:SELECT * FROM [Sheet1$])并处理我们应用程序代码中的所有其他内容。很容易通过生成的 OleDbDataReader 到达数据的起点并开始处理。
It may not be the absolutely fastest way to suck data from Excel, but it is reliable.
它可能不是从 Excel 中获取数据的绝对最快的方法,但它是可靠的。