
golang encoding/csv: Reading is slower than python

```
$ go version
go version go1.7 linux/amd64
```

Reading CSV files with encoding/csv is, out of the box, quite slow (tl;dr: about 3x slower than a simple Java program and 1.5x slower than the obvious Python code). A typical example:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	f, err := os.Open("mock_data.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r := csv.NewReader(f) // csv.NewReader already buffers its input
	for {
		line, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		if line[0] == "42" {
			fmt.Println(line)
		}
	}
}
```

Python3 equivalent:

```python
import csv

with open('mock_data.csv') as f:
    r = csv.reader(f)
    for row in r:
        if row[0] == "42":
            print(row)
```

Roughly equivalent Java code (note that it splits each line on commas and does not handle quoted fields):

```java
import java.io.BufferedReader;
import java.io.FileReader;

public class ReadCsv {
    public static void main(String[] args) {
        BufferedReader br;
        String line;
        try {
            br = new BufferedReader(new FileReader("mock_data.csv"));
            while ((line = br.readLine()) != null) {
                String[] data = line.split(",");
                if (data[0].equals("42")) {
                    System.out.println(line);
                }
            }
        } catch (Exception e) {}
    }
}
```

Tested on a ~50 MB CSV file of 1,000,002 lines, generated with:

```python
data = ",Carl,Gauss,cgauss@unigottingen.de,Male,30.4.17.77\n"
with open("mock_data.csv", "w") as f:
    f.write("id,first_name,last_name,email,gender,ip_address\n")
    f.write(("1" + data) * int(1e6))
    f.write("42" + data)
```

Results:

- Go: avg 1.489 secs
- Python: avg 0.933 secs (1.5x faster)
- Java: avg 0.493 secs (3.0x faster)
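One difference to keep in mind when reading these numbers is that the Java program only splits on commas, while encoding/csv implements quoted fields and reports malformed input. The hypothetical helper `fieldCounts` below parses a single line both ways to illustrate the difference; it is a sketch of the semantic gap, not a measurement or an explanation of the timings above.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// fieldCounts (hypothetical helper) parses one CSV line two ways: with a
// plain comma split, as the Java program does, and with encoding/csv, which
// understands quoted fields and returns an error for malformed quoting.
func fieldCounts(line string) (naive, parsed int, err error) {
	naive = len(strings.Split(line, ","))
	rec, err := csv.NewReader(strings.NewReader(line)).Read()
	if err != nil {
		return naive, 0, err
	}
	return naive, len(rec), nil
}

func main() {
	// A quoted field containing a comma: splitting on "," cuts it in two.
	line := `42,"Gauss, Carl",cgauss@unigottingen.de`
	naive, parsed, err := fieldCounts(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("split: %d fields, encoding/csv: %d fields\n", naive, parsed)
	// prints: split: 4 fields, encoding/csv: 3 fields
}
```

On an input with a stray quote inside an unquoted field (for example `42,Ga"uss`), the comma split silently returns two fields, while encoding/csv returns a parse error.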

Go's error reporting is clearly better than anything that Java code offers, and I'm not sure about Python's, but people have been complaining about the slowness of encoding/csv, so it is probably worth investigating whether the package can be made faster.

