This article was originally published on DigitalOcean.
Introduction
A CSV is a plain-text file format for storing tabular data. A CSV file uses a comma delimiter to separate values in table cells, and a newline delineates where rows begin and end. Most spreadsheet programs and databases can export and import CSV files. Because CSV is a plain-text format, any programming language can parse and write to a CSV file. Node.js has many modules that can work with CSV files, such as node-csv, fast-csv, and papaparse.
In this tutorial, you will use the node-csv module to read a CSV file using Node.js streams, which lets you read large datasets without consuming a lot of memory. You will modify the program to move data parsed from the CSV file into a SQLite database. You will also retrieve data from the database, parse it with node-csv, and use Node.js streams to write it to a CSV file in chunks.
Prerequisites
To follow this tutorial, you will need:
- Node.js installed on your local or server environment. Follow How to Install Node.js and Create a Local Development Environment to install Node.js.
- SQLite installed on your local or server environment, which you can install by following step 1 in How To Install and Use SQLite on Ubuntu 20.04. Knowledge of how to use SQLite is helpful and can be learned in steps 2-7 of the installation guide.
- Familiarity with writing a Node.js program. See How To Write and Run Your First Program in Node.js.
- Familiarity with Node.js streams. See How To Work with Files Using Streams in Node.js.
Step 1 — Setting Up the Project Directory
In this section, you will create the project directory and download packages for your application. You will also download a CSV dataset from Stats NZ, which contains international migration data in New Zealand.
To get started, make a directory called csv_demo and navigate into it:
- mkdir csv_demo
- cd csv_demo
Next, initialize the directory as an npm project using the npm init command:
- npm init -y
The -y option tells npm init to answer "yes" to all the prompts. This command creates a package.json file with default values that you can change anytime.
With the directory initialized as an npm project, you can now install the necessary dependencies: node-csv and node-sqlite3.
Enter the following command to install node-csv:
- npm install csv
The node-csv module is a collection of modules that allows you to parse and write data to a CSV file. The command installs all four modules that are part of the node-csv package: csv-generate, csv-parse, csv-stringify, and stream-transform. You will use the csv-parse module to parse a CSV file and the csv-stringify module to write data to a CSV file.
Next, install the node-sqlite3 module:
- npm install sqlite3
The node-sqlite3 module allows your app to interact with the SQLite database.
After installing the packages in your project, download the New Zealand migration CSV file with the wget command:
- wget https://www.stats.govt.nz/assets/Uploads/International-migration/International-migration-September-2021-Infoshare-tables/Download-data/international-migration-September-2021-estimated-migration-by-age-and-sex-csv.csv
The CSV file you downloaded has a long name. To make it easier to work with, rename the file to something shorter using the mv command:
- mv international-migration-September-2021-estimated-migration-by-age-and-sex-csv.csv migration_data.csv
The new CSV filename, migration_data.csv, is shorter and easier to work with.
Using nano, or your favorite text editor, open the file:
- nano migration_data.csv
Once open, you will see contents similar to this:
year_month,month_of_release,passenger_type,direction,sex,age,estimate,standard_error,status
2001-01,2020-09,Long-term migrant,Arrivals,Female,0-4 years,344,0,Final
2001-01,2020-09,Long-term migrant,Arrivals,Male,0-4 years,341,0,Final
...
The first line contains the column names, and all subsequent lines hold the data corresponding to each column. A comma separates each piece of data. This character is known as a delimiter because it delineates the fields. You are not limited to using commas: other popular delimiters include colons (:), semicolons (;), and tabs (\t). You need to know which delimiter is used in the file, since most modules require it to parse the file.
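For instance, here is a minimal sketch of how the delimiter option changes parsing. It assumes csv-parse v5 or later (installed as part of the csv package), which ships a synchronous API under csv-parse/sync; it is not part of this tutorial's code:
const { parse } = require("csv-parse/sync");

// The same data, delimited by semicolons instead of commas
const records = parse("a;b;c\n1;2;3", { delimiter: ";" });
console.log(records); // [ [ 'a', 'b', 'c' ], [ '1', '2', '3' ] ]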
After reviewing the file and identifying the delimiter, exit your migration_data.csv file using CTRL+X.
You have now installed the necessary dependencies for your project. In the next section, you will read a CSV file.
Step 2 — Reading CSV Files
In this section, you will use node-csv to read a CSV file and log its contents in the console. You will use the fs module's createReadStream() method to read the data from the CSV file and create a readable stream. You will then pipe the stream to another stream initialized with the csv-parse module to parse the chunks of data. Once the chunks of data have been parsed, you can log them in the console.
Create and open a readCSV.js file in your preferred editor:
- nano readCSV.js
In your readCSV.js file, import the fs and csv-parse modules by adding the following lines:
const fs = require("fs");
const { parse } = require("csv-parse");
In the first line, you define the fs variable and assign it the fs object that the Node.js require() method returns when it imports the module.
In the second line, you extract the parse method from the object returned by the require() method into the parse variable using destructuring syntax.
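If the destructuring syntax is unfamiliar, the second line is equivalent to this more verbose form (an illustrative sketch, not code you need to add):
// Grab the whole exports object, then pull out the parse method
const csvParse = require("csv-parse");
const parse = csvParse.parse;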
Add the following lines to read the CSV file:
...
fs.createReadStream("./migration_data.csv")
.pipe(parse({ delimiter: ",", from_line: 2 }))
.on("data", function (row) {
console.log(row);
})
The createReadStream() method from the fs module accepts the name of the file you want to read as an argument, which is migration_data.csv here. It then creates a readable stream, which takes a large file and breaks it into smaller chunks. A readable stream only allows you to read data from it, not write to it.
After creating the readable stream, Node's pipe() method forwards chunks of data from the readable stream to another stream. The second stream is created when the csv-parse module's parse() method is invoked inside the pipe() method. The csv-parse module implements a transform stream (a stream that is both readable and writable), taking a data chunk and transforming it into another form. For example, when it receives a chunk like 2001-01,2020-09,Long-term migrant,Arrivals,Female,0-4 years,344, the parse() method will transform it into an array.
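To make the idea of a transform stream concrete, here is a toy sketch (not the real parser) that naively splits one line into an array, which is the same shape csv-parse produces:
const { Transform } = require("stream");

const toArray = new Transform({
  // Emit JavaScript arrays instead of buffers on the readable side
  readableObjectMode: true,
  transform(chunk, encoding, callback) {
    // Naive split; the real csv-parse also handles quoting, escaped
    // delimiters, and records that span multiple chunks
    callback(null, chunk.toString().trim().split(","));
  },
});

toArray.on("data", (row) => console.log(row));
toArray.write("2001-01,2020-09,Long-term migrant,Arrivals,Female,0-4 years,344\n");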
The parse() method takes an object whose properties configure and provide more information about the data the method will parse. The object takes the following properties (a sketch of one more useful option follows this list):
- delimiter defines the character that separates each field in a row. The value "," tells the parser that commas demarcate the fields.
- from_line defines the line where the parser should start parsing rows. With the value 2, the parser will skip line 1 and start at line 2. Because you will insert the data into the database later, this property helps you avoid inserting the column names from the first row into the database.
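csv-parse supports other options, too. For example, the documented columns option, when set to true, makes the parser read the header row itself and emit each record as an object keyed by column name rather than an array (a sketch, not part of this tutorial's code):
const fs = require("fs");
const { parse } = require("csv-parse");

fs.createReadStream("./migration_data.csv")
  // columns: true replaces from_line: 2, because the header row is
  // consumed to name the fields of each emitted object
  .pipe(parse({ delimiter: ",", columns: true }))
  .on("data", (row) => console.log(row.year_month, row.estimate));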
Next, you attach a streaming event using the Node.js on() method. A streaming event allows the method to consume a chunk of data when a certain event is emitted. The data event is triggered when data transformed by the parse() method is ready to be consumed. To access the data, you pass a callback to the on() method, which takes a parameter named row. The row parameter is a data chunk that has been transformed into an array. Within the callback, you log the data in the console using the console.log() method.
Before running the file, you will add more stream events. These stream events handle errors and write a success message to the console when all the data in the CSV file has been consumed.
Still in your readCSV.js file, add the highlighted code:
...
fs.createReadStream("./migration_data.csv")
.pipe(parse({ delimiter: ",", from_line: 2 }))
.on("data", function (row) {
console.log(row);
})
.on("end", function () {
console.log("finished");
})
.on("error", function (error) {
console.log(error.message);
});
The end event is emitted when all the data in the CSV file has been read. When that happens, the callback is invoked and logs a message that says it has finished.
If an error occurs anywhere while reading and parsing the CSV data, the error event is emitted, which invokes the callback and logs the error message in the console.
Your complete file should now look like the following:
const fs = require("fs");
const { parse } = require("csv-parse");
fs.createReadStream("./migration_data.csv")
.pipe(parse({ delimiter: ",", from_line: 2 }))
.on("data", function (row) {
console.log(row);
})
.on("end", function () {
console.log("finished");
})
.on("error", function (error) {
console.log(error.message);
});
Save and exit your readCSV.js file using CTRL+X.
Next, run the file using the node command:
- node readCSV.js
The output will look similar to this (edited for brevity):
Output
[
'2001-01',
'2020-09',
'Long-term migrant',
'Arrivals',
'Female',
'0-4 years',
'344',
'0',
'Final'
]
...
[
'2021-09',
...
'70',
'Provisional'
]
finished
All the rows in the CSV file have been transformed into arrays using the csv-parse transform stream. Because logging happens each time a chunk is received from the stream, the data appears as though it is being downloaded rather than being displayed all at once.
In this step, you read data in a CSV file and transformed it into arrays. Next, you will insert data from a CSV file into the database.
Step 3 — Inserting Data into the Database
Inserting data from a CSV file into the database using Node.js gives you access to a vast library of modules that you can use to process, clean, or enhance the data before inserting it into the database.
In this section, you will establish a connection with the SQLite database using the node-sqlite3 module. You will then create a table in the database, copy the readCSV.js file, and modify it to insert all the data read from the CSV file into the database.
Create and open a db.js file in your editor:
- nano db.js
In your db.js file, add the following lines to import the fs and node-sqlite3 modules:
const fs = require("fs");
const sqlite3 = require("sqlite3").verbose();
const filepath = "./population.db";
...
In the third line, you define the path of the SQLite database file and store it in the filepath variable. The database file doesn't exist yet, but node-sqlite3 needs the path to establish a connection with the database.
In the same file, add the following lines to connect Node.js to a SQLite database:
...
function connectToDatabase() {
if (fs.existsSync(filepath)) {
return new sqlite3.Database(filepath);
} else {
const db = new sqlite3.Database(filepath, (error) => {
if (error) {
return console.error(error.message);
}
console.log("Connected to the database successfully");
});
return db;
}
}
Here, you define a function named connectToDatabase() to establish a connection to the database. Within the function, you invoke the fs module's existsSync() method in an if statement, which checks whether the database file exists in the project directory. If the if condition evaluates to true, you instantiate the node-sqlite3 module's Database() class with the database file path. Once the connection is established, the function returns the connection object and exits.
However, if the if statement evaluates to false (that is, if the database file doesn't exist), execution skips to the else block. In the else block, you instantiate the Database() class with two arguments: the database file path and a callback.
The first argument is the path of the SQLite database file, which is ./population.db. The second argument is a callback that will be invoked automatically when the connection with the database has been established successfully or when an error occurs. The callback takes an error object as a parameter, which is null if the connection is successful. Within the callback, the if statement checks whether the error object is set. If it evaluates to true, the callback logs an error message and returns. If it evaluates to false, you log a success message confirming that the connection has been established.
Currently, both the if and else blocks establish the connection object. Next, you will extend the callback passed to the Database() class in the else block so that it creates a table, which happens only when the database file does not exist. If the database file already exists, the function will execute the if block, connect to the database, and return the connection object.
To create a table if the database file doesn't exist, add the highlighted code:
const fs = require("fs");
const sqlite3 = require("sqlite3").verbose();
const filepath = "./population.db";
function connectToDatabase() {
if (fs.existsSync(filepath)) {
return new sqlite3.Database(filepath);
} else {
const db = new sqlite3.Database(filepath, (error) => {
if (error) {
return console.error(error.message);
}
createTable(db);
console.log("Connected to the database successfully");
});
return db;
}
}
function createTable(db) {
db.exec(`
CREATE TABLE migration
(
year_month VARCHAR(10),
month_of_release VARCHAR(10),
passenger_type VARCHAR(50),
direction VARCHAR(20),
sex VARCHAR(10),
age VARCHAR(50),
estimate INT
)
`);
}
module.exports = connectToDatabase();
Now the connectToDatabase() function invokes the createTable() function, which accepts the connection object stored in the db variable as an argument.
Outside the connectToDatabase() function, you define the createTable() function, which accepts the connection object db as a parameter. You invoke the exec() method on the db connection object, which takes a SQL statement as an argument. The SQL statement creates a table named migration with seven columns. The column names match the first seven headings in the migration_data.csv file.
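If you later want to confirm that the table was created as expected, one option is to inspect its schema with the sqlite3 command-line tool from the prerequisites (run this after the population.db file has been created):
- sqlite3 population.db ".schema migration"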
Finally, you invoke the connectToDatabase() function and export the connection object it returns so that it can be reused in other files.
Save and exit your db.js file.
With the database connection established, you will now copy and modify the readCSV.js file to insert the rows that the csv-parse module parses into the database.
Copy and rename the file to insertData.js with the following command:
- cp readCSV.js insertData.js
Open the insertData.js file in your editor:
- nano insertData.js
Add the highlighted code:
const fs = require("fs");
const { parse } = require("csv-parse");
const db = require("./db");
fs.createReadStream("./migration_data.csv")
.pipe(parse({ delimiter: ",", from_line: 2 }))
.on("data", function (row) {
db.serialize(function () {
db.run(
`INSERT INTO migration VALUES (?, ?, ? , ?, ?, ?, ?)`,
[row[0], row[1], row[2], row[3], row[4], row[5], row[6]],
function (error) {
if (error) {
return console.log(error.message);
}
console.log(`Inserted a row with the id: ${this.lastID}`);
}
);
});
});
In the third line, you import the connection object from the db.js file and store it in the db variable.
Inside the data event callback attached to the fs module stream, you invoke the serialize() method on the connection object. The method ensures that one SQL statement finishes executing before another starts, which helps prevent database race conditions where the system runs competing operations simultaneously.
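As a standalone sketch of what serialize() guarantees (not part of this tutorial's code), statements queued inside its callback run strictly one after another:
db.serialize(function () {
  // "example" is a hypothetical table used only for illustration;
  // the INSERT can never run before the CREATE TABLE finishes
  db.run("CREATE TABLE IF NOT EXISTS example (n INT)");
  db.run("INSERT INTO example VALUES (1)");
});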
The serialize() method takes a callback. Within the callback, you invoke the run() method on the db connection object. The method accepts three arguments:
- The first argument is a SQL statement that will be passed to and executed by the SQLite database. The run() method only accepts SQL statements that don't return results. The INSERT INTO migration VALUES (?, ..., ?) statement inserts a row into the migration table, and the ? characters are placeholders that are later substituted with the values in the run() method's second argument.
- The second argument is an array, [row[0], ..., row[5], row[6]]. In the previous section, the parse() method received a chunk of data from the readable stream and transformed it into an array. Since the data is received as an array, you use array indexes, such as row[1] or row[6], to access each field value.
- The third argument is a callback that runs when the data has been inserted or when an error occurs. The callback checks whether an error occurred and logs the error message. If there are no errors, the function logs a success message in the console using the console.log() method, letting you know that a row has been inserted along with its id (see the sketch after this list).
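One detail worth noting in the sketch below: the third argument is a regular function expression rather than an arrow function, because node-sqlite3 binds the statement's metadata (such as lastID) to this, and arrow functions ignore that binding. The values here are sample data for illustration:
db.run(
  `INSERT INTO migration VALUES (?, ?, ?, ?, ?, ?, ?)`,
  ["2001-01", "2020-09", "Long-term migrant", "Arrivals", "Female", "0-4 years", 344],
  function (error) {
    if (error) return console.log(error.message);
    // this.lastID would be undefined inside an arrow function
    console.log(`Inserted a row with the id: ${this.lastID}`);
  }
);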
Finally, remove the end and error events from your file. Due to the asynchronous nature of the node-sqlite3 methods, the end and error events would execute before the data is inserted into the database, so they are no longer required.
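If you would still like a completion signal, one option (a sketch, not required for this tutorial) is to close the database inside the stream's end event. node-sqlite3 queues close() behind all pending statements, so its callback fires only after every insert has run:
// Appended to the existing stream chain in insertData.js
.on("end", function () {
  db.close(function (error) {
    if (error) return console.log(error.message);
    console.log("All rows inserted and the database connection is closed");
  });
});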
Save and exit your file.
Run the insertData.js file using node:
- node insertData.js
Depending on your system, it may take some time, but node should return the output below:
Output
Connected to the database successfully
Inserted a row with the id: 1
Inserted a row with the id: 2
...
Inserted a row with the id: 44308
Inserted a row with the id: 44309
Inserted a row with the id: 44310
The messages, especially the ids, confirm that the rows from the CSV file have been saved in the database.
You can now read a CSV file and insert its content into the database. Next, you will write a CSV file.
Step 4 — Writing CSV Files
In this section, you will retrieve data from the database and write it into a CSV file using streams.
Create and open writeCSV.js in your editor:
- nano writeCSV.js
In your writeCSV.js file, add the following lines to import the fs and csv-stringify modules and the database connection object from db.js:
const fs = require("fs");
const { stringify } = require("csv-stringify");
const db = require("./db");
The csv-stringify module transforms data from an object or array into CSV text.
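As a quick standalone sketch of that transformation (assuming csv-stringify v6 or later, which ships a synchronous API under csv-stringify/sync; this is not part of writeCSV.js):
const { stringify } = require("csv-stringify/sync");

// One record as an object, converted to CSV text with a header row
const output = stringify([{ name: "Ada", age: 36 }], { header: true });
console.log(output);
// name,age
// Ada,36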
Next, add the following lines to define a variable that contains the name of the CSV file, a writable stream to write the data to, and the column headers:
...
const filename = "saved_from_db.csv";
const writableStream = fs.createWriteStream(filename);
const columns = [
"year_month",
"month_of_release",
"passenger_type",
"direction",
"sex",
"age",
"estimate",
];
The createWriteStream() method takes the name of the file you want to write your stream of data to as an argument, which is the saved_from_db.csv filename stored in the filename variable.
In the fourth line, you define a columns variable, which stores an array containing the names of the headers for the CSV data. These headers will be written on the first line of the CSV file when you start writing data to the file.
Still in your writeCSV.js file, add the following lines to retrieve data from the database and write each row to the CSV file:
...
const stringifier = stringify({ header: true, columns: columns });
db.each(`select * from migration`, (error, row) => {
if (error) {
return console.log(error.message);
}
stringifier.write(row);
});
stringifier.pipe(writableStream);
console.log("Finished writing data");
First, you invoke the stringify() method with an object as an argument, which creates a transform stream. The transform stream converts the data from an object into CSV text. The object passed into the stringify() method has two properties:
- header accepts a boolean value and generates a header row if the value is set to true.
- columns takes an array containing the names of the columns that will be written on the first line of the CSV file if the header option is set to true.
Next, you invoke the each() method on the db connection object with two arguments. The first argument is the SQL statement select * from migration, which retrieves the rows from the database one by one. The second argument is a callback invoked each time a row is retrieved from the database. The callback takes two parameters: an error object and a row object containing data retrieved from a single row in the database. Within the callback, you check whether the error object is set in the if statement. If the condition evaluates to true, an error message is logged in the console using the console.log() method. If there is no error, you invoke the write() method on stringifier, which writes the data into the stringifier transform stream.
The pipe() method connects the stringifier stream to the writableStream: as rows are written to the stringifier, the transformed CSV text is sent in chunks to the writable stream, which saves the data in the saved_from_db.csv file. Finally, console.log() logs a success message.
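Note that because each() is asynchronous, the final console.log() actually runs before the rows have finished writing, which is harmless here but can be surprising. If you want the message to appear only after the file is complete, one approach (a sketch using each()'s optional completion callback) is:
db.each(
  `select * from migration`,
  (error, row) => {
    if (error) return console.log(error.message);
    stringifier.write(row);
  },
  // each() accepts a second callback that runs after the last row;
  // ending the stringifier flushes the remaining CSV text to the file
  () => stringifier.end()
);

writableStream.on("finish", () => {
  console.log("Finished writing data");
});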
The complete file will now look like the following:
const fs = require("fs");
const { stringify } = require("csv-stringify");
const db = require("./db");
const filename = "saved_from_db.csv";
const writableStream = fs.createWriteStream(filename);
const columns = [
"year_month",
"month_of_release",
"passenger_type",
"direction",
"sex",
"age",
"estimate",
];
const stringifier = stringify({ header: true, columns: columns });
db.each(`select * from migration`, (error, row) => {
if (error) {
return console.log(error.message);
}
stringifier.write(row);
});
stringifier.pipe(writableStream);
console.log("Finished writing data");
Save and close your file, then run the writeCSV.js file in the terminal:
- node writeCSV.js
You will receive the following output:
Output
Finished writing data
To confirm that the data has been written, inspect the contents of the file using the cat command:
- cat saved_from_db.csv
cat will return all the rows written in the file (edited for brevity):
Output
year_month,month_of_release,passenger_type,direction,sex,age,estimate
2001-01,2020-09,Long-term migrant,Arrivals,Female,0-4 years,344
2001-01,2020-09,Long-term migrant,Arrivals,Male,0-4 years,341
2001-01,2020-09,Long-term migrant,Arrivals,Female,10-14 years,
...
You can now retrieve data from the database and write each row to a CSV file using streams.
Conclusion
In this article, you read a CSV file and inserted its data into a database using the node-csv and node-sqlite3 modules. You then retrieved data from the database and wrote it to another CSV file.
You can now read and write CSV files. As a next step, you can work with large CSV datasets using the same memory-efficient streaming approach, or you might look into a package like event-stream that makes working with streams much easier.
To explore more about node-csv, visit its documentation at CSV Project - Node.js CSV package. To learn more about node-sqlite3, visit its GitHub documentation. To continue growing your Node.js skills, see the How To Code in Node.js series.