LAS Reading and Writing with PDAL

Author:

Howard Butler

Contact:

howard@hobu.co

Date:

3/27/2017

This tutorial describes reading and writing ASPRS LAS data with PDAL and discusses the capabilities that PDAL’s readers.las and writers.las provide for this format.

Introduction

ASPRS LAS is probably the most commonly used LiDAR format, and PDAL’s support of LAS is important for many users of the library. This tutorial describes and demonstrates some of the capabilities the drivers provide, points out items to be aware of when using the drivers, and hopefully provides some examples you can use to get what you need out of the LAS drivers.

LAS Versions

There are five LAS versions, 1.0 to 1.4. Each iteration added some complexity to the format in terms of the capabilities it supports, the data types it can store, and its metadata. Users of LAS must balance the features they need against the ability of downstream applications to use the data. While LAS support in some form is quite widespread throughout the industry, most applications do not support every feature of each version. PDAL works to provide many of these features, but its support is also incomplete. Specifically, PDAL doesn’t support point formats that store waveform data.

Version Example

We can use the minor_version option of writers.las to set the version PDAL should output. The following example will write a 1.1 version LAS file. Depending on the features you need, this may or may not be what you want.

[
    {
        "type" : "readers.las",
        "filename" : "input.las"
    },
    {
        "type" : "writers.las",
        "minor_version": 1,
        "filename" : "output.las"
    }
]

Note

PDAL defaults to writing a LAS 1.2 version if no minor_version is specified or the forward option of writers.las is not used to carry along a version from a previously read file.

Spatial Reference System

LAS 1.0 to 1.3 use GeoTIFF keys for storing coordinate system information, while LAS 1.4 uses Well Known Text. GeoTIFF is well supported by most software that reads LAS, but it is not possible to express some coordinate system specifics with GeoTIFF. WKT supports more coordinate systems than GeoTIFF, but vendor-specific variants and later versions (WKT 2) may not be handled well.

Assignment Example

The PDAL writers.las allows you to override or assign the coordinate system to an explicit value if you need. Often the coordinate system defined by a file might be incorrect or non-existent, and you can set this with PDAL.

The following example sets the a_srs option of the writers.las to EPSG:4326.

[
    {
        "type" : "readers.las",
        "filename" : "input.las"
    },
    {
        "type" : "writers.las",
        "a_srs": "EPSG:4326",
        "filename" : "output.las"
    }
]

Note

Remember to set offset_x, offset_y, scale_x, and scale_y values to something appropriate if you are storing decimal degree data in LAS files. The special value auto can be used for the offset values, but you should set an explicit value for the scale values to avoid overstating the precision of the data and degrading Compression with LASzip.
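To see why the scale matters for decimal degree data, the following sketch estimates the ground distance represented by one integer step at a given scale. It assumes a spherical Earth with the WGS 84 equatorial radius and measures along the equator, so the numbers are rough. The name ground_step_m is a hypothetical helper for illustration and is not part of PDAL.

```python
import math

# Assumption: spherical Earth using the WGS 84 equatorial radius (metres).
EARTH_RADIUS_M = 6378137.0
# Length of one degree of arc along the equator, roughly 111 km.
METRES_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_M / 360.0


def ground_step_m(scale_degrees):
    """Approximate ground distance in metres of one stored integer step."""
    return scale_degrees * METRES_PER_DEGREE


# Compare a scale typical for metre-based data with finer degree scales.
for scale in (0.01, 0.00001, 0.0000001):
    print(f"scale {scale:.7f} deg -> about {ground_step_m(scale):.4f} m per step")
```

A scale of 0.01, which is often reasonable for projected data in metres, would quantize degree data into steps of roughly a kilometre, while 0.0000001 corresponds to roughly a centimetre.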

Vertical Datum Example

Vertical coordinate control is important in LiDAR and PDAL supports assignment and reprojection/transform of vertical coordinates using Proj.4 and GDAL. The coordinate system description magic happens in GDAL, and you assign a compound coordinate system (both vertical and horizontal definitions) using the following syntax:

EPSG:4326+3855

This assignment specifies the typical EPSG:4326 horizontal coordinate system plus a vertical one that represents EGM2008. In Well Known Text, this coordinate system is described by:

$ gdalsrsinfo "EPSG:4326+3855"
COMPD_CS["WGS 84 + EGM2008 geoid height",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4326"]],
    VERT_CS["EGM2008 geoid height",
        VERT_DATUM["EGM2008 geoid",2005,
            AUTHORITY["EPSG","1027"],
            EXTENSION["PROJ4_GRIDS","egm08_25.gtx"]],
        UNIT["metre",1,
            AUTHORITY["EPSG","9001"]],
        AXIS["Up",UP],
        AUTHORITY["EPSG","3855"]]

As in Assignment Example, it is common to need to reassign the coordinate system. The following example defines both the horizontal and vertical coordinate system for a file to UTM Zone 15N NAD83 for horizontal and NAVD88 for the vertical.

[
    {
        "type" : "readers.las",
        "filename" : "input.las"
    },
    {
        "type" : "writers.las",
        "a_srs": "EPSG:26915+5703",
        "filename" : "output.las"
    }
]

Note

Any coordinate system description format supported by GDAL’s SetFromUserInput method can be used to assign or set the coordinate system in PDAL. This includes WKT, Proj.4 definitions, and OGC URNs. It is your responsibility, however, to escape or massage any input data so that it is valid JSON.
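WKT contains many double quotes, which must be escaped inside a JSON string. One way to avoid escaping by hand is to build the pipeline as a data structure and let a JSON library serialize it. The sketch below uses Python's standard json module; the WKT string is an abbreviated fragment used only to show the quoting and is not a complete coordinate system definition.

```python
import json

# Abbreviated WKT fragment for illustration only; a real definition is longer.
wkt = 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]]]'

pipeline = [
    {"type": "readers.las", "filename": "input.las"},
    {"type": "writers.las", "a_srs": wkt, "filename": "output.las"},
]

# json.dumps escapes the embedded double quotes so the result is valid JSON.
text = json.dumps(pipeline, indent=4)
print(text)
```

The printed text can be saved to a file and used as a pipeline definition.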

Reprojection Example

A common desire is to transform the coordinates of an ASPRS LAS file from one coordinate system to another. The mechanism to do that with PDAL is filters.reprojection.

[
    {
        "type" : "readers.las",
        "filename" : "input.las"
    },
    {
        "type":"filters.reprojection",
        "out_srs":"EPSG:26915"
    },
    {
        "type" : "writers.las",
        "filename" : "output.las"
    }
]

Note

If the input data doesn’t specify a projection, you must specify the in_srs option of filters.reprojection. in_srs can also be used to override an existing spatial reference attached to the input point set.
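When reprojection pipelines are generated by a script, the in_srs option can be added only when it is needed. The following is a minimal sketch using Python's standard json module; the function name reprojection_pipeline and the choice of EPSG:4326 as the source system are assumptions made for illustration.

```python
import json


def reprojection_pipeline(src, dst, out_srs, in_srs=None):
    """Build a readers.las -> filters.reprojection -> writers.las pipeline."""
    reproject = {"type": "filters.reprojection", "out_srs": out_srs}
    # Only set in_srs when the input lacks a projection or it must be overridden.
    if in_srs is not None:
        reproject["in_srs"] = in_srs
    return [
        {"type": "readers.las", "filename": src},
        reproject,
        {"type": "writers.las", "filename": dst},
    ]


pipeline = reprojection_pipeline(
    "input.las", "output.las", "EPSG:26915", in_srs="EPSG:4326"
)
print(json.dumps(pipeline, indent=4))
```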

Point Formats

As each revision of LAS was released, more point formats were added. A point format is the fixed set of dimensions that a LAS file stores for each point in the file. For any point format, the size and composition of dimensions is consistent across versions, but users should be aware of some minor interpretation changes based on LAS file version. For example, a classification value of 11 in version 1.4 indicates “Road Surface”, while that value is reserved in version 1.1.

Point Format Example

Point format or dataformat_id is an integer that defines the set of fixed dimensions stored for each point in a LAS file. All point formats specify the following dimensions as part of a point record:

Base LAS Dimensions

X

Y

Z

Intensity

ReturnNumber

NumberOfReturns

ScanDirectionFlag

EdgeOfFlightLine

Classification

ScanAngleRank

UserData

PointSourceId

Because LAS files have no built-in compression, it’s important to use the point format that stores the desired data in the fewest fields. For example, point format 10 uses 47 more bytes per point than point format 0 (67 bytes versus 20).

If one wanted to remove the Red/Green/Blue fields from a LAS file (one using point format 2), one could simply set the dataformat_id option to 0. The forward option can also be set to carry forward all possible header values from the source file to the new, smaller file.

[
    {
        "type" : "readers.las",
        "filename" : "input.las"
    },
    {
        "type" : "writers.las",
        "forward": "all",
        "dataformat_id": 0,
        "filename" : "output.las"
    }
]
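The effect of dropping Red/Green/Blue on uncompressed file size can be estimated from the per-point record lengths. The sketch below lists the record lengths for the non-waveform point formats as given in the LAS specification, without extra bytes; the point count is an arbitrary example and the function name is hypothetical.

```python
# Per-point record lengths in bytes for non-waveform formats (no extra bytes).
POINT_RECORD_BYTES = {0: 20, 1: 28, 2: 26, 3: 34, 6: 30, 7: 36, 8: 38}


def uncompressed_point_bytes(point_count, dataformat_id):
    """Size of the point data alone, ignoring the header and any VLRs."""
    return point_count * POINT_RECORD_BYTES[dataformat_id]


points = 100_000_000  # arbitrary example size
saved = uncompressed_point_bytes(points, 2) - uncompressed_point_bytes(points, 0)
print(f"Format 2 -> 0 saves {saved / 1e6:.0f} MB for {points:,} points")
```

Each colour channel is a 16-bit value, so moving from point format 2 to point format 0 removes 6 bytes from every point record.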

Note

The LASzip storage of GPSTime and Red/Green/Blue fields with no data is perfectly efficient.

Extra Dimensions

A LAS Point Format ID defines the fixed set of dimensions a file must store, but programs are allowed to store extra data beyond that fixed set. This feature of the format was regularized in LAS 1.4 as something called “extra bytes” or “extra dims”, but previous versions can also store these extra per-point attributes.

Extra Dimension Example

LAS 1.4 provides for the storage of dimensions not part of the chosen point format by appending them to each point record. PDAL supports this feature when writing files with the “extra_dims” option. The following example will store all source dimensions in the output file and place a description of the dimensions that aren’t part of the point format in an “extra bytes” VLR:

[
    "some_non_las_file",
    {
        "type" : "writers.las",
        "extra_dims": "all",
        "minor_version" : "4",
        "filename" : "output.las"
    }
]

Required Header Fields

Readers of the ASPRS LAS Specification will see that there are many fields that software is required to write, with their content mandated by various options and configurations in the format. PDAL does not assume responsibility for writing these fields and coercing meaning from the content to fit the specification. It is the PDAL user’s responsibility to do so. Fields where this might matter include:

  • project_id

  • global_encoding

  • system_id

  • software_id

  • filesource_id

Header Fields Example

The “forward” option of writers.las is the easiest way to copy most of the header settings you might want from an input file to an output file during processing. Imagine zeroing out the classification values of a LAS file in preparation for using filters.pmf to reassign them. In this scenario, we’d like to keep all of the other LAS header information, such as Variable Length Records, extent information, and format settings.

[
    {
        "type" : "readers.las",
        "filename" : "input.las"
    },
    {
        "type" : "filters.assign",
        "assignment" : "Classification[0:32]=0"
    },
    {
        "type" : "filters.pmf",
        "cell_size" : 2.5,
        "approximate" : false,
        "max_distance" : 25
    },
    {
        "type" : "writers.las",
        "forward": "all",
        "filename" : "output.las"
    }
]

Note

If multiple input LAS files are being written to an output file, the forward option can only preserve values when they are the same in all input files. If the values differ, a default will be used (as it would if the forward option weren’t supplied). You can specify specific option values for output that will also override any forwarded data.
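The rule in this note can be modelled in a few lines. The sketch below only illustrates the behaviour as described here and is not taken from PDAL's source; the function name and the use of minor_version as the example field are assumptions.

```python
def resolve_header_field(input_values, default, explicit=None):
    """Model of the forwarding rule: explicit option, else shared value, else default."""
    # An explicitly specified option overrides any forwarded data.
    if explicit is not None:
        return explicit
    # Forwarding only preserves a value when every input file agrees on it.
    distinct = set(input_values)
    if len(distinct) == 1:
        return distinct.pop()
    return default


# minor_version across inputs, with the writer default of 2 described earlier.
print(resolve_header_field([4, 4], default=2))              # inputs agree
print(resolve_header_field([3, 4], default=2))              # inputs differ
print(resolve_header_field([3, 4], default=2, explicit=4))  # option set by user
```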

Coordinate Scaling

LAS stores coordinates as 32-bit integers. It is the user’s responsibility to ensure that the coordinate domain required by the data in the file fits within the 32-bit integer domain. Most coordinate values have digits to the right of the decimal point that must be preserved for sufficient accuracy. A scale factor and an offset allow the stored integers to be interpreted as floating point values when read by software.

When writing data to LAS, choosing an appropriate scale factor should take into account not just the maximum precision that can be accommodated by the format, but the actual precision of the data. Using a precision greater than the resolution of the data collection can mislead users as to the actual measurement precision of the data. In addition, it can lead to larger files when writing compressed data with LASzip.
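The relationship between a coordinate, its scale and offset, and the stored integer can be sketched as follows. The helper names and the sample coordinate are made up for illustration; the arithmetic follows the usual LAS convention that a coordinate equals the stored integer times the scale plus the offset.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def to_las_int(value, scale, offset):
    """Integer that would be stored for a coordinate: round((value - offset) / scale)."""
    stored = round((value - offset) / scale)
    if not INT32_MIN <= stored <= INT32_MAX:
        raise OverflowError(f"{value} does not fit in 32 bits at scale {scale}")
    return stored


def from_las_int(stored, scale, offset):
    """Coordinate recovered by a reader: stored * scale + offset."""
    return stored * scale + offset


x = 435123.4567  # made-up easting in metres

# Centimetre scale: digits beyond the second decimal place are rounded away.
stored = to_las_int(x, 0.01, 0.0)
print(stored, from_las_int(stored, 0.01, 0.0))

# A very fine scale overflows unless an offset brings the values near zero.
print(to_las_int(x, 1e-7, 435000.0))
```

With a scale of 1e-7 and no offset, the same coordinate would need a value far outside the 32-bit range, which is why the offset and scale have to be chosen together.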

Auto Offset Example

Users can allow PDAL to select scale and offset values for data with the auto option. This can have some detrimental effects on downstream processing. auto for scale values will use the entire 32-bit integer domain. This maximizes the precision available to store the data, but it reduces LASzip storage efficiency. auto for offset calculation is just fine, however. When given the option, choose to store ASPRS LAS data with an explicit scale for the X, Y, and Z dimensions that represents actual expected data precision, not artificial storage precision or maximal storage precision.

[
    {
        "type" : "readers.las",
        "filename" : "input.las"
    },
    {
        "type" : "writers.las",
        "scale_x":"0.0000001",
        "scale_y":"0.0000001",
        "scale_z":"0.01",
        "offset_x":"auto",
        "offset_y":"auto",
        "offset_z":"auto",
        "filename" : "output.las"
    }
]

Compression

LASzip is an open source, lossless compression technique for ASPRS LAS data. It is supported by two different software libraries, and it can be used in both the C/C++ and the JavaScript execution environments. LAZ support is provided by both readers.las and writers.las. It can be enabled by setting the compression option to laszip.

Compression Example

Providing a filename with a .laz extension will write compressed data. Compression can be turned on explicitly as well:

[
    {
        "type" : "readers.las",
        "filename" : "input.las"
    },
    {
        "type" : "writers.las",
        "compression":"laszip",
        "filename" : "output.laz"
    }
]

Variable Length Records

Variable Length Records, or VLRs, are binary data that the LAS format supports to allow applications to store their own data. Coordinate system information is one type of data stored in VLRs, and many different LAS-using applications store data and metadata with this format capability. PDAL allows users to access VLR information, forward it along to newly written files, and create VLRs that store processing history information.

Common VLR data include:

  • Coordinate system

  • Metadata

  • Processing history

  • Indexing

Note

There are VLRs that are defined by the specification, and they have the VLR user_id of LASF_Spec or LASF_Projection. LASF_Spec VLRs provide a description of the data beyond that available in the header. LASF_Projection VLRs store the spatial coordinate system of the data.

For LAS 1.0-1.3, the VLR length could be no larger than 65535 bytes. Version 1.4 introduced extended VLRs, stored at the end of the file, which could be up to 4 GB in size.

VLR Example

You can add your own VLRs to files to store processing information or whatever you want by providing a JSON block via the vlrs option of writers.las that defines the user_id and data items for the VLR. The data option must be a base64-encoded string. The data will be converted to binary information and stored in the VLR when the file is written.

[
    "input.las",
    {
        "type":"writers.las",
        "filename":"output.las",
        "vlrs": [
            {
                "description": "A description under 32 bytes",
                "record_id": 42,
                "user_id": "hobu",
                "data": "dGhpcyBpcyBzb21lIHRleHQ="
            },
            {
                "description": "A description under 32 bytes",
                "record_id": 43,
                "user_id": "hobu",
                "data": "dGhpcyBpcyBzb21lIG1vcmUgdGV4dA=="
            }
        ]
    }
]
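The data values in the example above are ordinary base64 strings and can be produced with any base64 encoder. The sketch below uses Python's standard base64 module to build one vlrs entry; the helper name make_vlr and its checks are assumptions for illustration, with the 65535-byte limit and the 32-byte description limit taken from the text above.

```python
import base64

MAX_VLR_BYTES = 65535  # payload limit for regular (non-extended) VLRs


def make_vlr(user_id, record_id, description, payload):
    """Build one entry for the writers.las vlrs option from raw bytes."""
    if len(payload) > MAX_VLR_BYTES:
        raise ValueError("payload too large for a regular VLR")
    if len(description.encode("utf-8")) >= 32:
        raise ValueError("description must be under 32 bytes")
    return {
        "description": description,
        "record_id": record_id,
        "user_id": user_id,
        # The data option expects base64 text, not raw bytes.
        "data": base64.b64encode(payload).decode("ascii"),
    }


vlr = make_vlr("hobu", 42, "A description under 32 bytes", b"this is some text")
print(vlr["data"])

# Decoding recovers the original bytes of the second VLR in the example.
print(base64.b64decode("dGhpcyBpcyBzb21lIG1vcmUgdGV4dA=="))
```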

PDAL Metadata

The writers.las driver supports an option, pdal_metadata, that writes two PDAL VLRs to LAS files. The first is the equivalent of info’s --metadata output. The second is a copy of the output of the --pipeline serialization option that describes all stages and options of the pipeline that created the file. These two VLRs may be useful in tracking down processing history of data, allow you to determine which versions of PDAL may have written a file and what filter options were set when it was written, and give you the ability to store metadata and other information via pipeline user_data from your own applications.

Metadata Example

The pipeline used to construct the file and all of its Metadata can be written into VLRs in ASPRS LAS files under the PDAL VLR key.

[
    {
        "type" : "readers.las",
        "filename" : "input.las"
    },
    {
        "type" : "writers.las",
        "pdal_metadata":"true",
        "filename" : "output.laz"
    }
]

Warning

LAS versions prior to 1.4 only support VLRs of at most 64K of information. It is possible, though improbable, that the metadata or pipeline stored in the VLRs will not fit in that space.