Commit e98cf74

Add project around file parsing
Fixes #82
1 parent b4ae9be commit e98cf74

7 files changed

+76
-0
lines changed

projects/README.md

Lines changed: 2 additions & 0 deletions
@@ -20,6 +20,8 @@ Most exercises finish with a list of optional extension tasks. It's highly recom

 1. [CLI & Files](./cli-files)
 <br>An introduction to building things with Go by replicating the unix tools `cat` and `ls`.
+1. [File Parsing](./file-parsing)
+<br>Practice parsing different formats of files, both standard and custom.
 1. [Servers & HTTP requests](./http-auth)
 <br>Learn about long-running processes, HTTP and `curl`
 1. [Servers & Databases](./server-database)

projects/file-parsing/README.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
<!--forhugo
+++
title="File Parsing"
+++
forhugo-->

In this project we'll practice parsing data from files in different formats. It's often convenient to write a small program to understand or process some data, and being comfortable doing so quickly can be very helpful.

Timebox: 2 days

## Objectives:

- Write code to parse data in non-standard formats.
- Become comfortable leveraging libraries to parse standard formats in slightly non-standard ways.

## Project

The `examples` sub-directory in this directory contains a number of data files, each of which contains the same data in a different format. Descriptions of each format can be found below. Each data file contains a data set of player names and their high scores for some game.

We are going to write a program to analyse the data and print out the names of the players with the highest and lowest scores.
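
As a rough illustration of the analysis step, here is a minimal sketch in Go that finds the highest and lowest scores in a slice of records that have already been parsed. The `ScoreRecord` type and the `extremes` function are hypothetical names chosen for this example, and the tie-breaking rule (the first record wins) is an assumption, since the project does not specify one.

```go
package main

import (
	"errors"
	"fmt"
)

// ScoreRecord is a hypothetical type holding one player's name and high score.
type ScoreRecord struct {
	Name      string
	HighScore int32
}

// extremes returns the records with the highest and the lowest score.
// If several records tie, the one that appears first is returned (an assumption).
func extremes(records []ScoreRecord) (highest, lowest ScoreRecord, err error) {
	if len(records) == 0 {
		return ScoreRecord{}, ScoreRecord{}, errors.New("no records")
	}
	highest, lowest = records[0], records[0]
	for _, r := range records[1:] {
		if r.HighScore > highest.HighScore {
			highest = r
		}
		if r.HighScore < lowest.HighScore {
			lowest = r
		}
	}
	return highest, lowest, nil
}

func main() {
	records := []ScoreRecord{
		{Name: "Aya", HighScore: 10},
		{Name: "Prisha", HighScore: 30},
		{Name: "Charlie", HighScore: -1},
		{Name: "Margot", HighScore: 25},
	}
	highest, lowest, err := extremes(records)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("highest: %s (%d)\n", highest.Name, highest.HighScore)
	fmt.Printf("lowest: %s (%d)\n", lowest.Name, lowest.HighScore)
}
```

Keeping the analysis separate from the parsing means the same function can be reused for every file format.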

### Data formats

### JSON

`json.txt` contains a JSON array of objects, each with a `name` and a `high_score` key.
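
One possible way to read this format in Go is with the standard library's `encoding/json` package. The sketch below is illustrative only: `ScoreRecord` and `parseJSONArray` are hypothetical names, and storing the score as an `int32` is an assumption borrowed from the binary format's 32-bit scores. A real program would probably read the bytes from the file with `os.ReadFile` rather than use an inline string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScoreRecord is a hypothetical type; the struct tags map the JSON keys
// "name" and "high_score" onto its fields.
type ScoreRecord struct {
	Name      string `json:"name"`
	HighScore int32  `json:"high_score"`
}

// parseJSONArray decodes a JSON array of objects into a slice of records.
func parseJSONArray(data []byte) ([]ScoreRecord, error) {
	var records []ScoreRecord
	if err := json.Unmarshal(data, &records); err != nil {
		return nil, fmt.Errorf("decoding JSON array: %w", err)
	}
	return records, nil
}

func main() {
	// Inline sample in the same shape as json.txt.
	input := []byte(`[
		{"name": "Aya", "high_score": 10},
		{"name": "Charlie", "high_score": -1}
	]`)
	records, err := parseJSONArray(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range records {
		fmt.Printf("%s scored %d\n", r.Name, r.HighScore)
	}
}
```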

### Repeated JSON

`repeated-json.txt` contains lines of data stored in JSON format. Each line contains exactly one record, stored as an object. Lines starting with `#` are comments and should be ignored.
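
Because the file as a whole is not valid JSON, a standard JSON parser cannot read it in one call. A minimal sketch of one approach is to scan it line by line, skip comments, and decode each remaining line on its own. The names `ScoreRecord` and `parseRepeatedJSON` are hypothetical, and skipping blank lines is an assumption that goes beyond the format description.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// ScoreRecord is a hypothetical type for one player's entry.
type ScoreRecord struct {
	Name      string `json:"name"`
	HighScore int32  `json:"high_score"`
}

// parseRepeatedJSON decodes one JSON object per line, ignoring lines that
// start with '#'. Blank lines are also skipped (an assumption).
func parseRepeatedJSON(data string) ([]ScoreRecord, error) {
	var records []ScoreRecord
	scanner := bufio.NewScanner(strings.NewReader(data))
	for lineNo := 1; scanner.Scan(); lineNo++ {
		line := scanner.Text()
		if strings.HasPrefix(line, "#") || strings.TrimSpace(line) == "" {
			continue
		}
		var r ScoreRecord
		if err := json.Unmarshal([]byte(line), &r); err != nil {
			return nil, fmt.Errorf("line %d: %w", lineNo, err)
		}
		records = append(records, r)
	}
	if err := scanner.Err(); err != nil {
		return nil, fmt.Errorf("reading input: %w", err)
	}
	return records, nil
}

func main() {
	input := "# a comment line\n" +
		`{"name": "Aya", "high_score": 10}` + "\n" +
		"# another comment\n" +
		`{"name": "Charlie", "high_score": -1}` + "\n"
	records, err := parseRepeatedJSON(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range records {
		fmt.Printf("%s scored %d\n", r.Name, r.HighScore)
	}
}
```

Reporting the line number in the error makes it easier to find a malformed record in a long file.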

### CSV

`data.csv` is a standard CSV file. The format is well documented online, and many libraries support parsing it.
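
In Go, the standard library's `encoding/csv` package is one such library. The sketch below assumes that the first row is a header (as it is in the example file in this commit) and that each following row holds a name and then an integer score. `ScoreRecord` and `parseCSV` are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// ScoreRecord is a hypothetical type for one player's entry.
type ScoreRecord struct {
	Name      string
	HighScore int32
}

// parseCSV reads all rows, skips the header row, and converts the second
// column of each remaining row to a 32-bit integer score.
func parseCSV(data string) ([]ScoreRecord, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, fmt.Errorf("reading CSV: %w", err)
	}
	if len(rows) == 0 {
		return nil, errors.New("empty CSV: missing header row")
	}
	var records []ScoreRecord
	for i, row := range rows[1:] {
		if len(row) < 2 {
			return nil, fmt.Errorf("record %d: expected 2 fields, got %d", i+1, len(row))
		}
		score, err := strconv.ParseInt(row[1], 10, 32)
		if err != nil {
			return nil, fmt.Errorf("record %d: parsing score %q: %w", i+1, row[1], err)
		}
		records = append(records, ScoreRecord{Name: row[0], HighScore: int32(score)})
	}
	return records, nil
}

func main() {
	input := "name,high score\nAya,10\nCharlie,-1\n"
	records, err := parseCSV(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range records {
		fmt.Printf("%s scored %d\n", r.Name, r.HighScore)
	}
}
```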

### Custom Binary

There are two files in a custom binary serialisation format. The format is as follows:
* The first two bytes of the file indicate the endianness of numbers. If the bytes are `FE FF`, numbers in the file are stored in big-endian byte order. If the bytes are `FF FE`, numbers are stored in little-endian byte order.
* Each record consists of exactly four bytes representing the score as a signed 32-bit integer in the byte order described above, followed by the player's name in UTF-8 (which may not contain a null character), followed by a null terminating byte.
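
The two rules above can be sketched in Go roughly as follows, using the standard library's `encoding/binary` package to handle byte order. This is one possible approach under stated assumptions, not a reference solution: `ScoreRecord` and `parseCustomBinary` are hypothetical names, the whole file is assumed to fit in memory, and the sample bytes in `main` are hand-written for the example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// ScoreRecord is a hypothetical type for one player's entry.
type ScoreRecord struct {
	Name      string
	HighScore int32
}

// parseCustomBinary reads the two-byte byte-order marker, then repeatedly
// reads a 4-byte signed score followed by a null-terminated UTF-8 name.
func parseCustomBinary(data []byte) ([]ScoreRecord, error) {
	if len(data) < 2 {
		return nil, errors.New("input too short for byte-order marker")
	}
	var order binary.ByteOrder
	switch {
	case data[0] == 0xFE && data[1] == 0xFF:
		order = binary.BigEndian
	case data[0] == 0xFF && data[1] == 0xFE:
		order = binary.LittleEndian
	default:
		return nil, fmt.Errorf("unknown byte-order marker % x", data[:2])
	}

	rest := data[2:]
	var records []ScoreRecord
	for len(rest) > 0 {
		if len(rest) < 4 {
			return nil, errors.New("truncated score")
		}
		// Converting the unsigned value to int32 reinterprets it as signed.
		score := int32(order.Uint32(rest[:4]))
		rest = rest[4:]

		end := bytes.IndexByte(rest, 0)
		if end < 0 {
			return nil, errors.New("name is missing its null terminator")
		}
		records = append(records, ScoreRecord{Name: string(rest[:end]), HighScore: score})
		rest = rest[end+1:]
	}
	return records, nil
}

func main() {
	// Hand-written big-endian sample: marker FE FF, score 10, name "Aya".
	sample := []byte{0xFE, 0xFF, 0x00, 0x00, 0x00, 0x0A, 'A', 'y', 'a', 0x00}
	records, err := parseCustomBinary(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range records {
		fmt.Printf("%s scored %d\n", r.Name, r.HighScore)
	}
}
```

Selecting a `binary.ByteOrder` value once, up front, lets the rest of the parser stay identical for both files.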

The tool `od` can be useful for exploring binary data. For instance, we can run:

```console
% od -t x1 projects/file-parsing/examples/custom-binary-le.bin
0000000 ff fe 0a 00 00 00 41 79 61 00 1e 00 00 00 50 72
0000020 69 73 68 61 00 ff ff ff ff 43 68 61 72 6c 69 65
0000040 00 19 00 00 00 4d 61 72 67 6f 74 00
0000054
```

This prints each byte of the file as a two-digit hexadecimal value. The number at the start of each line is the offset into the file, which `od` prints in octal by default.

We can see in this example that the first byte is `ff` and the second is `fe`. According to our file format specification, that means the numbers here are stored in little-endian byte order.

We can see that the next four bytes contain `0a` then three `00`s (the score 10 in little-endian order), followed by three non-null bytes (`41 79 61`, the UTF-8 encoding of "Aya") and then a null byte.
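
To check this reading of the dump, here is a small Go sketch that decodes the first record's bytes by hand, and also the four `ff` bytes that appear before "Charlie". The helper name `decodeScoreLE` is hypothetical. The second example shows why the score must be treated as signed: the same four bytes read as an unsigned number give a very different value.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeScoreLE interprets the first four bytes of b as a little-endian
// signed 32-bit integer. It panics if b is shorter than four bytes.
func decodeScoreLE(b []byte) int32 {
	return int32(binary.LittleEndian.Uint32(b))
}

func main() {
	// The eight bytes after the ff fe marker: a score, a name, a null byte.
	record := []byte{0x0a, 0x00, 0x00, 0x00, 0x41, 0x79, 0x61, 0x00}
	score := decodeScoreLE(record[:4])
	name := string(record[4 : len(record)-1])
	fmt.Println(score, name) // prints "10 Aya"

	// The score bytes that precede "Charlie" in the dump.
	charlie := []byte{0xff, 0xff, 0xff, 0xff}
	fmt.Println(binary.LittleEndian.Uint32(charlie)) // prints "4294967295"
	fmt.Println(decodeScoreLE(charlie))              // prints "-1"
}
```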
44 Bytes
Binary file not shown.
44 Bytes
Binary file not shown.
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
name,high score
Aya,10
Prisha,30
Charlie,-1
Margot,25
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
[
{"name": "Aya", "high_score": 10},
{"name": "Prisha", "high_score": 30},
{"name": "Charlie", "high_score": -1},
{"name": "Margot", "high_score": 25}
]
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
# This file contains lines of data stored in JSON format. Each line contains exactly one record stored as an object.
# Lines starting with # are comments and should be ignored.
{"name": "Aya", "high_score": 10}
{"name": "Prisha", "high_score": 30}
# Charlie didn't do fantastically :(
{"name": "Charlie", "high_score": -1}
{"name": "Margot", "high_score": 25}
