-
Notifications
You must be signed in to change notification settings - Fork 2
New datasets for marketing use cases and Mathesar inclusion #20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
008df48
to
cdffb28
Compare
21ed3dc
to
219368e
Compare
9ea35a0
to
af9378c
Compare
af9378c
to
0ff30db
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: I did not review:
- The python code, since it's not that relevant in this use-case
- The actual data in the SQL files.
Given our time constraints, and the size/scope of this PR, I instead restricted my attention to the data schemas.
The most important concern I have is that you cannot use DROP SCHEMA ... CASCADE;
in scripts being run against user databases. I suppose they're okay here in the data playground, but it makes me nervous that we'll forget one when copying these over.
Please use
id INTEGER PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
for id
columns rather than the current definition. (Note that it's generated by default, not always)
Please use Mathesar's custom types where possible. Also, consider adding a Website
column (or similar) somewhere, to use the mathesar_types.uri
type.
When dumping these data sets, please use the --inserts
flag to dump as INSERT
statements rather than COPY
. While it is possible to wire these up in python as COPY
statements, it doesn't work in the form output by pg_dump
(i.e., the form in these files). INSERT
statements are also more flexible in general, if we use these data sets in other places.
We should not encourage using the WITH TIME ZONE
date/time types so much. They don't do what you think. I feel like I keep having the TZ conversation... (though it's the first time with you, @zackkrida , haha). While we could have a column or two that demonstrates the behavior, the TZs are used all over these data sets.
You need to reset all sequences for all tables after inserting data. I'm getting insert errors on these data sets when adding rows in Mathesar:
UniqueViolation: duplicate key value violates unique constraint "Transactions_pkey" DETAIL: Key (id)=(5) already exists. CONTEXT: SQL statement " WITH insert_cte AS (INSERT INTO "Hardware Store"."Transactions" (asset_id, customer_id, transaction_type, transaction_date, total_charge, note) VALUES ('7', '7', 'Sale', '2025-01-17 15:00:00 +08:00', NULL, NULL) RETURNING id) SELECT * FROM insert_cte " PL/pgSQL function msar.add_record_to_table(oid,jsonb,boolean,jsonb) line 17 at EXECUTE
Finally, if we're getting rid of the Library Management
data set (I don't think we should at this point), we should increase the scale of one of these data sets to at least show how Mathesar deals with pagination, etc.
I made many more small notes throughout.
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, | ||
first_name TEXT NOT NULL, | ||
last_name TEXT NOT NULL, | ||
email TEXT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the mathesar_types.email
type here.
SET search_path = "Bike Shop"; | ||
|
||
CREATE TABLE "Customers" ( | ||
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest using this definition to align with our standard id
columns.
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, | |
id INTEGER PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, |
equipment_id BIGINT REFERENCES "Equipment" (id), | ||
mechanic_id BIGINT REFERENCES "Mechanics" (id), | ||
request_description TEXT NOT NULL, | ||
cost NUMERIC(10, 2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cost NUMERIC(10, 2), | |
cost mathesar_types.mathesar_money, |
@@ -0,0 +1,53 @@ | |||
DROP SCHEMA IF EXISTS "Bike Shop" CASCADE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a DROP SCHEMA ... CASCADE;
statement in these scripts in the actual app can (and absolutely will, given enough users) lead to user data loss.
@@ -0,0 +1,56 @@ | |||
DROP SCHEMA IF EXISTS "Hardware Store" CASCADE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No schema dropping please.
CREATE TABLE "Patrons" ( | ||
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, | ||
name TEXT NOT NULL, | ||
email TEXT UNIQUE NOT NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mathesar_types.email
, por favor.
FROM "Equipment Training" | ||
WHERE "Equipment Training".patron_id = NEW.patron_id | ||
AND "Equipment Training".equipment_id = NEW.equipment_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only works if the INSERT
query is run with the search_path
set to "Library Makerspace"
. For example, if you're in the public
schema and try
INSERT INTO "Library Makerspace"."Jobs" ...
breakage occurs. This is particularly bad in Mathesar, since we pretty much always operate in that fashion.
CREATE TABLE "Item_Collections" ( | ||
item_id BIGINT NOT NULL REFERENCES "Items" (id) ON DELETE CASCADE, | ||
collection_id BIGINT NOT NULL REFERENCES "Collections" (id) ON DELETE CASCADE, | ||
PRIMARY KEY (item_id, collection_id) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't currently support tables without a single-column primary key. We should, therefore, avoid that situation in demo data.
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, | ||
name TEXT NOT NULL, | ||
description TEXT, | ||
amount NUMERIC(12, 2) NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mathesar_types.mathesar_money
please.
allocated_amount NUMERIC(12, 2) NOT NULL, | ||
spent_amount NUMERIC(12, 2) DEFAULT 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mathesar_types.mathesar_money
@zackkrida I talked to @mathemancer and he's going to take over the remaining work here. |
This pull request introduces six sample datasets under the
beta_use_cases
directory. These datasets were created to support marketing copy and produce screenshots for an upcoming redesign of the https://mathesar.org site.Testing instructions
To test this PR, I would recommend looking at the README markdown previews in GitHub or an editor like VSCode to get a feel for each schema via its corresponding mermaid ER diagram, and then loading them into a local mathesar instance. Here's how you could load them all quickly after
cd beta_use_cases
locally:The python code here to generate data is pretty inconsequential. I wrote one and copy-pasted for the rest. I think it's fine but I'm not someone who writes Python daily so suggestions for best, idiomatic practices are welcomed.