Releases: mitdbg/palimpzest
1.5.3: Added policy parametrization from kwargs (#301)
* Added policy parametrization from kwargs
* bump version
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
1.5.2: Bugfix: mock method to increment costs (#298)
* Added mock method to increment costs
* bump version
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
1.5.1: Security Patch for LiteLLM (#296)
* Fix broken dependencies (#227)
* Move DataRecord Internal Fields to Have Leading Underscore (#229)
* update README
* 1. support add_columns in Dataset; 2. support run().to_df(); 3. add demo in df-newinterface.py (#78)
* Support add_columns in Dataset. Support demo in df-newinterface.py
Currently we have to do:
```
records, _ = qr3.run()
outputDf = DataRecord.to_df(records)
```
I'll try to make qr3.run().to_df() work in another PR.
* ruff check --fix
* Support run().to_df()
Update run() to return a DataRecordCollection, which will make it easier to support more features on run()'s output.
We support to_df() in this change.
I'll send follow-up commits to update the other demos.
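The wrapper described above can be sketched as follows. This is a hypothetical stand-in, not the real palimpzest `DataRecordCollection`: `to_df()` here returns plain dicts instead of a pandas DataFrame to keep the sketch dependency-free, and the constructor/field layout are illustrative only.

```python
# Minimal stand-in for the DataRecordCollection wrapper described above.
# Hypothetical sketch: the real class wraps execution output and exposes
# to_df(); here to_df() returns a list of dicts in place of a DataFrame.

class DataRecordCollection:
    def __init__(self, records):
        self._records = records  # each record is a field -> value mapping

    def to_df(self, project_cols=None):
        # Optionally project down to a subset of columns.
        if project_cols is None:
            return [dict(r) for r in self._records]
        return [{c: r[c] for c in project_cols} for r in self._records]


records = [{"sender": "a@x.com", "subject": "hi"},
           {"sender": "b@y.com", "subject": "yo"}]
rows = DataRecordCollection(records).to_df(project_cols=["sender"])
```

With this shape, `qr3.run()` can hand back the collection and callers chain `.to_df()` directly.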
* ruff check --fix
* fix typo in DataRecordCollection
* Update records.py
* fix tiny bug in mab processor.
The code runs into an issue if this function returns no stats in:
```
max_quality_record_set = self.pick_highest_quality_output(all_source_record_sets)
if (
    not prev_logical_op_is_filter
    or (
        prev_logical_op_is_filter
        and max_quality_record_set.record_op_stats[0].passed_operator
    )
):
```
* update record.to_df interface
Updated to record.to_df(records: list[DataRecord], project_cols: list[str] | None = None), which is consistent with the other functions in this class.
* Update demo for the new execute() output format
* better way to get plan from output.run()
* fix getting plan from DataRecordCollection.
People used to get the plan from the streaming processor's execute(), which is not good practice.
I renamed plan_str to plan_stats, and callers now need to get the physical plan from the processor.
Consider better ways to provide the executed physical plan to DataRecordCollection, possibly from stats.
* Update df-newinterface.py
* update code based on comments from Matt.
1. add cardinality param in add_columns
2. remove extra testdata files
3. add __iter__ in DataRecordCollection to help iterate over streaming output.
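The `__iter__` addition in item 3 can be sketched like this. The class name and record shape are illustrative stand-ins, not the real palimpzest internals; the point is that implementing `__iter__` lets callers loop over streaming output directly.

```python
# Hypothetical sketch of the __iter__ addition described above: a
# DataRecordCollection-like wrapper becomes directly iterable, enabling
# `for record in output:` style loops over streaming results.

class StreamingCollection:
    def __init__(self, records):
        self._records = records

    def __iter__(self):
        # Yield records one at a time so callers can consume streaming output.
        yield from self._records


seen = [record for record in StreamingCollection([1, 2, 3])]
```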
* see if copilot just saved me 20 minutes
* fix package name
* use sed to get version from pyproject.toml
* bump project version; keep docs behind to test ci pipeline
* bumping docs version to match code version
* use new __iter__ method in demos where possible
* add type hint for output of __iter__; use __iter__ in unit tests
* Update download-testdata.sh (#89)
Added enron-tiny.csv
* Clean up the retrieve API (#79)
* Clean up the retrieve operator interface
* fix comments
* Update to the new to_df() API
* Code update for https://github.com/mitdbg/palimpzest/issues/84 (#101)
* Create chat.rst (#96)
* Create chat.rst
* Update pyproject.toml
Hotfix for chat
* Update conf.py
Hotfix for chat.rst
* code update for https://github.com/mitdbg/palimpzest/issues/84
This implementation basically resolves https://github.com/mitdbg/palimpzest/issues/84.
One detail differs from the proposal in #84:
.add_columns(
cols=[
{"name": "sender", "type": "string", "udf": compute_sender},
...
]
)
If add_columns() took cols, udf, and types as separate params, it would make this function confusing again. Instead, if users need to specify different udfs for different columns, they should simply call add_columns() once per column.
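The one-call-per-column design above can be illustrated with a small stub. `Dataset`, the `add_columns(name, udf)` signature, and `compute_sender` here are hypothetical stand-ins chosen for the sketch, not the actual palimpzest API.

```python
# Hypothetical stub illustrating the design above: rather than one
# add_columns(cols=[{...}, ...]) call mixing names, types, and udfs,
# each column gets its own add_columns() call, and calls chain.

class Dataset:
    def __init__(self):
        self.columns = []

    def add_columns(self, name, udf):
        self.columns.append((name, udf))
        return self  # return self so calls chain fluently


def compute_sender(record):
    # Stand-in udf: extract the sender field from a record dict.
    return record.get("from")


ds = Dataset().add_columns("sender", compute_sender).add_columns("subject", len)
```

Each column's name and udf stay together in one call, which avoids the parallel-lists confusion the note describes.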
* changed types to make use of Python type system; updated use of types in tests; updated docs and README
* update test to match no longer allowing None default
---------
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* Skip an operator if it is a duplicate op instead of raising an error (#102)
* Create chat.rst (#96)
* Create chat.rst
* Update pyproject.toml
Hotfix for chat
* Update conf.py
Hotfix for chat.rst
* Skip an operator when it doesn't need any logical op, instead of raising an error
# Final Effects
1. Dataset() init now has only one responsibility: wrap a data source in a Dataset. I think this is a better interface.
2. No extra convert() will be added to the plan.
3. When users add the same op multiple times, e.g. dataset.convert(File).convert(File), the system will dedup the repeated op instead of raising an error.
# Issue
Currently, Dataset(src, schema) initialization has two responsibilities:
1. read the source
2. convert the source to the schema.
When we use a default schema for Dataset init(source, schema=DefaultSchema), the code works like this:
1. Read the source into the schema the DataSource provides. This schema is derived by the system, so users don't know it (and don't need to).
2. Convert the source schema to DefaultSchema.
So every time, the system makes one extra convert call to convert SourceSchema to DefaultSchema, which is definitely wrong.
# Solution
1. We use the schema from the DataSource if it exists, which is reasonable.
2. Given 1, we get a dataset node that performs no actual op since its input_schema == output_schema, so I updated a line in the optimizer to skip such a node instead of raising an error.
# Real Examples
## Before
Generated plan:
0. MarshalAndScanDataOp -> PDFFile
1. PDFFile -> LLMConvertBonded -> DefaultSchema
(contents, filename, text_conte) -> (value)
Model: Model.GPT_4o
Prompt Strategy: PromptStrategy.COT_QA
2. DefaultSchema -> MixtureOfAgentsConvert -> ScientificPaper
(value) -> (contents, filename, paper_auth)
Prompt Strategy: None
Proposer Models: [GPT_4o]
Temperatures: [0.0]
Aggregator Model: Model.GPT_4o
Proposer Prompt Strategy: chain-of-thought-mixture-of-agents-proposer
Aggregator Prompt Strategy: chain-of-thought-mixture-of-agents-aggregation
3. ScientificPaper -> LLMFilter -> ScientificPaper
(contents, filename, paper_auth) -> (contents, filename, paper_auth)
Model: Model.GPT_4o
Filter: The paper mentions phosphorylation of Exo1
4. ScientificPaper -> MixtureOfAgentsConvert -> Reference
(contents, filename, paper_auth) -> (reference_first_author, refere)
Prompt Strategy: None
Proposer Models: [GPT_4o]
Temperatures: [0.8]
Aggregator Model: Model.GPT_4o
Proposer Prompt Strategy: chain-of-thought-mixture-of-agents-proposer
Aggregator Prompt Strategy: chain-of-thought-mixture-of-agents-aggregation
## After
Generated plan:
0. MarshalAndScanDataOp -> PDFFile
1. PDFFile -> LLMConvertBonded -> ScientificPaper
(contents, filename, text_conte) -> (contents, filename, paper_auth)
Model: Model.GPT_4o
Prompt Strategy: PromptStrategy.COT_QA
2. ScientificPaper -> LLMFilter -> ScientificPaper
(contents, filename, paper_auth) -> (contents, filename, paper_auth)
Model: Model.GPT_4o
Filter: The paper mentions phosphorylation of Exo1
3. ScientificPaper -> MixtureOfAgentsConvert -> Reference
(contents, filename, paper_auth) -> (reference_first_author, refere)
Prompt Strategy: None
Proposer Models: [GPT_4o]
Temperatures: [0.8]
Aggregator Model: Model.GPT_4o
Proposer Prompt Strategy: chain-of-thought-mixture-of-agents-proposer
Aggregator Prompt Strategy: chain-of-thought-mixture-of-agents-aggregation
* make equality check for new field names a bit more explicit
* fix fixture usage
* update all plans within code base to explicitly convert when needed; and removed unnecessary schemas for reading from datasource
---------
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* Refactor demos to use .sem_add_columns or .add_columns instead of convert(), remove Schema from demos when possible. (#104)
* Create chat.rst (#96)
* Create chat.rst
* Update pyproject.toml
Hotfix for chat
* Update conf.py
Hotfix for chat.rst
* code update for https://github.com/mitdbg/palimpzest/issues/84
This implementation basically resolves https://github.com/mitdbg/palimpzest/issues/84.
One detail differs from the proposal in #84:
.add_columns(
cols=[
{"name": "sender", "type": "string", "udf": compute_sender},
...
]
)
If add_columns() took cols, udf, and types as separate params, it would make this function confusing again. Instead, if users need to specify different udfs for different columns, they should simply call add_columns() once per column.
* use field_values instead of field_types, since field_values holds the actual key-value pairs, while field_types only contains fields and their types.
records[0].schema is the schema of the output, which doesn't mean we have already populated the schema into the record.
* Remove .convert() and use .sem_add_columns or .add_columns instead
This change is based on #101 and #102; please review those first, then this change.
1. This refactors all demos to use .sem_add_columns or .add_columns, and removes .convert().
2. Remove Schema from demos, except for demos using ValidationDataSource and dataset.retrieve(), which still need schemas for now. We can refactor those cases later.
* ruff check --fix
* fix unittest
* demos fixed and unit tests running
* fix add_columns --> sem_add_columns in demo
* update quickstart to reflect code changes; shorten text as much as possible
* passing unit tests
* remove convert() everywhere
* fixes to correct errors in demos; update quickstart and docs
---------
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* Simplify Datasource (#103)
## Summary of PR changes
**Note 1:** I did not change anything related to val_datasource (including tangential functions like Dataset._set_data_source()) as that will all be modified in a subsequent PR to reflect our discussion re: validation data.
**Note 2:** I have completely commented out datamanager.py and config.py; for now I am willing to leave the code around in case we desperately need it for PalimpChat. However, my hope is that PalimpChat can be tweaked to work without the data manager and those files can be deleted before merging dev into main
**Note 3:** Despite the branch name, fixing the progress managers will be part of a separate PR.
- Collapsed all four `DataSource` classes down to a single `DataReader` class
- Limit the number of methods the user needs to implement to just `__len__()` and `__getitem__()`
- (Switched from `get_item()` to `__getitem__()` in `DataReader`)
- Provided `DataReader` directly to scan operators (also renamed `DataSourcePhysicalOp --> ScanPhysicalOp`)
- Removed `DataDirectory()` from `src/` entirely; this included commenting out things which made use of the cache (e.g. caching computed `DataRecords` and codegen examples)
- Got rid of `dataset_id` everywhere (which tracks with the previous bullet)
- Removed the `Config` class which was a relic of a bygone era (and also intertwined with the `DataDirectory()`)
- Updated all demos to use `import palimpzest as pz` to make the import statement(s) more welcoming
- Fixed one bug resulting from converts now producing union schemas. Instead of including the `output_schema` in an operators' `get_id_params()` we simply report the `generated_fields`.
- Changed `source_id --> source_idx` everywhere (this eliminated some weird renaming logic)
- Finally, I added a large set of documentation for the DataSource class(es)
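The two-method surface described above (`__len__()` and `__getitem__()`) can be sketched with a plain class. This is an illustration of the protocol under stated assumptions, not the actual palimpzest `DataReader` base class; the returned record shape and the `source_idx` field are illustrative.

```python
# Sketch of the minimal reader surface described above: a reader only
# needs __len__() and __getitem__(). Plain-Python illustration of the
# protocol; not the real DataReader base class.

class ListReader:
    def __init__(self, items):
        self._items = items

    def __len__(self):
        # Number of source items available to scan.
        return len(self._items)

    def __getitem__(self, idx):
        # idx plays the role of the source_idx mentioned above.
        return {"contents": self._items[idx], "source_idx": idx}


reader = ListReader(["a", "b"])
```

A scan operator can then drive any such reader with nothing more than indexed access.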
* Multi-LLM Refinement Pipeline for Query Output Validation (#118)
* Multi-LLM Refinement Pipeline for Query Output Validation (#92)
## Summary of PR
This PR contains the work to add a new `CriticConvert` physical operator to PZ. At a high-level, this operator runs a bonded convert, and then asks a critic model if the answer produced by the bonded convert can be improved upon. The original output and the critique are then fed into a refinement model, which produces the improved output.
The work to implement this includes:
1. Defining the physical operator in `src/palimpzest/query/operators/critique_and_refine_convert.py`
2. Adding an implementation rule for this physical operator in `src/palimpzest/query/optimizer/rules.py`
3. Adding boolean flag(s) to enable allowing / disallowing this physical optimization
4. Adding base prompts for the critique and refinement generations
One other change which this work spawned was an attempt to improve the management and construction of our prompts -- and to decouple this logic from the `BaseGenerator` class. On the management side, I split our single `prompts.py` file into a set of files. On the construction side, I created a `PromptFactory` class which templates prompts based on the `prompt_strategy` and input record.
The `PromptFactory` is not a perfect solution, but I think it is a step in the right direction.
Finally, I fixed an error which previously filtered out `RAGConvert` operators from being considered by the `Optimizer`, and I made 2-3 more miscellaneous small tweaks.
---------
Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>
* MkDocs Site for Palimpzest API Documentation (#116)
## Summary of PR Changes
1. Changed `docs` to use [MkDocs](https://www.mkdocs.org/) instead of Sphinx
2. Created initial `Getting Started` content
3. Created placeholders for `User Guide` content (to follow in a subsequent PR)
4. Added autogenerated docs for our most user-facing code (we will need to add docstrings to our code in a subsequent PR)
5. Made small tweaks to `src/` to allow users to specify policy using kwargs in `.run()`
6. Renamed the `testdata/enron-tiny/` files so that they're not so damn weird
---------
Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>
* remove registration of sources from CI; only check version bump if there is a code change
* remove filter for only checking version bump when src files changed
* Rename `nocache` --> `cache` everywhere (#128)
* first commit
* Removed myenv
* added to git ignore
* addressed the comments in review
* flip one minor comment
* minor spacing fix
* fix spaces in a few more spots
---------
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-207-160.dyn.MIT.EDU>
Co-authored-by: muhamed <muhamed@mit.edu>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* adding citation (and making 'others' explicit) (#136)
* Make Generator thread-safe (#139)
* fix moa prompt
* fix moa prompt aggregator
* update version
* make generator thread-safe
* update generator to return messages
* address comments
* Begin Process of Improving Index Abstraction(s) in PZ (#138)
* quick and dirty implementation which tracks retrieve costs
* bug fixes and currently unused index code
* add default search func which I forgot to implement and add chromadb to pyproject.toml
* leaving TODO
* hotfix to add cost for retrieve operation
* another hotfix to add ragatouille dependency
* Add logger for PZ (#134)
* add logger for PZ
1. When verbose=True, we save all logs to the log file and print them to the console;
2. when verbose=False, we only save ERROR+ logs to the file and only print ERROR+.
I just added logging where I think it matters for execution; we can always add or remove more later.
I may also update the logging messages based on my later annotation work, but this PR sets up the logging mechanism for now.
* ruff fix
* update code based on comments
1. don't log output_records
2. don't log plan_stats
3. write the log files to ".pz_logs"
---------
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* fix merge bug (#141)
* ruff fix
* update log dir and fix tiny bug
* fix merge bug
* Use a singleton API client for operators (#140)
* fix moa prompt
* fix moa prompt aggregator
* update version
* make generator thread-safe
* update generator to return messages
* address comments
* create a singleton API client
* fix linting
* fix logging in generators
* also create parent dir. if missing
* CUAD benchmark (#143)
* fix moa prompt
* fix moa prompt aggregator
* update version
* make generator thread-safe
* update generator to return messages
* address comments
* create a singleton API client
* fix linting
* fix logging in generators
* fix CUAD benchmark
* fix type
* minor fixes
* Limit the Scope of Logging within the Optimizer (#144)
* making it possible to set log level based on env. variable; adding time limit on seven filters test
* deleting instead of commenting out
* Remove Conventional LLM Convert; Update Bonded LLM Convert retry logic (#145)
* use NullHandler in __init__ and let application control logging config (#146)
* use NullHandler in __init__ and let application control logging config
* ruff fix
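The NullHandler pattern named above is the approach the standard-library logging docs recommend for libraries: the package attaches a `NullHandler` so importing it emits no handler warnings, and the application decides whether and how to configure output. The logger name below mirrors the package name and is the only assumption here.

```python
# NullHandler pattern from PR #146: the library stays silent by default
# and leaves logging configuration entirely to the application.
import logging

# Done once in the library's __init__ (logger name is the package name):
logging.getLogger("palimpzest").addHandler(logging.NullHandler())

# The application then opts in to output on its own terms, e.g.:
logging.basicConfig(level=logging.INFO)
```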
* Fix Progress Manager and Simplify `execute_plan` methods (#148)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* address comments
* The great deletion (#149)
* Adding Preliminary Work on Abacus and MAB Sentinel Execution (#147)
* updating models to avoid llama3
* fix parsing bugs and some generation errors
* don't require json for proposer and code synth generations; fix prompt format instruction for proposers
* fix typo/bug
* fix bugs in generator prep for field_answers; fix bug in filter impl.; other improvements
* adding new file for abacus workload
* fix len
* fix errors with dataset copy; prompt construction; and more
* remove JSON instruction from MOA proposer
* fixed bugs in optimizer configuration, llama 3.3 generation, and filter generation
* clean up demos; fix missing base prompt from map
* add one more missing base prompt
* prepare demo for full run; get embedding cost info from RAGConvert; use reasoning output from Critique
* add script to generate text-embedding-3-small reaction embeddings
* write to .chroma
* run full scale generation
* compute embeddings slowly and add progress bar
* add sleep
* fix import
* add total iters
* create embeddings before ingesting
* fix index start and finish
* load embeddings and insert directly
* make chroma use cosine sim.; finish initial search fcn. for biodex workload; naming tweak in rag convert
* capturing gen stats in Retrieve
* added UDF map operator; rewrote biodex pipeline to match docetl impl.; switched to using __name__ for functions instead of str()
* add optimizations back in
* write data to csv in demo
* limit to same model choice(s) as docetl and lotus
* fix punctuation error(s)
* try run without filter
* remove unused demo file
* remove print
* remove prints
* remove costed_phys_op_ids which were used for debugging
* try slightly diff. approach
* remove temp changes while branch is in PR review
* remove depends_on for map
* fix iteration bug in sentinel processors
* one more hotfix
* fix more errors w/SentinelPlanStats and sentinel processors
* remove logger lib to reduce confusion (#159)
* Update research.md (#160)
AISD @ NAACL 2025
* Add Pneuma-Palimpzest Integration Demo (#158)
* Add Pneuma demo
* Remove dataset semantic column addition
* Fix progress managers episode 2 attack of the clones (#156)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* initial work to refactor sentinel processors
* passing unit tests
* checking in minor changes
* remove use of setup_logger inside library
* stuff seems to be working
* big print
* turn off rag for test
* try debugging exception
* checking in code before changes to scoring
* finished initial refactoring of mab sentinel execution strategy
* get random sampling execution working with changes
* passing unit tests
* nosentinel progress looks good
* eyeball test is working for progress bars
* remove the old gods
* revert small change
* pull up progress manager logic in parallel execution
* catch errors in generating embeddings
* fix comments
* Merging in Changes for Sentinel Progress Bars; Split Convert (off by default); `demos/enron-demo.py`; and MMQA Benchmark (#163)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* initial work to refactor sentinel processors
* passing unit tests
* checking in minor changes
* remove use of setup_logger inside library
* stuff seems to be working
* big print
* turn off rag for test
* try debugging exception
* checking in code before changes to scoring
* finished initial refactoring of mab sentinel execution strategy
* get random sampling execution working with changes
* passing unit tests
* nosentinel progress looks good
* eyeball test is working for progress bars
* remove the old gods
* revert small change
* pull up progress manager logic in parallel execution
* adding prints to generator; turn progress off in favor of verbose for now
* catch errors in generating embeddings
* inspect frontier updates
* remove args.workload
* fix num_inputs in selectivity computation
* pdb in score
* fixed score fn issue
* use execution cache to avoid unnecessary computation; use sentinel stats for updating frontier
* fix progress counter
* debug
* fix empty stats
* only count stats from newly computed results
* fix tuple unpacking
* only update sample counts for llm ops
* de-dup duplicate record
* ugh
* dont forget to increment
* plz
* more plz
* increment
* recycle ops back onto reservoir so they may be reconsidered in the future
* remove pdb
* add progress to script args
* try without rag
* use term recall
* just check in on term recall
* make it easier to turn off progress
* remove pdb
* try to get re-rank to keep all inputs
* try to generate more reactions
* track total LLM calls
* 10x parallelism
* try retrieve directly on fulltext
* up max workers
* adding enron-demo w/optimization
* remove config option
* adding recall and precision to output
* allow operators to be recycled back onto frontier
* revert to using reactions instead of fulltext for similarity
* better cycling of off-frontier operators
* safety check on reservoir ops
* remove pdb
* fixing 5 results per query
* investigate sampling behavior
* check on seeds
* remove pdb
* test SplitConvert
* debug chunking
* fix bug in rag and split convert
* run with chunks
* test chunking logic
* fix chunking logic
* sum list
* remove split merge for now
* minor fixes to CUAD script
* add embedding scripts for mmqa tables and image titles
* address issue with empty titles and title collisions
* prepare script for using clip embeddings for images
* fix bug
* get full space of possible extensions
* debug
* weird bug fix?
* more debug
* fix idiotic mistake
* handle corrupted images and minor things
* add another corrupted image
* another one
* anotha
* more bad images
* last disallow file
* prepare cuad for runs
* specify execution strategy
* up samples
* add sentinel execution strategy to output name
* adding plan str and more stats
* specify no prior
* verbose=False
* fix comment; comment out prints
* make split merge optional for now
* addressing comments
* applying syntax changes to pneuma demo and supporting strings within retrieve
* bump version; fix lint; fix docs
* more docs tweaks; tweaking dependencies
* fix install issues
* one more version fix
* one more version fix
* one more version fix
* one more version fix
* last try
* change runner python version
* actually changing runner python version
* increase time limit for runners
* increase time limit for runners
* Merge in Changes From Final Abacus Work (WIP) (#173)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* initial work to refactor sentinel processors
* passing unit tests
* checking in minor changes
* remove use of setup_logger inside library
* stuff seems to be working
* big print
* turn off rag for test
* try debugging exception
* checking in code before changes to scoring
* finished initial refactoring of mab sentinel execution strategy
* get random sampling execution working with changes
* passing unit tests
* nosentinel progress looks good
* eyeball test is working for progress bars
* remove the old gods
* revert small change
* pull up progress manager logic in parallel execution
* adding prints to generator; turn progress off in favor of verbose for now
* catch errors in generating embeddings
* inspect frontier updates
* remove args.workload
* fix num_inputs in selectivity computation
* pdb in score
* fixed score fn issue
* use execution cache to avoid unnecessary computation; use sentinel stats for updating frontier
* fix progress counter
* debug
* fix empty stats
* only count stats from newly computed results
* fix tuple unpacking
* only update sample counts for llm ops
* de-dup duplicate record
* ugh
* dont forget to increment
* plz
* more plz
* increment
* recycle ops back onto reservoir so they may be reconsidered in the future
* remove pdb
* add progress to script args
* try without rag
* use term recall
* just check in on term recall
* make it easier to turn off progress
* remove pdb
* try to get re-rank to keep all inputs
* try to generate more reactions
* track total LLM calls
* 10x parallelism
* try retrieve directly on fulltext
* up max workers
* adding enron-demo w/optimization
* remove config option
* adding recall and precision to output
* allow operators to be recycled back onto frontier
* revert to using reactions instead of fulltext for similarity
* better cycling of off-frontier operators
* safety check on reservoir ops
* remove pdb
* fixing 5 results per query
* investigate sampling behavior
* check on seeds
* remove pdb
* test SplitConvert
* debug chunking
* fix bug in rag and split convert
* run with chunks
* test chunking logic
* fix chunking logic
* sum list
* remove split merge for now
* minor fixes to CUAD script
* add embedding scripts for mmqa tables and image titles
* address issue with empty titles and title collisions
* prepare script for using clip embeddings for images
* fix bug
* get full space of possible extensions
* debug
* weird bug fix?
* more debug
* fix idiotic mistake
* handle corrupted images and minor things
* add another corrupted image
* another one
* anotha
* more bad images
* last disallow file
* prepare cuad for runs
* specify execution strategy
* up samples
* add sentinel execution strategy to output name
* adding plan str and more stats
* specify no prior
* verbose=False
* fix comment; comment out prints
* make split merge optional for now
* addressing comments
* applying syntax changes to pneuma demo and supporting strings within retrieve
* add prints
* debug sample sets
* checking in code before tweaks to mab
* state of repo after running final Abacus experiments
* revert to opt-profiling-data
* removing print statement
* remove prints
* final fixes
* removing ragatouille dependency
* fix ruff lint checks
* bump version
* passing tests locally
* remove pdb
* fix complaint about match
* Move Abacus Research Scripts into Separate Folder (#175)
* re-organizing abacus research-related scripts
* fix model selection and other tweaks
* add data download script
* bump version
* remove scripts from root
* removing python files which were merged back in from main
* Fixed Issue(s) with Aggregate Operator Computation for Movie Queries (WIP) (#182)
* queries 1-4 working for movies
* removing RandomSampling
* Create `Context` Class + `compute` and `search` operators (#186)
* checking in changes
* refactored Dataset
* checking in
* checking in
* checking in
* queries extract final answer now
* checking in changes w/search operator
* adding changes to agents
* add isinstance checks to all executors
* removing script
* remove tools; include in future PR
* Remove `pz.Schema` in Favor of Using `pydantic.BaseModel` (#188)
* made changes throughout codebase and updated unit tests
* checking in; debugging failure with image use case
* simple demo / paper demos working
* eliminate caching features (#195)
* removing all code synthesis (#198)
* removing all code synthesis
* remove unused import
* Using LiteLLM to Manage Generator Clients / Completion APIs (#200)
* use LiteLLM for generators
* remove unused function; add TODO
* Added Anthropic Support; Simplified Rules; Removed Redundant Model Helpers (#202)
* changes after simplifying rules
* passing unit tests; removed unnecessary model helpers
* simplified primitives slightly
* fixing the assertion which used FieldInfo instead of FieldInfo.annotation (#204)
* add support for o4-mini, gemini-2.5-pro, gemini-2.0-flash, llama-4-maverick (#205)
* Adding Semantic Join Operator (#206)
* initial changes to support validator class; fixed bug in generator for images
* adding validator based optimization
* validator agent example working
* using o1 model; made validation more efficient
* added initial nested loops join implementation
* passing tests
* unit tests passing
* unit tests passing
* enron-demo.py working
* join demos in place
* parallel join and other bugfixes (#207)
* audio-demo (#208)
* remove pdb
* adding option to only use gemini models in audio demo
* adding parallelism; fixed bug w/unique_logical_op_id (#209)
* fixed issue which removed pipelined execution of operators in parallel setting (#210)
* Movie bugfixes (#211)
* fixed error in cost computation for gemini models; tested join on movie queries
* make join count monotonic
* removing progress bar updates for join for now
* adding reasoning effort (#212)
* made progress manager more efficient; made join op calculations accurate (#213)
* make groupby ignore None values
* make it possible to specify schema for MemoryDataset; reasoning model fixes
* adding audio-only match in substitution (#214)
* quick fix for audio prompt missing in MoA
* support passing in gemini/vertex credentials path; fix minor bugs in audio generation (#216)
* adding Distinct operator to PZ (#217)
* masking filepaths for sembench; fix audio pricing (#218)
* make GroupBySig a pz. import
* remove email demo
* reproduce abacus results
* add notes about deprecation to scripts for generating priors
* remove unsupported demos
* sem_add_columns -> sem_map
* Dev staging (#220)
* edit cuad abacus scripts to use loacl data
* edit cuad abacus scripts to use local data
* edit cuad abacus scripts to use local data
* fix: cuad data loader doesn't work via huggingface anymore (#215)
* edit cuad abacus scripts to use loacl data
* edit cuad abacus scripts to use local data
* edit cuad abacus scripts to use local data
---------
Co-authored-by: mdr223 <mdrusso@mit.edu>
---------
Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
* adding early support for vllm models
* changes to appease linter
* remove models now that we have access to gpt-5
* only perform time check on local; CI runners are slow
* Support google api and desc (#222)
* support shreya models and re-support desc
* adding gpt-5-nano to gpt-5 models
* bump version
* fixed merge error
* fixing bug where id column in schema overrides DataRecord.id
---------
Co-authored-by: Jun <130543538+chjuncn@users.noreply.github.com>
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Sivaprasad Sudhir <sivaprasad2626@gmail.com>
Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>
Co-authored-by: Bari Bo LeBari <143016395+lilbarbar@users.noreply.github.com>
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-207-160.dyn.MIT.EDU>
Co-authored-by: muhamed <muhamed@mit.edu>
Co-authored-by: Tranway1 <tranway@qq.com>
Co-authored-by: Luthfi Balaka <luthfibalaka@gmail.com>
Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
* Add Optimizations for Filter and Join Operators (#230)
* rename files to reflect that they will contain filter and map physical operators
* passing map unit tests
* passing filter tests
* finished tests
* adding tests for joins and initial embedding join
* adding vllm test
* fixed embedding join
* filter for filepaths instead of assert
* add embedding cost
* fixed full hashes bug with deep copy
* bump version
* undo linting change
* Reorder bug (#232)
* fixing map/filter/join tests for CI which doesn't have GEMINI access; adding test for real estate bug
* added exploration to re-order converts
* separate lack of gemini from ci tests
* Data Record Refactor (#233)
* Refactor DataRecord to hold data in the BaseModel member instead of separately.
* Some type fixes
* local unit tests passing
* enforce data record id uses list of schema fields
* remove unused code from copy
* use function instead of class internals
---------
Co-authored-by: Tianyu Li <litianyu@mit.edu>
* Updating Website to Use Docusaurus (#234)
* adding docusaurus website; still haven't updated doc content and home page
* fix links at bottom of page
* updated pages for website; docs are still not auto-rendered
* updating ci pipelines
* update path to package
* update node version
* update package
* fix build commands
* fix trigger
* fix runner and import
* fix some DataRecord inits
* switch to running llms w/separate flag b/c one test can fail due to bad generation
* changes to be more flexible on types for abacus scripts
* guessing at fix for build path
* removing old website
* remove commented ci code
* remove mkdocs from pyproject
* remove prints
* fix location of CNAME file
* Opt fixes (#236)
* fixed errors in optimizer
* added palimpchat page
* passing unit tests
* also relax types on train datasets
* bump version
* try lowercasing c
* fixed route
* eliminate slowdown from stringifying sentinel plan(s)
* bump version
* allow enron demo to swap filters w/convert
* remove print statements in validator and fix bug introduced for bytes fields
* bump version
* adding min and max
* fixing assertion error
* fix no reasoning prompt templating issue(s)
* add semantic aggregation operator
* bump version
* fix mock call in unit test
* add google analytics tracking
* Updated Website User Guide(s); Renamed `retrieve()` --> `sem_topk()` (#244)
* checking in in-flight changes
* adding code for unmatched records in left/right/outer joins
* optimization stuck
* new mmqa script is functional
* minor bugfixes
* fix naive estimates with new operators
* updated website user guides; renamed retrieve --> top-k
* fix defaults for join op
* bumping version
* fix documentation links
* Add Cost-Based Sample Budget; Fix RAGConvert/Filter for `str | Any` Types (#247)
* checking in in-flight changes
* adding code for unmatched records in left/right/outer joins
* optimization stuck
* new mmqa script is functional
* minor bugfixes
* fix naive estimates with new operators
* updated website user guides; renamed retrieve --> top-k
* add cost-based sample budget; fix rag convert and filter for str | Any fields
* Fix missing comma causing vLLM completions to break (#246)
* bumping version
* Final Changes from Revision for Abacus (#250)
* checking in in-flight changes
* adding code for unmatched records in left/right/outer joins
* optimization stuck
* new mmqa script is functional
* minor bugfixes
* fix naive estimates with new operators
* updated website user guides; renamed retrieve --> top-k
* add cost-based sample budget; fix rag convert and filter for str | Any fields
* pushing local mmqa experiment
* try n=20
* preparing final runs for table 2
* fix thread safety issue w/EmbeddingJoin
* adding full ablation study
* bugfixes in operators
* adding final revision work from local
* updated readme
* adding changes from berners-lee
* remove comments
* fix linting and bump version
* Blebari task 131 (#241)
* .
* .
* minor tweaks
* add embedding costs to RecordOpStats
* minor tweaks
* change comment
---------
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-128-127.dyn.mit.edu>
Co-authored-by: Bari LeBari <barilebari@Baris-MacBook-Pro.local>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* adding real-estate-eval-100 to download script
* adding real-estate-demo
* jczhang add model checks (#254)
* adding checks that user has support for models they need
* check if available models is empty
* trying to resolve dependency
* bump version
* gemini studio api issue (#257)
* recreating the issue
* fixing model provider for google AI studio
* add try-except back
---------
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* bump version
* fix model check
* Fix no reasoning (#270)
* enforce that setting reasoning effort to None turns off reasoning prompts; fix config copy error
* bump version
* update constants to reflect the cached-input token costs
* update GenerationStats
* update GenerationStats to include cache token/cost
* fix typo
* update stats in GenerationStats
* prompt caching implementation
* split cache tokens into read and creation
* restructure prompt caching into PromptCacheManager class
* update CacheManager class
* caching demo
* add claude sonnet 4.0 (temporary)
* fix pretty print error for anthropic
* propagate cache-related stats from end-to-end
* fix bug for gemini model
* claude-3-7 deprecated
* fix formatting issues
* fix formatting issues
* fixing comments
* update token/cost logic to be disjoint for input and cache
* update demo
* Generalize Support for LiteLLM Models #265 (#272)
* model_info (Model -> ConfiguredModel in constants) - 265
* predictor function for unknown spec
* update full list of API keys
* add gemini3 and gpt5.2 to constants
* return models based on opt obj when models is None
* reorganize functions in model info/helper
* add tests and update model references and imports
* move validation from config to query processor
* add json file for model score/latency and update predictor function
* update model references and imports
* update dependencies and related test cases
* update Model to have both string and enum
* model_info -> model_helper
* update model usage in query config
* rollback import changes for CuratedModel -> Model
* ModelProvider class
* update all switch cases to ModelProvider when applicable
* reverted CuratedModel changes
* add test cases
* add additional test cases
* fix formatting issues
* add prompt caching stats for #262
* restructure Model class
* fix Model enum issue
* add sorting logic to model class
* use singular json file for info fetching
* expand model list and update curated_model_info file
* restructure model info fetching, update Model class and test cases
* script to update pz_models_information and update get_optimal_models
* is_deepseek_model
* add audio cache read/creation
* remove claude sonnet 3.5 (retired)
* add deepseek-chat
* add .json files to pyproject.toml so that is packaged too
* revert uvicorn dependency
* some small tweaks
* passing tests
---------
Co-authored-by: joycequ <joycequ@mit.edu>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* fixed model function calls
* clean up duplicate code to help with summing field stats
* update fields for classes in models.py, update usage in generators.py
* add test generation file
* add test generation file
* generator messages
* update anthropic stats
* update input/cache token stats
* remove generator messages from github repo
* update generator test cases and implement initial gemini wrapper class
* delete output audio tokens and update gemini client class
* ruff lint for test cases
* fix gemini reasoning effort bug
* fix cost and image issues
* incorporate all pr comments
* make anthropic version more flexible
* Revert "make anthropic version more flexible"
This reverts commit 8eeed6711f1d01185d75fa4ccc0a51e4e681a021.
* floatify everything
* all but two tests passing
* bump version and relax tests
* Local Model Execution (vLLM) #266 (#282)
* local vllm execution implementation
* update vllm local specs (predictors)
* more robust detection of local model capabilities
* fix formatting
* test script formatting update
* adding placeholder for vllm cache tokens
* remove prints
* remove print
* reverted type
* fix type annotation
* tests passing
---------
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* Allowing other provider than OpenAI for embeddings (#283)
* Removing hard-coded TEXT_EMBEDDING_3_SMALL in RAG and JOIN operators
* remove whitespace
* fixed embedding access in RAGFilter
* fix id/op_params for RAG ops and EmbeddingJoin; update rules to enforce CLIP cannot be used for text-only
* fix value
* unit tests passing
---------
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* fixed issue #286 and bumped version
* fix linter errors
* quest evals
* adding support for azure openai models (#292)
* adding support for azure openai models
* added warning in comment
* bump version
* fix typos
* fix linter error
* security patch for litellm
* bump version
* add TODO
---------
Co-authored-by: Tianyu Li <litianyu@mit.edu>
Co-authored-by: Jun <130543538+chjuncn@users.noreply.github.com>
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Sivaprasad Sudhir <sivaprasad2626@gmail.com>
Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>
Co-authored-by: Bari Bo LeBari <143016395+lilbarbar@users.noreply.github.com>
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-207-160.dyn.MIT.EDU>
Co-authored-by: muhamed <muhamed@mit.edu>
Co-authored-by: Tranway1 <tranway@qq.com>
Co-authored-by: Luthfi Balaka <luthfibalaka@gmail.com>
Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
Co-authored-by: Griffin Roupe <31631417+frostyfan109@users.noreply.github.com>
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-128-127.dyn.mit.edu>
Co-authored-by: Bari LeBari <barilebari@Baris-MacBook-Pro.local>
Co-authored-by: Jerry Zhang <122544742+xqlcn@users.noreply.github.com>
Co-authored-by: joycequ <joycequ2016@gmail.com>
Co-authored-by: joycequu <65379523+joycequu@users.noreply.github.com>
Co-authored-by: joycequ <joycequ@mit.edu>
Co-authored-by: SoTrx <11771975+SoTrx@users.noreply.github.com>
1.5.0: Add Support For Azure OpenAI Models; Deprecate Llama 3.2 3B (#293)
* Fix broken dependencies (#227)
* Move DataRecord Internal Fields to Have Leading Underscore (#229)
* update README
* 1. support add_columns in Dataset; 2. support run().to_df(); 3. add demo in df-newinterface.py (#78)
* Support add_columns in Dataset. Support demo in df-newinterface.py
Currently we have to do
records, _ = qr3.run()
outputDf = DataRecord.to_df(records)
I'll try to make qr3.run().to_df() work in another PR.
* ruff check --fix
* Support run().to_df()
Update run() to return a DataRecordCollection, so that it will be easier for us to support more features for run() output.
We support to_df() in this change.
I'll send out following commits to update other demos.
* run check --fix
* fix typo in DataRecordCollection
* Update records.py
* fix tiny bug in mab processor.
The code will run into an issue if we don't return any stats for this function in
```
max_quality_record_set = self.pick_highest_quality_output(all_source_record_sets)
if (
    not prev_logical_op_is_filter
    or (
        prev_logical_op_is_filter
        and max_quality_record_set.record_op_stats[0].passed_operator
    )
):
```
* update record.to_df interface
update to record.to_df(records: list[DataRecord], project_cols: list[str] | None = None), which is consistent with the other functions in this class.
* Update demo for the new execute() output format
* better way to get plan from output.run()
* fix getting plan from DataRecordCollection.
People used to get the plan from execute() on the streaming processor, which is not good practice.
I updated plan_str to plan_stats, and callers now need to get the physical plan from the processor.
Consider better ways to provide the executed physical plan to DataRecordCollection, possibly from stats.
* Update df-newinterface.py
* update code based on comments from Matt.
1. add cardinality param in add_columns
2. remove extra testdata files
3. add __iter__ in DataRecordCollection to help iter over streaming output.
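The `__iter__` and `to_df()` behavior described above can be sketched with a minimal stand-in (the class and field names here are illustrative, not the actual palimpzest implementation):

```python
from dataclasses import dataclass, field

@dataclass
class DataRecordCollection:
    """Toy stand-in for the result object returned by run()."""
    data_records: list = field(default_factory=list)
    plan_stats: object = None

    def __iter__(self):
        # Yield records one at a time so callers can stream results
        # instead of materializing the whole list first.
        yield from self.data_records

    def to_df(self, project_cols=None):
        # Convert records (plain dicts here) to rows, optionally
        # projecting down to the requested columns.
        return [
            {k: v for k, v in rec.items() if project_cols is None or k in project_cols}
            for rec in self.data_records
        ]

coll = DataRecordCollection([{"sender": "a@x.com", "subject": "hi"}])
assert [r["subject"] for r in coll] == ["hi"]
assert coll.to_df(project_cols=["sender"]) == [{"sender": "a@x.com"}]
```

With this shape, `run().to_df()` and `for record in run()` both fall out of the same object.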
* see if copilot just saved me 20 minutes
* fix package name
* use sed to get version from pyproject.toml
* bump project version; keep docs behind to test ci pipeline
* bumping docs version to match code version
* use new __iter__ method in demos where possible
* add type hint for output of __iter__; use __iter__ in unit tests
* Update download-testdata.sh (#89)
Added enron-tiny.csv
* Clean up the retrieve API (#79)
* Clean up the retrieve operator interface
* fix comments
* Update to the new to_df() API
* Code update for https://github.com/mitdbg/palimpzest/issues/84 (#101)
* Create chat.rst (#96)
* Create chat.rst
* Update pyproject.toml
Hotfix for chat
* Update conf.py
Hotfix for chat.rst
* code update for https://github.com/mitdbg/palimpzest/issues/84
This implementation basically resolves https://github.com/mitdbg/palimpzest/issues/84.
One implementation is different from the #84:
.add_columns(
cols=[
{"name": "sender", "type": "string", "udf": compute_sender},
...
]
)
If add_columns() uses cols, udf, types as params, it will make this function confusing again. Instead, if users need to specify different udfs for different columns, they should just call add_columns() multiple times for different columns.
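The chosen style, one `add_columns()` call per UDF, can be sketched as follows (a toy class for illustration, not the real Dataset API):

```python
class Dataset:
    """Toy sketch of the chained add_columns() style described above."""
    def __init__(self, rows):
        self.rows = rows

    def add_columns(self, udf, cols):
        # One UDF per call; chaining keeps each call unambiguous,
        # instead of packing per-column UDFs into a list of dicts.
        for row in self.rows:
            row.update({col: udf(row) for col in cols})
        return self

ds = Dataset([{"body": "From: alice"}])
ds = ds.add_columns(lambda r: r["body"].split(": ")[1], cols=["sender"])
assert ds.rows[0]["sender"] == "alice"
```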
* changed types to make use of Python type system; updated use of types in tests; updated docs and README
* update test to match no longer allowing None default
---------
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* Skip an operator if this is a duplicate op instead of raising an error (#102)
* Create chat.rst (#96)
* Create chat.rst
* Update pyproject.toml
Hotfix for chat
* Update conf.py
Hotfix for chat.rst
* Skip an operator when it doesn't map to any logical op, instead of raising an error
# Final Effects
1. Dataset() init only has one responsibility: wrap a datasource in a Dataset. I think this is a better interface.
2. No extra convert() will be added to the plan.
3. When users add the same op multiple times, e.g. dataset.convert(File).convert(File), the system will just dedup the op instead of raising an error.
# Issue
Currently Dataset(src, schema) initialization has 2 responsibilities:
1. read the source
2. convert the source to the schema.
When we use a default schema for Dataset init(source, schema=DefaultSchema), the code works like:
1. Read the source into the schema the DataSource provides. This schema is derived by the system, so users don't know it (and don't need to).
2. Convert the source schema to DefaultSchema.
So every time, the system makes one extra convert call to convert SourceSchema to DefaultSchema, which is definitely wrong.
# Solution
1. We use the schema from the Datasource if it exists, which is reasonable.
2. If we do 1, we get a dataset node with no actual op since its input_schema == output_schema, so I updated a line in the optimizer to skip a node that doesn't do anything instead of raising an error.
# Real Examples
## Before
Generated plan:
0. MarshalAndScanDataOp -> PDFFile
1. PDFFile -> LLMConvertBonded -> DefaultSchema
(contents, filename, text_conte) -> (value)
Model: Model.GPT_4o
Prompt Strategy: PromptStrategy.COT_QA
2. DefaultSchema -> MixtureOfAgentsConvert -> ScientificPaper
(value) -> (contents, filename, paper_auth)
Prompt Strategy: None
Proposer Models: [GPT_4o]
Temperatures: [0.0]
Aggregator Model: Model.GPT_4o
Proposer Prompt Strategy: chain-of-thought-mixture-of-agents-proposer
Aggregator Prompt Strategy: chain-of-thought-mixture-of-agents-aggregation
3. ScientificPaper -> LLMFilter -> ScientificPaper
(contents, filename, paper_auth) -> (contents, filename, paper_auth)
Model: Model.GPT_4o
Filter: The paper mentions phosphorylation of Exo1
4. ScientificPaper -> MixtureOfAgentsConvert -> Reference
(contents, filename, paper_auth) -> (reference_first_author, refere)
Prompt Strategy: None
Proposer Models: [GPT_4o]
Temperatures: [0.8]
Aggregator Model: Model.GPT_4o
Proposer Prompt Strategy: chain-of-thought-mixture-of-agents-proposer
Aggregator Prompt Strategy: chain-of-thought-mixture-of-agents-aggregation
## After
Generated plan:
0. MarshalAndScanDataOp -> PDFFile
1. PDFFile -> LLMConvertBonded -> ScientificPaper
(contents, filename, text_conte) -> (contents, filename, paper_auth)
Model: Model.GPT_4o
Prompt Strategy: PromptStrategy.COT_QA
2. ScientificPaper -> LLMFilter -> ScientificPaper
(contents, filename, paper_auth) -> (contents, filename, paper_auth)
Model: Model.GPT_4o
Filter: The paper mentions phosphorylation of Exo1
3. ScientificPaper -> MixtureOfAgentsConvert -> Reference
(contents, filename, paper_auth) -> (reference_first_author, refere)
Prompt Strategy: None
Proposer Models: [GPT_4o]
Temperatures: [0.8]
Aggregator Model: Model.GPT_4o
Proposer Prompt Strategy: chain-of-thought-mixture-of-agents-proposer
Aggregator Prompt Strategy: chain-of-thought-mixture-of-agents-aggregation
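The dedup behind the before/after plans above amounts to dropping any op whose input and output schemas are identical. A minimal sketch, with schemas modeled as frozensets of field names (purely illustrative, not the actual optimizer code):

```python
def prune_noop_ops(plan):
    """Drop ops whose output schema equals their input schema."""
    pruned = []
    for op in plan:
        if op["input_schema"] == op["output_schema"]:
            continue  # no-op convert: skip instead of raising an error
        pruned.append(op)
    return pruned

plan = [
    {"name": "scan", "input_schema": frozenset(), "output_schema": frozenset({"contents"})},
    # A convert from a schema to itself does no work and can be dropped.
    {"name": "convert", "input_schema": frozenset({"contents"}), "output_schema": frozenset({"contents"})},
]
assert [op["name"] for op in prune_noop_ops(plan)] == ["scan"]
```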
* make equality check for new field names a bit more explicit
* fix fixture usage
* update all plans within code base to explicitly convert when needed; and removed unnecessary schemas for reading from datasource
---------
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* Refactor demos to use .sem_add_columns or .add_columns instead of convert(), remove Schema from demos when possible. (#104)
* Create chat.rst (#96)
* Create chat.rst
* Update pyproject.toml
Hotfix for chat
* Update conf.py
Hotfix for chat.rst
* code update for https://github.com/mitdbg/palimpzest/issues/84
This implementation basically resolves https://github.com/mitdbg/palimpzest/issues/84.
One implementation is different from the #84:
.add_columns(
cols=[
{"name": "sender", "type": "string", "udf": compute_sender},
...
]
)
If add_columns() uses cols, udf, types as params, it will make this function confusing again. Instead, if users need to specify different udfs for different columns, they should just call add_columns() multiple times for different columns.
* use field_values instead of field_types, since field_values contain the actual key-value pairs, while field_types only contain fields and their types.
records[0].schema is the schema of the output, which doesn't mean we have already populated the schema into the record.
* Remove .convert() and use .sem_add_columns or .add_columns instead
This change is based on #101 and #102, please review them first then this change.
1. This is to refactor all demos to use .sem_add_columns or .add_columns, and remove .convert().
2. Remove Schema from demos, except demos using ValidationDataSource and dataset.retrieve() that need schema now. We can refactor these cases later.
* ruff check --fix
* fix unittest
* demos fixed and unit tests running
* fix add_columns --> sem_add_columns in demo
* update quickstart to reflect code changes; shorten text as much as possible
* passing unit tests
* remove convert() everywhere
* fixes to correct errors in demos; update quickstart and docs
---------
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* Simplify Datasource (#103)
## Summary of PR changes
**Note 1:** I did not change anything related to val_datasource (including tangential functions like Dataset._set_data_source()) as that will all be modified in a subsequent PR to reflect our discussion re: validation data.
**Note 2:** I have completely commented out datamanager.py and config.py; for now I am willing to leave the code around in case we desperately need it for PalimpChat. However, my hope is that PalimpChat can be tweaked to work without the data manager and those files can be deleted before merging dev into main
**Note 3:** Despite the branch name, fixing the progress managers will be part of a separate PR.
- Collapsed all four `DataSource` classes down to a single `DataReader` class
- Limit the number of methods the user needs to implement to just `__len__()` and `__getitem__()`
- (Switched from `get_item()` to `__getitem__()` in `DataReader`)
- Provided `DataReader` directly to scan operators (also renamed `DataSourcePhysicalOp` --> `ScanPhysicalOp`)
- Removed `DataDirectory()` from `src/` entirely; this included commenting out things which made use of the cache (e.g. caching computed `DataRecords` and codegen examples)
- Got rid of `dataset_id` everywhere (which tracks with the previous bullet)
- Removed the `Config` class which was a relic of a bygone era (and also intertwined with the `DataDirectory()`)
- Updated all demos to use `import palimpzest as pz` to make the import statement(s) more welcoming
- Fixed one bug resulting from converts now producing union schemas. Instead of including the `output_schema` in an operator's `get_id_params()`, we simply report the `generated_fields`.
- Changed `source_id --> source_idx` everywhere (this eliminated some weird renaming logic)
- Finally, I added a large set of documentation for the DataSource class(es)
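The two-method reader interface described above can be sketched like this (a hypothetical stand-in, not the actual palimpzest base class):

```python
class DataReader:
    """Sketch of the minimal reader contract: just __len__ and __getitem__."""
    def __len__(self):
        raise NotImplementedError

    def __getitem__(self, idx):
        raise NotImplementedError

class ListReader(DataReader):
    """A reader backed by an in-memory list of documents."""
    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        # Each item becomes one input record for the scan operator.
        return {"contents": self.items[idx]}

reader = ListReader(["doc one", "doc two"])
assert len(reader) == 2
assert reader[1] == {"contents": "doc two"}
```

Keeping the contract to two dunder methods means any sequence-like source can be wrapped with a few lines of code.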
* Multi-LLM Refinement Pipeline for Query Output Validation (#118)
* Multi-LLM Refinement Pipeline for Query Output Validation (#92)
## Summary of PR
This PR contains the work to add a new `CriticConvert` physical operator to PZ. At a high-level, this operator runs a bonded convert, and then asks a critic model if the answer produced by the bonded convert can be improved upon. The original output and the critique are then fed into a refinement model, which produces the improved output.
The work to implement this includes:
1. Defining the physical operator in `src/palimpzest/query/operators/critique_and_refine_convert.py`
2. Adding an implementation rule for this physical operator in `src/palimpzest/query/optimizer/rules.py`
3. Adding boolean flag(s) to enable allowing / disallowing this physical optimization
4. Adding base prompts for the critique and refinement generations
One other change which this work spawned was an attempt to improve the management and construction of our prompts -- and to decouple this logic from the `BaseGenerator` class. On the management side, I split our single `prompts.py` file into a set of files. On the construction side, I created a `PromptFactory` class which templates prompts based on the `prompt_strategy` and input record.
The `PromptFactory` is not a perfect solution, but I think it is a step in the right direction.
Finally, I fixed an error which previously filtered out `RAGConvert` operators from being considered by the `Optimizer`, and I made 2-3 more miscellaneous small tweaks.
---------
Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>
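The convert/critique/refine flow described in the PR summary above can be sketched as a small pipeline; the three callables here are stand-ins for the LLM calls, not the real operator:

```python
def critique_and_refine(record, convert, critic, refine):
    """Sketch of the CriticConvert flow: run the convert, ask a critic
    whether the output can be improved, and refine if so."""
    draft = convert(record)
    critique = critic(record, draft)
    if critique is None:
        return draft  # critic saw nothing to improve
    return refine(record, draft, critique)

# Stub "models" for illustration only.
convert = lambda rec: rec["text"].upper()
critic = lambda rec, out: "trailing space" if out.endswith(" ") else None
refine = lambda rec, out, crit: out.strip()

assert critique_and_refine({"text": "hi "}, convert, critic, refine) == "HI"
```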
* MkDocs Site for Palimpzest API Documentation (#116)
## Summary of PR Changes
1. Changed `docs` to use [MkDocs](https://www.mkdocs.org/) instead of Sphinx
2. Created initial `Getting Started` content
3. Created placeholders for `User Guide` content (to follow in a subsequent PR)
4. Added autogenerated docs for our most user-facing code (we will need to add docstrings to our code in a subsequent PR)
5. Made small tweaks to `src/` to allow users to specify policy using kwargs in `.run()`
6. Renamed the `testdata/enron-tiny/` files so that they're not so damn weird
---------
Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>
* remove registration of sources from CI; only check version bump if there is a code change
* remove filter for only checking version bump when src files changed
* Rename `nocache` --> `cache` everywhere (#128)
* first commit
* Removed myenv
* added to git ignore
* addressed the comments in review
* flip one minor comment
* minor spacing fix
* fix spaces in a few more spots
---------
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-207-160.dyn.MIT.EDU>
Co-authored-by: muhamed <muhamed@mit.edu>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* adding citation (and making 'others' explicit) (#136)
* Make Generator thread-safe (#139)
* fix moa prompt
* fix moa prompt aggregator
* update version
* make generator thread-safe
* update generator to return messages
* address comments
* Begin Process of Improving Index Abstraction(s) in PZ (#138)
* quick and dirty implementation which tracks retrieve costs
* bug fixes and currently unused index code
* add default search func which I forgot to implement and add chromadb to pyproject.toml
* leaving TODO
* hotfix to add cost for retrieve operation
* another hotfix to add ragatouille dependency
* Add logger for PZ (#134)
* add logger for PZ
1. When verbose=True, we save all logs to a log file and print them to the console;
2. when verbose=False, we only save ERROR+ logs to the file and print ERROR+.
I just added logging where I think it might be important for execution; we can always add or remove more later.
Also, I might update the logging messages based on my later annotation work, but this PR should set up the logging mechanism for now.
* ruff fix
* update code based on comments
1. not logging output_records
2. not logging plan_stats
3. make the files to ".pz_logs"
---------
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* fix merge bug (#141)
* ruff fix
* update log dir and fix tiny bug
* fix merge bug
* Use a singleton API client for operators (#140)
* fix moa prompt
* fix moa prompt aggregator
* update version
* make generator thread-safe
* update generator to return messages
* address comments
* create a singleton API client
* fix linting
* fix logging in generators
* also create parent dir. if missing
* CUAD benchmark (#143)
* fix moa prompt
* fix moa prompt aggregator
* update version
* make generator thread-safe
* update generator to return messages
* address comments
* create a singleton API client
* fix linting
* fix logging in generators
* fix CUAD benchmark
* fix type
* minor fixes
* Limit the Scope of Logging within the Optimizer (#144)
* making it possible to set log level based on env. variable; adding time limit on seven filters test
* deleting instead of commenting out
* Remove Conventional LLM Convert; Update Bonded LLM Convert retry logic (#145)
* use NullHandler in __init__ and let application control logging config (#146)
* use NullHandler in __init__ and let application control logging config
* ruff fix
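The NullHandler pattern above is the standard way for a library to stay silent by default while letting the application opt in to its logs; a minimal sketch using the stdlib:

```python
import logging

# Library side (e.g. in the package __init__): attach a NullHandler so
# "no handler found" warnings are suppressed and the application stays
# in control of logging configuration.
logger = logging.getLogger("palimpzest")
logger.addHandler(logging.NullHandler())

# Application side: opt in to the library's logs explicitly.
logging.basicConfig(level=logging.INFO)

assert any(isinstance(h, logging.NullHandler) for h in logger.handlers)
```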
* Fix Progress Manager and Simplify `execute_plan` methods (#148)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* address comments
* The great deletion (#149)
* Adding Preliminary Work on Abacus and MAB Sentinel Execution (#147)
* updating models to avoid llama3
* fix parsing bugs and some generation errors
* don't require json for proposer and code synth generations; fix prompt format instruction for proposers
* fix typo/bug
* fix bugs in generator prep for field_answers; fix bug in filter impl.; other improvements
* adding new file for abacus workload
* fix len
* fix errors with dataset copy; prompt construction; and more
* remove JSON instruction from MOA proposer
* fixed bugs in optimizer configuration, llama 3.3 generation, and filter generation
* clean up demos; fix missing base prompt from map
* add one more missing base prompt
* prepare demo for full run; get embedding cost info from RAGConvert; use reasoning output from Critique
* add script to generate text-embedding-3-small reaction embeddings
* write to .chroma
* run full scale generation
* compute embeddings slowly and add progress bar
* add sleep
* fix import
* add total iters
* create embeddings before ingesting
* fix index start and finish
* load embeddings and insert directly
* make chroma use cosine sim.; finish initial search fcn. for biodex workload; naming tweak in rag convert
* capturing gen stats in Retrieve
* added UDF map operator; rewrote biodex pipeline to match docetl impl.; switched to using __name__ for functions instead of str()
* add optimizations back in
* write data to csv in demo
* limit to same model choice(s) as docetl and lotus
* fix punctuation error(s)
* try run without filter
* remove unused demo file
* remove print
* remove prints
* remove costed_phys_op_ids which were used for debugging
* try slightly diff. approach
* remove temp changes while branch is in PR review
* remove depends_on for map
* fix iteration bug in sentinel processors
* one more hotfix
* fix more errors w/SentinelPlanStats and sentinel processors
* remove logger lib to reduce confusion (#159)
* Update research.md (#160)
AISD @ NAACL 2025
* Add Pneuma-Palimpzest Integration Demo (#158)
* Add Pneuma demo
* Remove dataset semantic column addition
* Fix progress managers episode 2 attack of the clones (#156)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* initial work to refactor sentinel processors
* passing unit tests
* checking in minor changes
* remove use of setup_logger inside library
* stuff seems to be working
* big print
* turn off rag for test
* try debugging exception
* checking in code before changes to scoring
* finished initial refactoring of mab sentinel execution strategy
* get random sampling execution working with changes
* passing unit tests
* nosentinel progress looks good
* eyeball test is working for progress bars
* remove the old gods
* revert small change
* pull up progress manager logic in parallel execution
* catch errors in generating embeddings
* fix comments
* Merging in Changes for Sentinel Progress Bars; Split Convert (off by default); `demos/enron-demo.py`; and MMQA Benchmark (#163)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* initial work to refactor sentinel processors
* passing unit tests
* checking in minor changes
* remove use of setup_logger inside library
* stuff seems to be working
* big print
* turn off rag for test
* try debugging exception
* checking in code before changes to scoring
* finished initial refactoring of mab sentinel execution strategy
* get random sampling execution working with changes
* passing unit tests
* nosentinel progress looks good
* eyeball test is working for progress bars
* remove the old gods
* revert small change
* pull up progress manager logic in parallel execution
* adding prints to generator; turn progress off in favor of verbose for now
* catch errors in generating embeddings
* inspect frontier updates
* remove args.workload
* fix num_inputs in selectivity computation
* pdb in score
* fixed score fn issue
* use execution cache to avoid unnecessary computation; use sentinel stats for updating frontier
* fix progress counter
* debug
* fix empty stats
* only count stats from newly computed results
* fix tuple unpacking
* only update sample counts for llm ops
* de-dup duplicate record
* ugh
* dont forget to increment
* plz
* more plz
* increment
* recycle ops back onto reservoir so they may be reconsidered in the future
* remove pdb
* add progress to script args
* try without rag
* use term recall
* just check in on term recall
* make it easier to turn off progress
* remove pdb
* try to get re-rank to keep all inputs
* try to generate more reactions
* track total LLM calls
* 10x parallelism
* try retrieve directly on fulltext
* up max workers
* adding enron-demo w/optimization
* remove config option
* adding recall and precision to output
* allow operators to be recycled back onto frontier
* revert to using reactions instead of fulltext for similarity
* better cycling of off-frontier operators
* safety check on reservoir ops
* remove pdb
* fixing 5 results per query
* investigate sampling behavior
* check on seeds
* remove pdb
* test SplitConvert
* debug chunking
* fix bug in rag and split convert
* run with chunks
* test chunking logic
* fix chunking logic
* sum list
* remove split merge for now
* minor fixes to CUAD script
* add embedding scripts for mmqa tables and image titles
* address issue with empty titles and title collisions
* prepare script for using clip embeddings for images
* fix bug
* get full space of possible extensions
* debug
* weird bug fix?
* more debug
* fix idiotic mistake
* handle corrupted images and minor things
* add another corrupted image
* another one
* anotha
* more bad images
* last disallow file
* prepare cuad for runs
* specify execution strategy
* up samples
* add sentinel execution strategy to output name
* adding plan str and more stats
* specify no prior
* verbose=False
* fix comment; comment out prints
* make split merge optional for now
* addressing comments
* applying syntax changes to pneuma demo and supporting strings within retrieve
* bump version; fix lint; fix docs
* more docs tweaks; tweaking dependencies
* fix install issues
* one more version fix
* one more version fix
* one more version fix
* one more version fix
* last try
* change runner python version
* actually changing runner python version
* increase time limit for runners
* increase time limit for runners
* Merge in Changes From Final Abacus Work (WIP) (#173)
* add prints
* debug sample sets
* checking in code before tweaks to mab
* state of repo after running final Abacus experiments
* revert to opt-profiling-data
* removing print statement
* remove prints
* final fixes
* removing ragatouille dependency
* fix ruff lint checks
* bump version
* passing tests locally
* remove pdb
* fix complaint about match
* Move Abacus Research Scripts into Separate Folder (#175)
* re-organizing abacus research-related scripts
* fix model selection and other tweaks
* add data download script
* bump version
* remove scripts from root
* removing python files which were merged back in from main
* Fixed Issue(s) with Aggregate Operator Computation for Movie Queries (WIP) (#182)
* queries 1-4 working for movies
* removing RandomSampling
* Create `Context` Class + `compute` and `search` operators (#186)
* checking in changes
* refactored Dataset
* checking in
* checking in
* checking in
* queries extract final answer now
* checking in changes w/search operator
* adding changes to agents
* add isinstance checks to all executors
* removing script
* remove tools; include in future PR
* Remove `pz.Schema` in Favor of Using `pydantic.BaseModel` (#188)
* made changes throughout codebase and updated unit tests
* checking in; debugging failure with image use case
* simple demo / paper demos working
* eliminate caching features (#195)
* removing all code synthesis (#198)
* removing all code synthesis
* remove unused import
* Using LiteLLM to Manage Generator Clients / Completion APIs (#200)
* use LiteLLM for generators
* remove unused function; add TODO
* Added Anthropic Support; Simplified Rules; Removed Redundant Model Helpers (#202)
* changes after simplifying rules
* passing unit tests; removed unnecessary model helpers
* simplified primitives slightly
* fixing the assertion which used FieldInfo instead of FieldInfo.annotation (#204)
* add support for o4-mini, gemini-2.5-pro, gemini-2.0-flash, llama-4-maverick (#205)
* Adding Semantic Join Operator (#206)
* initial changes to support validator class; fixed bug in generator for images
* adding validator based optimization
* validator agent example working
* using o1 model; made validation more efficient
* added initial nested loops join implementation
* passing tests
* unit tests passing
* unit tests passing
* enron-demo.py working
* join demos in place
* parallel join and other bugfixes (#207)
* audio-demo (#208)
* remove pdb
* adding option to only use gemini models in audio demo
* adding parallelism; fixed bug w/unique_logical_op_id (#209)
* fixed issue which removed pipelined execution of operators in parallel setting (#210)
* Movie bugfixes (#211)
* fixed error in cost computation for gemini models; tested join on movie queries
* make join count monotonic
* removing progress bar updates for join for now
* adding reasoning effort (#212)
* made progress manager more efficient; made join op calculations accurate (#213)
* make groupby ignore None values
* make it possible to specify schema for MemoryDataset; reasoning model fixes
* adding audio-only match in substitution (#214)
* quick fix for audio prompt missing in MoA
* support passing in gemini/vertex credentials path; fix minor bugs in audio generation (#216)
* adding Distinct operator to PZ (#217)
* masking filepaths for sembench; fix audio pricing (#218)
* make GroupBySig a pz. import
* remove email demo
* reproduce abacus results
* add notes about deprecation to scripts for generating priors
* remove unsupported demos
* sem_add_columns -> sem_map
* Dev staging (#220)
* edit cuad abacus scripts to use local data
* edit cuad abacus scripts to use local data
* edit cuad abacus scripts to use local data
* fix: cuad data loader doesn't work via huggingface anymore (#215)
---------
Co-authored-by: mdr223 <mdrusso@mit.edu>
---------
Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
* adding early support for vllm models
* changes to appease linter
* remove models now that we have access to gpt-5
* only perform time check on local; CI runners are slow
* Support google api and desc (#222)
* support shreya models and re-support desc
* adding gpt-5-nano to gpt-5 models
* bump version
* fixed merge error
* fixing bug where id column in schema overrides DataRecord.id
---------
Co-authored-by: Jun <130543538+chjuncn@users.noreply.github.com>
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Sivaprasad Sudhir <sivaprasad2626@gmail.com>
Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>
Co-authored-by: Bari Bo LeBari <143016395+lilbarbar@users.noreply.github.com>
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-207-160.dyn.MIT.EDU>
Co-authored-by: muhamed <muhamed@mit.edu>
Co-authored-by: Tranway1 <tranway@qq.com>
Co-authored-by: Luthfi Balaka <luthfibalaka@gmail.com>
Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
* Add Optimizations for Filter and Join Operators (#230)
* rename files to reflect that they will contain filter and map physical operators
* passing map unit tests
* passing filter tests
* finished tests
* adding tests for joins and initial embedding join
* adding vllm test
* fixed embedding join
* filter for filepaths instead of assert
* add embedding cost
* fixed full hashes bug with deep copy
* bump version
* undo linting change
* Reorder bug (#232)
* fixing map/filter/join tests for CI which doesn't have GEMINI access; adding test for real estate bug
* added exploration to re-order converts
* separate lack of gemini from ci tests
* Data Record Refactor (#233)
* Refactor DataRecord to hold data in the BaseModel member instead of separately.
* Some type fixes
* local unit tests passing
* enforce data record id uses list of schema fields
* remove unused code from copy
* use function instead of class internals
---------
Co-authored-by: Tianyu Li <litianyu@mit.edu>
* Updating Website to Use Docusaurus (#234)
* adding docusaurus website; still haven't updated doc content and home page
* fix links at bottom of page
* updated pages for website; docs are still not auto-rendered
* updating ci pipelines
* update path to package
* update node version
* update package
* fix build commands
* fix trigger
* fix runner and import
* fix some DataRecord inits
* switch to running llms w/separate flag b/c one test can fail due to bad generation
* changes to be more flexible on types for abacus scripts
* guessing at fix for build path
* removing old website
* remove commented ci code
* remove mkdocs from pyproject
* remove prints
* fix location of CNAME file
* Opt fixes (#236)
* fixed errors in optimizer
* added palimpchat page
* passing unit tests
* also relax types on train datasets
* bump version
* try lowercasing c
* fixed route
* eliminate slowdown from stringifying sentinel plan(s)
* bump version
* allow enron demo to swap filters w/convert
* remove print statements in validator and fix bug introduced for bytes fields
* bump version
* adding min and max
* fixing assertion error
* fix no reasoning prompt templating issue(s)
* add semantic aggregation operator
* bump version
* fix mock call in unit test
* add google analytics tracking
* Updated Website User Guide(s); Renamed `retrieve()` --> `sem_topk()` (#244)
* checking in in-flight changes
* adding code for unmatched records in left/right/outer joins
* optimization stuck
* new mmqa script is functional
* minor bugfixes
* fix naive estimates with new operators
* updated website user guides; renamed retrieve --> top-k
* fix defaults for join op
* bumping version
* fix documentation links
* Add Cost-Based Sample Budget; Fix RAGConvert/Filter for `str | Any` Types (#247)
* add cost-based sample budget; fix rag convert and filter for str | Any fields
* Fix missing comma causing vLLM completions to break (#246)
* bumping version
* Final Changes from Revision for Abacus (#250)
* pushing local mmqa experiment
* try n=20
* preparing final runs for table 2
* fix thread safety issue w/EmbeddingJoin
* adding full ablation study
* bugfixes in operators
* adding final revision work from local
* updated readme
* adding changes from berners-lee
* remove comments
* fix linting and bump version
* Blebari task 131 (#241)
* .
* .
* minor tweaks
* add embedding costs to RecordOpStats
* minor tweaks
* change comment
---------
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-128-127.dyn.mit.edu>
Co-authored-by: Bari LeBari <barilebari@Baris-MacBook-Pro.local>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* adding real-estate-eval-100 to download script
* adding real-estate-demo
* jczhang add model checks (#254)
* adding checks that user has support for models they need
* check if available models is empty
* trying to resolve dependency
* bump version
* gemini studio api issue (#257)
* recreating the issue
* fixing model provider for google AI studio
* add try-except back
---------
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* bump version
* fix model check
* Fix no reasoning (#270)
* enforce that setting reasoning effort to None turns off reasoning prompts; fix config copy error
* bump version
* update constants to reflect the cached-input token costs
* update GenerationStats
* update GenerationStats to include cache token/cost
* fix typo
* update stats in GenerationStats
* prompt caching implementation
* split cache tokens into read and creation
* restructure prompt caching into PromptCacheManager class
* update CacheManager class
* caching demo
* add claude sonnet 4.0 (temporary)
* fix pretty print error for anthropic
* propagate cache-related stats from end-to-end
* fix bug for gemini model
* claude-3-7 deprecated
* fix formatting issues
* fix formatting issues
* fixing comments
* update token/cost logic to be disjoint for input and cache
* update demo
* Generalize Support for LiteLLM Models #265 (#272)
* model_info (Model -> ConfiguredModel in constants) - 265
* predictor function for unknown spec
* update full list of API keys
* add gemini3 and gpt5.2 to constants
* return models based on opt obj when models is None
* reorganize functions in model info/helper
* add tests and update model references and imports
* move validation from config to query processor
* add json file for model score/latency and update predictor function
* update model references and imports
* update dependencies and related test cases
* update Model to have both string and enum
* model_info -> model_helper
* update model usage in query config
* rollback import changes for CuratedModel -> Model
* ModelProvider class
* update all switch cases to ModelProvider when applicable
* reverted CuratedModel changes
* add test cases
* add additional test cases
* fix formatting issues
* add prompt caching stats for #262
* restructure Model class
* fix Model enum issue
* add sorting logic to model class
* use singular json file for info fetching
* expand model list and update curated_model_info file
* restructure model info fetching, update Model class and test cases
* script to update pz_models_information and update get_optimal_models
* is_deepseek_model
* add audio cache read/creation
* remove claude sonnet 3.5 (retired)
* add deepseek-chat
* add .json files to pyproject.toml so that they are packaged too
* revert uvicorn dependency
* some small tweaks
* passing tests
---------
Co-authored-by: joycequ <joycequ@mit.edu>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* fixed model function calls
* clean up duplicate code to help with summing field stats
* update fields for classes in models.py, update usage in generators.py
* add test generation file
* add test generation file
* generator messages
* update anthropic stats
* update input/cache token stats
* remove generator messages from github repo
* update generator test cases and implement initial gemini wrapper class
* delete output audio tokens and update gemini client class
* ruff lint for test cases
* fix gemini reasoning effort bug
* fix cost and image issues
* incorporate all pr comments
* make anthropic version more flexible
* Revert "make anthropic version more flexible"
This reverts commit 8eeed6711f1d01185d75fa4ccc0a51e4e681a021.
* floatify everything
* all but two tests passing
* bump version and relax tests
* Local Model Execution (vLLM) #266 (#282)
* local vllm execution implementation
* update vllm local specs (predictors)
* more robust detection of local model capabilities
* fix formatting
* test script formatting update
* adding placeholder for vllm cache tokens
* remove prints
* remove print
* reverted type
* fix type annotation
* tests passing
---------
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* Allowing other provider than OpenAI for embeddings (#283)
* Removing hard-coded TEXT_EMBEDDING_3_SMALL in RAG and JOIN operators
* remove whitespace
* fixed embedding access in RAGFilter
* fix id/op_params for RAG ops and EmbeddingJoin; update rules to enforce CLIP cannot be used for text-only
* fix value
* unit tests passing
---------
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* fixed issue #286 and bumped version
* fix linter errors
* quest evals
* adding support for azure openai models (#292)
* adding support for azure openai models
* added warning in comment
* bump version
* fix typos
* fix linter error
---------
Co-authored-by: Tianyu Li <litianyu@mit.edu>
Co-authored-by: Jun <130543538+chjuncn@users.noreply.github.com>
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Sivaprasad Sudhir <sivaprasad2626@gmail.com>
Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>
Co-authored-by: Bari Bo LeBari <143016395+lilbarbar@users.noreply.github.com>
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-207-160.dyn.MIT.EDU>
Co-authored-by: muhamed <muhamed@mit.edu>
Co-authored-by: Tranway1 <tranway@qq.com>
Co-authored-by: Luthfi Balaka <luthfibalaka@gmail.com>
Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
Co-authored-by: Griffin Roupe <31631417+frostyfan109@users.noreply.github.com>
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-128-127.dyn.mit.edu>
Co-authored-by: Bari LeBari <barilebari@Baris-MacBook-Pro.local>
Co-authored-by: Jerry Zhang <122544742+xqlcn@users.noreply.github.com>
Co-authored-by: joycequ <joycequ2016@gmail.com>
Co-authored-by: joycequu <65379523+joycequu@users.noreply.github.com>
Co-authored-by: joycequ <joycequ@mit.edu>
Co-authored-by: SoTrx <11771975+SoTrx@users.noreply.github.com>
1.4.0
Improved Caching and Support for Local Model Execution w/vLLM and Var…
1.3.4
bump version (#281)
1.3.3
bump version
1.3.2
loosen anthropic version (#278)
1.3.1
Enforce Setting Reasoning Effort to None Turns Off Reasoning Prompts …
1.3.0: Resolve Issue with Google AI Studio Models (#259)
* Fix broken dependencies (#227)
* Move DataRecord Internal Fields to Have Leading Underscore (#229)
* update README
* 1. support add_columns in Dataset; 2. support run().to_df(); 3. add demo in df-newinterface.py (#78)
* Support add_columns in Dataset. Support demo in df-newinterface.py
Currently we have to do:
```
records, _ = qr3.run()
outputDf = DataRecord.to_df(records)
```
I'll try to make qr3.run().to_df() work in another PR.
* ruff check --fix
* Support run().to_df()
Update run() to return a DataRecordCollection, so that it will be easier for us to support more features for run() output.
We support to_df() in this change.
I'll send out following commits to update other demos.
* run check --fix
* fix typo in DataRecordCollection
* Update records.py
* fix tiny bug in mab processor.
The code will run into an issue if we don't return any stats for this function in:
```
max_quality_record_set = self.pick_highest_quality_output(all_source_record_sets)
if (
    not prev_logical_op_is_filter
    or (
        prev_logical_op_is_filter
        and max_quality_record_set.record_op_stats[0].passed_operator
    )
):
```
* update record.to_df interface
update to record.to_df(records: list[DataRecord], project_cols: list[str] | None = None), which is consistent with other functions in this class.
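The interface above can be illustrated with a minimal stand-in. This is a sketch of the described signature only, not palimpzest's actual implementation: DataRecords are mocked as plain dicts, and rows are returned as a list of dicts where the real method would build a pandas DataFrame.

```python
# Sketch of the described to_df(records, project_cols) interface.
# Hypothetical stand-in: records are plain dicts here, and the "DataFrame"
# is a list of row dicts rather than a real pandas DataFrame.
def to_df(records, project_cols=None):
    rows = []
    for record in records:
        if project_cols is None:
            # no projection: keep every field of the record
            rows.append(dict(record))
        else:
            # projection: keep only the requested columns
            rows.append({col: record[col] for col in project_cols})
    return rows

records = [
    {"sender": "alice", "subject": "hi"},
    {"sender": "bob", "subject": "yo"},
]
all_cols = to_df(records)
only_sender = to_df(records, project_cols=["sender"])
```

The `project_cols=None` default mirrors the signature quoted above: omitting it keeps all fields, passing a list projects down to those columns.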
* Update demo for the new execute() output format
* better way to get plan from output.run()
* fix getting plan from DataRecordCollection.
People used to get the plan from execute() on the streaming processor, which is not good practice.
I updated plan_str to plan_stats, and users now need to get the physical plan from the processor.
Consider using better ways to provide the executed physical plan to DataRecordCollection, possibly from stats.
* Update df-newinterface.py
* update code based on comments from Matt.
1. add cardinality param in add_columns
2. remove extra testdata files
3. add __iter__ in DataRecordCollection to help iter over streaming output.
* see if copilot just saved me 20 minutes
* fix package name
* use sed to get version from pyproject.toml
* bump project version; keep docs behind to test ci pipeline
* bumping docs version to match code version
* use new __iter__ method in demos where possible
* add type hint for output of __iter__; use __iter__ in unit tests
* Update download-testdata.sh (#89)
Added enron-tiny.csv
* Clean up the retrieve API (#79)
* Clean up the retrieve operator interface
* fix comments
* Update to the new to_df() API
* Code update for https://github.com/mitdbg/palimpzest/issues/84 (#101)
* Create chat.rst (#96)
* Create chat.rst
* Update pyproject.toml
Hotfix for chat
* Update conf.py
Hotfix for chat.rst
* code update for https://github.com/mitdbg/palimpzest/issues/84
This implementation basically resolves https://github.com/mitdbg/palimpzest/issues/84.
One implementation detail differs from #84:
```
.add_columns(
    cols=[
        {"name": "sender", "type": "string", "udf": compute_sender},
        ...
    ]
)
```
If add_columns() took cols, udf, and types as separate params, it would make this function confusing again. Instead, if users need to specify different udfs for different columns, they should just call add_columns() multiple times for different columns.
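The column-spec interface described above can be sketched with a toy Dataset class. Everything here is a simplified stand-in (the Dataset class, the spec format, and the compute_sender UDF are illustrative, not the real palimpzest API): each entry in cols names the new column, its type, and the UDF that computes it from a row.

```python
# Toy sketch of the described add_columns(cols=[...]) interface.
# Dataset and the spec dicts are hypothetical stand-ins, not the real API.
class Dataset:
    def __init__(self, rows):
        self.rows = rows

    def add_columns(self, cols):
        # each spec carries the new column's name, type, and the UDF computing it
        for spec in cols:
            for row in self.rows:
                row[spec["name"]] = spec["udf"](row)
        return self  # returning self allows chained add_columns() calls

def compute_sender(row):
    # hypothetical UDF: pull the sender address out of a raw email header
    return row["header"].split("From: ")[1]

ds = Dataset([{"header": "From: alice@mit.edu"}])
ds.add_columns(cols=[{"name": "sender", "type": "string", "udf": compute_sender}])
```

Because add_columns() returns the dataset, specifying a different UDF per column is just a chain of separate calls, which is the design argued for above.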
* changed types to make use of Python type system; updated use of types in tests; updated docs and README
* update test to match no longer allowing None default
---------
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* Skip an operator if this is a duplicate op instead of raise error (#102)
* Skip an operator when it doesn't perform any logical op, instead of raising an error
# Final Effects
1. Dataset() init only has one responsibility: wrapping a datasource in a Dataset. I think this is a better interface.
2. No extra convert() will be added to the plan.
3. When users add the same op multiple times, e.g. dataset.convert(File).convert(File), the system will just dedup the op instead of raising an error.
# Issue
Currently the Dataset(src, schema) initialization has 2 responsibilities:
1. read the source
2. convert the source to the schema.
When we use a default schema for Dataset(source, schema=DefaultSchema), the code works like:
1. Read the source into the schema that the DataSource provides. This schema is derived by the system, so users don't know it (and don't need to).
2. Convert the source schema to DefaultSchema.
So every time, the system makes one extra convert call to convert SourceSchema to DefaultSchema, which is definitely wrong.
# Solution
1. We use the schema from the Datasource if it exists, which is reasonable.
2. If we do 1, then we get a dataset node with no actual op, since its input_schema == output_schema, so I updated a line in the optimizer to skip such a node instead of raising an error.
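The skip rule described in the solution can be sketched in a few lines. This is a schematic stand-in (the plan representation and function name are hypothetical, not the optimizer's actual data structures): a node whose input and output schemas match does no work, so it is dropped rather than triggering an error.

```python
# Hypothetical sketch of the described optimizer fix: drop any plan node
# whose input_schema equals its output_schema, since it performs no-op.
def prune_noop_nodes(plan):
    return [
        node for node in plan
        if node["input_schema"] != node["output_schema"]
    ]

plan = [
    {"op": "Scan",    "input_schema": None,      "output_schema": "PDFFile"},
    {"op": "Convert", "input_schema": "PDFFile", "output_schema": "PDFFile"},  # no-op
    {"op": "Convert", "input_schema": "PDFFile", "output_schema": "Paper"},
]
pruned = prune_noop_nodes(plan)
```

This mirrors the before/after plans shown below: the redundant convert to DefaultSchema disappears while every schema-changing step survives.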
#Real Examples
##Before
Generated plan:
0. MarshalAndScanDataOp -> PDFFile
1. PDFFile -> LLMConvertBonded -> DefaultSchema
(contents, filename, text_conte) -> (value)
Model: Model.GPT_4o
Prompt Strategy: PromptStrategy.COT_QA
2. DefaultSchema -> MixtureOfAgentsConvert -> ScientificPaper
(value) -> (contents, filename, paper_auth)
Prompt Strategy: None
Proposer Models: [GPT_4o]
Temperatures: [0.0]
Aggregator Model: Model.GPT_4o
Proposer Prompt Strategy: chain-of-thought-mixture-of-agents-proposer
Aggregator Prompt Strategy: chain-of-thought-mixture-of-agents-aggregation
3. ScientificPaper -> LLMFilter -> ScientificPaper
(contents, filename, paper_auth) -> (contents, filename, paper_auth)
Model: Model.GPT_4o
Filter: The paper mentions phosphorylation of Exo1
4. ScientificPaper -> MixtureOfAgentsConvert -> Reference
(contents, filename, paper_auth) -> (reference_first_author, refere)
Prompt Strategy: None
Proposer Models: [GPT_4o]
Temperatures: [0.8]
Aggregator Model: Model.GPT_4o
Proposer Prompt Strategy: chain-of-thought-mixture-of-agents-proposer
Aggregator Prompt Strategy: chain-of-thought-mixture-of-agents-aggregation
##After
Generated plan:
0. MarshalAndScanDataOp -> PDFFile
1. PDFFile -> LLMConvertBonded -> ScientificPaper
(contents, filename, text_conte) -> (contents, filename, paper_auth)
Model: Model.GPT_4o
Prompt Strategy: PromptStrategy.COT_QA
2. ScientificPaper -> LLMFilter -> ScientificPaper
(contents, filename, paper_auth) -> (contents, filename, paper_auth)
Model: Model.GPT_4o
Filter: The paper mentions phosphorylation of Exo1
3. ScientificPaper -> MixtureOfAgentsConvert -> Reference
(contents, filename, paper_auth) -> (reference_first_author, refere)
Prompt Strategy: None
Proposer Models: [GPT_4o]
Temperatures: [0.8]
Aggregator Model: Model.GPT_4o
Proposer Prompt Strategy: chain-of-thought-mixture-of-agents-proposer
Aggregator Prompt Strategy: chain-of-thought-mixture-of-agents-aggregation
* make equality check for new field names a bit more explicit
* fix fixture usage
* update all plans within code base to explicitly convert when needed; and removed unnecessary schemas for reading from datasource
---------
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* Refactor demos to use .sem_add_columns or .add_columns instead of convert(), remove Schema from demos when possible. (#104)
* use field_values instead of field_types, as field_values has the actual values
field_values holds the actual key-value pairs, while field_types only contains fields and their types.
records[0].schema is the schema of the output, which doesn't mean we have already populated the schema into the record.
* Remove .convert() and use .sem_add_columns or .add_columns instead
This change is based on #101 and #102, please review them first then this change.
1. This is to refactor all demos to use .sem_add_columns or .add_columns, and remove .convert().
2. Remove Schema from demos, except demos using ValidationDataSource and dataset.retrieve() that need schema now. We can refactor these cases later.
* ruff check --fix
* fix unittest
* demos fixed and unit tests running
* fix add_columns --> sem_add_columns in demo
* update quickstart to reflect code changes; shorten text as much as possible
* passing unit tests
* remove convert() everywhere
* fixes to correct errors in demos; update quickstart and docs
---------
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* Simplify Datasource (#103)
## Summary of PR changes
**Note 1:** I did not change anything related to val_datasource (including tangential functions like Dataset._set_data_source()) as that will all be modified in a subsequent PR to reflect our discussion re: validation data.
**Note 2:** I have completely commented out datamanager.py and config.py; for now I am willing to leave the code around in case we desperately need it for PalimpChat. However, my hope is that PalimpChat can be tweaked to work without the data manager and those files can be deleted before merging dev into main
**Note 3:** Despite the branch name, fixing the progress managers will be part of a separate PR.
- Collapsed all four `DataSource` classes down to a single `DataReader` class
- Limit the number of methods the user needs to implement to just `__len__()` and `__getitem__()`
- (Switched from using `get_item() --> __getitem__()` in `DataReader`)
- Provided `DataReader` directly to scan operators (also renamed `DataSourcePhysicalOp` --> `ScanPhysicalOp`)
- Removed `DataDirectory()` from `src/` entirely; this included commenting out things which made use of the cache (e.g. caching computed `DataRecords` and codegen examples)
- Got rid of `dataset_id` everywhere (which tracks with the previous bullet)
- Removed the `Config` class which was a relic of a bygone era (and also intertwined with the `DataDirectory()`)
- Updated all demos to use `import palimpzest as pz` to make the import statement(s) more welcoming
- Fixed one bug resulting from converts now producing union schemas. Instead of including the `output_schema` in an operators' `get_id_params()` we simply report the `generated_fields`.
- Changed `source_id --> source_idx` everywhere (this eliminated some weird renaming logic)
- Finally, I added a large set of documentation for the DataSource class(es)
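The two-method contract described above can be illustrated with a toy reader. The base class shown here is a bare stand-in (not palimpzest's actual `DataReader`), and the record dict's fields are assumptions; the point is only that a user-defined reader implements `__len__()` and `__getitem__()` and nothing else.

```python
# Minimal illustration of the described DataReader contract: subclasses
# implement only __len__ and __getitem__. Base class is a stand-in.
class DataReader:
    def __len__(self):
        raise NotImplementedError

    def __getitem__(self, idx):
        raise NotImplementedError

class ListReader(DataReader):
    """Toy reader serving records from an in-memory list."""

    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        # a scan operator pulls one record at a time; source_idx tracks
        # which input it came from (field names assumed for illustration)
        return {"contents": self.items[idx], "source_idx": idx}

reader = ListReader(["doc-a", "doc-b", "doc-c"])
```

Using the standard `__len__`/`__getitem__` protocol (rather than a bespoke `get_item()`) also makes readers work with built-in iteration and indexing for free.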
* Multi-LLM Refinement Pipeline for Query Output Validation (#118)
* Multi-LLM Refinement Pipeline for Query Output Validation (#92)
## Summary of PR
This PR contains the work to add a new `CriticConvert` physical operator to PZ. At a high-level, this operator runs a bonded convert, and then asks a critic model if the answer produced by the bonded convert can be improved upon. The original output and the critique are then fed into a refinement model, which produces the improved output.
The work to implement this includes:
1. Defining the physical operator in `src/palimpzest/query/operators/critique_and_refine_convert.py`
2. Adding an implementation rule for this physical operator in `src/palimpzest/query/optimizer/rules.py`
3. Adding boolean flag(s) to allow / disallow this physical optimization
4. Adding base prompts for the critique and refinement generations
One other change which this work spawned was an attempt to improve the management and construction of our prompts -- and to decouple this logic from the `BaseGenerator` class. On the management side, I split our single `prompts.py` file into a set of files. On the construction side, I created a `PromptFactory` class which templates prompts based on the `prompt_strategy` and input record.
The `PromptFactory` is not a perfect solution, but I think it is a step in the right direction.
Finally, I fixed an error which previously filtered out `RAGConvert` operators from being considered by the `Optimizer`, and I made 2-3 more miscellaneous small tweaks.
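The generate / critique / refine flow described above can be sketched with stand-in callables (all names here are hypothetical, not the operator's real interface):

```python
# Illustrative sketch of the critique-and-refine pipeline: a draft answer is
# reviewed by a critic model, and the draft plus critique feed a refiner.
def critique_and_refine(generate, critique, refine, prompt):
    draft = generate(prompt)                # bonded convert output
    feedback = critique(prompt, draft)      # critic reviews the draft
    if feedback is None:                    # critic found nothing to improve
        return draft
    return refine(prompt, draft, feedback)  # refiner produces the final answer

# Toy callables standing in for the three LLM calls:
gen = lambda p: p.upper()
crit = lambda p, d: "add punctuation" if not d.endswith(".") else None
ref = lambda p, d, f: d + "."
```

The short-circuit when the critic has no feedback is what keeps the extra model calls bounded: only outputs flagged as improvable pay for the refinement generation.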
---------
Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>
* MkDocs Site for Palimpzest API Documentation (#116)
## Summary of PR Changes
1. Changed `docs` to use [MkDocs](https://www.mkdocs.org/) instead of Sphinx
2. Created initial `Getting Started` content
3. Created placeholders for `User Guide` content (to follow in a subsequent PR)
4. Added autogenerated docs for our most user-facing code (we will need to add docstrings to our code in a subsequent PR)
5. Made small tweaks to `src/` to allow users to specify policy using kwargs in `.run()`
6. Renamed the `testdata/enron-tiny/` files so that they're not so damn weird
---------
Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>
* remove registration of sources from CI; only check version bump if there is a code change
* remove filter for only checking version bump when src files changed
* Rename `nocache` --> `cache` everywhere (#128)
* first commit
* Removed myenv
* added to git ignore
* addressed the comments in review
* flip one minor comment
* minor spacing fix
* fix spaces in a few more spots
---------
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-207-160.dyn.MIT.EDU>
Co-authored-by: muhamed <muhamed@mit.edu>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* adding citation (and making 'others' explicit) (#136)
* Make Generator thread-safe (#139)
* fix moa prompt
* fix moa prompt aggregator
* update version
* make generator thread-safe
* update generator to return messages
* address comments
* Begin Process of Improving Index Abstraction(s) in PZ (#138)
* quick and dirty implementation which tracks retrieve costs
* bug fixes and currently unused index code
* add default search func which I forgot to implement and add chromadb to pyproject.toml
* leaving TODO
* hotfix to add cost for retrieve operation
* another hotfix to add ragatouille dependency
* Add logger for PZ (#134)
* add logger for PZ
1. When verbose=True, we save all logs to the log file and print them to the console;
2. When verbose=False, we only save ERROR+ logs to the file and print ERROR+ messages to the console.
I have added logging at the points I think are important for execution; we can always add or remove more later.
I may also update the logging messages based on my later annotation work, but this PR sets up the logging mechanism for now.
* ruff fix
* update code based on comments
1. not logging output_records
2. not logging plan_stats
3. write the log files to ".pz_logs"
---------
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* fix merge bug (#141)
* ruff fix
* update log dir and fix tiny bug
* fix merge bug
* Use a singleton API client for operators (#140)
* fix moa prompt
* fix moa prompt aggregator
* update version
* make generator thread-safe
* update generator to return messages
* address comments
* create a singleton API client
* fix linting
* fix logging in generators
* also create parent dir. if missing
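The singleton-client idea in this PR can be sketched as follows (class and method names are hypothetical; this is the pattern, not the library's code):

```python
import threading

# Hypothetical sketch of a shared API client: every operator asks for the
# one instance instead of constructing its own client per operator.
class APIClient:
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get(cls) -> "APIClient":
        # The lock prevents concurrent first calls from racing to build two
        # clients (relevant now that the generators are thread-safe).
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
        return cls._instance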
* CUAD benchmark (#143)
* fix moa prompt
* fix moa prompt aggregator
* update version
* make generator thread-safe
* update generator to return messages
* address comments
* create a singleton API client
* fix linting
* fix logging in generators
* fix CUAD benchmark
* fix type
* minor fixes
* Limit the Scope of Logging within the Optimizer (#144)
* making it possible to set log level based on env. variable; adding time limit on seven filters test
* deleting instead of commenting out
* Remove Conventional LLM Convert; Update Bonded LLM Convert retry logic (#145)
* use NullHandler in __init__ and let application control logging config (#146)
* use NullHandler in __init__ and let application control logging config
* ruff fix
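The `NullHandler` change follows the standard library-logging convention for packages; a sketch of the pattern:

```python
import logging

# Library side (its __init__): attach a NullHandler so that merely
# importing the package never emits log output or handler warnings.
logging.getLogger("palimpzest").addHandler(logging.NullHandler())

# Application side: the application, not the library, opts in to a
# logging configuration of its choosing, e.g.:
logging.basicConfig(level=logging.INFO)
```

This keeps logging policy in the application's hands: the library logs freely to its named logger, and nothing appears unless the application configures handlers.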
* Fix Progress Manager and Simplify `execute_plan` methods (#148)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* address comments
* The great deletion (#149)
* Adding Preliminary Work on Abacus and MAB Sentinel Execution (#147)
* updating models to avoid llama3
* fix parsing bugs and some generation errors
* don't require json for proposer and code synth generations; fix prompt format instruction for proposers
* fix typo/bug
* fix bugs in generator prep for field_answers; fix bug in filter impl.; other improvements
* adding new file for abacus workload
* fix len
* fix errors with dataset copy; prompt construction; and more
* remove JSON instruction from MOA proposer
* fixed bugs in optimizer configuration, llama 3.3 generation, and filter generation
* clean up demos; fix missing base prompt from map
* add one more missing base prompt
* prepare demo for full run; get embedding cost info from RAGConvert; use reasoning output from Critique
* add script to generate text-embedding-3-small reaction embeddings
* write to .chroma
* run full scale generation
* compute embeddings slowly and add progress bar
* add sleep
* fix import
* add total iters
* create embeddings before ingesting
* fix index start and finish
* load embeddings and insert directly
* make chroma use cosine sim.; finish initial search fcn. for biodex workload; naming tweak in rag convert
* capturing gen stats in Retrieve
* added UDF map operator; rewrote biodex pipeline to match docetl impl.; switched to using __name__ for functions instead of str()
* add optimizations back in
* write data to csv in demo
* limit to same model choice(s) as docetl and lotus
* fix punctuation error(s)
* try run without filter
* remove unused demo file
* remove print
* remove prints
* remove costed_phys_op_ids which were used for debugging
* try slightly diff. approach
* remove temp changes while branch is in PR review
* remove depends_on for map
* fix iteration bug in sentinel processors
* one more hotfix
* fix more errors w/SentinelPlanStats and sentinel processors
* remove logger lib to reduce confusion (#159)
* Update research.md (#160)
AISD @ NAACL 2025
* Add Pneuma-Palimpzest Integration Demo (#158)
* Add Pneuma demo
* Remove dataset semantic column addition
* Fix progress managers episode 2 attack of the clones (#156)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* initial work to refactor sentinel processors
* passing unit tests
* checking in minor changes
* remove use of setup_logger inside library
* stuff seems to be working
* big print
* turn off rag for test
* try debugging exception
* checking in code before changes to scoring
* finished initial refactoring of mab sentinel execution strategy
* get random sampling execution working with changes
* passing unit tests
* nosentinel progress looks good
* eyeball test is working for progress bars
* remove the old gods
* revert small change
* pull up progress manager logic in parallel execution
* catch errors in generating embeddings
* fix comments
* Merging in Changes for Sentinel Progress Bars; Split Convert (off by default); `demos/enron-demo.py`; and MMQA Benchmark (#163)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* initial work to refactor sentinel processors
* passing unit tests
* checking in minor changes
* remove use of setup_logger inside library
* stuff seems to be working
* big print
* turn off rag for test
* try debugging exception
* checking in code before changes to scoring
* finished initial refactoring of mab sentinel execution strategy
* get random sampling execution working with changes
* passing unit tests
* nosentinel progress looks good
* eyeball test is working for progress bars
* remove the old gods
* revert small change
* pull up progress manager logic in parallel execution
* adding prints to generator; turn progress off in favor of verbose for now
* catch errors in generating embeddings
* inspect frontier updates
* remove args.workload
* fix num_inputs in selectivity computation
* pdb in score
* fixed score fn issue
* use execution cache to avoid unnecessary computation; use sentinel stats for updating frontier
* fix progress counter
* debug
* fix empty stats
* only count stats from newly computed results
* fix tuple unpacking
* only update sample counts for llm ops
* de-dup duplicate record
* ugh
* dont forget to increment
* plz
* more plz
* increment
* recycle ops back onto reservoir so they may be reconsidered in the future
* remove pdb
* add progress to script args
* try without rag
* use term recall
* just check in on term recall
* make it easier to turn off progress
* remove pdb
* try to get re-rank to keep all inputs
* try to generate more reactions
* track total LLM calls
* 10x parallelism
* try retrieve directly on fulltext
* up max workers
* adding enron-demo w/optimization
* remove config option
* adding recall and precision to output
* allow operators to be recycled back onto frontier
* revert to using reactions instead of fulltext for similarity
* better cycling of off-frontier operators
* safety check on reservoir ops
* remove pdb
* fixing 5 results per query
* investigate sampling behavior
* check on seeds
* remove pdb
* test SplitConvert
* debug chunking
* fix bug in rag and split convert
* run with chunks
* test chunking logic
* fix chunking logic
* sum list
* remove split merge for now
* minor fixes to CUAD script
* add embedding scripts for mmqa tables and image titles
* address issue with empty titles and title collisions
* prepare script for using clip embeddings for images
* fix bug
* get full space of possible extensions
* debug
* weird bug fix?
* more debug
* fix idiotic mistake
* handle corrupted images and minor things
* add another corrupted image
* another one
* anotha
* more bad images
* last disallow file
* prepare cuad for runs
* specify execution strategy
* up samples
* add sentinel execution strategy to output name
* adding plan str and more stats
* specify no prior
* verbose=False
* fix comment; comment out prints
* make split merge optional for now
* addressing comments
* applying syntax changes to pneuma demo and supporting strings within retrieve
* bump version; fix lint; fix docs
* more docs tweaks; tweaking dependencies
* fix install issues
* one more version fix
* one more version fix
* one more version fix
* one more version fix
* last try
* change runner python version
* actually changing runner python version
* increase time limit for runners
* increase time limit for runners
* Merge in Changes From Final Abacus Work (WIP) (#173)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* initial work to refactor sentinel processors
* passing unit tests
* checking in minor changes
* remove use of setup_logger inside library
* stuff seems to be working
* big print
* turn off rag for test
* try debugging exception
* checking in code before changes to scoring
* finished initial refactoring of mab sentinel execution strategy
* get random sampling execution working with changes
* passing unit tests
* nosentinel progress looks good
* eyeball test is working for progress bars
* remove the old gods
* revert small change
* pull up progress manager logic in parallel execution
* adding prints to generator; turn progress off in favor of verbose for now
* catch errors in generating embeddings
* inspect frontier updates
* remove args.workload
* fix num_inputs in selectivity computation
* pdb in score
* fixed score fn issue
* use execution cache to avoid unnecessary computation; use sentinel stats for updating frontier
* fix progress counter
* debug
* fix empty stats
* only count stats from newly computed results
* fix tuple unpacking
* only update sample counts for llm ops
* de-dup duplicate record
* ugh
* dont forget to increment
* plz
* more plz
* increment
* recycle ops back onto reservoir so they may be reconsidered in the future
* remove pdb
* add progress to script args
* try without rag
* use term recall
* just check in on term recall
* make it easier to turn off progress
* remove pdb
* try to get re-rank to keep all inputs
* try to generate more reactions
* track total LLM calls
* 10x parallelism
* try retrieve directly on fulltext
* up max workers
* adding enron-demo w/optimization
* remove config option
* adding recall and precision to output
* allow operators to be recycled back onto frontier
* revert to using reactions instead of fulltext for similarity
* better cycling of off-frontier operators
* safety check on reservoir ops
* remove pdb
* fixing 5 results per query
* investigate sampling behavior
* check on seeds
* remove pdb
* test SplitConvert
* debug chunking
* fix bug in rag and split convert
* run with chunks
* test chunking logic
* fix chunking logic
* sum list
* remove split merge for now
* minor fixes to CUAD script
* add embedding scripts for mmqa tables and image titles
* address issue with empty titles and title collisions
* prepare script for using clip embeddings for images
* fix bug
* get full space of possible extensions
* debug
* weird bug fix?
* more debug
* fix idiotic mistake
* handle corrupted images and minor things
* add another corrupted image
* another one
* anotha
* more bad images
* last disallow file
* prepare cuad for runs
* specify execution strategy
* up samples
* add sentinel execution strategy to output name
* adding plan str and more stats
* specify no prior
* verbose=False
* fix comment; comment out prints
* make split merge optional for now
* addressing comments
* applying syntax changes to pneuma demo and supporting strings within retrieve
* add prints
* debug sample sets
* checking in code before tweaks to mab
* state of repo after running final Abacus experiments
* revert to opt-profiling-data
* removing print statement
* remove prints
* final fixes
* removing ragatouille dependency
* fix ruff lint checks
* bump version
* passing tests locally
* remove pdb
* fix complaint about match
* Move Abacus Research Scripts into Separate Folder (#175)
* re-organizing abacus research-related scripts
* fix model selection and other tweaks
* add data download script
* bump version
* remove scripts from root
* removing python files which were merged back in from main
* Fixed Issue(s) with Aggregate Operator Computation for Movie Queries (WIP) (#182)
* queries 1-4 working for movies
* removing RandomSampling
* Create `Context` Class + `compute` and `search` operators (#186)
* checking in changes
* refactored Dataset
* checking in
* checking in
* checking in
* queries extract final answer now
* checking in changes w/search operator
* adding changes to agents
* add isinstance checks to all executors
* removing script
* remove tools; include in future PR
* Remove `pz.Schema` in Favor of Using `pydantic.BaseModel` (#188)
* made changes throughout codebase and updated unit tests
* checking in; debugging failure with image use case
* simple demo / paper demos working
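With `pz.Schema` removed, a user-defined schema is just a pydantic model. A minimal sketch (the field names below are made up for illustration):

```python
from pydantic import BaseModel

# Under the new convention, the user supplies a plain pydantic model
# instead of subclassing pz.Schema.
class Email(BaseModel):
    sender: str
    subject: str
    is_urgent: bool

record = Email(sender="pat@example.com", subject="Q3 report", is_urgent=False)
```

Using `pydantic.BaseModel` directly gives type validation and field metadata for free, rather than maintaining a parallel schema class hierarchy.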
* eliminate caching features (#195)
* removing all code synthesis (#198)
* removing all code synthesis
* remove unused import
* Using LiteLLM to Manage Generator Clients / Completion APIs (#200)
* use LiteLLM for generators
* remove unused function; add TODO
* Added Anthropic Support; Simplified Rules; Removed Redundant Model Helpers (#202)
* changes after simplifying rules
* passing unit tests; removed unnecessary model helpers
* simplified primitives slightly
* fixing the assertion which used FieldInfo instead of FieldInfo.annotation (#204)
* add support for o4-mini, gemini-2.5-pro, gemini-2.0-flash, llama-4-maverick (#205)
* Adding Semantic Join Operator (#206)
* initial changes to support validator class; fixed bug in generator for images
* adding validator based optimization
* validator agent example working
* using o1 model; made validation more efficient
* added initial nested loops join implementation
* passing tests
* unit tests passing
* unit tests passing
* enron-demo.py working
* join demos in place
* parallel join and other bugfixes (#207)
* audio-demo (#208)
* remove pdb
* adding option to only use gemini models in audio demo
* adding parallelism; fixed bug w/unique_logical_op_id (#209)
* fixed issue which removed pipelined execution of operators in parallel setting (#210)
* Movie bugfixes (#211)
* fixed error in cost computation for gemini models; tested join on movie queries
* make join count monotonic
* removing progress bar updates for join for now
* adding reasoning effort (#212)
* made progress manager more efficient; made join op calculations accurate (#213)
* make groupby ignore None values
* make it possible to specify schema for MemoryDataset; reasoning model fixes
* adding audio-only match in substitution (#214)
* quick fix for audio prompt missing in MoA
* support passing in gemini/vertex credentials path; fix minor bugs in audio generation (#216)
* adding Distinct operator to PZ (#217)
* masking filepaths for sembench; fix audio pricing (#218)
* make GroupBySig a pz. import
* remove email demo
* reproduce abacus results
* add notes about deprecation to scripts for generating priors
* remove unsupported demos
* sem_add_columns -> sem_map
* Dev staging (#220)
* edit cuad abacus scripts to use local data
* edit cuad abacus scripts to use local data
* edit cuad abacus scripts to use local data
* fix: cuad data loader doesn't work via huggingface anymore (#215)
* edit cuad abacus scripts to use local data
* edit cuad abacus scripts to use local data
* edit cuad abacus scripts to use local data
---------
Co-authored-by: mdr223 <mdrusso@mit.edu>
---------
Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
* adding early support for vllm models
* changes to appease linter
* remove models now that we have access to gpt-5
* only perform time check on local; CI runners are slow
* Support google api and desc (#222)
* support shreya models and re-support desc
* adding gpt-5-nano to gpt-5 models
* bump version
* fixed merge error
* fixing bug where id column in schema overrides DataRecord.id
---------
Co-authored-by: Jun <130543538+chjuncn@users.noreply.github.com>
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Sivaprasad Sudhir <sivaprasad2626@gmail.com>
Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>
Co-authored-by: Bari Bo LeBari <143016395+lilbarbar@users.noreply.github.com>
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-207-160.dyn.MIT.EDU>
Co-authored-by: muhamed <muhamed@mit.edu>
Co-authored-by: Tranway1 <tranway@qq.com>
Co-authored-by: Luthfi Balaka <luthfibalaka@gmail.com>
Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
* Add Optimizations for Filter and Join Operators (#230)
* rename files to reflect that they will contain filter and map physical operators
* passing map unit tests
* passing filter tests
* finished tests
* adding tests for joins and initial embedding join
* adding vllm test
* fixed embedding join
* filter for filepaths instead of assert
* add embedding cost
* fixed full hashes bug with deep copy
* bump version
* undo linting change
* Reorder bug (#232)
* fixing map/filter/join tests for CI which doesn't have GEMINI access; adding test for real estate bug
* added exploration to re-order converts
* separate lack of gemini from ci tests
* Data Record Refactor (#233)
* Refactor DataRecord to hold data in the BaseModel member instead of separately.
* Some type fixes
* local unit tests passing
* enforce data record id uses list of schema fields
* remove unused code from copy
* use function instead of class internals
---------
Co-authored-by: Tianyu Li <litianyu@mit.edu>
* Updating Website to Use Docusaurus (#234)
* adding docusaurus website; still haven't updated doc content and home page
* fix links at bottom of page
* updated pages for website; docs are still not auto-rendered
* updating ci pipelines
* update path to package
* update node version
* update package
* fix build commands
* fix trigger
* fix runner and import
* fix some DataRecord inits
* switch to running llms w/separate flag b/c one test can fail due to bad generation
* changes to be more flexible on types for abacus scripts
* guessing at fix for build path
* removing old website
* remove commented ci code
* remove mkdocs from pyproject
* remove prints
* fix location of CNAME file
* Opt fixes (#236)
* fixed errors in optimizer
* added palimpchat page
* passing unit tests
* also relax types on train datasets
* bump version
* try lowercasing c
* fixed route
* eliminate slowdown from stringifying sentinel plan(s)
* bump version
* allow enron demo to swap filters w/convert
* remove print statements in validator and fix bug introduced for bytes fields
* bump version
* adding min and max
* fixing assertion error
* fix no reasoning prompt templating issue(s)
* add semantic aggregation operator
* bump version
* fix mock call in unit test
* add google analytics tracking
* Updated Website User Guide(s); Renamed `retrieve()` --> `sem_topk()` (#244)
* checking in in-flight changes
* adding code for unmatched records in left/right/outer joins
* optimization stuck
* new mmqa script is functional
* minor bugfixes
* fix naive estimates with new operators
* updated website user guides; renamed retrieve --> top-k
* fix defaults for join op
* bumping version
* fix documentation links
* Add Cost-Based Sample Budget; Fix RAGConvert/Filter for `str | Any` Types (#247)
* checking in in-flight changes
* adding code for unmatched records in left/right/outer joins
* optimization stuck
* new mmqa script is functional
* minor bugfixes
* fix naive estimates with new operators
* updated website user guides; renamed retrieve --> top-k
* add cost-based sample budget; fix rag convert and filter for str | Any fields
* Fix missing comma causing vLLM completions to break (#246)
* bumping version
* Final Changes from Revision for Abacus (#250)
* checking in in-flight changes
* adding code for unmatched records in left/right/outer joins
* optimization stuck
* new mmqa script is functional
* minor bugfixes
* fix naive estimates with new operators
* updated website user guides; renamed retrieve --> top-k
* add cost-based sample budget; fix rag convert and filter for str | Any fields
* pushing local mmqa experiment
* try n=20
* preparing final runs for table 2
* fix thread safety issue w/EmbeddingJoin
* adding full ablation study
* bugfixes in operators
* adding final revision work from local
* updated readme
* adding changes from berners-lee
* remove comments
* fix linting and bump version
* Blebari task 131 (#241)
* .
* .
* minor tweaks
* add embedding costs to RecordOpStats
* minor tweaks
* change comment
---------
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-128-127.dyn.mit.edu>
Co-authored-by: Bari LeBari <barilebari@Baris-MacBook-Pro.local>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* adding real-estate-eval-100 to download script
* adding real-estate-demo
* jczhang add model checks (#254)
* adding checks that user has support for models they need
* check if available models is empty
* trying to resolve dependency
* bump version
* gemini studio api issue (#257)
* recreating the issue
* fixing model provider for google AI studio
* add try-except back
---------
Co-authored-by: Matthew Russo <mdrusso@mit.edu>
* bump version
* fix model check
---------
Co-authored-by: Tianyu Li <litianyu@mit.edu>
Co-authored-by: Jun <130543538+chjuncn@users.noreply.github.com>
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Sivaprasad Sudhir <sivaprasad2626@gmail.com>
Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>
Co-authored-by: Bari Bo LeBari <143016395+lilbarbar@users.noreply.github.com>
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-207-160.dyn.MIT.EDU>
Co-authored-by: muhamed <muhamed@mit.edu>
Co-authored-by: Tranway1 <tranway@qq.com>
Co-authored-by: Luthfi Balaka <luthfibalaka@gmail.com>
Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
Co-authored-by: Griffin Roupe <31631417+frostyfan109@users.noreply.github.com>
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-128-127.dyn.mit.edu>
Co-authored-by: Bari LeBari <barilebari@Baris-MacBook-Pro.local>
Co-authored-by: Jerry Zhang <122544742+xqlcn@users.noreply.github.com>