Merge master into branch

Yijia-Chen · Yijia-Chen · commit a774abcb75e4 · 2019-03-10T19:51:54.000-07:00
diff --git a/.travis.yml b/.travis.yml
@@ -5,14 +5,15 @@ os:
 julia:
   - 0.7
   - 1.0
+  - 1.1
   - nightly
 notifications:
   email: false
 branches:
   only:
   - master
   - /release-.*/
-  - /v(\d+)\.(\d+)\.(\d+)/   
+  - /v(\d+)\.(\d+)\.(\d+)/
 matrix:
   allow_failures:
   - julia: nightly
@@ -22,7 +23,7 @@ after_success:
 jobs:
   include:
     - stage: "Documentation"
-      julia: 1.0
+      julia: 1.1
       os: linux
       script:
         - julia --project=docs/ -e 'using Pkg; Pkg.instantiate(); Pkg.develop(PackageSpec(path=pwd()))'
diff --git a/NEWS.md b/NEWS.md
@@ -1,5 +1,5 @@
 # Query.jl v0.11.0 Release Notes
-* Add @select, @rename and @mutate standalone macros
+* Add @unique, @select, @rename and @mutate standalone macros
 * Fix all doctest errors
 * Various bugfixes
 
diff --git a/README.md b/README.md
@@ -27,7 +27,7 @@ any array,
 
 The package currently provides working implementations for in-memory data sources, but will eventually be able to translate queries into e.g. SQL. There is a prototype implementation of such a "query provider" for [SQLite](https://github.com/JuliaDB/SQLite.jl) in the package, but it is experimental at this point and only works for a *very* small subset of queries.
 
-Query is heavily inspired by [LINQ](https://msdn.microsoft.com/en-us/library/bb397926.aspx), in fact right now the package is largely an implementation of the [LINQ](https://msdn.microsoft.com/en-us/library/bb397926.aspx) part of the [C# specification](https://msdn.microsoft.com/en-us/library/ms228593.aspx). Future versions of Query will most likely add features that are not found in the original [LINQ](https://msdn.microsoft.com/en-us/library/bb397926.aspx) design.
+Query is heavily inspired by [LINQ](https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/index); in fact, right now the package is largely an implementation of the [LINQ](https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/index) part of the [C# specification](https://msdn.microsoft.com/en-us/library/ms228593.aspx). Future versions of Query will most likely add features that are not found in the original [LINQ](https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/index) design.
 
 ## Alternatives
 [Query.jl](https://github.com/queryverse/Query.jl) is not the only julia initiative for querying data, there are many other packages that have similar goals. Take a look at [DataFramesMeta.jl](https://github.com/JuliaStats/DataFramesMeta.jl), [StructuredQueries.jl](https://github.com/davidagold/StructuredQueries.jl), [LazyQuery.jl](https://github.com/bramtayl/LazyQuery.jl) and [SplitApplyCombine.jl](https://github.com/JuliaData/SplitApplyCombine.jl). *If I missed other initiatives, please let me know and I'll add them to this list!*
diff --git a/REQUIRE b/REQUIRE
@@ -3,4 +3,4 @@ TableTraits 0.3.1
 IterableTables 0.8.2
 DataValues 0.4.4
 MacroTools 0.4.4
-QueryOperators 0.5.1
+QueryOperators 0.6.0
diff --git a/appveyor.yml b/appveyor.yml
@@ -1,7 +1,8 @@
 environment:
   matrix:
   - julia_version: 0.7
-  - julia_version: 1
+  - julia_version: 1.0
+  - julia_version: 1.1
   - julia_version: nightly
 
 platform:
diff --git a/docs/Project.toml b/docs/Project.toml
@@ -6,4 +6,4 @@ Query = "1a8c2f83-1ff3-5112-b086-8aa67b057ba1"
 TypedTables = "9d95f2ec-7b3d-5a63-8d20-e2491e220bb9"
 
 [compat]
-Documenter = "~0.20"
+Documenter = "~0.21"
diff --git a/docs/make.jl b/docs/make.jl
@@ -3,11 +3,12 @@ using Documenter, Query
 makedocs(
 	modules = [Query],
 	sitename = "Query.jl",
+	analytics="UA-132838790-1",
 	pages = [
 		"Introduction" => "index.md",
 		"Getting Started" => "gettingstarted.md",
 		"Standalone Query Commands" => "standalonequerycommands.md",
-		"LINQ Style Query Commands" => "linqquerycommands.md",		
+		"LINQ Style Query Commands" => "linqquerycommands.md",
 		"Data Sources" => "sources.md",
 		"Data Sinks" => "sinks.md",
 		"Experimental Features" => "experimental.md",
diff --git a/docs/src/experimental.md b/docs/src/experimental.md
@@ -6,136 +6,9 @@ deal with significant changes to these features in future versions of
 Query.jl. At the same time any feedback on these features would be
 especially welcome.
 
-The `@map`, `@filter`, `@groupby`, `@orderby` (and various variants),
-`@groupjoin`, `@join`, `@mapmany`, `@take` and `@drop` commands can be used in standalone
-versions. Those standalone versions are especially convenient in
-combination with the pipe syntax in julia. Here is an example that
-demonstrates their use:
+## Source as the first argument to standalone query commands
 
-```julia
-using Query, DataFrames, Statistics
-
-df = DataFrame(a=[1,1,2,3], b=[4,5,6,8])
-
-df2 = df |>
-    @groupby(_.a) |>
-    @map({a=key(_), b=mean(_.b)}) |>
-    @filter(_.b > 5) |>
-    @orderby_descending(_.b) |>
-    DataFrame
-```
-
-This example makes use of three experimental features: 1) the standalone
-query commands, 2) the `.` syntax and 3) the `_` anonymous function syntax.
-
-## Standalone query operators
-
-All standalone query commands can either take a source as their first
-argument, or one can pipe the source into the command, as in the above
-example. For example, one can either write
-
-```julia
-df = df |> @groupby(_.a)
-```
-or
-```julia
-df = @groupby(df, _.a)
-```
-both forms are equivalent.
-
-The remaining arguments of each query demand are command specific.
-
-The following discussion will present each command in the version that
-accepts a source as the first argument.
-
-### The `@map` command
-
-The `@map` command has the form `@map(source, element_selector)`.
-`source` can be any source that can be queried. `element_selector` must
-be an anonymous function that accepts one element of the element type of
-the source and applies some transformation to this single element.
-
-### The `@filter` command
-
-The `@filter` command has the form `@filter(source, filter_condition)`.
-`source` can be any source that can be queried. `filter_condition` must
-be an anonymous function that accepts one element of the element type of
-the source and returns `true` if that element should be retained, and
-`false` if that element should be filtered out.
-
-### The `@groupby` command
-
-There are two versions of the `@groupby` command. The simple version has
-the form `@groupby(source, key_selector)`. `source` can be any source
-that can be queried. `key_selector` must be an anonymous function that
-returns a value for each element of `source` by which the source elements
-should be grouped.
-
-The second variant has the form `@groupby(source, key_selector, element_selector)`.
-The definition of `source` and `key_selector` is the same as in the simple
-variant. `element_selector` must be an anonymous function that is applied
-to each element of the `source` before that element is placed into a group,
-i.e. this is a projection function.
-
-The return value of `@groupby` is an iterable of groups. Each group is itself a
-collection of data rows, and has a `key` field that is equal to the value the
-rows were grouped by. Often the next step in the pipeline will be to use `@map`
-with a function that acts on each group, summarizing it in a new data row.
-
-### The `@orderby`, `@orderby_descending`, `@thenby` and `@thenby_descending` command
-
-There are four commands that are used to sort data. Any sorting has to
-start with either a `@orderby` or `@orderby_descending` command. `@thenby`
-and `@thenby_descending` commands can only directly follow a previous sorting
-command. They specify how ties in the previous sorting condition are to be
-resolved.
-
-The general sorting command form is `@orderby(source, key_selector)`.
-`source` can be any source than can be queried. `key_selector` must be an
-anonymous function that returns a value for each element of `source`. The
-elements of the source are then sorted is ascending order by the value
-returned from the `key_selector` function. The `@orderby_descending`
-command works in the same way, but sorts things in descending order. The
-`@thenby` and `@thenby_descending` command only accept the return value
-of any of the four sorting commands as their `source`, otherwise they have
-the same syntax as the `@orderby` and `@orderby_descending` commands.
-
-### The `@groupjoin` command
-
-The `@groupjoin` command has the form `@groupjoin(outer, inner, outer_selector, inner_selector, result_selector)`.
-`outer` and `inner` can be any source that can be queried. `outer_selector`
-and `inner_selector` must be an anonymous function that extracts the value
-from the outer and inner source respectively on which the join should
-be run. The `result_selector` must be an anonymous function that takes two
-arguments, first the element from the `outer` source, and second an array
-of those elements from the second source that are grouped together.
-
-### The `@join` command
-
-The `@join` command has the form `@join(outer, inner, outer_selector, inner_selector, result_selector)`.
-`outer` and `inner` can be any source that can be queried. `outer_selector`
-and `inner_selector` must be an anonymous function that extracts the value
-from the outer and inner source respectively on which the join should
-be run. The `result_selector` must be an anonymous function that takes two
-arguments. It will be called for each element in the result set, and the
-first argument will hold the element from the outer source and the second
-argument will hold the element from the inner source.
-
-### The `@mapmany` command
-
-The `@mapmany` command has the form `@mapmany(source, collection_selector, result_selector)`.
-`source` can be any source that can be queried. `collection_selector` must
-be an anonymous function that takes one argument and returns a collection.
-`result_selector` must be an anonymous function that takes two arguments.
-It will be applied to each element of the intermediate collection.
-
-### The `@take` command
-
-The `@take` command has the form `@take(source, n)`. `source` can be any source that can be queried. `n` must be an integer, and it specifies how many elements from the beginning of the source should be kept.
-
-### The `@drop` command
-
-The `@drop` command has the form `@drop(source, n)`. `source` can be any source that can be queried. `n` must be an integer, and it specifies how many elements from the beginning of the source should be dropped from the results.
+Some standalone query commands accept the source as their first argument, in addition to accepting it via the pipe operator. For example, `source |> @map(_)` and `@map(source, _)` are equivalent. These source-first versions of the standalone query commands are considered experimental and might be removed in future releases.
 
 ## The `_` and `__` syntax
 
@@ -157,3 +30,21 @@ df_children = DataFrame(Name=["Bill", "Joe", "Mary"], Parent=["John", "John", "S
 
 df_parents |> @join(df_children, _.Name, _.Parent, {Parent=_.Name, Child=__.Name}) |> DataFrame
 ```
+
+## Key selector in the `@unique` standalone command
+
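+As an experimental feature, the `@unique` command accepts a key selector. Uniqueness is then tested on the value returned by the key selector rather than on the elements themselves, so in the example below `1` and `-1` count as duplicates.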
+
+```jldoctest
+using Query
+
+source = [1,-1,2,2,3]
+
+q = source |> @unique(abs(_)) |> collect
+
+println(q)
+
+# output
+
+[1, 2, 3]
+```
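The key-selector semantics described in the `experimental.md` hunk above can be illustrated without Query.jl. The following is a minimal plain-Julia sketch. It assumes that the first element seen for each key is the one kept, which is what the doc example `[1,-1,2,2,3]` → `[1, 2, 3]` and the `@unique(_.b)` test in this commit suggest. `unique_by_key` is a hypothetical helper, not part of Query.jl or QueryOperators.jl.

```julia
# Hypothetical helper (not Query.jl API): key-based de-duplication that keeps
# the first element seen for each distinct key and preserves input order.
function unique_by_key(keyfn, source)
    seen = Set{Any}()            # keys encountered so far
    result = eltype(source)[]    # retained elements, in input order
    for x in source
        k = keyfn(x)
        if !(k in seen)
            push!(seen, k)
            push!(result, x)
        end
    end
    return result
end

unique_by_key(abs, [1, -1, 2, 2, 3])      # == [1, 2, 3]
unique_by_key(identity, [1, 1, 2, 2, 3])  # == [1, 2, 3]

# Rows shaped like the DataFrame in the new "@unique operator" testset
rows = [(a=1, b=3.0), (a=2, b=3.0), (a=1, b=3.0)]
unique_by_key(identity, rows)             # == [(a=1, b=3.0), (a=2, b=3.0)]
unique_by_key(r -> r.b, rows)             # == [(a=1, b=3.0)]
```

Julia's `Base.unique(f, itr)` follows the same first-occurrence rule, so `unique(abs, [1, -1, 2, 2, 3])` also returns `[1, 2, 3]`.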
diff --git a/docs/src/gettingstarted.md b/docs/src/gettingstarted.md
@@ -4,7 +4,7 @@ Query.jl supports two different front-end syntax options: 1) standalone query op
 
 ## Standalone query operators
 
-The standalone query operators are typically combined into more complicated queries via the pipe operator. The example from the previous section can also be written like this, using the `@filter` and `@map` standalone query operators:
+The standalone query operators are typically combined into more complicated queries via the pipe operator. Probably the simplest example is a query that filters a `DataFrame` and returns a subset of its columns:
 
 ```jldoctest
 using Query, DataFrames
@@ -37,7 +37,7 @@ q = @from <range variable> in <source> begin
 end
 ```
 
-Multiple `<query statements>` are separated by line breaks. Probably the most simple example is a query that filters a `DataFrame` and returns a subset of its columns:
+Multiple `<query statements>` are separated by line breaks. The example from the previous section can also be written like this, using a LINQ style query:
 
 ```jldoctest
 using Query, DataFrames
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -4,7 +4,7 @@
 
 Query is a package for querying julia data sources. It can filter, project, join, sort and group data from any iterable data source, including all the sources that support the [TableTraits.jl](https://github.com/queryverse/TableTraits.jl) interface (this includes everything listed in [IterableTables.jl](https://github.com/queryverse/IterableTables.jl)).
 
-Query is heavily inspired by [LINQ](https://msdn.microsoft.com/en-us/library/bb397926.aspx) and [dplyr](https://dplyr.tidyverse.org/).
+Query is heavily inspired by [LINQ](https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/index) and [dplyr](https://dplyr.tidyverse.org/).
 
 ## Installation
 
diff --git a/docs/src/standalonequerycommands.md b/docs/src/standalonequerycommands.md
@@ -18,21 +18,6 @@ df2 = df |>
 
 ## Standalone query operators
 
-All standalone query commands can either take a source as their first argument, or one can pipe the source into the command, as in the above example. For example, one can either write
-
-```julia
-df = df |> @groupby(_.a)
-```
-or
-```julia
-df = @groupby(df, _.a)
-```
-both forms are equivalent.
-
-The remaining arguments of each query demand are command specific.
-
-The following discussion will present each command in the version where a source is piped into the command.
-
 ## The `@map` command
 
 The `@map` command has the form `source |> @map(element_selector)`. `source` can be any source that can be queried. `element_selector` must be an anonymous function that accepts one element of the element type of the source and applies some transformation to this single element.
@@ -114,7 +99,7 @@ println(x)
 
 There are four commands that are used to sort data. Any sorting has to start with either a `@orderby` or `@orderby_descending` command. `@thenby` and `@thenby_descending` commands can only directly follow a previous sorting command. They specify how ties in the previous sorting condition are to be resolved.
 
-The general sorting command form is `source |> @orderby(key_selector)`. `source` can be any source than can be queried. `key_selector` must be an anonymous function that returns a value for each element of `source`. The elements of the source are then sorted is ascending order by the value returned from the `key_selector` function. The `@orderby_descending` command works in the same way, but sorts things in descending order. The `@thenby` and `@thenby_descending` command only accept the return value of any of the four sorting commands as their `source`, otherwise they have the same syntax as the `@orderby` and `@orderby_descending` commands.
+The general sorting command form is `source |> @orderby(key_selector)`. `source` can be any source that can be queried. `key_selector` must be an anonymous function that returns a value for each element of `source`. The elements of the source are then sorted in ascending order by the value returned from the `key_selector` function. The `@orderby_descending` command works in the same way, but sorts in descending order. The `@thenby` and `@thenby_descending` commands only accept the return value of any of the four sorting commands as their `source`; otherwise they have the same syntax as the `@orderby` and `@orderby_descending` commands.
 
 #### Example
 
@@ -262,6 +247,26 @@ println(q)
 [4, 5]
 ```
 
+## The `@unique` command
+
+The `@unique` command has the form `source |> @unique()`. `source` can be any source that can be queried. The command removes duplicate elements from the source. There is also an experimental version of this command that accepts a key selector; see the Experimental Features section of the documentation.
+
+#### Example
+
+```jldoctest
+using Query
+
+source = [1,1,2,2,3]
+
+q = source |> @unique() |> collect
+
+println(q)
+
+# output
+
+[1, 2, 3]
+```
+
 ## The `@select` command
 
 The `@select` command has the form `source |> @select(selectors...)`. `source` can be any source that can be queried. Each selector of `selectors...` can either select elements from `source` and add them to the result set, or select elements from the result set and remove them. A selector may select or remove an element by name, by position, or using a predicate function. All `selectors...` are executed in order and may not commute.
diff --git a/src/Query.jl b/src/Query.jl
@@ -8,7 +8,7 @@ using QueryOperators
 
 export @from, @query, @count, Grouping, key
 
-export @map, @filter, @groupby, @orderby, @orderby_descending,
+export @map, @filter, @groupby, @orderby, @orderby_descending, @unique,
 	@thenby, @thenby_descending, @groupjoin, @join, @mapmany, @take, @drop
 
 export @select, @rename, @mutate
diff --git a/src/standalone_query_macros.jl b/src/standalone_query_macros.jl
@@ -237,3 +237,15 @@ end
 macro drop(n)
     return :( i -> QueryOperators.drop(QueryOperators.query(i), $(esc(n))))
 end
+
+macro unique()
+    return :( i -> QueryOperators.unique(QueryOperators.query(i), q->q, :(q->q))) |>
+        helper_namedtuples_replacement
+end
+
+macro unique(f)
+    f_as_anonym_func = helper_replace_anon_func_syntax(f)
+    q = Expr(:quote, helper_replace_anon_func_syntax(f_as_anonym_func))
+    return :( i -> QueryOperators.unique(QueryOperators.query(i), $(esc(f_as_anonym_func)), $(esc(q)))) |>
+        helper_namedtuples_replacement
+end
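The two `@unique` macros added above each expand to a one-argument closure (`i -> QueryOperators.unique(...)`). That is why they can sit on the right-hand side of `|>`. The sketch below reproduces that shape with hypothetical `@toy_unique` macros built on `Base.unique`. It is an illustration under those assumptions, not the package's code. It does not implement Query.jl's `_` shorthand or call QueryOperators.jl. Its third, source-first method mirrors the experimental `@map(source, _)` form, and this commit adds no such method for `@unique`.

```julia
# Hypothetical toy macros (not Query.jl's API) mimicking the shape of the
# standalone @unique macros: the pipe-able forms expand to a closure `i -> ...`.
macro toy_unique()
    # No key selector: compare whole elements.
    return :(i -> unique(i))
end

macro toy_unique(f)
    # Key selector: `f` is escaped so it resolves in the caller's scope.
    return :(i -> unique($(esc(f)), i))
end

# Source-as-first-argument variant, analogous to `@map(source, _)`.
macro toy_unique(source, f)
    return :(unique($(esc(f)), $(esc(source))))
end

piped  = [1, -1, 2, 2, 3] |> @toy_unique(abs)   # == [1, 2, 3]
direct = @toy_unique([1, -1, 2, 2, 3], abs)     # == [1, 2, 3]
plain  = [1, 1, 2, 2, 3] |> @toy_unique()       # == [1, 2, 3]
```

The toy follows the macros in this commit: a single argument is read as the key selector. A source-first call therefore needs two arguments, and `@toy_unique(source)` would treat `source` as the key function.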
diff --git a/test/test_standalone.jl b/test/test_standalone.jl
@@ -44,4 +44,11 @@ end
     @test df2[:c] == ["b","c"]
 end
 
+@testset "@unique operator" begin
+    df = DataFrame(a=[1,2,1], b=[3.,3.,3.])
+
+    @test df |> @unique() |> collect == [(a=1,b=3.), (a=2,b=3.)]
+    @test df |> @unique(_.b) |> collect == [(a=1,b=3.)]
+end
+
 end