Joining DataFrames: Difference between revisions

From QPR ProcessAnalyzer Wiki
Jump to navigation Jump to search
(Created page with "Join function for DataFrame performs a [https://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators joining operation] between two DataFrames. Parameters:...")
 
No edit summary
 
(4 intermediate revisions by the same user not shown)
Line 2: Line 2:


Parameters:
Parameters:
# '''DataFrame''': The other DataFrame to join.
# '''DataFrame''': Other DataFrame in the join.
# '''Columns to match''': Columns which the joining is based on, can be defined as follows:
# '''Columns to match''': Columns which the joining is based on, can be defined as follows:
#* If joining using '''one column having the same name in both DataFrames''', the column name is specified as as string.
#* If joining using '''one column having the same name in both DataFrames''', the column name is specified as as string.
Line 12: Line 12:
#* '''rightouter''': at least one row is generated for each right side DataFrame row, even if there is no matching other row (in that case ''null'' is given as value for the other columns).
#* '''rightouter''': at least one row is generated for each right side DataFrame row, even if there is no matching other row (in that case ''null'' is given as value for the other columns).
#* '''outer''': at least one row is generated both for the left and right side DataFrames even if there is no matching other row (in that case ''null'' is given as value for the other columns).
#* '''outer''': at least one row is generated both for the left and right side DataFrames even if there is no matching other row (in that case ''null'' is given as value for the other columns).
#* '''cross''': One row is generated for each combination of the joined DataFrames. For the cross join, the columns to match is not defined at all. Note that the cross join might be a slow operation as the number of resulting rows tend to grow.


Examples:
Example: inner join:
<pre>
<pre>
let left = ToDataFrame([[0, "zero"], [1, "one"]], ["id", "left"]);
let left = ToDataFrame([[0, "zero"], [1, "one"]], ["id", "left"]);
let right = ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]);
let right = ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]);
left.join(right, "id").ToCsv()
left.join(right, "id").ToCsv();
Returns:
Returns:
id;left;right
id;left;right
0;zero;zero
0;zero;zero
</pre>


Example: outer join with matching column names:
<pre>
let left = ToDataFrame([[0, "zero"], [1, "one"]], ["id", "left"]);
let left = ToDataFrame([[0, "zero"], [1, "one"]], ["id", "left"]);
let right = ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]);
let right = ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]);
left.join(right, "id", "leftouter").ToCsv()
left.join(right, "id", "leftouter").ToCsv();
Returns:
Returns:
id;left;right
id;left;right
0;zero;zero
0;zero;zero
1;one;
1;one;
</pre>


Example: inner join with different column names (one column pair):
<pre>
let left = ToDataFrame([[0, 0, "zerozeroleft"], [0, 1, "zeroleft"], [1, 2, "oneleft"]], ["idleft1", "idleft2", "left"]);
let left = ToDataFrame([[0, 0, "zerozeroleft"], [0, 1, "zeroleft"], [1, 2, "oneleft"]], ["idleft1", "idleft2", "left"]);
let right = ToDataFrame([[0, 0, "zerozeroright"], [0, 1, "zeroright"], [2, 3, "tworight"], [3, 4, "threeright"]], ["idright1", "idright2", "right"]);
let right = ToDataFrame([[0, 0, "zerozeroright"], [0, 1, "zeroright"], [2, 3, "tworight"], [3, 4, "threeright"]], ["idright1", "idright2", "right"]);
left.join(right, ["idleft1": "idright1"], "inner");
left.join(right, ["idleft1": "idright1"], "inner").ToCsv();
Returns:
Returns:
idleft1;idleft2;left;idright2;right
idleft1;idleft2;left;idright2;right
Line 39: Line 46:
0;1;zeroleft;0;zerozeroright
0;1;zeroleft;0;zerozeroright
0;1;zeroleft;1;zeroright
0;1;zeroleft;1;zeroright
</pre>


Example: inner join with different column names (two column pairs):
<pre>
let left = ToDataFrame([[0, 0, "zerozeroleft"], [0, 1, "zeroleft"], [1, 2, "oneleft"]], ["idleft1", "idleft2", "left"]);
let left = ToDataFrame([[0, 0, "zerozeroleft"], [0, 1, "zeroleft"], [1, 2, "oneleft"]], ["idleft1", "idleft2", "left"]);
let right = ToDataFrame([[0, 0, "zerozeroright"], [0, 1, "zeroright"], [2, 3, "tworight"], [3, 4, "threeright"]], ["idright1", "idright2", "right"]);
let right = ToDataFrame([[0, 0, "zerozeroright"], [0, 1, "zeroright"], [2, 3, "tworight"], [3, 4, "threeright"]], ["idright1", "idright2", "right"]);
left.join(right, ["idleft1": "idright1", "idleft2": "idright2"], "inner");
left.join(right, ["idleft1": "idright1", "idleft2": "idright2"], "inner").ToCsv();
Returns:
Returns:
idleft1;idleft2;left;right
idleft1;idleft2;left;right
0;0;zerozeroleft;zerozeroright
0;0;zerozeroleft;zerozeroright
0;1;zeroleft;zeroright
0;1;zeroleft;zeroright
</pre>
Example: cross join:
<pre>
let left = ToDataFrame([[1, "one"], [2, "two"]], ["left number", "left text"]);
let right = ToDataFrame([[1], [2], [3]], ["right number"]);
left.join(right, null, "cross").ToCsv();
Returns:
left number;left text;right number
1;one;1
1;one;2
1;one;3
2;two;1
2;two;2
2;two;3
</pre>
</pre>

Latest revision as of 20:14, 21 February 2024

Join function for DataFrame performs a joining operation between two DataFrames.

Parameters:

  1. DataFrame: Other DataFrame in the join.
  2. Columns to match: Columns which the joining is based on, can be defined as follows:
    • If joining using one column having the same name in both DataFrames, the column name is specified as as string.
    • If joining using several columns having the same names in both DataFrames, the column names are specified as a string array.
    • If joining using columns having different names between the DataFrames, columns are specified as an array of key-value pairs, where the key is the column name in the left side DataFrame, and value is the column name in the right side DataFrame.
  3. Join type which can be
    • inner (default): row is generated if both DataFrames have the key.
    • leftouter: at least one row is generated for each left side DataFrame row, even if there is no matching other row (in that case null is given as value for the other columns).
    • rightouter: at least one row is generated for each right side DataFrame row, even if there is no matching other row (in that case null is given as value for the other columns).
    • outer: at least one row is generated both for the left and right side DataFrames even if there is no matching other row (in that case null is given as value for the other columns).
    • cross: One row is generated for each combination of the joined DataFrames. For the cross join, the columns to match is not defined at all. Note that the cross join might be a slow operation as the number of resulting rows tend to grow.

Example: inner join:

let left = ToDataFrame([[0, "zero"], [1, "one"]], ["id", "left"]);
let right = ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]);
left.join(right, "id").ToCsv();
Returns:
id;left;right
0;zero;zero

Example: outer join with matching column names:

let left = ToDataFrame([[0, "zero"], [1, "one"]], ["id", "left"]);
let right = ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]);
left.join(right, "id", "leftouter").ToCsv();
Returns:
id;left;right
0;zero;zero
1;one;

Example: inner join with different column names (one column pair):

let left = ToDataFrame([[0, 0, "zerozeroleft"], [0, 1, "zeroleft"], [1, 2, "oneleft"]], ["idleft1", "idleft2", "left"]);
let right = ToDataFrame([[0, 0, "zerozeroright"], [0, 1, "zeroright"], [2, 3, "tworight"], [3, 4, "threeright"]], ["idright1", "idright2", "right"]);
left.join(right, ["idleft1": "idright1"], "inner").ToCsv();
Returns:
idleft1;idleft2;left;idright2;right
0;0;zerozeroleft;0;zerozeroright
0;0;zerozeroleft;1;zeroright
0;1;zeroleft;0;zerozeroright
0;1;zeroleft;1;zeroright

Example: inner join with different column names (two column pairs):

let left = ToDataFrame([[0, 0, "zerozeroleft"], [0, 1, "zeroleft"], [1, 2, "oneleft"]], ["idleft1", "idleft2", "left"]);
let right = ToDataFrame([[0, 0, "zerozeroright"], [0, 1, "zeroright"], [2, 3, "tworight"], [3, 4, "threeright"]], ["idright1", "idright2", "right"]);
left.join(right, ["idleft1": "idright1", "idleft2": "idright2"], "inner").ToCsv();
Returns:
idleft1;idleft2;left;right
0;0;zerozeroleft;zerozeroright
0;1;zeroleft;zeroright

Example: cross join:

let left = ToDataFrame([[1, "one"], [2, "two"]], ["left number", "left text"]);
let right = ToDataFrame([[1], [2], [3]], ["right number"]);
left.join(right, null, "cross").ToCsv();
Returns:
left number;left text;right number
1;one;1
1;one;2
1;one;3
2;two;1
2;two;2
2;two;3