Subscribe to Vincent Granville's Weekly Digest:

How to data merging with not matching condtion.


I have two data sets data_A and data_B. data_A consists of costumer demographic information and data_B have costumer transactional information. costumer_ID is the common variable in both data set. now my question is how can i do data merging where costumer_ID not match in both data sets.

Please anyone help....

Views: 129

Comment

You need to be a member of AnalyticBridge to add comments!

Join AnalyticBridge

Comment by Tom Wolfer on October 11, 2010 at 11:54am
Vincent. I am not sure, but, I think that this would only work if customer id in both tables matched. And, I believe that Prashant mentioned that this was not the case.
Comment by Vincent Amoruccio on October 7, 2010 at 7:01pm
I agree with the gentleman who said to use SQL. Since this seems to be a 1:M (one to many) join you could also do a LEFT or RIGHT join.

For example:

proc sql;
create table combine as
select distinct *
from data_B as b
left join
data_A as a
on b.costumer_ID = a.costumer_ID;
quit;

You could also achieve similiar results through a data step, depending on the order for which you put the data in the step.

Best,

Vince
Comment by Michael Bain on September 22, 2010 at 5:00pm
IF this in a Ansi compliant SQL database with your data you can do a UNION to find the nonmatched "customer_IDs" (or a FULL OUTER JOIN in Oracle will also work).

If you want to insert into new table and don't need to understand from which table the original data was from ==> it's probably easiest to do two simple insert statements with LEFT Joins on unmatched data as follows.

I suggest you ask your DBA to generate queries. Here's an example.


INSERT INTO Table3 (unmatched_costumer_ID)
SELECT A.costumer_ID
FROM
data_A AS A
LEFT JOIN data_B AS B
ON A.costumer_ID = B.costumer_ID
WHERE B.costumer_ID IS NULL;

INSERT INTO Table3 (unmatched_costumer_ID)
SELECT A.costumer_ID
FROM
data_B AS A
LEFT JOIN data_A AS B
ON A.costumer_ID = B.costumer_ID
WHERE B.costumer_ID IS NULL;
Comment by Tom Wolfer on September 21, 2010 at 5:26pm
Yes, that will work too.
Comment by Prashant on September 21, 2010 at 9:13am
Tom,

Thanks for your input,

Actually i want to create new table which represent both tables(data_A and data_B) information where data_A.costumer_ID~=data_B.costumer_ID in single step.
Comment by Tom Wolfer on September 21, 2010 at 8:17am
Here is one approach that you can take. It can either be done in steps, manually, or on the fly virtually with code to produce only the query result:

STEP 1: Create a new, 1 character boolean variable, 'Customer_Match' in the Customer Transaction table

STEP 2: Write a little code so that, if the Customer ID in the Customer Transaction table matches a Customer ID in the Customer Demographics table, the Customer_Match field is populated with a value of 'Y'.

STEP 3: Now, do a query on the Customer Transaction table to select all transaction records for which the Customer_Match value is NOT equal to 'Y': this will select only transaction records for which there is no Customer ID match in the Customer Demographics table.

Is this what you want?

Follow us

© 2013   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC

Badges  |  Report an Issue  |  Terms of Service