Wednesday, November 19, 2014

Address Database & Data Management

We have a dataset of 500'000 records from different sources. These records have to be normalized and assigned to categories according to NOGA 2008 (General Classification of Economic Activities). For more information about NOGA, see this link: [url removed, login to view]

The goal is to store all raw records in a database and assign each one to at least one NOGA category. The result must make it possible to find all records that belong to a category by searching for that category. Each record MUST match a category and must be assigned accordingly. The raw records are stored in a CSV file (see attachment). The attributes are Name\tZip\tLocality\tCategories. The data file is anonymized. Some lines are empty; these also have to be categorized. If more than one category matches, the categories have to be separated by a pipe "|".
This project is one work package out of many and can be considered an entry point for more work.
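
To make the storage and search requirement more concrete, here is a minimal sketch in Python using SQLite. Everything beyond the four attributes and the pipe separator is an assumption made for illustration: the table names `records` and `record_categories`, the `UNASSIGNED` placeholder used for lines without a category, the absence of a header row, and the sample labels `CAT_A` and `CAT_B` (these are not real NOGA codes). The actual mapping of raw categories to NOGA 2008 is not shown.

```python
import csv
import io
import sqlite3

# Hypothetical placeholder for lines that carry no category yet.
UNASSIGNED = "UNASSIGNED"


def load_records(conn, tsv_file):
    """Load tab-separated lines and link each record to its categories."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS records (
            id INTEGER PRIMARY KEY,
            name TEXT, zip TEXT, locality TEXT
        );
        CREATE TABLE IF NOT EXISTS record_categories (
            record_id INTEGER REFERENCES records(id),
            category TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS idx_category
            ON record_categories(category);
    """)
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Pad short or empty lines so every line still yields a record.
        name, zip_code, locality, categories = (row + [""] * 4)[:4]
        cur = conn.execute(
            "INSERT INTO records (name, zip, locality) VALUES (?, ?, ?)",
            (name, zip_code, locality),
        )
        # Multiple categories are separated by a pipe "|".
        labels = [c.strip() for c in categories.split("|") if c.strip()]
        for label in labels or [UNASSIGNED]:
            conn.execute(
                "INSERT INTO record_categories (record_id, category) "
                "VALUES (?, ?)",
                (cur.lastrowid, label),
            )
    conn.commit()


def find_by_category(conn, category):
    """Return (name, zip, locality) for all records in a category."""
    rows = conn.execute(
        "SELECT r.name, r.zip, r.locality FROM records r "
        "JOIN record_categories rc ON rc.record_id = r.id "
        "WHERE rc.category = ? ORDER BY r.id",
        (category,),
    )
    return rows.fetchall()


# Made-up sample data; the second line is intentionally empty.
SAMPLE = (
    "Example Bakery\t8000\tZurich\tCAT_A|CAT_B\n"
    "\n"
    "Example Garage\t3000\tBern\tCAT_B\n"
)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_records(conn, io.StringIO(SAMPLE))
    print(find_by_category(conn, "CAT_B"))
```

Keeping the categories in a separate link table, rather than as one pipe-separated string per record, is a design choice of this sketch: it lets a category search use an index instead of scanning every record. The pipe-separated form can still be produced on export by joining the labels of each record.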
