ETL Developer Resume
TECHNICAL SKILLS
- PL/SQL, Apache Hive
- Python
- Linux shell script
- Oracle 11g, DataGuard, Streams, ETL, RMAN, SQL Server 2008
PYTHON EXPERIENCE
- For my degree's final project I chose to work with Python and Hadoop technologies. I created a platform that collects data from job-posting websites, analyzes the relationships between keywords, and displays job market trends, similar to itjobwatch.co.uk. It uses the beautifulsoup, peewee, pandas, and NLTK modules, among others. I developed web crawler agents in Python on Linux machines that collect data regularly. I developed a Python process that tokenizes the words of each job description. Using pandas, I analyzed the datasets to produce aggregated data that reflects trends. I used Hadoop Hive to run queries over the raw data.
- For a research project I had to detect the language of 4 billion websites using the NLTK module. To finish sooner, I cleaned up the list of URLs and discarded uninteresting websites using a personally modified version of the EasyList regular expression list. I created a Hadoop cluster to run the cleaning, downloading, text processing, and language detection steps using Hadoop Streaming.
- For a music industry research project I created an algorithm to match music artists across several datasets: MusicBrainz, iTunes, Facebook, torrents, Gracenote, radio stations... I also developed several scripts to download information from APIs: iTunes, Facebook, and Gracenote.
- Processed network graphs using NetworkX: classification, clustering, plotting networks to an RDBMS, and searching for outliers.
- Data analysis using Pandas.
- Connected to RDBMS using the cx_Oracle and peewee modules.
- I use Python in my day-to-day work, especially to automate jobs where possible. For instance, at my current job I developed a Python program that easily loads a data file into an Oracle table. I use it for all my one-time database loads.
- Off topic: I use Python for tasks at home. For instance, I created a Python program that normalizes MP3 tags and downloads the album cover from Google Images. The album cover is selected from the most frequently repeated similar image in the results, using the OpenCV module.
- Regular attendee of Boston Python Group meetups, the largest Python user group worldwide.
PROFESSIONAL EXPERIENCE
ETL Developer
Confidential
Responsibilities:
- Consolidating all bank information in several data warehouses, including debit and credit card operations, loans, mortgages, accounts, customers, contracts, checks, deposits, money orders, transfers, ATMs,...
- Developing ETL processes in PL/SQL and Linux scripting to feed several data warehouses for a Money Laundering Analytics project.
- Developing Python scripts to deal with quick one-time problems with data files.
- Manipulating data using Python or PL/SQL to increase data quality.
- Developing hot fixes for a legacy Java ETL application.
- Business analytics tasks over the bank's data; optimization of SQL queries.
- Data analysis of Bank information for data quality and fixing data issues.
Data Architect / Developer
Confidential
Responsibilities:
- Harvesting users' music likes.
- Creation of several Hadoop clusters with Hive installed.
- Using data from Facebook, iTunes, Torrents, MusicBrainz, Gracenote and Nielsen stored in a Hadoop cluster.
- Data migration to Oracle using PL/SQL, Python, and bash scripting.
- Creation of a Python algorithm for artist and song matching across several databases using machine learning techniques. The algorithm learns new matching patterns between music artists and applies them to future artist matching.
- Supporting data scientists with Python numeric and scientific libraries.
- Working with business teams and client architects to define data requirements and SLAs.
- Researching the languages of websites visited by European citizens.
- Using Python over Hadoop to detect the language of about 4 billion websites. Modifying the well-known 'EasyList' regular expression list to improve the process by removing unwanted websites.
- Creating graph databases from structured databases that contain users' Internet behavior.
Additional tasks:
- Oracle 11g DBA
- Define and apply data architecture policy across the institute’s databases.
- Give technical support on big data and databases for projects.
- Red Hat Sysadmin.
Oracle DBA / Developer
Confidential
Responsibilities:
- Migrating a dBase application to an Oracle and C++ application.
- DB Development: Oracle PL/SQL, C++
- DBA: Oracle 11g, RAC 11g, Data Guard, Streams, RMAN.
- Data migration to Oracle using PL/SQL, xBase language and ETL tools.
- Data Modeling
- Analysis and modeling of Business Process: BPMN 2.0.
- Project Management for the company's Information Security Management System (UNE-ISO/IEC 27001).
- Lead system administration: Red Hat, Oracle EL and Windows Server.
- Elaborating the company’s Annual Report.
Intern systems administrator
Confidential
Responsibilities:
- Multi-parallel processing research.
- Providing MySQL database solutions to researchers for their computationally heavy processes.
- Implementing the department's public and intranet websites using Drupal and Joomla.
- LAMP, Drupal and Joomla administrator.
- Network administrator.
- Java, JSP & JSTL development.
Co-Founder
Confidential
Responsibilities:
- Tenerife Lan Party 2006, 2007 and 2008 (tlp-tenerife.com)
- Network Engineer and Project Management.
- Designing the event network for around 1,300 computers and devices.
- Writing large projects for the Public Administration and private enterprises.
- Seeking and negotiating sponsorships with top-tier international IT brands.
- Designing the marketing and publicizing the events on the web and on billboards.