Talend ETL Developer, Data Engineer, Big Data Consultant Resume
Santa Clara, California
PROFESSIONAL SUMMARY:
- Expertise in the design, development, and implementation of enterprise data warehouse solutions using DigitalRoute MediationZone and the Talend Big Data Integration suite, version 6.2
- Experience in integrating Talend Open Studio with Hadoop, Hive, Spark and MySQL.
- Designed and developed slowly changing dimension (SCD) mappings to load data into dimension and fact tables
- Strong experience leveraging Apache Sqoop and Spark to migrate data from RDBMS systems into Hadoop cluster environments (HBase and Hive)
- Strong experience implementing star and snowflake schemas using dimensional data modeling
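As an illustration, a Type 2 slowly changing dimension load of the kind described above can be sketched in plain Python (the customer table and its fields are hypothetical):

```python
from datetime import date

# Hypothetical existing dimension rows: natural key -> current record
dim_customer = {
    101: {"name": "Acme", "city": "Santa Clara", "valid_from": date(2015, 1, 1),
          "valid_to": None, "is_current": True},
}
history = []  # expired row versions

def scd2_upsert(natural_key, attrs, load_date):
    """Type 2 SCD: expire the current row on change, insert a new version."""
    current = dim_customer.get(natural_key)
    if current and all(current[k] == v for k, v in attrs.items()):
        return  # no attribute changed; nothing to do
    if current:
        # Close out the old version instead of overwriting it
        history.append(dict(current, valid_to=load_date, is_current=False))
    dim_customer[natural_key] = dict(attrs, valid_from=load_date,
                                     valid_to=None, is_current=True)

# Customer 101 moves to San Jose: old row is expired, new row becomes current
scd2_upsert(101, {"name": "Acme", "city": "San Jose"}, date(2016, 6, 1))
```

In a Talend job the same effect is typically achieved with SCD components; the sketch only shows the expire-and-insert logic itself.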
TECHNICAL SKILLS/TOOLS:
ETL Technologies: Talend Open Studio 6.2 Big Data Edition, DigitalRoute billing mediation
Hadoop Technologies: Cloudera CDH 5.0, Spark 1.6 - 2.0, Sqoop, Pig
NoSQL: MongoDB, HBase, Cassandra
Relational Databases: MySQL, MS SQL Server, Oracle
Modeling Methodology: NoSQL, OLAP
Programming Languages: Core Java, J2EE
Scripting Languages: Python, Shell Script, HTML, JavaScript
Source Code Control: GitHub
WORK EXPERIENCE:
Confidential, Santa Clara, California
Talend ETL Developer, Data Engineer, Big Data Consultant
Technologies used: Talend Open Studio 6.2 Big Data Edition, Spark SQL, MySQL, Hive, PySpark, Beautiful Soup, Pandas, Hadoop.
Responsibilities:
- Designed and developed a data pipeline to collect data from multiple sources and ingest it into a Hadoop/Hive data lake using Talend Big Data and Spark.
- Scraped toxic and hazardous substances data from the web and loaded it into MySQL using Beautiful Soup and Python; integrated the new pipeline with the existing ETL framework.
- Implemented a data lake for user behavior, engagement, retention, and sales analytics.
- Created ETL pipelines to process data from Segment, MongoDB, and multiple MySQL shards.
- Implemented pipelines using PySpark as well as Talend Spark components.
Confidential, San Jose, California
Talend ETL developer
Technologies: Talend ETL OpenStudio, Tableau BI tool
Responsibilities:
- Designed, developed, and deployed a convergent mediation platform for data collection and billing using Talend ETL.
- Collaborated with product owners to improve process efficiency and effectiveness.
- Teamed with IT/Engineering to maintain and enhance the network data harness system, delivering business insights.
Confidential, Palo Alto, California
Software Engineer
Responsibilities:
- Developed an automated tool to collect data and test REST API web servers using Python.
- Wrote Python scripts and ad-hoc queries to generate weekly statistical reports.
- Installed and configured a 5-node Hadoop cluster using Cloudera Manager.
- Loaded 10 GB of Foursquare data into Hive.
- Wrote Hive queries to analyze the Foursquare data:
- Presented total check-ins broken down by day of week and gender.
- Found the top 25 check-in locations in the Confidential.
- Found the top 25 locations that attract women/men.
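The Hive analyses above reduce to simple GROUP BY aggregations; a self-contained stand-in using Python's built-in sqlite3 (the check-in schema here is hypothetical) shows the shape of the queries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE checkins
                (user_id TEXT, gender TEXT, day_of_week TEXT, venue TEXT)""")
conn.executemany(
    "INSERT INTO checkins VALUES (?, ?, ?, ?)",
    [("u1", "F", "Mon", "Cafe A"), ("u2", "M", "Mon", "Cafe A"),
     ("u3", "F", "Tue", "Bar B"), ("u4", "F", "Mon", "Cafe A")],
)

# Check-ins broken down by day of week and gender (as in the Hive version)
by_day_gender = conn.execute(
    """SELECT day_of_week, gender, COUNT(*) AS n
       FROM checkins
       GROUP BY day_of_week, gender
       ORDER BY n DESC"""
).fetchall()

# Top check-in locations (LIMIT 25 in the real query)
top_venues = conn.execute(
    """SELECT venue, COUNT(*) AS n
       FROM checkins
       GROUP BY venue
       ORDER BY n DESC
       LIMIT 25"""
).fetchall()
```

The same SQL shape runs essentially unchanged as HiveQL over the full Foursquare table.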