Data Warehouse Developer Resume
Chicago, IL
SUMMARY
- 7+ years of experience in the IT industry, including recent experience with the Hadoop Big Data ecosystem (Hive, Pig, Spark Streaming) and data warehousing tools.
- Experience in the analysis, design, implementation, and support of application software based on client requirements, including n-tier architecture solutions with distributed internet/intranet components.
- 4.5+ years of hands-on experience with Apache Hadoop ecosystem components such as MapReduce, Sqoop, Flume, Spark, Pig, Hive, HBase, Oozie, Kafka, and ZooKeeper.
- Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and MapReduce.
- Experience in analyzing and cleansing raw data using HiveQL and Pig Latin.
- Knowledge of workflow/sub-workflow scheduling and monitoring tools such as Oozie.
- Experience with different Cloudera Hadoop distributions (CDH4, CDH 5.3).
- Experience optimizing MapReduce jobs with combiners and partitioners to deliver efficient results (a brief sketch follows this summary).
- Hands-on experience with Spark SQL for complex data transformations in Scala.
- Experienced in using Kafka as a distributed publish-subscribe messaging system.
- Experience importing and exporting data between HDFS/Hive/HBase and relational database systems using Sqoop.
- Experience in data transformations across different file formats using MapReduce, Hive, Pig scripts, and Spark.
- Expertise in analyzing data using Hive and writing custom UDFs in Java to extend Hive and Pig core functionality.
- Hands-on experience with BI tools such as Splunk/Hunk and Tableau.
- Used JIRA for tracking project modules.
- Experience with Hadoop security requirements and integration with Kerberos authentication and authorization infrastructure.
- Solid understanding of HDFS design, daemons, and high availability (HA).
- Experience with scripting languages such as Linux/Unix shell and Python.
- Experience in understanding and managing Hadoop log files.
- Experience in managing Hadoop infrastructure with Cloudera Manager.
- Experienced with data warehousing and ETL processes.
- Expert knowledge of data warehousing concepts, with hands-on experience developing ETL applications in dimensional data mart/data warehouse environments.
- Involved in creating MVC architectures in Java with the Struts framework, including validation files.
- Participated in reviews of test plans, test cases, and test scripts prepared by the system integration testing team.
- Monitored performance and identified performance bottlenecks in ETL code.
- Strong experience in client interaction and in understanding business applications, data flows, and data relationships.
- Self-starter with a flair for adapting to new software applications and products, excellent communication skills, and a good understanding of business workflows.
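As an illustration of the MapReduce combiner/partitioner optimization noted above, a minimal word-count sketch (all class names and paths are hypothetical, not taken from any actual project):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountWithCombiner {

      // Emits (word, 1) for every token in the input line.
      public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context ctx)
            throws IOException, InterruptedException {
          for (String tok : value.toString().split("\\s+")) {
            if (!tok.isEmpty()) {
              word.set(tok);
              ctx.write(word, ONE);
            }
          }
        }
      }

      // Sums counts; doubles as the combiner to pre-aggregate map output locally.
      public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> vals, Context ctx)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : vals) {
            sum += v.get();
          }
          ctx.write(key, new IntWritable(sum));
        }
      }

      // Custom partitioner: routes keys by first character so related words
      // land on the same reducer.
      public static class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
          if (key.getLength() == 0) {
            return 0;
          }
          return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word-count");
        job.setJarByClass(WordCountWithCombiner.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class); // cuts shuffle traffic
        job.setReducerClass(SumReducer.class);
        job.setPartitionerClass(FirstCharPartitioner.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Setting the reducer as the combiner pre-aggregates map output before the shuffle, which is the main win; the partitioner then controls which reducer each key lands on.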
TECHNICAL SKILLS
Scripting Languages: Shell scripting (Unix/Linux), Pig Latin, HiveQL
Big Data Technologies: HDFS, MapReduce, Hive, Hue, Pig, Sqoop, Flume, Spark, ZooKeeper, Oozie, Kafka, Impala, HBase
RDBMS: MySQL, Oracle, Teradata, MS SQL
NoSQL Databases: HBase, MarkLogic, Cassandra
Programming Languages: Python, Scala, SQL, Java
IDEs: NetBeans, Eclipse
Tools: Maven, Tableau
Virtual Machines: VMware, VirtualBox
Hadoop Distributions: CDH3, CDH5
Software Methodologies: SDLC (Waterfall), Agile/Scrum
PROFESSIONAL EXPERIENCE
Hadoop Developer
Confidential - Hoffman Estates, IL
Responsibilities:
- Worked with the source team to understand the formats and delimiters of the data files.
- Responsible for generating actionable insights from complex data to drive real business results for various application teams.
- Developed Spark scripts using the Scala shell as per requirements.
- Developed and implemented API services in Spark using Scala.
- Troubleshot and resolved data quality issues to maintain a high level of accuracy in reported data.
- Implemented POCs for migrating to Spark Streaming to process live data.
- Ingested data from RDBMSs, performed data transformations, and exported the transformed data to Cassandra per business requirements.
- Rewrote existing MapReduce jobs to use new features and improvements for faster results.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Performance-tuned slow-running, resource-intensive jobs.
- Worked with data serialization formats (Avro, Parquet, JSON, CSV) for converting complex objects into serialized byte sequences.
- Used Apache Spark's in-memory processing for ETL transformations.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Developed Flume configurations to extract log data from different sources and load it into Hive tables in different file formats (JSON, XML, Parquet) using the appropriate SerDes.
- Set up Oozie workflow/sub-workflow jobs for Hive, Sqoop, and HDFS actions.
- Consumed data from a Kafka cluster into Hadoop.
- Imported real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports (see the sketch after this section).
- Gathered product business and functional requirements, updated user comments in JIRA, and maintained documentation in Confluence.
- Worked with the project manager on a daily basis on project scheduling and administrative tasks.
- Maintained accurate roadmaps for projects and products.
- Monitored sprints and burndown charts and completed monthly reports.
Environment: Hive, SQL, Pig, Flume, Kafka, Sqoop, Scala, Java, Shell/Unix scripting, Spark, Teradata, Oracle, Oozie, Cassandra
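A minimal sketch of the Kafka-to-Hadoop ingestion described above, assuming Spark Streaming's Kafka 0.10 direct-stream API (the broker address, topic, group id, and HDFS path are hypothetical placeholders):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaToHdfs {
      public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaToHdfs");
        // 30-second micro-batches
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "hadoop-ingest");
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("events"), kafkaParams));

        // Land each non-empty micro-batch on HDFS for downstream Hive/Oozie processing.
        stream.map(ConsumerRecord::value).foreachRDD((rdd, time) -> {
          if (!rdd.isEmpty()) {
            rdd.saveAsTextFile("hdfs:///data/raw/events/" + time.milliseconds());
          }
        });

        jssc.start();
        jssc.awaitTermination();
      }
    }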
Hadoop Developer
Confidential - Columbus, OH
Responsibilities:
- Worked on Spark SQL, reading and writing data as JSON, text, and Parquet files and as SchemaRDDs.
- Identified data sources and created appropriate data ingestion procedures.
- Transformed data using Spark, Hive, and Pig so the BI team could perform visual analytics per client requirements.
- Populated big data customer marketing data structures.
- Performed complex joins on Hive tables with various optimization techniques.
- Implemented lateral views in conjunction with UDFs in Hive per client requirements.
- Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency.
- Performed ETL procedures on data in HDFS.
- Connected Hive and Impala to the Tableau reporting tool and generated graphical reports.
- Worked extensively with Hive DDL and Hive Query Language (HQL).
- Developed Pig Latin for business transformations; responsible for writing Pig scripts and Hive queries for data processing.
- Extended Hive and Pig core functionality by writing custom UDFs in Java (see the sketch after this section).
- Developed HBase tables to store variable-format input data coming from different portfolios.
- Loaded high volumes of row and column data into HBase.
- Created and maintained technical documentation for launching the Hadoop cluster and for executing Hive queries and Pig scripts.
- Moved transaction files generated from various sources into HDFS via Flume for further processing.
- Developed Sqoop jobs to extract data from Teradata/Oracle into HDFS.
- Developed Impala scripts for extraction, transformation, and loading of data into the data warehouse.
- Worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop.
- Automated jobs that pull data from an FTP server and load it into Hive tables using Oozie workflows.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
Environment: Hive, SQL, Pig, Flume, Kafka, MapReduce, Sqoop, Python, Tableau, Java, Shell/Unix scripting, Spark, Teradata, Oracle, Oozie, Cassandra
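A minimal sketch of a custom Hive UDF of the kind described above (the class name and cleansing rule are hypothetical). After packaging into a jar, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION, and could then be used in queries, including alongside lateral views:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hive UDF that reduces a free-form postal code field to its 5-digit ZIP,
    // returning NULL when fewer than five digits are present.
    public final class CleanZip extends UDF {
      public Text evaluate(Text input) {
        if (input == null) {
          return null;
        }
        String digits = input.toString().replaceAll("[^0-9]", "");
        return digits.length() >= 5 ? new Text(digits.substring(0, 5)) : null;
      }
    }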
Data Warehouse Developer
Confidential - Columbus, OH
Responsibilities:
- Designed jobs involving various cross-reference lookups and joins, with shared containers reusable across multiple jobs.
- Created job-level sequencers to include multiple jobs, and a layer-level sequence that includes all job-level sequences.
- Loaded data into dimension tables.
- Extensively used DataStage Director to validate, run, schedule, and monitor jobs, and followed the job log carefully to debug them.
- Monitored performance statistics and fine-tuned jobs to improve processing time.
- Developed UNIX scripts to call DataStage jobs.
- Responsible for integrating DataStage jobs with the ETL control framework and Tivoli scheduling.
- Fine-tuned, troubleshot, bug-fixed, analyzed defects in, and enhanced DataStage jobs for multiple admin systems.
- Involved in designing data marts and dimension and fact tables.
Environment: DataStage 7.5, Teradata, Mainframes, SQL, ETL
Jr. Java Developer
Confidential - Kalamazoo, MI
Responsibilities:
- The application was developed in J2EE using an MVC-based architecture.
- Implemented the MVC design using the Struts 1.3 framework, JSP custom tag libraries, and various in-house custom tag libraries for the presentation layer.
- Created Tiles definitions, struts-config files, validation files, and resource bundles for all modules using the Struts framework.
- Wrote prepared statements and called MySQL stored procedures using callable statements (see the sketch after this section).
- Executed SQL queries to perform CRUD operations on customer records.
- Used Apache Tomcat as the application server for deployment.
- Used Web services for transmission of large blocks of XML data over HTTP.
Environment: Java, JSP, MySQL, Struts, Tomcat, HTML, XML, Eclipse, CSS
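A minimal sketch of the JDBC prepared-statement and callable-statement usage described above (the table, column, and stored-procedure names are hypothetical):

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Types;

    public class CustomerDao {
      private final Connection conn;

      public CustomerDao(Connection conn) {
        this.conn = conn;
      }

      // Parameterized UPDATE via PreparedStatement; placeholders avoid SQL injection.
      public int updateEmail(long customerId, String email) throws SQLException {
        String sql = "UPDATE customers SET email = ? WHERE id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
          ps.setString(1, email);
          ps.setLong(2, customerId);
          return ps.executeUpdate();
        }
      }

      // Stored-procedure call with an OUT parameter via CallableStatement.
      public int countOrders(long customerId) throws SQLException {
        try (CallableStatement cs = conn.prepareCall("{call count_orders(?, ?)}")) {
          cs.setLong(1, customerId);
          cs.registerOutParameter(2, Types.INTEGER);
          cs.execute();
          return cs.getInt(2);
        }
      }
    }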
Support Engineer
Confidential - Chicago, IL
Responsibilities:
- Developed and executed shell scripts to automate jobs.
- Wrote complex Hive queries and UDFs.
- Read multiple data formats on HDFS in Hive using SerDes.
- Wrote Pig scripts to run ETL jobs on the data in HDFS.
- Used Hive to analyze the data and identify correlations.
- Wrote numerous Pig UDFs to process complex data (see the sketch after this section).
- Worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop.
Environment: Hive, SQL, Pig, Flume, MySQL
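A minimal sketch of a Pig UDF like those described above (the class name and normalization rule are hypothetical); it would be registered in a Pig script with REGISTER and bound with DEFINE before use:

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Pig EvalFunc that trims and lower-cases a chararray field,
    // returning null for empty or missing input.
    public class NormalizeText extends EvalFunc<String> {
      @Override
      public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
          return null;
        }
        return input.get(0).toString().trim().toLowerCase();
      }
    }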