Big Data Engineer Resume
Phoenix, AZ
SUMMARY:
- Overall 8+ years of professional IT experience, including over 4 years of Big Data experience in data ingestion, storage, querying, processing and analysis.
- Excellent understanding and hands-on experience with HDFS, Hive and Hadoop ecosystem tools including Pig and Sqoop.
- Used Hive for data analysis, Sqoop for data migration, Flume for data ingestion, Oozie for scheduling and ZooKeeper for coordinating cluster resources.
- Experience in developing Pig scripts for data processing on HDFS.
- Experience in writing HiveQL queries to store processed data into Hive tables for analysis (a brief sketch follows this summary).
- Experience in optimizing Hive queries by tuning Hive configuration settings.
- Experience in scheduling jobs using Oozie workflows.
- Experience in deploying and managing the Hadoop cluster using Cloudera Manager.
- Involved in design and development of various web and enterprise applications using technologies such as XML and Amazon Web Services.
- Good experience with databases such as SQL Server and MySQL, including database design and creation of tables, views, stored procedures, functions, triggers and indexes.
- Excellent interpersonal skills and experience interacting with clients; a strong team player with good problem-solving skills.
- Experience in using different file formats: Avro, Parquet, RCFile, JSON and SequenceFile.
- Hands-on experience in shell scripting and Python.
- Created HBase tables to store various data formats as input coming from different sources.
- Strong experience in all the phases of SDLC including requirements gathering, analysis, design, implementation, deployment and support.
- Ability to blend technical expertise with strong conceptual, business and analytical skills to provide quality solutions.
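The following is a minimal, illustrative shell/HiveQL sketch of the kind of Hive table creation, file-format usage and per-session tuning described in this summary; the database, table and column names and the specific settings are hypothetical placeholders, not values from an actual engagement.

  #!/bin/bash
  # Illustrative sketch only: create a Parquet-backed, partitioned Hive table,
  # enable dynamic partitioning for the load, and run a summary query.
  hive -e "
    CREATE TABLE IF NOT EXISTS analytics.page_views (
      user_id STRING,
      url     STRING,
      view_ts TIMESTAMP
    )
    PARTITIONED BY (view_date STRING)
    STORED AS PARQUET;

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    INSERT OVERWRITE TABLE analytics.page_views PARTITION (view_date)
    SELECT user_id, url, view_ts, to_date(view_ts)
    FROM staging.raw_page_views;

    SELECT view_date, COUNT(DISTINCT user_id) AS daily_users
    FROM analytics.page_views
    GROUP BY view_date;
  "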
TECHNICAL SKILLS:
Big Data/Hadoop Ecosystem: HDFS, MapReduce, Crunch, Hive, Pig, HBase, Sqoop, Flume, Oozie and Avro
Programming Languages: C, Scala, SQL, PL/SQL, Linux shell scripts.
NoSQL Databases: HBase
Databases: SQL Server 2008, MySQL.
Tools Used: Eclipse, IntelliJ, Git, PuTTY, WinSCP, Cygwin
Operating Systems: Ubuntu (Linux), Windows 95/98/2000/XP, Mac OS.
ETL Tools: SSIS
Testing: Hadoop Testing, Hive Testing, Quality Center (QC)
Monitoring and Reporting tools: Tableau, Custom Shell scripts.
PROFESSIONAL EXPERIENCE:
Confidential, Phoenix, AZ
Big Data Engineer
Responsibilities:
- Involved in creating business case and functional requirement documents for the Hadoop ecosystem.
- Involved in installing Hadoop ecosystem components, i.e. CDH4, Apache Hadoop 2.0, Python 2.7.5, etc.
- Created Cognos dashboards on top of HDFS for VIP customer lifestyle analysis
- Managed and reviewed Hadoop log files.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Installed and configured Pig and wrote Pig Latin scripts.
- Wrote MapReduce jobs using Pig Latin.
- Imported data from Teradata into HDFS using Sqoop on a regular basis (a sample invocation is sketched after this list).
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet business requirements.
- Created Hive tables and worked on them using HiveQL.
- Used Remedy for bug tracking, issue tracking and project management.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
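Below is a minimal, illustrative sketch of the kind of scheduled Sqoop import referred to above; the connection string, credentials, table and target directory are hypothetical placeholders, and a Teradata import like this also assumes an appropriate JDBC connector is installed on the cluster.

  #!/bin/bash
  # Illustrative sketch only: pull one Teradata table into a dated HDFS directory.
  sqoop import \
    --connect jdbc:teradata://td-host/DATABASE=sales \
    --username etl_user \
    --password-file hdfs:///user/etl_user/.td_password \
    --table DAILY_TXN \
    --target-dir /data/raw/daily_txn/$(date +%Y%m%d) \
    --num-mappers 4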
Environment: Hadoop, MapReduce, HDFS, Sqoop, Flume, Oozie, HBase, Apache Spark, WinSCP, UNIX shell scripting, Hive, Pig, Cloudera.
Confidential, Austin, TX
Big Data Developer
Responsibilities:
- Processed data into HDFS by developing solutions and analyzed the data using MapReduce programs, producing summary results from Hadoop for downstream systems.
- Developed MapReduce jobs using the MapReduce Java API and HiveQL.
- Wrote MapReduce programs and implemented different design patterns.
- Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
- Applied Hive queries to perform data analysis on HBase using the HBase storage handler to meet business requirements.
- Developed UDF, UDAF and UDTF functions and used them in Hive queries.
- Wrote Hive queries to aggregate data to be pushed to Cassandra tables.
- Developed scripts and batch jobs to schedule a bundle (a group of coordinators) consisting of various Hadoop programs using Oozie (see the Oozie CLI sketch after this list).
- Implemented dynamic partitioning, bucketing, sequence files, multi-insert queries and compression techniques.
- Used the Avro data serialization system to handle Avro data files in MapReduce programs.
- Implemented optimized MapReduce joins to gather data from different data sources.
- Optimized Hive queries and joins to handle different data sets.
- Configured Oozie schedulers to run different Hadoop actions on a timely basis.
- Involved in ETL, data integration and migration by writing Pig scripts.
- Developed shell, Perl and Python scripts to automate and provide control flow for Pig scripts.
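A minimal, illustrative sketch of driving this kind of scheduling through the Oozie command-line interface; the Oozie URL, paths and job properties are hypothetical placeholders, and the coordinator/bundle definitions themselves would live as XML files on HDFS.

  #!/bin/bash
  # Illustrative sketch only: submit an Oozie coordinator (or bundle) that
  # schedules a recurring Hadoop/Hive workflow, then check on its status.
  export OOZIE_URL=http://oozie-host:11000/oozie

  # job.properties points at coordinator.xml (or bundle.xml) stored in HDFS.
  oozie job -config /opt/jobs/daily_agg/job.properties -run

  # List running coordinators and inspect a specific one (substitute the real ID).
  oozie jobs -jobtype coordinator -filter status=RUNNING
  oozie job -info <coordinator-job-id>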
Environment: HDFS, Hive, SQL Server 2008, SQL, PL/SQL, Oozie, MySQL, UNIX shell scripting.
Confidential, Rancho Cordova, CA
Data Engineer
Responsibilities:
- Worked extensively on data warehousing; used SSIS, an ETL tool for SQL Server, to design packages that move data from source to target databases using stages.
- Obtained a detailed understanding of data sources, flat files and complex data schemas.
- Designed parallel jobs using various stages such as Aggregator, Join, Transformer, Sort, Merge, Filter, Lookup, Sequence, ODBC and Hash File.
- Worked extensively on slowly changing dimensions using the CDC stage (a simplified SQL sketch follows this list).
- Broadly involved in data extraction, transformation and loading (ETL) from source to target systems using SSIS.
- Generated surrogate IDs for the dimensions in the fact table for indexed, faster data access.
- To reduce response time, aggregated, converted and cleansed large chunks of data during the transformation process.
- Involved in creating technical documentation for source-to-target mapping procedures to facilitate better understanding of the process and to incorporate changes as and when necessary.
- Successfully integrated data across multiple, high-volume data sources and target applications.
- Automated ETL processes using the SSIS Job Sequencer and Transform functions.
- Extensively used the SSIS Director for job scheduling and emailed production support for troubleshooting based on log files.
- Optimized job performance by applying performance tuning methods.
- Used Autosys for scheduling the jobs.
- Strictly followed change control methodologies while deploying code from QA to production.
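A minimal, illustrative SQL sketch (run here via sqlcmd from a shell wrapper) of the surrogate-key and Type 2 slowly changing dimension handling described above; the server, database, table and column names are hypothetical placeholders, and in practice this logic was implemented inside ETL packages rather than hand-written SQL.

  #!/bin/sh
  # Illustrative sketch only: expire changed dimension rows and insert new
  # versions; DimCustomer.CustomerKey is assumed to be an IDENTITY surrogate key.
  sqlcmd -S dw-sql01 -d SalesDW -b -Q "
  SET XACT_ABORT ON;
  BEGIN TRAN;

  -- Close out current rows whose tracked attributes changed in the staging feed.
  UPDATE d
  SET    d.IsCurrent = 0, d.EffectiveEnd = GETDATE()
  FROM   dbo.DimCustomer d
  JOIN   stg.Customer s ON s.CustomerID = d.CustomerID
  WHERE  d.IsCurrent = 1
    AND  (d.City <> s.City OR d.Segment <> s.Segment);

  -- Insert new versions for new and changed customers.
  INSERT INTO dbo.DimCustomer (CustomerID, City, Segment, IsCurrent, EffectiveStart)
  SELECT s.CustomerID, s.City, s.Segment, 1, GETDATE()
  FROM   stg.Customer s
  LEFT JOIN dbo.DimCustomer d
         ON d.CustomerID = s.CustomerID AND d.IsCurrent = 1
  WHERE  d.CustomerID IS NULL;

  COMMIT;
  "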
Environment: SQL Server 2008, SQL, PL/SQL, Autosys 4.5, Visio, UNIX shell scripting, SSIS.
Confidential
Sr. System Analyst
Responsibilities:
- Involved in various phases of Software Development Life Cycle (SDLC) of the application like Requirement gathering, Design, Analysis and Code development.
- Prepared use cases, sequence diagrams, class diagrams and deployment diagrams based on UML to enforce the Rational Unified Process, using Rational Rose.
- Developed and implemented the MVC architectural pattern using the Struts framework, including JSP, Servlets, EJB, Form Bean and Action classes.
- Wrote JUnit test cases for unit testing.
- Used Rational ClearCase for version control.
- Built WAR/EAR files using Ant scripts and deployed them to WebLogic Application Server.
- Used JavaScript for client-side validation and Struts Validator Framework for form validations.
- Implemented Java/J2EE design patterns such as Business Delegate, Data Transfer Object (DTO) and Data Access Object.
- Developed the web-based presentation layer using JSP, AJAX with YUI components and Servlet technologies, implemented with the Struts framework.
- Designed and developed backend Java components residing on different machines to exchange information and data using JMS.
- Worked with QA team for testing and resolving defects.
- Used Jira for bug tracking and project management.
Environment: J2EE, JSP, JDBC, Spring Core, Struts, Hibernate, Design Patterns, XML, WebLogic, Apache Axis, ClearCase, JUnit, JavaScript, Web Services, SOAP, XSLT, Jira, Oracle, PL/SQL Developer and Windows.