Sr. Data Engineer Resume
Dallas, TX
PROFESSIONAL SUMMARY:
- Hadoop Developer with 8+ years of overall IT experience across a variety of industries, including hands-on experience in Big Data technologies.
- Have 4+ years of comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, Spark, Kafka and HBase).
- Worked on installing, configuring, and administering Hadoop clusters for distributions such as Cloudera.
- Efficient in writing MapReduce programs and using the Apache Hadoop API to analyze structured and unstructured data.
- Expert in working with the Hive data warehouse tool - creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Debugged Pig and Hive scripts and optimized and debugged MapReduce jobs.
- Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
- Hands-on experience in managing and reviewing Hadoop logs.
- Good knowledge about YARN configuration.
- Expertise in writing Hadoop Jobs for analyzing data using Hive QL (Queries), Pig Latin (Data flow language), and custom MapReduce programs in Java.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice versa.
- Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
- Good working knowledge of NoSQL databases such as HBase, MongoDB, and Cassandra.
- Used HBase in conjunction with Pig/Hive as needed for real-time, low-latency queries.
- Knowledge of job workflow scheduling and monitoring tools like Oozie (Hive, Pig) and ZooKeeper (HBase).
- Good working experience with Spark (Spark Streaming, Spark SQL), Scala, and Kafka.
- Worked on reading multiple data formats on HDFS using Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Good experience in creating and designing data ingestion pipelines using technologies such as Apache Storm and Kafka.
- Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Developed various shell scripts and Python scripts to address production issues.
- Designed and developed an automation framework using Python and shell scripting.
- Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Experience in AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
- Configured AWS EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
- Good knowledge of data compression formats such as Snappy and Avro.
- Hands-on experience in developing applications with Java and J2EE (Servlets, JSP, EJB), SOAP, Web Services, JNDI, JMS, JDBC 2.0, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle 10g, and MS SQL Server RDBMS.
- Dealt with huge transaction volumes while interfacing with the front-end application written in Java, JSP, Struts, Hibernate, and SOAP web services on a Tomcat web server.
- Delivered zero-defect code for three large projects involving changes to both the front end (Core Java, presentation services) and the back end (Oracle).
- Experience with all stages of the SDLC and the Agile development model, from requirement gathering through deployment and production support.
- Participated in daily Scrum meetings to discuss development progress and helped make the meetings more productive.
- Experienced in understanding existing systems and providing maintenance and production support for technologies such as Java, J2EE, and various databases (Oracle, SQL Server).
TECHNICAL SKILLS:
Big Data: Cloudera Distribution, HDFS, ZooKeeper, YARN, DataNode, NameNode, ResourceManager, NodeManager, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, MongoDB, Oozie, Kafka, Spark, Storm, Scala, Impala
Operating System: Windows, Linux, Unix.
Languages: Java, J2EE, SQL, Python, Scala
Databases: IBM DB2, Oracle, SQL Server, MySQL, PostgreSQL
Web Technologies: JSP, Servlets, HTML, CSS, JDBC, SOAP, XSLT.
Version Tools: GIT, SVN, CVS
IDE: IBM RAD, Eclipse, IntelliJ
Tools: TOAD, SQL Developer, ANT, Log4J
Web Services: WSDL, SOAP.
ETL: Talend ETL, Talend Studio
Web/App Server: UNIX server, Apache Tomcat
PROFESSIONAL EXPERIENCE:
Confidential, Dallas, TX
Sr. Data Engineer
Responsibilities:
- Handled importing data from various data sources such as Oracle and MySQL using Sqoop, performed transformations using Hive, and loaded the data into HDFS.
- Participated in requirement gathering and documented business requirements by conducting grooming sessions/meetings with various business users.
- Involved in creating Hive tables and wrote multiple Hive queries to load them for analyzing market data coming from distinct sources.
- Created extensive SQL queries for data extraction to test the data against the various databases.
- Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations.
- Applied partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries.
- Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Developed simple to complex MapReduce jobs using Hive and HBase.
- Orchestrated various Sqoop queries, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
- Responsible for handling different data formats like Avro, Parquet and ORC formats.
- Involved in generating analytics data using Map/Reduce programs written in Python.
- Used Kafka to load data into HDFS and to move data into NoSQL databases.
- Involved in creating and designing data ingestion pipelines using technologies such as Apache Kafka.
- Responsible for developing multiple Kafka producers and consumers from scratch per the software requirement specifications (a minimal producer sketch follows this list).
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Implemented Apache Spark data processing project to handle data from RDBMS and streaming sources.
- Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
- Performed analysis on implementing Spark using Scala.
- Designed batch processing jobs using Apache Spark, achieving roughly a tenfold speedup over the equivalent MapReduce jobs.
- Experience in AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
- Configured AWS EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
- Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
- Participated in daily Scrum meetings to discuss development progress and helped make them more productive.
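Illustrative sketch of the Kafka producer work described above, using the standard kafka-clients API. The broker address, topic name, key, and payload are placeholders, not the actual project values.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickstreamProducer {
    public static void main(String[] args) {
        // Broker list, serializers, and topic are illustrative placeholders
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);
        try {
            // Each record carries one click event as a JSON string
            producer.send(new ProducerRecord<>("clickstream", "user-123", "{\"page\":\"/home\"}"));
        } finally {
            producer.close();
        }
    }
}
```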
Environment: Hadoop, Java, AWS, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Python, Spark, Impala, Scala, Kafka, Shell Scripting, Eclipse, Cloudera, MySQL, Talend, Cassandra
Confidential, Fort Worth, Texas
Data Engineer
Responsibilities:
- Involved in ingesting data received from various relational database providers into HDFS for analysis and other Big Data operations.
- Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Developed Spark scripts using Java and Python shell commands as per the requirements.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive (a minimal Spark-on-Hive sketch follows this list).
- Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Experienced in querying HBase using Impala.
- Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
- Extracted data from MongoDB through Sqoop, placed it in HDFS, and processed it.
- Secured the cluster using Kerberos and kept it up and running at all times.
- Implemented optimization and performance testing and tuning of Hive and Pig.
- Experience in migrating HiveQL into Impala to minimize query response time.
- Developed a data pipeline using Kafka to store data into HDFS.
- Worked on reading multiple data formats on HDFS using Scala.
- Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
- Performed analysis on implementing Spark using Scala.
- Implemented sample Spark programs in Python using PySpark.
- Responsible for creating and modifying topics (Kafka queues) as and when required, with varying configurations of replication factors and partitions.
- Wrote shell scripts and Python scripts for job automation.
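A minimal sketch of the Spark-over-Hive pattern referenced above, using the Spark 1.6-era Java API (HiveContext/DataFrame) that matches the tooling listed here. The table name, query, and output path are illustrative assumptions.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class MarketDataAggregation {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("MarketDataAggregation");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        // HiveContext lets Spark SQL read tables registered in the Hive metastore
        HiveContext hiveContext = new HiveContext(jsc.sc());

        // Aggregate an (illustrative) Hive table and write the result back as Parquet
        DataFrame result = hiveContext.sql(
            "SELECT symbol, AVG(price) AS avg_price FROM market_data GROUP BY symbol");
        result.write().mode("overwrite").parquet("/data/output/avg_price");
    }
}
```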
Environment: Cloudera, HDFS, Hive, HQL scripts, MapReduce, Java, HBase, Pig, Sqoop, Kafka, Impala, Shell Scripts, Python Scripts, Spark, Scala, Oozie.
Confidential, Englewood, CO
Hadoop Developer
Responsibilities:
- Developed MapReduce programs to parse raw data and create intermediate data that would be further loaded into Hive partitioned tables (a minimal parser sketch follows this list).
- Involved in creating Hive ORC tables, loading data into them, and writing Hive queries to analyze the data.
- Involved in data ingestion into HDFS using Sqoop for full loads and Flume for incremental loads from a variety of sources such as web servers, RDBMS, and data APIs.
- Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Used different file formats such as text files, Sequence Files, Avro, and Optimized Row Columnar (ORC).
- Ingested real-time and near-real-time (NRT) streaming data into HDFS using Flume.
- Expertise in creating TWS jobs and job streams and automating them per schedule.
- Worked on the GoldenGate replication tool to get data from various data sources into HDFS.
- Worked on HBase to support enterprise production and loaded data into HBase using Sqoop.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Exported data from Avro files and indexed the documents in ORC file format.
- Responsible for creating Technical Specification documents for the generated extracts.
- Involved in performance tuning using partitioning and bucketing of Hive tables.
- Created UDFs to calculate the pending payment for given customer data based on the last day of every month and used them in Hive scripts.
- Involved in writing shell scripts to run jobs in parallel and increase performance.
- Involved in running TWS jobs for processing millions of records using ITG.
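A minimal sketch of the kind of MapReduce parser described above. The pipe-delimited record layout, field index, and class names are assumptions for illustration only; a job like this would be packaged as a jar, run with `hadoop jar`, and scheduled through TWS as noted above.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RawEventParser {

    // Parses pipe-delimited raw records and emits (eventType, 1)
    public static class ParseMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text eventType = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            if (fields.length > 2) {              // skip malformed lines
                eventType.set(fields[2]);
                context.write(eventType, ONE);
            }
        }
    }

    // Sums counts per event type; output feeds the Hive partitioned tables
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "raw-event-parser");
        job.setJarByClass(RawEventParser.class);
        job.setMapperClass(ParseMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```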
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Oozie, Sqoop, HBase, Flume, Linux, Shell scripting, Java, Eclipse, SQL
Confidential, Dallas, TX
Hadoop Developer
Responsibilities:
- Developed Pig scripts, Pig UDFs, Hive scripts, and Hive UDFs to analyze HDFS data (a minimal Hive UDF sketch follows this list).
- Used Sqoop to export data from HDFS to RDBMS.
- Experience with Hadoop ecosystem components: HDFS, MapReduce, Hive, Pig, Sqoop, and HBase.
- Expertise with web-based GUI architecture and development using HTML, CSS, AJAX, jQuery, AngularJS, and JavaScript.
- Developed MapReduce programs for refined queries on big data.
- Involved in loading data from UNIX file system to HDFS.
- Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Extracted the data from Databases into HDFS using Sqoop.
- Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
- Used Pig built-in functions to convert fixed-width files to delimited files.
- Used Hive join queries to join multiple tables of a source system and loaded the results into Elasticsearch tables.
- Automated the workflow using shell scripts.
- Managed and reviewed Hadoop log files; implemented a Lambda architecture (combining batch and real-time views) where the use case required it.
- Involved in analysis, design, testing phases and responsible for documenting technical specifications.
- Very good understanding of partitioning and bucketing concepts, and of Managed and External tables in Hive, to optimize performance.
- Expertise in writing Hadoop Jobs for analyzing data using Hive QL (Queries), Pig Latin (Data flow language), and custom MapReduce programs in Java.
- Experienced in running Hadoop streaming jobs to process terabytes of data in Hive.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
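A minimal sketch of a Hive UDF of the kind mentioned above; the function name and normalization logic are illustrative assumptions.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative Hive UDF: normalizes a code field before joins in HiveQL
public class NormalizeCode extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

In a Hive script such a UDF would be registered with `ADD JAR` and `CREATE TEMPORARY FUNCTION` before being used in queries.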
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Oozie, Sqoop, HBase, Flume, Linux, Shell scripting, Java, Eclipse, SQL
Confidential
Java Developer
Responsibilities:
- Helped develop UML diagrams: use case, activity, sequence, and class diagrams.
- Developed JSPs and Servlets to dynamically generate HTML and display the data to the client side.
- Developed the application on Struts MVC architecture utilizing Action classes, Action Forms, and validations.
- Was responsible for implementing various J2EE design patterns such as Service Locator, Business Delegate, Session Facade, and Factory.
- Involved in the design and decision making for Hibernate O/R mapping.
- Developed Hibernate mapping (.hbm.xml) files for mapping declarations.
- Configured queues in WebLogic Server where messages were published using the JMS API.
- Consumed Web Services (WSDL, SOAP, and UDDI) from a third party for authorizing payments to/from customers.
- Wrote and maintained database queries and stored procedures for Oracle 9i.
- Developed the services following a full-blown Test-Driven Development (TDD) approach.
- Interacted with system analysts and business users for design and requirement clarifications.
- Designed front-end pages using JSP, HTML, AngularJS, jQuery, JavaScript, CSS, and AJAX calls to get the required data from the backend.
- Designed and developed this application using Spring MVC.
- Used the Spring Framework IoC (Inversion of Control) pattern to manage relationships between application components.
- Developed the DAO layer for the application using Hibernate and JDBC.
- Implemented RESTful web services with JAX-RS (Jersey) (a minimal resource sketch follows this list).
- Used JMS (Java Message Service) to send, receive and read messages in the application.
- Used JUnit for testing.
- Used databases such as Oracle 10g and DB2, and wrote complex SQL statements, PL/SQL procedures, and cursors to retrieve data from the database.
- Used Eclipse and RAD extensively in developing and debugging the application.
- Used Maven as the project build and dependency management tool.
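A minimal sketch of a JAX-RS (Jersey) resource like the RESTful services mentioned above; the resource path, entity, and response payload are placeholders rather than the actual project code.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

// Illustrative JAX-RS resource; in the real service this would delegate to the DAO layer
@Path("/customers")
public class CustomerResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getCustomer(@PathParam("id") String id) {
        // Placeholder JSON response for illustration only
        String json = "{\"id\":\"" + id + "\",\"status\":\"ACTIVE\"}";
        return Response.ok(json).build();
    }
}
```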
Environment: Java/J2EE, Eclipse, WebLogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, Web Services, WSDL, SOAP, UDDI, ANT, JUnit, Log4j.
Confidential
Java Developer
Responsibilities:
- Responsible for understanding the business requirement.
- Worked with Business Analysts and helped represent the business domain details in technical specifications.
- Helped develop UML diagrams: use case, activity, sequence, and class diagrams.
- Was also actively involved in setting coding standards and writing related documentation.
- Developed the Java Code using Eclipse as IDE.
- Developed JSPs and Servlets to dynamically generate HTML and display the data to the client side.
- Developed the application on Struts MVC architecture utilizing Action classes, Action Forms, and validations.
- Used Tiles as an implementation of the Composite View pattern.
- Was responsible for implementing various J2EE design patterns such as Service Locator, Business Delegate, Session Facade, and Factory.
- Performed code reviews and debugging using the Eclipse debugger.
- Involved in the design and decision making for Hibernate O/R mapping.
- Developed Hibernate mapping (.hbm.xml) files for mapping declarations.
- Configured queues in WebLogic Server where messages were published using the JMS API (a minimal sender sketch follows this list).
- Consumed Web Services (WSDL, SOAP, and UDDI) from a third party for authorizing payments to/from customers.
- Wrote and maintained database queries and stored procedures for Oracle 9i.
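A minimal sketch of publishing to a WebLogic-hosted queue through the JMS API, as referenced above; the JNDI names and message payload are placeholders, and the InitialContext is assumed to be configured for the WebLogic environment.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class PaymentMessageSender {
    public static void main(String[] args) throws Exception {
        // JNDI names are placeholders for the factory/queue configured in WebLogic
        InitialContext ctx = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/PaymentQueue");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            // Illustrative payload only
            TextMessage message = session.createTextMessage("<payment id=\"123\"/>");
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}
```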
Environment: Java/J2EE, Eclipse, WebLogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, WSDL, SOAP, UDDI, ANT, JUnit, Log4j.