
Sr. Data Engineer Resume

Dallas, TX

PROFESSIONAL SUMMARY:

  • Hadoop Developer with 8+ years of overall IT experience in a variety of industries, including hands-on experience in Big Data technologies.
  • 4+ years of comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, Spark, Kafka and HBase).
  • Worked on installing, configuring and administering Hadoop clusters for distributions such as Cloudera.
  • Efficient in writing MapReduce programs and using the Apache Hadoop API to analyze structured and unstructured data.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Experienced in debugging Pig and Hive scripts and in optimizing and debugging MapReduce jobs.
  • Administered Pig, Hive and HBase, installing updates, patches and upgrades.
  • Hands-on experience in managing and reviewing Hadoop logs.
  • Good knowledge of YARN configuration.
  • Expertise in writing Hadoop jobs to analyze data using HiveQL, Pig Latin (data flow language) and custom MapReduce programs in Java.
  • Extended Hive and Pig core functionality by writing custom UDFs (a minimal Hive UDF sketch in Java follows this list).
  • Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
  • Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
  • Good working knowledge of NoSQL databases such as HBase, MongoDB and Cassandra.
  • Used HBase in conjunction with Pig/Hive as needed for real-time, low-latency queries.
  • Knowledge of job workflow scheduling and monitoring tools such as Oozie (Hive, Pig) and ZooKeeper (HBase).
  • Good working experience with Spark (Spark Streaming, Spark SQL), Scala and Kafka.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Good experience in creating and designing data ingest pipelines using technologies such as Apache Storm and Kafka.
  • Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Developed various shell and Python scripts to address production issues.
  • Designed and developed an automation framework using Python and shell scripting.
  • Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Configured AWS EC2 instances in a VPC network, managed security through IAM and monitored server health through CloudWatch.
  • Good knowledge of compression and serialization formats such as Snappy and Avro.
  • Hands-on experience developing applications with Java, J2EE (Servlets, JSP, EJB), SOAP web services, JNDI, JMS, JDBC, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle 10g and MS SQL Server RDBMS.
  • Dealt with huge transaction volumes while interfacing with a front-end application written in Java, JSP, Struts, Hibernate and SOAP web services, deployed on the Tomcat web server.
  • Delivered zero-defect code for three large projects involving changes to both the front end (Core Java, presentation services) and the back end (Oracle).
  • Experience with all stages of the SDLC and the Agile development model, from requirement gathering through deployment and production support.
  • Participated in daily Scrum meetings to discuss development progress and helped make the meetings more productive.
  • Experience in understanding existing systems and providing maintenance and production support for technologies such as Java, J2EE and various databases (Oracle, SQL Server).
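
A minimal sketch of the kind of custom Hive UDF referred to above, assuming a simple string-cleanup use case (the package, class and function names are hypothetical):

```java
package com.example.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that trims and upper-cases a string column.
// Registered in Hive with, for example:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION clean_code AS 'com.example.udf.CleanCode';
public class CleanCode extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                 // let NULLs pass through unchanged
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```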

TECHNICAL SKILLS:

Big Data: Cloudera Distribution, HDFS, ZooKeeper, YARN, DataNode, NameNode, ResourceManager, NodeManager, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, MongoDB, Oozie, Kafka, Spark, Storm, Scala, Impala

Operating System: Windows, Linux, Unix.

Languages: Java, J2EE, SQL, Python, Scala

Databases: IBM DB2, Oracle, SQL Server, MySQL, PostgreSQL

Web Technologies: JSP, Servlets, HTML, CSS, JDBC, SOAP, XSLT.

Version Tools: GIT, SVN, CVS

IDE: IBM RAD, Eclipse, IntelliJ

Tools: TOAD, SQL Developer, ANT, Log4J

Web Services: WSDL, SOAP.

ETL: Talend ETL, Talend Studio

Web/App Server: UNIX server, Apache Tomcat

PROFESSIONAL EXPERIENCE:

Confidential, Dallas, TX

Sr. Data Engineer

Responsibilities:

  • Handled importing of data from various data sources such as Oracle and MySQL using Sqoop, performed transformations using Hive and loaded the data into HDFS.
  • Participated in requirement gathering and documented business requirements by conducting grooming sessions/meetings with various business users.
  • Created Hive tables and wrote multiple Hive queries to load them for analyzing the market data coming from distinct sources.
  • Created extensive SQL queries for data extraction to test the data against the various databases.
  • Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations.
  • Used partitioning, bucketing, map-side joins and parallel execution to optimize Hive queries.
  • Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Developed simple to complex MapReduce jobs using Hive and HBase.
  • Orchestrated various Sqoop jobs, Pig scripts and Hive queries using Oozie workflows and sub-workflows.
  • Responsible for handling different data formats such as Avro, Parquet and ORC.
  • Involved in generating analytics data using Map/Reduce programs written in Python.
  • Used Kafka to load data into HDFS and move data into NoSQL databases.
  • Involved in creating and designing data ingest pipelines using technologies such as Apache Kafka.
  • Responsible for developing multiple Kafka producers and consumers from scratch as per the software requirement specifications (a minimal producer sketch follows this list).
  • Wrote custom aggregate functions using Spark SQL and performed interactive querying.
  • Implemented Apache Spark data processing project to handle data from RDBMS and streaming sources.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
  • Performed analysis on implementing Spark using Scala.
  • Designed batch processing jobs using Apache Spark that ran roughly ten times faster than the equivalent MapReduce jobs.
  • Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Configured AWS EC2 instances in a VPC network, managed security through IAM and monitored server health through CloudWatch.
  • Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
  • Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.
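
A minimal sketch of a Kafka producer of the sort described above, assuming a hypothetical broker address and topic name:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickStreamProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each record carries a key (e.g. a source id) and a JSON payload.
            producer.send(new ProducerRecord<>("clickstream", "web-01", "{\"page\":\"/home\"}"));
        }
    }
}
```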

Environment: Hadoop, Java, AWS, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Python, Spark, Impala, Scala, Kafka, Shell Scripting, Eclipse, Cloudera, MySQL, Talend, Cassandra

Confidential, Fort Worth, Texas

Data Engineer

Responsibilities:

  • Involved in ingesting data received from various relational database providers into HDFS for analysis and other big data operations.
  • Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
  • Used Spark Streaming APIs to perform necessary transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts using Java and Python shell commands as per the requirements.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism and tuning memory.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Experienced in querying HBase using Impala.
  • Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
  • Extracted data from MongoDB through Sqoop, placed it in HDFS and processed it.
  • Secured the cluster using Kerberos and kept it up and running at all times.
  • Implemented optimization and performance testing and tuning of Hive and Pig.
  • Experience in migrating HiveQL into Impala to minimize query response time.
  • Developed a data pipeline using Kafka to store data into HDFS.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext (a minimal Java sketch follows this list).
  • Performed analysis on implementing Spark using Scala.
  • Implemented sample Spark programs in Python using PySpark.
  • Responsible for creating and modifying topics (Kafka queues) as and when required, with varying configurations of replication factors and partitions.
  • Wrote shell scripts and Python scripts for job automation.
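
A minimal sketch, in the Java API, of running a Hive query through Spark SQL as described above (Spark 1.6-era API; the table and column names are hypothetical):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class HiveQueryOnSpark {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("HiveQueryOnSpark");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(jsc.sc());

        // Run the same HiveQL through Spark's execution engine instead of MapReduce.
        DataFrame result = hiveContext.sql(
            "SELECT region, COUNT(*) AS cnt FROM sales GROUP BY region");

        result.show();                                    // inspect a sample of the output
        result.write().saveAsTable("sales_by_region");    // persist the result back to Hive

        jsc.stop();
    }
}
```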

Environment: Cloudera, HDFS, Hive, HQL scripts, Map Reduce, Java, HBase, Pig, Sqoop, Kafka, Impala, Shell Scripts, Python Scripts, Spark, Scala, Oozie.

Confidential, Englewood, CO

Hadoop Developer

Responsibilities:

  • Developed MapReduce programs to parse the raw data and create intermediate data that would then be loaded into partitioned Hive tables (a minimal parsing-job sketch follows this list).
  • Involved in creating Hive ORC tables, loading data into them and writing Hive queries to analyze the data.
  • Involved in data ingestion into HDFS using Sqoop for full loads and Flume for incremental loads from a variety of sources such as web servers, RDBMS and data APIs.
  • Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Used different file formats such as text files, SequenceFiles, Avro and Optimized Row Columnar (ORC).
  • Ingested real-time and near-real-time (NRT) streaming data into HDFS using Flume.
  • Expertise in creating TWS jobs and job streams and automating them per schedule.
  • Worked with the GoldenGate replication tool to move data from various data sources into HDFS.
  • Worked on HBase to support enterprise production and loaded data into HBase using Sqoop.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Exported the data from Avro files and indexed the documents in ORC file format.
  • Responsible for creating technical specification documents for the generated extracts.
  • Involved in performance tuning using partitioning and bucketing of Hive tables.
  • Created UDFs to calculate the pending payment for given customer data based on the last day of every month and used them in Hive scripts.
  • Involved in writing shell scripts to run jobs in parallel and increase performance.
  • Involved in running TWS jobs for processing millions of records using ITG.
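
A minimal sketch of the kind of MapReduce parsing job referred to above, assuming pipe-delimited input and hypothetical field positions:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RawDataParser {

    // Emits (id, 1) for every well-formed pipe-delimited record.
    public static class ParseMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            if (fields.length > 3) {                       // skip malformed rows
                context.write(new Text(fields[0]), ONE);   // field 0 assumed to be the id
            }
        }
    }

    // Sums the counts per key; the output is what gets loaded into Hive downstream.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "raw-data-parser");
        job.setJarByClass(RawDataParser.class);
        job.setMapperClass(ParseMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```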

Environment: Hadoop, HDFS, MapReduce, Yarn, Hive, PIG, Oozie, Sqoop, HBase, Flume,Linux, Shell scripting, Java, Eclipse, SQL

Confidential, Dallas, TX

Hadoop Developer

Responsibilities:

  • Developed Pig scripts, Pig UDFs, Hive scripts and Hive UDFs to analyze HDFS data (a minimal Pig UDF sketch follows this list).
  • Used Sqoop to export data from HDFS to RDBMS.
  • Experience with Hadoop ecosystem components: HDFS, MapReduce, Hive, Pig, Sqoop and HBase.
  • Expertise with web-based GUI architecture and development using HTML, CSS, AJAX, jQuery, AngularJS and JavaScript.
  • Developed MapReduce programs for refined queries on big data.
  • Involved in loading data from UNIX file system to HDFS.
  • Used Pig as an ETL tool to perform transformations, event joins and pre-aggregations before storing the data in HDFS.
  • Extracted the data from Databases into HDFS using Sqoop.
  • Handled importing of data from various data sources, performed transformations using Hive, PIG and loaded data into HDFS.
  • Used Pig built-in functions to convert fixed-width files to delimited files.
  • Used Hive join queries to join multiple tables of a source system and loaded them into Elasticsearch tables.
  • Automated the workflow using shell scripts.
  • Managed and reviewed Hadoop log files; implemented a Lambda architecture where the use case required it.
  • Involved in analysis, design, testing phases and responsible for documenting technical specifications.
  • Very good understanding of partitioning and bucketing concepts, and of managed and external tables in Hive, to optimize performance.
  • Expertise in writing Hadoop jobs to analyze data using HiveQL, Pig Latin (data flow language) and custom MapReduce programs in Java.
  • Experienced in running Hadoop streaming jobs to process terabytes of data in Hive.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
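
A minimal sketch of a Java Pig UDF of the kind mentioned in the first bullet, assuming a simple string-normalization use case (package and function names are hypothetical):

```java
package com.example.pig;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF that normalizes a string field; used from Pig Latin as, for example:
//   REGISTER udfs.jar;
//   DEFINE normalize com.example.pig.Normalize();
//   cleaned = FOREACH raw GENERATE normalize(name);
public class Normalize extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;                                  // propagate missing values
        }
        return input.get(0).toString().trim().toLowerCase();
    }
}
```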

Environment: Hadoop, HDFS, MapReduce, Yarn, Hive, PIG, Oozie, Sqoop, HBase, Flume, Linux, Shell scripting, Java, Eclipse, SQL

Confidential

Java Developer

Responsibilities:

  • Helped develop UML diagrams: use case, activity, sequence and class diagrams.
  • Developed JSPs and Servlets to dynamically generate HTML and display the data to the client side.
  • Developed the application on Struts MVC architecture utilizing Action classes, ActionForms and validations.
  • Responsible for implementing various J2EE design patterns such as Service Locator, Business Delegate, Session Facade and Factory.
  • Involved in design and decision making for Hibernate O/R mapping.
  • Developed Hibernate mapping (.hbm.xml) files for mapping declarations.
  • Configured queues in WebLogic Server to which messages were published using the JMS API.
  • Consumed Web Services (WSDL, SOAP, and UDDI) from third party for authorizing payments to/from customers.
  • Wrote and maintained database queries and stored procedures for Oracle 9i.
  • Developed the services by following full-blown Test-Driven Development (TDD).
  • Interacting with the system analysts, business users for design & requirement clarifications.
  • Designed front-end pages using JSP, HTML, AngularJS, jQuery, JavaScript, CSS and AJAX calls to get the required data from the backend.
  • Designed and developed the application using Spring MVC.
  • Used the Spring Framework IoC (Inversion of Control) design pattern to manage relationships between application components.
  • Developed the DAO layer for the application using Hibernate and JDBC.
  • Implemented RESTful web services with JAX-RS (Jersey); a minimal resource sketch follows this list.
  • Used JMS (Java Message Service) to send, receive and read messages in the application.
  • Used JUnit for testing.
  • Used databases such as Oracle 10g and DB2 and wrote complex SQL statements, PL/SQL procedures and cursors to retrieve data from the database.
  • Used Eclipse and RAD extensively in developing and debugging the application.
  • Used Maven as the project build and dependency management tool.
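
A minimal sketch of a JAX-RS (Jersey) resource like the RESTful services mentioned above; the path, class name and payload are hypothetical:

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

// Exposed at e.g. GET /api/customers/{id}; wired up through the Jersey servlet in web.xml.
@Path("/customers")
public class CustomerResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getCustomer(@PathParam("id") String id) {
        // In the real service this would delegate to the Hibernate/JDBC DAO layer.
        String json = "{\"id\":\"" + id + "\",\"status\":\"ACTIVE\"}";
        return Response.ok(json).build();
    }
}
```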

Environment: Java/J2EE, Eclipse, WebLogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, Web Services, WSDL, SOAP, UDDI, ANT, JUnit, Log4j.

Confidential

Java Developer

Responsibilities:

  • Responsible for understanding the business requirements.
  • Worked with business analysts and helped represent business domain details in technical specifications.
  • Also helped develop UML diagrams: use case, activity, sequence and class diagrams.
  • Actively involved in setting coding standards and writing related documentation.
  • Developed the Java Code using Eclipse as IDE.
  • Developed JSPs and Servlets to dynamically generate HTML and display the data to the client side.
  • Developed the application on Struts MVC architecture utilizing Action classes, ActionForms and validations.
  • Used Tiles as an implementation of the Composite View pattern.
  • Responsible for implementing various J2EE design patterns such as Service Locator, Business Delegate, Session Facade and Factory.
  • Performed code reviews and debugging using the Eclipse debugger.
  • Involved in the design and decision making for Hibernate OR Mapping.
  • Developed Hibernate mapping (.hbm.xml) files for mapping declarations.
  • Configured queues in WebLogic Server to which messages were published using the JMS API (a minimal publisher sketch follows this list).
  • Consumed Web Services (WSDL, SOAP, and UDDI) from third party for authorizing payments to/from customers.
  • Wrote and maintained database queries and stored procedures for Oracle 9i.
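
A minimal sketch of publishing a message to a WebLogic queue over the JMS API as described above; the server URL and JNDI names are hypothetical:

```java
import java.util.Hashtable;
import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueSender;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.Context;
import javax.naming.InitialContext;

public class PaymentQueuePublisher {
    public static void main(String[] args) throws Exception {
        // Look up the connection factory and queue through WebLogic's JNDI provider.
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
        env.put(Context.PROVIDER_URL, "t3://localhost:7001");                 // hypothetical server URL

        Context ctx = new InitialContext(env);
        QueueConnectionFactory factory =
            (QueueConnectionFactory) ctx.lookup("jms/ConnectionFactory");     // hypothetical JNDI name
        Queue queue = (Queue) ctx.lookup("jms/PaymentQueue");                 // hypothetical JNDI name

        QueueConnection connection = factory.createQueueConnection();
        QueueSession session = connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
        QueueSender sender = session.createSender(queue);

        TextMessage message = session.createTextMessage("{\"paymentId\":123}");
        sender.send(message);

        sender.close();
        session.close();
        connection.close();
    }
}
```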

Environment: Java/J2EE, Eclipse, WebLogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, WSDL, SOAP, UDDI, ANT, JUnit, Log4j.
