Sr. Data Engineer Resume

Dallas, TX

SUMMARY

  • Hadoop Developer with 8+ years of overall IT experience in a variety of industries, including hands-on experience in Big Data technologies.
  • 4+ years of comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, Spark, Kafka, and HBase).
  • Worked on installing, configuring, and administering Hadoop clusters on distributions such as Cloudera.
  • Efficient in writing MapReduce programs and using the Apache Hadoop API for analyzing structured and unstructured data.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Debugged Pig and Hive scripts, and debugged and optimized MapReduce jobs.
  • Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
  • Hands-on experience in managing and reviewing Hadoop logs.
  • Good knowledge of YARN configuration.
  • Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
  • Good working knowledge of NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Used HBase in conjunction with Pig/Hive as and when required for real-time, low-latency queries.
  • Knowledge of job workflow scheduling and monitoring tools such as Oozie (Hive, Pig) and ZooKeeper (HBase).
  • Good working experience with Spark (Spark Streaming, Spark SQL), Scala, and Kafka.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Good experience in creating and designing data ingest pipelines using technologies such as Apache Storm-Kafka.
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Developed various shell scripts and Python scripts to address production issues.
  • Developed and designed an automation framework using Python and shell scripting.
  • Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Experience in AWS EC2, configuring the servers for Auto Scaling and Elastic Load Balancing.
  • Configured AWS EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
  • Good knowledge of data compression and serialization formats such as Snappy and Avro.
  • Hands-on experience in developing applications with Java, J2EE, Servlets, JSP, EJB, SOAP, Web Services, JNDI, JMS, JDBC2, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle 10g, and MS SQL Server RDBMS.
  • Dealt with huge transaction volumes while interfacing with the front-end application written in Java, JSP, Struts, Hibernate, and SOAP web services, running on the Tomcat web server.
  • Delivered zero defect code for three large projects which involved changes to both front end (Core Java, Presentation services) and back-end (Oracle).
  • Experience with all stages of the SDLC and the Agile development model, from requirements gathering to deployment and production support.
  • Involved in daily Scrum meetings to discuss development progress, and was active in making Scrum meetings more productive.
  • Experience in understanding existing systems, maintenance, and production support on technologies such as Java, J2EE, and various databases (Oracle, SQL Server).

TECHNICAL SKILLS

Big Data: Cloudera Distribution, HDFS, ZooKeeper, YARN, DataNode, NameNode, ResourceManager, NodeManager, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, MongoDB, Oozie, Kafka, Spark, Storm, Scala, Impala

Operating System: Windows, Linux, Unix.

Languages: Java, J2EE, SQL, Python, Scala

Databases: IBM DB2, Oracle, SQL Server, MySQL, PostgreSQL

Web Technologies: JSP, Servlets, HTML, CSS, JDBC, SOAP, XSLT.

Version Control Tools: Git, SVN, CVS

IDE: IBM RAD, Eclipse, IntelliJ

Tools: TOAD, SQL Developer, ANT, Log4J

Web Services: WSDL, SOAP.

ETL: Talend ETL, Talend Studio

Web/App Server: UNIX server, Apache Tomcat

PROFESSIONAL EXPERIENCE

Confidential, Dallas, TX

Sr. Data Engineer

Responsibilities:

  • Imported data from various sources such as Oracle and MySQL using Sqoop, performed transformations using Hive, and loaded the data into HDFS.
  • Participated in requirements gathering and documented the business requirements by conducting grooming sessions and meetings with various business users.
  • Created Hive tables and wrote multiple Hive queries to load them for analyzing market data coming from distinct sources.
  • Created extensive SQL queries for data extraction to test the data against the various databases.
  • Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations.
  • Used partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries.
  • Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Developed simple to complex MapReduce jobs using Hive and HBase.
  • Orchestrated various Sqoop queries, Pig scripts, Hive queries using Oozie workflows and sub-workflows.
  • Responsible for handling different data formats such as Avro, Parquet, and ORC.
  • Involved in generating analytics data using MapReduce programs written in Python.
  • Used Kafka to load data into HDFS and move data into NoSQL databases.
  • Involved in creating and designing data ingest pipelines using technologies such as Apache Kafka.
  • Responsible for developing multiple Kafka producers and consumers from scratch as per the software requirement specifications.
  • Wrote custom aggregate functions using Spark SQL and performed interactive querying.
  • Implemented an Apache Spark data processing project to handle data from RDBMS and streaming sources.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
  • Performed analysis on implementing Spark using Scala.
  • Designed batch processing jobs using Apache Spark to increase processing speed tenfold compared to that of MapReduce jobs.
  • Experience in AWS EC2, configuring the servers for Auto Scaling and Elastic Load Balancing.
  • Configured AWS EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
  • Active member of the team developing a POC on streaming data using Apache Kafka and Spark Streaming.
  • Involved in daily Scrum meetings to discuss development progress, and was active in making Scrum meetings more productive.

Environment: Hadoop, Java, AWS, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Python, Spark, Impala, Scala, Kafka, Shell Scripting, Eclipse, Cloudera, MySQL, Talend, Cassandra

Confidential, Fort Worth, Texas

Data Engineer

Responsibilities:

  • Ingested data received from various relational database providers into HDFS for analysis and other big data operations.
  • Created Hive tables to import large data sets from various relational databases using Sqoop, and exported the analyzed data back for visualization and report generation by the BI team.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets its data from Kafka in near real time.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts using Java and Python shell commands as per the requirements.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Experienced in querying HBase using Impala.
  • Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Secured the cluster using Kerberos and kept it up and running at all times.
  • Implemented optimization and performance testing and tuning of Hive and Pig.
  • Experience in migrating HiveQL to Impala to minimize query response time.
  • Developed a data pipeline using Kafka to store data in HDFS.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
  • Performed analysis on implementing Spark using Scala.
  • Implemented Spark sample programs in Python using PySpark.
  • Responsible for creating and modifying topics (Kafka queues) as and when required, with varying configurations involving replication factors and partitions.
  • Wrote shell scripts and Python scripts for job automation.

Environment: Cloudera, HDFS, Hive, HQL scripts, Map Reduce, Java, HBase, Pig, Sqoop, Kafka, Impala, Shell Scripts, Python Scripts, Spark, Scala, Oozie.

Confidential, Englewood, CO

Hadoop Developer

Responsibilities:

  • Developed MapReduce programs to parse the raw data and create intermediate data to be loaded into partitioned Hive tables.
  • Involved in creating Hive ORC tables, loading data into them, and writing Hive queries to analyze the data.
  • Involved in data ingestion into HDFS using Sqoop for full loads and Flume for incremental loads from a variety of sources such as web servers, RDBMS, and data APIs.
  • Ran multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Used different file formats such as text files, Sequence Files, Avro, and Optimized Row Columnar (ORC).
  • Ingested real-time and near-real-time (NRT) streaming data into HDFS using Flume.
  • Created TWS jobs and job streams and automated them as per schedule.
  • Worked on the GoldenGate replication tool to get data from various data sources into HDFS.
  • Worked on HBase to support enterprise production, loading data into HBase using Sqoop.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Exported the data from Avro files and indexed the documents in ORC file format.
  • Responsible for creating technical specification documents for the generated extracts.
  • Involved in performance tuning using partitioning and bucketing of Hive tables.
  • Created UDFs to calculate the pending payment for the given customer data based on the last day of every month, and used them in Hive scripts.
  • Involved in writing shell scripts to run the jobs in parallel and increase performance.
  • Involved in running TWS jobs for processing millions of records using ITG.

Environment: Hadoop, HDFS, MapReduce, Yarn, Hive, PIG, Oozie, Sqoop, HBase, Flume, Linux, Shell scripting, Java, Eclipse, SQL

Confidential, Dallas, TX

Hadoop Developer

Responsibilities:

  • Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
  • Used Sqoop to export data from HDFS to RDBMS.
  • Experience with Hadoop ecosystem components: HDFS, MapReduce, Hive, Pig, Sqoop, and HBase.
  • Expertise with web-based GUI architecture and development using HTML, CSS, AJAX, jQuery, AngularJS, and JavaScript.
  • Developed MapReduce programs for some refined queries on big data.
  • Involved in loading data from UNIX file system to HDFS.
  • Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Extracted data from databases into HDFS using Sqoop.
  • Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
  • Used Pig predefined functions to convert fixed-width files to delimited files.
  • Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
  • Automated the workflow using shell scripts.
  • Managed and reviewed Hadoop log files. Implemented a lambda architecture as a solution to a problem.
  • Involved in analysis, design, testing phases and responsible for documenting technical specifications.
  • Very good understanding of partitioning and bucketing concepts, and of managed and external tables in Hive, to optimize performance.
  • Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
  • Experienced in designing and running Hadoop streaming jobs to process terabytes of data in Hive.
  • Developed a workflow in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.

Environment: Hadoop, HDFS, MapReduce, Yarn, Hive, PIG, Oozie, Sqoop, HBase, Flume, Linux, Shell scripting, Java, Eclipse, SQL

Confidential

Java Developer

Responsibilities:

  • Helped develop UML diagrams: use case, activity, sequence, and class diagrams.
  • Developed JSPs and Servlets to dynamically generate HTML and display the data on the client side.
  • Developed the application on the Struts MVC architecture utilizing Action classes, Action Forms, and validations.
  • Responsible for implementing various J2EE design patterns such as Service Locator, Business Delegate, Session Facade, and Factory.
  • Involved in the design and decision making for Hibernate O/R mapping.
  • Developed Hibernate mapping (.hbm.xml) files for mapping declarations.
  • Configured queues in WebLogic Server to which messages were published using the JMS API.
  • Consumed web services (WSDL, SOAP, and UDDI) from third parties for authorizing payments to and from customers.
  • Wrote and modified database queries and stored procedures for Oracle9i.
  • Developed the services following a full-fledged test-driven development approach.
  • Interacted with the system analysts and business users for design and requirement clarifications.
  • Designed front-end pages using JSP, HTML, AngularJS, jQuery, JavaScript, CSS, and Ajax calls to get the required data from the backend.
  • Designed and developed this application using Spring MVC.
  • Used the Spring Framework IoC (Inversion of Control) design pattern to manage relationships between application components.
  • Developed the DAO layer for the application using Hibernate and JDBC.
  • Implemented RESTful web services with JAX-RS (Jersey).
  • Used JMS (Java Message Service) to send, receive, and read messages in the application.
  • Used JUnit for testing.
  • Used databases such as Oracle 10g and DB2, and wrote complex SQL statements, PL/SQL procedures, and cursors to retrieve data from the database.
  • Used Eclipse and RAD extensively for developing and debugging the application.
  • Used Maven as the project build and dependency management tool.

Environment: Java/J2EE, Eclipse, WebLogic Application Server, Oracle, JSP1, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, Web Services, WSDL, SOAP, UDDI, ANT, JUnit, Log4j.

Confidential

Java Developer

Responsibilities:

  • Responsible for understanding the business requirements.
  • Worked with the Business Analyst and helped represent the business domain details in technical specifications.
  • Helped develop UML diagrams: use case, activity, sequence, and class diagrams.
  • Actively involved in setting coding standards and writing related documentation.
  • Developed the Java code using Eclipse as the IDE.
  • Developed JSPs and Servlets to dynamically generate HTML and display the data on the client side.
  • Developed the application on the Struts MVC architecture utilizing Action classes, Action Forms, and validations.
  • Used Tiles as an implementation of the Composite View pattern.
  • Responsible for implementing various J2EE design patterns such as Service Locator, Business Delegate, Session Facade, and Factory.
  • Performed code reviews and debugging using the Eclipse debugger.
  • Involved in the design and decision making for Hibernate O/R mapping.
  • Developed Hibernate mapping (.hbm.xml) files for mapping declarations.
  • Configured queues in WebLogic Server to which messages were published using the JMS API.
  • Consumed web services (WSDL, SOAP, and UDDI) from third parties for authorizing payments to and from customers.
  • Wrote and modified database queries and stored procedures for Oracle9i.

Environment: Java/J2EE, Eclipse, WebLogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, WSDL, SOAP, UDDI, ANT, JUnit, Log4j.
