Sr. Data Engineer Resume
Dallas, TX
PROFESSIONAL SUMMARY:
- Hadoop Developer with 8+ years of overall IT experience across a variety of industries, including hands-on experience in Big Data technologies.
- Have 4+ years of comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, Spark, Kafka and HBase).
- Worked on installing, configuring, and administering Hadoop clusters for distributions such as Cloudera.
- Efficient in writing MapReduce programs and using the Apache Hadoop API to analyze structured and unstructured data.
- Expert in working with the Hive data warehouse tool - creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Debugged Pig and Hive scripts and optimized and debugged MapReduce jobs.
- Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
- Hands-on experience in managing and reviewing Hadoop logs.
- Good knowledge about YARN configuration.
- Expertise in writing Hadoop Jobs for analyzing data using Hive QL (Queries), Pig Latin (Data flow language), and custom MapReduce programs in Java.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice versa.
- Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
- Good working knowledge of NoSQL databases such as HBase, MongoDB, and Cassandra.
- Used HBase in conjunction with Pig/Hive as needed for real-time, low-latency queries.
- Knowledge of job workflow scheduling and monitoring tools like Oozie (Hive, Pig) and ZooKeeper (HBase).
- Good working experience with Spark (Spark Streaming, Spark SQL), Scala, and Kafka.
- Worked on reading multiple data formats on HDFS using Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Good experience in creating and designing data ingestion pipelines using technologies such as Apache Storm and Kafka.
- Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Developed various shell scripts and Python scripts to address production issues.
- Designed and developed an automation framework using Python and shell scripting.
- Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Experience in AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
- Configured AWS EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
- Good knowledge of data compression formats such as Snappy and Avro.
- Hands-on experience in developing applications with Java and J2EE (Servlets, JSP, EJB), SOAP, Web Services, JNDI, JMS, JDBC 2.0, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle 10g, and MS SQL Server RDBMS.
- Dealt with huge transaction volumes while interfacing with the front-end application written in Java, JSP, Struts, Hibernate, and SOAP web services on a Tomcat web server.
- Delivered zero-defect code for three large projects involving changes to both the front end (Core Java, presentation services) and the back end (Oracle).
- Experience with all stages of the SDLC and the Agile development model, from requirement gathering through deployment and production support.
- Participated in daily Scrum meetings to discuss development progress and helped make the meetings more productive.
- Experienced in understanding existing systems and providing maintenance and production support for technologies such as Java, J2EE, and various databases (Oracle, SQL Server).
TECHNICAL SKILLS:
Big Data: Cloudera Distribution, HDFS, ZooKeeper, YARN, DataNode, NameNode, ResourceManager, NodeManager, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, MongoDB, Oozie, Kafka, Spark, Storm, Scala, Impala
Operating System: Windows, Linux, Unix.
Languages: Java, J2EE, SQL, Python, Scala
Databases: IBM DB2, Oracle, SQL Server, MySQL, PostgreSQL
Web Technologies: JSP, Servlets, HTML, CSS, JDBC, SOAP, XSLT.
Version Tools: GIT, SVN, CVS
IDE: IBM RAD, Eclipse, IntelliJ
Tools: TOAD, SQL Developer, ANT, Log4J
Web Services: WSDL, SOAP.
ETL: Talend ETL, Talend Studio
Web/App Server: UNIX server, Apache Tomcat
PROFESSIONAL EXPERIENCE:
Confidential, Dallas, TX
Sr. Data Engineer
Responsibilities:
- Handled importing data from various data sources such as Oracle and MySQL using Sqoop, performed transformations using Hive, and loaded the data into HDFS.
- Participated in requirement gathering and documented business requirements by conducting grooming sessions/meetings with various business users.
- Involved in creating Hive tables and wrote multiple Hive queries to load them for analyzing market data coming from distinct sources.
- Created extensive SQL queries for data extraction to test the data against the various databases.
- Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations.
- Applied partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries.
- Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Developed simple to complex MapReduce jobs using Hive and HBase.
- Orchestrated various Sqoop queries, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
- Responsible for handling different data formats like Avro, Parquet and ORC formats.
- Involved in generating analytics data using Map/Reduce programs written in Python.
- Used Kafka to load data into HDFS and to move data into NoSQL databases.
- Involved in creating and designing data ingestion pipelines using technologies such as Apache Kafka.
- Responsible for developing multiple Kafka producers and consumers from scratch per the software requirement specifications (a minimal producer sketch follows this list).
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Implemented Apache Spark data processing project to handle data from RDBMS and streaming sources.
- Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
- Performed analysis on implementing Spark using Scala.
- Designed batch processing jobs using Apache Spark, achieving roughly a tenfold speedup over the equivalent MapReduce jobs.
- Experience in AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
- Configured AWS EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
- Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
- Participated in daily Scrum meetings to discuss development progress and helped make them more productive.
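Illustrative sketch of the Kafka producer work described above, using the standard kafka-clients API. The broker address, topic name, key, and payload are placeholders, not the actual project values.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickstreamProducer {
    public static void main(String[] args) {
        // Broker list, serializers, and topic are illustrative placeholders
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);
        try {
            // Each record carries one click event as a JSON string
            producer.send(new ProducerRecord<>("clickstream", "user-123", "{\"page\":\"/home\"}"));
        } finally {
            producer.close();
        }
    }
}
```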
Environment: Hadoop, Java, AWS, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Python, Spark, Impala, Scala, Kafka, Shell Scripting, Eclipse, Cloudera, MySQL, Talend, Cassandra
Confidential, Fort Worth, Texas
Data Engineer
Responsibilities:
- Involved in ingesting data received from various relational database providers into HDFS for analysis and other Big Data operations.
- Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Developed Spark scripts using Java and Python shell commands as per the requirements.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive (a minimal Spark-on-Hive sketch follows this list).
- Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Experienced in querying HBase using Impala.
- Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
- Extracted data from MongoDB through Sqoop, placed it in HDFS, and processed it.
- Secured the cluster using Kerberos and kept it up and running at all times.
- Implemented optimization and performance testing and tuning of Hive and Pig.
- Experience in migrating HiveQL into Impala to minimize query response time.
- Developed a data pipeline using Kafka to store data into HDFS.
- Worked on reading multiple data formats on HDFS using Scala.
- Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
- Performed analysis on implementing Spark using Scala.
- Implemented sample Spark programs in Python using PySpark.
- Responsible for creating and modifying topics (Kafka queues) as and when required, with varying configurations of replication factors and partitions.
- Wrote shell scripts and Python scripts for job automation.
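A minimal sketch of the Spark-over-Hive pattern referenced above, using the Spark 1.6-era Java API (HiveContext/DataFrame) that matches the tooling listed here. The table name, query, and output path are illustrative assumptions.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class MarketDataAggregation {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("MarketDataAggregation");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        // HiveContext lets Spark SQL read tables registered in the Hive metastore
        HiveContext hiveContext = new HiveContext(jsc.sc());

        // Aggregate an (illustrative) Hive table and write the result back as Parquet
        DataFrame result = hiveContext.sql(
            "SELECT symbol, AVG(price) AS avg_price FROM market_data GROUP BY symbol");
        result.write().mode("overwrite").parquet("/data/output/avg_price");
    }
}
```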
Environment: Cloudera, HDFS, Hive, HQL scripts, MapReduce, Java, HBase, Pig, Sqoop, Kafka, Impala, Shell Scripts, Python Scripts, Spark, Scala, Oozie.
Confidential, Englewood, CO
Hadoop Developer
Responsibilities:
- Developed MapReduce programs to parse raw data and create intermediate data that would be further loaded into Hive partitioned tables (a minimal parser sketch follows this list).
- Involved in creating Hive ORC tables, loading data into them, and writing Hive queries to analyze the data.
- Involved in data ingestion into HDFS using Sqoop for full loads and Flume for incremental loads from a variety of sources such as web servers, RDBMS, and data APIs.
- Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Used different file formats such as text files, Sequence Files, Avro, and Optimized Row Columnar (ORC).
- Ingested real-time and near-real-time (NRT) streaming data into HDFS using Flume.
- Expertise in creating TWS jobs and job streams and automating them per schedule.
- Worked on the GoldenGate replication tool to get data from various data sources into HDFS.
- Worked on HBase to support enterprise production and loaded data into HBase using Sqoop.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Exported data from Avro files and indexed the documents in ORC file format.
- Responsible for creating Technical Specification documents for the generated extracts.
- Involved in performance tuning using partitioning and bucketing of Hive tables.
- Created UDFs to calculate the pending payment for given customer data based on the last day of every month and used them in Hive scripts.
- Involved in writing shell scripts to run jobs in parallel and increase performance.
- Involved in running TWS jobs for processing millions of records using ITG.
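A minimal sketch of the kind of MapReduce parser described above. The pipe-delimited record layout, field index, and class names are assumptions for illustration only; a job like this would be packaged as a jar, run with `hadoop jar`, and scheduled through TWS as noted above.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RawEventParser {

    // Parses pipe-delimited raw records and emits (eventType, 1)
    public static class ParseMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text eventType = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            if (fields.length > 2) {              // skip malformed lines
                eventType.set(fields[2]);
                context.write(eventType, ONE);
            }
        }
    }

    // Sums counts per event type; output feeds the Hive partitioned tables
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "raw-event-parser");
        job.setJarByClass(RawEventParser.class);
        job.setMapperClass(ParseMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```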
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Oozie, Sqoop, HBase, Flume, Linux, Shell scripting, Java, Eclipse, SQL
Confidential, Dallas, TX
Hadoop Developer
Responsibilities:
- Developed Pig scripts, Pig UDFs, Hive scripts, and Hive UDFs to analyze HDFS data (a minimal Hive UDF sketch follows this list).
- Used Sqoop to export data from HDFS to RDBMS.
- Experience with Hadoop ecosystem components: HDFS, MapReduce, Hive, Pig, Sqoop, and HBase.
- Expertise with web-based GUI architecture and development using HTML, CSS, AJAX, jQuery, AngularJS, and JavaScript.
- Developed MapReduce programs for refined queries on big data.
- Involved in loading data from UNIX file system to HDFS.
- Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Extracted the data from Databases into HDFS using Sqoop.
- Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
- Used Pig built-in functions to convert fixed-width files to delimited files.
- Used Hive join queries to join multiple tables of a source system and loaded the results into Elasticsearch tables.
- Automated the workflow using shell scripts.
- Managed and reviewed Hadoop log files; implemented a Lambda architecture (combining batch and real-time views) where the use case required it.
- Involved in analysis, design, testing phases and responsible for documenting technical specifications.
- Very good understanding of partitioning and bucketing concepts, and of Managed and External tables in Hive, to optimize performance.
- Expertise in writing Hadoop Jobs for analyzing data using Hive QL (Queries), Pig Latin (Data flow language), and custom MapReduce programs in Java.
- Experienced in running Hadoop streaming jobs to process terabytes of data in Hive.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
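A minimal sketch of a Hive UDF of the kind mentioned above; the function name and normalization logic are illustrative assumptions.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative Hive UDF: normalizes a code field before joins in HiveQL
public class NormalizeCode extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

In a Hive script such a UDF would be registered with `ADD JAR` and `CREATE TEMPORARY FUNCTION` before being used in queries.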
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Oozie, Sqoop, HBase, Flume, Linux, Shell scripting, Java, Eclipse, SQL
Confidential
Java Developer
Responsibilities:
- Helped develop UML diagrams: use case, activity, sequence, and class diagrams.
- Developed JSPs and Servlets to dynamically generate HTML and display the data to the client side.
- Developed the application on Struts MVC architecture utilizing Action classes, Action Forms, and validations.
- Was responsible for implementing various J2EE design patterns such as Service Locator, Business Delegate, Session Facade, and Factory.
- Involved in the design and decision making for Hibernate O/R mapping.
- Developed Hibernate mapping (.hbm.xml) files for mapping declarations.
- Configured queues in WebLogic Server where messages were published using the JMS API.
- Consumed Web Services (WSDL, SOAP, and UDDI) from a third party for authorizing payments to/from customers.
- Wrote and maintained database queries and stored procedures for Oracle 9i.
- Developed the services following a full-blown Test-Driven Development (TDD) approach.
- Interacted with system analysts and business users for design and requirement clarifications.
- Designed front-end pages using JSP, HTML, AngularJS, jQuery, JavaScript, CSS, and AJAX calls to get the required data from the backend.
- Designed and developed this application using Spring MVC.
- Used the Spring Framework IoC (Inversion of Control) pattern to manage relationships between application components.
- Developed the DAO layer for the application using Hibernate and JDBC.
- Implemented RESTful web services with JAX-RS (Jersey) (a minimal resource sketch follows this list).
- Used JMS (Java Message Service) to send, receive and read messages in the application.
- Used JUnit for testing.
- Used databases such as Oracle 10g and DB2, and wrote complex SQL statements, PL/SQL procedures, and cursors to retrieve data from the database.
- Used Eclipse and RAD extensively in developing and debugging the application.
- Used Maven as the project build and dependency management tool.
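A minimal sketch of a JAX-RS (Jersey) resource like the RESTful services mentioned above; the resource path, entity, and response payload are placeholders rather than the actual project code.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

// Illustrative JAX-RS resource; in the real service this would delegate to the DAO layer
@Path("/customers")
public class CustomerResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getCustomer(@PathParam("id") String id) {
        // Placeholder JSON response for illustration only
        String json = "{\"id\":\"" + id + "\",\"status\":\"ACTIVE\"}";
        return Response.ok(json).build();
    }
}
```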
Environment: Java/J2EE, Eclipse, WebLogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, Web Services, WSDL, SOAP, UDDI, ANT, JUnit, Log4j.
Confidential
Java Developer
Responsibilities:
- Responsible for understanding the business requirement.
- Worked with Business Analysts and helped represent the business domain details in technical specifications.
- Helped develop UML diagrams: use case, activity, sequence, and class diagrams.
- Was also actively involved in setting coding standards and writing related documentation.
- Developed the Java Code using Eclipse as IDE.
- Developed JSPs and Servlets to dynamically generate HTML and display the data to the client side.
- Developed the application on Struts MVC architecture utilizing Action classes, Action Forms, and validations.
- Used Tiles as an implementation of the Composite View pattern.
- Was responsible for implementing various J2EE design patterns such as Service Locator, Business Delegate, Session Facade, and Factory.
- Performed code reviews and debugging using the Eclipse debugger.
- Involved in the design and decision making for Hibernate O/R mapping.
- Developed Hibernate mapping (.hbm.xml) files for mapping declarations.
- Configured queues in WebLogic Server where messages were published using the JMS API (a minimal sender sketch follows this list).
- Consumed Web Services (WSDL, SOAP, and UDDI) from a third party for authorizing payments to/from customers.
- Wrote and maintained database queries and stored procedures for Oracle 9i.
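A minimal sketch of publishing to a WebLogic-hosted queue through the JMS API, as referenced above; the JNDI names and message payload are placeholders, and the InitialContext is assumed to be configured for the WebLogic environment.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class PaymentMessageSender {
    public static void main(String[] args) throws Exception {
        // JNDI names are placeholders for the factory/queue configured in WebLogic
        InitialContext ctx = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/PaymentQueue");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            // Illustrative payload only
            TextMessage message = session.createTextMessage("<payment id=\"123\"/>");
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}
```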
Environment: Java/J2EE, Eclipse, WebLogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, WSDL, SOAP, UDDI, ANT, JUnit, Log4j.