Sr. Data Engineer Resume Dallas, TX - Hire IT People

SUMMARY

Hadoop Developer with 8+ years of overall IT experience across a variety of industries, including hands-on experience in Big Data technologies.
4+ years of comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, Spark, Kafka, and HBase).
Worked on installing, configuring, and administering Hadoop clusters on distributions such as Cloudera.
Efficient in writing MapReduce programs and using the Apache Hadoop API to analyze structured and unstructured data.
Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
Experienced in debugging Pig and Hive scripts and in debugging and optimizing MapReduce jobs.
Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
Hands-on experience in managing and reviewing Hadoop logs.
Good knowledge of YARN configuration.
Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
Extending Hive and Pig core functionality by writing custom UDFs.
Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
Good working knowledge of NoSQL databases such as HBase, MongoDB, and Cassandra.
Used HBase in conjunction with Pig/Hive as and when required for real-time, low-latency queries.
Knowledge of job workflow scheduling and monitoring tools such as Oozie (Hive, Pig) and ZooKeeper (HBase).
Good working experience with Spark (Spark Streaming, Spark SQL), Scala, and Kafka.
Worked on reading multiple data formats on HDFS using Scala.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
Good experience in designing and creating data ingestion pipelines using technologies such as Apache Storm and Kafka.
Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
Developed various shell and Python scripts to address production issues.
Designed and developed an automation framework using Python and shell scripting.
Developed Java APIs for data retrieval and analysis on NoSQL databases such as HBase and Cassandra.
Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
Configured AWS EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
Good knowledge of data compression and serialization formats such as Snappy and Avro.
Hands-on experience in developing applications with Java, J2EE (Servlets, JSP, EJB), SOAP, Web Services, JNDI, JMS, JDBC2, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle 10g, and MS SQL Server RDBMS.
Dealt with huge transaction volumes while interfacing with the front-end application written in Java, JSP, Struts, Hibernate, and SOAP web services on a Tomcat web server.
Delivered zero-defect code for three large projects that involved changes to both the front end (Core Java, presentation services) and the back end (Oracle).
Experience with all stages of the SDLC and the Agile development model, from requirement gathering through deployment and production support.
Participated in daily Scrum meetings to discuss development progress and actively worked to make the meetings more productive.
Also experienced in understanding existing systems and in maintenance and production support on technologies such as Java, J2EE, and various databases (Oracle, SQL Server).

TECHNICAL SKILLS

Big Data: Cloudera Distribution, HDFS, ZooKeeper, YARN, DataNode, NameNode, ResourceManager, NodeManager, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, MongoDB, Oozie, Kafka, Spark, Storm, Scala, Impala

Operating System: Windows, Linux, Unix.

Languages: Java, J2EE, SQL, Python, Scala

Databases: IBM DB2, Oracle, SQL Server, MySQL, PostgreSQL

Web Technologies: JSP, Servlets, HTML, CSS, JDBC, SOAP, XSLT.

Version Control Tools: Git, SVN, CVS

IDE: IBM RAD, Eclipse, IntelliJ

Tools: TOAD, SQL Developer, ANT, Log4J

Web Services: WSDL, SOAP.

ETL: Talend ETL, Talend Studio

Web/App Server: UNIX server, Apache Tomcat

PROFESSIONAL EXPERIENCE

Confidential, Dallas, TX

Sr. Data Engineer

Responsibilities:

Imported data from various sources such as Oracle and MySQL using Sqoop, performed transformations using Hive, and loaded the data into HDFS.
Participated in requirement gathering and documented the business requirements by conducting grooming sessions and meetings with various business users.
Created Hive tables and wrote multiple Hive queries to load them for analyzing market data coming from distinct sources.
Created extensive SQL queries for data extraction to test the data against the various databases.
Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations.
Applied partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries.
Developed Java APIs for data retrieval and analysis on NoSQL databases such as HBase and Cassandra.
Developed simple to complex MapReduce jobs using Hive and HBase.
Orchestrated various Sqoop queries, Pig scripts, Hive queries using Oozie workflows and sub-workflows.
Handled different data formats such as Avro, Parquet, and ORC.
Generated analytics data using MapReduce programs written in Python.
Used Kafka to load data into HDFS and move data into NoSQL databases.
Involved in designing and creating data ingestion pipelines using technologies such as Apache Kafka.
Developed multiple Kafka producers and consumers from scratch per the software requirement specifications.
Wrote custom aggregate functions using Spark SQL and performed interactive querying.
Implemented an Apache Spark data processing project to handle data from RDBMS and streaming sources.
Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
Performed analysis on implementing Spark using Scala.
Designed batch processing jobs using Apache Spark to increase speed tenfold compared to that of MapReduce jobs.
Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
Configured AWS EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
Active member of the team developing a POC on streaming data using Apache Kafka and Spark Streaming.
Participated in daily Scrum meetings to discuss development progress and actively worked to make the meetings more productive.

Environment: Hadoop, Java, AWS, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Python, Spark, Impala, Scala, Kafka, Shell Scripting, Eclipse, Cloudera, MySQL, Talend, Cassandra

Confidential, Fort Worth, Texas

Data Engineer

Responsibilities:

Ingested data received from various relational database providers into HDFS for analysis and other big data operations.
Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time.
Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
Developed Spark scripts using Java and Python shell commands as required.
Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism, and memory tuning.
Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Experienced in querying HBase using Impala.
Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
Secured the cluster using Kerberos and kept it up and running at all times.
Implemented optimization and performance testing and tuning of Hive and Pig.
Experience in migrating HiveQL to Impala to minimize query response time.
Developed a data pipeline using Kafka to store data in HDFS.
Worked on reading multiple data formats on HDFS using Scala.
Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
Performed analysis on implementing Spark using Scala.
Implemented sample Spark programs in Python using PySpark.
Responsible for creating and modifying topics (Kafka queues) as and when required, with varying configurations involving replication factors and partitions.
Wrote shell scripts and Python scripts for job automation.

Environment: Cloudera, HDFS, Hive, HQL scripts, Map Reduce, Java, HBase, Pig, Sqoop, Kafka, Impala, Shell Scripts, Python Scripts, Spark, Scala, Oozie.

Confidential, Englewood, CO

Hadoop Developer

Responsibilities:

Developed MapReduce programs to parse the raw data and create intermediate data to be loaded into partitioned Hive tables.
Created Hive ORC tables, loaded data into them, and wrote Hive queries to analyze the data.
Involved in data ingestion into HDFS using Sqoop for full loads and Flume for incremental loads from a variety of sources such as web servers, RDBMS, and data APIs.
Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
Used different file formats such as text files, SequenceFiles, Avro, and Optimized Row Columnar (ORC).
Ingested real-time and near-real-time (NRT) streaming data into HDFS using Flume.
Expertise in creating TWS jobs and job streams and automating them per schedule.
Worked on the GoldenGate replication tool to get data from various data sources into HDFS.
Worked on HBase to support enterprise production and loaded data into HBase using Sqoop.
Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
Exported the data from Avro files and indexed the documents in ORC file format.
Responsible for creating technical specification documents for the generated extracts.
Involved in performance tuning using partitioning and bucketing of Hive tables.
Created UDFs to calculate the pending payment for the given customer data based on the last day of every month, and used them in Hive scripts.
Wrote shell scripts to run jobs in parallel and improve performance.
Involved in running TWS jobs for processing millions of records using ITG.

Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Oozie, Sqoop, HBase, Flume, Linux, Shell scripting, Java, Eclipse, SQL

Confidential, Dallas, TX

Hadoop Developer

Responsibilities:

Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
Used Sqoop to export data from HDFS to RDBMS.
Experienced with Hadoop ecosystem components: HDFS, MapReduce, Hive, Pig, Sqoop, and HBase.
Expertise with web-based GUI architecture and development using HTML, CSS, AJAX, jQuery, AngularJS, and JavaScript.
Developed MapReduce programs for some refined queries on big data.
Involved in loading data from UNIX file system to HDFS.
Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
Extracted data from databases into HDFS using Sqoop.
Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
Used Pig predefined functions to convert fixed-width files to delimited files.
Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
Automated the workflow using shell scripts.
Managed and reviewed Hadoop log files. Implemented a lambda architecture as a solution to a problem.
Involved in analysis, design, and testing phases and responsible for documenting technical specifications.
Very good understanding of partitioning and bucketing concepts and of managed and external tables in Hive for optimizing performance.
Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
Experienced in running Hadoop streaming jobs to process terabytes of data in Hive and designed both.
Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing with Pig.
Created reports for the BI team using Sqoop to export data into HDFS and Hive.

Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Oozie, Sqoop, HBase, Flume, Linux, Shell scripting, Java, Eclipse, SQL

Confidential

Java Developer

Responsibilities:

Helped develop UML diagrams: use case, activity, sequence, and class diagrams.
Developed JSPs and Servlets to dynamically generate HTML and display the data on the client side.
Developed the application on Struts MVC architecture utilizing Action classes, Action Forms, and validations.
Responsible for implementing various J2EE design patterns such as Service Locator, Business Delegate, Session Facade, and Factory.
Involved in the design and decision making for Hibernate O/R mapping.
Developed Hibernate mapping files (.hbm.xml) for mapping declarations.
Configured queues in WebLogic Server to which messages were published using the JMS API.
Consumed third-party web services (WSDL, SOAP, and UDDI) for authorizing payments to and from customers.
Wrote and modified database queries and stored procedures for Oracle9i.
Developed the services following full-blown test-driven development.
Interacted with system analysts and business users for design and requirement clarifications.
Designed front-end pages using JSP, HTML, AngularJS, jQuery, JavaScript, CSS, and Ajax calls to get the required data from the back end.
Designed and developed the application using Spring MVC.
Used the Spring Framework IoC (Inversion of Control) design pattern to manage relationships between application components.
Developed the DAO layer for the application using Hibernate and JDBC.
Implemented RESTful web services with JAX-RS (Jersey).
Used JMS (Java Message Service) to send, receive, and read messages in the application.
Used JUnit for testing.
Used databases such as Oracle 10g and DB2 and wrote complex SQL statements, PL/SQL procedures, and cursors to retrieve data from the database.
Used Eclipse and RAD extensively for developing and debugging the application.
Used Maven as the project build and dependency management tool.

Environment: Java/J2EE, Eclipse, WebLogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, Web Services, WSDL, SOAP, UDDI, ANT, JUnit, Log4j.

Confidential

Java Developer

Responsibilities:

Responsible for understanding the business requirements.
Worked with the Business Analyst and helped represent the business domain details in technical specifications.
Helped develop UML diagrams: use case, activity, sequence, and class diagrams.
Actively involved in setting coding standards and writing related documentation.
Developed the Java code using Eclipse as the IDE.
Developed JSPs and Servlets to dynamically generate HTML and display the data on the client side.
Developed the application on Struts MVC architecture utilizing Action classes, Action Forms, and validations.
Used Tiles as an implementation of the Composite View pattern.
Responsible for implementing various J2EE design patterns such as Service Locator, Business Delegate, Session Facade, and Factory.
Performed code review and debugging using the Eclipse debugger.
Involved in the design and decision making for Hibernate O/R mapping.
Developed Hibernate mapping files (.hbm.xml) for mapping declarations.
Configured queues in WebLogic Server to which messages were published using the JMS API.
Consumed third-party web services (WSDL, SOAP, and UDDI) for authorizing payments to and from customers.
Wrote and modified database queries and stored procedures for Oracle9i.

Environment: Java/J2EE, Eclipse, WebLogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, WSDL, SOAP, UDDI, ANT, JUnit, Log4j.
