Data Engineer Resume
Dallas, TX
SUMMARY
- Hadoop Developer with 7+ years of IT experience in designing and implementing complete end-to-end Hadoop ecosystems, including HDFS, MapReduce, YARN, Pig, Hive, HBase, Flume, Sqoop, Oozie, Spark, Kafka and ZooKeeper.
- Excellent hands-on knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Hands-on experience with real-time streaming into HDFS using Spark Streaming and Kafka (a minimal sketch follows this summary).
- Experience in handling large datasets using partitions, Spark in-memory capabilities and broadcast variables in Spark.
- Strong knowledge of Spark concepts such as RDD operations, caching and persistence.
- Developed analytical components using Spark SQL and Spark Streaming.
- Experience in analyzing data using Hive UDFs, Hive UDTFs and custom MapReduce programs in Java.
- Expert in working with the Hive data warehouse tool: creating tables and distributing data by implementing partitions and bucketing.
- Expertise in Hive functionality and in migrating data from databases such as Oracle, DB2, MySQL and MongoDB.
- Handled upgrades of Apache Ambari, CDH and HDP clusters.
- Experienced working with Amazon EMR, Cloudera (CDH3 & CDH4) and Hortonworks Hadoop distributions.
- Hands-on experience with AWS infrastructure services, including Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2).
- Experienced in writing and implementing unit test cases using testing frameworks such as JUnit, EasyMock and Mockito.
- Experienced with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Hands-on experience in application development, writing MapReduce jobs in Java and Python, Hive queries, RDBMS code and Linux shell scripts.
- Experienced in writing build scripts using Maven and working with continuous integration systems such as Jenkins.
- Well experienced with application servers such as WebLogic and WebSphere and with Java tools in client-server environments.
- Excellent knowledge of Python, Java, Collections, J2EE, Servlets, JSP, Spring, Hibernate and JDBC/ODBC.
- Hands-on experience with NoSQL databases such as HBase and Cassandra, and relational databases such as Oracle and MySQL.
- Deep analytical understanding of big data and algorithms using Hadoop, MapReduce, NoSQL and distributed computing tools.
- Experienced with data warehousing and ETL concepts using Informatica PowerCenter, OLAP, OLTP and AutoSys.
- Good understanding of XML methodologies (XML, XSL, XSD), including Web Services and SOAP.
- Experienced in designing both time-driven and data-driven automated workflows using Oozie to run Hadoop MapReduce and Pig jobs.
- Good experience in project impact assessment, project schedule planning, onsite-offshore team coordination and end-user coordination, from requirement gathering through live support.
- Successful in meeting new technical challenges and finding solutions to meet the needs of the customer.
- Successful working in fast-paced environments, both independently and in collaborative teams.
- Strong Business, Analytical and Communication Skills.
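The Spark and Kafka streaming experience above can be illustrated with a minimal PySpark Structured Streaming sketch that reads a Kafka topic and lands it in HDFS as Parquet. The broker address, topic name and HDFS paths below are hypothetical placeholders, and the spark-sql-kafka connector package is assumed to be available on the cluster.

```python
# Minimal sketch: Kafka topic -> HDFS (Parquet) with Spark Structured Streaming.
# Broker, topic and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("kafka-to-hdfs")
         .getOrCreate())

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
          .option("subscribe", "events")                        # placeholder topic
          .option("startingOffsets", "latest")
          .load()
          .select(col("key").cast("string"), col("value").cast("string")))

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/events")             # placeholder HDFS path
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```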
TECHNICAL SKILLS
Big Data Technologies: Hadoop, HDFS, Spark, Hive, Pig, HBase, Sqoop, Oozie, Flume, Kafka, ZooKeeper.
Real-Time/Stream Processing: Apache Storm, Apache Spark, Flume.
Distributed Message Broker: Apache Kafka
Databases/NoSQL: Oracle 9i/10g/11g/12c, MS SQL Server, MySQL, DB2, HBase, MongoDB, Cassandra.
Scripting Languages: JavaScript, Shell, Python.
Network & Protocols: TCP/IP, Telnet, HTTP, HTTPS, FTP, SNMP, LDAP, DNS.
Operating Systems: Linux, UNIX, macOS, Windows NT/98/2000/XP/Vista/7/8.
PROFESSIONAL EXPERIENCE
Confidential - Dallas, TX
Data Engineer
Responsibilities:
- Involved in requirement gathering and business analysis, and translated business requirements into technical designs on Hadoop and big data.
- Hands-on experience working with Hadoop ecosystem components such as Pig, Hive, Sqoop, Spark and Kafka.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming and Spark MLlib.
- Implemented an incremental load approach in Spark for very large tables (sketched after this list).
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Worked with different file formats such as JSON, CSV and XML using Spark SQL.
- Imported and exported data between relational databases and HDFS using Sqoop.
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS.
- Worked with both CDH4 and CDH5 applications; transferred large datasets back and forth between the development and production clusters.
- Created partitions and buckets for both managed and external Hive tables to optimize performance.
- Implemented Hive join queries to join multiple source-system tables and load the results into Elasticsearch tables.
- Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs in Hive queries (a PySpark-equivalent registration is also sketched below).
- Developed a data lake as a data management platform for Hadoop.
- Used Talend to run ETL processes instead of Hive queries.
- Moved data from Hadoop to Cassandra using the BulkOutputFormat class.
- Extracted data from a Teradata database and loaded it into the data warehouse using Spark JDBC.
- Handled data movement, transformation, analysis and visualization across the lake by integrating it with various tools.
- Experienced with code repositories such as GitHub.
- Good understanding of NoSQL databases and hands-on experience writing applications on HBase, Cassandra and MongoDB.
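A minimal sketch of the incremental load approach mentioned above, assuming a watermark column on the source table; the JDBC URL, table and column names are hypothetical placeholders, and the appropriate JDBC driver is assumed to be on the classpath.

```python
# Sketch of an incremental (delta) load in PySpark; names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

spark = (SparkSession.builder
         .appName("incremental-load")
         .enableHiveSupport()
         .getOrCreate())

# 1. Find the high-water mark already present in the target Hive table.
last_loaded = (spark.table("warehouse.orders")          # hypothetical target table
               .agg({"updated_at": "max"})
               .collect()[0][0])

# 2. Pull only rows newer than the watermark from the source over JDBC.
source = (spark.read
          .format("jdbc")
          .option("url", "jdbc:oracle:thin:@//source-host:1521/ORCL")  # placeholder URL
          .option("dbtable", "SALES.ORDERS")                           # placeholder table
          .option("fetchsize", "10000")
          .load())

delta = source.filter(col("UPDATED_AT") > lit(last_loaded)) if last_loaded else source

# 3. Append only the delta to the target table (insertInto matches columns by position).
delta.write.mode("append").insertInto("warehouse.orders")
```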
Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Python, Kafka, Hive, Sqoop, Amazon AWS, Elasticsearch, Impala, Cassandra, Tableau, Talend, Oozie, Jenkins, Cloudera, Oracle 12c, Linux
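The reusable UDF libraries mentioned above were Java-based Hive UDFs; purely as an illustration, and to keep all sketches in one language, the snippet below shows the PySpark-equivalent pattern of registering a Python function for use in SQL against a Hive table. The function, table and column names are hypothetical.

```python
# Sketch of exposing a reusable function to SQL queries (PySpark equivalent of a Hive UDF).
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("udf-demo")
         .enableHiveSupport()
         .getOrCreate())

def normalize_phone(raw):
    """Keep digits only, e.g. '(214) 555-0100' -> '2145550100'."""
    return "".join(ch for ch in raw if ch.isdigit()) if raw else None

# Register the Python function so it can be called from SQL, much like a Hive UDF.
spark.udf.register("normalize_phone", normalize_phone, StringType())

# Use the registered function against a (hypothetical) Hive table.
spark.sql("""
    SELECT customer_id, normalize_phone(phone) AS phone_digits
    FROM crm.customers
""").show(5)
```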
Confidential - Denver, CO
Data Engineer
Responsibilities:
- Responsible for Building Scalable Distributed Data solutions using Hadoop.
- Built real-time data processing pipelines with Kafka, Spark Streaming and Spark Structured Streaming; worked with Spark SQL, Structured Streaming, MLlib and the core Spark API to explore Spark features and build data pipelines, and implemented and fine-tuned Spark streaming applications to reduce shuffling (a tuning and broadcast-join sketch follows this list).
- Handled large datasets using partitions, Spark in-memory capabilities, broadcasts, efficient joins and other transformations during the ingestion process itself.
- Worked on performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism and appropriate memory settings.
- Worked on a cluster of 105 nodes.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Imported data from various sources, performed transformations using Hive and MapReduce, loaded the data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Used Hive, created Hive tables and loaded data from the local file system into HDFS.
- Used Hive for transformations, event joins and pre-aggregations before storing the data in HDFS.
- Implemented partitioning, dynamic partitions and buckets on huge datasets to analyze and compute various metrics for reporting.
- Involved in HBase setup and stored data in HBase for future analysis.
- Worked with Tableau and Spotfire and enabled JDBC/ODBC connectivity from those tools to Hive tables.
- Used Oozie workflows to coordinate Pig and Hive scripts.
- Used Stash (Bitbucket) for code control and worked with AWS components such as Airflow, Elastic MapReduce (EMR), Athena and Snowflake.
- Wrote Hive queries for ad hoc data analysis to meet business requirements.
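A minimal sketch of the tuning and broadcast-join pattern described above; the application name, memory and parallelism settings, table names and output path are hypothetical placeholders chosen for illustration, not recorded values from the project.

```python
# Sketch: Spark session tuning plus a broadcast join to avoid shuffling the large side.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("tuned-pipeline")
         .config("spark.sql.shuffle.partitions", "400")   # match cluster parallelism (placeholder)
         .config("spark.executor.memory", "8g")           # placeholder memory setting
         .config("spark.executor.cores", "4")
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .enableHiveSupport()
         .getOrCreate())

facts = spark.table("analytics.click_events")      # hypothetical large fact table
dims  = spark.table("analytics.device_lookup")     # hypothetical small dimension table

# Broadcasting the small side avoids a full shuffle of the large table.
enriched = facts.join(broadcast(dims), on="device_id", how="left")

# Cache only what is reused downstream, then persist the partitioned result.
enriched.cache()
enriched.write.mode("overwrite").partitionBy("event_date").parquet(
    "hdfs:///data/curated/click_events_enriched")
```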
Environment: HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Oozie, Flume, Kafka, ZooKeeper, Amazon AWS, Spark SQL, Spark DataFrames, PySpark, Python, Java, JSON, SQL scripting, Linux shell scripting, Avro, Parquet, Hortonworks.
Confidential, Herndon, VA
Data Engineer
Responsibilities:
- Worked on analyzing the Hadoop cluster and various big data analytics tools, including Pig, the HBase database and Sqoop.
- Configured Sqoop jobs to import data from RDBMS into HDFS using Oozie workflows.
- Experienced in job management using the Fair Scheduler and developed job-processing scripts using Oozie workflows.
- Involved in the installation and configuration of Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and processing.
- Loaded and transformed huge datasets of structured and semi-structured data using Hive.
- Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in HDFS.
- Created Hive tables and developed Hive queries for de-normalizing the data (a join sketch follows this list).
- Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Created batch analysis job prototypes using Hadoop, Pig, Oozie, Hue and Hive.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX systems, NoSQL stores and a variety of portfolios.
- Worked on root cause analysis for all issues that occurred in batch processing and provided permanent fixes.
- Involved in analyzing system failures, identifying root causes and recommending courses of action.
- Created and maintained technical documentation for all the workflows.
- Created database access layer using JDBC and SQL stored procedures.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
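A minimal sketch of the Hive de-normalization pattern referenced above; the role used Hive queries directly, and the query is expressed here through spark.sql only to keep all sketches in one language. Database, table and column names are hypothetical placeholders.

```python
# Sketch: flatten normalized staging tables into one wide reporting table.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("denormalize")
         .enableHiveSupport()
         .getOrCreate())

# Join the normalized staging tables into one de-normalized result.
denorm = spark.sql("""
    SELECT o.order_id,
           o.order_date,
           c.customer_name,
           c.region,
           p.product_name,
           o.quantity * o.unit_price AS order_value
    FROM   staging.orders    o
    JOIN   staging.customers c ON o.customer_id = c.customer_id
    JOIN   staging.products  p ON o.product_id  = p.product_id
""")

# Persist the wide table for reporting queries.
denorm.write.mode("overwrite").saveAsTable("reporting.orders_denorm")
```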
Environment: Hadoop YARN, Hive, Sqoop, Amazon AWS, Java, Python, Oozie, Jenkins, Cassandra, Oracle 12c, Linux.
Confidential - Lakewood, OH
Data Engineer
Responsibilities:
- Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
- Used Oozie to orchestrate the workflow.
- Involved in loading data from the Linux file system into HDFS.
- Analyzed data using the Hadoop components Hive and Pig.
- Responsible for importing data into HDFS using Sqoop from different RDBMS servers, and exporting aggregated data back to the RDBMS servers using Sqoop for other ETL operations.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics (a comparison-query sketch follows this list).
- Moved data from Oracle and MS SQL Server into HDFS using Sqoop and imported various formats of flat files into HDFS.
- Wrote HBase client programs in Java and web services.
- Mentored analysts and the test team in writing Hive queries.
- Implemented test scripts to support test-driven development and continuous integration.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Gained excellent hands-on knowledge of Hadoop clusters, MapReduce jobs and data migration concepts in Hive.
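A minimal sketch of the trend-comparison query pattern described above; the role used Hive queries directly, and the same kind of HiveQL is submitted here through spark.sql only to keep all sketches in one language. Table names, column names and the 20% deviation threshold are hypothetical placeholders.

```python
# Sketch: compare current-week metrics against EDW reference averages to flag trends.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("trend-check")
         .enableHiveSupport()
         .getOrCreate())

# Flag categories whose current-week sales deviate from the 52-week reference average.
trends = spark.sql("""
    SELECT f.product_category,
           f.weekly_sales,
           r.avg_weekly_sales_52w,
           (f.weekly_sales - r.avg_weekly_sales_52w) / r.avg_weekly_sales_52w AS pct_change
    FROM   staging.sales_current_week  f
    JOIN   edw.sales_weekly_reference  r
      ON   f.product_category = r.product_category
    WHERE  ABS(f.weekly_sales - r.avg_weekly_sales_52w) > 0.2 * r.avg_weekly_sales_52w
""")

trends.show(20)
```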
Environment: Hadoop, MapReduce, HDFS, Sqoop, Hive, Java, Cloudera, Pig, HBase, Linux, XML, MySQL Workbench, Eclipse, Oracle 10g, PL/SQL, SQL*Plus.