Big Data Engineer Resume
Franklin Lakes, NJ
SUMMARY:
- 14+ years of professional ITES and IT experience, including 4+ years in the Hadoop/Big Data ecosystem.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
- Hands-on experience in installing and deploying Hadoop ecosystem components such as MapReduce, YARN, HDFS, NoSQL (HBase), Oozie, Hive, Tableau, Sqoop, ZooKeeper, and Flume.
- Working knowledge of the PySpark APIs.
- Strong experience with Hadoop distributions such as Cloudera and Hortonworks.
- Hands-on experience developing Hadoop architecture within projects on Windows and Linux platforms.
- Good technical skills in Oracle 11i, SQL Server, and ETL development using Informatica.
- Expert in importing and exporting data between Oracle/MySQL databases and HDFS using Sqoop and Flume.
- Experience in ingesting streaming data into Hadoop clusters using Flume and Kafka.
- Performed data analytics using Pig and Hive for data architects and data scientists on the team.
- Experience with NoSQL databases such as HBase and Cassandra, as well as other ecosystem tools such as ZooKeeper, Oozie, and Storm.
- Experience in job scheduling using automation tools such as Control-M, JAMS, and Autosys.
- Developed stored procedures and queries using PL/SQL.
- Expertise in RDBMSs such as Oracle, MS SQL Server, Teradata, MySQL, and DB2.
- Strong analytical skills with the ability to quickly understand a client's business needs; involved in meetings to gather information and requirements from clients; led teams and coordinated onsite and offshore work.
SKILLS & ABILITIES:
Analytical Tools: SQL, Jupyter Notebook, Tableau
NoSQL: Cassandra, HBase, MongoDB
Hadoop Distributions: Cloudera, Hortonworks
Workload Automation Tools: Control-M, JAMS, Autosys
Big Data: Spark, PySpark, Scala, Hive, Sqoop, HBase, Hadoop, HDFS, Flume, Shell scripting
Databases: Oracle 11g/10g, DB2 8.1, MS SQL Server, MySQL
Operating Systems: Unix / Linux, Windows 2000/NT/XP
PROFESSIONAL EXPERIENCE:
Confidential, Franklin Lakes, NJ
Big Data Engineer
Responsibilities:
- Involved in requirement gathering and business analysis, and translated business requirements into technical designs for Hadoop and Big Data.
- Created Hive partitioned and bucketed tables to improve performance.
- Involved in capacity planning and configuration of the Cassandra cluster on DataStax.
- Designed, developed, and implemented performant ETL pipelines using the Python API of Apache Spark (PySpark).
- Wrote reusable, testable, and efficient code; used Sqoop to import data into Cassandra tables from different relational databases.
- Imported and exported data between relational databases and HDFS using Sqoop.
- Worked extensively on the Spark Core and Spark SQL modules using Python and Scala.
- Worked closely with the Business Intelligence team on data analysis, reporting dashboards, and solution recommendations.
- Used Spark Streaming to receive real-time data from Kafka and store the streams in HDFS and databases such as HBase, using Python (PySpark) and Scala (see the sketch after this job entry).
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Performed performance tuning of PySpark scripts.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Responsible for exporting analyzed data to relational databases using Sqoop.
- Created tables in Hive and integrated data between Hive and Spark.
- Responsible for tuning Hive to improve performance.
- Documented the technical details of Hadoop cluster management and the daily batch pipeline, which includes several Hive, Sqoop, Oozie, and other scripted jobs.
Environment: Cassandra, HDFS, HBase, Spark, PySpark, Hortonworks, Hive, Oozie, YARN, and Sqoop
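Illustrative sketch of the Kafka-to-HDFS streaming path referenced above: a minimal PySpark Structured Streaming job that reads a Kafka topic and persists the records to HDFS. The broker, topic, and path names are hypothetical placeholders, not details from the actual project.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Requires the Spark-Kafka connector package (spark-sql-kafka) on the classpath.
spark = (SparkSession.builder
         .appName("kafka-to-hdfs-sketch")
         .getOrCreate())

# Subscribe to a Kafka topic as a streaming DataFrame (broker and topic are placeholders).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "claims-events")
          .load()
          .select(col("value").cast("string").alias("payload")))

# Persist the stream to HDFS as Parquet, with checkpointing for fault tolerance.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streaming/claims")
         .option("checkpointLocation", "hdfs:///checkpoints/claims")
         .start())

query.awaitTermination()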
Confidential, El Segundo, CA
Big Data Developer
Responsibilities:
- Defined the metrics for Big Data analytics proofs of concept
- Defined the requirements for data lakes and pipelines
- Created tables in Hive and integrated data between Hive and Spark
- Developed Python scripts to collect data from source systems and store it on HDFS to run analytics
- Created Hive partitioned and bucketed tables to improve performance (see the sketch after this job entry)
- Created Hive tables with user-defined functions
- Involved in code reviews and bug fixing to improve performance
- Worked extensively on the Spark Core and Spark SQL modules using Scala
- Performed extensive studies of different technologies and captured metrics by running different algorithms
- Converted SAS algorithms to other technologies
- Designed and implemented data ingestion techniques for real-time data coming from various source systems
- Defined data layouts and rules in consultation with ETL teams
- Worked in an Agile environment and participated in daily stand-ups/Scrum meetings
Environment: Python, HDFS, MapReduce, PL/SQL, Hive, Spark, Agile, Spark SQL, Scala, Sqoop, Oracle, Quality Center, Windows.
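Illustrative sketch of the partitioned and bucketed tables referenced above, using the PySpark DataFrame writer; the table name, columns, sample rows, and bucket count are hypothetical placeholders rather than details from the actual project.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Placeholder data; the real pipelines collected data from source systems onto HDFS.
sales = spark.createDataFrame(
    [(101, 250.00, "2017-03-01", "NJ"), (102, 80.50, "2017-03-02", "CA")],
    ["customer_id", "amount", "sale_date", "state"],
)

# Partitioning by state prunes scans; bucketing by customer_id supports
# bucket-based joins on that key.
(sales.write
      .partitionBy("state")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("sales_by_state"))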
Confidential
Big Data Consultant
Responsibilities:
- Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Hive, HBase, and Sqoop.
- Coordinated with business customers to gather business requirements and interacted with technical peers to derive technical requirements.
- Extensively involved in the design phase and delivered design documents.
- Involved in testing and coordinated user testing with the business.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Imported and exported data into HDFS and Hive using Sqoop.
- Transformed data using Spark applications for analytics consumption.
- Wrote Hive jobs to parse logs and structure them in a tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Experienced in defining job flows.
- Experienced in developing ETL data pipelines using PySpark (see the sketch after this job entry).
- Used Hive to analyze the partitioned data and compute various metrics for reporting.
- Experienced in managing and reviewing the Hadoop log files.
- Used an ETL tool to perform transformations, event joins, and some pre-aggregations.
- Loaded and transformed large sets of structured and semi-structured data.
- Responsible for managing data coming from different sources.
- Created the data model for Hive tables.
- Involved in unit testing and delivered unit test plans and results documents.
- Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization.
- Worked on Oozie workflow engine for job scheduling.
Environment: Hadoop, HDFS, Spark, Cloudera Manager, Hive, YARN, Sqoop, HBase, Oozie, Flume.
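Illustrative sketch of the log-parsing ETL referenced above: a minimal PySpark job that extracts structured columns from raw log lines and saves them as a Hive table for querying. The input path, log pattern, and table name are hypothetical placeholders, not details from the actual project.

from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_extract

spark = (SparkSession.builder
         .appName("log-etl-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Raw access-log lines landed on HDFS (path and pattern are placeholders).
raw = spark.read.text("hdfs:///landing/app_logs/*.log")

pattern = r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3})'

# Pull structured columns out of each line so the data can be queried effectively.
logs = raw.select(
    regexp_extract("value", pattern, 1).alias("host"),
    regexp_extract("value", pattern, 2).alias("timestamp"),
    regexp_extract("value", pattern, 3).alias("method"),
    regexp_extract("value", pattern, 4).alias("path"),
    regexp_extract("value", pattern, 5).cast("int").alias("status"),
)

# Persist as a Hive table for downstream reporting queries.
logs.write.mode("overwrite").saveAsTable("access_logs")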
Confidential
Data Processing Analyst
Responsibilities:
- Processed applications received from across India.
- Downloaded the scanned images from the data source.
- Created a database by sequencing the images in the order in which they were to be processed.
- Digitized the data by keying it into the SQL database.
- Converted the Excel data into tables to make it available to users.
- Arranged the tables according to user requirements using the Oracle database.
Environment: SQL database, PL/SQL, MS SQL.
Confidential, Harrisburg, PA
Process Associate
Responsibilities:
- Received claim forms as images in Lotus Notes.
- Performed batch processing by typing the numbers and information from the images into a template.
- Performed vertexing (verifying text) by reviewing the claim forms and keying the information into the database.
- Separated multi-page images into batches and then joined them.
- Processed medical claim forms.
- Reviewed database entries completed by other vertexers.
Environment: IBM Lotus Notes