Big Data Engineer Resume
NJ
SUMMARY:
- Over 8 years of experience in Big Data technologies related to banking and financial services
- Significant expertise in implementing Big Data ecosystem components such as HDFS, Hive, Sqoop, Spark, Spark Core, Spark Streaming, Spark SQL, ZooKeeper, Flume, Kafka, and Oozie
- Created end-to-end data pipelines using Big Data tools
- Experience productionizing Big Data applications
- Experience creating data lakes in consultation with data warehousing teams and defining the data layouts
- Implemented partitioning and bucketing in Hive to optimize performance and developed HiveQL aggregations based on requirements (see the sketch after this summary)
- Experience importing and exporting data across databases and storage systems such as MySQL, Oracle, Teradata, and S3
- Experience with NoSQL databases such as MongoDB, HBase, and Cassandra
- Hands-on experience with message brokers such as Apache Kafka
- Extensive experience with SQL, PL/SQL, and database concepts
- Worked extensively with Spark features such as RDD transformations, DataFrames, Spark MLlib, Spark SQL, and the Streaming API
- Strong experience writing Python applications using libraries such as Pandas and NumPy
- Wrote SerDe regular expressions to read unstructured data from various sources
- Experience onboarding new tools and technologies by carrying out proofs of concept and defining evaluation metrics
- Worked on multiple stages of the Software Development Life Cycle, including development, component integration, performance testing, deployment, and support/maintenance
- Experience working with Cloudera and Hortonworks Hadoop distributions
- Strong communication skills, a professional attitude, and the ability to perform well under pressure with enthusiasm and commitment
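The following is a minimal, illustrative PySpark sketch of the Hive partitioning and HiveQL aggregation work summarized above; the table and column names (transactions, account_id, txn_date, amount) are assumptions, not taken from any actual project.

```python
# Minimal PySpark sketch: a partitioned Hive table plus a HiveQL-style aggregation.
# Table and column names are illustrative only.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Partition by date so queries can prune partitions instead of scanning the full table.
# Bucketing would be declared with CLUSTERED BY (account_id) INTO n BUCKETS in the Hive
# DDL; it is omitted here because older Spark versions cannot create bucketed Hive tables.
spark.sql("""
    CREATE TABLE IF NOT EXISTS transactions (
        account_id STRING,
        amount     DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    STORED AS ORC
""")

# Example HiveQL aggregation restricted to a single partition
daily_totals = spark.sql("""
    SELECT account_id, SUM(amount) AS total_amount
    FROM transactions
    WHERE txn_date = '2024-01-01'
    GROUP BY account_id
""")
daily_totals.show()
```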
TECHNICAL SKILLS:
Hadoop Core Services: HDFS, Spark, YARN
Hadoop Distribution: CDH 3 and 4, Hortonworks
Hadoop Data Services: Hive, Pig, Sqoop, Spark, Kafka
Hadoop Services: Zookeeper, Oozie
Programming Languages: Python, Scala, SQL, Shell Scripting
Python Libraries: Pandas, NumPy, Matplotlib, Plotly, Seaborn
Operating Systems: Windows, Linux, Unix, CentOS 5/6
IDE Tools: Eclipse, IntelliJ, NetBeans
Databases: MySQL, HBase, MongoDB, Oracle
Others: Git, PuTTY, Tableau
PROFESSIONAL EXPERIENCE:
Confidential, NJ
Big Data Engineer
Roles & Responsibilities:
- Involved in the complete Big Data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing that data
- Created partitioned and bucketed Hive tables in ORC file format with Zlib/Snappy compression, sourced from Avro tables
- Involved in performance tuning of Hive from design, storage, and query perspectives
- Developed Sqoop jobs to import data in Avro file format from an Oracle database and created Hive tables on top of it
- Implemented Spark RDDs and DataFrames and performed transformations based on requirements
- Wrote Spark SQL queries using PySpark
- Performed real-time analysis of incoming data using the Kafka consumer API, Kafka topics, and Spark Streaming in Scala (see the sketch after this role)
- Developed Python scripts to collect data from source systems and store it on HDFS for analytics
- Analyzed existing SQL scripts and designed solutions to implement them in Python
- Implemented Oozie operational services for batch processing and dynamic workflow scheduling
- Involved in designing and developing HBase tables and storing data in them
- Experienced in troubleshooting errors in the HBase shell/API and Hive
- Worked on the Hortonworks Data Platform (HDP 2.4) Hadoop distribution, using Hive to store, query, and retrieve data
Environment: HDFS, Hive, Sqoop, Spark, Spark Streaming, Spark SQL, HBase, PySpark, Kafka, Scala, Python, Oracle
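A minimal sketch of the Kafka-to-Spark streaming pattern described in this role. The original work used Scala with the Kafka consumer API; this illustration uses PySpark Structured Streaming instead, and the broker, topic, and HDFS paths are assumptions.

```python
# Minimal PySpark sketch of consuming a Kafka topic and landing the data on HDFS
# for downstream Hive/Spark analysis. Broker, topic, and paths are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

# Subscribe to a Kafka topic; requires the spark-sql-kafka connector on the classpath
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "trade-events")
          .load()
          .selectExpr("CAST(value AS STRING) AS payload"))

# Persist the raw payloads to HDFS as Parquet, with checkpointing for fault tolerance
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/trade_events")
         .option("checkpointLocation", "hdfs:///checkpoints/trade_events")
         .start())

query.awaitTermination()
```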
Confidential, NY
Big Data Engineer
Roles & Responsibilities:
- Imported and exported data between HDFS/Hive and relational databases using Sqoop
- Processed unstructured data using Hive
- Implemented partitioning, dynamic partitions, and bucketing in Hive
- Developed Hive queries and Spark SQL queries to analyze large datasets
- Exported result sets from Hive to MySQL using Sqoop (see the sketch after this role)
- Created and maintained technical documentation for launching Hadoop clusters and executing Hive queries
- Worked on debugging and performance tuning of Hive queries
- Gained experience in managing and reviewing Hadoop log files
- Scheduled multiple Hive jobs with the Oozie workflow engine
- Used HBase as a NoSQL database
- Actively involved in code reviews and bug fixing to improve performance
Environment: Hadoop, HDFS, Hive, Sqoop, Spark, Flume, Linux, HBase, Oozie
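A minimal sketch of the Hive-to-MySQL export step described in this role, wrapping the Sqoop CLI in a Python script; the JDBC URL, credentials file, table name, and warehouse path are assumptions.

```python
# Minimal sketch of exporting a Hive result set to MySQL with Sqoop, driven from
# Python for scripting/scheduling. All connection details and paths are illustrative.
import subprocess

sqoop_export = [
    "sqoop", "export",
    "--connect", "jdbc:mysql://db-host:3306/reporting",
    "--username", "report_user",
    "--password-file", "/user/etl/.mysql_password",
    "--table", "daily_totals",
    # Directory holding the Hive result set (default Hive field delimiter \001)
    "--export-dir", "/user/hive/warehouse/daily_totals",
    "--input-fields-terminated-by", "\\001",
]

subprocess.run(sqoop_export, check=True)
```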
Confidential
Hadoop Developer
Roles & Responsibilities:
- Implemented proofs of concept on the Hadoop stack and various Big Data analytics tools, including migration from different databases to Hadoop
- Developed Sqoop scripts to move data between Hive and a MySQL database
- Responsible for analyzing and cleansing raw data with Hive queries
- Used analytical tools including Hive and Spark on the Cloudera distribution
- Created Hive tables to store processed results in a tabular format
- Participated in development/implementation of Cloudera Hadoop environment
- Implemented optimization techniques like partitions and bucketing to provide better performance with HiveQL queries
- Created custom user-defined functions in Hive using Python (see the sketch after this role)
- Involved in moving log files generated from various sources to HDFS for further processing through Flume
- Designed the technical solution for real-time analytics using Kafka and Spark
- Designed and modified MongoDB collections and used MongoDB queries to insert and fetch data
- Used PySpark for streaming, interactive queries, and iterative algorithms
- Used Oozie operational services for batch processing and scheduling workflows dynamically
Environment: Hadoop, Spark, YARN, Flume, Hive, Sqoop, Oozie, MongoDB, HDFS, ZooKeeper, Oracle, MySQL
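A minimal sketch of a Python-based Hive UDF of the kind described in this role, implemented through Hive's TRANSFORM/streaming interface; the script name, column layout, and business rule are assumptions.

```python
# Minimal sketch of a Hive "UDF" written in Python and invoked through Hive's
# TRANSFORM/streaming interface. Columns (user_id, raw_status) are illustrative.
#
# Registered and called from HiveQL roughly as:
#   ADD FILE normalize_status.py;
#   SELECT TRANSFORM (user_id, raw_status)
#          USING 'python normalize_status.py'
#          AS (user_id, status)
#   FROM user_events;
import sys

for line in sys.stdin:
    # Hive streams each row as tab-separated text on stdin
    user_id, raw_status = line.rstrip("\n").split("\t")
    # Example business rule: normalize free-form status values
    status = raw_status.strip().upper() or "UNKNOWN"
    # Emit the transformed row back to Hive as tab-separated text
    print(f"{user_id}\t{status}")
```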
Confidential
Hadoop Developer
Roles & Responsibilities:
- Imported and exported data between HDFS and relational databases using Sqoop
- Created partitions and buckets by state in Hive to handle structured data
- Implemented dashboards that run HiveQL queries internally, including aggregation functions, basic Hive operations, and different kinds of join operations
- Implemented state-based business logic in Hive using generic UDFs
- Involved in moving log files generated from various sources to HDFS for processing through Kafka and Flume
- Created Hive tables to store the processed results in a tabular format
- Developed Hive UDFs and reused them for other requirements
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala; good experience using spark-shell and Spark Streaming (see the sketch after this role)
- Extracted data from MongoDB through Sqoop, placed it in HDFS, and processed it
Environment: CDH 3, HDFS, Hive, Spark, Spark Streaming, Kafka, Flume, Sqoop, MongoDB, MySQL
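A minimal sketch of converting a HiveQL query into Spark transformations as described in this role. The original work used Scala RDDs; this illustration uses the PySpark DataFrame API, and the table and column names (orders, state, amount) are assumptions.

```python
# Minimal PySpark sketch of rewriting a HiveQL aggregation as Spark transformations.
# Table and column names are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-sketch")
         .enableHiveSupport()
         .getOrCreate())

# HiveQL version:
#   SELECT state, COUNT(*) AS order_count, SUM(amount) AS revenue
#   FROM orders GROUP BY state;
orders = spark.table("orders")

# Equivalent Spark transformations
per_state = (orders
             .groupBy("state")
             .agg(F.count("*").alias("order_count"),
                  F.sum("amount").alias("revenue")))

per_state.show()
```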
Confidential
SQL Developer
Roles & Responsibilities:
- Designed, created, and maintained database objects such as tables, views, stored procedures, and user-defined functions in SQL Server
- Applied business rules to perform extensive data scrubbing to maintain data quality and consistency
- Created tables, views, and indexes based on requirements
- Created SQL reports and data extraction and data loading scripts for different databases and schemas
- Designed databases, developed business intelligence analyses and design specifications, and implemented reporting with Microsoft SQL Server
- Identified relationships between tables and enforced referential integrity using foreign key constraints (see the sketch at the end of this role)
- Worked in production support as well as QA/test environments on projects, work orders, maintenance requests, bug fixes, enhancements, data changes, etc.
- Wrote packages to fetch complex data from different tables in remote databases using joins, subqueries, and database links
- Developed and supported analysis solutions, data transformations, and reports.
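A minimal sketch of the referential-integrity work described in this role: creating a child table with a foreign key back to a parent table. It is driven from Python via pyodbc purely for illustration; the ODBC driver, server, database, and table names are assumptions.

```python
# Minimal sketch: create a child table whose foreign key enforces referential
# integrity against a parent table in SQL Server. Connection details and table
# names are illustrative assumptions.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=db-server;DATABASE=SalesDB;Trusted_Connection=yes;"
)
cursor = conn.cursor()

cursor.execute("""
    CREATE TABLE dbo.OrderItems (
        OrderItemID INT IDENTITY(1,1) PRIMARY KEY,
        OrderID     INT NOT NULL,
        Quantity    INT NOT NULL,
        CONSTRAINT FK_OrderItems_Orders
            FOREIGN KEY (OrderID) REFERENCES dbo.Orders (OrderID)
    )
""")
conn.commit()
```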