Big Data Engineer Resume NJ - Hire IT People

SUMMARY

Over 8 years of experience in Big Data technologies related to banking and financial services
Significant expertise in implementing Big Data Ecosystem components like HDFS, Hive, Sqoop, Spark, Spark Core, Spark Streaming, Spark SQL, Zookeeper, Flume, Kafka, Oozie
Created end-to-end data pipelines using Big Data tools
Productionizing Big Data Applications
Experience in creating data lakes in consultation with Data Warehousing teams and defining the data layouts
Implemented partitioning and bucketing in Hive to optimize performance and developed HiveQL aggregations based on requirements
Experience in importing and exporting data from different sources such as MySQL, Oracle, Teradata and S3
Experience in NoSQL databases like MongoDB, HBase, Cassandra
Hands-on experience with message brokers such as Apache Kafka
Extensive experience with SQL, PL/SQL and database concepts
Worked extensively with Spark components such as RDD transformations, DataFrames, Spark MLlib, Spark SQL and the Streaming API
Strong experience in writing Python applications using libraries such as Pandas and NumPy
Involved in writing SerDe regular expressions to read unstructured data from various sources
Experience with onboarding new tools and technologies by carrying out proofs of concept and defining metrics for their evaluation
Worked on multiple stages of Software Development Life Cycle including Development, Component Integration, Performance Testing, Deployment and Support Maintenance
Experience in working with Cloudera and Hortonworks Hadoop distributions
Strong communication skills and a professional attitude; able to work under pressure and support the team with enthusiasm and full commitment

TECHNICAL SKILLS

Hadoop Core Services: HDFS, Spark, YARN

Hadoop Distribution: CDH 3 and 4, Hortonworks

Hadoop Data Services: Hive, Pig, Sqoop, Spark, Kafka

Hadoop Services: Zookeeper, Oozie

Programming Languages: Python, Scala, SQL, Shell Scripting

Python: Pandas, NumPy, Matplotlib, Plotly, Seaborn

Operating Systems: Windows, Linux, Unix, CentOS 5 and 6

IDE Tools: Eclipse, IntelliJ, NetBeans

Databases: MySQL, HBase, MongoDB, Oracle

Others: Git, PuTTY, Tableau

PROFESSIONAL EXPERIENCE

Confidential, NJ

Big Data Engineer

Responsibilities:

Involved in the complete Big Data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS
Created partitioned and bucketed Hive tables in ORC file format with Zlib/Snappy compression from Avro tables
Involved in performance tuning of Hive from design, storage and query perspectives
Developed Sqoop jobs to import data in Avro file format from an Oracle database and created Hive tables on top of it
Implemented Spark RDDs and DataFrames and performed transformations based on requirements
Involved in writing queries in Spark SQL using PySpark
Performed real-time analysis of the incoming data using the Kafka consumer API, Kafka topics and Spark Streaming in Scala
Developed Python scripts to collect data from source systems and store it on HDFS to run analytics
Analyzed the SQL scripts and designed the solution to implement them using Python.
Implemented Oozie Operational Services for batch processing and scheduling workflows dynamically
Involved in designing and developing tables in HBase and storing data
Experienced in troubleshooting errors in HBase Shell/API, Hive
Worked on Hortonworks Data Platform (HDP 2.4) Hadoop distribution for data querying using Hive to store and retrieve data

Environment: HDFS, Hive, Sqoop, Spark, Spark Streaming, Spark SQL, HBase, PySpark, Kafka, Scala, Python, Oracle

Confidential, NY

Big Data Engineer

Responsibilities:

Importing and exporting data into HDFS and Hive using Sqoop
Processed unstructured data using Hive
Implemented Partitioning, Dynamic Partitions, Buckets in Hive
Developed Hive queries and Spark SQL queries to analyze large datasets
Exported the result set from Hive to MySQL using Sqoop
Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries
Worked on debugging and performance tuning of Hive
Gained experience in managing and reviewing Hadoop log files
Involved in scheduling Oozie workflow engine to run multiple Hive jobs
Used HBase as a NoSQL database
Actively involved in code review and bug fixing to improve performance

Environment: Hadoop, HDFS, Hive, Sqoop, Spark, Flume, Linux, HBase, Oozie

Confidential

Hadoop Developer

Responsibilities:

Implemented proofs of concept on the Hadoop stack and different big data analytics tools, including migration from different databases to Hadoop.
Developed Sqoop scripts to move data between Hive and the MySQL database
Responsible for analyzing and cleansing raw data by performing Hive queries
Used analytical tools including Hive and Spark with the Cloudera distribution
Experience in creating Hive tables to store the processed results in a tabular format
Participated in development/implementation of Cloudera Hadoop environment
Implemented optimization techniques like partitions and bucketing to provide better performance with HiveQL queries
Created custom user defined functions in Hive using Python
Involved in moving all log files generated from various sources to HDFS for further processing through Flume
Designed technical solution for real-time analytics using Kafka and Spark
Designed and modified database tables and used MongoDB queries to insert and fetch data
Used PySpark for streaming, interactive queries, and iterative algorithms
Used Oozie operational services for batch processing and scheduling workflows dynamically

Environment: Hadoop, Spark, YARN, Flume, Hive, Sqoop, Oozie, MongoDB, HDFS, Zookeeper, Oracle, MySQL

Confidential

Hadoop Developer

Responsibilities:

Importing and exporting data into HDFS from Relational databases using Sqoop
Created partitions and buckets by state in Hive to handle structured data.
Implemented dashboards that run HiveQL queries internally, including aggregation functions, basic Hive operations, and different kinds of join operations.
Implemented business logic based on state in Hive using generic UDFs.
Involved in moving all log files generated from various sources to HDFS for processing through Kafka, Flume.
Created Hive tables to store the processed results in a tabular format.
Developed Hive UDFs and reused them for other requirements.
Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala; gained good experience with Spark-Shell and Spark Streaming.
Extracted data from MongoDB through Sqoop, placed it in HDFS and processed it

Environment: CDH 3, HDFS, Hive, Spark, Spark Streaming, Kafka, Flume, Sqoop, MongoDB, MySQL.

Confidential

SQL Developer

Responsibilities:

Designed, created and maintained database objects like tables, views, stored procedures and user-defined functions using SQL Server
Applied business rules to perform extensive data scrubbing to maintain data quality and consistency.
Created tables, views and indexes based on the requirements
Created SQL reports, data extraction and data loading scripts for different databases and schemas
Designed databases, developed business intelligence analyses and design specifications, and implemented reporting with Microsoft SQL Server
Identified relationships between tables, enforced referential integrity using foreign key constraints.
Worked in Production Support Environment as well as QA/TEST environments for projects, work orders, maintenance requests, bug fixes, enhancements, data changes, etc
Wrote packages to fetch complex data from different tables in remote databases using joins, sub queries and database links
Developed and supported analysis solutions, data transformations, and reports.
