Big Data Engineer Resume

NJ

SUMMARY

  • Over 8 years of experience in Big Data technologies for banking and financial services
  • Significant expertise in implementing Big Data Ecosystem components like HDFS, Hive, Sqoop, Spark, Spark Core, Spark Streaming, Spark SQL, Zookeeper, Flume, Kafka, Oozie
  • Created end-to-end data pipelines using Big Data tools
  • Productionized Big Data applications
  • Experience in creating data lakes in consultation with Data Warehousing teams and defining the data layouts
  • Implemented partitioning and bucketing in Hive to optimize performance and developed HiveQL aggregations based on requirements
  • Experience in importing and exporting data from different sources like MySQL, Oracle, Teradata and S3
  • Experience in NOSQL databases like MongoDB, HBase, Cassandra
  • Hands-on experience with message brokers such as Apache Kafka
  • Extensive experience with SQL, PL/SQL and database concepts
  • Worked extensively with Spark tools like RDD transformations, DataFrames, Spark MLlib, Spark SQL and Streaming API
  • Strong experience in writing Python applications using libraries like Pandas and NumPy
  • Involved in writing SerDe regular expressions to read unstructured data from various sources
  • Experience with onboarding new tools or technologies by carrying out proofs of concept and defining metrics for their evaluation
  • Worked on multiple stages of Software Development Life Cycle including Development, Component Integration, Performance Testing, Deployment and Support Maintenance
  • Experience in working with Cloudera and Hortonworks Hadoop distributions
  • Strong communication skills and a professional attitude; able to work under pressure with enthusiasm and full commitment

TECHNICAL SKILLS

Hadoop Core Services: HDFS, Spark, YARN

Hadoop Distribution: CDH 3 and 4, Hortonworks

Hadoop Data Services: Hive, Pig, Sqoop, Spark, Kafka

Hadoop Services: Zookeeper, Oozie

Programming Languages: Python, Scala, SQL, Shell Scripting

Python: Pandas, NumPy, Matplotlib, Plotly, Seaborn

Operating Systems: Windows, Linux, Unix, CentOS 5, 6

IDE Tools: Eclipse, IntelliJ, NetBeans

Databases: MySQL, HBase, MongoDB, Oracle

Others: Git, PuTTY, Tableau

PROFESSIONAL EXPERIENCE

Confidential, NJ

Big Data Engineer

Responsibilities:

  • Involved in the complete Big Data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS
  • Created partitioned and bucketed Hive tables in ORC file format with Zlib/Snappy compression from Avro tables
  • Involved in performance tuning of Hive from design, storage and query perspectives
  • Developed Sqoop jobs to import data in Avro file format from an Oracle database and created Hive tables on top of it
  • Implemented Spark RDDs and DataFrames and performed transformations based on requirements
  • Involved in writing queries in Spark SQL using PySpark
  • Performed real-time analysis of the incoming data using the Kafka consumer API, Kafka topics, and Spark Streaming with Scala
  • Developed Python scripts to collect data from source systems and store it on HDFS to run analytics
  • Analyzed the SQL scripts and designed the solution to implement them using Python
  • Implemented Oozie Operational Services for batch processing and scheduling workflows dynamically
  • Involved in designing and developing tables in HBase and storing data
  • Experienced in troubleshooting errors in HBase Shell/API, Hive
  • Worked on Hortonworks Data Platform (HDP 2.4) Hadoop distribution for data querying using Hive to store and retrieve data

Environment: HDFS, Hive, Sqoop, Spark, Spark Streaming, Spark SQL, HBase, PySpark, Kafka, Scala, Python, Oracle

Confidential, NY

Big Data Engineer

Responsibilities:

  • Imported and exported data to and from HDFS and Hive using Sqoop
  • Processed unstructured data using Hive
  • Implemented Partitioning, Dynamic Partitions, Buckets in Hive
  • Developed Hive queries and Spark SQL queries to analyze large datasets
  • Exported the result set from Hive to MySQL using Sqoop
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries
  • Worked on debugging, performance tuning of Hive
  • Gained experience in managing and reviewing Hadoop log files
  • Involved in scheduling Oozie workflow engine to run multiple Hive jobs
  • Used HBase as a NoSQL database
  • Actively involved in code review and bug fixing to improve performance

Environment: Hadoop, HDFS, Hive, Sqoop, Spark, Flume, Linux, HBase, Oozie

Confidential

Hadoop Developer

Responsibilities:

  • Implemented proofs of concept on the Hadoop stack and different big data analytics tools, including migration from different databases to Hadoop
  • Developed Sqoop scripts to transfer data between Hive and the MySQL database
  • Responsible for analyzing and cleansing raw data by performing Hive queries
  • Used analytical tools including Hive and Spark with the Cloudera distribution
  • Created Hive tables to store the processed results in a tabular format
  • Participated in development/implementation of Cloudera Hadoop environment
  • Implemented optimization techniques like partitioning and bucketing to provide better performance with HiveQL queries
  • Created custom user defined functions in Hive using Python
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume
  • Designed technical solution for real-time analytics using Kafka and Spark
  • Designed and Modified database tables and used MongoDB queries to insert and fetch data from tables
  • Used PySpark for streaming, interactive queries, and iterative algorithms
  • Used Oozie operational services for batch processing and scheduling workflows dynamically

Environment: Hadoop, Spark, YARN, Flume, Hive, Sqoop, Oozie, MongoDB, HDFS, Zookeeper, Oracle, MySQL

Confidential

Hadoop Developer

Responsibilities:

  • Imported and exported data between relational databases and HDFS using Sqoop
  • Created partitions and buckets by state in Hive to handle structured data.
  • Implemented dashboards that internally run HiveQL queries, including aggregation functions, basic Hive operations, and different kinds of join operations.
  • Implemented business logic based on state in Hive using Generic UDFs.
  • Involved in moving all log files generated from various sources to HDFS for processing through Kafka, Flume.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Hive UDFs and reused them for other requirements.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala; gained good experience with Spark-Shell and Spark Streaming.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them

Environment: CDH 3, HDFS, Hive, Spark, Spark Streaming, Kafka, Flume, Sqoop, MongoDB, MySQL.

Confidential

SQL Developer

Responsibilities:

  • Designed, created and maintained database objects like tables, views, stored procedures, and user-defined functions using SQL Server
  • Applied business rules to perform extensive data scrubbing to maintain data quality and consistency.
  • Created tables, views, and indexes based on the requirements
  • Created SQL reports, data extraction and data loading scripts for different databases and schemas
  • Designed databases and developed business intelligence analysis, design specifications, implementation, and reporting with Microsoft SQL Server
  • Identified relationships between tables, enforced referential integrity using foreign key constraints.
  • Worked in Production Support Environment as well as QA/TEST environments for projects, work orders, maintenance requests, bug fixes, enhancements, data changes, etc
  • Wrote packages to fetch complex data from different tables in remote databases using joins, sub queries and database links
  • Developed and supported analysis solutions, data transformations, and reports.
