Big Data Engineer Resume
NJ
SUMMARY:
- Over 8 years of experience in Big Data technologies related to banking and financial services
- Significant expertise in implementing Big Data ecosystem components such as HDFS, Hive, Sqoop, Spark, Spark Core, Spark Streaming, Spark SQL, ZooKeeper, Flume, Kafka, and Oozie
- Created end-to-end data pipelines using Big Data tools
- Experience productionizing Big Data applications
- Experience creating data lakes in consultation with data warehousing teams and defining the data layouts
- Implemented partitioning and bucketing in Hive to optimize performance and developed HiveQL aggregations based on requirements (see the sketch after this summary)
- Experience importing and exporting data across databases and storage systems such as MySQL, Oracle, Teradata, and S3
- Experience with NoSQL databases such as MongoDB, HBase, and Cassandra
- Hands-on experience with message brokers such as Apache Kafka
- Extensive experience with SQL, PL/SQL, and database concepts
- Worked extensively with Spark features such as RDD transformations, DataFrames, Spark MLlib, Spark SQL, and the Streaming API
- Strong experience writing Python applications using libraries such as Pandas and NumPy
- Wrote SerDe regular expressions to read unstructured data from various sources
- Experience onboarding new tools and technologies by carrying out proofs of concept and defining evaluation metrics
- Worked on multiple stages of the Software Development Life Cycle, including development, component integration, performance testing, deployment, and support/maintenance
- Experience working with Cloudera and Hortonworks Hadoop distributions
- Strong communication skills, a professional attitude, and the ability to perform well under pressure with enthusiasm and commitment
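The following is a minimal, illustrative PySpark sketch of the Hive partitioning and HiveQL aggregation work summarized above; the table and column names (transactions, account_id, txn_date, amount) are assumptions, not taken from any actual project.

```python
# Minimal PySpark sketch: a partitioned Hive table plus a HiveQL-style aggregation.
# Table and column names are illustrative only.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Partition by date so queries can prune partitions instead of scanning the full table.
# Bucketing would be declared with CLUSTERED BY (account_id) INTO n BUCKETS in the Hive
# DDL; it is omitted here because older Spark versions cannot create bucketed Hive tables.
spark.sql("""
    CREATE TABLE IF NOT EXISTS transactions (
        account_id STRING,
        amount     DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    STORED AS ORC
""")

# Example HiveQL aggregation restricted to a single partition
daily_totals = spark.sql("""
    SELECT account_id, SUM(amount) AS total_amount
    FROM transactions
    WHERE txn_date = '2024-01-01'
    GROUP BY account_id
""")
daily_totals.show()
```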
TECHNICAL SKILLS:
Hadoop Core Services: HDFS, Spark, YARN
Hadoop Distribution: CDH 3 and 4, Hortonworks
Hadoop Data Services: Hive, Pig, Sqoop, Spark, Kafka
Hadoop Services: Zookeeper, Oozie
Programming Languages: Python, Scala, SQL, Shell Scripting
Python Libraries: Pandas, NumPy, Matplotlib, Plotly, Seaborn
Operating Systems: Windows, Linux, Unix, CentOS 5/6
IDE Tools: Eclipse, IntelliJ, NetBeans
Databases: MySQL, HBase, MongoDB, Oracle
Others: Git, PuTTY, Tableau
PROFESSIONAL EXPERIENCE:
Confidential, NJ
Big Data Engineer
Roles & Responsibilities:
- Involved in the complete Big Data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing that data
- Created partitioned and bucketed Hive tables in ORC file format with Zlib/Snappy compression, sourced from Avro tables
- Involved in performance tuning of Hive from design, storage, and query perspectives
- Developed Sqoop jobs to import data in Avro file format from an Oracle database and created Hive tables on top of it
- Implemented Spark RDDs and DataFrames and performed transformations based on requirements
- Wrote Spark SQL queries using PySpark
- Performed real-time analysis of incoming data using the Kafka consumer API, Kafka topics, and Spark Streaming in Scala (see the sketch after this role)
- Developed Python scripts to collect data from source systems and store it on HDFS for analytics
- Analyzed existing SQL scripts and designed solutions to implement them in Python
- Implemented Oozie operational services for batch processing and dynamic workflow scheduling
- Involved in designing and developing HBase tables and storing data in them
- Experienced in troubleshooting errors in the HBase shell/API and Hive
- Worked on the Hortonworks Data Platform (HDP 2.4) Hadoop distribution, using Hive to store, query, and retrieve data
Environment: HDFS, Hive, Sqoop, Spark, Spark Streaming, Spark SQL, HBase, PySpark, Kafka, Scala, Python, Oracle
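A minimal sketch of the Kafka-to-Spark streaming pattern described in this role. The original work used Scala with the Kafka consumer API; this illustration uses PySpark Structured Streaming instead, and the broker, topic, and HDFS paths are assumptions.

```python
# Minimal PySpark sketch of consuming a Kafka topic and landing the data on HDFS
# for downstream Hive/Spark analysis. Broker, topic, and paths are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

# Subscribe to a Kafka topic; requires the spark-sql-kafka connector on the classpath
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "trade-events")
          .load()
          .selectExpr("CAST(value AS STRING) AS payload"))

# Persist the raw payloads to HDFS as Parquet, with checkpointing for fault tolerance
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/trade_events")
         .option("checkpointLocation", "hdfs:///checkpoints/trade_events")
         .start())

query.awaitTermination()
```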
Confidential, NY
Big Data Engineer
Roles & Responsibilities:
- Imported and exported data between HDFS/Hive and relational databases using Sqoop
- Processed unstructured data using Hive
- Implemented partitioning, dynamic partitions, and bucketing in Hive
- Developed Hive queries and Spark SQL queries to analyze large datasets
- Exported result sets from Hive to MySQL using Sqoop (see the sketch after this role)
- Created and maintained technical documentation for launching Hadoop clusters and executing Hive queries
- Worked on debugging and performance tuning of Hive queries
- Gained experience in managing and reviewing Hadoop log files
- Scheduled multiple Hive jobs with the Oozie workflow engine
- Used HBase as a NoSQL database
- Actively involved in code reviews and bug fixing to improve performance
Environment: Hadoop, HDFS, Hive, Sqoop, Spark, Flume, Linux, HBase, Oozie
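A minimal sketch of the Hive-to-MySQL export step described in this role, wrapping the Sqoop CLI in a Python script; the JDBC URL, credentials file, table name, and warehouse path are assumptions.

```python
# Minimal sketch of exporting a Hive result set to MySQL with Sqoop, driven from
# Python for scripting/scheduling. All connection details and paths are illustrative.
import subprocess

sqoop_export = [
    "sqoop", "export",
    "--connect", "jdbc:mysql://db-host:3306/reporting",
    "--username", "report_user",
    "--password-file", "/user/etl/.mysql_password",
    "--table", "daily_totals",
    # Directory holding the Hive result set (default Hive field delimiter \001)
    "--export-dir", "/user/hive/warehouse/daily_totals",
    "--input-fields-terminated-by", "\\001",
]

subprocess.run(sqoop_export, check=True)
```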
Confidential
Hadoop Developer
Roles & Responsibilities:
- Implemented proofs of concept on the Hadoop stack and various Big Data analytics tools, including migration from different databases to Hadoop
- Developed Sqoop scripts to move data between Hive and a MySQL database
- Responsible for analyzing and cleansing raw data with Hive queries
- Used analytical tools including Hive and Spark on the Cloudera distribution
- Created Hive tables to store processed results in a tabular format
- Participated in development/implementation of Cloudera Hadoop environment
- Implemented optimization techniques like partitions and bucketing to provide better performance with HiveQL queries
- Created custom user-defined functions in Hive using Python (see the sketch after this role)
- Involved in moving log files generated from various sources to HDFS for further processing through Flume
- Designed the technical solution for real-time analytics using Kafka and Spark
- Designed and modified MongoDB collections and used MongoDB queries to insert and fetch data
- Used PySpark for streaming, interactive queries, and iterative algorithms
- Used Oozie operational services for batch processing and scheduling workflows dynamically
Environment: Hadoop, Spark, YARN, Flume, Hive, Sqoop, Oozie, MongoDB, HDFS, ZooKeeper, Oracle, MySQL
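A minimal sketch of a Python-based Hive UDF of the kind described in this role, implemented through Hive's TRANSFORM/streaming interface; the script name, column layout, and business rule are assumptions.

```python
# Minimal sketch of a Hive "UDF" written in Python and invoked through Hive's
# TRANSFORM/streaming interface. Columns (user_id, raw_status) are illustrative.
#
# Registered and called from HiveQL roughly as:
#   ADD FILE normalize_status.py;
#   SELECT TRANSFORM (user_id, raw_status)
#          USING 'python normalize_status.py'
#          AS (user_id, status)
#   FROM user_events;
import sys

for line in sys.stdin:
    # Hive streams each row as tab-separated text on stdin
    user_id, raw_status = line.rstrip("\n").split("\t")
    # Example business rule: normalize free-form status values
    status = raw_status.strip().upper() or "UNKNOWN"
    # Emit the transformed row back to Hive as tab-separated text
    print(f"{user_id}\t{status}")
```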
Confidential
Hadoop Developer
Roles & Responsibilities:
- Imported and exported data between HDFS and relational databases using Sqoop
- Created partitions and buckets by state in Hive to handle structured data
- Implemented dashboards that run HiveQL queries internally, including aggregation functions, basic Hive operations, and different kinds of join operations
- Implemented state-based business logic in Hive using generic UDFs
- Involved in moving log files generated from various sources to HDFS for processing through Kafka and Flume
- Created Hive tables to store the processed results in a tabular format
- Developed Hive UDFs and reused them for other requirements
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala; good experience using spark-shell and Spark Streaming (see the sketch after this role)
- Extracted data from MongoDB through Sqoop, placed it in HDFS, and processed it
Environment: CDH 3, HDFS, Hive, Spark, Spark Streaming, Kafka, Flume, Sqoop, MongoDB, MySQL
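A minimal sketch of converting a HiveQL query into Spark transformations as described in this role. The original work used Scala RDDs; this illustration uses the PySpark DataFrame API, and the table and column names (orders, state, amount) are assumptions.

```python
# Minimal PySpark sketch of rewriting a HiveQL aggregation as Spark transformations.
# Table and column names are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-sketch")
         .enableHiveSupport()
         .getOrCreate())

# HiveQL version:
#   SELECT state, COUNT(*) AS order_count, SUM(amount) AS revenue
#   FROM orders GROUP BY state;
orders = spark.table("orders")

# Equivalent Spark transformations
per_state = (orders
             .groupBy("state")
             .agg(F.count("*").alias("order_count"),
                  F.sum("amount").alias("revenue")))

per_state.show()
```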
Confidential
SQL Developer
Roles & Responsibilities:
- Designed, created, and maintained database objects such as tables, views, stored procedures, and user-defined functions in SQL Server
- Applied business rules to perform extensive data scrubbing to maintain data quality and consistency
- Created tables, views, and indexes based on requirements
- Created SQL reports and data extraction and data loading scripts for different databases and schemas
- Designed databases, developed business intelligence analyses and design specifications, and implemented reporting with Microsoft SQL Server
- Identified relationships between tables and enforced referential integrity using foreign key constraints (see the sketch at the end of this role)
- Worked in production support as well as QA/test environments on projects, work orders, maintenance requests, bug fixes, enhancements, data changes, etc.
- Wrote packages to fetch complex data from different tables in remote databases using joins, subqueries, and database links
- Developed and supported analysis solutions, data transformations, and reports.
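A minimal sketch of the referential-integrity work described in this role: creating a child table with a foreign key back to a parent table. It is driven from Python via pyodbc purely for illustration; the ODBC driver, server, database, and table names are assumptions.

```python
# Minimal sketch: create a child table whose foreign key enforces referential
# integrity against a parent table in SQL Server. Connection details and table
# names are illustrative assumptions.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=db-server;DATABASE=SalesDB;Trusted_Connection=yes;"
)
cursor = conn.cursor()

cursor.execute("""
    CREATE TABLE dbo.OrderItems (
        OrderItemID INT IDENTITY(1,1) PRIMARY KEY,
        OrderID     INT NOT NULL,
        Quantity    INT NOT NULL,
        CONSTRAINT FK_OrderItems_Orders
            FOREIGN KEY (OrderID) REFERENCES dbo.Orders (OrderID)
    )
""")
conn.commit()
```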