Sr. Big Data Engineer Resume
Minneapolis, MN
SUMMARY:
- Over 10 years of extensive hands-on Big Data experience with the Hadoop ecosystem across internal and cloud-based platforms.
- Expertise in Cloud Computing and Hadoop architecture and its various components: Hadoop Distributed File System (HDFS), MapReduce, Spark, NameNode, DataNode, JobTracker, TaskTracker, and Secondary NameNode.
- Experience working with various Google Cloud Platform technologies such as BigQuery, Dataflow, Dataproc, Pub/Sub, and Airflow.
- Designed and developed an ingestion framework over Google Cloud and Hadoop clusters.
- Good knowledge of Hadoop cluster architecture and monitoring.
- Extensive experience importing and exporting data using Kafka.
- Experience in Spark, Scala, Kafka.
- Hands-on experience using Hadoop ecosystem components such as Oozie, Hive, Sqoop, Flume, HUE, and Impala.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience developing and scheduling ETL workflows in Hadoop using Oozie, and deploying and managing Hadoop clusters with Cloudera and Hortonworks.
- Experience importing data from relational databases into the Hive metastore using Sqoop.
- Experience creating managed and external tables in Hive (see the sketch below).
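As a rough illustration of the managed vs. external Hive table work above, the minimal sketch below issues the DDL through Spark SQL so the example stays in Python; the `sales` database, table names, and HDFS landing path are hypothetical.

```python
# Minimal sketch; database, table names, and HDFS path are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-table-example")
    .enableHiveSupport()   # use the Hive metastore for table DDL
    .getOrCreate()
)

# Managed table: Hive owns the data; dropping the table also deletes the files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.orders_managed (
        order_id BIGINT,
        amount   DOUBLE,
        order_dt STRING
    )
    STORED AS PARQUET
""")

# External table: Hive only tracks metadata over files that already exist
# (e.g. data landed by Sqoop); dropping the table leaves the files in place.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders_ext (
        order_id BIGINT,
        amount   DOUBLE,
        order_dt STRING
    )
    STORED AS PARQUET
    LOCATION 'hdfs:///data/landing/orders'
""")
```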
TECHNICAL SKILLS:
Big Data Ecosystems: Spark, HDFS, MapReduce, Pig, Hive, YARN, Oozie, ZooKeeper, Apache NiFi, Apache Storm, Apache Kappa, Apache Kafka, Sqoop, Flume
Cloud Technologies: Google Cloud Platform, Pub/Sub, Dataflow, BigQuery
Scripting Languages: Python, shell
Programming Languages: Python, Java, Scala
Databases: Netezza, SQL Server, MySQL, ORACLE, DB2
IDEs / Tools: Eclipse, JUnit, Maven, Ant, MS Visual Studio, Net Beans
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential
Sr. Big Data Engineer
Responsibilities:
- Performed complex day-to-day development and support for the GCP cloud environment
- Developed, maintained, and tuned highly complex scripts using Python and BigQuery (see the sketch after this list)
- Designed and managed solutions related to data engineering, data analysis, data modeling, data warehousing, data security, and ETL.
- Implemented a parser and interpreter for migrating data from IBM mainframe systems to Google BigQuery.
- Built data integration and preparation tools using cloud technologies such as Google Dataflow, Cloud Dataprep, and Python.
- Assisted with Google Cloud Platform (GCP) operations
- Implemented CI/CD Pipelines initiating code builds from repositories and orchestrated deployment within Google Kubernetes Engine (GKE) containers
- Configured native security tools available within Google Cloud Platform
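As a rough illustration of the Python and BigQuery scripting mentioned above, the sketch below runs a parameterized query and writes the result to a reporting table; the project, dataset, and table names are hypothetical.

```python
# Minimal sketch; project, dataset, and table names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

query = """
    SELECT order_dt, SUM(amount) AS revenue
    FROM `my-gcp-project.sales.orders`
    WHERE order_dt >= @start_dt
    GROUP BY order_dt
"""

job_config = bigquery.QueryJobConfig(
    destination="my-gcp-project.reporting.daily_revenue",  # hypothetical target
    write_disposition="WRITE_TRUNCATE",
    query_parameters=[
        bigquery.ScalarQueryParameter("start_dt", "DATE", "2023-01-01"),
    ],
)

job = client.query(query, job_config=job_config)
job.result()  # block until the query job finishes
print("Wrote results to", job.destination.table_id)
```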
Confidential, Minneapolis, MN
Sr. Big Data Engineer / Cloud Engineer
Responsibilities:
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS using Sqoop.
- Designed and implemented an end-to-end big data platform on a Teradata appliance
- Performed ETL from multiple sources such as Kafka, NiFi, Teradata, and DB2 using Spark on Hadoop.
- Worked on Apache Spark, utilizing the Spark SQL and Streaming components to support intraday and real-time data processing
- Developed Python and Bash scripts to automate jobs and provide control flow
- Moved data from Teradata to the Hadoop cluster using TDCH/FastExport and Apache NiFi.
- Developed Python, PySpark, and Bash scripts to transform and load data across on-premise and cloud platforms.
- Built ETL data pipelines for data movement to S3 and then into Redshift.
- Wrote AWS Lambda code in Python to convert, compare, and sort nested JSON files.
- Installed and configured Apache Airflow for workflow management and created workflows in Python (see the sketch after this list)
- Wrote UDFs in PySpark on Hadoop to perform transformations and loads.
- Wrote TDCH scripts and used Apache NiFi to load data from mainframe DB2 to the Hadoop cluster.
- Worked with ORC, Avro, JSON, and Parquet file formats, creating external tables and querying on top of these files using BigQuery
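The Airflow workflow item above could look roughly like the sketch below (assuming Airflow 2.x); the DAG id, task ids, and commands are hypothetical placeholders standing in for the TDCH/NiFi ingest, PySpark transform, and S3-to-Redshift load steps.

```python
# Minimal Airflow DAG sketch; DAG id, task ids, and commands are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_ingest_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_from_teradata",
        bash_command="echo 'run the TDCH / NiFi ingest step here'",  # placeholder
    )
    transform = BashOperator(
        task_id="pyspark_transform",
        bash_command="spark-submit transform.py",  # hypothetical PySpark script
    )
    load = BashOperator(
        task_id="load_to_redshift",
        bash_command="echo 'copy curated data from S3 into Redshift here'",  # placeholder
    )
    ingest >> transform >> load
```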
Confidential, Denver, CO
Sr. Data Analyst / Big Data Engineer
Responsibilities:
- Used Hive queries in Spark SQL for analyzing and processing the data
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters
- Wrote shell scripts that run multiple Hive jobs to incrementally update the different Hive tables used to generate Tableau reports for business use
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
- Enhanced and optimized production Spark code to aggregate, group, and run data mining tasks using the Spark framework, and handled JSON data
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing
- Involved in business analysis and technical design sessions with business and technical staff to develop requirements document and ETL design specifications.
- Wrote complex SQL scripts to avoid Informatica lookups and improve performance, as the volume of the data was heavy.
- Responsible for the design, development, and data modeling of Spark SQL scripts based on functional specifications
- Designed and developed extract, transform, and load (ETL) mappings, procedures, and schedules, following the standard development lifecycle
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
- Worked closely with the Quality Assurance, Operations, and Production Support groups to devise test plans, answer questions, and resolve data or processing issues
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Databricks connectors, Spark Core, Spark SQL, GCP, Sqoop, Hive, and NoSQL databases
- Wrote Spark SQL scripts to optimize query performance
- Well-versed in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce programming
- Implemented Hive UDFs and performed tuning for better results
- Developed and tuned SQL in HiveQL, Drill, and Spark SQL
- Used Sqoop to import and export data between Oracle DB and HDFS/Hive
- Developed Spark code using Spark RDD and Spark-SQL/Streaming for faster processing of data
- Implemented partitioning, data modeling, dynamic partitions, and buckets in Hive for efficient data access (see the sketch below)
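As a rough sketch of the Spark SQL and Hive partitioning work above: read a Hive table through Spark SQL, aggregate it, and write the result back as a dynamically partitioned table. The database, table, and column names are hypothetical.

```python
# Minimal sketch; web.page_views source and analytics target tables are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("spark-sql-hive-example")
    .enableHiveSupport()
    .getOrCreate()
)

# Access an existing Hive table through Spark SQL.
daily = spark.sql("""
    SELECT event_date, page, visitor_id
    FROM web.page_views
    WHERE event_date >= '2020-01-01'
""")

# Aggregate with the DataFrame API.
summary = (
    daily.groupBy("event_date", "page")
         .agg(F.countDistinct("visitor_id").alias("unique_visitors"))
)

# Write back as a dynamically partitioned Hive table for efficient access.
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
(
    summary.write
           .mode("overwrite")
           .partitionBy("event_date")
           .format("parquet")
           .saveAsTable("analytics.page_view_summary")
)
```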
Confidential
Sr. Hadoop/Spark Developer
Responsibilities:
- Involved in the process of Cassandra data modeling and building efficient data structures.
- Developed MapReduce (YARN) jobs for cleaning, accessing, and validating the data (see the sketch after this list).
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Developed optimal strategies for distributing the web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
- Extensive experience writing Pig scripts to transform raw data from several data sources into baseline data.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts)
- Created Hive tables and worked on them using HiveQL.
- Designed and implemented static and dynamic partitioning and buckets in Hive.
- Worked on cluster coordination services through ZooKeeper.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in building applications using Maven and integrating with CI servers like Jenkins to build jobs.
- Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
- Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
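For the MapReduce web-log cleaning and validation item above, a minimal Hadoop Streaming mapper in Python might look like the sketch below; the input layout (Common Log Format with at least seven space-separated fields) is an assumption.

```python
#!/usr/bin/env python3
# Minimal Hadoop Streaming mapper sketch; assumes Common Log Format input.
# It drops malformed records and emits tab-separated (visitor_ip, path) pairs
# for a downstream reducer (or Hive load) to count.
import sys

EXPECTED_FIELDS = 7  # assumed minimum number of space-separated fields per line

for line in sys.stdin:
    parts = line.strip().split(" ")
    if len(parts) < EXPECTED_FIELDS:
        continue  # skip malformed records
    visitor_ip, path = parts[0], parts[6]
    if not path.startswith("/"):
        continue  # basic validation of the request path
    print(f"{visitor_ip}\t{path}")
```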
Confidential
Data Analyst / Hadoop Developer
Responsibilities:
- Involved in the full life cycle of the project, from design, analysis, and logical and physical architecture modeling to development, implementation, and testing.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, Zookeeper and Sqoop.
- Designed, configured, implemented, and monitored the Kafka cluster and connectors.
- Implemented a proof of concept (PoC) using Kafka, Storm, and HBase for processing streaming data (see the sketch after this list).
- Used Sqoop to import data into HDFS and Hive from multiple data systems.
- Developed complex queries using HIVE and IMPALA.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Helped with the sizing and performance tuning of the Cassandra cluster.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs.
- Used Hive on Tez to transform the data per business requirements for batch processing.
- Developed multiple PoCs using Spark, deployed them on the YARN cluster, and compared the performance of Spark and SQL.
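The Kafka/Storm/HBase streaming proof of concept above can be sketched, on the Kafka side, with a small Python consumer; the topic, broker address, and event fields are hypothetical, and the kafka-python client is assumed.

```python
# Minimal sketch using the kafka-python client; topic, broker, and fields are hypothetical.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "clickstream-events",                  # hypothetical topic
    bootstrap_servers=["broker1:9092"],    # hypothetical broker
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # A real PoC would hand the event to Storm or persist it to HBase here.
    print(event.get("user_id"), event.get("page"))
```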
EDUCATION:
- B.S. in Computer Science - Punjab Technical University ( )