Sr. Big Data Engineer Resume
Minneapolis, MN
SUMMARY:
- Over 10 years of extensive hands-on Big Data experience with the Hadoop ecosystem across internal and cloud-based platforms.
- Expertise in Cloud Computing and Hadoop architecture and its various components: Hadoop Distributed File System (HDFS), MapReduce, Spark, NameNode, DataNode, JobTracker, TaskTracker, and Secondary NameNode.
- Experience working with Google Cloud Platform technologies such as BigQuery, Dataflow, Dataproc, Pub/Sub, and Airflow.
- Designed and developed ingestion frameworks on Google Cloud and Hadoop clusters.
- Good knowledge of Hadoop cluster architecture and monitoring.
- Extensive experience importing and exporting data using Kafka.
- Experience in Spark, Scala, and Kafka.
- Hands-on experience using Hadoop ecosystem components such as Oozie, Hive, Sqoop, Flume, Hue, and Impala.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience developing and scheduling ETL workflows in Hadoop using Oozie, as well as deploying and managing Hadoop clusters using Cloudera and Hortonworks.
- Experience importing data from relational databases into the Hive metastore using Sqoop.
- Experience creating managed and external tables in Hive, as sketched below.
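The Sqoop and Hive bullets above refer to the kind of DDL sketched below: a minimal PySpark example contrasting a managed Hive table with an external table over data already landed in HDFS (for instance by a Sqoop import). It is illustrative only; the sales_db database, table names, and HDFS path are hypothetical placeholders.

```python
# Minimal sketch: managed vs. external Hive tables created through Spark SQL.
# Assumes a configured Hive metastore; all names below are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-table-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS sales_db")

# Managed table: Hive owns both the metadata and the data files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_db.orders_managed (
        order_id BIGINT, customer_id BIGINT, amount DOUBLE
    ) STORED AS PARQUET
""")

# External table: Hive owns only the metadata; data stays at the HDFS location.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.orders_external (
        order_id BIGINT, customer_id BIGINT, amount DOUBLE
    ) STORED AS PARQUET
    LOCATION 'hdfs:///data/landing/orders'
""")
```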
TECHNICAL SKILLS:
Big Data Ecosystems: Spark, HDFS, MapReduce, Pig, Hive, YARN, Oozie, ZooKeeper, Apache NiFi, Apache Storm, Apache Kafka, Sqoop, Flume
Cloud Technologies: Google Cloud Platform, Pub/Sub, Dataflow, BigQuery
Scripting Languages: Python, shell
Programming Languages: Python, Java, Scala
Databases: Netezza, SQL Server, MySQL, ORACLE, DB2
IDEs / Tools: Eclipse, JUnit, Maven, Ant, MS Visual Studio, NetBeans
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential
Sr. Big Data Engineer
Responsibilities:
- Performed complex day-to-day development and support for the GCP cloud environment
- Developed, maintained, and tuned highly complex scripts using Python and BigQuery (see the sketch after this list)
- Designed and managed solutions related to data engineering, data analysis, data modeling, data warehousing, data security, and ETL.
- Implemented a parser and interpreter for migrating data from an IBM mainframe to Google BigQuery.
- Built data integration and preparation tools using cloud technologies such as Google Dataflow, Cloud Dataprep, and Python.
- Assisted with Google Cloud Platform (GCP) operations
- Implemented CI/CD pipelines initiating code builds from repositories and orchestrated deployments within Google Kubernetes Engine (GKE) containers
- Configured native security tools available within Google Cloud Platform
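The Python and BigQuery scripting bullet above refers to automation along the lines of the minimal sketch below. It assumes the google-cloud-bigquery client library and application-default credentials; the project, dataset, and table names are hypothetical.

```python
# Minimal sketch of a Python + BigQuery script: run an aggregation and print rows.
# The project/dataset/table names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

query = """
    SELECT event_date, COUNT(*) AS event_count
    FROM `my-gcp-project.analytics.events`
    WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    GROUP BY event_date
    ORDER BY event_date
"""

# client.query() submits the job; result() blocks until it completes.
for row in client.query(query).result():
    print(row.event_date, row.event_count)
```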
Confidential, Minneapolis, MN
Sr. Big Data Engineer / Cloud Engineer
Responsibilities:
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS using Sqoop.
- Designed and implemented an end-to-end big data platform on the Teradata Appliance
- Performed ETL from multiple sources such as Kafka, NiFi, Teradata, and DB2 using Spark on Hadoop.
- Worked on Apache Spark, utilizing the Spark SQL and Spark Streaming components to support intraday and real-time data processing
- Developed Python and Bash scripts to automate and provide control flow
- Moved data from Teradata to a Hadoop cluster using TDCH/FastExport and Apache NiFi.
- Developed Python, PySpark, and Bash scripts to transform and load data across on-premise and cloud platforms.
- Built ETL data pipelines for data movement to S3 and then to Redshift.
- Wrote AWS Lambda code in Python for converting, comparing, and sorting nested JSON files (see the sketch after this list).
- Installed and configured Apache Airflow for workflow management and created workflows in Python
- Wrote PySpark UDFs to perform transformations and loads.
- Wrote TDCH scripts and used Apache NiFi to load data from mainframe DB2 to the Hadoop cluster.
- Worked with ORC, Avro, JSON, and Parquet file formats, creating external tables and querying on top of these files using BigQuery
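The AWS Lambda bullet above refers to handlers along the lines of the hedged sketch below: a Python handler that flattens nested JSON records and sorts them. The event shape (a "records" list) and the sort key ("timestamp") are assumptions, not an actual schema.

```python
# Illustrative AWS Lambda handler for nested JSON records: flatten each record
# into dot-separated keys, then sort the batch. Event shape is hypothetical.
import json


def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into dot-separated keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def lambda_handler(event, context):
    records = [flatten(r) for r in event.get("records", [])]
    records.sort(key=lambda r: r.get("timestamp", ""))
    return {"statusCode": 200, "body": json.dumps(records)}
```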
Confidential, Denver, CO
Sr. Data Analyst / Big Data Engineer
Responsibilities:
- Used Hive queries in Spark SQL for analyzing and processing the data
- Installed, configured, supported, and managed Hadoop clusters
- Wrote shell scripts that run multiple Hive jobs to incrementally refresh different Hive tables, which are used to generate reports in Tableau for business use
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
- Enhanced and optimized production Spark code to aggregate, group, and run data mining tasks using the Spark framework, and handled JSON data
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing
- Involved in business analysis and technical design sessions with business and technical staff to develop requirements documents and ETL design specifications.
- Wrote complex SQL scripts to avoid Informatica lookups and improve performance, as the data volume was heavy.
- Responsible for the design, development, and data modeling of Spark SQL scripts based on functional specifications
- Designed and developed extract, transform, and load (ETL) mappings, procedures, and schedules, following the standard development lifecycle
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
- Worked closely with Quality Assurance, Operations, and Production Support groups to devise test plans, answer questions, and solve data or processing issues
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Databricks connectors, Spark Core, Spark SQL, GCP, Sqoop, Hive, and NoSQL databases
- Wrote Spark SQL scripts for optimizing query performance
- Well-versed in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce programming
- Implemented Hive UDFs and did performance tuning for better results
- Developed and tuned SQL on HiveQL, Drill, and Spark SQL
- Used Sqoop to import and export data between Oracle DB, HDFS, and Hive
- Developed Spark code using Spark RDD and Spark-SQL/Streaming for faster processing of data
- Implemented partitioning, data modeling, dynamic partitions, and buckets in Hive for efficient data access, as sketched below
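The partitioning bullet above is illustrated by the minimal sketch below, which performs a dynamic-partition insert into a Hive table through Spark SQL; bucketing would be declared similarly with CLUSTERED BY ... INTO n BUCKETS in the Hive DDL. The events_db database, table names, and columns are hypothetical.

```python
# Sketch of dynamic partitioning in Hive through Spark SQL. All names are
# hypothetical; assumes a Hive metastore and an existing raw_events source table.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioning-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Allow fully dynamic partition inserts.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("CREATE DATABASE IF NOT EXISTS events_db")
spark.sql("""
    CREATE TABLE IF NOT EXISTS events_db.events_partitioned (
        user_id BIGINT, event_type STRING, amount DOUBLE
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
""")

# Partition values are derived from the event_date column at insert time.
spark.sql("""
    INSERT OVERWRITE TABLE events_db.events_partitioned PARTITION (event_date)
    SELECT user_id, event_type, amount, event_date
    FROM events_db.raw_events
""")
```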
Confidential
Sr. Hadoop/Spark Developer
Responsibilities:
- Involved in the process of Cassandra data modelling and building efficient data structures.
- Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Developed optimal strategies for distributing the web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Sqoop.
- Wrote Pig scripts to transform raw data from several data sources into baseline data.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website (see the sketch after this list).
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts)
- Created Hive tables and worked on them using HiveQL.
- Designed and implemented static and dynamic partitioning and buckets in Hive.
- Worked on cluster coordination services through ZooKeeper.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in building applications using Maven and integrated with CI servers like Jenkins to build jobs.
- Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
- Worked collaboratively with all levels of business stakeholders to architect, implement, and test Big Data-based analytical solutions from disparate sources.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
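The web log analysis bullet above is illustrated by the hedged sketch below, which runs HiveQL through PySpark to compute unique visitors and page views per day plus the most visited pages; the web_logs table and its columns (log_date, visitor_id, page_url) are hypothetical.

```python
# Illustrative HiveQL run through PySpark for web log analysis.
# The web_logs table and its columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("weblog-analysis-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Unique visitors and page views per day.
daily_traffic = spark.sql("""
    SELECT
        log_date,
        COUNT(DISTINCT visitor_id) AS unique_visitors,
        COUNT(*)                   AS page_views
    FROM web_logs
    GROUP BY log_date
    ORDER BY log_date
""")

# Most visited pages overall.
top_pages = spark.sql("""
    SELECT page_url, COUNT(*) AS hits
    FROM web_logs
    GROUP BY page_url
    ORDER BY hits DESC
    LIMIT 10
""")

daily_traffic.show()
top_pages.show()
```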
Confidential
Data Analyst / Hadoop Developer
Responsibilities:
- Involved in the full project life cycle, from design, analysis, and logical and physical architecture modeling through development, implementation, and testing.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, Zookeeper and Sqoop.
- Designed, configured, implemented, and monitored the Kafka cluster and connectors.
- Implemented a proof of concept (PoC) using Kafka, Storm, and HBase for processing streaming data.
- Used Sqoop to import data into HDFS and Hive from multiple data systems.
- Developed complex queries using HIVE and IMPALA.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Helped with the sizing and performance tuning of the Cassandra cluster.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs (see the sketch after this list).
- Used Hive on Tez to transform the data as per business requirements for batch processing.
- Developed multiple PoCs using Spark, deployed them on the YARN cluster, and compared the performance of Spark and SQL.
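The query-conversion bullet above is illustrated by the minimal sketch below, which rewrites a SQL-style GROUP BY aggregation as Spark RDD transformations; the input path and the two-column CSV layout (user_id, amount) are hypothetical.

```python
# Minimal sketch: converting a SQL aggregation into Spark RDD transformations.
# Equivalent of: SELECT user_id, SUM(amount) FROM purchases GROUP BY user_id
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-to-rdd-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("hdfs:///data/purchases.csv")   # hypothetical input path
totals = (
    lines.map(lambda line: line.split(","))            # ["user_id", "amount"]
         .map(lambda cols: (cols[0], float(cols[1])))  # (user_id, amount)
         .reduceByKey(lambda a, b: a + b)              # sum per user
)

for user_id, total in totals.take(10):
    print(user_id, total)
```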
EDUCATION:
- B.S. in Computer Science - Punjab Technical University ( )