Sr. Big Data Engineer Resume
Minneapolis, MN
SUMMARY:
- Over 10 years of extensive hands-on Big Data experience with the Hadoop ecosystem across internal and cloud-based platforms.
- Expertise in Cloud Computing and Hadoop architecture and its various components: Hadoop Distributed File System (HDFS), MapReduce, Spark, NameNode, DataNode, JobTracker, TaskTracker, and Secondary NameNode.
- Experience working with Google Cloud Platform technologies such as BigQuery, Dataflow, Dataproc, Pub/Sub, and Airflow.
- Designed and developed ingestion frameworks on Google Cloud and Hadoop clusters.
- Good knowledge of Hadoop cluster architecture and monitoring.
- Extensive experience importing and exporting data using Kafka.
- Experience in Spark, Scala, and Kafka.
- Hands-on experience using Hadoop ecosystem components such as Oozie, Hive, Sqoop, Flume, Hue, and Impala.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience developing and scheduling ETL workflows in Hadoop using Oozie, as well as deploying and managing Hadoop clusters using Cloudera and Hortonworks.
- Experience importing data from relational databases into the Hive metastore using Sqoop.
- Experience creating managed and external tables in Hive, as sketched below.
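The Sqoop and Hive bullets above refer to the kind of DDL sketched below: a minimal PySpark example contrasting a managed Hive table with an external table over data already landed in HDFS (for instance by a Sqoop import). It is illustrative only; the sales_db database, table names, and HDFS path are hypothetical placeholders.

```python
# Minimal sketch: managed vs. external Hive tables created through Spark SQL.
# Assumes a configured Hive metastore; all names below are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-table-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS sales_db")

# Managed table: Hive owns both the metadata and the data files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_db.orders_managed (
        order_id BIGINT, customer_id BIGINT, amount DOUBLE
    ) STORED AS PARQUET
""")

# External table: Hive owns only the metadata; data stays at the HDFS location.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.orders_external (
        order_id BIGINT, customer_id BIGINT, amount DOUBLE
    ) STORED AS PARQUET
    LOCATION 'hdfs:///data/landing/orders'
""")
```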
TECHNICAL SKILLS:
Big Data Ecosystems: Spark, HDFS, MapReduce, Pig, Hive, YARN, Oozie, ZooKeeper, Apache NiFi, Apache Storm, Apache Kafka, Sqoop, Flume
Cloud Technologies: Google Cloud Platform, Pub/Sub, Dataflow, BigQuery
Scripting Languages: Python, shell
Programming Languages: Python, Java, Scala
Databases: Netezza, SQL Server, MySQL, ORACLE, DB2
IDEs / Tools: Eclipse, JUnit, Maven, Ant, MS Visual Studio, NetBeans
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential
Sr. Big Data Engineer
Responsibilities:
- Performed complex day-to-day development and support for the GCP cloud environment
- Developed, maintained, and tuned highly complex scripts using Python and BigQuery (see the sketch after this list)
- Designed and managed solutions related to data engineering, data analysis, data modeling, data warehousing, data security, and ETL.
- Implemented a parser and interpreter for migrating data from an IBM mainframe to Google BigQuery.
- Built data integration and preparation tools using cloud technologies such as Google Dataflow, Cloud Dataprep, and Python.
- Assisted with Google Cloud Platform (GCP) operations
- Implemented CI/CD pipelines initiating code builds from repositories and orchestrated deployments within Google Kubernetes Engine (GKE) containers
- Configured native security tools available within Google Cloud Platform
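The Python and BigQuery scripting bullet above refers to automation along the lines of the minimal sketch below. It assumes the google-cloud-bigquery client library and application-default credentials; the project, dataset, and table names are hypothetical.

```python
# Minimal sketch of a Python + BigQuery script: run an aggregation and print rows.
# The project/dataset/table names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

query = """
    SELECT event_date, COUNT(*) AS event_count
    FROM `my-gcp-project.analytics.events`
    WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    GROUP BY event_date
    ORDER BY event_date
"""

# client.query() submits the job; result() blocks until it completes.
for row in client.query(query).result():
    print(row.event_date, row.event_count)
```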
Confidential, Minneapolis, MN
Sr. Big Data Engineer / Cloud Engineer
Responsibilities:
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS using Sqoop.
- Designed and implemented an end-to-end big data platform on the Teradata Appliance
- Performed ETL from multiple sources such as Kafka, NiFi, Teradata, and DB2 using Spark on Hadoop.
- Worked on Apache Spark, utilizing the Spark SQL and Spark Streaming components to support intraday and real-time data processing
- Developed Python and Bash scripts to automate and provide control flow
- Moved data from Teradata to a Hadoop cluster using TDCH/FastExport and Apache NiFi.
- Developed Python, PySpark, and Bash scripts to transform and load data across on-premise and cloud platforms.
- Built ETL data pipelines for data movement to S3 and then to Redshift.
- Wrote AWS Lambda code in Python for converting, comparing, and sorting nested JSON files (see the sketch after this list).
- Installed and configured Apache Airflow for workflow management and created workflows in Python
- Wrote PySpark UDFs to perform transformations and loads.
- Wrote TDCH scripts and used Apache NiFi to load data from mainframe DB2 to the Hadoop cluster.
- Worked with ORC, Avro, JSON, and Parquet file formats, creating external tables and querying on top of these files using BigQuery
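The AWS Lambda bullet above refers to handlers along the lines of the hedged sketch below: a Python handler that flattens nested JSON records and sorts them. The event shape (a "records" list) and the sort key ("timestamp") are assumptions, not an actual schema.

```python
# Illustrative AWS Lambda handler for nested JSON records: flatten each record
# into dot-separated keys, then sort the batch. Event shape is hypothetical.
import json


def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into dot-separated keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def lambda_handler(event, context):
    records = [flatten(r) for r in event.get("records", [])]
    records.sort(key=lambda r: r.get("timestamp", ""))
    return {"statusCode": 200, "body": json.dumps(records)}
```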
Confidential, Denver, CO
Sr. Data Analyst / Big Data Engineer
Responsibilities:
- Used Hive queries in Spark SQL for analyzing and processing the data
- Installed, configured, supported, and managed Hadoop clusters
- Wrote shell scripts that run multiple Hive jobs to incrementally refresh different Hive tables, which are used to generate reports in Tableau for business use
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
- Enhanced and optimized production Spark code to aggregate, group, and run data mining tasks using the Spark framework, and handled JSON data
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing
- Involved in business analysis and technical design sessions with business and technical staff to develop requirements documents and ETL design specifications.
- Wrote complex SQL scripts to avoid Informatica lookups and improve performance, as the data volume was heavy.
- Responsible for the design, development, and data modeling of Spark SQL scripts based on functional specifications
- Designed and developed extract, transform, and load (ETL) mappings, procedures, and schedules, following the standard development lifecycle
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
- Worked closely with Quality Assurance, Operations, and Production Support groups to devise test plans, answer questions, and solve data or processing issues
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Databricks connectors, Spark Core, Spark SQL, GCP, Sqoop, Hive, and NoSQL databases
- Wrote Spark SQL scripts for optimizing query performance
- Well-versed in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce programming
- Implemented Hive UDFs and did performance tuning for better results
- Developed and tuned SQL on HiveQL, Drill, and Spark SQL
- Used Sqoop to import and export data between Oracle DB, HDFS, and Hive
- Developed Spark code using Spark RDD and Spark-SQL/Streaming for faster processing of data
- Implemented partitioning, data modeling, dynamic partitions, and buckets in Hive for efficient data access, as sketched below
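The partitioning bullet above is illustrated by the minimal sketch below, which performs a dynamic-partition insert into a Hive table through Spark SQL; bucketing would be declared similarly with CLUSTERED BY ... INTO n BUCKETS in the Hive DDL. The events_db database, table names, and columns are hypothetical.

```python
# Sketch of dynamic partitioning in Hive through Spark SQL. All names are
# hypothetical; assumes a Hive metastore and an existing raw_events source table.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioning-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Allow fully dynamic partition inserts.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("CREATE DATABASE IF NOT EXISTS events_db")
spark.sql("""
    CREATE TABLE IF NOT EXISTS events_db.events_partitioned (
        user_id BIGINT, event_type STRING, amount DOUBLE
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
""")

# Partition values are derived from the event_date column at insert time.
spark.sql("""
    INSERT OVERWRITE TABLE events_db.events_partitioned PARTITION (event_date)
    SELECT user_id, event_type, amount, event_date
    FROM events_db.raw_events
""")
```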
Confidential
Sr. Hadoop/Spark Developer
Responsibilities:
- Involved in the process of Cassandra data modelling and building efficient data structures.
- Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Developed optimal strategies for distributing the web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Sqoop.
- Wrote Pig scripts to transform raw data from several data sources into baseline data.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website (see the sketch after this list).
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts)
- Created Hive tables and worked on them using HiveQL.
- Designed and implemented static and dynamic partitioning and buckets in Hive.
- Worked on cluster coordination services through ZooKeeper.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in building applications using Maven and integrated with CI servers like Jenkins to build jobs.
- Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
- Worked collaboratively with all levels of business stakeholders to architect, implement, and test Big Data-based analytical solutions from disparate sources.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
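The web log analysis bullet above is illustrated by the hedged sketch below, which runs HiveQL through PySpark to compute unique visitors and page views per day plus the most visited pages; the web_logs table and its columns (log_date, visitor_id, page_url) are hypothetical.

```python
# Illustrative HiveQL run through PySpark for web log analysis.
# The web_logs table and its columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("weblog-analysis-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Unique visitors and page views per day.
daily_traffic = spark.sql("""
    SELECT
        log_date,
        COUNT(DISTINCT visitor_id) AS unique_visitors,
        COUNT(*)                   AS page_views
    FROM web_logs
    GROUP BY log_date
    ORDER BY log_date
""")

# Most visited pages overall.
top_pages = spark.sql("""
    SELECT page_url, COUNT(*) AS hits
    FROM web_logs
    GROUP BY page_url
    ORDER BY hits DESC
    LIMIT 10
""")

daily_traffic.show()
top_pages.show()
```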
Confidential
Data Analyst / Hadoop Developer
Responsibilities:
- Involved in the full project life cycle, from design, analysis, and logical and physical architecture modeling through development, implementation, and testing.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, Zookeeper and Sqoop.
- Designed, configured, implemented, and monitored the Kafka cluster and connectors.
- Implemented a proof of concept (PoC) using Kafka, Storm, and HBase for processing streaming data.
- Used Sqoop to import data into HDFS and Hive from multiple data systems.
- Developed complex queries using HIVE and IMPALA.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Helped with the sizing and performance tuning of the Cassandra cluster.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs (see the sketch after this list).
- Used Hive on Tez to transform the data as per business requirements for batch processing.
- Developed multiple PoCs using Spark, deployed them on the YARN cluster, and compared the performance of Spark and SQL.
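The query-conversion bullet above is illustrated by the minimal sketch below, which rewrites a SQL-style GROUP BY aggregation as Spark RDD transformations; the input path and the two-column CSV layout (user_id, amount) are hypothetical.

```python
# Minimal sketch: converting a SQL aggregation into Spark RDD transformations.
# Equivalent of: SELECT user_id, SUM(amount) FROM purchases GROUP BY user_id
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-to-rdd-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("hdfs:///data/purchases.csv")   # hypothetical input path
totals = (
    lines.map(lambda line: line.split(","))            # ["user_id", "amount"]
         .map(lambda cols: (cols[0], float(cols[1])))  # (user_id, amount)
         .reduceByKey(lambda a, b: a + b)              # sum per user
)

for user_id, total in totals.take(10):
    print(user_id, total)
```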
EDUCATION:
- B.S. in Computer Science - Punjab Technical University ( )