Sr. Big Data Engineer Resume
Minneapolis, MN
SUMMARY:
- Over 10 years of extensive hands-on Big Data experience with the Hadoop ecosystem across internal and cloud-based platforms.
- Expertise in Cloud Computing and Hadoop architecture and its various components: Hadoop Distributed File System (HDFS), MapReduce, Spark, NameNode, DataNode, JobTracker, TaskTracker, and Secondary NameNode.
- Experience working with various Google Cloud Platform technologies such as BigQuery, Dataflow, Dataproc, Pub/Sub, and Airflow.
- Designed and developed an ingestion framework over Google Cloud and Hadoop clusters.
- Good knowledge of Hadoop cluster architecture and monitoring.
- Extensive experience importing and exporting data using Kafka.
- Experience in Spark, Scala, Kafka.
- Hands-on experience using Hadoop ecosystem components such as Oozie, Hive, Sqoop, Flume, HUE, and Impala.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience developing and scheduling ETL workflows in Hadoop using Oozie, and deploying and managing Hadoop clusters with Cloudera and Hortonworks.
- Experience importing data from relational databases into the Hive metastore using Sqoop.
- Experience creating managed and external tables in Hive (see the sketch below).
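As a rough illustration of the managed vs. external Hive table work above, the minimal sketch below issues the DDL through Spark SQL so the example stays in Python; the `sales` database, table names, and HDFS landing path are hypothetical.

```python
# Minimal sketch; database, table names, and HDFS path are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-table-example")
    .enableHiveSupport()   # use the Hive metastore for table DDL
    .getOrCreate()
)

# Managed table: Hive owns the data; dropping the table also deletes the files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.orders_managed (
        order_id BIGINT,
        amount   DOUBLE,
        order_dt STRING
    )
    STORED AS PARQUET
""")

# External table: Hive only tracks metadata over files that already exist
# (e.g. data landed by Sqoop); dropping the table leaves the files in place.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders_ext (
        order_id BIGINT,
        amount   DOUBLE,
        order_dt STRING
    )
    STORED AS PARQUET
    LOCATION 'hdfs:///data/landing/orders'
""")
```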
TECHNICAL SKILLS:
Big Data Ecosystems: Spark, HDFS, MapReduce, Pig, Hive, YARN, Oozie, ZooKeeper, Apache NiFi, Apache Storm, Apache Kappa, Apache Kafka, Sqoop, Flume
Cloud Technologies: Google Cloud Platform, Pub/Sub, Dataflow, BigQuery
Scripting Languages: Python, shell
Programming Languages: Python, Java, Scala
Databases: Netezza, SQL Server, MySQL, ORACLE, DB2
IDEs / Tools: Eclipse, JUnit, Maven, Ant, MS Visual Studio, Net Beans
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential
Sr. Big Data Engineer
Responsibilities:
- Performed complex day-to-day development and support for the GCP cloud environment
- Developed, maintained, and tuned highly complex scripts using Python and BigQuery (see the sketch after this list)
- Designed and managed solutions related to data engineering, data analysis, data modeling, data warehousing, data security, and ETL.
- Implemented a parser and interpreter for migrating data from IBM mainframe systems to Google BigQuery.
- Built data integration and preparation tools using cloud technologies such as Google Dataflow, Cloud Dataprep, and Python.
- Assisted with Google Cloud Platform (GCP) operations
- Implemented CI/CD Pipelines initiating code builds from repositories and orchestrated deployment within Google Kubernetes Engine (GKE) containers
- Configured native security tools available within Google Cloud Platform
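As a rough illustration of the Python and BigQuery scripting mentioned above, the sketch below runs a parameterized query and writes the result to a reporting table; the project, dataset, and table names are hypothetical.

```python
# Minimal sketch; project, dataset, and table names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

query = """
    SELECT order_dt, SUM(amount) AS revenue
    FROM `my-gcp-project.sales.orders`
    WHERE order_dt >= @start_dt
    GROUP BY order_dt
"""

job_config = bigquery.QueryJobConfig(
    destination="my-gcp-project.reporting.daily_revenue",  # hypothetical target
    write_disposition="WRITE_TRUNCATE",
    query_parameters=[
        bigquery.ScalarQueryParameter("start_dt", "DATE", "2023-01-01"),
    ],
)

job = client.query(query, job_config=job_config)
job.result()  # block until the query job finishes
print("Wrote results to", job.destination.table_id)
```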
Confidential, Minneapolis, MN
Sr. Big Data Engineer / Cloud Engineer
Responsibilities:
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS using Sqoop.
- Designed and implemented an end-to-end big data platform on a Teradata appliance
- Performed ETL from multiple sources such as Kafka, NiFi, Teradata, and DB2 using Spark on Hadoop.
- Worked on Apache Spark, utilizing the Spark SQL and Streaming components to support intraday and real-time data processing
- Developed Python and Bash scripts to automate jobs and provide control flow
- Moved data from Teradata to the Hadoop cluster using TDCH/FastExport and Apache NiFi.
- Developed Python, PySpark, and Bash scripts to transform and load data across on-premise and cloud platforms.
- Built ETL data pipelines for data movement to S3 and then into Redshift.
- Wrote AWS Lambda code in Python to convert, compare, and sort nested JSON files.
- Installed and configured Apache Airflow for workflow management and created workflows in Python (see the sketch after this list)
- Wrote UDFs in PySpark on Hadoop to perform transformations and loads.
- Wrote TDCH scripts and used Apache NiFi to load data from mainframe DB2 to the Hadoop cluster.
- Worked with ORC, Avro, JSON, and Parquet file formats, creating external tables and querying on top of these files using BigQuery
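The Airflow workflow item above could look roughly like the sketch below (assuming Airflow 2.x); the DAG id, task ids, and commands are hypothetical placeholders standing in for the TDCH/NiFi ingest, PySpark transform, and S3-to-Redshift load steps.

```python
# Minimal Airflow DAG sketch; DAG id, task ids, and commands are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_ingest_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_from_teradata",
        bash_command="echo 'run the TDCH / NiFi ingest step here'",  # placeholder
    )
    transform = BashOperator(
        task_id="pyspark_transform",
        bash_command="spark-submit transform.py",  # hypothetical PySpark script
    )
    load = BashOperator(
        task_id="load_to_redshift",
        bash_command="echo 'copy curated data from S3 into Redshift here'",  # placeholder
    )
    ingest >> transform >> load
```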
Confidential, Denver, CO
Sr. Data Analyst / Big Data Engineer
Responsibilities:
- Used Hive queries in Spark SQL for analyzing and processing the data
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters
- Wrote shell scripts that run multiple Hive jobs to incrementally update the different Hive tables used to generate Tableau reports for business use
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
- Enhanced and optimized production Spark code to aggregate, group, and run data mining tasks using the Spark framework, and handled JSON data
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing
- Involved in business analysis and technical design sessions with business and technical staff to develop requirements document and ETL design specifications.
- Wrote complex SQL scripts to avoid Informatica lookups and improve performance, as the volume of the data was heavy.
- Responsible for the design, development, and data modeling of Spark SQL scripts based on functional specifications
- Designed and developed extract, transform, and load (ETL) mappings, procedures, and schedules, following the standard development lifecycle
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
- Worked closely with the Quality Assurance, Operations, and Production Support groups to devise test plans, answer questions, and resolve data or processing issues
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Databricks connectors, Spark Core, Spark SQL, GCP, Sqoop, Hive, and NoSQL databases
- Wrote Spark SQL scripts to optimize query performance
- Well-versed in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce programming
- Implemented Hive UDFs and performed tuning for better results
- Developed and tuned SQL in HiveQL, Drill, and Spark SQL
- Used Sqoop to import and export data between Oracle DB and HDFS/Hive
- Developed Spark code using Spark RDD and Spark-SQL/Streaming for faster processing of data
- Implemented partitioning, data modeling, dynamic partitions, and buckets in Hive for efficient data access (see the sketch below)
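As a rough sketch of the Spark SQL and Hive partitioning work above: read a Hive table through Spark SQL, aggregate it, and write the result back as a dynamically partitioned table. The database, table, and column names are hypothetical.

```python
# Minimal sketch; web.page_views source and analytics target tables are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("spark-sql-hive-example")
    .enableHiveSupport()
    .getOrCreate()
)

# Access an existing Hive table through Spark SQL.
daily = spark.sql("""
    SELECT event_date, page, visitor_id
    FROM web.page_views
    WHERE event_date >= '2020-01-01'
""")

# Aggregate with the DataFrame API.
summary = (
    daily.groupBy("event_date", "page")
         .agg(F.countDistinct("visitor_id").alias("unique_visitors"))
)

# Write back as a dynamically partitioned Hive table for efficient access.
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
(
    summary.write
           .mode("overwrite")
           .partitionBy("event_date")
           .format("parquet")
           .saveAsTable("analytics.page_view_summary")
)
```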
Confidential
Sr. Hadoop/Spark Developer
Responsibilities:
- Involved in the process of Cassandra data modeling and building efficient data structures.
- Developed MapReduce (YARN) jobs for cleaning, accessing, and validating the data (see the sketch after this list).
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Developed optimal strategies for distributing the web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
- Extensive experience writing Pig scripts to transform raw data from several data sources into baseline data.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts)
- Created Hive tables and worked on them using HiveQL.
- Designed and implemented static and dynamic partitioning and buckets in Hive.
- Worked on cluster coordination services through ZooKeeper.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in building applications using Maven and integrating with CI servers like Jenkins to build jobs.
- Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
- Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
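For the MapReduce web-log cleaning and validation item above, a minimal Hadoop Streaming mapper in Python might look like the sketch below; the input layout (Common Log Format with at least seven space-separated fields) is an assumption.

```python
#!/usr/bin/env python3
# Minimal Hadoop Streaming mapper sketch; assumes Common Log Format input.
# It drops malformed records and emits tab-separated (visitor_ip, path) pairs
# for a downstream reducer (or Hive load) to count.
import sys

EXPECTED_FIELDS = 7  # assumed minimum number of space-separated fields per line

for line in sys.stdin:
    parts = line.strip().split(" ")
    if len(parts) < EXPECTED_FIELDS:
        continue  # skip malformed records
    visitor_ip, path = parts[0], parts[6]
    if not path.startswith("/"):
        continue  # basic validation of the request path
    print(f"{visitor_ip}\t{path}")
```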
Confidential
Data Analyst / Hadoop Developer
Responsibilities:
- Involved in the full life cycle of the project, from design, analysis, and logical and physical architecture modeling to development, implementation, and testing.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, Zookeeper and Sqoop.
- Designed, configured, implemented, and monitored the Kafka cluster and connectors.
- Implemented a proof of concept (PoC) using Kafka, Storm, and HBase for processing streaming data (see the sketch after this list).
- Used Sqoop to import data into HDFS and Hive from multiple data systems.
- Developed complex queries using HIVE and IMPALA.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Helped with the sizing and performance tuning of the Cassandra cluster.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs.
- Used Hive on Tez to transform the data per business requirements for batch processing.
- Developed multiple PoCs using Spark, deployed them on the YARN cluster, and compared the performance of Spark and SQL.
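The Kafka/Storm/HBase streaming proof of concept above can be sketched, on the Kafka side, with a small Python consumer; the topic, broker address, and event fields are hypothetical, and the kafka-python client is assumed.

```python
# Minimal sketch using the kafka-python client; topic, broker, and fields are hypothetical.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "clickstream-events",                  # hypothetical topic
    bootstrap_servers=["broker1:9092"],    # hypothetical broker
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # A real PoC would hand the event to Storm or persist it to HBase here.
    print(event.get("user_id"), event.get("page"))
```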
EDUCATION:
- B.S. in Computer Science - Punjab Technical University ( )