Senior Big Data Engineer Resume
Boise, ID
SUMMARY
- Over 8 years of diversified experience in software design and development, including work as a Big Data Engineer solving business use cases for several clients, with expertise in backend applications.
- Solid experience developing Spark applications for highly scalable data transformations using RDDs, DataFrames, Spark SQL, and Spark Streaming.
- Hands-on experience with Kafka and Flume for loading log data from multiple sources directly into HDFS.
- Experience using AWS cloud services such as EMR, S3, EC2, Redshift, and Athena.
- Strong expertise in building scalable applications using various programming languages (Java, Scala, and Python).
- Proficient in Core Java concepts such as multi-threading, collections, and exception handling.
- Experience developing applications with Model View Controller (MVC2) architecture using the Spring Framework and J2EE design patterns.
- Strong experience troubleshooting Spark failures and fine-tuning long running Spark applications.
- Strong experience working with various Spark configurations such as broadcast thresholds, increased shuffle partitions, caching, and repartitioning to improve job performance (see the configuration sketch at the end of this summary).
- Worked on Spark Streaming and Structured Streaming with Kafka for real-time data processing.
- Strong experience operating in Amazon Web Services (AWS) cloud environments such as EC2 and S3.
- Continuous Delivery pipeline deployment experience with Maven, Ant, Jenkins, and AWS.
- Strong understanding of distributed systems design, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
- Experience in MVC and microservices architecture with Spring Boot, Docker, and Docker Swarm.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Solid experience using various file formats such as CSV, TSV, Parquet, ORC, JSON, and Avro.
- Experienced working with various Hadoop Distributions (Cloudera, Hortonworks, Amazon EMR) to fully implement and leverage various Hadoop services.
- Well versed in writing complex Hive queries using analytical functions.
- Knowledge of writing custom UDFs in Hive to support custom business requirements.
- Experienced in working with structured data using HiveQL, join operations, writing custom UDFs and optimizing Hive queries.
- Expertise in using Docker and setting up ELK with Docker and Docker-Compose. Actively involved in deployments on Docker using Kubernetes.
- Configured Spark Streaming to receive real-time data from Kafka, store the stream data to HDFS, and process it using Spark and Scala.
- Strong experience working with databases such as Oracle, MySQL, Teradata, and Netezza, and proficiency in writing complex SQL queries.
- Experience in version control tools like SVN, GitHub and CVS.
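To illustrate the Spark tuning levers referenced above, here is a minimal, hedged PySpark sketch; the table names, columns, and S3 paths are hypothetical placeholders rather than details from any engagement.

```python
# Minimal PySpark sketch of the tuning levers above; names and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    # Raise the broadcast-join threshold so small dimension tables are broadcast (~50 MB).
    .config("spark.sql.autoBroadcastJoinThreshold", 50 * 1024 * 1024)
    # Increase shuffle partitions for wide aggregations over large inputs.
    .config("spark.sql.shuffle.partitions", 400)
    .getOrCreate()
)

sales = spark.read.parquet("s3://example-bucket/sales/")          # hypothetical path
dim_store = spark.read.parquet("s3://example-bucket/dim_store/")  # hypothetical path

# Cache a DataFrame that several downstream actions reuse.
sales = sales.cache()

# Repartition on the join key to reduce skew, then broadcast the small side.
joined = (
    sales.repartition(400, "store_id")
    .join(F.broadcast(dim_store), "store_id")
)

joined.groupBy("store_id").agg(F.sum("amount").alias("total_amount")).show()
```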
TECHNICAL SKILLS
Operating Systems: Unix, Linux, Windows
Programming Languages: Java, Python 3, Scala 2.12.8, PySpark, C, C++
Hadoop Eco System: Hadoop, MapReduce, Spark, HDFS, Sqoop, YARN, Oozie, Hive, Impala, Apache Flume, Apache Storm, Apache Airflow, HBase
Cluster Management & Monitoring: CDH, Hortonworks Ambari
Data Bases: MySQL, SQL Server, Oracle 12c, MS Access
NoSQL Data Bases: MongoDB, Cassandra, HBase, KairosDB
Workflow Management Tools: Oozie, Apache Airflow
Visualization & ETL tools: Tableau, BananaUI, D3.js, Informatica, Talend
Cloud Technologies: Azure, AWS
IDEs: Eclipse, Jupyter Notebook, Spyder, PyCharm, IntelliJ
Version Control Systems: Git, SVN
PROFESSIONAL EXPERIENCE
Confidential, Boise, ID
Senior Big Data Engineer
Responsibilities:
- Work in a fast-paced agile development environment to quickly analyze, develop, and test potential use cases for the business.
- Developed data pipelines using Spark, Hive, Pig, Python, Impala, and HBase to ingest customer data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Designed SSIS packages to extract, transform, and load (ETL) existing data into SQL Server from different environments for the SSAS (OLAP) cubes.
- Created a data model that correlates all the metrics and produces valuable output.
- Worked on the tuning of SQL Queries to bring down run time by working on Indexes and Execution Plan.
- Worked on ETL migration services by developing and deploying AWS Lambda functions for a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena.
- Developed a detailed project plan and helped manage the data conversion migration from the legacy system to the target Snowflake database.
- Created data pipelines for business reports and processed streaming data using an on-premises Kafka cluster.
- Processed data from Kafka topics and surfaced the real-time streams in dashboards.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Developed Spark programs in Python and applied functional programming principles to process complex structured data sets.
- Developed Python-based API (RESTful Web Service) to track revenue and perform revenue analysis.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, PostgreSQL, DataFrames, OpenShift, Talend, and pair RDDs.
- Involved in integrating the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
- Performed data preprocessing and feature engineering for further predictive analytics using Python Pandas.
- Created Spark code to process streaming data from the Kafka cluster and load it into a staging area for processing (see the streaming sketch at the end of this section).
- Migrated data from on-premises systems to AWS storage buckets.
- Developed a Python script to call REST APIs and extract data to AWS S3.
- Created Tables, Stored Procedures, and extracted data using T-SQL for business users whenever required.
- Used PySpark and Pandas to calculate the moving average and RSI score of stocks and loaded the results into the data warehouse.
- Designed, developed, and tested dimensional data models using Star and Snowflake schema methodologies under the Kimball method.
- Designed AWS CloudFormation templates to create VPCs, subnets, and NAT gateways to ensure successful deployment of web applications and database templates.
- Created a Lambda deployment function and configured it to receive events from S3 buckets.
- Selected and generated data into CSV files, stored them in AWS S3 using EC2, and then structured and loaded the data into AWS Redshift.
- Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
- Developed a PySpark script to encrypt raw data using hashing algorithms.
- Created and formatted Cross-Tab, Conditional, Drill-down, Top N, Summary, Form, OLAP, sub-reports, ad-hoc, parameterized, interactive, and custom reports using SQL Server Reporting Services (SSRS).
- Created action filters, parameters, and calculated sets for preparing dashboards and worksheets using Power BI.
Environment: Spark, Python, Scala, Kafka, AWS, EC2, Redshift, S3 Buckets, ETL, Tableau, Presto, Hive/Hadoop, Snowflake, AWS Data Pipeline, IBM Cognos 10.1, Cognos Report Studio 10.1, Cognos Connection, Cognos Office Connection, DataStage and QualityStage, Oracle, SQL Server, Shell Scripting, Git
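A hedged sketch of the Kafka-to-staging-area streaming ingest described in this section; the broker address, topic name, event schema, and S3 paths are illustrative assumptions, and the job assumes the spark-sql-kafka connector package is on the classpath.

```python
# Illustrative PySpark Structured Streaming job: Kafka topic -> Parquet staging area.
# Broker, topic, schema, and S3 paths below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-staging-ingest").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
    .option("subscribe", "customer-events")              # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the payload as bytes in the `value` column; parse it as JSON.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Land the parsed stream as Parquet in a staging area for downstream batch jobs.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3://example-bucket/staging/customer-events/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/customer-events/")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```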
Confidential, Rochester, MN
Big Data Engineer
Responsibilities:
- Installed Kafka producers on different servers and scheduled them to produce data every 10 seconds.
- Implemented data quality checks in the ETL tool Talend; good knowledge of data warehousing.
- Developed Apache Spark applications to process data from various streaming sources.
- Good exposure to MapReduce programming using Java, Pig Latin scripting, distributed applications, and HDFS.
- Used Airflow to monitor and schedule workflows.
- Configured Spark Streaming to consume ongoing data from Kafka and store the stream data to HDFS.
- Worked with PowerShell and UNIX scripts for file transfer, emailing and other file related tasks.
- Created Spark vectorized pandas user-defined functions for data manipulation and wrangling (see the pandas UDF sketch at the end of this section).
- Involved in creating an HDInsight cluster in the Microsoft Azure Portal and created Event Hubs and Azure SQL databases.
- Took proof-of-concept project ideas from the business, then led, developed, and created production pipelines that deliver business value using Azure Data Factory.
- Strong knowledge of the architecture and components of Tealeaf; efficient in working with Spark Core and Spark SQL. Designed and developed RDD seeds using Scala and Cascading, and streamed data to Spark Streaming using Kafka.
- Wrote PySpark and Spark SQL transformations in Azure Databricks to perform complex transformations for business rule implementation.
- Developed Hive UDFs to incorporate external business logic into Hive scripts and developed join data set scripts using Hive join operations.
- Created self-service reporting in Azure Data Lake Store Gen2 using an ELT approach.
- Built a real-time pipeline for streaming data using Event Hubs/Microsoft Azure Queue and Spark Streaming.
- Extracted and updated the data into HDFS using Sqoop import and export.
- Created and maintained optimal data pipeline architecture in Microsoft Azure using Data Factory and Azure Databricks.
- Delivered denormalized data for Power BI consumers for modeling and visualization from the produced layer in the data lake.
- Developed various Oracle SQL scripts, PL/SQL packages, procedures, functions, and supporting Java code.
- Worked on a clustered Hadoop for Windows Azure using HDInsight and Hortonworks Data Platform for Windows.
- Set up Azure infrastructure such as storage accounts, integration runtimes, service principal IDs, and app registrations to enable scalable and optimized support for business users' analytical requirements in Azure.
- Implemented Kafka producers, created custom partitions, configured brokers, and implemented high-level consumers to build the data platform.
- Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, PySpark, Impala, Tealeaf, pair RDDs, DevOps, and Spark on YARN.
- Exposed transformed data in the Azure Databricks platform in Parquet format for efficient data storage.
- Created Data Factory pipelines that bulk copy multiple tables at once from relational databases to Azure Data Lake Gen2.
- Developed and designed data integration and migration solutions in Azure.
- Responsible for managing data coming from different sources through Kafka.
- Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources
- Implemented continuous integration/continuous delivery (CI/CD) best practices using Azure DevOps, ensuring code versioning.
Environment: Hadoop, Spark, MapReduce, Kafka, Docker, Jenkins, Scala, Java, Azure Data Lake Gen2, Azure Data Factory, PySpark, Databricks, Azure DevOps, Agile, Power BI, Python, R, PL/SQL, Oracle 12c, SQL, NoSQL, HBase, Scaled Agile team environment
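A minimal sketch of a vectorized (pandas) UDF of the kind mentioned above; the z-score normalization and column names are illustrative assumptions, not the business logic actually implemented.

```python
# Illustrative vectorized (pandas) UDF in PySpark; logic and column names are hypothetical.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("pandas-udf-sketch").getOrCreate()

@pandas_udf(DoubleType())
def zscore(amount: pd.Series) -> pd.Series:
    # Runs on whole Arrow batches rather than row-at-a-time Python calls.
    # Note: mean/std are computed per batch, which is fine for illustration.
    return (amount - amount.mean()) / amount.std()

df = spark.createDataFrame([(1, 10.0), (2, 12.0), (3, 30.0)], ["id", "amount"])
df.withColumn("amount_z", zscore(F.col("amount"))).show()
```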
Confidential, Boston, MA
Data Engineer
Responsibilities:
- Managed security groups on AWS, focusing on high availability, fault tolerance, and auto scaling using Terraform templates, along with continuous integration and continuous deployment using AWS Lambda and AWS CodePipeline.
- Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data (see the validation sketch at the end of this section). Created various types of data visualizations using Python and Tableau.
- Wrote various data normalization jobs for new data ingested into Redshift.
- Created various complex SSIS/ETL packages to Extract, Transform and Load data
- Used ZooKeeper to store offsets of messages consumed for a specific topic and partition by a specific consumer group in Kafka.
- Used Kafka features such as distribution, partitioning, and the replicated commit log service for messaging systems by maintaining feeds, and created applications that monitor consumer lag within Apache Kafka clusters.
- Used Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology to get the job done.
- Migrated the on-premises database structure to the Confidential Redshift data warehouse.
- Was responsible for ETL and data validation using SQL Server Integration Services.
- Worked on big data with AWS cloud services: EC2, S3, EMR, and DynamoDB.
- Optimizing and tuning the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
- Involved in forward engineering the logical models to generate the physical model using Erwin, with subsequent deployment to the Enterprise Data Warehouse.
- Defined and deployed monitoring, metrics, and logging systems on AWS
- Developed SSRS reports, SSIS packages to Extract, Transform and Load data from various source systems
- Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced referential integrity constraints and created logical and physical models using Erwin.
- Created ad-hoc queries and reports to support business decisions using SQL Server Reporting Services (SSRS).
- Analyzed existing application programs and tuned SQL queries using execution plans, Query Analyzer, SQL Profiler, and Database Engine Tuning Advisor to enhance performance.
Environment: Informatica, RDS, NoSQL, Snowflake Schema, Apache Kafka, Python, ZooKeeper, SQL Server, Erwin, Oracle, Redshift, MySQL, PostgreSQL.
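A hedged sketch, assuming Hive-backed tables queried through Spark SQL, of the kind of post-load data-quality validation described above; the table name (staging.orders) and the checked column are hypothetical.

```python
# Illustrative post-load data-quality checks via Spark SQL; table and column names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dq-checks").enableHiveSupport().getOrCreate()

table = "staging.orders"  # hypothetical Hive table

# 1. Row count: fail the load if the table arrived empty.
row_count = spark.sql(f"SELECT COUNT(*) AS c FROM {table}").first()["c"]
assert row_count > 0, f"{table} is empty after load"

# 2. Null check on a required key column.
null_keys = spark.sql(
    f"SELECT COUNT(*) AS c FROM {table} WHERE order_id IS NULL"
).first()["c"]
assert null_keys == 0, f"{table} has {null_keys} rows with NULL order_id"

# 3. Duplicate check on the primary key.
dupes = spark.sql(
    f"SELECT COUNT(*) AS c FROM ("
    f"  SELECT order_id FROM {table} GROUP BY order_id HAVING COUNT(*) > 1"
    f") t"
).first()["c"]
assert dupes == 0, f"{table} has {dupes} duplicate order_id values"

print(f"Data-quality checks passed for {table}: {row_count} rows")
```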
Confidential
Hadoop Developer/ Data Engineer
Responsibilities:
- Designed, developed and did maintenance of data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems.
- Worked extensively on AWS components such as Airflow, Elastic MapReduce (EMR), Athena, and Snowflake.
- Developed Python scripts to find vulnerabilities in SQL queries through SQL injection testing.
- Loaded data from the UNIX file system to HDFS and wrote Hive user-defined functions.
- Used Sqoop to load data from DB2 to HBase for faster querying and performance optimization.
- Developed Hive scripts for implementing dynamic partitions.
- Experience writing Sqoop scripts for importing and exporting data between RDBMS and HDFS.
- Developed Python code for tasks, dependencies, SLA watchers, and time sensors for each job, using Airflow for workflow management and automation (see the DAG sketch at the end of this section).
- Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Developed Sqoop scripts for loading data into HDFS from DB2 and pre-processed it with Pig.
- Automated the tasks of loading the data into HDFS and pre-processing with Pig by developing workflows using Oozie
- Developed MapReduce jobs in both Pig and Hive for data cleaning and pre-processing.
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using a testing library.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Worked on developing ETL workflows on the ingested data using Scala, processing it in HDFS and HBase using Oozie.
- Written ETL jobs to visualize the data and generate reports from MySQL database using DataStage.
Environment: Hadoop, HDFS, Hive, Pig, Flume, MapReduce, AWS, ETL Workflows, HBase, Python, Sqoop, Oozie, DataStage, Linux, Relational Databases, SQL Server, DB2
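A hedged Airflow sketch of the task/dependency/SLA/time-sensor pattern described above; the DAG id, schedule, Sqoop connection string, and script paths are illustrative assumptions (Airflow 2.x import paths).

```python
# Illustrative Airflow DAG: a time sensor gating a Sqoop import with an SLA,
# followed by a Hive load. Names, paths, and the schedule are placeholders.
from datetime import datetime, time, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.time_sensor import TimeSensor

with DAG(
    dag_id="nightly_ingest",                 # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",           # nightly at 02:00
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
) as dag:

    # Wait until the upstream extract window is expected to be complete.
    wait_for_window = TimeSensor(task_id="wait_for_window", target_time=time(2, 30))

    # Import task with an SLA so late runs are reported by the SLA watcher.
    sqoop_import = BashOperator(
        task_id="sqoop_import",
        bash_command=(
            "sqoop import --connect jdbc:db2://db2-host/SALES "   # placeholder JDBC URL
            "--table ORDERS --target-dir /data/raw/orders"
        ),
        sla=timedelta(hours=1),
    )

    # Load the imported files into partitioned Hive tables.
    hive_load = BashOperator(
        task_id="hive_load",
        bash_command="hive -f /scripts/load_orders_partitions.hql",  # placeholder script
    )

    wait_for_window >> sqoop_import >> hive_load
```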
Confidential
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs.
- Created Hive tables and worked on them using HiveQL. Experienced in defining job flows.
- Imported and exported data between the Oracle database and HDFS on a regular basis using Sqoop.
- Version Controlled using SVN.
- Installed and configured Pig and wrote Pig Latin scripts.
- Developed applications in the Eclipse IDE; experience developing Spring Boot applications for transformations.
- The custom FileSystem plugin allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
- Wrote MapReduce jobs using Pig Latin (an illustrative map/reduce sketch follows this section's environment list). Involved in ETL, data integration, and migration.
- Built and deployed WAR files on the WebSphere application server.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Hadoop distribution of Cloudera, Pig, HBase, Oracle, Toad, MS Office, MS Excel.
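The MapReduce jobs in this role were written in Pig Latin; purely as an illustration of the underlying map/reduce cleaning-and-counting pattern, here is a hedged Hadoop Streaming pair in Python with a hypothetical tab-delimited log layout.

```python
# mapper.py -- Hadoop Streaming mapper: emit (page, 1) for well-formed log
# lines and drop malformed records (the 3-column layout is hypothetical).
import sys

for line in sys.stdin:
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 3:      # data cleaning: skip malformed rows
        continue
    page = parts[2].strip()
    if page:
        print(f"{page}\t1")
```

```python
# reducer.py -- sums the counts per page; Hadoop Streaming delivers keys sorted.
import sys

current_key, total = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{total}")
        current_key, total = key, 0
    total += int(value)
if current_key is not None:
    print(f"{current_key}\t{total}")
```

The pair would be submitted with the standard hadoop-streaming jar (passing -mapper mapper.py and -reducer reducer.py), with the input and output HDFS paths supplied at run time.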