Sr. Data Engineer Resume
San Francisco, CA
SUMMARY
- Overall 5 years of experience in the IT industry, playing a major role in implementing, developing, and maintaining various web-based applications using Java and the Big Data ecosystem.
- 3+ years of strong end-to-end experience in Hadoop, Spark, and cloud development using a range of Big Data tools.
PROFESSIONAL EXPERIENCE
Confidential, SAN FRANCISCO, CA
Sr. Data Engineer
Responsibilities:
- Created Confidential Glue tables on existing CSV data using AWS Glue crawlers.
- Created Glue ETL jobs to convert CSV data to Parquet format.
- Triggered Glue jobs using AWS Lambda.
- Created Lambda functions to trigger the ETL jobs (see the sketch after this list).
- Proficient in shell scripting, awk, sed, grep, and Perl.
- Used CloudFormation templates to trigger the Lambda functions.
- Modified the Glue-generated ETL scripts per client requirements using Scala.
- Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment.
- Wrote advanced SQL queries on views used in the metric reporting tools.
- Wrote shell scripts (Bourne, Korn, C) to automate repetitive tasks, with additional hands-on scripting in Perl, Python, and Ruby.
- Used AWS Lambda, DynamoDB, and CloudWatch.
- Implemented build and deployment pipelines in Jenkins.
- Experience with CDC (change data capture) tools for moving data from source systems.
- Experience with AWS cloud services such as EC2, EMR, RDS, and Redshift.
- Experience with relational SQL and NoSQL databases.
- Experience working with data, data warehousing, and BI.
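A minimal sketch of the Lambda-triggered Glue job described above, assuming a hypothetical Glue job name and source prefix (Python, boto3):

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """Kick off the CSV-to-Parquet Glue ETL job when this function is
    invoked (for example by an S3 event or a CloudFormation-wired trigger)."""
    run = glue.start_job_run(
        JobName="csv_to_parquet_job",  # hypothetical Glue job name
        Arguments={"--source_prefix": "s3://my-bucket/raw/csv/"},  # hypothetical argument
    )
    return {"JobRunId": run["JobRunId"]}
```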
ENVIRONMENT: Hadoop, Hive, HDFS, AWS, Glue, EMR, Spark Streaming, Spark SQL, Scala, Java, Lambda, Step Functions, Jenkins.
Confidential, DENVER, CO
Data Engineer
Responsibilities:
- Implemented workflows to process around 400 messages per second and push the messages to Document DB as well as Event Hubs.
- Built a custom message producer that can generate about 4,000 messages per second for scalability testing.
- Implemented call-back and notification architectures for real-time data.
- Implemented Spark Streaming in Scala to process JSON messages and push them to the Kafka topic (a sketch follows this list).
- Developed SQL reports using advanced SQL queries in the OLTP system and WebFOCUS.
- Wrote auto-scaling functions that consume data from Azure Service Bus or Azure Event Hub and send it to Document DB.
- Created real-time streaming dashboards in Power BI, using Stream Analytics to push datasets to Power BI.
- Wrote Perl, Python, and shell scripts.
- Developed a custom message consumer to read data from the Kafka topic and push the messages to Service Bus and Event Hub (Azure components).
- Wrote a Spark application to capture the change feed from Document DB using the Java API and write the updates to the new Document DB.
- Created custom dashboards in Azure using Application Insights and the Application Insights query language to process the metrics sent to App Insights.
- Implemented zero-downtime deployment for all production pipelines in Azure.
- Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment.
- Implemented build and deployment pipelines in Jenkins.
- Used custom receivers, socket streams, file streams, and directory streams in Spark Streaming.
- Used AWS Lambda, Kinesis, DynamoDB, and CloudWatch.
- Used App Insights, Document DB, Service Bus, Azure Data Lake Store, Azure Blob Storage, Event Hub, and Azure Functions.
- Used Python to run the Ansible playbooks that deploy the Logic Apps to Azure.
- Developed Hadoop Streaming jobs in Python to integrate applications with Python API support.
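A minimal sketch of the JSON-to-Kafka streaming step described above; the original implementation was Spark Streaming in Scala, while this illustration uses PySpark Structured Streaming with hypothetical broker, topic, and schema names:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, struct, to_json
from pyspark.sql.types import StringType, StructType

# Requires the spark-sql-kafka connector package on the classpath.
spark = SparkSession.builder.appName("json-to-kafka").getOrCreate()

# Hypothetical schema for the incoming JSON messages
schema = StructType().add("deviceId", StringType()).add("payload", StringType())

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
       .option("subscribe", "raw-events")                  # hypothetical source topic
       .load())

# Parse the JSON value and flatten it into columns
parsed = (raw
          .select(from_json(col("value").cast("string"), schema).alias("msg"))
          .select("msg.*"))

# Re-serialize and publish the processed messages to the downstream topic
query = (parsed
         .select(to_json(struct(col("deviceId"), col("payload"))).alias("value"))
         .writeStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("topic", "processed-events")              # hypothetical target topic
         .option("checkpointLocation", "/tmp/checkpoints/json-to-kafka")
         .start())

query.awaitTermination()
```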
ENVIRONMENT: Hadoop, Hive, HDFS, Azure, AWS, Spark Streaming, Spark SQL, Scala, Python, Java, web servers, Maven, Jenkins, Ansible.
Confidential, SAN DIEGO, CA
Big Data Engineer
Responsibilities:
- Developed a daily process to do incremental imports of data from DB2 and Oracle into HDFS using Sqoop.
- Extensively used Pig for data transformation and data cleansing.
- Converted Hive scripts into Spark using Scala and optimized the resulting Spark jobs (see the sketch after this list).
- Wrote stored procedures to transform data in Microsoft SQL Server.
- Scheduled multiple Spark jobs with the Oozie scheduler.
- Developed an error-logging script to load the ingestion logs into an HBase table.
- Worked on CI/CD deployments using Jenkins.
- Ingested XML files captured from RabbitMQ and stored them in an HBase table.
- Developed complex Hive queries to analyze data.
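A minimal sketch of a Hive-to-Spark conversion of the kind described above; the original conversions were written in Scala, while this illustration uses PySpark with hypothetical table and column names:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-spark")
         .enableHiveSupport()   # lets Spark SQL read the existing Hive metastore
         .getOrCreate())

# The same aggregation a Hive script would produce, run through Spark SQL
# (staging.ingestion_events and its columns are hypothetical).
daily_counts = spark.sql("""
    SELECT load_date, COUNT(*) AS record_count
    FROM staging.ingestion_events
    GROUP BY load_date
""")

# Persist the result as a Hive-managed table for downstream reporting
daily_counts.write.mode("overwrite").saveAsTable("reporting.daily_ingestion_counts")
```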
ENVIRONMENT: MapR, Spark, Hive, Sqoop, HBase, Oozie, Pig, Scala, UNIX, Agile, Code Hub, Splunk, PySpark.
Confidential
Junior Data Engineer
Responsibilities:
- Developed Hive queries for the analytics required for report generation.
- Developed Pig scripts to process data coming from different sources.
- Performed data cleaning using Pig scripts and stored the results in HDFS (see the sketch after this list).
- Developed Pig user-defined functions (UDFs) in Java for external functions.
- Developed data requirements, performed database queries to identify test data, and created data procedures with expected results.
- Planned, coordinated, and managed internal process documentation and presentations describing the identified process improvements, along with the workflows and diagrams associated with the process flow.
- Scheduled jobs in Oozie to automate regularly executed processes.
- Developed Hive UDFs in Java.
- Used Pig joins to handle data across different data sets.
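A minimal sketch of the kind of record-level data cleaning described above; the original cleansing was done in Pig, while this illustration is a Python Hadoop Streaming mapper with a hypothetical field layout:

```python
#!/usr/bin/env python
"""Hadoop Streaming mapper that drops malformed rows and normalizes values
(the five-field, tab-delimited layout here is hypothetical)."""
import sys

EXPECTED_FIELDS = 5

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    # Skip rows that do not have the expected number of fields
    if len(fields) != EXPECTED_FIELDS:
        continue
    # Trim whitespace and replace empty values with a NULL marker
    cleaned = [field.strip() or r"\N" for field in fields]
    print("\t".join(cleaned))
```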
ENVIRONMENT: Hadoop, Hive, Pig, Sqoop, HBase, MapReduce, Java, Python