
Sr. Data Engineer Resume


San Francisco, CA

SUMMARY

  • Overall 5 years of experience in the IT industry, playing a major role in implementing, developing, and maintaining various web-based applications using Java and the Big Data ecosystem.
  • 3+ years of strong end-to-end experience in Hadoop, Spark, and cloud development using a range of Big Data tools.

PROFESSIONAL EXPERIENCE

Confidential, SAN FRANCISCO, CA

Sr. Data Engineer

Responsibilities:

  • Created Confidential Glue tables on existing CSV data using AWS Glue crawlers.
  • Created Glue ETL jobs to convert CSV data to Parquet format (see the Glue sketch after this list).
  • Created AWS Lambda functions to trigger the Glue ETL jobs (see the Lambda sketch after this list).
  • Proficient in shell scripting, awk, sed, grep, and Perl.
  • Used CloudFormation templates to trigger the Lambda functions.
  • Modified the Glue-generated ETL scripts in Scala per client requirements.
  • Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment.
  • Wrote advanced SQL queries on views that were used in the metric reporting tools.
  • Wrote shell scripts (Bourne, Korn, C shell) to automate repetitive tasks, with additional hands-on scripting in Perl, Python, and Ruby.
  • Used AWS Lambda, DynamoDB, and CloudWatch.
  • Experienced in implementing pipelines in Jenkins.
  • Experience with CDC (change data capture) tools to move data from on-premises data sources.
  • Experience with AWS cloud services such as EC2, EMR, RDS, and Redshift.
  • Experience with relational SQL and NoSQL databases.
  • Experience working with data, data warehousing, and BI.
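
For illustration, a minimal sketch of a Glue ETL job that converts a crawled CSV table to Parquet. The actual scripts were modified in Scala; this sketch uses Glue's PySpark flavor, and the database, table, and S3 path names are placeholders, not values from this resume.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job bootstrap
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the crawled CSV table from the Glue Data Catalog
# (database and table names are illustrative)
csv_dyf = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="raw_csv_table")

# Write the same records back to S3 in Parquet format
glue_context.write_dynamic_frame.from_options(
    frame=csv_dyf,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/parquet/"},
    format="parquet")

job.commit()
```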
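Likewise, a minimal sketch of a Lambda function that triggers the Glue ETL job via boto3; the job name is hypothetical.

```python
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # Kick off the Glue ETL job; the job name is illustrative only
    run = glue.start_job_run(JobName="csv-to-parquet-job")
    return {"JobRunId": run["JobRunId"]}
```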

ENVIRONMENT: Hadoop, Hive, HDFS, AWS, Glue, EMR, Spark Streaming, Spark SQL, Scala, Java, Lambda, Step Functions, Jenkins.

Confidential, DENVER, CO

Data Engineer

Responsibilities:

  • Implemented workflows to process around 400 messages per second and push the messages to the Document DB as well as Event Hubs.
  • Built a custom message producer which can produce about 4000 messages per second for scalability testing.
  • Implemented call-back architecture and notification architecture for real time data.
  • Implemented Spark Streaming in Scala to process the JSON messages and push them to the Kafka topic (see the streaming sketch after this list).
  • Developed SQL reports using advanced SQL queries against the OLTP system and WebFOCUS.
  • Wrote auto-scaling Azure Functions that consume data from Azure Service Bus or Azure Event Hub and send it to Document DB (see the consumer sketch after this list).
  • Created real-time streaming dashboards in Power BI, using Stream Analytics to push the dataset to Power BI.
  • Wrote Perl, Python, and shell scripts.
  • Developed a custom message consumer to consume data from the Kafka producer and push the messages to Service Bus and Event Hub (Azure components).
  • Wrote a Spark application to capture the change feed from Document DB using the Java API and write the updates to the new Document DB.
  • Created custom dashboards in Azure using Application Insights and the Application Insights query language to process metrics sent to App Insights.
  • Implemented zero-downtime deployment for the entire set of production pipelines in Azure.
  • Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment.
  • Experienced in implementing pipelines in Jenkins.
  • Used custom receivers, socket streams, file streams, and directory streams in Spark Streaming.
  • Used AWS Lambda, Kinesis, DynamoDB, and CloudWatch.
  • Used App Insights, Document DB, Service Bus, Azure Data Lake Store, Azure Blob Storage, Event Hub, and Azure Functions.
  • Used Python to run the Ansible playbook that deploys the Logic Apps to Azure.
  • Developed Hadoop Streaming jobs in Python to integrate applications with Python API support.
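
A rough sketch of the JSON-to-Kafka streaming flow referenced above. The production code was Scala (DStream-based, with custom receivers and file/directory streams); this PySpark Structured Streaming version is illustrative only, and the schema, input path, broker, and topic names are assumptions. It also assumes the spark-sql-kafka connector is available.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, to_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("json-to-kafka").getOrCreate()

# Illustrative message schema; the real fields are not specified here
schema = StructType([
    StructField("id", StringType()),
    StructField("payload", StringType()),
])

# Read raw JSON lines from a directory stream
raw = spark.readStream.format("text").load("/data/incoming/")
parsed = raw.select(from_json(col("value"), schema).alias("msg"))

# Re-serialize each record and push it to a Kafka topic
query = (parsed
         .select(to_json(col("msg")).alias("value"))
         .writeStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("topic", "processed-messages")
         .option("checkpointLocation", "/tmp/checkpoints/json-to-kafka")
         .start())

query.awaitTermination()
```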
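And a minimal sketch of a consumer that reads from Event Hub and writes to Document DB (Cosmos DB). The production version ran as auto-scaling Azure Functions; the connection strings, database/container names, and the assumption that each message carries an "id" field are placeholders.

```python
import json

from azure.cosmos import CosmosClient
from azure.eventhub import EventHubConsumerClient

# Placeholder connection details
consumer = EventHubConsumerClient.from_connection_string(
    conn_str="<event-hub-connection-string>",
    consumer_group="$Default",
    eventhub_name="telemetry")
cosmos = CosmosClient("<cosmos-endpoint>", credential="<cosmos-key>")
container = cosmos.get_database_client("appdb").get_container_client("messages")

def on_event(partition_context, event):
    # Each message is assumed to be a JSON document with an "id" field
    container.upsert_item(json.loads(event.body_as_str()))
    partition_context.update_checkpoint(event)

with consumer:
    # "-1" starts reading from the beginning of each partition
    consumer.receive(on_event=on_event, starting_position="-1")
```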

ENVIRONMENT: Hadoop, Hive, HDFS, Azure, AWS, Spark Streaming, Spark SQL, Scala, Python, Java, web servers, Maven, Jenkins, Ansible.

Confidential, SAN DIEGO, CA

Big Data Engineer

Responsibilities:

  • Developed a daily process to do incremental imports of data from DB2 and Oracle into HDFS using Sqoop.
  • Extensively used Pig for data transformation and data cleansing.
  • Converted Hive scripts into Spark using Scala and optimized the Spark jobs (see the conversion sketch after this list).
  • Wrote stored procedures to transform the data in Microsoft SQL Server.
  • Scheduled multiple Spark jobs with the Oozie scheduler.
  • Developed an error-logging script to load the ingestion logs into an HBase table.
  • Worked on CI/CD deployments using Jenkins.
  • Ingested XML files captured from RabbitMQ and stored them in an HBase table.
  • Developed complex Hive queries to analyze data.
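
A minimal sketch of the Hive-to-Spark conversion referenced above. The original work was done in Scala; this PySpark version is illustrative, and the database, table, and column names are placeholders.

```python
from pyspark.sql import SparkSession

# Hive support lets Spark read the same warehouse tables the Hive scripts used
spark = (SparkSession.builder
         .appName("hive-to-spark")
         .enableHiveSupport()
         .getOrCreate())

# Stand-in for a typical Hive aggregation being migrated to Spark SQL
daily_counts = spark.sql("""
    SELECT event_date, COUNT(*) AS event_count
    FROM staging.events
    GROUP BY event_date
""")

# Persist the result as a managed Parquet table
daily_counts.write.mode("overwrite").format("parquet") \
    .saveAsTable("analytics.daily_event_counts")
```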

ENVIRONMENT: MapR, Spark, Hive, Sqoop, HBase, Oozie, Pig, Scala, UNIX, Agile, Code Hub, Splunk, PySpark.

Confidential

Junior Data Engineer

Responsibilities:

  • Developed Hive queries for the required analytics for report generation.
  • Involved in developing Pig scripts to process data coming from different sources.
  • Worked on data cleaning using Pig scripts and stored the cleaned data in HDFS (see the cleaning sketch after this list).
  • Wrote Pig user-defined functions (UDFs) in Java for external functions.
  • Developed data requirements, performed database queries to identify test data, and created data procedures with expected results.
  • Planned, coordinated, and managed internal process documentation and presentations describing the identified process improvements along with the workflow diagrams associated with the process flow.
  • Scheduled jobs in Oozie to automate regularly executing processes.
  • Wrote Hive UDFs in Java.
  • Expertise in Pig joins to handle data across different data sets.
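
For illustration, the kind of record cleaning the Pig scripts performed, sketched here as a Python Hadoop Streaming mapper (Python is used for all sketches in this document; the field count and cleaning rules are assumptions, not taken from the project).

```python
#!/usr/bin/env python
"""Hadoop Streaming mapper sketching the record cleaning done in Pig.
The tab-delimited layout and the five-column rule are illustrative only."""
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    # Drop malformed rows (wrong column count) and rows with an empty key
    if len(fields) != 5 or not fields[0].strip():
        continue
    # Normalize whitespace and emit the cleaned record
    print("\t".join(f.strip() for f in fields))
```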

ENVIRONMENT: Hadoop, Hive, Pig, Sqoop, HBase, MapReduce, Java, Python.
