
Data Engineer Resume


Wilmington, DE

SUMMARY

  • Experience in design and development of real-time streaming applications using Spark Streaming and Scala.
  • Over 7 years of hands-on experience in design, development, testing, implementation, maintenance, and enhancement of various IT projects, with experience in Java/J2EE and Big Data, implementing end-to-end Hadoop solutions.
  • Extensive experience in installing, configuring, and using ecosystem components like Hadoop MapReduce, HDFS, Sqoop, YARN, Pig, ZooKeeper, Hive, Impala, and Spark.
  • Good experience in writing Spark applications using Python, Scala, and SQL.
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster processing of data (a minimal sketch follows this list).
  • Performed ETL processing with Spark using Scala for processing and validation of raw data logs.
  • Expertise in publish/subscribe event streaming systems such as Apache Kafka, MapR Event Streams, and Flink.
  • Worked on Oracle, DB2, and MySQL databases and NoSQL databases like HBase, MapR-DB, Cassandra, and MongoDB.
  • Worked on Amazon AWS services such as EMR, EC2 instances, Lambda, S3 buckets, Glue, and Athena, which provide fast and efficient processing.
  • Extensive experience with IDEs such as Eclipse and Atom.
  • Worked on visualization tools like Tableau, Power BI, and Grafana.
  • Experience working with Git and Bitbucket.
  • Extensively worked with Docker and build tools like Jenkins.
  • Used Agile (Scrum) methodologies for software development.
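
As an illustration of the Spark SQL work noted above, the following is a minimal sketch of reading a Hive table into Spark with Scala. The database, table, and column names are hypothetical placeholders, and it assumes a Spark build with Hive support enabled.

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: read a Hive table through Spark SQL and aggregate it.
    // Table and column names are placeholders, not from any actual project.
    object HiveReadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveReadSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Query the Hive-metastore-backed table directly with Spark SQL.
        val totals = spark.sql(
          "SELECT customer_id, SUM(amount) AS total FROM sales_db.transactions GROUP BY customer_id")

        totals.show(20)
        spark.stop()
      }
    }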

TECHNICAL SKILLS

Big Data Technologies: HDFS, Hive, MapReduce, Spark, Sqoop, Pig, Apache Flume, HBase, Apache Kafka, Oozie

Languages: Scala, Core Java, Unix Shell scripts, SQL

Databases: Oracle, DB2, SQL Server, MySQL, HBase, MapR-DB

IDEs: Eclipse, IntelliJ, Atom

Other Tools & Packages: Cloudera Manager, MapR Control System (MCS), Replicate, SVN, JUnit, Maven, ANT, GitHub, Bitbucket, PuTTY, StreamSets Data Collector, Power BI, Grafana, Tableau, FileZilla

SDLC Methodology: Agile

Operating Systems: Linux/UNIX, Windows

PROFESSIONAL EXPERIENCE

Confidential, Wilmington, DE

Data Engineer

Responsibilities:

  • Followed Agile Scrum methodology that included iterative application development, weekly sprints, and stand-up meetings.
  • Worked with analysts to determine and understand business requirements.
  • Worked on analyzing the Hadoop stack and different big data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
  • Involved in creating Spark applications in Scala using cache and map/reduce functions to process data.
  • Wrote complex SQL queries using advanced SQL concepts like Aggregate functions.
  • Extracted data from multiple databases using Sqoop import queries and ingested it into Hive tables.
  • Developed the Nomination QA application using Scala, Spark, and DataFrames to read data from Hive tables on the YARN framework.
  • Configured Spark Streaming with Kafka to pull information and store it in HDFS (see the sketch after this list).
  • Developed Kafka consumers to consume data from Kafka topics.
  • Implemented Spark Core in Scala to process data in memory.
  • Created Oozie workflows for Hadoop-based jobs including Sqoop, Hive, and Pig.
  • Created Hive external tables, loaded the data into the tables, and queried the data using HQL.
  • Supported MapReduce programs running on the cluster and performed cluster monitoring, maintenance, and troubleshooting.
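
The streaming ingestion described above could look roughly like the sketch below, which uses Spark Structured Streaming with the spark-sql-kafka connector to land Kafka records in HDFS. The broker address, topic name, and HDFS paths are hypothetical, and the original work may equally have used the older DStream API.

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: consume a Kafka topic and persist the raw records to HDFS.
    object KafkaToHdfsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("KafkaToHdfsSketch").getOrCreate()

        val stream = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical broker
          .option("subscribe", "events")                     // hypothetical topic
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

        val query = stream.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/raw/events")         // hypothetical landing path
          .option("checkpointLocation", "hdfs:///checkpoints/events")
          .start()

        query.awaitTermination()
      }
    }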

Environment: Hadoop, Spark, Scala, Amazon EMR, S3, EC2, Sqoop, Kafka, Jira, Jenkins

Confidential, Austin, TX

Data Engineer

Responsibilities:

  • Followed Agile Scrum methodology that included iterative application development, weekly sprints, and stand-up meetings.
  • Extracted data from multiple databases using Sqoop import queries and ingested it into Hive tables.
  • Documented the data flow from the application through Kafka and Storm into HDFS and Hive tables.
  • Developed the Nomination QA application using Scala, Spark, and DataFrames to read data from Hive tables on the YARN framework.
  • Developed Spark scripts using the Scala shell as per the requirements.
  • Worked closely with the Kafka admin team and set up Kafka clusters in the QA and production environments.
  • Implemented new dimensions in the Spark application based on business requirements.
  • Responsible for storing processed data in MongoDB.
  • Wrote queries to fetch data from different tables using joins, sub-queries, correlated sub-queries, and derived tables on the SQL Server platform.
  • Created and enhanced Teradata stored procedures to generate automated testing SQL.
  • Designed and implemented the test environment on AWS.
  • Created various views in Tableau (tree maps, heat maps, scatter plots).
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources like S3 (ORC/Parquet/text files) into AWS Redshift.
  • Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark (see the sketch after this list).
  • Responsible for account management, IAM management, and cost management.
  • Created S3 buckets, managed S3 bucket policies, and utilized S3 and Glacier for storage and backup on AWS.
  • Created monitors, alarms, and notifications for EC2 hosts using CloudWatch, CloudTrail, and SNS.
  • Involved in setting up Kafka and ZooKeeper producer-consumer components for the big data environments.
  • Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
  • Experienced in data modeling in SQL and NoSQL databases.
  • Worked with JIRA and Jenkins within the project.
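
The Glue extraction step mentioned above ran as PySpark; the sketch below shows the equivalent read-aggregate-write logic in plain Spark Scala to stay consistent with the other examples here. The bucket paths and column names are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    // Minimal sketch: read Parquet campaign data from S3, consolidate it,
    // and write the aggregated result back partitioned by day.
    object CampaignAggSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("CampaignAggSketch").getOrCreate()

        val campaigns = spark.read.parquet("s3a://example-bucket/raw/campaigns/")

        val daily = campaigns
          .groupBy(col("campaign_id"), col("event_date"))
          .agg(count("*").as("events"), sum("clicks").as("total_clicks"))

        daily.write
          .mode("overwrite")
          .partitionBy("event_date")
          .parquet("s3a://example-bucket/curated/campaign_daily/")

        spark.stop()
      }
    }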

Environment: Hadoop, Spark, Scala, Amazon EMR, S3, EC2, Sqoop, Kafka, Jira, Jenkins

Confidential, McLean, VA

Data Engineer

Responsibilities:

  • Followed Agile Scrum methodology that included iterative application development, weekly sprints, and stand-up meetings.
  • As a developer, worked with the development and operations teams to implement the necessary tools and processes to support builds, deployments, testing, and infrastructure.
  • Worked on S3 bucket whitelisting in the Risk Dev account and fixed the security groups in the AWS QA environment.
  • Built data pipelines to automate access to AWS S3 buckets and retrieve the required information from them (see the sketch after this list). Worked on AWS Lambda.
  • Created external tables with partitions using Hive, AWS Athena, and Redshift.
  • Updated the security groups using Confidential IAM roles in the AWS Risk Development account.
  • Updated numerous Confluence pages.
  • Updated and worked on the metadata in Snowflake.
  • Worked on Confidential internal technologies such as Bogie (build tool) and Nebula.
  • Developed numerous CloudFormation templates to deploy EC2 instances based on requirements.
  • Updated the S3 buckets in prod Nebula and worked on deploying EMR.
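
The bucket-access step referenced above could be sketched as below, using the AWS SDK for Java v2 from Scala to list what sits under a prefix before pulling it. The bucket name and prefix are hypothetical, credentials and region are assumed to come from the default provider chain, and the actual pipelines may have used Lambda instead.

    import software.amazon.awssdk.services.s3.S3Client
    import software.amazon.awssdk.services.s3.model.ListObjectsV2Request
    import scala.jdk.CollectionConverters._

    // Minimal sketch: list objects under a prefix so a pipeline can decide what to fetch.
    object S3ListSketch {
      def main(args: Array[String]): Unit = {
        val s3 = S3Client.create()

        val request = ListObjectsV2Request.builder()
          .bucket("example-risk-dev-bucket")   // hypothetical bucket
          .prefix("incoming/")                 // hypothetical prefix
          .build()

        s3.listObjectsV2(request).contents().asScala.foreach { obj =>
          println(s"${obj.key()} (${obj.size()} bytes)")
        }

        s3.close()
      }
    }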

Environment: AWS EMR, S3, Lambda, RDS, Bogie, Docker, Jenkins, Git, Python, Atom.

Confidential, Columbus, IN

Data Engineer

Responsibilities:

  • Followed Agile Scrum methodology that included iterative application development, weekly sprints, and stand-up meetings.
  • Created external and managed Hive tables and worked on them using HiveQL (see the sketch after this list).
  • Validated the MapReduce, Pig, and Hive scripts by pulling the data from Hadoop and validating it against the data in the files and reports.
  • Configured real-time streaming pipeline from DB2 to HDFS using Apache Kafka.
  • Used JIRA for the issue tracking and bug reporting.
  • Implemented Spark code using Scala and Spark SQL for faster processing and testing of data.
  • Wrote stored procedures, triggers, and functions using SQL Navigator to perform operations on the Oracle database.
  • Installed the NameNode, Secondary NameNode, YARN components (ResourceManager, NodeManager, ApplicationMaster), and DataNodes.
  • Developed ETL processes using the Jitterbit Harmony cloud integration tool.
  • Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, and Auto Scaling groups.
  • Exported data to Teradata using Sqoop.
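
A minimal sketch of the external-table work mentioned above is shown below; it issues the HiveQL through Spark's Hive support so it matches the other Scala examples, though the same DDL runs directly in Hive. The database, schema, and HDFS location are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: create, repair, and query a partitioned external Hive table.
    object ExternalTableSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ExternalTableSketch")
          .enableHiveSupport()
          .getOrCreate()

        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS staging.orders (
            |  order_id STRING,
            |  amount   DOUBLE,
            |  ts       TIMESTAMP)
            |PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET
            |LOCATION 'hdfs:///data/staging/orders'""".stripMargin)

        // Pick up any partitions already written under the external location.
        spark.sql("MSCK REPAIR TABLE staging.orders")

        spark.sql("SELECT load_date, COUNT(*) AS row_count FROM staging.orders GROUP BY load_date").show()
        spark.stop()
      }
    }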

Environment: Hadoop, HDFS, Pig, Hive, Spark, Scala, MapReduce, AWS

Confidential, Denver, CO

Data Engineer

Responsibilities:

  • Used Hue and MapR Control System (MCS) to monitor and troubleshoot Spark jobs.
  • Worked on data validation using Hive, wrote Hive UDFs (see the sketch after this list), and imported and exported data into HDFS and Hive using Sqoop.
  • Experienced in developing scripts for transformations using Scala.
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics using Hive.
  • Used Pig and Hive in the analysis of data.
  • Worked on UNIX shell scripts and automated the ETL processes using UNIX shell scripting.
  • Built dashboards and visualizations on top of MapR-DB and MapR Hive using Tableau and Oracle Data Visualizer Desktop.
  • Built real-time visualizations on top of OpenTSDB using Grafana.
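
The Hive UDF work noted above could look like the short Scala sketch below, which assumes the classic org.apache.hadoop.hive.ql.exec.UDF API from the hive-exec dependency; the cleanup rule itself (trim and uppercase) is purely illustrative.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Minimal sketch of a simple Hive UDF written in Scala.
    class CleanCode extends UDF {
      // Hive resolves evaluate() by reflection; keep it null-safe.
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toUpperCase)
      }
    }

After packaging the class into a jar, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.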

Environment: Hadoop, HDFS, Pig, Hive, Spark, Scala, MapReduce

Confidential, Alpharetta, GA

Big Data Developer

Responsibilities:

  • Configured StreamSets Data Collector with MapR Event Streams to stream real-time data from different sources (databases and files) into MapR topics.
  • Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
  • Worked on data validation using HIVE and written Hive UDFs.
  • Built dashboards and visualizations on top of Hive using Power BI.
  • Worked on data lake processing of the manufacturing data with Spark and Scala, storing the results in Hive tables for further analysis with Tableau (see the sketch after this list).
  • Developed SQL queries to perform joins on the tables in MySQL.
  • Wrote Hive UDFs to extract data from staging tables.
  • Involved in creating Hive tables and loading them with data.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Created stored procedures in MySQL to improve data handling and ETL Transactions.
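
The data-lake step described above could be sketched roughly as below: read raw manufacturing files, apply a light transformation, and persist the result as a Hive table for Tableau. The landing path, columns, and table name are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    // Minimal sketch: raw manufacturing files -> cleaned Hive table.
    object ManufacturingLoadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ManufacturingLoadSketch")
          .enableHiveSupport()
          .getOrCreate()

        val raw = spark.read
          .option("header", "true")
          .csv("maprfs:///datalake/raw/manufacturing/")    // hypothetical landing path

        val cleaned = raw
          .withColumn("measured_at", to_timestamp(col("measured_at")))
          .filter(col("machine_id").isNotNull)

        cleaned.write
          .mode("overwrite")
          .saveAsTable("analytics.manufacturing_readings") // Hive table used by Tableau

        spark.stop()
      }
    }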

Environment: Hadoop, HDFS, Pig, Hive, Spark, MapReduce, Java

Confidential

Java Developer

Responsibilities:

  • Involved in requirement, design and development phases of the application.
  • Worked with DBA for the creation of new tables and new fields in the database.
  • Developed custom tags (JSTL) to support custom user interfaces.
  • Developed the application using Struts Framework that leverages classical Model View Controller (MVC) architecture.
  • Created new Action Forms to access the form data.
  • Used Multithreading in programming to improve overall performance.
  • Created a RESTful web service for updating customer data sent from external systems.
  • Converted data into JSON using JSP tags.
  • Developed the front-end user interface using HTML5, JavaScript, CSS3, JSON, and jQuery.

Environment: Java, Oracle DB, HTML, JavaScript, and CSS
