Data Engineer Resume
Wilmington, DE
SUMMARY
- Experience in the design and development of real-time streaming applications using Spark Streaming and Scala.
- 7+ years of hands-on experience in the design, development, testing, implementation, maintenance, and enhancement of various IT projects, with experience in Java/J2EE and Big Data, implementing end-to-end Hadoop solutions.
- Extensive experience in installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, Sqoop, YARN, Pig, ZooKeeper, Hive, Impala, and Spark.
- Good experience in writing Spark applications using Python, Scala, and SQL.
- Implemented Spark scripts using Scala and Spark SQL to read Hive tables into Spark for faster data processing.
- Performed ETL processing with Spark and Scala for the processing and validation of raw data logs (see the sketch at the end of this summary).
- Expertise in publish/subscribe event-streaming systems such as Apache Kafka, MapR Event Streams, and Apache Flink.
- Worked on relational databases such as Oracle, DB2, and MySQL, and NoSQL databases such as HBase, MapR-DB, Cassandra, and MongoDB.
- Worked on Amazon AWS services such as EMR, EC2, Lambda, S3, Glue, and Athena for fast and efficient processing.
- Extensive experience with IDEs such as MyEclipse and Atom.
- Worked on visualization tools such as Tableau, Power BI, and Grafana.
- Experience working with Git and Bitbucket.
- Worked extensively with Docker and build tools such as Jenkins.
- Used Agile (Scrum) methodology for software development.
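A minimal sketch of the kind of Spark/Scala log-validation ETL described above; the log layout, field names, paths, and table names are hypothetical placeholders rather than actual project code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object RawLogValidation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("raw-log-validation")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Hypothetical pipe-delimited raw logs: event_time|user_id|event_type
    val raw = spark.read.text("hdfs:///data/raw/logs/")

    val parsed = raw
      .withColumn("fields", split(col("value"), "\\|"))
      .select(
        col("fields").getItem(0).as("event_time"),
        col("fields").getItem(1).as("user_id"),
        col("fields").getItem(2).as("event_type")
      )

    // Basic validation: drop records missing a timestamp or user id
    val valid   = parsed.filter($"event_time".isNotNull && $"user_id" =!= "")
    val invalid = parsed.except(valid)

    // Valid rows go to a Hive staging table, rejects to a quarantine path
    spark.sql("CREATE DATABASE IF NOT EXISTS staging")
    valid.write.mode("overwrite").saveAsTable("staging.validated_logs")
    invalid.write.mode("overwrite").parquet("hdfs:///data/rejects/logs/")

    spark.stop()
  }
}
```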
TECHNICAL SKILLS
Big Data Technologies: HDFS, Hive, MapReduce, Spark, Sqoop, Pig, Apache Flume, HBase, Apache Kafka, Oozie
Languages: Scala, Core Java, Unix Shell scripts, SQL
Databases: Oracle, DB2, SQL Server, MySQL, HBase, MapR-DB
IDEs: Eclipse, IntelliJ, Atom
Other Tools & Packages: Cloudera Manager, MapR Control System (MCS), Replicate, SVN, JUnit, Maven, ANT, GitHub, Bitbucket, PuTTY, StreamSets Data Collector, Power BI, Grafana, Tableau, FileZilla
SDLC Methodology: Agile
Operating Systems: Linux/UNIX, Windows
PROFESSIONAL EXPERIENCE
Confidential, Wilmington, DE
Data Engineer
Responsibilities:
- Followed the Agile Scrum methodology, including iterative application development, weekly sprints, and stand-up meetings.
- Worked with analysts to determine and understand business requirements.
- Worked on analyzing the Hadoop stack and various big data analytics tools, including Pig, Hive, HBase, and Sqoop.
- Involved in creating Spark applications in Scala, using caching and map/reduce-style functions to process data.
- Wrote complex SQL queries using advanced SQL concepts such as aggregate functions.
- Extracted data from multiple databases using Sqoop import and ingested it into Hive tables.
- Developed the Nomination QA application using Scala, Spark, and DataFrames to read data from Hive tables on the YARN framework.
- Configured Spark Streaming to consume data from Kafka and store it in HDFS (see the sketch after this section).
- Developed Kafka consumers to consume data from Kafka topics.
- Implemented Spark Core in Scala to process data in memory.
- Created Oozie workflows for Hadoop-based jobs, including Sqoop, Hive, and Pig.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Supported MapReduce programs running on the cluster; performed cluster monitoring, maintenance, and troubleshooting.
Environment: Hadoop, Spark, Scala, Amazon EMR, S3, EC2, Sqoop, Kafka, Jira, Jenkins
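A minimal sketch of the Kafka-to-HDFS flow described above, written with Spark Structured Streaming's Kafka source (the original job may have used DStreams); broker addresses, topic names, and HDFS paths are placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs")
      .getOrCreate()

    // Read from a Kafka topic; broker list and topic name are placeholders
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "events")
      .option("startingOffsets", "latest")
      .load()

    // Kafka delivers key/value as binary; keep the value as a string payload
    val payload = stream.selectExpr("CAST(value AS STRING) AS message", "timestamp")

    // Land the micro-batches on HDFS as Parquet, with a checkpoint for recovery
    val query = payload.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/landing/events/")
      .option("checkpointLocation", "hdfs:///checkpoints/events/")
      .start()

    query.awaitTermination()
  }
}
```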
Confidential, Austin, TX
Data Engineer
Responsibilities:
- Followed the Agile Scrum methodology, including iterative application development, weekly sprints, and stand-up meetings.
- Extracted data from multiple databases using Sqoop import and ingested it into Hive tables.
- Documented the data flow from the application through Kafka and Storm to HDFS and Hive tables.
- Developed the Nomination QA application using Scala, Spark, and DataFrames to read data from Hive tables on the YARN framework.
- Developed Spark scripts using the Scala shell as per requirements.
- Worked closely with the Kafka admin team to set up Kafka clusters in the QA and production environments.
- Implemented new dimensions in the Spark application based on business requirements.
- Responsible for storing processed data in MongoDB.
- Wrote queries to fetch data from different tables using joins, subqueries, correlated subqueries, and derived tables on the SQL Server platform.
- Created and enhanced Teradata stored procedures to generate automated testing SQL.
- Designed and implemented a test environment on AWS.
- Created various views in Tableau (tree maps, heat maps, scatter plots).
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (see the sketch after this section).
- Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
- Responsible for account management, IAM management, and cost management.
- Created S3 buckets, managed S3 bucket policies, and utilized S3 and Glacier for storage and backup on AWS.
- Created monitors, alarms, and notifications for EC2 hosts using CloudWatch, CloudTrail, and SNS.
- Involved in setting up Kafka and ZooKeeper producer-consumer components for the big data environments.
- Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
- Experienced in data modeling for SQL and NoSQL databases.
- Worked with JIRA and Jenkins within the project.
Environment: Hadoop, Spark, Scala, Amazon EMR, S3, EC2, Sqoop, Kafka, Jira, Jenkins
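A rough sketch of the S3-to-Redshift load described above, expressed as a plain Spark job in Scala rather than the actual Glue PySpark job; the bucket, table, cluster endpoint, and credentials are placeholders, and a Redshift/PostgreSQL JDBC driver is assumed on the classpath. A production Glue job would normally use Glue's own Redshift connection instead of a raw JDBC write.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object CampaignToRedshift {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("campaign-to-redshift")
      .getOrCreate()

    // Campaign extracts land in S3 as Parquet; bucket and prefix are placeholders
    val campaigns = spark.read.parquet("s3://example-campaign-bucket/extracts/")

    // Light consolidation before loading the warehouse
    val consolidated = campaigns
      .dropDuplicates("campaign_id")
      .filter("status IS NOT NULL")

    // Plain JDBC write to Redshift; endpoint, table, and credentials are placeholders
    consolidated.write
      .format("jdbc")
      .option("url", "jdbc:redshift://example-cluster:5439/analytics")
      .option("dbtable", "public.campaigns")
      .option("user", sys.env.getOrElse("REDSHIFT_USER", ""))
      .option("password", sys.env.getOrElse("REDSHIFT_PASSWORD", ""))
      .mode(SaveMode.Append)
      .save()

    spark.stop()
  }
}
```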
Confidential, McLean, VA
Data Engineer
Responsibilities:
- Followed the Agile Scrum methodology, including iterative application development, weekly sprints, and stand-up meetings.
- Worked with the development and operations teams to implement the tools and processes needed to support builds, deployments, testing, and infrastructure.
- Worked on S3 bucket whitelisting in the Risk Dev account and fixed the security groups in the AWS QA environment.
- Built data pipelines to automate access to AWS S3 buckets and retrieve the required information from them (see the sketch after this section); worked on AWS Lambda.
- Created partitioned external tables using Hive, AWS Athena, and Redshift.
- Updated the security groups using Confidential IAM roles in the AWS Risk Development account.
- Updated numerous Confluence pages.
- Updated and worked on metadata in Snowflake.
- Worked on Confidential internal technologies such as Bogie (build tool) and Nebula.
- Developed numerous CloudFormation templates to deploy EC2 instances based on requirements.
- Updated the S3 buckets in prod Nebula and worked on EMR deployments.
Environment: AWS EMR, S3, Lambda, RDS, Bogie, Docker, Jenkins, Git, Python, Atom.
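A minimal sketch of a pipeline step that lists and reads objects from an S3 bucket, using the AWS SDK for Java v2 from Scala; the bucket name, prefix, and object key are hypothetical placeholders.

```scala
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.{GetObjectRequest, ListObjectsV2Request}

import scala.jdk.CollectionConverters._

object S3BucketReader {
  def main(args: Array[String]): Unit = {
    // Credentials come from the default provider chain (IAM role, env vars, profile)
    val s3 = S3Client.builder().region(Region.US_EAST_1).build()

    val bucket = "example-risk-dev-bucket" // placeholder bucket name

    // List the objects under a prefix to find the files the pipeline needs
    val listing = s3.listObjectsV2(
      ListObjectsV2Request.builder().bucket(bucket).prefix("reports/").build()
    )
    listing.contents().asScala.foreach { obj =>
      println(s"${obj.key()} (${obj.size()} bytes)")
    }

    // Pull one object's contents as a UTF-8 string for downstream processing
    val body = s3.getObjectAsBytes(
      GetObjectRequest.builder().bucket(bucket).key("reports/latest.csv").build()
    ).asUtf8String()
    println(body.linesIterator.take(5).mkString("\n"))

    s3.close()
  }
}
```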
Confidential, Columbus, IN
Data Engineer
Responsibilities:
- Followed the Agile Scrum methodology, including iterative application development, weekly sprints, and stand-up meetings.
- Created external and managed Hive tables and worked on them using HiveQL (see the sketch after this section).
- Validated the MapReduce, Pig, and Hive scripts by pulling data from Hadoop and comparing it with the data in the source files and reports.
- Configured a real-time streaming pipeline from DB2 to HDFS using Apache Kafka.
- Used JIRA for issue tracking and bug reporting.
- Implemented Spark code using Scala and Spark SQL for faster processing and testing of data.
- Wrote stored procedures, triggers, and functions using SQL Navigator to perform operations on the Oracle database.
- Installed the NameNode, Secondary NameNode, YARN (ResourceManager, NodeManager, ApplicationMaster), and DataNode components.
- Developed an ETL process using the Jitterbit Harmony cloud integration tool.
- Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, and Auto Scaling groups.
- Exported data to Teradata using Sqoop.
Environment: Hadoop, HDFS, Pig, Hive, Spark, Scala, MapReduce, AWS
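A minimal sketch of the external/managed Hive table work described above, issued as HiveQL through a Hive-enabled Spark session; the database names, columns, and HDFS locations are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object HiveTableSetup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-table-setup")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS ods")
    spark.sql("CREATE DATABASE IF NOT EXISTS curated")

    // External table over files already landed on HDFS (schema and path are placeholders)
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS ods.orders_raw (
        |  order_id    STRING,
        |  customer_id STRING,
        |  amount      DOUBLE,
        |  order_ts    TIMESTAMP
        |)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        |LOCATION 'hdfs:///data/landing/orders/'""".stripMargin)

    // Managed table holding the curated result of a HiveQL aggregation
    spark.sql(
      """CREATE TABLE IF NOT EXISTS curated.daily_order_totals AS
        |SELECT to_date(order_ts) AS order_date,
        |       COUNT(*)          AS order_count,
        |       SUM(amount)       AS total_amount
        |FROM ods.orders_raw
        |GROUP BY to_date(order_ts)""".stripMargin)

    spark.sql("SELECT * FROM curated.daily_order_totals ORDER BY order_date DESC LIMIT 10").show()

    spark.stop()
  }
}
```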
Confidential, Denver, CO
Data Engineer
Responsibilities:
- Used Hue and MapR Control System (MCS) to monitor and troubleshoot Spark jobs.
- Worked on data validation using Hive, wrote Hive UDFs (see the sketch after this section), and imported and exported data into HDFS and Hive using Sqoop.
- Experienced in developing scripts for data transformations using Scala.
- Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive.
- Used Pig and Hive in the analysis of data.
- Worked on UNIX shell scripts and automated ETL processes using UNIX shell scripting.
- Built dashboards and visualizations on top of MapR-DB and MapR Hive using Tableau and Oracle Data Visualization Desktop.
- Built real-time visualizations on top of OpenTSDB using Grafana.
Environment: Hadoop, HDFS, Pig, Hive, Spark, Scala, MapReduce
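A minimal sketch of a simple Hive UDF written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API (hive-exec on the classpath is assumed); the masking logic, function name, and jar path are hypothetical, not the project's actual UDFs.

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// A simple Hive UDF that masks all but the last four characters of a value,
// e.g. for validating sensitive columns during data checks.
class MaskValue extends UDF {
  def evaluate(input: String): String = {
    if (input == null || input.length <= 4) input
    else "*" * (input.length - 4) + input.takeRight(4)
  }
}

// Registered in Hive (names and jar path are placeholders):
//   ADD JAR hdfs:///libs/udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_value AS 'MaskValue';
//   SELECT mask_value(account_number) FROM staging.accounts;
```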
Confidential, Alpharetta, GA
Big Data Developer
Responsibilities:
- Configured StreamSets Data Collector with MapR Event Streams to stream real-time data from different sources (databases and files) into MapR topics.
- Created scripts for importing data from DB2 into HDFS/Hive using Sqoop.
- Worked on data validation using Hive and wrote Hive UDFs.
- Built dashboards and visualizations on top of Hive using Power BI.
- Worked on data lake processing of manufacturing data with Spark and Scala, storing results in Hive tables for further analysis with Tableau (see the sketch after this section).
- Developed SQL queries to perform joins on the tables in MySQL.
- Wrote Hive UDFs to extract data from staging tables.
- Involved in creating Hive tables and loading them with data.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Created stored procedures in MySQL to improve data handling and ETL transactions.
Environment: Hadoop, HDFS, Pig, Hive, Spark, MapReduce, Java
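A minimal sketch of the data lake-to-Hive flow described above; the sensor schema, file paths, and table names are hypothetical placeholders for the manufacturing data actually used.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions._

object ManufacturingDataLakeJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("manufacturing-data-lake")
      .enableHiveSupport()
      .getOrCreate()

    // Raw sensor extracts land in the data lake as CSV (path and schema are placeholders)
    val readings = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///datalake/raw/manufacturing/sensor_readings/")

    // Aggregate per machine and day so the Hive table is ready for Tableau dashboards
    val daily = readings
      .withColumn("reading_date", to_date(col("reading_ts")))
      .groupBy("machine_id", "reading_date")
      .agg(
        avg("temperature").as("avg_temperature"),
        max("vibration").as("max_vibration"),
        count(lit(1)).as("reading_count")
      )

    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
    daily.write
      .mode(SaveMode.Overwrite)
      .saveAsTable("analytics.machine_daily_metrics")

    spark.stop()
  }
}
```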
Confidential
Java Developer
Responsibilities:
- Involved in requirement, design and development phases of the application.
- Worked with DBA for the creation of new tables and new fields in the database.
- Developed custom tags and JSTL to support custom user interfaces.
- Developed the application using Struts Framework that leverages classical Model View Controller (MVC) architecture.
- Created new Action Forms to access the form data.
- Used multithreading to improve overall performance.
- Created a RESTful web service to update customer data sent from external systems.
- Converted data into JSON using JSP tags.
- Developed the front-end user interface using HTML5, JavaScript, CSS3, JSON, and jQuery.
Environment: Java, Oracle DB, HTML, JavaScript, and CSS