Data Engineer Resume
Dallas, USA
PROFESSIONAL SUMMARY:
- Around 5 years of experience in software analysis, design, development, testing and implementation of Big Data, Spark, Hadoop, and Java projects.
- Experience in Information Technology, including Big Data and the Hadoop ecosystem, with strengths in design, software processes, requirements gathering, analysis, and development of software applications.
- Excellent hands-on experience developing Hadoop architecture on Windows and Linux platforms.
- Experience building big data solutions on the Lambda Architecture using the Cloudera distribution of Hadoop, Twitter Storm, Trident, MapReduce, Cascading, Hive, Pig, and Sqoop.
- Hands-on experience designing and implementing data engineering pipelines and analyzing data using Hadoop ecosystem tools such as HDFS, MapReduce, YARN, Spark, Sqoop, Hive, Pig, Flume, Kafka, Impala, Oozie, and HBase.
- Designed and implemented end-to-end data pipelines to extract, cleanse, process, and analyze large volumes of behavioral and log data.
- Good experience working with data analytics and big data services in AWS, such as EMR, Redshift, S3, Athena, and Glue.
- Expertise in Amazon Web Services, including Elastic Compute Cloud (EC2) and DynamoDB.
- Expertise in automating the deployment of large Cassandra clusters on EC2 using EC2 APIs.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Experienced in developing production-ready Spark applications using Spark RDD APIs, DataFrames, Spark SQL, and Spark Streaming APIs.
- Worked extensively on fine-tuning Spark applications to improve performance and on troubleshooting failures in Spark applications.
- Experienced in J2EE design patterns such as MVC, Business Delegate, Service Locator, Singleton, Transfer Object, Session Façade, and Data Access Object.
- Good knowledge of RDBMS concepts (Oracle 11g, MS SQL Server 2000) and strong SQL and PL/SQL query-writing skills (using TOAD and SQL Developer), including stored procedures and triggers.
- Experienced in importing and exporting data using Sqoop from HDFS (Hive & HBase) to relational database systems (Oracle & Teradata) and vice versa.
- Experienced in designing and developing web services (SOAP and RESTful).
- Highly proficient in writing complex SQL queries, stored procedures, and triggers, with strong experience in PL/SQL and T-SQL.
- Experienced in developing web interfaces using Servlets, JSP, and custom tag libraries.
- Thorough knowledge of the software development life cycle (SDLC), database design, RDBMS, and data warehousing.
- Experience writing complex SQL queries involving multiple tables with inner and outer joins.
- Good understanding and experience with Software Development methodologies like Agile and Waterfall.
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Kafka, Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services (AWS), EMR
Cloud Technologies: AWS
IDEs: IntelliJ, Eclipse, Spyder, Jupyter
Operating Systems: Windows, Linux
Programming languages: Python, Scala, Linux shell scripts, PL/SQL, Java
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, HBase
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, Java Beans
Business Tools: Web Intelligence, Crystal Reports, Dashboard Design, Tableau
WORK EXPERIENCE:
Confidential, Dallas, USA
Data Engineer
Responsibilities:
- Worked on analyzing Hadoop cluster using different big data analytic tools including Flume, Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Spark and Kafka.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- As a Big Data Developer, implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, MongoDB, Hive, Oozie, Flume, Sqoop, and Talend.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Deployed applications to AWS and monitored load balancing across EC2 instances.
- Imported data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from SQL databases into HDFS using Sqoop.
- Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods.
- Worked extensively with Python, built a custom ingest framework, and worked on REST APIs using Python.
- Developed Kafka producers and consumers, and Spark and Hadoop MapReduce jobs.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Recommended adopting Elasticsearch and was responsible for installing, configuring, and administering it.
- Created Elastic MapReduce (EMR) clusters and configured the data pipeline with EMR clusters for scheduling the task runner and provisioning EC2 instances on both Windows and Linux.
- Worked on AWS Relational Database Service (RDS) and AWS Security Groups and their rules, and implemented reporting and notification services using the AWS API.
- Implemented AWS EC2, Key Pairs, Security Groups, Auto Scaling, ELB, SQS, and SNS using the AWS API and exposed them as RESTful web services.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Developed Spark scripts using Scala shell commands as per requirements.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Used Scala and SQL for faster testing and processing of data, and streamed data in real time using Kafka.
- Developed and designed automation framework using Python and Shell scripting.
- Involved in writing a Java API for AWS Lambda to manage some of the AWS services.
- Designed and implemented ETL processes using Talend to load data.
- Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems/mainframes, and for loading data into HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Worked on major components of the Hadoop ecosystem, including Hive, Pig, HBase, HBase-Hive integration, Scala, Sqoop, and Flume.
- Developed Hive, Pig, and UNIX shell scripts for all ETL loading processes and for converting files to Parquet in the Hadoop file system.
Environment: Hadoop, J2EE, JavaScript, Python, HDFS, Spark, MapReduce, Pig, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Java, SQL Scripting, Talend, Linux Shell Scripting, Cassandra, Zookeeper, MongoDB, Cloudera, Cloudera Manager, EC2, EMR, S3, Oracle, MySQL.
Confidential, Kansas, USA
Big Data Engineer
Responsibilities:
- Experience working with the Azure cloud platform (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Synapse, SQL DB, and SQL DWH).
- Performed data cleansing and applied transformations using Databricks and Spark data analysis.
- Designed and automated custom-built input connectors using Spark, Sqoop, and Oozie to ingest and analyze data from RDBMS into Azure Data Lake.
- Involved in creating database components such as tables, views, and triggers using T-SQL to provide structure and maintain data efficiently.
- Extensive experience working with SQL, with in-depth knowledge of T-SQL (MS SQL Server).
- Created automated ETL jobs in Talend and pushed the data to Azure SQL Data Warehouse.
- Used Azure Synapse to manage processing workloads and serve data for BI and predictions.
- Developed Spark Scala scripts for data mining and performed transformations on large datasets to deliver real-time insights and reports.
- Used Databricks notebooks extensively for interactive analysis with Spark APIs.
- Supported analytical phases, handled data quality issues, and improved performance using Scala's higher-order functions, lambda expressions, pattern matching, and collections.
- Implemented scalable microservices to handle concurrency and high traffic. Optimized existing Scala code and improved cluster performance.
- Managed resources and scheduling across the cluster using Azure Kubernetes Service.
- Involved in building an enterprise data lake using Data Factory and Blob Storage, enabling different teams to work with more complex scenarios and ML solutions.
- Extensive knowledge of data transformations, mapping, cleansing, monitoring, debugging, performance tuning, and troubleshooting Hadoop clusters.
- Worked with the data science team on preprocessing and feature engineering, and helped put machine learning algorithms into production.
- Used Azure Data Factory, the SQL API, and the MongoDB API to integrate data from MongoDB, MS SQL, and cloud sources (Blob, Azure SQL DB).
- Reduced access time by refactoring data models, optimizing queries, and implementing a Redis cache to support Snowflake.
- Provided data for interactive Power BI dashboards and reporting.
Environment: Azure (HDInsight, Databricks, DataLake, Blob Storage, Data Factory, SQL DB, SQL DWH, AKS), Scala, Python, Hadoop 2.x, Spark v2.0.2, NLP, Airflow v1.8.2, Hive v2.0.1, Sqoop v1.4.6, HBase, Oozie, Talend, CosmosDB, MS SQL, MongoDB, Ambari, PowerBI, Azure DevOps, Ranger, Git.
Confidential
Hadoop Developer
Roles and Responsibilities:
- Imported and exported data between HDFS and relational database systems using Sqoop.
- Worked on installing clusters, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as Hadoop, MapReduce, HDFS, HBase, Hive, Sqoop, and Pig.
- Gained good exposure to Apache Hadoop, MapReduce programming, Pig scripting, distributed applications, and HDFS.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Wrote Hive queries for data analysis to meet business requirements.
- Created Hive tables and worked with them using HiveQL.
- Participated in the development and implementation of the Cloudera Hadoop environment.
- Gained good experience with NoSQL databases.
- Performed data validation with Redshift and built pipelines designed to handle over 100 TB per day.
- Worked with business users on the features of new Tableau versions and explained self-service capabilities.
- Designed and developed logical and physical data models using data warehouse methodologies.
- Created summary and detail dashboards in Tableau Desktop to identify data mismatches between source and reporting systems.
- Performed data profiling and preliminary data analysis, handled anomalies such as missing values, duplicates, and outliers, and imputed irrelevant data.
Environment: Hadoop, HDFS, Hortonworks, Hive, Sqoop, Python, Unix, Shell Scripting, Spark SQL
Confidential
ETL Developer
Roles and Responsibilities:
- Played a key role in gathering business requirements, system and design requirements, gap analysis, use case diagrams and flow charts.
- Performed ETL operations with Informatica PowerCenter: extracted and staged data, applied transformations, and stored the results in target data centers.
- Parsed complex files using Informatica Data Transformations (Normalizer, Lookup, Source Qualifier, Expression, Aggregator, Sorter, Rank, and Joiner) and loaded them into databases.
- Created complex SQL queries and scripts to extract, aggregate and validate data from MS SQL, Oracle, and flat files using Informatica and loaded into a single data warehouse repository.
- Involved in creating database objects such as tables, views, stored procedures, triggers, packages, and functions using T-SQL to provide structure and maintain data efficiently.
- Designed SQL, SSIS, and Python based batch and real-time ETL pipelines to extract data from transactional and operational databases and load the data into target databases/data warehouses.
- Involved in writing Python scripts to extract data from different APIs.
- Responsible for collecting, scrubbing, and extracting data; generated compliance reports using SSRS; analyzed and identified market trends to improve product sales.
- Performed data profiling and answered complex business questions by providing data to business users.
- Generated DDL and created the tables and views in the corresponding architectural layers.
- Extracted, transformed, and analyzed measures/indicators from multiple sources to generate reports, dashboards, and analytical solutions.
Environment: Python, Informatica v9.x, MS SQL Server, T-SQL, SSIS, SSRS, SQL Server Management Studio, Oracle, Excel.
