Big Data Developer/Engineer Resume
MN
PROFESSIONAL SUMMARY
- 7+ years of experience in various IT-related technologies, including 4+ years of hands-on experience in Big Data technologies. Implementation and extensive working experience with a wide array of tools in the Big Data stack, such as HDFS, YARN, Spark, MapReduce, Hive, Flume, Oozie, Sqoop, Kafka, ZooKeeper, and HBase, on Cloudera and Hortonworks platforms in the financial, retail, and healthcare sectors.
- Experience in importing data from existing relational databases (Oracle, MySQL, and Teradata) that provide SQL interfaces using Sqoop.
- Good experience working with cloud environments such as Amazon Web Services (AWS), Microsoft Azure, and GCP.
- Hands-on experience with Avro, Parquet, and ORC files, and with Combiners, Counters, dynamic partitions, and bucketing for best practice and performance improvement; worked with different compression codecs (GZIP, Snappy, BZIP).
- Designed Hive queries to perform data analysis, data transfer, and table design to load data into the Hadoop environment.
- Extensive experience in importing and exporting data using stream processing platforms such as Flume and Kafka.
- Experience with the Oozie workflow scheduler and the ZooKeeper coordination service to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Integrated BI tools such as Tableau with Impala to analyze data.
- Experience with NoSQL databases like HBase, Cassandra and MongoDB.
- Experience in collecting log data from different sources (web servers and social media) using Flume and Kafka and storing it in HDFS for MapReduce jobs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Implemented Spark using Scala and utilized DataFrames and the Spark SQL API for faster data processing.
- Capable of using AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop/Spark jobs on AWS.
- Worked on extensive migration of Hadoop and Spark clusters to GCP, AWS, and Azure.
- Knowledge of creating dashboards with business intelligence tools such as Tableau.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
TECHNICAL SKILLS
Big Data Ecosystem: HDFS and MapReduce, Hive, Impala, HUE, Oozie, ZooKeeper, Apache Spark, Apache Storm, Apache Kafka, Pig, Sqoop, Flume
NoSQL Databases: HBase, Cassandra, and MongoDB
Hadoop Distributions: Cloudera, Hortonworks
Programming languages: Java, Scala, Pig Latin, HiveQL
Scripting Languages: Shell Scripting
Databases: MySQL, Oracle, Teradata, DB2
Build Tools: Maven, Ant
Reporting Tool: Tableau
Version control Tools: SVN, Git, GitHub
Cloud: AWS, Azure
App/Web servers: WebSphere, WebLogic
Operating Systems: Windows 10/8
Development IDEs: Eclipse IDE, Python (IDLE)
Packages: Microsoft Office, PuTTY, MS Visual Studio
PROFESSIONAL EXPERIENCE
Big Data Developer/Engineer
Confidential, MN
Responsibilities:
- Developed Sqoop scripts for importing and exporting data into HDFS and Hive.
- Used Zookeeper to store offsets of messages consumed for a specific topic and partition by a specific Consumer Group in Kafka.
- Familiar with AWS Components like EC2, S3.
- Working knowledge of AWS technologies such as S3 and EMR for storage, big data processing, and analysis.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the results in Parquet format in HDFS.
- Extensively involved in designing and creating programs involving procedures, triggers, and sequences to access Oracle; used a microservices architecture with Spring Boot services interacting through REST, and leveraged AWS to build, test, and deploy the microservices.
- Used Tableau to convey the results by using dashboards to communicate with team members and with other data science teams, marketing, and engineering teams.
- Generated data cubes using Hive, Pig, and Java MapReduce on a provisioned Hadoop cluster in AWS.
- Expertise in performance tuning of Tableau dashboards and reports built on large data sources.
- Used AWS EMR to process big data across Hadoop clusters of virtual servers, with data stored on Amazon Simple Storage Service (S3).
- Expertise in AWS data migration between different database platforms, such as local SQL Server to Amazon RDS and EMR Hive; experience in managing and reviewing Hadoop log files in AWS S3.
- Built and supported several multi-server AWS environments using Amazon EC2, EMR, EBS, and Redshift, and deployed the Big Data Hadoop application on the AWS cloud.
- Provided support on AWS Cloud infrastructure automation with multiple tools including Gradle, Chef, Nexus, Docker and monitoring tools such as Splunk and CloudWatch.
- Used Jenkins pipelines to drive all microservices builds out to the Docker registry and then deployed to Kubernetes.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC, Parquet, and text files) into AWS Redshift.
- Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to work on AWS cloud (S3).
- Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
- Implemented serverless architecture using AWS Lambda with Amazon S3 and Amazon DynamoDB.
- Scheduled clusters with CloudWatch and created Lambda functions to generate operational alerts for various workflows.
- Worked on AWS EC2, IAM, S3, Lambda, EBS, Elastic Load Balancer (ELB), and Auto Scaling group services.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning, with strong experience in SDLC.
- Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
- Improved the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Involved in creating Hive Tables, loading with data, and writing Hive queries which will invoke and run MapReduce jobs in the backend.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that receives data from Kafka in near real time, and used Apache NiFi to ingest and persist it to HBase.
- Worked with HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Hands-on experience with the AWS infrastructure services Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2).
- Loaded consumer response data from an AWS S3 bucket into Hive external tables at an HDFS location to serve as a feed for Tableau dashboards.
- Worked in Agile and used JIRA to maintain project stories.
Environment: Hadoop, MapReduce, Hive, Spark, Oracle, GitHub, Tableau, Unix, Cloudera, Kafka, Sqoop, Scala, NiFi, HBase, AWS, Amazon EC2, S3
Big Data Developer/Engineer
Confidential, Chicago, IL
Responsibilities:
- Worked with Sqoop import and export functionalities to handle large data set transfers between the DB2 database and HDFS.
- Performed data migrations from on-premises systems to Azure Data Lake using Azure Data Factory.
- Created Hive queries for extracting data and sending it to clients.
- Implemented OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
- Performed transformation and analysis in Hive, parsing the raw data using MapReduce and Spark.
- Created Scala programs to develop reports for business users.
- Experience in working with Cloudera (CDH4 & CDH5), Hortonworks, Amazon EMR, and Azure HDInsight on multi-node clusters.
- Extensive experience in the implementation of Continuous Integration (CI), Continuous Delivery, and Continuous Deployment (CD) for various Java-based applications using Jenkins, TeamCity, Azure DevOps, Maven, Git, Nexus, Docker, and Kubernetes.
- Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data in both directions between different sources such as Azure SQL, Blob storage, Azure SQL Data Warehouse, and the write-back tool.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Worked on ingesting data from different sources; followed Agile methodology during project delivery.
- Proactively involved in ongoing maintenance, support, and improvement of the Hadoop cluster.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Azure cloud.
- Configured Azure Container Registry for building and publishing Docker container images and deployed them into Azure Kubernetes Service (AKS).
- Experience working with the NoSQL database HBase for real-time data analytics.
- Configured various big data workflows to run on top of Hadoop using Oozie; these workflows comprise heterogeneous jobs such as Pig, Hive, Sqoop, and MapReduce.
- Knowledge of Code Hub and Git; coordinated with the offshore team to complete tasks.
- Worked on Microsoft Azure toolsets including Azure Data Factory pipelines, Azure Databricks, and Azure Data Lake Storage.
Environment: Hadoop 2.x, HDFS, MapReduce, PySpark, Spark SQL, ETL, Hive, Pig, Oozie, Databricks, Java, Spring, Sqoop, Azure, Star Schema, Python, NiFi, Cassandra, Scala, Power BI, Machine Learning.
Big Data Developer/Engineer
Confidential, Malvern, PA
Responsibilities:
- Implemented technical architecture and developed various Big Data workflows using custom MapReduce, Hive, Sqoop.
- Deployed an on-premises cluster and tuned it for optimal performance to meet job execution needs and process large data sets.
- Analyzed logs stored on HDFS and imported the cleaned data into the Hive warehouse, enabling business analysts to write Hive queries.
- Good experience in building pipelines using Azure Data Factory and moving the data into Azure Data Lake Store.
- Worked on development of Confidential Data Lake and in building Confidential Data Cube on Microsoft Azure HDInsight cluster.
- Used Maven extensively for building JAR files of MapReduce programs and deployed them to the cluster.
- Was assigned to resolve defects found in testing new and existing applications.
- Analyzed requirements and designed and developed solutions.
- Managing Project team in achieving the project goals including resource allocation, resolving
- Developed batch processing solutions with Azure Databricks and Azure Event.
- Analyzed, designed, and built modern data solutions using Azure PaaS services to support visualization of data.
- Used a Linux (Ubuntu) machine for designing, developing, and deploying Java modules.
Environment: MapReduce, Pig, Hive, Sqoop, Flume, HBase, JDK 1.6, Maven, Linux.
SQL/PLSQL Developer
Confidential
Responsibilities:
- Requirements analysis, application design, coding, testing, maintenance, and support.
- Created stored procedures, functions, database triggers, packages, and SQL scripts based on requirements.
- Created complex SQL queries using views, subqueries, and correlated subqueries.
- Developed UNIX shell scripts to support and maintain the implementation.
- Developed ad-click-based data analytics for keyword analysis and insights.
- Developed code for importing and exporting data into HDFS and Hive using Sqoop.
- Wrote design documents and specifications for near real-time data analytics using Hadoop and HBase.
- Crawled public Facebook posts and tweets.
- Worked hands-on with the Data Science team on MapReduce jobs to analyze this data.
- Converted output to structured data and imported to Tableau with analytics team.
- Defined problems to look for right data and analyze results to make room for new project.
- Created Shell Scripts for invoking SQL scripts and scheduled them using crontab.
- Managed defects through discussions with business stakeholders, process analysts, and the team.
- Tracked defects and prepared test summary reports.
- Responsible for requirements analysis, coding, testing, and maintenance.
- Performed requirements analysis and object-oriented design.
- Created new Tables, Indexes, Synonyms and Sequences needed as per new requirements.
- Implemented complex SQL queries using joins, subqueries, and correlated subqueries.
- Created Shell Scripts for invoking SQL scripts and scheduled them using crontab.
- Prepared unit test cases based on functional requirements.
Environment: C++, Oracle PL/SQL (MS Visual Studio, SQL Developer), UNIX Shell Scripts.
Jr Java Developer
Confidential
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project.
- Implemented the presentation layer with HTML and JavaScript.
- Developed web components using JSP, Servlets and JDBC.
- Implemented database using SQL Server.
- Designed tables and indexes.
- Wrote complex SQL and stored procedures.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Developed user and technical documentation.
Environment: Java, JSP, Servlets, JDBC, HTML, JavaScript, MySQL, JUnit, Eclipse IDE.
