Hadoop / AWS Developer Resume
Philadelphia
SUMMARY
- Strong experience working with real-time streaming applications and batch-style, large-scale distributed computing applications using tools like Spark Streaming, Kafka, Flume, MapReduce, and Hive.
- Comprehensive experience in Big Data processing using the Hadoop ecosystem, including Hive, HDFS, MapReduce (MRv1 and YARN), HBase, Sqoop, Kafka, Oozie, Scala, and Impala.
- Experience in importing and exporting data between HDFS and relational databases such as MySQL, Teradata, Oracle, and DB2 using Sqoop.
- Involved in designing a data model in Hive for migrating ETL process into Hadoop.
- Excellent Programming skills at a higher level of abstraction using Scala and Spark.
- Capable of designing Spark SQL based on functional specifications.
- Experienced in working with Hadoop/Big Data storage and analytical frameworks on the Amazon AWS cloud using tools like SSH and PuTTY.
- Managing and scheduling batch Jobs on a Hadoop Cluster using Oozie.
- Proficient at using Spark APIs to explore, cleanse, aggregate, transform, and store machine sensor data.
- Very strong in Bash (shell) and Python scripting.
- Experience in UNIX/Linux along with SQL development, designing and implementing relational database models per business needs in different domains.
- Developed monitoring and notification tools using Python.
- Involved in creating mappings, active transformations, and reusable transformations.
- Extensive development experience in different IDEs such as Eclipse and STS.
- Worked on different file formats such as Avro, Parquet, RC File, and JSON.
- Knowledge of using transfer and network protocols: FTP, SFTP, SSH, HTTP, HTTPS, and Connect:Direct.
- Worked with bug-tracking tools such as JIRA.
- Proficient in using Amazon Web Services such as EC2, EMR, Glue, EBS, IAM, S3, ELB, RDS, VPC, Route 53, CloudWatch, CloudTrail, CloudFormation, Lambda, SNS, SQS, API Gateway, Auto Scaling, Service Catalog, and Storage Gateway.
- Expertise in migrating key systems from on-premises hosting to Amazon Web Services.
- Involved in managing user identities using single sign-on services such as LDAP.
- Built customized Amazon Machine Images (AMIs) and deployed these customized images based on requirements.
- Experience in real-time monitoring and alerting of applications deployed in AWS using CloudWatch, CloudTrail, and Simple Notification Service.
- Deploying, managing, and operating scalable, highly available, and fault tolerant systems on AWS.
- Implemented and maintained the monitoring and alerting of production and corporate servers/storage using Amazon CloudWatch.
- Configured and managed AWS Glacier to move old data to archives based on the retention policies of databases/applications.
- Good interpersonal skills and team-working attitude; takes initiative and is proactive in solving problems and providing the best solutions.
- Estimating AWS usage costs and identifying operational cost control mechanisms.
- Experience supporting 24x7 production computing environments. Experience providing on-call and weekend support.
TECHNICAL PROFICIENCY
- Big Data Technologies: HDFS, Spark, MapReduce, Apache Kafka, Hive, Pig, Sqoop, Oozie, EMR/Cloudera/Hortonworks
- Cloud Providers: Amazon Web Services (EC2, S3, Glacier, Route 53, CloudWatch, CloudTrail, Service Catalog, Storage Gateway, DataSync, ELB, DynamoDB, AWS Glue, Fargate, IAM, SNS)
- Programming Languages: Java, Scala, Python, Unix shell scripting, PL/SQL
- Database Platforms: SQL Server, MySQL, Oracle
- Operating Systems: Windows, Linux
- DevOps Tools: Git, SVN, CVS, Jira, Confluence, Maven, Bitbucket, Bamboo
- Integrated Development Environments: Eclipse, Visual Studio, Python IDLE, SQL Server Management Studio, STS
- Application Servers: JBoss, Apache Tomcat
- Development Methodologies: Agile Methodologies (Scrum), Waterfall
PROFESSIONAL EXPERIENCE
Confidential - Philadelphia
Hadoop / AWS Developer
Responsibilities:
- Performing as a Big Data developer on Amazon AWS and on-premises Hadoop systems.
- Tracking deliverables and tasks through JIRA and in-house Agility Enterprise tool.
- Working with the Product Owner and Business System Analysts to define business and functional needs and review requirements.
- Assessing client business functions and technology needs. Applying Confidential's technologies, tools, and applications, including those that interface with business areas and systems.
- Utilized Atlassian Bitbucket and Bamboo for Code repository, reviews, builds, and deployments to AWS.
- Designing and building Big Data ingestion and query platforms with Spark, Hadoop, Oozie, Sqoop, Hive, Presto, Amazon EMR, Amazon S3, AWS CloudFormation, AWS IAM, and Control-M.
- Performed data analysis on the raw data residing on HDFS and local storage using Hive queries, and used Sqoop to import data between Hadoop and RDBMS.
- Ingested data from relational database systems into the AWS S3 cloud environment using Sqoop, performing data cleansing and validation before ingestion.
- Performing transformations over the data using Spark and AWS Elastic MapReduce (EMR) according to the business requirements.
- Developed Spark application to deliver transformed and acceptable Brokerage Data to our Business Users.
- Created Spark program to compare Brokerage Model results against SAS results for accuracy.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, pair RDDs, and Spark on YARN.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Used Hive optimization techniques during joins and best practices in writing Hive scripts.
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output response (see the PySpark sketch following this responsibilities list).
- Used Apache Kafka for importing real-time network log data into HDFS.
- Built various jobs using AWS Glue's crawler capabilities to perform data cataloging and to build the ETL pipelines for the target data mart.
- Responsible for building the data ingestion pipelines using AWS EMR with Spark Scala as the data processing engine and AWS Athena as the consumption layer.
- Used the AWS Python SDK (Boto3) and integrated it into Python scripts to automate most of the complex tasks (a Boto3 sketch follows this list).
- Automated Oozie workflow upon EMR Cluster spin up to ETL Brokerage Data to Hive/Data Lake.
- Automated jobs with Control-M to spin up an EMR cluster when mainframe data files landed on-prem.
- Utilized Splunk to pick up logs from EMR/CloudWatch for email alerts on Brokerage Data’s ETL status.
- Created a performance-testing program in Python Spark to compare two big data result sets.
- Tracking code quality through SonarQube.
- Created build and deployment plans and schedules per requirements; attended change and release board meetings and updated risks; identified automation opportunities in day-to-day operations and automated those tasks by helping developers; documented the processes in Confluence.
- Created and managed the AWS environment using EC2, VPC, IAM, ELB, EBS, SNS, CloudWatch, S3 (creating AMIs and snapshots), RDS, security groups, subnets, and Storage Gateway.
- Wrote Troposphere and CloudFormation templates for handling S3 buckets and EMR cluster spin-up (a Troposphere sketch follows this list).
- Applying security policies using AWS IAM (Identity and Access Management) to control access to the data in cloud.
- Created and modified Ansible Roles for a Playbook used on EMR Step Actions.
- Applied Least Privilege Access solutions for Query/Ingestion EMR Clusters.
- Setting up template-only deployments on AWS using Service Catalog.
- Performed application server builds in the EC2 environment and monitored them using CloudWatch.
- Utilized CloudWatch to monitor resources such as EC2 CPU and memory, Amazon RDS DB services, DynamoDB tables, and EBS volumes; to set alarms for notification or automated actions; and to monitor logs for a better understanding and operation of the system.
- Used the AWS CLI to suspend an AWS Lambda function processing an Amazon Kinesis stream and then resume it.
- Used AWS Redshift, S3, Redshift Spectrum, and Athena services to query large amounts of data stored on S3 and create a virtual data lake without having to go through an ETL process.
- Used Amazon Route 53 to manage DNS zones and assign public DNS names to Elastic Load Balancer IPs.
- Created Lambda functions that aggregate data from incoming events and store the resulting data in Amazon DynamoDB and S3 (a Lambda handler sketch follows this list).
- Experienced in creating task definitions that specify the tasks, resource allocation (Fargate), services, and the Docker image on which the application is built, for Elastic Container Service and ALB.
- Expert knowledge of Bash shell scripting and automation of cron jobs.
- Scheduled Control-M runs for Brokerage processes to run on EMR through Service Catalog.
- Enabled and maintained EMR query clusters equipped with Hue, Hive, Presto, JupyterHub, and Oozie.
- Good knowledge of integrating BI tools like Tableau with the Hadoop stack and extracting the required data.
- Participating in design, code, and test inspections throughout life cycle to identify issues.
- Explaining technical considerations at related meetings, including those with internal clients.
- Designing and building a Continuous Integration and Continuous Delivery pipeline with Bitbucket, Bamboo, Docker, SonarQube, Nexus, ServiceNow, and Amazon AWS.
- Elevating code into the development, test, and production environments on schedule. Providing follow-up production support. Submitting change control requests and documents.
- Experienced in CI/CD (continuous integration and continuous delivery).
- Auto-scaling instances to design cost-effective, fault-tolerant, and highly reliable systems.
- Estimating AWS usage costs and identifying operational cost control mechanisms.
- Tracking defects and issues, and providing production support.
- Participating in 24x7 on-call incident escalation rotations.
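The sketch below illustrates the Spark pattern referenced in the responsibilities above: loading data, running an in-memory aggregation, and writing curated output where Hive/Athena tables could point. It is a minimal PySpark example; the bucket paths, column names, and aggregation logic are hypothetical placeholders rather than details from the actual project.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical example: aggregate brokerage-style records and write Parquet
# for downstream query engines such as Hive or Athena.
spark = (SparkSession.builder
         .appName("brokerage-aggregation-sketch")
         .getOrCreate())

# Read raw ingested data (path and schema are illustrative placeholders).
raw = spark.read.parquet("s3://example-bucket/raw/brokerage/")

# In-memory transformation: cleanse, then group and aggregate.
cleaned = raw.dropna(subset=["account_id", "trade_amount"])
summary = (cleaned
           .groupBy("account_id", "trade_date")
           .agg(F.sum("trade_amount").alias("total_amount"),
                F.count("*").alias("trade_count")))

# Write the transformed output where external tables could be defined over it.
(summary.write
        .mode("overwrite")
        .partitionBy("trade_date")
        .parquet("s3://example-bucket/curated/brokerage_summary/"))

spark.stop()
```

A result-set comparison like the SAS parity check mentioned above could be approached with `df_a.exceptAll(df_b)` on two DataFrames, which returns rows present in one result set but not the other.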
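For the Boto3 automation bullet above, this is a minimal sketch of the kind of task automation the AWS SDK for Python enables, such as listing newly landed S3 objects and checking an EMR cluster's state. The bucket name and cluster ID are hypothetical.

```python
import boto3

# Hypothetical resource names; the real buckets and clusters are not given in the resume.
BUCKET = "example-landing-bucket"
CLUSTER_ID = "j-EXAMPLE123"

s3 = boto3.client("s3")
emr = boto3.client("emr")

def landed_files(prefix):
    """List object keys under a prefix, e.g. to detect newly landed data files."""
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=prefix)
    return [obj["Key"] for obj in resp.get("Contents", [])]

def cluster_state(cluster_id):
    """Return the current EMR cluster state (e.g. WAITING, RUNNING, TERMINATED)."""
    resp = emr.describe_cluster(ClusterId=cluster_id)
    return resp["Cluster"]["Status"]["State"]

if __name__ == "__main__":
    print(landed_files("raw/brokerage/"))
    print(cluster_state(CLUSTER_ID))
```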
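For the Troposphere/CloudFormation bullet above, this is a minimal sketch (assuming a recent troposphere release) that generates a CloudFormation template for a single S3 bucket; an EMR cluster resource would be added to the same template in practice. The bucket name is illustrative.

```python
from troposphere import Template, Output, Ref
from troposphere.s3 import Bucket

# Build a small CloudFormation template in Python; the bucket name is illustrative.
template = Template()
template.set_description("Sketch: S3 landing bucket generated with Troposphere")

landing_bucket = template.add_resource(
    Bucket("LandingBucket", BucketName="example-landing-bucket")
)

template.add_output(Output("LandingBucketName", Value=Ref(landing_bucket)))

# Print the template; in practice the JSON would be deployed via CloudFormation.
print(template.to_json())
```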
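For the Lambda bullet above, this is a minimal sketch of a handler that aggregates incoming Kinesis-style event records and persists the result to DynamoDB and S3. The table name, bucket name, and payload fields are hypothetical.

```python
import base64
import json
import boto3

# Hypothetical resource names; the actual table and bucket are not given in the resume.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("example-aggregates")
s3 = boto3.client("s3")
BUCKET = "example-aggregates-archive"

def handler(event, context):
    """Aggregate Kinesis records by a key field, then store results to DynamoDB and S3."""
    totals = {}
    for record in event.get("Records", []):
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        key = payload.get("account_id", "unknown")
        totals[key] = totals.get(key, 0) + payload.get("amount", 0)

    # Persist per-key aggregates to DynamoDB (stored as strings to sidestep
    # float/Decimal handling in this sketch).
    for key, total in totals.items():
        table.put_item(Item={"account_id": key, "total_amount": str(total)})

    # Archive the full aggregate batch to S3.
    s3.put_object(
        Bucket=BUCKET,
        Key=f"aggregates/{context.aws_request_id}.json",
        Body=json.dumps(totals).encode("utf-8"),
    )
    return {"aggregated_keys": len(totals)}
```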
Technical Environment: Apache Spark, Git, AWS Cloud, AWS EMR, AWS S3, Sqoop, Apache Parquet, AWS Athena, AWS Glue, Service Catalog, Storage Gateway, DataSync, Fargate, ECS, Oozie, Scala, Tableau, HDFS, Hive, Python, Kafka, EC2, Route 53, CloudWatch, ELB, EBS, DynamoDB, CloudTrail, CloudFormation, Ansible, IAM, AWS CLI, SNS, SQS, Auto Scaling, Jira, Confluence, Bamboo.
Confidential
Big Data / AWS Developer
Responsibilities:
- Worked with the key stakeholders of different business groups to identify the core requirements for building the next-generation analytic solution using Impala as the processing framework and Hadoop for storage on the current dealer data lake.
- Involved in migrating MapReduce jobs to Spark jobs and used Spark SQL to load structured and semi-structured data into Spark clusters.
- Developed Kafka producers and consumers, and Spark and Hadoop MapReduce jobs (a producer/consumer sketch follows this list).
- Orchestrated hundreds of Sqoop scripts, Python scripts, and Hive queries using Oozie workflows and sub-workflows.
- Analyzed HBase data in Hive by creating external partitioned and bucketed tables.
- Involved in using HCatalog to access Hive table metadata from MapReduce and Pig code.
- Uploaded streaming data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Wrote a generic, extensive data quality check framework to be used by the application, built on Impala.
- Performance tuning in Hive and Impala using multiple methods, including but not limited to dynamic partitioning, bucketing, indexing, file compression, vectorization, and cost-based optimization.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Developed Oozie workflow for scheduling and orchestrating the ETL process.
- Loaded data and financial histories into HDFS. Used the Apache Hue web interface to monitor the Hadoop cluster and run the jobs.
- Ingested the log data into an ETL pipeline that transforms and loads the text-format data to HDFS.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Developed data pipelines using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data.
- Participated in daily scrum meetings and iterative development.
- Involved in troubleshooting and performance tuning of reports and resolving issues within Tableau Server and Reports.
- Used AWS cloud services to launch Linux and Windows machines, created security groups, and wrote basic PowerShell scripts to take backups and mount network shared drives.
- Performed S3 bucket creation and policies, worked on IAM role-based policies, and customized the JSON templates.
- Used Amazon IAM to grant fine-grained access to AWS resources for users, and managed user roles and permissions for the AWS account through IAM.
- Configured and implemented the Amazon EC2 instances for our application teams.
- Extensively used CloudFormation templates for deploying the infrastructure. Wrote CloudFormation scripts for data lake components that use various AWS services such as Data Pipeline, Lambda, Elastic Beanstalk, SQS, SNS, and RDS databases.
- Configured Ansible to manage AWS environments and automate the build process for core AMIs used by all application deployments, including Auto Scaling and CloudFormation scripts.
- Used DNS management in Route 53, used Amazon S3 to back up database instances by saving snapshots of data, and managed network allocation in VPC to create new public networks.
- Creating alarms in the CloudWatch service for monitoring server performance, CPU utilization, and disk usage (an alarm-creation sketch follows this list).
- Used Jira as the ticket tracking and workflow tool.
- Executed and maintained internal and external SLAs developed with business stakeholders.
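For the Kafka producer/consumer bullet above, this is a minimal sketch using the third-party kafka-python client; the broker address, topic, and message fields are hypothetical, and the original work may equally have used the Java client.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Hypothetical broker and topic; the real environment details are not in the resume.
BROKERS = ["localhost:9092"]
TOPIC = "dealer-events"

# Producer: publish JSON-encoded events.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"dealer_id": "d-001", "event": "page_view"})
producer.flush()

# Consumer: read events from the beginning of the topic.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    consumer_timeout_ms=5000,  # stop iterating if no new messages arrive
)
for message in consumer:
    print(message.value)
```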
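For the CloudWatch alarm bullet above, this is a minimal Boto3 sketch that creates a CPU-utilization alarm on an EC2 instance; the instance ID, SNS topic ARN, and threshold are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Placeholder identifiers; real instance IDs and SNS topics are not given in the resume.
INSTANCE_ID = "i-0123456789abcdef0"
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:example-alerts"

# Alarm when average CPU utilization stays above 80% for two 5-minute periods.
cloudwatch.put_metric_alarm(
    AlarmName="example-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[SNS_TOPIC_ARN],
)
```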
Technical Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Flume, HBase, Spark, ZooKeeper, AWS, MySQL, Impala, Python, S3, IAM, Route 53, SNS, UNIX.