Hadoop Developer Resume
Kansas City, KS
SUMMARY
- 9+ years of IT experience in the analysis, design, implementation, development, maintenance, and testing of large-scale applications using SQL, Hadoop, Java, Splunk, Elasticsearch, Kibana, Logstash, and other Big Data technologies.
- Expertise in major Hadoop ecosystem components such as HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, HBase, Spark, Spark SQL, Kafka, Spark Streaming, Flume, Oozie, Zookeeper, and Hue.
- Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions and Data Warehouse tools for reporting and data analysis.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop, and loading it into partitioned Hive tables.
- Hands-on experience with VPN, PuTTY, and WinSCP.
- Experience in data load management, importing and exporting data using Sqoop and Flume.
- Good knowledge of writing MapReduce jobs through Pig, Hive, and Sqoop.
- Extensive knowledge of writing Hadoop jobs for data analysis per business requirements using Hive; worked on HiveQL queries for data extraction and join operations, wrote custom UDFs as required, and have good experience optimizing Hive queries.
- Solid experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (a minimal sketch follows this summary).
- Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files. Proficient with columnar file formats such as RCFile, ORC, and Parquet. Good understanding of compression techniques used in Hadoop processing, such as Gzip, Snappy, and LZO.
- Hands-on experience with NoSQL databases such as MongoDB, HBase, and Cassandra, covering functionality and implementation, along with exposure to Azure data services.
- Experience in extracting data from MongoDB through Sqoop, placing it in HDFS, and processing it.
- Experience in developing and automating applications using Unix shell scripting in the Big Data space, with MapReduce programming for batch processing of jobs on an HDFS cluster, Hive, and Pig.
- Experience in various technologies such as Talend, Big Data, Pentaho, Informatica, Amazon Redshift, S3, EC2, Tableau, and Business Objects with different databases such as Oracle, DB2, Vertica, MySQL, and Redshift.
- In-depth understanding/knowledge of Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce concepts.
- Hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce (MRv1, YARN), Cloudera, Pig, Hive, HBase, Sqoop, Flume, Kafka, Impala, and Oozie, and programming in Spark using Python and Scala.
- Designed and built scalable Hadoop distributed data solutions using native, Cloudera, and Hortonworks distributions with Spark and Hive.
- Skilled in all phases of data processing (collecting, aggregating, and moving data from various sources) using Apache Flume and Kafka.
- Skilled in serverless technologies such as AWS Elastic Beanstalk, API Gateway, and Lambda.
- Experienced in Amazon Web Services (AWS) cloud services such as EMR, EC2, S3, EBS, and IAM entities, roles, and users.
- Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
- Experience in performance tuning in Vertica, including creation of projections and partition swapping.
- Experience in BI reporting with AtScale OLAP for Big Data.
- Experienced in Ansible, Jenkins, and PySpark.
- Designed, configured, and deployed Amazon Web Services (AWS) for a multitude of applications utilizing the AWS stack (including EC2, Route53, S3, RDS, CloudFormation, CloudWatch, SQS, and IAM), focusing on high availability, fault tolerance, and auto-scaling. Set up databases in AWS using RDS, storage using S3 buckets, and configured instance backups to an S3 bucket.
- Working knowledge of Spark RDD, DataFrame API, Dataset API, Data Source API, Spark SQL, and Spark Streaming.
- Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multithreading, and serialization/deserialization of streaming applications.
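Illustrative sketch of the Hive partitioning pattern referenced above: a minimal example assuming Spark with Hive support, where the table, column, and path names (transactions_ext, txn_date, /data/staging/transactions) are hypothetical placeholders rather than details from any engagement.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-sketch")
      .enableHiveSupport()               // lets Spark SQL talk to the Hive metastore
      .getOrCreate()

    // External table: the data files live outside the warehouse and survive DROP TABLE
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS transactions_ext (
        |  txn_id STRING,
        |  customer_id STRING,
        |  amount DOUBLE)
        |PARTITIONED BY (txn_date STRING)
        |STORED AS PARQUET
        |LOCATION '/data/warehouse/transactions'""".stripMargin)

    // Allow dynamic partition inserts so each txn_date value becomes its own partition
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Load staged data and append it; the partition column must come last for insertInto
    val staged = spark.read.parquet("/data/staging/transactions")
    staged.select("txn_id", "customer_id", "amount", "txn_date")
      .write
      .mode("append")
      .insertInto("transactions_ext")
  }
}
```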
TECHNICAL SKILLS
BigData/Hadoop Technologies: MapReduce, Spark, Spark SQL, Azure, Spark Streaming, Kafka, PySpark, Pig, Hive, HBase, Flume, YARN, Oozie, Zookeeper, Hue, Ambari Server.
Languages: HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/R Studio, SAS Enterprise Guide, SAS, R (Caret, Weka, ggplot), Perl, MATLAB, Mathematica, FORTRAN, DTD, Schemas, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, Pandas, Gensim, Keras), JavaScript, Shell Scripting.
NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB.
Development Tools: Microsoft SQL Studio, IntelliJ, Azure Databricks, Eclipse, NetBeans.
Public Cloud: EC2, IAM, S3, Autoscaling, CloudWatch, Route53, EMR, RedShift.
Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall.
Build Tools: Jenkins, Toad, SQL Loader, PostgreSQL, Talend, Maven, ANT, RTC, RSA, Control-M, Oozie, Hue, SOAP UI.
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos.
Databases: Microsoft SQL Server 2008/2010/2012, MySQL 4.x/5.x, Oracle 11g/12c, DB2, Teradata, Netezza, and Snowflake.
Operating Systems: All versions of Windows, UNIX, Linux, macOS, Sun Solaris.
PROFESSIONAL EXPERIENCE
Hadoop Developer
Confidential
Responsibilities:
- Created data models for bringing in new data sources to support various processes.
- Maintained code in GitHub; well versed in using GitHub and Git Bash.
- Bundled and compiled code to create JARs using SBT.
- Experienced in analysing data with Hive and Pig
- Experienced in querying data using SparkSQL on top of Spark engine
- Performed unit testing on data and performance improvements before turning code over to production.
- Downloaded data from AWS S3 to HDFS using Spark and Scala (see the sketch after this list).
- Implemented a Vault process to store secret information such as passwords.
- Used AWS Athena to query tables in S3, along with EMR clusters and Glue jobs.
- Good knowledge of Headwaters3 (Kinesis).
- Good knowledge of the data analytics tool Tableau; built data lakes in Hadoop using Sqoop and Hive.
- Used Spark Streaming to capture real-time data and process it in Hadoop.
- Set up dynamic refresh of data for Tableau reports to show trends for every scenario built as part of the business requirements.
- Exposure to Spark architecture and how RDDs work internally; processed data from local files, HDFS, and RDBMS sources by creating RDDs and optimizing them for performance.
- Wrote Sqoop scripts to import data from RDBMS into HDFS and export data from HDFS to Microsoft SQL Server, and handled incremental loading of customer and transaction information dynamically.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Experience creating reports using BI reporting tools such as Tableau and Power BI.
- Evaluated usage of Oozie for Workflow Orchestration.
- Embedded the required Tableau reports into websites using the JavaScript API.
- Acted as a billing data SME for the consent decree program to support the reconciliation process.
- Worked on the consent decree reconciliation process to reduce the fallout percentage of failed consents.
- Good understanding of order data from billing and order management.
- Created Tableau reports for different modules and scenarios to support business needs, and produced burn-down charts for the reconciliation process.
- Configured data mappings and transformations to orchestrate data integration and validation.
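Illustrative sketch of the S3-to-HDFS flow mentioned above, assuming an EMR-style cluster where s3:// paths resolve natively; the bucket, paths, and column names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object S3ToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("s3-to-hdfs-sketch").getOrCreate()

    // Read raw JSON events from S3 (EMR clusters typically resolve s3:// natively)
    val events = spark.read.json("s3://example-bucket/raw/events/")
    events.createOrReplaceTempView("events")

    // Query with Spark SQL before persisting
    val daily = spark.sql(
      """SELECT event_date, event_type, COUNT(*) AS event_count
        |FROM events
        |GROUP BY event_date, event_type""".stripMargin)

    // Land the aggregated result on HDFS as Parquet
    daily.write.mode("overwrite").parquet("hdfs:///data/curated/daily_events")
  }
}
```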
Environment: PySpark, EMR, Redshift, Athena, AWS S3, Kinesis, Kerberos, Scala, Spark SQL, Spark Streaming, Kafka, Shell Scripting, IntelliJ, GitHub, SBT, Jira, Rally, Python, Databricks, Netezza, Snowflake.
Hadoop Developer
Confidential, Kansas City, KS
Responsibilities:
- Developed new Spark SQL ETL logic in Big Data for the migration and availability of the facts and dimensions used for analytics.
- Indexed logs from the data lake into Elasticsearch using Spark for visualization in Kibana.
- Installed and configured Elasticsearch and managed the system for data ingestion.
- Shipped HDFS-indexed documents to Elasticsearch and wrote Scala scripts for querying and ingesting DataFrames via bulk transport using the embedded Elastic4s (Scala) module for CRUD operations (see the Elasticsearch sketch after this list).
- Developed a PySpark SQL application for Big Data migration from Teradata to Hadoop, reducing memory utilization in Teradata analytics.
- Developed Pig scripts to establish the data flow needed to achieve the desired watch list for store- and item-level exception reporting.
- Developed a shell script to fetch store, max-timestamp, and date combinations from Hive tables, pass them as parameters to the Pig script, and establish a connection to the MySQL database.
- Designed, configured, and deployed Amazon Web Services (AWS) for applications utilizing the AWS stack (including EC2, Route53, S3, RDS, CloudFormation, CloudWatch, SQS, and IAM), focusing on high availability, fault tolerance, auto-scaling, load balancing, capacity monitoring, and alerting.
- Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and back using a write-back tool.
- Explored Spark to improve performance and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries to run analytics on the data.
- Developed and implemented real-time data pipelines with Spark Streaming, Kafka, and Cassandra to replace existing lambda architecture without losing the fault-tolerant capabilities of the existing architecture.
- Implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources, and developed Spark applications using Scala and Java.
- Created a Spark Streaming application to consume real-time data from Kafka sources and applied real-time data analysis models that update on new data as it arrives in the stream.
- Used Spark Structured Streaming to perform the necessary transformations and build data models, consuming data from Kafka in real time and persisting the results into Cassandra (a minimal sketch follows this list).
- Wrote transformations and actions on DataFrames and used Spark SQL on DataFrames to access Hive tables in Spark for faster data processing.
- Involved in configuring the Elasticsearch, Logstash & Kibana (ELK) stack and in Elasticsearch performance tuning and optimization.
- Converted features in JSON for the Elastic Stack, flowing from Logstash to Kibana.
- Configured Flume for log file movement from servers to Elasticsearch and analyzed the data using Kibana.
- Architected solutions on AWS Cloud platform using various services offered by Amazon like EC2, ELB, Auto Scaling, EBS, S3, VPC, RDS, SNS, VPN, CloudWatch & IAM.
- Strong knowledge of NoSQL column-oriented databases like HBase and their integration with the Hadoop cluster using connectors.
- AWS services used: API Gateway, Lambda, EMR, Kinesis, IAM, EC2, S3, EBS, Data Pipeline, VPC, Glacier, and Redshift.
- Integrated Hadoop with Tableau to generate visualizations like Tableau Dashboards.
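Illustrative sketch of the Kafka-to-Cassandra Structured Streaming pipeline described above, assuming the spark-sql-kafka and DataStax spark-cassandra-connector (2.5+) packages are on the classpath; the topic, keyspace, table, and column names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

object KafkaToCassandraSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-cassandra-sketch").getOrCreate()

    val schema = new StructType()
      .add("order_id", StringType)
      .add("customer_id", StringType)
      .add("amount", DoubleType)
      .add("event_ts", TimestampType)

    // Kafka source: each record's value is a JSON payload
    val orders = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "orders")
      .load()
      .select(from_json(col("value").cast("string"), schema).as("o"))
      .select("o.*")

    // Sink: append each micro-batch to a Cassandra table via the DataStax connector
    orders.writeStream
      .format("org.apache.spark.sql.cassandra")
      .option("keyspace", "sales")
      .option("table", "orders_by_customer")
      .option("checkpointLocation", "hdfs:///checkpoints/orders")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```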
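Illustrative sketch of bulk-indexing data-lake records into Elasticsearch from Spark for Kibana dashboards, assuming the elasticsearch-hadoop (elasticsearch-spark) connector is available; Elastic4s offers a comparable bulk API from plain Scala. The host, index, and path names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object LogsToElasticsearchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("logs-to-es-sketch")
      .config("es.nodes", "es-node1")      // Elasticsearch coordinating node
      .config("es.port", "9200")
      .getOrCreate()

    // Parsed application logs already landed on HDFS
    val logs = spark.read.parquet("hdfs:///data/lake/app_logs_parsed")

    // Bulk-index into Elasticsearch; Kibana visualizes the resulting index
    logs.write
      .format("org.elasticsearch.spark.sql")
      .option("es.mapping.id", "log_id")   // use an existing column as the document id
      .mode("append")
      .save("app-logs")                    // target index
  }
}
```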
Environment: Cloudera, Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, AWS EC2, S3, EMR, RDS, Linux Shell Scripting, Postgres, MySQL, BigQuery, Cloud Storage, Cloud ML, Dataproc, Datalab, IAM, Cloud SQL, Eclipse, Java/J2EE, Oracle, HTML, PL/SQL, XML, SQL
Big Data Developer
Confidential
Responsibilities:
- Involved in requirement gathering from clients and delivered implementations following the waterfall process.
- Handled all business logic and front-end designs based on the blueprint.
- Actively involved in requirement gathering and requirement planning for each quarter.
- Involved in UI development based on the given blueprints from the client.
- The application was based entirely on a content management system; involved in configuring components in Sitecore and coding the business layer.
- Performed regression testing after every development phase before delivering to the Quality Assurance team.
- Received appreciation from the client for on-time delivery and the quality of test case automation using Selenium.
- Performed Database testing using SQL Developer.
- Experience in Web service testing using SOAP UI.
- Working knowledge of waterfall development models.
- Developed test automation scripts using Selenium WebDriver.
- Integrated test cases with the Continuous Integration tool (Jenkins).
- Developed and maintained Regression/Sanity test suite in HP ALM/CA Agile Central test management tool.
- Experience building automation frameworks using Selenium.
Environment: Visual Studio, Excel, Rally, Shell Scripting, Selenium Cucumber Framework, CSS, HTML, Apache Maven, JUnit, and Eclipse.