Hadoop Developer Resume
Duluth, GA
SUMMARY:
- 5+ years of total IT experience, including Java development, web application development, database management and Big Data ecosystem technologies.
- Around 3+ years of experience developing applications with Big Data technologies such as Hadoop, Spark, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Cassandra, Hortonworks, Cloudera, Mahout, Avro and Scala.
- Worked on migrating MapReduce programs into Spark transformations using PySpark and Scala.
- Experienced in handling large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive (a brief PySpark sketch follows this summary).
- Hands-on experience with Hadoop applications (such as administration, configuration management, monitoring, debugging, and performance tuning).
- Skilled in programming with the MapReduce framework and the Hadoop ecosystem.
- Very good experience in designing and implementing MapReduce jobs to support distributed processing of large data sets on the Hadoop cluster.
- Good experience transferring data from MySQL to Solr using the DataImportHandler in standalone mode.
- Experience setting up Hadoop environments in the cloud (AWS, Oracle) and spinning up new services on existing nodes.
- Good experience ingesting data from MySQL into Solr using Sqoop in the Hadoop environment.
- Worked on a live 60-node Hadoop cluster running Cloudera CDH4.
- Worked with highly unstructured and semi-structured data of 90 TB in size (270 TB with a replication factor of 3).
- Implemented Commissioning and Decommissioning of new nodes to existing cluster.
- Extracted data from relational databases (SQL Server, Oracle, MySQL) into HDFS using Sqoop.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
- Developed Hive scripts for end user/analyst requirements to perform ad hoc analysis.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation, and how they translate to MapReduce jobs.
- Developed UDFs in Java and Python as needed for use in Pig and Hive queries.
- Experience in using SequenceFile, RCFile, Parquet, Avro and HAR file formats.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Implemented authentication using Kerberos and authorization using Apache Sentry.
- Worked with the admin team on designing and upgrading the cluster from CDH 4 to CDH 5.
- Good knowledge of Amazon Web Services components such as EC2, EMR and S3.
- Experience and knowledge of NoSQL databases such as HBase, Cassandra and MongoDB.
- Good Working knowledge of BI Tools like Tableau, Birst.
- Extensive experience with SQL, PL/SQL and database concepts
- Proactive and well organized, with effective time-management and problem-solving skills.
- Experience and knowledge of microservices, Docker containers and Linux commands.
- Experienced in shell scripting to execute commands and notify the group of results via IM.
- Good interpersonal skills and ability to work as part of a team; exceptional ability to learn and master new technologies and deliver outputs on short deadlines.
- Participated in requirement analysis, reviews and working sessions to understand the requirements and system design
- Good experience in designing relational databases and creating ER diagrams.
- Trained in machine learning concepts in Python and NLP.
- Worked on a private machine learning project to improve seat filling.
- Good experience in website scraping and loading data into an RDBMS.
- Advanced knowledge in performance troubleshooting and tuning of MySQL clusters.
- Very good experience in query tuning to decrease latency.
- Strong knowledge of Oracle database administration issues (i.e., schema, user and security management).
- Experienced in bulk insert, bulk copy, dynamic SQL, XML query methods (XML parsing), indexing and optimization, and stored procedures.
- Experienced in DBA tasks such as database resizing, encryption and security.
- Expertise in relational database administration, including configuration, implementation, data modeling, maintenance, redundancy/HA, security, troubleshooting and performance tuning, upgrades, database/data/server migrations, and SQL tuning and troubleshooting of database operations.
- Experienced in capacity planning and disk space management.
- Experience in troubleshooting Linux/Unix operating systems for database performance.
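For illustration, a minimal PySpark sketch of the pattern described above (Spark SQL over Hive on Cloudera YARN, with a broadcast join applied during ingestion); the table and column names are hypothetical placeholders, not from a specific project:

    # Minimal PySpark sketch: Spark SQL over Hive tables with a broadcast join.
    # Table and column names (web_logs, dim_pages, ...) are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("weblog-aggregation")
             .enableHiveSupport()          # read/write Hive tables via the metastore
             .getOrCreate())

    logs = spark.table("web_logs")         # large fact table in Hive
    pages = spark.table("dim_pages")       # small dimension table

    # Broadcast the small dimension to avoid shuffling the large side of the join.
    enriched = logs.join(F.broadcast(pages), on="page_id", how="left")

    daily_hits = (enriched
                  .filter(F.col("status") == 200)
                  .groupBy("event_date", "page_category")
                  .agg(F.count("*").alias("hits")))

    # Write the aggregate back to Hive, partitioned by date for downstream queries.
    (daily_hits.write
     .mode("overwrite")
     .partitionBy("event_date")
     .saveAsTable("analytics.daily_page_hits"))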
TECHNICAL SKILLS:
Operating Systems: Linux (Ubuntu, CentOS), Windows, Mac OS
Hadoop Ecosystem: Hadoop, MapReduce, Solr, YARN, HDFS, HBase, Impala, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Spark, Scala, Zeppelin
NoSQL Databases: HBase, Cassandra, MongoDB, GraphDB
Programming Languages: C, Python, Scala, Core Java, J2EE (Servlets, JSP, JDBC, JavaBeans, EJB), C#, ASP.NET
Frameworks: Spring, Hibernate
Documentation: Atlassian-Confluence & JIRA
Repositories: BitBucket
Web Technologies: HTML, CSS, XML, JavaScript, Maven
Scripting Languages: JavaScript, UNIX shell, Python
Databases: Oracle 11g & 12c, MS Access, MySQL, SQL Server 2000/2005/2008/2012, Teradata
SQL Server Tools: SQL Server Management Studio, Enterprise Manager, Query Analyzer, Profiler, Export & Import (DTS).
IDE: Eclipse, Visual Studio, IDLE, IntelliJ
Web Services: Restful, SOAP
Tools & Methodologies: Bugzilla, QuickTest Pro (QTP) 9.2, Selenium, Quality Center, TestLink, TWS, SPSS, SAS, Documentum, Tableau, Mahout; Agile, UML, Design Patterns
PROFESSIONAL EXPERIENCE:
Confidential, Duluth, GA
Hadoop Developer
Responsibilities:
- Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Analyzed the SQL scripts and designed the solution for implementation in PySpark.
- Implemented Spark using PySpark and Spark SQL for faster testing and processing of data.
- Imported data from AWS S3 into Spark DataFrames and performed transformations and actions on the DataFrames (see the ingestion sketch at the end of this section).
- Used the JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
- Used HBase to store audit data for future analysis.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for large volumes of data.
- Coordinated with business customers to gather business requirements, interacted with technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Extensively involved in the design phase and delivered design documents.
- Worked on analyzing the Hadoop cluster and different Big Data analytics tools, including Pig, Hive, the HBase database and Sqoop.
- Involved in validating the aggregate tables based on the rollup process documented in the data mapping; developed HiveQL and Spark RDD/SQL and automated the flow using shell scripting.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Involved in moving all log files generated from various sources to HDFS through Flume for further processing.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Developed database triggers and procedures to update the real-time cash balances.
- Worked closely with the testing team to create new test cases and created the use cases for the module before the testing phase.
- Coordinated work with DB team, QA team, Business Analysts and Client Reps to complete the client requirements efficiently.
- Involved in the performance tuning of PL/SQL statements.
- Involved in fetching brand data from social media applications such as Facebook and Twitter.
- Developed and updated social media analytics dashboards on a regular basis.
- Performed data mining investigations to find new insights related to customers.
- Involved in forecasting based on current results and insights derived from data analysis.
- Created a complete processing engine, based on Cloudera's distribution, enhanced for performance.
- Managed and reviewed Hadoop log files.
- Developed and generated insights based on brand conversations, which in turn helped effectively drive brand awareness, engagement and traffic to social media pages.
- Involved in identifying topics and trends and building context around the brand.
- Involved in identifying and analyzing defects, questionable functional errors and inconsistencies in output.
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g & 12c, Core Java, Cloudera, HDFS, Eclipse.
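An illustrative sketch of the ingestion path referenced above (JSON weblogs read from S3 into Spark DataFrames and stored as a Hive table); the bucket, prefix and field names are hypothetical placeholders:

    # Illustrative S3-to-Hive ingestion: read JSON weblogs into a DataFrame,
    # apply light transformations, and store the result as a partitioned Hive table.
    # Bucket, prefix and field names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("s3-weblog-ingest")
             .enableHiveSupport()
             .getOrCreate())

    raw = spark.read.json("s3a://example-bucket/weblogs/*.json")

    cleaned = (raw
               .withColumn("event_ts", F.to_timestamp("timestamp"))
               .withColumn("event_date", F.to_date("event_ts"))
               .dropDuplicates(["request_id"]))

    (cleaned.write
     .mode("append")
     .partitionBy("event_date")
     .saveAsTable("staging.weblog_events"))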
Confidential, Mobile, AL
Oracle Developer
Responsibilities:
- Designed and developed Oracle packages, functions and procedures, as well as Postgres functions.
- Participated in the analysis phase, preparing the technical specification, development and testing documents.
- Involved in SQL query tuning and provided tuning recommendations for ERP jobs and time/CPU-consuming queries.
- Created database objects such as tables, views, procedures, packages and cursors using TOAD, PL/SQL Developer, SQL Developer and SQL Navigator.
- Ensured that data warehouse and data mart designs efficiently support BI and end users.
- Collaborated with BI teams to create reporting data structures.
- Built and maintained Oracle PL/SQL packages and stored procedures (using TOAD and PL/SQL Developer), workflows, and SQR reports as requested by the client for processing and analysis.
- Used ref cursors and collections to access complex data by joining many tables (an illustrative sketch follows this section).
- Extensively involved in database, SQL and API tuning and optimization.
- Handled exceptions extensively for ease of debugging and for displaying error messages in the application.
- Created numerous simple-to-complex queries involving self joins, correlated subqueries, functions, cursors and dynamic SQL.
- Developed Oracle partitions and performed performance tuning/query tuning.
- Experience in creating ETL transformations and jobs in Oracle using dynamic packages and procedures.
- Responsible for Internet Native Banner Development and Self-Service Banner Programming.
- SQL and PL/SQL development.
- Responsible for creating new student, faculty and administration APIs to enhance the registration process, academic records, alumni services, the payment process, coursework submission, etc.
Environment: Oracle 10g/11g, Oracle SQL Developer, TOAD, SQL*Loader, FTP, Linux, SQL.
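For illustration, a small Python (cx_Oracle) sketch of consuming a PL/SQL ref cursor of the kind described above; the package/procedure name, bind values and connection details are hypothetical placeholders:

    # Illustrative sketch: call a PL/SQL procedure that returns a ref cursor and
    # iterate over its rows from Python. Names and credentials are hypothetical.
    import cx_Oracle

    connection = cx_Oracle.connect("app_user", "app_password", "dbhost:1521/ORCLPDB1")
    cursor = connection.cursor()

    # The procedure is assumed to open a ref cursor over several joined tables
    # and hand it back through an OUT parameter.
    ref_cursor = connection.cursor()
    cursor.callproc("student_pkg.get_registrations", [2017, ref_cursor])

    for student_id, course_code, enrolled_on in ref_cursor:
        print(student_id, course_code, enrolled_on)

    cursor.close()
    connection.close()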
Confidential
Oracle Developer
Responsibilities:
- Collected business requirements from the team and translated them into technical specs and design docs for development.
- Created Stored Procedures, Triggers, Indexes, User defined Functions, Constraints etc. on various database objects to obtain the required results.
- Created Complex ETL Packages using Oracle to extract data from staging tables to partitioned tables with incremental load.
- Developed, monitored and deployed Oracle packages.
- Responsible for Scheduling Jobs, Alerting and Maintaining Oracle packages.
- Ensured the performance, security and availability of databases.
- Worked on optimizing MySQL server performance, monitoring and query tuning.
- Responsible for creating MySQL user accounts.
- Responsible for capturing long-running queries.
Environment: Oracle 11g, SQL, PL/SQL, TOAD, Oracle Developer, SQL*Loader, SQL*PLUS.
Confidential
Data Analyst
Responsibilities:
- Responsible for loading data from Excel into the database using SQL*Loader.
- Worked on creating views for sales, expenditure and transport data.
- Responsible for creating customized reports in SSRS using SQL Server 2008 R2 with different properties such as chart controls, filters, interactive sorting and SQL parameters.
- Added/changed tables, columns, triggers and T-SQL code; used indexes, views, triggers and stored procedures for updating and cleaning the existing data as well as newly added data.
- Created SSRS Report Model Projects in BI Studio as well as created, modified and managed various report models with multiple model objects, source fields and expressions.
- Interacted with business users for gathering requirements for Project Dashboards.
- Deployed SSRS reports to SharePoint sites for web-based access by multiple end users.
- Designed and developed matrix and tabular reports with cascading parameters, drill-down, drill-through and custom groups using SSRS.
Environment: SQL Server 2008/2005 Enterprise Edition, SQL BI Suite (SSAS, SSIS, SSRS), VB Script, Enterprise manager, XML, MS PowerPoint, OLAP, OLTP, MOSS 2007, MS Project, MS Access 2008 & Windows Server 2008, Oracle.