
Big Data Developer Resume


New York

SUMMARY

  • Over 8 years of IT experience in software analysis, design, development, testing and implementation of Big Data, Hadoop and NoSQL technologies, both on-premises and in the cloud.
  • Extensive skill set in cloud-based architecture, cost-effective solution design and performance tuning on Big Data platforms.
  • Skilled in the installation, configuration and use of Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Flume, YARN, Spark, Kafka and Oozie.
  • Strong understanding of Hadoop daemons and Map-Reduce concepts.
  • Strong experience in importing data into and exporting data out of HDFS.
  • Expertise in Java and Scala.
  • Experienced in developing UDFs for Hive using Java.
  • Worked with Apache Falcon, a data governance engine that defines, schedules and monitors data management policies.
  • Experience with Amazon AWS services such as EMR, EC2, S3, CloudFormation, Redshift and DynamoDB, which provide fast and efficient processing of Big Data.
  • Hands on experience with Hadoop, HDFS, MapReduce and Hadoop Ecosystem (Pig, Hive, Oozie, Flume and HBase).
  • Hands-on experience in developing Spark applications using Spark Core, RDD transformations, Spark Streaming and Spark SQL.
  • Strong knowledge of NoSQL databases such as HBase, MongoDB and Cassandra.
  • Experience in working with Angular 4, Node.js, Bookshelf.js, Knex.js and MariaDB.
  • Understanding of data storage and retrieval techniques, ETL and databases, including graph stores, relational databases and tuple stores.
  • Good skills in developing reusable solutions to maintain consistent coding standards across Java projects.
  • Good knowledge of Python collections, Python scripting and multi-threading.
  • Written multiple MapReduce programs in Python for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed formats (see the sketch after this list). Also experienced with EJB, Hibernate, Java Web Services (SOAP, REST), Java threads, sockets, servlets, JSP and JDBC.
  • Proficient in the R programming language for data extraction, cleaning, loading, transformation and visualization.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn and NLTK in Python to develop machine learning models such as linear regression.
  • Expertise in debugging and performance tuning of Oracle and Java applications, with strong knowledge of Oracle 11g and SQL.
  • Ability to work effectively in cross-functional team environments and experience of providing training to business users.
  • Good experience in using Sqoop for traditional RDBMS data pull.
  • Worked with Apache Ranger console to create and manage policies for access to files, folders, databases, tables, or columns.
  • Worked with YARN Queue Manager to allocate queue capacities for different service accounts.
  • Hands-on experience with Hortonworks and Cloudera Hadoop environments.
  • Familiar with handling complex data processing jobs using Cascading.
  • Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers and cursors.
  • Extensive experience in Shell scripting.
  • Led testing efforts in support of projects/programs across a large landscape of technologies (Unix, AngularJS, AWS, SauceLabs, Cucumber JVM, MongoDB, GitHub, SQL, NoSQL databases, APIs, Java, Jenkins).
  • Automated testing using Cucumber JVM to develop a world-class ATDD process.
  • Set up JDBC connections for database testing using the Cucumber framework.
  • Experience in component design using UML: Use Case, Class, Sequence, Deployment and Component diagrams for the requirements.
  • Expertise in installing, configuring, supporting and managing Hadoop clusters using Apache, Cloudera (CDH3, CDH4) and Hortonworks distributions, and on Amazon Web Services (AWS).
  • Excellent analytical and programming abilities in using technology to create flexible and maintainable solutions for complex development problems.
  • Good communication and presentation skills, willing to learn, adapt to new technologies and third party products.
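
Illustrative code sketch (referenced above): a minimal Hadoop Streaming style MapReduce job in Python that aggregates a numeric CSV column by key. The column positions, input layout and job submission details are assumptions for illustration, not taken from a specific project.

    #!/usr/bin/env python
    # Hadoop Streaming sketch: sum a numeric CSV column by key.
    # Column positions and the input layout are illustrative assumptions.
    import csv
    import sys

    def mapper():
        # Emit "key<TAB>value" pairs: column 0 as the key, column 2 as the amount.
        for row in csv.reader(sys.stdin):
            if len(row) > 2:
                print("%s\t%s" % (row[0], row[2]))

    def reducer():
        # Streaming sorts mapper output by key, so totals can be accumulated per key.
        current_key, total = None, 0.0
        for line in sys.stdin:
            key, _, value = line.rstrip("\n").partition("\t")
            if key != current_key:
                if current_key is not None:
                    print("%s\t%s" % (current_key, total))
                current_key, total = key, 0.0
            total += float(value)
        if current_key is not None:
            print("%s\t%s" % (current_key, total))

    if __name__ == "__main__":
        mapper() if sys.argv[-1] == "map" else reducer()

Such a script would typically be submitted through the hadoop-streaming jar, passed as both the mapper and the reducer with the appropriate "map"/"reduce" argument; exact options depend on the cluster setup.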

TECHNICAL SKILLS

Languages/Tools: Python, Scala, Java, C, C++, XML, HTML/XHTML, HDML, DHTML.

Big Data: HDFS, MapReduce, HIVE, PIG, HBase, SQOOP, Oozie, Zookeeper, Spark, Mahout, Kafka, Storm, Cassandra, Solr, Impala, Greenplum, MongoDB

Web/Distributed Technologies: J2EE, Servlets 2.1/2.2, JSP 2.0, Struts 1.1, Hibernate 3.0, JSF, JSTL 1.1, EJB 1.1/2.0, RMI, JNI, XML, JAXP, XSL, XSLT, UML, MVC, Spring 2.0, CORBA, Java Threads.

Browser Languages/Scripting: HTML, XHTML, CSS, XML, XSL, XSD, XSLT, JavaScript, HTML DOM, DHTML, AJAX.

App/Web Servers: IBM WebSphere …, BEA WebLogic 5.1/7.0, JDeveloper, Apache Tomcat, JBoss.

GUI Environment: Swing, AWT, Applets.

Messaging & Web Services Technology: SOAP, WSDL, UDDI, XML, SOA, JAX-RPC, IBM WebSphere MQ v5.3, JMS.

Testing & Case Tools: JUnit, Log4j, Rational ClearCase, CVS, ANT, Maven, JBuilder.

Configuration Management: Chef, Puppet, Ansible, Docker.

Build Tools: CVS, Subversion, Git, Ant, Maven, Gradle, Hudson, TeamCity, Jenkins.

CI Tools: Jenkins, Bamboo

Scripting Languages: Python, Shell (Bash), Perl, PowerShell, Ruby, Groovy.

Monitoring Tools: Nagios, CloudWatch, JIRA, Bugzilla and Remedy.

Databases (SQL and NoSQL): Oracle, MS SQL Server 2000, DB2, MS Access, MySQL, Teradata, Cassandra, Greenplum and MongoDB.

Operating systems: Windows, Unix, Linux (Red Hat 5.x/6.x/7.x, SUSE Linux 10), Sun Solaris, Ubuntu, CentOS.

PROFESSIONAL EXPERIENCE

Confidential, New York

Big Data Developer

Responsibilities:

  • As an AWS Big Data/Hadoop Developer, worked on Hadoop ecosystem components including Hive, MongoDB, Zookeeper and Spark Streaming on the MapR distribution.
  • Developed Apache Spark jobs in AWS Glue and AWS Data Pipeline, and ML models in SageMaker.
  • Developed Big Data solutions focused on pattern matching and predictive modelling.
  • Involved in Agile methodologies, daily scrum meetings, sprint planning
  • Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing
  • Worked on MongoDB, HBase databases which differ from classic relational databases
  • Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala.
  • Maintained and worked with a data pipeline that transfers and processes several terabytes of data using Spark, Scala, Python, Apache Kafka, Pig/Hive and Impala.
  • Integrated Kafka with Spark Streaming for high-throughput, reliable processing (see the sketch after this list).
  • Worked on Apache Flume for collecting and aggregating large volumes of log data and storing it in HDFS for further analysis.
  • Led large-scale Big Data projects from inception to completion using Python, Scala, Spark and Hive.
  • Tuned Hive and Pig to improve performance and solved performance issues in both types of scripts.
  • Built Hadoop solutions for Big Data problems using MR1 and MR2 on YARN.
  • Developed Nifi flows dealing with various kinds of data formats such as XML, JSON, and Avro.
  • Developed and designed data integration and migration solutions in Azure.
  • Worked on Proof of concept with Spark with Scala and Kafka.
  • Worked on visualizing the aggregated datasets in Tableau.
  • Worked on importing data from HDFS to a MySQL database and vice versa using Sqoop.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Performance tuning of Hive queries, Map Reduce programs for different applications.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded the data into HDFS.
  • Involved in identifying job dependencies to design workflows for Oozie and YARN resource management.
  • Designed solution for various system components using Microsoft Azure.
  • Worked on data ingestion using Sqoop between HDFS and relational database systems, including ongoing maintenance and troubleshooting.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and pair RDDs.
  • Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Developed data pipeline using Flume, Sqoop, Pig and Java Map Reduce to ingest customer behavioural data and financial histories into HDFS for analysis.
  • Collaborated with business users/product owners/developers to contribute to the analysis of functional requirements.
  • Primarily involved in Data Migration process using Azure by integrating with Github repository and Jenkins.
  • Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
  • Designed and developed a flattened view (merged and flattened dataset) de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream systems.
  • Supported NoSQL in enterprise production and loaded data into HBase using Impala and Sqoop.
  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools including Pig, the HBase database and Sqoop.
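
Illustrative code sketch (referenced in the Kafka/Spark Streaming item above): a minimal PySpark Structured Streaming job that reads events from Kafka and keeps windowed counts. The broker address, topic name and JSON fields are assumptions for illustration; running it also requires the Spark Kafka connector package on the classpath.

    # Minimal PySpark Structured Streaming sketch for a Kafka-to-Spark pipeline.
    # Broker, topic and JSON fields are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json, window
    from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    schema = (StructType()
              .add("event_type", StringType())
              .add("amount", DoubleType())
              .add("event_time", TimestampType()))

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
              .option("subscribe", "events")                     # assumed topic
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Windowed aggregation with a watermark, written to the console sink.
    counts = (events
              .withWatermark("event_time", "10 minutes")
              .groupBy(window(col("event_time"), "5 minutes"), col("event_type"))
              .count())

    query = counts.writeStream.outputMode("update").format("console").start()
    query.awaitTermination()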

Environment: Agile, Hadoop, Pig, HBase, Sqoop, Azure, Hive, HDFS, NoSQL, Impala, YARN, PL/SQL, NiFi, XML, JSON, Avro, Spark, Kafka, Tableau, MySQL, Apache Flume.

Confidential, Virginia

Big Data Developer

Responsibilities:

  • Implemented the Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) in the cloud.
  • Developed Big Data solutions focused on pattern matching and predictive modeling, utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase and Hive.
  • Experience in AWS, implementing solutions using services such as EC2, S3, RDS, Redshift and VPC.
  • Worked as a Hadoop consultant on MapReduce, Pig, Hive and Sqoop.
  • Configured performance tuning and monitoring for Cassandra read and write processes for fast I/O operations and low latency.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig and MapReduce.
  • Worked on analyzing the Hadoop cluster and different Big Data components including Pig, Hive, Spark, HBase, Kafka, Elasticsearch and Sqoop.
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
  • Troubleshooting, debugging and resolving Talend issues while maintaining the health and performance of the ETL environment.
  • Performed data profiling and transformation on the raw data using Pig and Python.
  • Experienced with batch processing of data sources using Apache Spark.
  • Developed predictive analytics using Apache Spark Scala APIs.
  • Created Hive external tables, loaded data into the tables and queried the data using HQL.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Created dashboards in Tableau and in Elasticsearch with Kibana.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Experience in BI reporting with AtScale OLAP for Big Data.
  • Responsible for importing log files from various sources into HDFS using Flume
  • Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
  • Worked in writing Hadoop Jobs for analyzing data using Hive, Pig accessing Text format files, sequence files, Parquet files.
  • Loading data from different source (database & files) into Hive using Talend tool.
  • Extracted the data from MySQL, AWS RedShift into HDFS using Sqoop.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Imported millions of records of structured data from relational databases using Sqoop, processed them with Spark and stored the data in HDFS in CSV format.
  • Used Spark SQL to process large volumes of structured data (see the sketch after this list).
  • Implemented a Spark GraphX application to analyze guest behavior for data science segments.
  • Enhanced a traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
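
Illustrative code sketch (referenced in the Spark SQL item above): a minimal PySpark job that reads the CSV files Sqoop landed in HDFS, runs a Spark SQL aggregation, and writes the result back to HDFS. The HDFS paths and column names are assumptions for illustration.

    # Sketch of the Sqoop-import-then-Spark-SQL flow; paths and columns are assumed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sqoop-csv-spark-sql").getOrCreate()

    # Read the CSV files that Sqoop wrote to HDFS into a DataFrame.
    orders = (spark.read
              .option("header", "false")
              .option("inferSchema", "true")
              .csv("hdfs:///data/raw/orders/")  # assumed Sqoop target directory
              .toDF("order_id", "customer_id", "amount", "order_date"))

    orders.createOrReplaceTempView("orders")

    # Spark SQL aggregation over the imported data.
    daily_totals = spark.sql("""
        SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
        FROM orders
        GROUP BY order_date
    """)

    # Persist the processed result back to HDFS for downstream reporting.
    daily_totals.write.mode("overwrite").parquet("hdfs:///data/processed/daily_totals/")
    spark.stop()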

Environment: HDFS, MapReduce, Hive, Sqoop, Pig, Flume, Vertica, Oozie Scheduler, Java, Shell scripts, Teradata, Oracle, HBase, MongoDB, Cassandra, Cloudera, AWS, JavaScript, JSP, Kafka, Spark, Scala, Python, ETL.

Confidential -Calverton, MD

Big Data developer

Responsibilities:

  • Developed Spark scripts using Python on Azure HDInsight for Data Aggregation, Validation and verified its performance over MR jobs.
  • Built pipelines to move hashed and un-hashed data from Azure Blob to Data lake.
  • Utilized Azure HDInsight to monitor and manage the Hadoop Cluster.
  • Collaborated on insights with Data Scientists, Business Analysts and Partners.
  • Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Python.
  • Created pipelines to move data from on-premise servers to Azure Data Lake.
  • Utilized Python pandas DataFrames to provide data analysis.
  • Enhanced and optimized Spark scripts to aggregate, group and run data mining tasks.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and PySpark (see the sketch after this list).
  • Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and pair RDDs.
  • Used Spark API over Hadoop Yarn to perform analytics on data and monitor scheduling.
  • Implemented schema extraction for Parquet and Avro file formats.
  • Experienced in performance tuning of Spark applications by setting the right batch interval, the correct level of parallelism and appropriate memory settings.
  • Developed Hive queries to process the data and generate the data cubes for visualization.
  • Built specific functions to ingest columns into Schemas for Spark Applications.
  • Experienced in handling large data sets using partitions, Spark in-memory capabilities, and effective, efficient joins and transformations during the ingestion process itself.
  • Developed data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional data sources for data access and analysis.
  • Analyzed SQL scripts and designed the solution to implement using PySpark.
  • Used reporting tools like Power BI for generating data reports daily.
  • Handled several techno-functional responsibilities including estimates, identifying functional and technical gaps, requirements gathering, designing solutions, development, developing documentation and production support.
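
Illustrative code sketch (referenced in the Hive-to-Spark item above): a hedged example of rewriting a Hive aggregation as PySpark DataFrame transformations, with repartitioning and caching as simple tuning steps. The table name, columns and partition count are assumptions; a Hive-enabled SparkSession is also assumed.

    # Rewriting a Hive aggregation as PySpark transformations (illustrative).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-pyspark-sketch")
             .enableHiveSupport()  # assumes a Hive metastore is available
             .getOrCreate())

    # Original HiveQL, for reference:
    #   SELECT account_id, COUNT(*) AS txn_count, SUM(amount) AS total_amount
    #   FROM transactions WHERE txn_year = 2017 GROUP BY account_id;

    transactions = spark.table("transactions").filter(F.col("txn_year") == 2017)

    # Equivalent transformation chain; repartition and cache are simple tuning knobs.
    summary = (transactions
               .repartition(200, "account_id")
               .groupBy("account_id")
               .agg(F.count("*").alias("txn_count"),
                    F.sum("amount").alias("total_amount"))
               .cache())

    summary.write.mode("overwrite").saveAsTable("transactions_summary")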

Environment: Hadoop (HDFS/Azure HDInsight), Hive, YARN, Python/Spark, Linux, MS SQL Server, Power BI.

Confidential - Bronx, NY

Hadoop developer

Responsibilities:

  • Executed Hive queries that helped in analysis of market trends by comparing the new data with EDW reference tables and historical data.
  • Managed and reviewed Hadoop log files for the JobTracker, NameNode, secondary NameNode, DataNodes and TaskTrackers.
  • Tested raw market data and executed performance scripts on data to reduce the runtime.
  • Involved in loading the created Files into HBase for faster access of large sets of customer data without affecting the performance.
  • Imported and exported data between HDFS and RDBMS using Sqoop and Kafka.
  • Executed the Hive jobs to parse the logs and structure them in relational format to provide effective queries on the log data.
  • Created Hive tables (Internal/external) for loading data and have written queries that will run internally in MapReduce and queries to process the data.
  • Developed Pig scripts for capturing data changes and record processing between new data and data already existing in HDFS.
  • Created scalable, performant machine learning applications using Mahout.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka (see the sketch after this list).
  • Involved in importing data from different data sources and performed various queries using Hive, MapReduce and Pig Latin.
  • Involved in loading data from the local file system to HDFS using HDFS shell commands.
  • Experience with UNIX shell scripts for processing and loading data from various interfaces into HDFS.
  • Developed different components of the Hadoop ecosystem process involving MapReduce and Hive.
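
Illustrative code sketch (referenced in the Kafka item above): a minimal kafka-python producer that publishes log records to a topic, from which downstream jobs can load HDFS and Cassandra. The broker address, topic name and log file path are assumptions for illustration.

    # Minimal kafka-python producer sketch; broker, topic and file path are assumed.
    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="broker:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    with open("/var/log/app/access.log") as log_file:
        for line in log_file:
            # Wrap each raw log line in a small JSON envelope before publishing.
            producer.send("access-logs", {"raw": line.rstrip("\n")})

    producer.flush()  # ensure buffered records reach the brokers
    producer.close()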

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Big Data, Java, Flume, Kafka, YARN, HBase, Oozie, SQL scripting, Linux shell scripting, Mahout, Eclipse and Cloudera.

Confidential

Hadoop developer

Responsibilities:

  • Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Spark, Avro, Zookeeper, etc.) on the Cloudera distribution of Hadoop (CDH4).
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and processing.
  • Involved in installing Hadoop Ecosystem components.
  • Imported and exported data into HDFS, Pig, Hive and HBase using Sqoop.
  • Responsible for managing data coming from different sources, ingested via Flume and via Sqoop from relational database management systems.
  • Involved in gathering the requirements, designing, development and testing.
  • Worked on loading and transforming large sets of structured and semi-structured data into the Hadoop system.
  • Developed simple and complex MapReduce programs in Java for data analysis.
  • Loaded data from various data sources into HDFS using Flume.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Worked on Hue interface for querying the data.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Hive Scripts for implementing dynamic Partitions.
  • Developed Pig scripts for data analysis and extended their functionality by developing custom UDFs (see the sketch after this list).
  • Extensive knowledge of Pig scripts using bags and tuples.
  • Experience in managing and reviewing Hadoop log files.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Exported analyzed data to relational databases using SQOOP for visualization to generate reports for the BI team.
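
Illustrative code sketch (referenced in the custom UDF item above): hedged examples of Python (Jython) UDFs of the kind used to extend Pig scripts, assuming Pig's pig_util/outputSchema mechanism. The field names and semantics are illustrative assumptions.

    # Python UDFs for Pig (Jython); field semantics are illustrative assumptions.
    from pig_util import outputSchema

    @outputSchema("normalized:chararray")
    def normalize_code(value):
        # Trim whitespace and upper-case a free-text code field, tolerating nulls.
        if value is None:
            return None
        return value.strip().upper()

    @outputSchema("is_valid:int")
    def is_valid_amount(amount):
        # Flag records whose amount parses as a non-negative number.
        try:
            return 1 if float(amount) >= 0 else 0
        except (TypeError, ValueError):
            return 0

Such UDFs would typically be registered in the Pig script with a REGISTER statement using Jython and then called like built-in functions; exact registration syntax depends on the Pig version.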

Environment: Hadoop (CDH4), UNIX, Eclipse, HDFS, Java, MapReduce, Apache Pig, Hive, HBase, Oozie, Sqoop and MySQL.

Confidential

SQL Programmer

Responsibilities:

  • Actively involved in Normalization & De-normalization of database.
  • Installed and Configured SQL Server 2000 on servers for designing and testing.
  • Designed DDL and DML for MS SQL Server 2000/2005.
  • Used SQL Server Integration Services (SSIS) to populate data from various data sources.
  • Developed web-based front-end screens using MS FrontPage, HTML and JavaScript.
  • Actively designed the database to speed up certain daily jobs and stored procedures.
  • Optimized query performance by creating indexes.
  • Involved in writing SQL batch scripts.
  • Created scripts for tables, stored procedures, and DTS and SSIS packages.
  • Involved in merging existing databases and designed new data models to meet the requirements.
  • Created joins and sub-queries for complex queries involving multiple tables.
  • Used DML for writing triggers, stored procedures, and data manipulation.
  • Took full database backups, transaction log backups and differential backups as part of the daily routine.
  • Monitored production server activity.
  • Worked with DTS packages to load the massaged data into the data warehousing system.
  • Tuned SQL queries using SQL Profiler and was involved in tuning the database.
  • Proactively identified problems before users reported them.

Environment: Windows 2003 server, SQL Server 2000/2005, SSIS, FrontPage, IIS 5.0
